
Because of its breadth, MUMmer can at first glance be an overwhelming sea of scripts and subroutines. This document attempts to walk the user through some of the more useful modules of the package, and provides example data and expected outputs to ensure the correct and productive operation of MUMmer. All example data is real DNA sequence from various eukaryotic and prokaryotic organisms, and can be found in its entirety in the data directory. Although the input sequences are only subsections of their respective genomes, they have been carefully selected to permit speedy and informative walk-throughs. It is not necessary to download all of the data at once, as each subsection has separate links to the relevant files.
For further information regarding any of the MUMmer programs or their output formats, please refer to the online MUMmer manual.
MapView is a utility script for displaying sequence alignments as provided 
  by NUCmer or PROmer. It takes the output from show-coords and converts 
  it to a FIG, PDF or PS image file. By default, it produces FIG files which can 
  be viewed with the common system utility xfig or converted to PDF 
  or PS with the fig2dev utility (neither program is included with 
  MUMmer). mapview is useful for mapping multiple query contigs (e.g. 
  from a draft sequencing project) against an annotated reference sequence. Exons 
  and other features can also be plotted with the NUCmer or PROmer alignments, 
  aiding in exon refinement and analysis. Individual MUMmer hits are plotted according 
  to their percent identity, making regions of high or low similarity easily distinguishable.
In the following sections, a short example is given that demonstrates how to 
  use mapview. Since nucmer and promer 
  have nearly identical user interfaces, the alignments for this example will be 
  generated using promer. This example aligns a few query sequences 
  to a single reference sequence using promer, and then uses mapview 
  to plot the resulting areas of conservation and the reference sequence annotation.
D_melanogaster_2Rslice.cds
D_melanogaster_2Rslice.fasta
D_melanogaster_2Rslice.utr
D_pseudoobscura_contigs.fasta
Please complete the PROmer walk-through in order to generate 
  the alignment between the Drosophila melanogaster chromosome 2R segment 
  and the 2 contigs from Drosophila pseudoobscura. The PROmer walk-through 
  will generate the .coords file that is necessary to continue with 
  the rest of this tutorial. If you are already familiar with the promer 
  alignment script, simply continue this tutorial using the supplied promer.coords 
  file. Note that when generating the .coords file with show-coords, 
  it is important to use the -l -r options (and optionally the -k 
  option) in order to generate the proper input format for mapview.
The output of show-coords is then used by MapView to create a 
  FIG, PDF or PS file.
mapview -n 1 -p mapview promer.coords 
The -n option is used to set the number of output files to 1. 
  By default, MapView partitions its output among 10 files in order to keep the 
  figures for large comparisons small. Since we are only comparing a small slice 
  of the actual chromosome, only 1 file will be needed. The output of this command 
  will be a single file named mapview_0.fig. A more informative plot 
  can be generated by supplying a UTR and CDS coordinate file in GFF 
  format. These files contain annotation information that will be plotted 
  alongside the PROmer alignments, thus making it possible to compare the conserved 
  regions with annotated exon positions.
mapview -n 1 -p mapview promer.coords D_melanogaster_2Rslice.utr D_melanogaster_2Rslice.cds
This will generate a single file, mapview_0.fig, that will have 
  the annotation information displayed above the blue reference rectangle. Below, 
  you can see this file displayed with the xfig viewer. The only difference between 
  this file and the file produced without the UTR and CDS files is the set of annotation 
  rectangles above the blue rectangle at the very top of the figure. 
 
 
To generate a PDF file instead, use the same command plus the -f pdf 
  option.
mapview -n 1 -f pdf -p mapview promer.coords D_melanogaster_2Rslice.utr D_melanogaster_2Rslice.cds
This will generate the same image, mapview_0.pdf, but in PDF format.
 
 
The above MapView FIG shows a 220 kbp slice of D. melanogaster chromosome 2R and its alignment to D. pseudoobscura. The alignment, generated by PROmer, shows all regions of conserved amino acid sequence. The blue rectangle spanning the figure represents the reference (D. melanogaster), with annotated genes shown above it and the PROmer alignments shown below it. Alternative splice variants of the same gene are stacked vertically. Exons are shown as boxes, with intervening introns connecting them. The 5' and 3' UTRs are colored pink and blue to indicate the gene's direction of translation. PROmer matches are shown twice: once just below the reference genome, where all matches are collapsed into red boxes, and again in a larger display showing the separate matches within each contig, where the contigs are colored differently to indicate contig boundaries. The vertical position of each match indicates its percent identity, ranging from 50% at the bottom of the display to 100% just below the red rectangles. Percent identity is computed on the amino acid translations used by PROmer. Matches from the same query sequence are connected by lines of the same color.
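The percent-identity axis described above behaves like a linear scale. The Python sketch below shows one way to compute a match's vertical position from its percent identity. It is an illustration, not mapview's drawing code: the function name identity_to_y, the linear mapping, and the clamping of identities below 50% to the bottom of the display are all assumptions.

```python
def identity_to_y(pct_identity, y_bottom, y_top, min_identity=50.0, max_identity=100.0):
    """Map a percent identity onto a vertical coordinate (hypothetical helper).

    Identities at or below min_identity land on y_bottom and 100% lands on
    y_top; values in between are interpolated linearly. y_top may be smaller
    than y_bottom, for drawing formats whose y axis points downward.
    """
    # Assumption: out-of-range identities are clamped rather than dropped.
    clamped = max(min_identity, min(max_identity, pct_identity))
    fraction = (clamped - min_identity) / (max_identity - min_identity)
    return y_bottom + fraction * (y_top - y_bottom)


print(identity_to_y(75.0, y_bottom=1000, y_top=0))  # 500.0
```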
mummer uses a suffix tree algorithm to find maximal exact 
  matches of some minimum length between two input sequences. The match lists 
  produced by mummer can be used alone to generate alignment dot 
  plots, or can be passed on to the clustering algorithms for the identification 
  of longer non-exact regions of conservation. These match lists are highly versatile 
  because they record the position and length of every match, and can be passed forward to 
  other interpretation programs for clustering, analysis, searching, etc.
In the following sections, a short example is given that demonstrates how to 
  use mummer. This example compares a single query sequence to a 
  single reference sequence using mummer, and then uses mummerplot 
  to generate a dot plot representation of the comparison.
mummer can handle multiple reference and multiple query sequences; 
  however, a dotplot of more than two sequences can be confusing, so for 
  this example we will be dealing with a single reference and a single query 
  sequence.
mummer -mum -b -c H_pylori26695_Eslice.fasta H_pyloriJ99_Eslice.fasta > mummer.mums
This command will find all maximal unique matches (-mum) between 
  the reference and query on both the forward and reverse strands (-b) 
  and report all the match positions relative to the forward strand (-c). 
  Output is to stdout, so we will redirect it into a file named mummer.mums. 
  This file lists all of the MUMs of the default length or greater between the 
  two input sequences.
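Because the match list is plain text, it is easy to post-process. The Python sketch below reads it into tuples. It assumes the layout mummer typically writes with -b -c: a header line "> name" for each query (with "Reverse" appended for the reverse-strand matches), followed by whitespace-separated "reference-position query-position length" lines, with an extra leading reference-name column when there are several references. Treat that layout as an assumption to check against your own mummer.mums; the Match type and parse_mummer function are hypothetical.

```python
from typing import List, NamedTuple, Optional


class Match(NamedTuple):
    query: str
    reverse: bool          # True for matches listed under a "Reverse" header
    ref_id: Optional[str]  # only set when the line has a reference-name column
    ref_start: int
    qry_start: int
    length: int


def parse_mummer(text: str) -> List[Match]:
    matches: List[Match] = []
    query, reverse = "", False
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Header: "> name" or "> name Reverse".
            fields = line[1:].split()
            query = fields[0] if fields else ""
            reverse = len(fields) > 1 and fields[-1] == "Reverse"
            continue
        fields = line.split()
        ref_id = fields[0] if len(fields) == 4 else None
        ref_start, qry_start, length = (int(x) for x in fields[-3:])
        matches.append(Match(query, reverse, ref_id, ref_start, qry_start, length))
    return matches


example = "> qry\n  1  5  30\n> qry Reverse\n  400  900  22\n"
for m in parse_mummer(example):
    print(m.reverse, m.ref_start, m.qry_start, m.length)
```

To read the file from this example, pass the contents of mummer.mums to parse_mummer.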
A dotplot of all the MUMs between two sequences can reveal their macroscopic similarity.
mummerplot -x "[0,275287]" -y "[0,265111]" -postscript -p mummer mummer.mums
This command will plot all of the MUMs in the mummer.mums file 
  in postscript format (-postscript) between the given ranges for 
  the X and Y axes. When plotting mummer output, it is necessary 
  to use the lengths of the input sequences to set the plot ranges; otherwise, 
  the plot will be automatically scaled around the minimum and maximum data points. 
  The four output files are prefixed by the string specified with the -p 
  option. The plot files contain the data points, mummer.gp 
  is a gnuplot script for plotting the data points in the plot files, 
  and mummer.ps is the postscript plot generated by the gnuplot script. 
  Below, you can see the mummer.ps file displayed with ghostview. 
  Note that with newer versions of mummerplot the color and thickness 
  of the plot lines may be different.
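The -x and -y ranges in the command above are simply [0, length] for the reference and the query. A small Python sketch like the one below can compute those strings from the FASTA files instead of typing them by hand. fasta_lengths and plot_range are hypothetical helper names, and the sketch assumes each sequence name is the first word of its header line.

```python
def fasta_lengths(text):
    """Return {sequence name: length} for FASTA-formatted text."""
    lengths = {}
    name = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith(">"):
            fields = line[1:].split()
            name = fields[0] if fields else ""
            lengths[name] = 0
        elif line and name is not None:
            # Sequence lines may be wrapped; add up every line of the record.
            lengths[name] += len(line)
    return lengths


def plot_range(length):
    """Format a mummerplot-style axis range such as "[0,275287]"."""
    return f"[0,{length}]"


example = ">ref some description\nACGTACGT\nACG\n>qry\nTTTT\n"
for name, length in fasta_lengths(example).items():
    print(name, plot_range(length))
```

Applied to H_pylori26695_Eslice.fasta and H_pyloriJ99_Eslice.fasta, this should reproduce the two ranges used in the mummerplot command.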
 
 Most image manipulation programs can edit the postscript output, or it can 
  be sent directly to a printer with the lpr command. If you would 
  rather use the default terminal for gnuplot, simply remove the -postscript 
  option from the mummerplot call.
 
The above postscript plot represents the set of all MUMs between the two input sequences used in this example. Forward MUMs are plotted as red lines/dots, while reverse MUMs are plotted as green lines/dots (blue may be used for reverse matches in newer versions). A line of dots with slope 1 represents an undisturbed segment of conservation between the two sequences, while a line of slope -1 represents an inverted segment of conservation. The green segment in the upper left quadrant of the graph shows both an inversion and a translocation, as it has negative slope and is inconsistently located relative to the rest of the plot, which falls on a line approximated by f(x) = x. However, the green segment in the upper right quadrant of the graph shows only an inversion, as it has negative slope but is consistent in location with the rest of the plot. Generally, the closer a plot is to the imaginary line f(x) = x (or -x), the fewer macroscopic differences exist between the two sequences.
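The visual reasoning above can be written down as a rough heuristic. The Python sketch below labels a plotted segment by the sign of its slope (inversion) and by how far its midpoint lies from the line f(x) = x (translocation). It only illustrates how the plot is read: the function name and the max_offset tolerance are hypothetical, and the diagonal test assumes, as in this example, that the two sequences are mostly collinear and of similar length.

```python
def classify_segment(x1, y1, x2, y2, max_offset):
    """Classify a dot plot segment from (x1, y1) to (x2, y2).

    x is the reference coordinate and y the query coordinate.
    """
    # A negative slope means the query runs backwards relative to the reference.
    inverted = (x2 - x1) * (y2 - y1) < 0
    # A segment far from the main diagonal f(x) = x has moved relative to the rest.
    mid_x, mid_y = (x1 + x2) / 2, (y1 + y2) / 2
    translocated = abs(mid_y - mid_x) > max_offset
    if inverted and translocated:
        return "inversion and translocation"
    if inverted:
        return "inversion"
    if translocated:
        return "translocation"
    return "collinear"


print(classify_segment(1000, 1100, 1100, 1000, max_offset=5000))  # inversion
```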
nucmer is MUMmer's most user-friendly alignment script for 
  standard DNA sequence alignment. It is a robust pipeline that allows for multiple 
  reference and multiple query sequences to be aligned in a many vs. many fashion. 
  For instance, a very common use for nucmer is to determine the 
  position and orientation of a set of sequence contigs in relation to a finished 
  sequence, but it can be just as effective in comparing two finished sequences 
  to one another.
In the following sections, a short example is given that demonstrates how to 
  use nucmer. This example aligns a set of draft sequence contigs 
  to a finished sequence using nucmer; displays the alignment coordinates 
  using show-coords; and tiles them across the reference using show-tiling.
Like mummer, nucmer can handle multiple reference 
  and query sequences, although it is most commonly used to map a set of query 
  sequences to a single reference sequence. This example will demonstrate that 
  functionality, as a number of B. anthracis draft contigs will be mapped 
  to the final assembly.
nucmer -maxmatch -c 100 -p nucmer B_anthracis_Mslice.fasta B_anthracis_contigs.fasta
To ensure that all contigs were mapped, all maximal matches were used as alignment 
  anchors (-maxmatch), and because of the sequence similarity the 
  minimum cluster size was raised to 100 (-c 100). The two output 
  files are prefixed by the string specified with the -p option. 
  nucmer.delta is an encoded file that represents the alignment between the two inputs. At this stage, 
  the alignment of the two inputs is complete; however, it is necessary to parse 
  the nucmer.delta file with the provided utilities in order to extract 
  useful information from the comparison.
To view a summary of all the alignments produced by NUCmer, we need to run 
  the nucmer.delta file through the show-coords utility.
show-coords -r -c -l nucmer.delta > nucmer.coords
This command will list the coordinates, percent identities and other useful 
  statistics of each alignment in a table. Each line of the table represents an 
  individual pairwise alignment, and the lines are sorted by their starting reference 
  coordinate (-r). Additional information, like alignment coverage 
  (-c) and sequence length (-l) can be added to the 
  table with the appropriate options. Output is to stdout, so we 
  have redirected it into the file, nucmer.coords.
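As a rough illustration of what the -r, -c and -l options add, the Python sketch below sorts alignment records by reference start and computes percent coverage as the aligned span divided by the full sequence length. The Alignment record is hypothetical, not show-coords' own data structure, and the abs() calls assume that reverse-strand alignments may be reported with a start coordinate larger than the end coordinate.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Alignment:
    ref_start: int
    ref_end: int
    qry_start: int
    qry_end: int
    ref_len: int  # length of the whole reference sequence (what -l reports)
    qry_len: int  # length of the whole query sequence

    def coverage(self) -> Tuple[float, float]:
        """Percent of each sequence covered by this alignment (what -c reports)."""
        ref_span = abs(self.ref_end - self.ref_start) + 1
        qry_span = abs(self.qry_end - self.qry_start) + 1
        return 100.0 * ref_span / self.ref_len, 100.0 * qry_span / self.qry_len


def sort_by_reference(alignments: List[Alignment]) -> List[Alignment]:
    """Order alignments by starting reference coordinate, like -r."""
    return sorted(alignments, key=lambda a: (a.ref_start, a.qry_start))


a = Alignment(ref_start=1, ref_end=500, qry_start=1000, qry_end=501, ref_len=1000, qry_len=2000)
print(a.coverage())  # (50.0, 25.0)
```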
To view a summary of all the SNPs and indels between the two sequence sets, 
  we need to run the nucmer.delta file through the show-snps 
  utility.
show-snps -C nucmer.delta > nucmer.snps
This will generate a report of all the SNPs internal to the alignments contained 
  in the nucmer.delta file. Each line of the table represents a single 
  mismatch in the pairwise alignment. With the -C option, only SNPs 
  from uniquely aligned regions will be reported. Additional information can be 
  added or removed with the command line switches described in the manual. Output 
  is to stdout, so we have redirected it into the file, nucmer.snps.
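Conceptually, each line of the SNP report comes from walking along a gapped pairwise alignment and noting every column where the two sequences disagree. The Python sketch below does this for a pair of aligned strings; it is an illustration, not show-snps itself. The function name, the use of '.' for gaps in the output, and the convention of reporting the last non-gap position for the gapped sequence are assumptions made for this example.

```python
GAP_CHARS = {"-", "."}


def call_differences(ref_aln, qry_aln, ref_start=1, qry_start=1):
    """Return (ref_pos, ref_base, qry_base, qry_pos) for every differing column.

    Substitutions show two bases; indels show '.' for the gapped sequence,
    whose position stays at its last non-gap base.
    """
    if len(ref_aln) != len(qry_aln):
        raise ValueError("aligned sequences must have the same length")
    diffs = []
    ref_pos, qry_pos = ref_start - 1, qry_start - 1
    for r, q in zip(ref_aln.upper(), qry_aln.upper()):
        r = "." if r in GAP_CHARS else r
        q = "." if q in GAP_CHARS else q
        # Advance the coordinate of each sequence that has a base in this column.
        if r != ".":
            ref_pos += 1
        if q != ".":
            qry_pos += 1
        if r != q:
            diffs.append((ref_pos, r, q, qry_pos))
    return diffs


print(call_differences("ACGTAC", "ACTT-C"))  # [(3, 'G', 'T', 3), (5, 'A', '.', 4)]
```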
To produce a minimal tiling of contigs across the reference sequence, we need 
  to run the nucmer.delta file through the show-tiling 
  utility.
show-tiling nucmer.delta > nucmer.tiling
This command will list the contigs and positions that generate the maximal alignment coverage across the reference sequence using the fewest contigs possible. This output can aid the closure of a draft genome when a closely related organism has already been finished.
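The goal of covering the reference with as few contigs as possible can be illustrated with the classic greedy interval-cover heuristic: scan along the reference and, at each point, pick the contig that reaches furthest. The Python sketch below is only that textbook heuristic, applied to hypothetical (name, start, end) reference intervals; show-tiling's actual procedure may differ.

```python
def greedy_tiling(intervals, ref_start, ref_end):
    """Choose intervals (name, start, end), 1-based inclusive on the reference,
    that cover [ref_start, ref_end] greedily; uncoverable gaps are skipped."""
    ivs = sorted(
        (iv for iv in intervals if iv[2] >= ref_start and iv[1] <= ref_end),
        key=lambda iv: iv[1],
    )
    chosen = []
    covered = ref_start - 1  # last reference position covered so far
    i = 0
    while covered < ref_end and i < len(ivs):
        # No interval reaches the next uncovered base: jump over the gap.
        if ivs[i][1] > covered + 1:
            covered = ivs[i][1] - 1
        # Among intervals starting inside the covered region, keep the one reaching furthest.
        best = None
        while i < len(ivs) and ivs[i][1] <= covered + 1:
            if best is None or ivs[i][2] > best[2]:
                best = ivs[i]
            i += 1
        if best is not None and best[2] > covered:
            chosen.append(best)
            covered = best[2]
    return chosen


contigs = [("c1", 1, 100), ("c2", 50, 150), ("c3", 60, 120), ("c4", 140, 300), ("c5", 400, 500)]
print([name for name, _, _ in greedy_tiling(contigs, 1, 500)])  # ['c1', 'c2', 'c4', 'c5']
```

Here c3 is dropped because c2 already reaches further, and the uncovered stretch between 301 and 399 is skipped rather than treated as an error.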
nucmer and show-tiling output can both be viewed 
  with mummerplot; however, these plots would offer little additional information 
  for this example. mapview can also be used to display 
  the output of show-coords, as is shown in the mapview 
  walkthrough.
promer is a close relative of the NUCmer script. It follows the 
  exact same steps as NUCmer and even uses most of the same programs in its pipeline, 
  with one exception: all matching and alignment routines are performed on the 
  six-frame amino acid translation of the DNA input sequence. This gives promer 
  a much higher sensitivity than nucmer because protein sequences 
  tend to diverge much more slowly than their underlying DNA sequence. Therefore, 
  on the same input sequences, promer may find many conserved regions 
  that nucmer will not, simply because the DNA sequence is not as 
  highly conserved as the amino acid translation.
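The six-frame translation that promer works on can be sketched in a few lines of Python: three reading frames on the forward strand and three on the reverse complement, each translated with the standard genetic code. This illustrates the concept only. The function names and the "+1..+3 / -1..-3" frame labels are assumptions and need not match promer's internal frame numbering, and codons containing ambiguous bases are translated as 'X'.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order (TTT, TTC, TTA, ...).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product("TCAG", repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    return seq.upper().translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons from the start of seq; '*' marks stop codons."""
    seq = seq.upper()
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def six_frames(seq):
    """Return the three forward and three reverse-strand translations of seq."""
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


print(six_frames("ATGGCCTAA"))
# {'+1': 'MA*', '-1': 'LGH', '+2': 'WP', '-2': '*A', '+3': 'GL', '-3': 'RP'}
```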
In the following sections, a short example is given that demonstrates how to 
  use promer. This example aligns a few query sequences to a single 
  reference sequence using promer; displays the alignment coordinates 
  using show-coords; and prints a pairwise alignment of one of the 
  contigs using show-aligns.
Like mummer, promer can handle multiple reference 
  and query sequences, although it is most commonly used to map a set of query 
  sequences to a single reference sequence. This example will demonstrate that 
  functionality, as two D. pseudoobscura draft contigs will be mapped 
  to the final D. melanogaster assembly.
promer -p promer D_melanogaster_2Rslice.fasta D_pseudoobscura_contigs.fasta
Default parameters were used to align the two inputs; however, if the alignment 
  is too sensitive or not sensitive enough, the minimum match length and cluster 
  sizes can be adjusted accordingly. The two output files are prefixed by the 
  string specified with the -p option. promer.delta is an encoded file that represents 
  the alignment between the two inputs. At this stage, the alignment of the two 
  inputs is complete; however, it is necessary to parse the promer.delta 
  file with the provided utilities in order to extract useful information from 
  the comparison.
To view a summary of all the alignments produced by PROmer, we need to run 
  the promer.delta file through the show-coords utility.
show-coords -r -c -l -L 100 -I 50 promer.delta > promer.coords
This command will list the coordinates, percent identities and other useful 
  statistics of each alignment in a table. Each line of the table represents an 
  individual pairwise alignment, and the lines are sorted by their starting reference 
  coordinate (-r). Additional information, like alignment coverage 
  (-c) and sequence length (-l) can be added to the 
  table with the appropriate options. Minimum length (-L) and 
  minimum percent identity (-I) cutoffs can also be specified to filter out 
  poor alignments. Output is to stdout, so we have redirected it 
  into the file, promer.coords. If this file will be used as input 
  to mapview, always use the -r -c 
  -l options.
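The -L and -I cutoffs amount to a simple filter over the rows of the table. The Python sketch below applies the same two thresholds used above (100 and 50) to hypothetical row records. Whether show-coords measures length on the reference, the query, or both, and whether its comparisons are inclusive, are assumptions here, so check the manual before relying on exact boundary behavior.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CoordsRow:
    ref_start: int
    ref_end: int
    pct_identity: float

    @property
    def ref_length(self) -> int:
        return abs(self.ref_end - self.ref_start) + 1


def apply_cutoffs(rows: List[CoordsRow], min_length: int = 100,
                  min_identity: float = 50.0) -> List[CoordsRow]:
    """Keep rows at least min_length long (measured on the reference, an
    assumption) with at least min_identity percent identity."""
    return [r for r in rows if r.ref_length >= min_length and r.pct_identity >= min_identity]


rows = [CoordsRow(1, 300, 80.0), CoordsRow(500, 560, 90.0), CoordsRow(1000, 1400, 45.0)]
print([r.ref_start for r in apply_cutoffs(rows)])  # [1]
```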
To view all the pairwise alignments between two of the input sequences, we 
  need to run the promer.delta file through the show-aligns 
  utility. 
show-aligns promer.delta "D_melanogaster_2Rslice" "3214968" > promer.aligns
This command will print all of the pairwise alignments stored in the promer.delta 
  file for the sequences "D_melanogaster_2Rslice" and "3214968". 
  Output is to stdout, so we have redirected it into the file, promer.aligns. 
  If the alignments do not fit within your screen width, or you would like them 
  to be printed on longer lines, the screen width can be adjusted with the -w 
  option. Since show-aligns only displays the alignments between 
  two sequences, it will have to be run separately for each desired pair of sequences.
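The effect of the -w option, re-wrapping a long alignment into fixed-width blocks, can be sketched in Python as follows. The block layout here (reference line, query line, then a marker line with '^' under differing columns) is a simplified assumption, not the exact show-aligns output format, and wrap_alignment is a hypothetical name.

```python
def wrap_alignment(ref_aln, qry_aln, width=60):
    """Split a gapped pairwise alignment into blocks of at most width columns.

    Each block holds the reference line, the query line, and a marker line
    with '^' under every column where the two differ.
    """
    if len(ref_aln) != len(qry_aln):
        raise ValueError("aligned sequences must have the same length")
    blocks = []
    for start in range(0, len(ref_aln), width):
        ref_part = ref_aln[start:start + width]
        qry_part = qry_aln[start:start + width]
        marks = "".join(" " if a == b else "^" for a, b in zip(ref_part, qry_part))
        blocks.append("\n".join((ref_part, qry_part, marks)))
    return "\n\n".join(blocks)


print(wrap_alignment("ACGTACGTAC", "ACTTACG-AC", width=4))
```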
promer and show-tiling output can both be viewed 
  with mummerplot; however, these plots would offer little additional information 
  for this example. mapview can also be used to display 
  the output of show-coords, as is shown in the mapview 
  walkthrough which uses the promer.coords file generated in 
  this example to generate a plot of the alignment.
run-mummer1 is a legacy script from the original MUMmer1.0 release. 
  It has been updated to use the new suffix tree code of version 3.0; however, 
  all other programs called from this script are identical to the original MUMmer 
  release back in 1999. Even though it is an outdated program, it still has some 
  advantages over the newer alignment scripts (nucmer, promer, 
  run-mummer3). Like all of the alignment scripts, run-mummer1 
  is a three step process - matching, clustering and extension. However, unlike 
  the newer alignment scripts, run-mummer1 uses the gaps 
  program for its clustering step. The gaps program does not allow 
  for rearrangements like mgaps; instead, it finds the single longest 
  increasing subset of matches across the full length of both sequences. This 
  makes it well suited for SNP and small indel identification between small (< 
  10 Mbp), very similar sequences with few to no rearrangements.
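The idea of a single longest increasing subset of matches can be illustrated with a small dynamic program. The Python sketch below chains (reference start, query start, length) matches so that each match lies strictly after the previous one in both sequences, maximizing the total matched length. That objective, the quadratic algorithm, and the function name are assumptions made for illustration; the gaps program's own implementation may score and chain matches differently.

```python
def chain_matches(matches):
    """Return the non-overlapping chain of (ref_start, qry_start, length)
    matches, increasing in both sequences, with the largest total length."""
    ms = sorted(matches)
    n = len(ms)
    if n == 0:
        return []
    best = [m[2] for m in ms]  # best chain score ending at match i
    prev = [-1] * n            # predecessor of match i in that chain
    for i in range(n):
        ri, qi, li = ms[i]
        for j in range(i):
            rj, qj, lj = ms[j]
            # Match j must end before match i starts in both sequences.
            if rj + lj <= ri and qj + lj <= qi and best[j] + li > best[i]:
                best[i] = best[j] + li
                prev[i] = j
    # Walk back from the highest-scoring chain end.
    i = max(range(n), key=best.__getitem__)
    chain = []
    while i != -1:
        chain.append(ms[i])
        i = prev[i]
    return chain[::-1]


print(chain_matches([(1, 1, 10), (20, 20, 10), (40, 5, 25), (60, 60, 10)]))
# [(1, 1, 10), (20, 20, 10), (60, 60, 10)]
```

The out-of-order match (40, 5, 25) is left out because including it would break the increasing order in the query; a rearrangement like this is exactly what gaps cannot accommodate.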
In the following sections, a short example is given that demonstrates how to 
  use run-mummer1. This example aligns a single query sequence to 
  a single reference sequence using run-mummer1.
run-mummer1 is only suited for a single reference and query sequence 
  that have few to zero inversions or translocations. This example aligns two 
  such sequences.
run-mummer1 H_pylori26695_Bslice.fasta H_pyloriJ99_Bslice.fasta mummer1
To adjust the minimum match length for the comparison, the user must manually 
  edit the run-mummer1 script. Output files are prefixed by the string 
  specified at the end of the command line call. mummer1.align displays 
  the alignments of each gap between adjacent MUMs, mummer1.errorsgaps 
  lists each MUM and the number of errors between it and the previous MUM, mummer1.gaps 
  lists the ordered set of MUMs and the gap distance to the previous MUM, and 
  mummer1.out simply lists all of the MUMs greater than or equal 
  to the minimum match length.
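The gap distances listed in files like mummer1.gaps can be illustrated with a short Python sketch that walks an ordered chain of matches and measures the unmatched stretch before each one in both sequences. When the two gaps are equal, the stretch may contain only substitutions; when they differ, at least one insertion or deletion must be present. The tuple layout and function name are hypothetical and do not reproduce the exact columns of the .gaps file.

```python
def gaps_between(chain):
    """For matches (ref_start, qry_start, length) already ordered along both
    sequences, return (match, ref_gap, qry_gap) for every match after the first.

    A gap counts the unmatched bases between the end of the previous match and
    the start of this one; a negative value would mean the matches overlap.
    """
    result = []
    for prev, cur in zip(chain, chain[1:]):
        ref_gap = cur[0] - (prev[0] + prev[2])
        qry_gap = cur[1] - (prev[1] + prev[2])
        result.append((cur, ref_gap, qry_gap))
    return result


for match, ref_gap, qry_gap in gaps_between([(1, 1, 10), (20, 20, 10), (60, 62, 10)]):
    if ref_gap == qry_gap:
        note = "possibly substitutions only"
    else:
        note = "net indel of %d bp" % abs(ref_gap - qry_gap)
    print(match, ref_gap, qry_gap, note)
```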
There are no visualization tools designed for run-mummer1 output. 
  To view a MUM dotplot, run mummer by itself on two individual sequences 
  as demonstrated in the mummer walkthrough.
run-mummer3 is the simplest pipeline of the latest MUMmer3.0 programs. 
  It runs the same matching and clustering algorithm as nucmer and 
  promer, however it uses a different extension technique and does 
  not perform the important pre- and post-processing steps of NUC/PROmer. Because 
  of its simple design, run-mummer3 can only handle a single reference 
  sequence, but like run-mummer1 its error-focused output makes it 
  a handy tool for detecting SNPs and other small errors. The only major difference 
  between run-mummer3 and run-mummer1 is the new version's 
  ability to handle multiple query sequences and its tolerance of large rearrangements. 
  This makes run-mummer3 well suited for error detection between 
  highly similar sequences that may have large rearrangements, inversions etc.
In the following sections, a short example is given that demonstrates how to 
  use run-mummer3. This example aligns a single query sequence to 
  a single reference sequence using run-mummer3.
run-mummer3 can only handle a single reference sequence, but it 
  is capable of dealing with multiple query sequences. However, this example aligns 
  a single query sequence to a single reference sequence. Unlike run-mummer1, 
  run-mummer3 can handle inversions and translocations, but not with 
  the same grace as nucmer.
run-mummer3 H_pylori26695_Bslice.fasta H_pyloriJ99_Bslice.fasta mummer3
To adjust any of the alignment parameters, the user must manually edit the run-mummer3 
  script. Do not, however, add the -c option to the mummer 
  invocation, as it will confuse the next steps in the pipeline. It may be easier 
  to reverse complement the sequence yourself and run the script twice (once for 
  the forward strand and once for the reverse strand) with the -b option removed. Try adding 
  the -D option to the combineMUMs command line in the 
  script to output a format that is easier to parse for SNPs and small indels. 
  Output files are prefixed by the string specified at the end of the command 
  line call. mummer3.align displays the alignments of each gap between 
  adjacent MUMs, mummer3.errorsgaps lists each MUM and the number 
  of errors between it and the previous MUM, mummer3.gaps lists the 
  ordered set of MUMs and the gap distance to the previous MUM, and mummer3.out 
  simply lists all of the MUMs greater than or equal to the minimum match length.
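If you follow the suggestion above and reverse complement the query yourself, any standard sequence tool will do. As a self-contained alternative, the Python sketch below reverse complements every record of a FASTA file, handling the IUPAC ambiguity codes and preserving case. The function names and the 60-column output width are assumptions, and record names are kept unchanged, so you may want to rename the output records to keep forward and reverse results apart.

```python
# IUPAC complements; S, W and any unlisted characters map to themselves.
COMPLEMENT = str.maketrans("ACGTRYKMBDHVNacgtrykmbdhvn", "TGCAYRMKVHDBNtgcayrmkvhdbn")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def reverse_complement_fasta(text, width=60):
    """Reverse complement every record in FASTA-formatted text."""
    records, name, chunks = [], None, []
    for line in text.splitlines():
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:].strip(), []
        elif line.strip():
            chunks.append(line.strip())
    if name is not None:
        records.append((name, "".join(chunks)))
    out = []
    for name, seq in records:
        rc = reverse_complement(seq)
        out.append(">" + name)
        # Re-wrap the sequence into lines of at most `width` characters.
        out.extend(rc[i:i + width] for i in range(0, len(rc), width))
    return "\n".join(out) + "\n"


# Usage (hypothetical file names):
# with open("query.fasta") as src, open("query.rc.fasta", "w") as dst:
#     dst.write(reverse_complement_fasta(src.read()))
print(reverse_complement_fasta(">s1\nATG\nC\n", width=3), end="")
```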
The mummer3.out file is identical to the output of mummer 
  on a 1 vs many search, so it may be plotted as demonstrated in the mummer 
  walkthrough.
Please address questions and bug reports via Email to:
VERSION 3.17 - May 2005