MUMmer

Latest revision as of 17:18, 11 October 2019

Homepage: MUMmer (https://mummer4.github.io/): Version 4.0.0 beta2: A system for rapidly aligning large DNA sequences to one another
MUMmer is very fast and easy to run. The current version, release 4.x, can find all 20-bp maximal exact matches between two bacterial genomes in just a few seconds on a typical desktop or laptop computer. MUMmer handles the 100s or 1000s of contigs from a draft genome with ease, and will align them to another set of contigs using the nucmer utility included with the system. The promer utility takes this a step further by generating alignments based upon the six-frame translations of both input sequences.
Manual: https://mummer4.github.io/manual/manual.html

Module: Example

[]$ module spider mummer
------------------------
  mummer: mummer/4.0
------------------------
    This module can be loaded directly: module load mummer/4.0
module load mummer/4.0
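
One quick way to confirm that the module is active is to check that the MUMmer programs are on the PATH after loading it. The sketch below assumes an Lmod-style `module` command is defined in the current shell; where it is not, the load step is skipped and the tools are simply reported as missing. The variable names `found` and `missing` are illustrative only.

```shell
#!/bin/sh
# Load the module only if a `module` command exists in this shell.
if command -v module >/dev/null 2>&1; then
    module load mummer/4.0
fi

# Report which of the tools shown on this page are on the PATH.
found=""
missing=""
for tool in nucmer promer annotate combineMUMs; do
    if command -v "$tool" >/dev/null 2>&1; then
        found="$found $tool"
    else
        missing="$missing $tool"
    fi
done
echo "found:${found:- none}"
echo "missing:${missing:- none}"
```

If every tool is listed under "found", the help checks in the next section should work as shown.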

Using:

The following simple checks, based on the Manual, load the module and print each tool's help text.

[]$ module load mummer/4.0

[]$ nucmer --help
Usage: nucmer [options] ref:path qry:path+

nucmer generates nucleotide alignments between two multi-FASTA input
files. The out.delta output file lists the distance between insertions
and deletions that produce maximal scoring alignments between each
sequence. The show-* utilities know how to read this format.

By default, nucmer uses anchor matches that are unique in the
reference but not necessarily unique in the query. See --mum and
--maxmatch for different behaviors.

Options (default value in (), *required):
     --mum                                Use anchor matches that are unique in both the reference and query (false)
     --maxmatch                           Use all anchor matches regardless of their uniqueness (false)
 -b, --breaklen=uint32                    Set the distance an alignment extension will attempt to extend poor scoring regions before giving up (200)
 -c, --mincluster=uint32                  Sets the minimum length of a cluster of matches (65)
 -D, --diagdiff=uint32                    Set the maximum diagonal difference between two adjacent anchors in a cluster (5)
 -d, --diagfactor=double                  Set the maximum diagonal difference between two adjacent anchors in a cluster as a differential fraction of the gap length (0.12)
     --noextend                           Do not perform cluster extension step (false)
 -f, --forward                            Use only the forward strand of the Query sequences (false)
 -g, --maxgap=uint32                      Set the maximum gap between two adjacent matches in a cluster (90)
 -l, --minmatch=uint32                    Set the minimum length of a single exact match (20)
 -L, --minalign=uint32                    Minimum length of an alignment, after clustering and extension (0)
     --nooptimize                         No alignment score optimization, i.e. if an alignment extension reaches the end of a sequence, it will not backtrack to optimize the alignment score and instead terminate the alignment at the end of the sequence (false)
 -r, --reverse                            Use only the reverse complement of the Query sequences (false)
     --nosimplify                         Don't simplify alignments by removing shadowed clusters. Use this option when aligning a sequence to itself to look for repeats (false)
 -p, --prefix=PREFIX                      Write output to PREFIX.delta (out)
     --delta=PATH                         Output delta file to PATH (instead of PREFIX.delta)
     --sam-short=PATH                     Output SAM file to PATH, short format
     --sam-long=PATH                      Output SAM file to PATH, long format
     --save=PREFIX                        Save suffix array to files starting with PREFIX
     --load=PREFIX                        Load suffix array from file starting with PREFIX
     --batch=BASES                        Proceed by batch of chunks of BASES from the reference
 -t, --threads=NUM                        Use NUM threads (# of cores)
 -U, --usage                              Usage
 -h, --help                               This message
     --full-help                          Detailed help
 -V, --version                            Version
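
Going by the usage line and options above, a typical nucmer run might look like the following sketch. The input file names and the output prefix are hypothetical placeholders, and the thread count of 4 is an arbitrary choice. When nucmer or the input files are not available, the script only prints the command it would have run.

```shell
#!/bin/sh
# Hypothetical input files and output prefix; replace with your own.
ref="reference.fasta"
qry="query.fasta"
prefix="ref_vs_qry"

# -p sets the output prefix, so alignments are written to PREFIX.delta.
# -t sets the number of threads.
nucmer_cmd="nucmer -t 4 -p $prefix $ref $qry"
delta_file="$prefix.delta"

if command -v nucmer >/dev/null 2>&1 && [ -f "$ref" ] && [ -f "$qry" ]; then
    $nucmer_cmd
else
    # Dry run: show the command when nucmer or the inputs are unavailable.
    echo "would run: $nucmer_cmd"
fi
echo "alignments expected in: $delta_file"
```

The resulting delta file is the format that the show-* utilities mentioned in the help text read.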


[]$ promer --help
  USAGE: promer  [options]  <Reference>  <Query>

  DESCRIPTION:
    promer generates amino acid alignments between two multi-FASTA DNA input
    files. The out.delta output file lists the distance between insertions
    and deletions that produce maximal scoring alignments between each
    sequence. The show-* utilities know how to read this format. The DNA
    input is translated into all 6 reading frames in order to generate the
    output, but the output coordinates reference the original DNA input.

  MANDATORY:
    Reference       Set the input reference multi-FASTA DNA file
    Query           Set the input query multi-FASTA DNA file

  OPTIONS:
    --mum           Use anchor matches that are unique in both the reference
                    and query
    --mumcand       Same as --mumreference
    --mumreference  Use anchor matches that are unique in the reference
                    but not necessarily unique in the query (default behavior)
    --maxmatch      Use all anchor matches regardless of their uniqueness

    -b|breaklen     Set the distance an alignment extension will attempt to
                    extend poor scoring regions before giving up, measured in
                    amino acids (default 60)
    -c|mincluster   Sets the minimum length of a cluster of matches, measured in
                    amino acids (default 20)
    --[no]delta     Toggle the creation of the delta file (default --delta)
    --depend        Print the dependency information and exit
    -d|diagfactor   Set the clustering diagonal difference separation factor
                    (default .11)
    --[no]extend    Toggle the cluster extension step (default --extend)
    -g|maxgap       Set the maximum gap between two adjacent matches in a
                    cluster, measured in amino acids (default 30)
    -h
    --help          Display help information and exit.
    -l|minmatch     Set the minimum length of a single match, measured in amino
                    acids (default 6)
    -m|masklen      Set the maximum bookend masking length, measured in amino
                    acids (default 8)
    -o
    --coords        Automatically generate the original PROmer1.1 ".coords"
                    output file using the "show-coords" program
    --[no]optimize  Toggle alignment score optimization, i.e. if an alignment
                    extension reaches the end of a sequence, it will backtrack
                    to optimize the alignment score instead of terminating the
                    alignment at the end of the sequence (default --optimize)

    -p|prefix       Set the prefix of the output files (default "out")
    -V
    --version       Display the version information and exit
    -x|matrix       Set the alignment matrix number to 1 [BLOSUM 45], 2 [BLOSUM
                    62] or 3 [BLOSUM 80] (default 2)
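
As a rough illustration of the promer options listed above, the sketch below selects the BLOSUM 80 matrix with `-x 3` and asks for a ".coords" file with `--coords`. The file names and prefix are hypothetical, and the name of the ".coords" output is an assumption based on the prefix option rather than something stated in the help text. Without promer or the input files, the script only prints the command.

```shell
#!/bin/sh
# Hypothetical DNA inputs and output prefix; replace with your own.
ref="reference.fasta"
qry="query.fasta"
prefix="prot_cmp"

# -x 3 selects BLOSUM 80 (the default is 2, BLOSUM 62).
# --coords also generates the ".coords" output via show-coords.
promer_cmd="promer -x 3 --coords -p $prefix $ref $qry"

if command -v promer >/dev/null 2>&1 && [ -f "$ref" ] && [ -f "$qry" ]; then
    $promer_cmd
else
    # Dry run: show the command when promer or the inputs are unavailable.
    echo "would run: $promer_cmd"
fi
# Assumed output names, derived from the prefix.
echo "expected outputs: $prefix.delta and (assumed) $prefix.coords"
```

Both inputs are DNA files; promer translates them itself, so no separate translation step is needed.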


[]$ annotate --help
Usage: annotate <gapfile> <datafile> 


[]$ combineMUMs --help
combineMUMs: invalid option -- '-'
Unrecognized option --
USAGE:  combineMUMs <RefSequence> <MatchSequences> <GapsFile>

Combines MUMs in <GapsFile> by extending matches off
ends and between MUMs.  <RefSequence> is a fasta file
of the reference sequence.  <MatchSequences> is a
multi-fasta file of the sequences matched against the
reference

Options:
-D      Only output to stdout the difference positions
          and characters
-n      Allow matches only between nucleotides, i.e., ACGTs
-N num  Break matches at <num> or more consecutive non-ACGTs 
-q tag  Used to label query match
-r tag  Used to label reference match
-S      Output all differences in strings
-t      Label query matches with query fasta header
-v num  Set verbose level for extra output
-W file Reset the default output filename witherrors.gaps
-x      Don't output .cover files
-e      Set error-rate cutoff to e (e.g. 0.02 is two percent)


  • This software depends on the following modules:
    • swset/2018.05
    • gcc/7.3.0
    • gnuplot/5.2.2-py27
  • Running module load mummer/4.0 loads these modules automatically.
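
To see whether these dependencies are actually present in a session, the list of loaded modules can be searched for each name. This is a minimal sketch that assumes an Lmod-style `module list`, which prints to standard error (hence the `2>&1`). If no `module` command exists, every dependency is reported as not loaded.

```shell
#!/bin/sh
# Capture the list of loaded modules if a `module` command is available.
loaded=""
if command -v module >/dev/null 2>&1; then
    loaded=$(module list 2>&1)
fi

# Look for each dependency named on this page in the captured list.
checked=0
for dep in swset/2018.05 gcc/7.3.0 gnuplot/5.2.2-py27; do
    case "$loaded" in
        *"$dep"*) echo "$dep: loaded" ;;
        *)        echo "$dep: not loaded" ;;
    esac
    checked=$((checked + 1))
done
```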

