Difference between revisions of "Kraken"

From arccwiki

Revision as of 19:44, 10 October 2019


KRAKEN2: Kraken taxonomic sequence classification system
Kraken 2 is the newest version of Kraken, a taxonomic classification system using exact k-mer matches to achieve high accuracy and fast classification speeds. This classifier matches each k-mer within a query sequence to the lowest common ancestor (LCA) of all genomes containing the given k-mer. The k-mer assignments inform the classification algorithm.
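
To make "each k-mer within a query sequence" concrete, the sketch below lists the k-mers of a short sequence in plain bash. It is only an illustration: the function name kmers and the toy value k=4 are assumptions for this example, the real classifier uses much longer k-mers, and the database lookup and LCA steps are not shown.

```shell
#!/bin/bash
# Illustration only: print every k-mer (substring of length k) of a
# sequence, one per line. "kmers" is a hypothetical helper, not part of Kraken 2.
kmers() {
    local seq=$1 k=$2 i
    for (( i = 0; i + k <= ${#seq}; i++ )); do
        echo "${seq:i:k}"
    done
}

# Toy input: a 6-base sequence with k=4 yields 3 overlapping k-mers.
kmers ACGTAC 4
```

A sequence of length n contains n - k + 1 overlapping k-mers, and the classifier looks up each one in its database.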
Manual: https://ccb.jhu.edu/software/kraken2/index.shtml?t=manual

Module: Example

[]$ module spider kraken
-------------------------
  kraken:
-------------------------
     Versions:
        kraken/1.0-py27
        kraken/2.0

[]$ module load kraken/2.0

Using:

Note: The Dependencies entry under System Requirements in the manual states: "Multithreading is handled using OpenMP. ... Unlike Kraken 1, Kraken 2 does not use an external k-mer counter. However, by default, Kraken 2 will attempt to use the dustmasker or segmasker programs provided as part of NCBI's BLAST suite to mask low-complexity regions (see Masking of Low-complexity Sequences)."
The example below builds the standard Kraken 2 database. Because of the masking dependency described above, it also loads the gpu-blast/1.1 module so that dustmasker can be found on the path.

[]$ salloc -A <enter-your-project> --time=6:00:00 -N 1 --cpus-per-task=32 --mem=0
[]$ module load kraken/2.0
[]$ module load gpu-blast/1.1
[]$ srun kraken2-build --standard --threads 32 --db KDB

Step 1/2: Performing rsync file transfer of requested files
Rsync file transfer complete.
Step 2/2: Assigning taxonomic IDs to sequences
Processed 341 projects (530 sequences, 872.20 Mbp)... done.
All files processed, cleaning up extra sequence files... done, library complete.
Masking low-complexity regions of downloaded library... done.
Step 1/2: Performing rsync file transfer of requested files
Rsync file transfer complete.
Step 2/2: Assigning taxonomic IDs to sequences
Processed 17072 projects (36839 sequences, 68.61 Gbp)... done.
All files processed, cleaning up extra sequence files... done, library complete.
Masking low-complexity regions of downloaded library..
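
The same build can be submitted as a batch job instead of an interactive salloc session. The sketch below is a hedged example rather than a tested site recipe: the helper name make_kraken_job and the file name build_kdb.sh are hypothetical, the resource values are copied from the salloc line above, and it assumes Slurm sets SLURM_CPUS_PER_TASK inside the job because --cpus-per-task is requested. The generated script reads the thread count from that variable, so --threads cannot drift away from --cpus-per-task.

```shell
#!/bin/bash
# Hypothetical helper: print a Slurm batch script that mirrors the
# interactive example. Arguments: project account, CPU count (default 32).
make_kraken_job() {
    local account=$1 cpus=${2:-32}
    cat <<EOF
#!/bin/bash
#SBATCH --account=${account}
#SBATCH --time=6:00:00
#SBATCH --nodes=1
#SBATCH --cpus-per-task=${cpus}
#SBATCH --mem=0

module load kraken/2.0
module load gpu-blast/1.1

# Take the thread count from the allocation so both values always agree.
srun --cpus-per-task="\${SLURM_CPUS_PER_TASK}" \\
    kraken2-build --standard --threads "\${SLURM_CPUS_PER_TASK}" --db KDB
EOF
}
```

For example, make_kraken_job <enter-your-project> 32 > build_kdb.sh writes the script, and sbatch build_kdb.sh submits it.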

Notes:

  • The --threads value must match the --cpus-per-task value.
  • If you do not load the gpu-blast/1.1 module, you will see the error below:
Downloading taxonomy tree data... done.
Untarring taxonomy tree data... done.
Step 1/2: Performing rsync file transfer of requested files
Rsync file transfer complete.
Step 2/2: Assigning taxonomic IDs to sequences
Processed 341 projects (530 sequences, 872.20 Mbp)... done.
All files processed, cleaning up extra sequence files... done, library complete.
Masking low-complexity regions of downloaded library...which: no dustmasker in (/pfs/tsfs1/apps/el7-x86_64/u/gcc/7.3.0/kraken/2.0/kraken2/bin:/pfs/tsfs1/apps/el7-x86_64/u/gcc/7.3.0/kraken/2.0/kraken2/bin:/apps/u/gcc/7.3.0/kraken/2.0/kraken2/bin:/apps/u/gcc/4.8.5/gcc/7.3.0-xegsmw4/bin:/apps/s/arcc/0.1/bin:/apps/s/slurm/18.08/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/opt/ibutils/bin:/apps/u/opt/singularity/2.5.2/bin:/home/salexan5/.local/bin:/home/salexan5/bin)
Unable to find dustmasker in path, can't mask low-complexity sequences


  • This software depends on the following modules, which the module load kraken/2.0 line loads automatically:
    • swset/2018.05
    • gcc/7.3.0
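
The missing dustmasker error described in the notes only appears after the library download has finished, so it can be worth checking the path before starting the build. The following is a minimal sketch, assuming bash; the function name missing_tools is hypothetical and not part of Kraken 2 or the module system.

```shell
#!/bin/bash
# Hypothetical pre-flight helper: print each named program that is not
# on PATH (one per line) and return non-zero if any are missing.
missing_tools() {
    local tool status=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            status=1
        fi
    done
    return "$status"
}
```

Running missing_tools kraken2-build dustmasker segmasker after the module load lines and before srun lists anything that still needs a module; no output and a zero exit status mean all three programs were found.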

