Kraken

KRAKEN2: Version 2.0.8: Kraken taxonomic sequence classification system
Kraken 2 is the newest version of Kraken, a taxonomic classification system using exact k-mer matches to achieve high accuracy and fast classification speeds. This classifier matches each k-mer within a query sequence to the lowest common ancestor (LCA) of all genomes containing the given k-mer. The k-mer assignments inform the classification algorithm.
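
To make the term concrete, the sketch below lists every k-mer (substring of length k) in a short query sequence. It is an illustration only: kmers is a hypothetical helper written for this page, not a Kraken 2 command, and Kraken 2 performs this step internally before looking up each k-mer's LCA in the database.

```shell
# kmers SEQ K: print each substring of length K in SEQ, one per line.
# Hypothetical helper for illustration; not part of Kraken 2.
kmers() {
    awk -v s="$1" -v k="$2" 'BEGIN {
        # Slide a window of width k across the sequence (awk strings are 1-based).
        for (i = 1; i + k - 1 <= length(s); i++)
            print substr(s, i, k)
    }'
}

kmers ACGTAC 4
```

For the 6-base sequence above and k=4, the window fits in three positions, so three k-mers are printed.
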
Manual

Module: Example

[]$ module spider kraken
-------------------------
  kraken:
-------------------------
     Versions:
        kraken/1.0-py27
        kraken/2.0
[]$ module load kraken/2.0

Using:

Note: In the Kraken 2 manual, the Dependencies section under System Requirements states: "Multithreading is handled using OpenMP. ... Unlike Kraken 1, Kraken 2 does not use an external k-mer counter. However, by default, Kraken 2 will attempt to use the dustmasker or segmasker programs provided as part of NCBI's BLAST suite to mask low-complexity regions (see Masking of Low-complexity Sequences)."

Example 1:

This example builds the standard Kraken 2 database. Because of the masking dependency noted above, it also loads the gpu-blast/1.1 module so that dustmasker can be found on the path.

[]$ salloc -A <enter-your-project> --time=6:00:00 -N 1 --cpus-per-task=32 --mem=0
[]$ module load kraken/2.0
[]$ module load gpu-blast/1.1
[]$ srun kraken2-build --standard --threads 32 --db KDB

Step 1/2: Performing rsync file transfer of requested files
Rsync file transfer complete.
Step 2/2: Assigning taxonomic IDs to sequences
Processed 341 projects (530 sequences, 872.20 Mbp)... done.
All files processed, cleaning up extra sequence files... done, library complete.
Masking low-complexity regions of downloaded library... done.
Step 1/2: Performing rsync file transfer of requested files
Rsync file transfer complete.
Step 2/2: Assigning taxonomic IDs to sequences
Processed 17072 projects (36839 sequences, 68.61 Gbp)... done.
All files processed, cleaning up extra sequence files... done, library complete.
Masking low-complexity regions of downloaded library... done.
Step 1/2: Performing rsync file transfer of requested files
Rsync file transfer complete.
Step 2/2: Assigning taxonomic IDs to sequences
Processed 9331 projects (11953 sequences, 314.52 Mbp)... done.
All files processed, cleaning up extra sequence files... done, library complete.
Masking low-complexity regions of downloaded library... done.
mv: try to overwrite ‘assembly_summary.txt’, overriding mode 0444 (r--r--r--)? y
Step 1/2: Performing rsync file transfer of requested files
Rsync file transfer complete.
Step 2/2: Assigning taxonomic IDs to sequences
Processed 1 project (639 sequences, 3.27 Gbp)... done.
All files processed, cleaning up extra sequence files... done, library complete.
Downloading UniVec_Core data from server... done.
Adding taxonomy ID of 28384 to all sequences... done.
Masking low-complexity regions of downloaded library... done.
Creating sequence ID to taxonomy ID map (step 1)...
Sequence ID to taxonomy ID map complete. [0.049s]
Estimating required capacity (step 2)...
Estimated hash table requirement: 42273822720 bytes
Capacity estimation complete. [10m27.202s]
Building database files (step 3)...
Taxonomy parsed and converted.
CHT created with 15 bits reserved for taxid.
Completed processing of 53095 sequences, 73070206125 bp
Writing data to disk...  complete.
Database files completed. [1h7m47.355s]
Database construction complete. [Total: 1h18m8.140s]
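
The capacity estimate in the log above is reported in bytes. When comparing it against the memory available on a node, it can help to convert it to GiB. The sketch below does that with awk; bytes_to_gib is a hypothetical helper name, not a Kraken 2 or Slurm command, and the value passed in is the one from this run's log.

```shell
# bytes_to_gib BYTES: print BYTES converted to GiB with two decimals.
# Hypothetical helper for illustration only.
bytes_to_gib() {
    awk -v b="$1" 'BEGIN { printf "%.2f\n", b / (1024 * 1024 * 1024) }'
}

# Value taken from "Estimated hash table requirement" in the log above.
bytes_to_gib 42273822720
```
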

Notes:

  • The --threads value passed to kraken2-build must match the --cpus-per-task value requested from Slurm.
  • If you do not load the gpu-blast/1.1 module, the masking step cannot find dustmasker and you will see the error below:
Downloading taxonomy tree data... done.
Untarring taxonomy tree data... done.
Step 1/2: Performing rsync file transfer of requested files
Rsync file transfer complete.
Step 2/2: Assigning taxonomic IDs to sequences
Processed 341 projects (530 sequences, 872.20 Mbp)... done.
All files processed, cleaning up extra sequence files... done, library complete.
Masking low-complexity regions of downloaded library...which: no dustmasker in (/pfs/tsfs1/apps/el7-x86_64/u/gcc/7.3.0/kraken/2.0/kraken2/bin:/pfs/tsfs1/apps/el7-x86_64/u/gcc/7.3.0/kraken/2.0/kraken2/bin:/apps/u/gcc/7.3.0/kraken/2.0/kraken2/bin:/apps/u/gcc/4.8.5/gcc/7.3.0-xegsmw4/bin:/apps/s/arcc/0.1/bin:/apps/s/slurm/18.08/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/opt/ibutils/bin:/apps/u/opt/singularity/2.5.2/bin:/home/salexan5/.local/bin:/home/salexan5/bin)
Unable to find dustmasker in path, can't mask low-complexity sequences


  • This software depends on the following modules, which the module load kraken/2.0 line loads automatically:
    • swset/2018.05
    • gcc/7.3.0
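
The first two notes can be checked before starting a long build. The sketch below is one possible pre-flight check, under two assumptions: that it runs inside the salloc session (Slurm sets SLURM_CPUS_PER_TASK when --cpus-per-task is given), and that dustmasker is the masking tool expected on the path. kraken_threads and have_tool are hypothetical helper names, not Kraken 2 or Slurm commands.

```shell
# Print the thread count to pass to --threads. Reusing Slurm's own value
# keeps it equal to --cpus-per-task; falls back to 1 outside an allocation.
kraken_threads() {
    printf '%s\n' "${SLURM_CPUS_PER_TASK:-1}"
}

# Succeed only if the named program can be found on the PATH.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

if have_tool dustmasker; then
    echo "dustmasker found; build with --threads $(kraken_threads)"
else
    echo "dustmasker not found; try: module load gpu-blast/1.1" >&2
fi
```

If the check passes, the build command from Example 1 could then be written as srun kraken2-build --standard --threads "$(kraken_threads)" --db KDB, so the two values cannot drift apart.
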

