QIIME 2: Quantitative Insights Into Microbial Ecology
- Automatically track your analyses with decentralized data provenance — no more guesswork on what commands were run!
- Interactively explore your data with beautiful visualizations that provide new perspectives.
- Easily share results with your team, even those members without QIIME 2 installed.
- Plugin-based system — your favorite microbiome methods all in one place.
- ARCC does not currently monitor software for new/latest versions. If you require an updated version, please remember to put in a request.
- QIIME 2 provides a collection of plugins. The plugins available can be listed by typing qiime from the command line once the module has been loaded, as shown below. If you require a plugin that is not listed, please put in a request to ARCC and we can explore how best to make it available.
- We are still learning to what extent QIIME 2 is parallelized. At the moment we believe it can only run on a single node. Some plugins can make use of multiple cores on that node; check the documentation for the relevant plugin to find out whether, and how, it supports this. Since there is no consistent syntax across plugins for enabling multiple cores, if you cannot work it out yourself please contact ARCC and we'll be happy to help.
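Because the flag names for parallelism are not consistent across plugins (for example, dada2 commands and classify-sklearn spell them differently), one way to discover them is to search each command's help text. The sketch below is a discovery aid only, run on the cluster after loading the module; the two plugin commands shown are examples, and the grep pattern is an assumption about how such flags tend to be named:

```shell
# Search a plugin command's help text for core/thread/job options.
# Flag names vary per plugin, so this grep is a discovery aid, not a
# definitive list. Requires the qiime2 module, i.e. a Teton session.
module load qiime2/2019.10
qiime dada2 denoise-paired --help | grep -iE 'threads|jobs|cores'
qiime feature-classifier classify-sklearn --help | grep -iE 'threads|jobs|cores'
```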
- If usage of the software increases and demand warrants managing a central reference database, ARCC is happy to discuss and explore this.
- The 2017.10 and 2019.1 versions of QIIME 2 installed on Teton were built using a Singularity container and NOT Conda. The practical implication is that online articles describing Conda-based usage will NOT work with these versions.
- The 2019.10 version of QIIME 2 was installed in line with the typical Conda installation process.
- If and when things change, we will update this page.
$ module spider qiime2

------------------------------------------
  qiime2:
------------------------------------------
     Versions:
        qiime2/2017.10
        qiime2/2019.1
        qiime2/2019.10

------------------------------------------
  For detailed information about a specific "qiime2" module (including how to
  load the modules) use the module's full name. For example:

     $ module spider qiime2/2019.10
------------------------------------------
$ module load qiime2/2019.10
$ qiime
Usage: qiime [OPTIONS] COMMAND [ARGS]...

  QIIME 2 command-line interface (q2cli)
  --------------------------------------

  To get help with QIIME 2, visit https://qiime2.org.

  To enable tab completion in Bash, run the following command or add it to
  your .bashrc/.bash_profile:

      source tab-qiime

  To enable tab completion in ZSH, run the following commands or add them to
  your .zshrc:

      autoload bashcompinit && bashcompinit && source tab-qiime

Options:
  --version  Show the version and exit.
  --help     Show this message and exit.

Commands:
  info                Display information about current deployment.
  tools               Tools for working with QIIME 2 files.
  dev                 Utilities for developers and advanced users.
  alignment           Plugin for generating and manipulating alignments.
  composition         Plugin for compositional data analysis.
  cutadapt            Plugin for removing adapter sequences, primers, and
                      other unwanted sequence from sequence data.
  dada2               Plugin for sequence quality control with DADA2.
  deblur              Plugin for sequence quality control with Deblur.
  demux               Plugin for demultiplexing & viewing sequence quality.
  diversity           Plugin for exploring community diversity.
  emperor             Plugin for ordination plotting with Emperor.
  feature-classifier  Plugin for taxonomic classification.
  feature-table       Plugin for working with sample by feature tables.
  fragment-insertion  Plugin for extending phylogenies.
  gneiss              Plugin for building compositional models.
  longitudinal        Plugin for paired sample and time series analyses.
  metadata            Plugin for working with Metadata.
  phylogeny           Plugin for generating and manipulating phylogenies.
  quality-control     Plugin for quality control of feature and sequence
                      data.
  quality-filter      Plugin for PHRED-based filtering and trimming.
  sample-classifier   Plugin for machine learning prediction of sample
                      metadata.
  taxa                Plugin for working with feature taxonomy annotations.
  vsearch             Plugin for clustering and dereplicating with vsearch.
Batch / Interactive Session Example:
After logging onto Teton, either:
1) Create an interactive session: in the example below, change arcc to your project name and modify the time to what you think you need; the example is set for 60 minutes.
[...@tlog1 qiime2]$ salloc --account=arcc --time=60:00
salloc: Granted job allocation 3489587
[...@m067 qiime2]$
[...@m067 qiime2]$ module load qiime2/2019.1
[...@m067 qiime2]$ qiime feature-table filter-samples \
  --i-table data/R1-5_table_forwards.qza \
  --m-metadata-file data/metadata_R1-5.txt \
  --p-where "GroupStatus='JLinfected' OR GroupStatus='JLcontrol'" \
  --o-filtered-table data/output_results.qza
2) Submit a job: Below is an example of a batch file:
#!/bin/bash
#SBATCH --account=arcc
#SBATCH --time=00:30:00
#SBATCH --nodes=1
#SBATCH --mem=0
#SBATCH --output=qiime_%A.out
#SBATCH --chdir=/project/arcc/salexan5/qiime2

module load qiime2/2019.10

srun qiime feature-table filter-samples \
  --i-table data/R1-5_table_forwards.qza \
  --m-metadata-file data/metadata_R1-5.txt \
  --p-where "GroupStatus='JLinfected' OR GroupStatus='JLcontrol'" \
  --o-filtered-table data/output_results.qza
wait
Out of the box, qiime2 does not automatically run in parallel, but some of the plugins/commands can be configured to use multiple cores.
One example is classify-sklearn, a pre-fitted sklearn-based taxonomy classifier. This command has the --p-n-jobs option, which allows multiple cores to be used. An example skeleton batch script is shown below (remember to add account/time and other SBATCH parameters):
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=32
#SBATCH --mem=0
#SBATCH --partition=teton-hugemem

module load qiime2/2019.10

srun qiime feature-classifier classify-sklearn \
  --i-classifier input_file.qza \
  --i-reads rep-seqs-single.qza \
  --o-classification output_file.qza \
  --p-n-jobs -1
- There are no hard-and-fast rules on how to configure your batch files; in most cases it will depend on the size of your data and the extent of your analysis.
- You will need to read the documentation and understand how to use each plugin/command, as they can vary.
- Memory is still probably going to be the major factor in how many cpus-per-task you choose.
- In the example above we were only able to use 32 cores because we ran the job on one of the teton-hugemem partition nodes. On a standard teton node we were only able to use 2 cores. Even so, 2 cores still gave us an improvement: the run took 9 hours and 45 minutes, compared to 17 hours with a single core. With 32 cores on a hugemem node, the job ran in 30 minutes!
- Remember, hugemem nodes can be popular, so you might end up queuing for days to run a job in half an hour, when you could have jumped onto a teton node immediately and already have the longer-running job finished.
- Depending on the size of data/analysis you might be able to use more cores on a teton node.
- You will need to run and track test analyses to understand what works for your data and analysis. Do not just default to a hugemem node!
- If you have any questions, or need assistance, don't hesitate to contact ARCC; we're happy to help with this type of analysis.
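To track what works for your data, SLURM's accounting tools can report a finished job's elapsed time, allocated CPUs, and peak memory, which is exactly what you need to compare cpus-per-task choices. A minimal sketch, assuming the standard sacct utility (and optionally seff, which is not installed everywhere) is available on Teton; the job ID shown is just the allocation number from the salloc example above, so substitute your own:

```shell
# Replace 3489587 (the allocation number from the salloc example above)
# with your own job ID.
jobid=3489587

# Elapsed vs AllocCPUS shows the speed-up gained; MaxRSS shows peak
# memory, which drives how many cpus-per-task a node can support.
command -v sacct >/dev/null && \
  sacct -j "$jobid" --format=JobID,JobName,Partition,AllocCPUS,Elapsed,MaxRSS,State

# seff (if installed) summarises CPU and memory efficiency for a job.
command -v seff >/dev/null && seff "$jobid"
```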