Ultrafast and accurate 16S rRNA microbial community analysis using Kraken 2

J Lu, SL Salzberg - Microbiome, 2020 - Springer
Background
For decades, 16S ribosomal RNA sequencing has been the primary means for identifying the bacterial species present in a sample with unknown composition. One of the most widely used tools for this purpose today is the QIIME (Quantitative Insights Into Microbial Ecology) package. Recent results have shown that the newest release, QIIME 2, has higher accuracy than QIIME, MAPseq, and mothur when classifying bacterial genera from simulated human gut, ocean, and soil metagenomes, although QIIME 2 also proved to be the most computationally expensive. Kraken, first released in 2014, has been shown to provide exceptionally fast and accurate classification for shotgun metagenomics sequencing projects. Bracken, released in 2016, then provided users with the ability to accurately estimate species or genus relative abundances using Kraken classification results. Kraken 2, which matches the accuracy and speed of Kraken 1, now supports 16S rRNA databases, allowing for direct comparisons to QIIME and similar systems.
Methods
For a comprehensive assessment of each tool, we compare the computational resources and speed of QIIME 2’s q2-feature-classifier, Kraken 2, and Bracken in generating the three main 16S rRNA databases: Greengenes, SILVA, and RDP. To evaluate accuracy, we tested each tool on the same simulated 16S rRNA reads from human gut, ocean, and soil metagenomes that were previously used to compare QIIME, MAPseq, mothur, and QIIME 2. We assessed accuracy on the final genus-level read counts assigned by each tool. Finally, because Kraken 2 is the only one of these tools that provides per-read taxonomic assignments, we also evaluate the sensitivity and precision of Kraken 2’s per-read classifications.
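The abstract does not give formulas for per-read sensitivity and precision. The sketch below assumes a common convention: sensitivity is the fraction of all reads assigned the correct genus, and precision is the fraction of classified reads whose genus is correct (unclassified reads lower sensitivity but not precision). The function name `per_read_metrics` and the example labels are hypothetical and for illustration only; this is not the authors' evaluation code.

```python
from typing import Optional, Sequence, Tuple


def per_read_metrics(
    truth: Sequence[str], predicted: Sequence[Optional[str]]
) -> Tuple[float, float]:
    """Return (sensitivity, precision) for per-read genus assignments.

    Assumed definitions (hypothetical, for illustration):
      sensitivity = correctly classified reads / all reads
      precision   = correctly classified reads / reads that were classified
    A prediction of None means the read was left unclassified.
    """
    if len(truth) != len(predicted):
        raise ValueError("truth and predicted must have the same length")
    total = len(truth)
    # Reads that received any genus label.
    classified = sum(1 for p in predicted if p is not None)
    # Reads whose assigned genus matches the simulated (true) genus.
    correct = sum(1 for t, p in zip(truth, predicted) if p is not None and p == t)
    sensitivity = correct / total if total else 0.0
    precision = correct / classified if classified else 0.0
    return sensitivity, precision


# Toy example: one read unclassified, one read assigned to the wrong genus.
truth = ["Bacteroides", "Prevotella", "Escherichia", "Bacteroides"]
predicted = ["Bacteroides", None, "Shigella", "Bacteroides"]
sens, prec = per_read_metrics(truth, predicted)
print(f"sensitivity={sens:.2f} precision={prec:.2f}")
```

Under these assumed definitions, leaving a read unclassified costs sensitivity only, while a wrong assignment (here Escherichia reported as Shigella) costs both.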
Results
For both the Greengenes and SILVA databases, Kraken 2 and Bracken are up to 100 times faster than QIIME 2’s q2-feature-classifier at database generation. For classification, using the same data as previous studies, Kraken 2 and Bracken are up to 300 times faster, use 100 times less RAM, and generate results that are more accurate for 16S rRNA profiling than QIIME 2’s q2-feature-classifier.
Conclusion
Kraken 2 and Bracken provide a very fast, efficient, and accurate solution for 16S rRNA metataxonomic data analysis.