About Kraken 2
Kraken 2 is the newest version of Kraken, a taxonomic classification system
using exact k-mer matches to achieve high accuracy and fast classification speeds.
This classifier matches each k-mer within a query sequence to the lowest
common ancestor (LCA) of all genomes containing the given k-mer.
The classification algorithm then uses these k-mer assignments to label the query sequence (see Kraken 1's webpage for more details).
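The sketch below illustrates how per-k-mer LCA assignments can be turned into one label for a read. It loosely follows the scheme described in the original Kraken paper. Each taxon hit by the read's k-mers is scored by the total hits on its root-to-taxon path, and the read is assigned to the highest-scoring taxon, with ties resolved by taking their LCA. The toy taxonomy (`PARENT`) and the helper names are hypothetical. The sketch also leaves out Kraken 2's confidence threshold and other details of the real implementation.

```python
from collections import Counter

# Hypothetical toy taxonomy: taxon ID -> parent taxon ID.
# As in NCBI's taxonomy, the root (ID 1) is its own parent.
PARENT = {1: 1, 2: 1, 10: 2, 11: 2, 100: 10, 101: 10}


def lineage(taxid):
    """Return the path from taxid up to the root, inclusive."""
    path = [taxid]
    while PARENT[taxid] != taxid:
        taxid = PARENT[taxid]
        path.append(taxid)
    return path


def lca(a, b):
    """Lowest common ancestor of two taxa (the root is always shared)."""
    ancestors = set(lineage(a))
    return next(t for t in lineage(b) if t in ancestors)


def classify(kmer_taxa):
    """Classify a read from the LCA taxon of each k-mer (0 = no database hit)."""
    hits = Counter(t for t in kmer_taxa if t != 0)
    if not hits:
        return 0  # unclassified
    # Score each hit taxon by the total hits along its root-to-taxon path.
    scores = {t: sum(hits[a] for a in lineage(t)) for t in hits}
    best = max(scores.values())
    winners = [t for t, s in scores.items() if s == best]
    # Resolve ties by taking the LCA of all tied taxa.
    result = winners[0]
    for t in winners[1:]:
        result = lca(result, t)
    return result
```

For example, `classify([100, 100, 10, 101, 0, 2])` assigns the read to taxon 100, because that path collects 4 of the 5 hits. `classify([100, 101])` falls back to their shared parent, 10.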
Kraken 2 provides significant improvements over Kraken 1, including faster database build times, smaller database sizes, and faster classification speeds. These improvements come from the following updates to the Kraken classification program:
- Storage of Minimizers: Instead of storing and querying entire k-mers, Kraken 2 stores the minimizer (an l-mer) of each k-mer. The l-mer length must be ≤ the k-mer length. Kraken 2 treats each k-mer as if its LCA were the same as its minimizer's LCA.
- Introduction of Spaced Seeds: Kraken 2 also applies spaced seeds when storing and querying minimizers, which improves classification accuracy.
- Database Structure: Kraken 1 saved an indexed and sorted list of k-mer/LCA pairs, whereas Kraken 2 uses a compact hash table. This hash table is a probabilistic data structure that allows faster queries and lower memory requirements. However, it has a <1% chance of returning an incorrect LCA or of returning an LCA for a minimizer that was never inserted. Users can compensate for this possibility by using Kraken's confidence scoring thresholds.
- Protein Databases: Kraken 2 allows for databases built from amino acid sequences. When queried, Kraken 2 performs a six-frame translated search of the query sequences against the database.
- 16S Databases: Kraken 2 also provides support for databases not based on NCBI's taxonomy. Currently, these include the 16S databases: Greengenes, SILVA, and RDP.
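To make the minimizer and spaced-seed ideas above concrete, here is a minimal sketch. It assumes plain lexicographic ordering of l-mers and a spaced-seed mask string in which `0` marks an ignored position. It does not reproduce Kraken 2's actual l-mer ordering, its handling of reverse complements, or its bit-level encoding. The function names and the `*` wildcard are hypothetical.

```python
def apply_spaced_seed(lmer, mask):
    """Replace positions where the mask has '0' with a '*' wildcard (hypothetical encoding)."""
    return "".join(c if m == "1" else "*" for c, m in zip(lmer, mask))


def minimizer(kmer, l, mask=None):
    """Smallest l-mer within a k-mer, optionally masked by a spaced seed first."""
    if l > len(kmer):
        raise ValueError("the l-mer length must be <= the k-mer length")
    if mask is not None and len(mask) != l:
        raise ValueError("the spaced-seed mask must have length l")
    lmers = (kmer[i:i + l] for i in range(len(kmer) - l + 1))
    if mask is not None:
        lmers = (apply_spaced_seed(x, mask) for x in lmers)
    return min(lmers)  # assumption: plain lexicographic order


def read_minimizers(seq, k, l, mask=None):
    """Minimizer of every k-mer in seq, in order."""
    return [minimizer(seq[i:i + k], l, mask) for i in range(len(seq) - k + 1)]
```

For instance, `read_minimizers("GATTACAG", 5, 3)` yields `["ATT", "ATT", "ACA", "ACA"]`. Four k-mers collapse to two distinct keys, because consecutive k-mers often share a minimizer. This sharing is why storing minimizers instead of full k-mers shrinks the database.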
News
Please refer to the Kraken 2 GitHub Wiki for the most recent news and updates.
- 09/28/2022 - Kraken 2 protocol paper published in Nature Protocols
  Metagenome analysis using the Kraken software suite
- 11/28/2019 - Kraken 2 paper published in Genome Biology
  Improved metagenomic analysis with Kraken 2
- 09/07/2019 - Kraken 2 preprint released in bioRxiv
  Improved metagenomic analysis with Kraken 2
- 04/25/2019 - v2.0.8-beta release
  Fixes for NCBI file structure change; default k-mer and minimizer lengths changed to increase accuracy.
- 04/23/2019 - Kraken 2 FTP Download Site Created
  See Downloads Page for details.
- 04/10/2019 - MiniKraken UPDATED Release for Kraken 2
  See Downloads Page for more details.
- 02/28/2019 - FTP added and Other Fixes
  Support added for FTP downloads (--use-ftp option added for kraken2-build commands). See CHANGELOG.md for additional improvements.
- 11/01/2018 - MiniKraken Released for Kraken 2
  MiniKraken databases released to the public. See Downloads Page for more details.
- 08/11/2018 - v2.0.7-beta release
  Support added for building MiniKraken databases based on database size. Other updates and fixes. See CHANGELOG.md for details.
- 06/26/2018 - v2.0.6-beta release
  Initial public release of Kraken 2.
Downloads
As of September 2020, we have created an Amazon Web Services site to host many of the most widely used Kraken 2 indices, available at https://github.com/BenLangmead/aws-indexes.
Publications
The Kraken 2 paper has been published in Genome Biology as of November 28th, 2019: Improved metagenomic analysis with Kraken 2 (2019).
The original Kraken paper was published in Genome Biology in 2014: Kraken: ultrafast metagenomic sequence classification using exact alignments.
The Kraken 2 protocol paper has been published in Nature Protocols as of September 2022: Metagenome analysis using the Kraken software suite.
Kraken 2 and Other Tools
The following tools are compatible with both Kraken 1 and Kraken 2. The tools are designed to assist users in analyzing and visualizing Kraken results.
Bracken allows users to estimate relative abundances within a specific sample from Kraken 2 classification results. Bracken uses a Bayesian model to estimate abundance at any standard taxonomy level, including species/genus-level abundance.
Pavian has also been developed as a comprehensive visualization program that can compare Kraken 2 classifications across multiple samples.
KrakenTools is a suite of scripts to assist in the analysis of Kraken results. KrakenTools is an ongoing project led by Jennifer Lu.
Authors/Contributors
Derrick Wood
Jennifer Lu
Ben Langmead
For technical issues, bug reports, and code contributions, please use Kraken2's GitHub repository.
Page Updated: 2022/09/29