
About Kraken

Kraken is a system for assigning taxonomic labels to short DNA sequences, usually obtained through metagenomic studies. Earlier programs for this task often relied on sequence alignment or machine learning techniques that were quite slow, which led to the development of less sensitive but much faster abundance estimation programs. Kraken aims to achieve both high sensitivity and high speed by using exact alignments of k-mers and a novel classification algorithm.
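To make the idea of "exact alignments of k-mers" concrete, here is a minimal toy sketch in Python. It is not Kraken's classification algorithm (that is described in the paper cited below). It only illustrates exact k-mer lookup followed by a simple vote. The function names, the two made-up reference sequences, and the choice of k=8 are all assumptions made for illustration.

```python
from collections import Counter

def kmers(seq, k):
    """Yield every overlapping substring of length k in seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]

def build_index(references, k):
    """Map each k-mer to the set of labels whose reference contains it."""
    index = {}
    for label, seq in references.items():
        for kmer in kmers(seq, k):
            index.setdefault(kmer, set()).add(label)
    return index

def classify(read, index, k):
    """Return the label matched by the most k-mers of the read, or None.

    Toy rule (an assumption, not Kraken's method): k-mers found under
    more than one label are treated as ambiguous and skipped.
    """
    votes = Counter()
    for kmer in kmers(read, k):
        labels = index.get(kmer)
        if labels and len(labels) == 1:
            votes[next(iter(labels))] += 1
    if not votes:
        return None  # no exact k-mer match: the read stays unclassified
    return votes.most_common(1)[0][0]

# Hypothetical reference "library" with two made-up sequences.
references = {
    "genus_A": "ACGTACGTTAGCCGATAGGCTTAACG",
    "genus_B": "TTGACCGGTATATCGGCCAATGCATG",
}
index = build_index(references, k=8)

print(classify("CGTTAGCCGATAGG", index, k=8))  # read drawn from genus_A
print(classify("GGGGGGGGGGGGGG", index, k=8))  # no k-mer in the index
```

Because lookups are exact dictionary hits rather than alignments, each k-mer costs roughly constant time, which is the intuition behind the speed figures reported on this page. A read that shares no k-mer with the library is simply left unclassified.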

In its fastest mode of operation, for a simulated metagenome of 100 bp reads, Kraken processed over 4 million reads per minute on a single core, over 900 times faster than Megablast and over 11 times faster than the abundance estimation program MetaPhlAn. Kraken's accuracy is comparable with Megablast, with slightly lower sensitivity and very high precision.
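The speedup figures above can be checked against the results table further down this page. The short Python sketch below assumes that the last column of that table is speed in reads per minute, which is consistent with the "over 4 million reads per minute" figure. The dictionary and function names are illustrative.

```python
# Values copied from the last column of the results table on this page,
# assumed to be reads per minute.
speed = {
    "Kraken (quick operation)": 4101162,
    "Megablast w/ best hit": 4511,
    "MetaPhlAn": 370770,
}

def speedup(fast, slow):
    """Ratio of the throughput of `fast` to the throughput of `slow`."""
    return speed[fast] / speed[slow]

vs_megablast = speedup("Kraken (quick operation)", "Megablast w/ best hit")
vs_metaphlan = speedup("Kraken (quick operation)", "MetaPhlAn")

print(f"vs Megablast: {vs_megablast:.0f}x")
print(f"vs MetaPhlAn: {vs_metaphlan:.1f}x")
```

The ratios come out at roughly 909 and 11.1, matching the "over 900 times" and "over 11 times" statements.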

Kraken is written in C++ and Perl and is designed for use on Linux. We have also successfully compiled and run it under Mac OS.

Downloads and Documents

Accuracy and speed

Although we tested Kraken on real sequence data from isolated genomes, the biggest challenge for an exact alignment approach is maintaining sensitivity in the face of high divergence from the training data (in this case, Kraken's genomic library). To assess Kraken's sensitivity on such sequences, we created a simulated metagenomic dataset of 100 bp reads with high sequencing error (2.1% SNP rate, 1.1% indel rate). Below are the results of running various classifiers on this dataset, with accuracy evaluated on a per-read basis (each classifier's reference library was built from January 2013 data):

Classifier                        Genus precision (%)   Genus sensitivity (%)   Speed (reads/min)
Naïve Bayes Classifier            97.64                 97.64                   7
PhymmBL                           96.11                 96.11                   76
PhymmBL (conf. > 0.65)            99.08                 95.45                   76
Megablast w/ best hit             96.93                 93.67                   4511
Kraken                            99.90                 91.25                   1307161
Kraken (quick operation)          99.92                 89.54                   4101162
MiniKraken (Kraken w/ 4GB DB)     99.95                 65.87                   1441476
MiniKraken (quick operation)      99.98                 65.31                   2693119
MetaPhlAn                         n/a                   n/a                     370770
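For readers who want to reproduce this kind of per-read evaluation on their own data, the following Python sketch computes precision and sensitivity under the usual definitions: precision is the share of attempted classifications that are correct, and sensitivity is the share of all reads that are classified correctly. These definitions are an assumption here, since this page does not spell them out, and the function name and sample data are hypothetical.

```python
def per_read_accuracy(truth, predicted):
    """Return (precision, sensitivity) in percent.

    truth:     dict mapping read id -> true genus
    predicted: dict mapping read id -> predicted genus, or None/missing
               when the classifier left the read unclassified
    """
    total = len(truth)
    classified = 0
    correct = 0
    for read_id, true_genus in truth.items():
        call = predicted.get(read_id)
        if call is None:
            continue  # unclassified reads lower sensitivity, not precision
        classified += 1
        if call == true_genus:
            correct += 1
    precision = 100.0 * correct / classified if classified else float("nan")
    sensitivity = 100.0 * correct / total if total else float("nan")
    return precision, sensitivity

# Hypothetical example: four reads, one unclassified, one misclassified.
truth = {"r1": "A", "r2": "A", "r3": "B", "r4": "B"}
predicted = {"r1": "A", "r2": None, "r3": "B", "r4": "A"}

precision, sensitivity = per_read_accuracy(truth, predicted)
print(f"precision={precision:.2f}%  sensitivity={sensitivity:.2f}%")
```

Under these definitions a classifier that declines to label uncertain reads can have very high precision with lower sensitivity, which is the pattern the Kraken and MiniKraken rows show.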

Removing low-complexity sequences

When analyzing a metagenomics sample using a large Kraken database -- including the standard DB described in the manual -- the primary source of false positive hits is low-complexity sequences in the genomes themselves, e.g., a string of 31 or more consecutive A's. These can largely be eliminated by first running the 'dust' program on all genomes and then building the database from these 'dusted' genomes. We strongly recommend running this program, which requires a custom database build, as described in the manual. DUST is included with the BLAST program from NCBI and is described in Morgulis et al. 2006.
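To see how much of a genome is affected by the simplest kind of low-complexity sequence mentioned above, a short Python sketch can locate runs of 31 or more identical bases. This is not the DUST algorithm and is no substitute for running it. It only finds single-base runs, and the function name and threshold parameter are illustrative assumptions.

```python
import re

def homopolymer_runs(seq, min_len=31):
    """Return (start, end, base) for each run of >= min_len identical bases.

    Coordinates are 0-based and half-open, as in Python slicing.
    """
    seq = seq.upper()
    # One base followed by at least (min_len - 1) repeats of the same base.
    pattern = re.compile(r"([ACGTN])\1{%d,}" % (min_len - 1))
    return [(m.start(), m.end(), m.group(1)) for m in pattern.finditer(seq)]

# Hypothetical sequence with a run of 35 A's embedded in it.
example = "ACGT" + "A" * 35 + "CGTACG"
for start, end, base in homopolymer_runs(example):
    print(f"{end - start} x {base} at [{start}, {end})")
```

A real masking pass also needs to catch short tandem repeats and other low-complexity patterns, which is why the dust step described above is the recommended route.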

Kraken and other tools

Kraken is also available as part of the MetAMOS metagenomic assembly and analysis pipeline. MetAMOS also provides support for using the Krona visualization package with Kraken's results.

Illumina has also released a Kraken BaseSpace app for researchers using the BaseSpace platform.


Wood DE, Salzberg SL: Kraken: ultrafast metagenomic sequence classification using exact alignments. Genome Biology 2014, 15:R46.


We encourage users to share their questions and experiences with Kraken. We have established a users group (Kraken-users on Google Groups) for discussion.


Derrick Wood