Centrifuge is a very rapid and memory-efficient system for the classification of DNA sequences from microbial samples, with better sensitivity than and comparable accuracy to other leading systems. The system uses a novel indexing scheme based on the Burrows-Wheeler transform (BWT) and the Ferragina-Manzini (FM) index, optimized specifically for the metagenomic classification problem. Centrifuge requires a relatively small index (e.g., 4.3 GB for ~4,100 bacterial genomes) yet provides very fast classification speed, allowing it to process a typical DNA sequencing run within an hour. Together these advances enable timely and accurate analysis of large metagenomics data sets on conventional desktop computers.
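
To give a feel for the kind of index mentioned above, here is a minimal toy sketch of a BWT-based FM index with backward search. It is not Centrifuge's implementation: the class name ToyFMIndex, the naive suffix sorting, and the uncompressed occurrence table are all assumptions made for illustration, and none of Centrifuge's metagenomics-specific optimizations are shown.

```python
from collections import Counter


def bwt_via_suffix_array(text):
    """Return (bwt, suffix_array) for a text that ends in the sentinel '$'."""
    # Naive suffix sorting: fine for a toy example, far too slow for genomes.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # The BWT is the character preceding each sorted suffix (wrapping around).
    bwt = "".join(text[i - 1] for i in sa)
    return bwt, sa


class ToyFMIndex:
    """Hypothetical illustration class; not part of Centrifuge."""

    def __init__(self, text):
        if "$" in text:
            raise ValueError("text must not contain the sentinel '$'")
        self.text = text + "$"
        self.bwt, self.sa = bwt_via_suffix_array(self.text)

        # C[c]: number of characters in the text strictly smaller than c.
        counts = Counter(self.text)
        self.C = {}
        total = 0
        for c in sorted(counts):
            self.C[c] = total
            total += counts[c]

        # occ[c][i]: number of occurrences of c in bwt[:i].
        self.occ = {c: [0] * (len(self.bwt) + 1) for c in counts}
        for i, ch in enumerate(self.bwt):
            for c in self.occ:
                self.occ[c][i + 1] = self.occ[c][i] + (1 if c == ch else 0)

    def find(self, pattern):
        """Return sorted start positions of pattern, using backward search."""
        lo, hi = 0, len(self.bwt)  # half-open range of suffix-array rows
        for c in reversed(pattern):
            if c not in self.C:
                return []
            lo = self.C[c] + self.occ[c][lo]
            hi = self.C[c] + self.occ[c][hi]
            if lo >= hi:
                return []
        return sorted(self.sa[lo:hi])


index = ToyFMIndex("ACGTACGTTACG")
print(index.find("ACG"))
```

Each pattern character narrows a range of suffix-array rows, so a lookup costs time proportional to the pattern length rather than to the size of the indexed text. A production index would additionally sample and compress the tables kept in full here.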

Related Tools

  • Pavian: Tool for interactive analysis of pathogen and metagenomics data
  • HISAT2: Graph-based alignment to a population of genomes
  • Bowtie2: Ultrafast read alignment

Have a look at https://github.com/fbreitwieser/pavian for visual analysis of results generated with Centrifuge!

Due to the rapid spread of SARS-CoV-2 and its devastating effects, we provide additional Centrifuge indexes in the hope that they will be useful for biomedical research related to the virus. The first two indexes below include 106 complete SARS-CoV-2 genomes downloaded from GenBank (3/29/2020):

  • h+v+c: human genome and viral genomes including 106 SARS-CoV-2 complete genomes (download link)
  • h+p+v+c: human genome, prokaryotic genomes, and viral genomes including 106 SARS-CoV-2 complete genomes (download link)
  • Additional indexes, including the nt index, are also available at Genexa. (Note: these indexes include one reference SARS-CoV-2 genome.)

Centrifuge 1.0.4-beta release 6/5/2018

  • Support running multiple samples while loading index only once
  • Fix a bug that misassigned reads falling near the boundary of a genome
  • centrifuge-kreport now uses the lowest common ancestor taxonomy ID by default for reads with multiple assignments
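
As a rough illustration of the lowest common ancestor idea in the last item, the sketch below resolves several taxonomy IDs to their deepest shared ancestor. It is not the centrifuge-kreport code: the function names and the toy taxonomy (made-up IDs, with the root listed as its own parent) are assumptions for illustration only.

```python
def path_to_root(taxid, parent):
    """Return the taxonomy IDs from taxid up to the root, inclusive."""
    path = [taxid]
    # Assumption: the root node is recorded as its own parent.
    while parent[taxid] != taxid:
        taxid = parent[taxid]
        path.append(taxid)
    return path


def lowest_common_ancestor(taxids, parent):
    """Return the deepest taxonomy ID that is an ancestor of every input ID."""
    if not taxids:
        raise ValueError("at least one taxonomy ID is required")
    # Lineage of the first ID, ordered from leaf to root.
    lineage = path_to_root(taxids[0], parent)
    for taxid in taxids[1:]:
        ancestors = set(path_to_root(taxid, parent))
        # Keep only shared ancestors, preserving leaf-to-root order.
        lineage = [t for t in lineage if t in ancestors]
    # The root is shared by every lineage, so the list is never empty.
    return lineage[0]


# Hypothetical toy taxonomy: child ID -> parent ID (IDs are made up).
toy_parent = {1: 1, 10: 1, 11: 10, 12: 10, 20: 1, 21: 20}

print(lowest_common_ancestor([11, 12], toy_parent))  # two species in one genus
print(lowest_common_ancestor([11, 21], toy_parent))  # species in different genera
```

A read assigned to two species of the same genus is then reported at the genus node, while a read assigned to unrelated species falls back to a higher node in the tree.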

Centrifuge 1.0.3 release 2/23/2018

  • Fix several bugs.
  • Output unclassified reads.
  • Make the sequence output options (--un, --al, --un-conc, --al-conc) work.

Centrifuge 1.0.3-beta release 12/06/2016

  • Fixed Perl hashbang (shebang) lines (thanks to Andreas Sjödin / @druvus).
  • Updated nt database building to work with the new accession code scheme.
  • Added option --tab-fmt-cols to specify output format columns.
  • A minor fix for traversing the hitmap up the taxonomy tree.

Centrifuge paper published at Genome Research 11/16/2016

Centrifuge 1.0.2-beta release 5/25/2016

  • Fixed a runtime error during abundance analysis.
  • Changed a default report file name from centrifuge_report.csv to centrifuge_report.tsv.

Centrifuge preprint is available at bioRxiv 5/24/2016

Centrifuge 1.0.1-beta release 3/8/2016

  • Centrifuge can now work directly with SRA data, either downloaded on demand over the internet or prefetched to local disk.
    • For example, you can run Centrifuge on SRA data (SRR353653) as follows:
      centrifuge -x /path/to/index --sra-acc SRR353653
    • This eliminates the need to download SRA reads manually and convert them to FASTA/FASTQ format, without affecting the run time.
  • We provide a Centrifuge index (nt index) for NCBI nucleotide non-redundant sequences collected from plasmids, organelles, viruses, archaea, bacteria, and eukaryotes, totaling ~109 billion bp. Centrifuge is a very good alternative to MegaBLAST (or BLAST) for searching through this huge database.
  • Fixed Centrifuge's scripts related to sequence downloading and index building.
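
For scripted use, the SRA command shown in the list above can be wrapped in a few lines of Python. This is only a sketch: the helper names are hypothetical, only the -x and --sra-acc options from the example are used, and it assumes that a centrifuge binary with SRA support is on the PATH and that classification results go to standard output when no output file is given.

```python
import subprocess


def centrifuge_sra_command(index_prefix, sra_accession):
    """Build the argument list for the SRA example (hypothetical helper)."""
    return ["centrifuge", "-x", index_prefix, "--sra-acc", sra_accession]


def classify_sra(index_prefix, sra_accession):
    """Run Centrifuge on one SRA accession and return its standard output.

    Assumes 'centrifuge' is installed, on the PATH, and built with SRA support.
    Raises subprocess.CalledProcessError if the command exits with an error.
    """
    result = subprocess.run(
        centrifuge_sra_command(index_prefix, sra_accession),
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout


# Print the command without running it; the index path is the placeholder
# used in the example above.
print(" ".join(centrifuge_sra_command("/path/to/index", "SRR353653")))
```

Passing the arguments as a list rather than a single shell string avoids quoting problems when an index path contains spaces.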

Centrifuge 1.0.0-beta release 2/19/2016 - first release

  • The first release of Centrifuge features a dramatically reduced database size, higher classification accuracy and sensitivity, and comparably rapid classification speed.
  • Please refer to the manual for details on how to run Centrifuge and interpret Centrifuge’s classification results.
  • We provide several standard indexes designed to meet the needs of most users (see Indexes in the side panel).
    • For compressed indexes, we first combined bacterial genomes belonging to the same species and removed redundant sequences, and built indexes using the combined sequences. As a result, those compressed indexes are much smaller than uncompressed indexes. Centrifuge classifies reads at the species level when using the compressed indexes and at the strain level (or the genome level) when using the uncompressed indexes.
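
The grouping step behind the compressed indexes can be pictured with the toy sketch below. The text above does not say how redundant sequences are detected, so exact-duplicate removal is swapped in here purely for illustration; the function name and the species labels are hypothetical, and this is not Centrifuge's index-building code.

```python
from collections import defaultdict


def combine_by_species(genomes):
    """Group sequences by species and drop exact duplicates within a species.

    genomes: iterable of (species_id, sequence) pairs.
    Returns a dict mapping species_id to its unique sequences, first seen first.
    Exact matching is an illustrative stand-in for real redundancy detection.
    """
    combined = defaultdict(list)
    seen = defaultdict(set)
    for species_id, sequence in genomes:
        normalized = sequence.upper()  # treat soft-masked bases as identical
        if normalized not in seen[species_id]:
            seen[species_id].add(normalized)
            combined[species_id].append(normalized)
    return dict(combined)


# Made-up sequences for two hypothetical species labels.
toy_genomes = [
    ("species_A", "ACGTACGT"),
    ("species_A", "acgtacgt"),  # duplicate of the first after normalization
    ("species_A", "TTTTCCCC"),
    ("species_B", "ACGTACGT"),  # same sequence, different species: kept
]
print(combine_by_species(toy_genomes))
```

Because strains of one species are merged before indexing, a read matched against such an index can only be attributed to the species, which is why the compressed indexes classify at the species level.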

The Centrifuge source code is available in a public GitHub repository (7/14/2015).