

EuPathDB-Clean is a set of 245 eukaryotic pathogen genomes originally downloaded from the EuPathDB Resource Project established by the National Institute of Allergy and Infectious Diseases (NIAID/NIH). The 245 draft genomes were processed to remove all contaminating and low-complexity sequences.

Here we provide the 245 cleaned eukaryotic pathogen genomes as FASTA files that can be used with any metagenomics classification or analysis program. To use them, download each of the .tgz files and extract the FASTA files into one folder with:
  • mkdir -p my_eupathDB_folder
  • for file in *.tgz; do tar -xvzf "$file" -C my_eupathDB_folder/; done
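The one-line loop above can also be wrapped in a small function that refuses to run when no archives are present and stops at the first archive that fails to extract. This is a minimal sketch, not part of the distributed files; the function name `extract_all` is hypothetical, and the default folder name is just the example name used above.

```bash
#!/usr/bin/env bash
# Sketch: extract every .tgz archive in the current directory into one folder.
# extract_all is a hypothetical helper; "my_eupathDB_folder" is only the
# example folder name from the command above.

extract_all() {
  local outdir="${1:-my_eupathDB_folder}"
  local restore_glob archives f
  restore_glob=$(shopt -p nullglob)   # remember the caller's nullglob setting
  shopt -s nullglob                   # an unmatched *.tgz expands to nothing
  archives=(*.tgz)
  eval "$restore_glob"                # put nullglob back the way it was

  if (( ${#archives[@]} == 0 )); then
    echo "no .tgz files found in $PWD" >&2
    return 1
  fi

  mkdir -p "$outdir" || return 1      # tar -C fails if the folder is missing
  for f in "${archives[@]}"; do
    # Quoting "$f" keeps unusual file names intact; stop at the first failure.
    tar -xzf "$f" -C "$outdir" || { echo "failed to extract $f" >&2; return 1; }
  done
  echo "extracted ${#archives[@]} archive(s) into $outdir"
}
```

After sourcing the file that contains it, run `extract_all my_eupathDB_folder` from the directory holding the downloaded *_Clean.tgz files.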

Downloads

EuPathDB Library: eupathDB.tar.gz (2.2 GB)
This file contains all 245 genomes in a single folder. Each genome is a multi-FASTA file with Kraken- and Kraken2-compatible headers. To build a Kraken or Kraken2 database with these files, extract the archive and place the resulting folder directly within the database's library/ folder.
  • For example, to build a database with human, bacteria, and EuPathDB sequences:
  • tar -xzvf eupathDB.tar.gz [creates a library/ folder containing the EuPathDB files]
  • mkdir -p $DBNAME
  • mv library/ $DBNAME/library/
  • kraken2-build --download-taxonomy --db $DBNAME
  • kraken2-build --download-library bacteria --db $DBNAME
  • kraken2-build --download-library human --db $DBNAME
  • kraken2-build --build --db $DBNAME

  • (It is NOT required to use kraken2-build --add-to-library for these files.)
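Because kraken2-build --build is run directly on these files, it can help to confirm beforehand that every FASTA header in the library carries a taxid tag. The sketch below assumes the headers follow Kraken's "kraken:taxid|<taxid>" convention (the page only states that they are Kraken-compatible) and that the library files are plain-text FASTA; the function name `check_kraken_headers` and the sample header in the comment are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: report FASTA headers that lack a Kraken-style taxid tag.
# Assumption: tagged headers contain "kraken:taxid|<digits>", for example the
# illustrative header  >seq1|kraken:taxid|5833 description
# Every regular file under the given directory is scanned as plain-text FASTA.

check_kraken_headers() {
  local dir="$1"
  local f n_hdr n_bad bad_total=0
  [[ -d "$dir" ]] || { echo "not a directory: $dir" >&2; return 2; }

  while IFS= read -r -d '' f; do
    n_hdr=$(grep -c '^>' "$f")
    # Count header lines that do NOT carry the taxid tag.
    n_bad=$(grep '^>' "$f" | grep -vc 'kraken:taxid|[0-9]')
    if (( n_bad > 0 )); then
      echo "$f: $n_bad of $n_hdr headers lack a kraken:taxid tag" >&2
      (( bad_total += n_bad ))
    fi
  done < <(find "$dir" -type f -print0)

  if (( bad_total > 0 )); then
    echo "$bad_total header(s) need a kraken:taxid tag" >&2
    return 1
  fi
  echo "all FASTA headers under $dir carry a kraken:taxid tag"
}
```

For instance, `check_kraken_headers $DBNAME/library` could be run after the mv step and before kraken2-build --build; a non-zero exit status flags files whose headers would need attention.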

EuPathDB Kraken2 Database: eupathDB_kraken2.tar.gz (5.6 GB)
This archive contains a pre-built Kraken2 database of the 245 EuPathDB genomes, consisting of three files: hash.k2d, opts.k2d, and taxo.k2d.
  • For example, to use this pre-built database:
  • tar -xzvf eupathDB_kraken2.tar.gz
  • kraken2 --db eupathDB_kraken2 MYSAMPLE.FNA > MYSAMPLE.KRAKEN2

  • (Do not run kraken2-build --build on this folder.)
  • (This database is NOT compatible with Kraken 1.)
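Since the pre-built database is only usable when all three .k2d files listed above are present, a quick check before classification can save a failed run. This is a minimal sketch; `check_k2_db` and `classify_sample` are hypothetical helper names, and the output file names built from the input name are an arbitrary choice. The wrapper uses only kraken2's --db, --output, and --report options.

```bash
#!/usr/bin/env bash
# Sketch: confirm an extracted Kraken 2 database folder has its three .k2d
# files before classifying reads with it. Helper names are hypothetical.

check_k2_db() {
  local db="$1" f missing=0
  for f in hash.k2d opts.k2d taxo.k2d; do
    if [[ ! -s "$db/$f" ]]; then      # -s: file exists and is non-empty
      echo "missing or empty: $db/$f" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Hypothetical wrapper: classify one FASTA file, writing per-read output
# and a summary report next to the input file.
classify_sample() {
  local db="$1" reads="$2"
  check_k2_db "$db" || return 1
  command -v kraken2 >/dev/null || { echo "kraken2 not found on PATH" >&2; return 1; }
  kraken2 --db "$db" --output "${reads}.kraken2" --report "${reads}.report" "$reads"
}
```

With the archive extracted, `classify_sample eupathDB_kraken2 MYSAMPLE.FNA` would write MYSAMPLE.FNA.kraken2 and MYSAMPLE.FNA.report, or stop early if a database file is missing.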

For more information on Kraken and Kraken 2, see the Kraken website and the Kraken 2 website.

Class | Genus Composition* | Number of Files | Download | Uncompressed
AmoebaDB | Acanthamoeba, Entamoeba, Naegleria | 29 | AmoebaDB_Clean.tgz (326 MB) | 1.37 GB
CryptoDB | Chromera, Cryptosporidium, Gregarina, Vitrella | 11 | CryptoDB_Clean.tgz (86.5 MB) | 359 MB
FungiDB | Ajellomyces, Aspergillus, Candida, Coccidioides, Cryptococcus, Fusarium, Rhizopus, Saccharomyces, Trichoderma | 87 | FungiDB_Clean.tgz (931 MB) | 3.5 GB
GiardiaDB | Giardia, Spironucleus | 6 | GiardiaDB_Clean.tgz (21 MB) | 72 MB
MicrosporidiaDB | Anncaliia, Encephalitozoon, Mitosporidia, Nematocida | 25 | MicrosporidiaDB_Clean.tgz (35 MB) | 199 MB
PiroplasmaDB | Babesia, Cytauxzoon, Theileria | 8 | PiroplasmaDB_Clean.tgz (21 MB) | 76 MB
PlasmoDB | Plasmodium | 9 | PlasmoDB_Clean.tgz (27 MB) | 216 MB
ToxoDB | Cyclospora, Eimeria, Hammondia, Neospora, Sarcocystis, Toxoplasma | 30 | ToxoDB_Clean.tgz (456 MB) | 1.8 GB
TrichDB | Trichomonas | 1 | TrichDB_Clean.tgz (43 MB) | 180 MB
TriTrypDB | Leishmania, Leptomonas, Trypanosoma | 39 | TriTrypDB_Clean.tgz (336 MB) | 1.3 GB
Total (EuPathDB-Clean) | | 245 | 2.3 GB | 9.1 GB
*The genera listed for FungiDB and MicrosporidiaDB are a subset of all genera represented. For the full list of genera in these classes, see the supplementary table of the publication.
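The "Number of Files" column can double as a download check: listing each archive without extracting it and counting its file entries should reproduce the table. This sketch assumes each genome is stored as one regular file inside its class archive and that the archives keep the names shown in the Download column; `verify_archive` and `verify_all` are hypothetical helpers, and the expected counts are copied from the table above. It needs bash 4 or later for the associative array.

```bash
#!/usr/bin/env bash
# Sketch: compare the number of files in each *_Clean.tgz archive with the
# "Number of Files" column of the download table. Helper names are hypothetical.
# Assumption: one genome = one regular file entry in the archive.

declare -A EXPECTED_FILES=(
  [AmoebaDB]=29 [CryptoDB]=11 [FungiDB]=87 [GiardiaDB]=6
  [MicrosporidiaDB]=25 [PiroplasmaDB]=8 [PlasmoDB]=9 [ToxoDB]=30
  [TrichDB]=1 [TriTrypDB]=39
)

verify_archive() {
  local cls="$1" dir="${2:-.}"
  local archive="$dir/${cls}_Clean.tgz"
  local expected="${EXPECTED_FILES[$cls]:-}"
  local found

  if [[ -z "$expected" ]]; then
    echo "unknown class: $cls" >&2
    return 2
  fi
  if [[ ! -f "$archive" ]]; then
    echo "missing: $archive" >&2
    return 1
  fi
  # List archive members; directory entries end in "/" and are not counted.
  found=$(tar -tzf "$archive" | grep -vc '/$')
  if (( found != expected )); then
    echo "$cls: expected $expected files, found $found" >&2
    return 1
  fi
  echo "$cls: $found files OK"
}

verify_all() {
  local dir="${1:-.}" cls status=0
  for cls in "${!EXPECTED_FILES[@]}"; do
    verify_archive "$cls" "$dir" || status=1
  done
  return "$status"
}
```

Running `verify_all` in the download directory reports each class separately, so a single truncated or missing archive is easy to spot.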

Publications

The publication associated with these data is:
J. Lu, S.L. Salzberg. (2018). "Removing contaminants from databases of draft genomes." PLoS Comput Biol https://doi.org/10.1371/journal.pcbi.1006277

Authors/Contributors

Jennifer Lu ( jlu26 jhmi edu ) Ph.D. Candidate
Steven Salzberg, Ph.D.

Page Updated: 2019/02/28
