
EuPathDB-Clean is a set of 245 eukaryotic pathogen genomes originally downloaded from the EuPathDB Resource Project, established by the National Institute of Allergy and Infectious Diseases (NIAID/NIH). The 245 draft genomes were processed to remove all contaminating and low-complexity sequences.

Here we provide the 245 cleaned eukaryotic pathogen genome FASTA files, which can be used with any metagenomics classification or analysis program. To use them, download each of the .tgz files and extract the FASTA files into an existing folder using:
  • mkdir -p my_eupathDB_folder; for file in *.tgz; do tar -xvzf "$file" -C my_eupathDB_folder/; done
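
As a check on the extraction step, the following is a minimal sketch of a helper that unpacks every archive and reports how many FASTA files were produced. The function name extract_eupathdb and the FASTA extensions (.fa, .fna, .fasta) are assumptions for illustration, not something this page specifies.

```shell
#!/usr/bin/env bash
# Hypothetical helper: extract every .tgz in SRC_DIR into DEST_DIR,
# then print the number of FASTA files found under DEST_DIR.
extract_eupathdb() {
    local src_dir="$1" dest_dir="$2"
    mkdir -p "$dest_dir" || return 1          # tar -C needs an existing folder
    local archive
    for archive in "$src_dir"/*.tgz; do
        [ -e "$archive" ] || continue         # skip if the glob matched nothing
        tar -xzf "$archive" -C "$dest_dir" || return 1
    done
    # Assumption: the genomes use one of these common FASTA extensions.
    find "$dest_dir" -type f \( -name '*.fa' -o -name '*.fna' -o -name '*.fasta' \) \
        | wc -l | tr -d ' '
}
```

For example, `extract_eupathdb . my_eupathDB_folder` run in the download folder should report 245 once all ten archives have been extracted, provided the extension assumption holds.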


Class | Genus Composition* | Number of Files | Download | Uncompressed
AmoebaDB | Acanthamoeba, Entamoeba, Naegleria | 29 | AmoebaDB_Clean.tgz (649 MB) | 20.0 GB
CryptoDB | Chromera, Cryptosporidium, Gregarina, Vitrella | 11 | CryptoDB_Clean.tgz (172 MB) | 717 MB
FungiDB | Ajellomyces, Aspergillus, Candida, Coccidioides, Cryptococcus, Fusarium, Rhizopus, Saccharomyces, Trichoderma | 87 | FungiDB_Clean.tgz (1.8 GB) | 6.9 GB
GiardiaDB | Giardia, Spironucleus | 6 | GiardiaDB_Clean.tgz (41 MB) | 144 MB
MicrosporidiaDB | Anncaliia, Encephalitozoon, Mitosporidium, Nematocida | 25 | MicrosporidiaDB_Clean.tgz (69 MB) | 398 MB
PiroplasmaDB | Babesia, Cytauxzoon, Theileria | 8 | PiroplasmaDB_Clean.tgz (41 MB) | 152 MB
PlasmoDB | Plasmodium | 9 | PlasmoDB_Clean.tgz (54 MB) | 533 MB
ToxoDB | Cyclospora, Eimeria, Hammondia, Neospora, Sarcocystis, Toxoplasma | 30 | ToxoDB_Clean.tgz (910 MB) | 3.6 GB
TrichDB | Trichomonas | 1 | TrichDB_Clean.tgz (85 MB) | 360 MB
TriTrypDB | Leishmania, Leptomonas, Trypanosoma | 39 | TriTrypDB_Clean.tgz (671 MB) | 2.6 GB
Total EuPathDB-Clean | | 245 | 4.5 GB | 36 GB
*The genera listed for FungiDB and MicrosporidiaDB are a subset of all genera represented. For the full list of genera contained in these classes, see the publication's supplementary table.

To make a Kraken database from the above genomes, download the genomes and add the FASTA files directly to your database's library/ folder. Then download this file and place it directly in your database folder. From there, you can build the database as normal.
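
The Kraken steps above can be sketched roughly as follows. This is an illustration under stated assumptions rather than the authors' exact procedure: the function names and paths are hypothetical, the FASTA extensions are a guess at the archive contents, and the build step assumes kraken-build is on your PATH, the taxonomy is already in place as for any normal build, and the extra file mentioned above has been copied into the database folder by hand.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy extracted FASTA files into DB_DIR/library/.
stage_eupathdb_library() {
    local genome_dir="$1" db_dir="$2"
    mkdir -p "$db_dir/library" || return 1
    # Assumption: the genomes use one of these common FASTA extensions.
    # Files are copied flat, so identically named files would overwrite each other.
    find "$genome_dir" -type f \( -name '*.fa' -o -name '*.fna' -o -name '*.fasta' \) \
        -exec cp {} "$db_dir/library/" \;
}

# Hypothetical wrapper around the standard Kraken build command.
# Assumes the taxonomy and the extra file from this page are already in DB_DIR.
build_eupathdb_kraken() {
    local db_dir="$1" threads="${2:-4}"
    kraken-build --build --db "$db_dir" --threads "$threads"
}
```

A typical call would be `stage_eupathdb_library my_eupathDB_folder my_kraken_db` followed by `build_eupathdb_kraken my_kraken_db 8`, where both folder names are placeholders.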


The preprint is available here:
J. Lu and S.L. Salzberg (2018). "Removing Contaminants from Metagenomic Databases." bioRxiv.


Jennifer Lu ( jlu26 jhmi edu ) Ph.D. Candidate
Steven Salzberg, Ph.D.

Page Updated: 2018/01/30
