EuPathDB-Clean is a set of 245 eukaryotic pathogen genomes originally downloaded from the EuPathDB Resource Project, established by the National Institute of Allergy and Infectious Diseases (NIAID/NIH). The 245 draft genomes were processed to remove all contaminating and low-complexity sequences.

Here we provide the 245 cleaned eukaryotic pathogen genome FASTA files, which can be used with any metagenomics classification or analysis program. To use them, download each of the .tgz files and extract the FASTA files into an existing folder using:
  • for file in *.tgz; do tar -xvzf "$file" -C my_eupathDB_folder/; done
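The loop above fails if the target folder does not exist, and it does nothing useful when no archives match. A slightly more defensive variant can be sketched as a shell function. The function name extract_eupathdb and the default folder name are illustrative choices, not part of EuPathDB-Clean.

```shell
#!/usr/bin/env bash
# Extract every .tgz archive found in a source directory into one target folder.
# extract_eupathdb is an illustrative name, not part of EuPathDB-Clean.
extract_eupathdb() {
    local src_dir="${1:-.}"
    local dest_dir="${2:-my_eupathDB_folder}"
    local archive
    local found=0

    # tar -C requires the target folder to exist.
    mkdir -p "$dest_dir" || return 1

    for archive in "$src_dir"/*.tgz; do
        # With no matches the glob stays literal, so skip paths that do not exist.
        [ -e "$archive" ] || continue
        found=1
        echo "Extracting $archive" >&2
        tar -xzf "$archive" -C "$dest_dir" || return 1
    done

    if [ "$found" -eq 0 ]; then
        echo "No .tgz archives found in $src_dir" >&2
        return 1
    fi
}
```

For example, running extract_eupathdb . my_eupathDB_folder from the download folder extracts all archives and stops at the first one that fails.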

Downloads

Class | Genus Composition* | Number of Files | Download | Uncompressed
----- | ------------------ | --------------- | -------- | ------------
AmoebaDB | Acanthamoeba, Entamoeba, Naegleria | 29 | AmoebaDB_Clean.tgz (649 MB) | 20.0 GB
CryptoDB | Chromera, Cryptosporidium, Gregarina, Vitrella | 11 | CryptoDB_Clean.tgz (172 MB) | 717 MB
FungiDB | Ajellomyces, Aspergillus, Candida, Coccidioides, Cryptococcus, Fusarium, Rhizopus, Saccharomyces, Trichoderma | 87 | FungiDB_Clean.tgz (1.8 GB) | 6.9 GB
GiardiaDB | Giardia, Spironucleus | 6 | GiardiaDB_Clean.tgz (41 MB) | 144 MB
MicrosporidiaDB | Anncaliia, Encephalitozoon, Mitosporidia, Nematocida | 25 | MicrosporidiaDB_Clean.tgz (69 MB) | 398 MB
PiroplasmaDB | Babesia, Cytauxzoon, Theileria | 8 | PiroplasmaDB_Clean.tgz (41 MB) | 152 MB
PlasmoDB | Plasmodium | 9 | PlasmoDB_Clean.tgz (54 MB) | 533 MB
ToxoDB | Cyclospora, Eimeria, Hammondia, Neospora, Sarcocystis, Toxoplasma | 30 | ToxoDB_Clean.tgz (910 MB) | 3.6 GB
TrichDB | Trichomonas | 1 | TrichDB_Clean.tgz (85 MB) | 360 MB
TriTrypDB | Leishmania, Leptomonas, Trypanosoma | 39 | TriTrypDB_Clean.tgz (671 MB) | 2.6 GB
Total | EuPathDB-Clean | 245 | 4.5 GB | 36 GB
*The genera listed for FungiDB and MicrosporidiaDB are a subset of all genera represented. For the full list of genera contained in these classes, see the publication's supplementary table.

To make a Kraken database from the above genomes, download the genomes and add them directly to your database's library/ folder. Then download this seqid2taxid.map file and place it directly in your database folder. From there, you can build the database as normal.
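The staging steps above can be sketched as a shell function. This is a minimal sketch, assuming the FASTA files have already been extracted into one folder, that file names do not collide across archives, and that seqid2taxid.map has already been downloaded. The name stage_kraken_db and all paths are hypothetical. The function only places the files; the build itself is left to your Kraken installation's usual command (for Kraken 1 this is typically kraken-build --build --db followed by the database folder), with the taxonomy set up as for any other Kraken database.

```shell
#!/usr/bin/env bash
# Stage the cleaned genomes and the seqid2taxid.map file inside a Kraken database folder.
# stage_kraken_db and its arguments are illustrative, not part of Kraken or EuPathDB-Clean.
stage_kraken_db() {
    local genomes_dir="$1"   # folder holding the extracted FASTA files
    local map_file="$2"      # downloaded seqid2taxid.map
    local db_dir="$3"        # Kraken database folder

    [ -d "$genomes_dir" ] || { echo "Missing genome folder: $genomes_dir" >&2; return 1; }
    [ -f "$map_file" ] || { echo "Missing map file: $map_file" >&2; return 1; }

    mkdir -p "$db_dir/library" || return 1

    # Copy every regular file found under the genome folder directly into library/.
    # Assumption: the genome folder contains only the extracted FASTA files.
    find "$genomes_dir" -type f -exec cp {} "$db_dir/library/" \; || return 1

    # The map file goes directly into the database folder.
    cp "$map_file" "$db_dir/seqid2taxid.map" || return 1

    echo "Staged genomes in $db_dir/library; now build the database as normal." >&2
}
```

For example, stage_kraken_db my_eupathDB_folder seqid2taxid.map my_kraken_db prepares my_kraken_db for the build step. Keep the database folder outside the genome folder so that find does not copy the library into itself.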

Publications

The preprint is available at the following:
J. Lu, S.L. Salzberg. (2018). "Removing Contaminants from Metagenomic Databases." bioRxiv https://doi.org/10.1101/261859

Authors/Contributors

Jennifer Lu ( jlu26 jhmi edu ) Ph.D. Candidate
Steven Salzberg, Ph.D.

Page Updated: 2018/01/30
