What is GAGE-B?
GAGE-B is an evaluation of the contiguity and accuracy of bacterial genome assemblies generated by some of the most commonly used genome assemblers. GAGE-B follows the standards set by GAGE.
 
Why do we need GAGE-B?
There is a need to determine which genome assembler works best for bacterial genomes:
  • New bacterial strains are sequenced daily, driven by efforts such as the Human Microbiome Project.
  • The goal of assembling bacterial genomes differs from that of assembling large genomes. Assemblies of large genomes aim at a global draft of the whole genome, whereas bacterial assemblies aim at accurate local assemblies that allow different species and strains to be compared, for example to find differences between pathogenic and non-pathogenic strains.
  • Some assemblers perform better on large genomes while others work better on small ones; however, all of them are used to assemble bacterial genomes.
GAGE-B also aims at answering the following questions:
  • Which parameters should be used to generate the optimal assemblies?
  • How does coverage level impact assembly?
  • Can high coverage from a single library replace multi-library assemblies?
  • Do HiSeq or MiSeq Illumina reads produce better assemblies?
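For reference when reading the coverage question, coverage is taken here in its usual sense of average sequencing depth (an assumption about how the term is used on this page). It is commonly estimated from the number of reads $N$, the read length $L$, and the genome size $G$; the numbers in the example are illustrative only, not GAGE-B data:

```latex
C = \frac{N \cdot L}{G},
\qquad\text{e.g.}\quad
C = \frac{2\,000\,000 \times 100\ \text{bp}}{5\,000\,000\ \text{bp}} = 40\times
```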
 
How does GAGE-B work?
GAGE-B is run by a team of genome assembly experts. Twelve bacterial data sets were selected to be assembled by eight of the leading genome assemblers. For each data set, each assembler is run with various combinations of input parameters, and the assembly with the largest contig N50 is chosen as that assembler's best assembly. The best assemblies from all assemblers are then compared for contiguity and accuracy.
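The selection rule above (keep the run with the largest contig N50) can be sketched in Python. This is a minimal illustration, not GAGE-B's actual evaluation code: it assumes each assembly run is represented only by a list of contig lengths, and the function names and run labels (`k=31`, `k=55`) are hypothetical.

```python
from typing import Dict, List


def n50(contig_lengths: List[int]) -> int:
    """Return N50: the contig length L such that contigs of length >= L
    together cover at least half of the total assembly length."""
    lengths = sorted(contig_lengths, reverse=True)  # longest contigs first
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length
    return 0  # empty assembly has no N50


def best_assembly(runs: Dict[str, List[int]]) -> str:
    """Pick the run (e.g. one parameter combination) with the largest contig N50."""
    return max(runs, key=lambda name: n50(runs[name]))


if __name__ == "__main__":
    # Hypothetical runs of one assembler with two parameter settings.
    runs = {
        "k=31": [120_000, 80_000, 40_000, 10_000],
        "k=55": [200_000, 30_000, 20_000],
    }
    print(best_assembly(runs))  # prints k=55
```

Note that N50 alone measures contiguity only; the accuracy comparison described above requires aligning contigs against a reference and is not shown here.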
 
Which genomes were assembled?
Eight genomes were assembled from HiSeq Illumina reads:
  • Aeromonas hydrophila SSU
  • Bacillus cereus VD 118
  • Bacteroides fragilis HMW 615
  • Mycobacterium abscessus 6G-0125-R
  • Rhodobacter sphaeroides 2.4.1
  • Staphylococcus aureus M0927
  • Vibrio cholerae CP1032(5)
  • Xanthomonas axonopodis pv. manihotis UA 323
Four genomes were assembled from MiSeq Illumina reads:
  • Bacillus cereus ATCC 10987
  • Mycobacterium abscessus 6G-0125-R
  • Rhodobacter sphaeroides 2.4.1
  • Vibrio cholerae CP1032(5)
 
Acknowledgements
This work was supported in part by the U.S. National Institutes of Health under grants R01-HG006677 and R01-HG006102, and by the USDA's National Institute of Food and Agriculture (NIFA) under grants 2009-35205-05209 and 2011-67009-30030. We thank the sequencing centers that submitted raw data to the Sequence Read Archive for the genomes used in this study.