Overview
GlimmerHMM is a new gene finder based on a Generalized Hidden Markov Model (GHMM). The gene finder conforms to the overall mathematical framework of a GHMM, and it additionally incorporates splice site models adapted from the GeneSplicer program and a decision tree adapted from GlimmerM. It also uses Interpolated Markov Models for the coding and noncoding models. Currently, GlimmerHMM's GHMM structure includes introns of each phase, intergenic regions, and four types of exons (initial, internal, final, and single). A basic user manual can be consulted here.
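As an illustration of the state inventory listed above, the following minimal Python sketch enumerates the states and checks whether a sequence of state labels forms a plausible gene structure. The names `State`, `ALLOWED`, and `is_valid_parse` are hypothetical, and the transition table is an assumption based on the usual exon/intron grammar; it is not taken from the GlimmerHMM source, and it ignores strand and reading-frame consistency.

```python
from enum import Enum


class State(Enum):
    """States named in the overview: intergenic, four exon types, three intron phases."""
    INTERGENIC = "intergenic"
    EXON_INITIAL = "initial exon"
    EXON_INTERNAL = "internal exon"
    EXON_FINAL = "final exon"
    EXON_SINGLE = "single exon"
    INTRON_PHASE0 = "intron, phase 0"
    INTRON_PHASE1 = "intron, phase 1"
    INTRON_PHASE2 = "intron, phase 2"


INTRONS = frozenset(
    {State.INTRON_PHASE0, State.INTRON_PHASE1, State.INTRON_PHASE2}
)

# Assumed transitions (illustrative only, not GlimmerHMM's actual topology):
# a gene is either a single exon, or an initial exon followed by
# intron/internal-exon pairs and closed by an intron and a final exon.
ALLOWED = {
    State.INTERGENIC: {State.EXON_INITIAL, State.EXON_SINGLE},
    State.EXON_SINGLE: {State.INTERGENIC},
    State.EXON_INITIAL: set(INTRONS),
    State.EXON_INTERNAL: set(INTRONS),
    State.EXON_FINAL: {State.INTERGENIC},
    **{intron: {State.EXON_INTERNAL, State.EXON_FINAL} for intron in INTRONS},
}


def is_valid_parse(states):
    """Return True if the labels start and end intergenic and use only allowed transitions."""
    if not states:
        return False
    if states[0] is not State.INTERGENIC or states[-1] is not State.INTERGENIC:
        return False
    return all(nxt in ALLOWED[cur] for cur, nxt in zip(states, states[1:]))
```

A real GHMM additionally attaches a length distribution and a content model to each state; this sketch only covers which state may follow which.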
System requirements
GlimmerHMM is released as source code and was tested on Linux RedHat 6.x+, Sun Solaris, and Alpha OSF1, but should work on any Unix system.
Accuracy
GlimmerHMM has been trained on several species including Arabidopsis thaliana, Coccidioides species, Cryptococcus neoformans, and Brugia malayi.
Species | Nuc Sens | Nuc Prec | Nuc Accur | Exon Sens | Exon Prec | Exact Genes | Size of test set |
D.rerio | 93% | 78% | 86% | 77% | 69% | 24% | 549 genes |
C.elegans | 96% | 95% | 96% | 82% | 81% | 42% | 1886 genes |
Arabidopsis | 97% | 99% | 98% | 84% | 89% | 60% | 809 genes |
Cryptococcus | 96% | 99% | 98% | 86% | 88% | 53% | 350 genes |
Coccidioides | 99% | 99% | 99% | 84% | 86% | 60% | 503 genes |
Brugia | 93% | 98% | 95% | 78% | 83% | 25% | 477 genes |
In the tables, "Sens" is sensitivity, the percentage of true exons or nucleotides that the system predicts. "Prec" is precision, the percentage of the system's predictions that are correct.
GlimmerHMM has been trained on the human genome as well. The table below presents its performance compared to Genscan on 963 human RefSeq genes selected randomly from all 24 chromosomes, non-overlapping with the training set. The test set contains 1000 bp of untranslated sequence on either side (5' or 3') of the coding portion of each gene.
Program | Nuc Sens | Nuc Prec | Nuc Acc | Exon Sens | Exon Prec | Exon Acc | Exact Genes |
GlimmerHMM | 86% | 72% | 79% | 72% | 62% | 67% | 17% |
Genscan | 86% | 68% | 77% | 69% | 60% | 65% | 13% |
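The sensitivity and precision definitions used in these tables can be sketched in a few lines of Python. This is an illustrative sketch, not the evaluation code behind the reported numbers. The function name `sens_prec_acc` is hypothetical, exons are assumed to count as correct only when both boundaries match exactly, and the "accuracy" value is assumed to be the average of sensitivity and precision, since this page does not define that column.

```python
def sens_prec_acc(truth, predicted):
    """Compare two sets of items (nucleotide positions or exon tuples).

    Returns (sensitivity, precision, accuracy) as fractions in [0, 1].
    Accuracy here is ASSUMED to be the mean of sensitivity and precision.
    """
    truth, predicted = set(truth), set(predicted)
    correct = len(truth & predicted)
    sensitivity = correct / len(truth) if truth else 0.0
    precision = correct / len(predicted) if predicted else 0.0
    return sensitivity, precision, (sensitivity + precision) / 2


# Nucleotide level: items are coding positions (made-up coordinates).
annotated_nt = set(range(100, 200))
predicted_nt = set(range(120, 200))
nuc_sens, nuc_prec, nuc_acc = sens_prec_acc(annotated_nt, predicted_nt)

# Exon level: items are (start, end) pairs; a shifted boundary counts as a miss.
annotated_exons = {(100, 200), (300, 400), (500, 650)}
predicted_exons = {(100, 200), (310, 400)}
exon_sens, exon_prec, exon_acc = sens_prec_acc(annotated_exons, predicted_exons)
```

Using sets keeps the same function valid at both levels: only the kind of item being compared changes.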
Obtaining GlimmerHMM
This software is OSI Certified Open Source Software.
To download the complete GlimmerHMM system, click here.
After downloading, uncompress the distribution file by typing:
% tar -xzf GlimmerHMM-3.0.4.tar.gz
A directory named 'GlimmerHMM/' will be created which contains the executable, training data sets, and other supporting files.
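Where the unpacking step needs to be scripted rather than typed at a shell, the same operation can be done with Python's standard `tarfile` module. The helper name `extract_distribution` is hypothetical, and the sketch assumes the archive has already been downloaded to a local path.

```python
import tarfile
from pathlib import Path


def extract_distribution(archive_path, dest_dir="."):
    """Unpack a gzip-compressed tar archive and return the top-level paths it creates."""
    dest = Path(dest_dir)
    with tarfile.open(archive_path, "r:gz") as archive:
        # Collect the top-level entries (expected: a single 'GlimmerHMM' directory).
        top_level = sorted(
            {Path(m.name).parts[0] for m in archive.getmembers() if Path(m.name).parts}
        )
        # Use the safer 'data' extraction filter when this Python version provides it.
        kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
        archive.extractall(dest, **kwargs)
    return [dest / name for name in top_level]


# Example call (assumes the file exists in the current directory):
# extract_distribution("GlimmerHMM-3.0.4.tar.gz")
```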
Contact Information
You can contact us about GlimmerHMM at: mpertea jhu edu
References
Majoros, W. H., M. Pertea, and S. L. Salzberg (2004). "TigrScan and GlimmerHMM: two open-source ab initio eukaryotic gene-finders." Bioinformatics 20(16): 2878-2879.
Pertea, M. and S. L. Salzberg (2002). "Computational gene finding in plants." Plant Molecular Biology 48(1-2): 39-48.
The Arabidopsis Genome Initiative (2000). "Analysis of the genome sequence of the flowering plant Arabidopsis thaliana." Nature 408(6814): 796-815.
Pertea, M., S. L. Salzberg, et al. (2000). "Finding genes in Plasmodium falciparum." Nature 404(6773): 34; discussion 34-5.
Salzberg, S. L., M. Pertea, et al. (1999). "Interpolated Markov models for eukaryotic gene finding." Genomics 59(1): 24-31.
Acknowledgements
The development of GlimmerHMM was supported by the NIH under grants R01-LM06845 and R01-LM007938.