
Overview

GlimmerHMM is a new gene finder based on a Generalized Hidden Markov Model (GHMM). Although it conforms to the overall mathematical framework of a GHMM, GlimmerHMM additionally incorporates splice site models adapted from the GeneSplicer program and a decision tree adapted from GlimmerM. It also uses Interpolated Markov Models (IMMs) for the coding and noncoding models. Currently, GlimmerHMM's GHMM structure includes introns of each phase, intergenic regions, and four types of exons (initial, internal, final, and single). A basic user manual is also available.
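The state structure just described can be illustrated with a small Python sketch. It is a simplified model written for illustration only: it considers just the forward strand, and the state names, the transition table, and the helper `is_valid_parse` are hypothetical, not GlimmerHMM's internal data structures. Intron phase is assumed here to be the number of coding bases preceding the intron, modulo 3.

```python
# Hypothetical sketch: a simplified, forward-strand-only version of the
# GHMM state structure described above. State names and transitions are
# illustrative assumptions, not GlimmerHMM's internal model.

INTRONS = {"intron_0", "intron_1", "intron_2"}  # introns of phase 0, 1, 2

# Allowed successor states for each state.
TRANSITIONS = {
    "intergenic": {"initial_exon", "single_exon"},
    "single_exon": {"intergenic"},
    "initial_exon": INTRONS,
    "internal_exon": INTRONS,
    "intron_0": {"internal_exon", "final_exon"},
    "intron_1": {"internal_exon", "final_exon"},
    "intron_2": {"internal_exon", "final_exon"},
    "final_exon": {"intergenic"},
}


def is_valid_parse(segments):
    """Check a parse given as a list of (state, length) pairs.

    A parse is accepted when it starts and ends in intergenic sequence,
    every consecutive pair of states is an allowed transition, each
    intron's phase equals the coding bases seen so far in the gene
    modulo 3, and each complete gene's coding length is a multiple of 3.
    """
    states = [state for state, _ in segments]
    if not states or states[0] != "intergenic" or states[-1] != "intergenic":
        return False
    if any(nxt not in TRANSITIONS.get(cur, set())
           for cur, nxt in zip(states, states[1:])):
        return False

    coding = 0  # coding bases emitted so far in the current gene
    for state, length in segments:
        if state == "intergenic":
            coding = 0  # a new gene may start after this region
        elif state in INTRONS:
            if int(state[-1]) != coding % 3:
                return False
        else:  # one of the four exon types
            coding += length
            if state in ("final_exon", "single_exon") and coding % 3 != 0:
                return False
    return True
```

For example, a gene with a 50 bp initial exon must be followed by a phase-2 intron under this convention, because 50 mod 3 = 2; a real GHMM additionally scores each candidate parse with its submodels rather than merely accepting or rejecting it.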

System requirements

GlimmerHMM is released as source code and was tested on Linux RedHat 6.x+, Sun Solaris, and Alpha OSF1, but should work on any Unix system.

Accuracy

GlimmerHMM has been trained on several species including Arabidopsis thaliana, Coccidioides species, Cryptococcus neoformans, and Brugia malayi.

Species       Nuc Sens  Nuc Prec  Nuc Acc  Exon Sens  Exon Prec  Exact Genes  Test set size
D. rerio      93%       78%       86%      77%        69%        24%          549 genes
C. elegans    96%       95%       96%      82%        81%        42%          1886 genes
Arabidopsis   97%       99%       98%      84%        89%        60%          809 genes
Cryptococcus  96%       99%       98%      86%        88%        53%          350 genes
Coccidioides  99%       99%       99%      84%        86%        60%          503 genes
Brugia        93%       98%       95%      78%        83%        25%          477 genes

In the tables, "Sens" is sensitivity, the percentage of true exons (or nucleotides) that the system predicts correctly, and "Prec" is precision, the percentage of the system's predictions that are correct. GlimmerHMM has also been trained on the human genome. The table below compares its performance with that of Genscan on 963 human RefSeq genes selected randomly from all 24 chromosomes, none of which overlap the training set. Each test sequence includes 1000 bp of untranslated sequence on each side (5' and 3') of the gene's coding portion.

Gene finder   Nuc Sens  Nuc Prec  Nuc Acc  Exon Sens  Exon Prec  Exon Acc  Exact Genes
GlimmerHMM    86%       72%       79%      72%        62%        67%       17%
Genscan       86%       68%       77%      69%        60%        65%       13%
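The sensitivity and precision definitions above can be made concrete with a short Python sketch. It assumes coding regions are given as half-open (start, end) intervals and, at the exon level, counts an exon as correct only when both of its boundaries match exactly; both are assumptions made for this example. The page does not define the "Acc" columns, but every reported Acc value matches the mean of the corresponding Sens and Prec to within rounding, so the sketch uses that mean. Treat this as an inference, not the official definition.

```python
# Hypothetical sketch of nucleotide- and exon-level evaluation.
# Exons are half-open (start, end) intervals; "accuracy" is assumed to be
# the mean of sensitivity and precision (inferred from the tables above).


def coding_positions(exons):
    """Return the set of nucleotide positions covered by (start, end) exons."""
    return {pos for start, end in exons for pos in range(start, end)}


def _scores(true_positives, n_true, n_predicted):
    """Return (sensitivity, precision, accuracy) as fractions in [0, 1]."""
    sens = true_positives / n_true if n_true else 0.0
    prec = true_positives / n_predicted if n_predicted else 0.0
    return sens, prec, (sens + prec) / 2


def nucleotide_scores(true_exons, predicted_exons):
    """Sens: fraction of true coding bases predicted as coding.
    Prec: fraction of predicted coding bases that are truly coding."""
    true_nt = coding_positions(true_exons)
    pred_nt = coding_positions(predicted_exons)
    return _scores(len(true_nt & pred_nt), len(true_nt), len(pred_nt))


def exon_scores(true_exons, predicted_exons):
    """An exon counts as correct only if both of its boundaries match exactly."""
    correct = len(set(true_exons) & set(predicted_exons))
    return _scores(correct, len(set(true_exons)), len(set(predicted_exons)))


true_exons = [(0, 100), (200, 300)]
predicted_exons = [(0, 100), (250, 300)]
print(nucleotide_scores(true_exons, predicted_exons))  # (0.75, 1.0, 0.875)
print(exon_scores(true_exons, predicted_exons))        # (0.5, 0.5, 0.5)
```

The example shows why nucleotide-level scores usually exceed exon-level ones: a prediction that misses one exon boundary still gets most of its bases right, but the exon itself counts as wrong. As a check on the accuracy assumption, the D. rerio row lists 93% nucleotide sensitivity and 78% precision; their mean, 85.5%, is consistent with the reported 86%.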


Obtaining GlimmerHMM

This software is OSI Certified Open Source Software.

The complete GlimmerHMM system is distributed as a single compressed archive (GlimmerHMM-3.0.4.tar.gz).

After downloading, uncompress the distribution file by typing:
 % tar -xzf GlimmerHMM-3.0.4.tar.gz 

A directory named 'GlimmerHMM/' will be created which contains the executable, training data sets, and other supporting files.
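The extraction step can also be scripted, for example as part of an automated installation. The Python sketch below uses the standard-library `tarfile` module; the helper name `extract_distribution` is hypothetical and not part of GlimmerHMM. Where the running Python supports extraction filters, it applies the "data" filter, which refuses archive members that would land outside the destination directory as well as special files such as devices.

```python
import tarfile
from pathlib import Path


def extract_distribution(archive, dest="."):
    """Extract a gzip-compressed tar archive into dest, like `tar -xzf`.

    Returns the sorted names of the archive's top-level entries.
    """
    with tarfile.open(archive, "r:gz") as tar:
        # Use the "data" extraction filter where this Python supports it.
        options = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
        tar.extractall(path=dest, **options)
        # Collect first path components, skipping entries such as "./".
        top_level = {
            Path(member.name).parts[0]
            for member in tar.getmembers()
            if Path(member.name).parts
        }
    return sorted(top_level)
```

Called as `extract_distribution("GlimmerHMM-3.0.4.tar.gz")` from the download directory, it should return `['GlimmerHMM']`, assuming the archive's top-level directory is named as described above.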

Contact Information

You can contact us about GlimmerHMM at mpertea@jhu.edu.

References

Majoros, W. H., M. Pertea, and S. L. Salzberg (2004). "TigrScan and GlimmerHMM: two open-source ab initio eukaryotic gene-finders." Bioinformatics 20(16): 2878-2879.

Pertea, M. and S. L. Salzberg (2002). "Computational gene finding in plants." Plant Molecular Biology 48(1-2): 39-48.

The Arabidopsis Genome Initiative (2000). "Analysis of the genome sequence of the flowering plant Arabidopsis thaliana." Nature 408(6814): 796-815.

Pertea, M., S. L. Salzberg, et al. (2000). "Finding genes in Plasmodium falciparum." Nature 404(6773): 34; discussion 34-5.

Salzberg, S. L., M. Pertea, et al. (1999). "Interpolated Markov models for eukaryotic gene finding." Genomics 59(1): 24-31.

Acknowledgements

The development of GlimmerHMM was supported by the NIH under grants R01-LM06845 and R01-LM007938.