
Glimmer 2

This page describes Glimmer 2.13, an older version of Glimmer. Glimmer 3 reduces the number of false-positive predictions and improves the accuracy of start codon predictions. For more about Glimmer3 and download information, see the Glimmer3 page; for performance comparisons, see Glimmer3 vs Glimmer2.

Glimmer v2.13 can still be downloaded as glimmer213.tar.gz.


About Glimmer

Glimmer is a system for finding genes in microbial DNA, especially the genomes of bacteria, archaea, and viruses. Glimmer (Gene Locator and Interpolated Markov ModelER) uses interpolated Markov models (IMMs) to identify coding regions and distinguish them from noncoding DNA. The IMM approach, described in our Nucleic Acids Research paper on Glimmer 1.0 and in our subsequent paper on Glimmer 2.0, uses a combination of Markov models from first through eighth order, weighting each model according to its predictive power. Glimmer 1.0 and 2.0 use 3-periodic nonhomogeneous Markov models in their IMMs.
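The interpolation idea can be illustrated with a minimal Python sketch, assuming plain uppercase A/C/G/T input and training genes that begin in frame. It keeps one set of context counts per codon position (the 3-periodic part) and, for each base, combines the order-k estimate with the already-interpolated estimate from the next-lower order, starting from a uniform distribution and an order-0 term. The weighting rule `min(1, count / threshold)`, the `threshold` value, the add-one pseudocounts, and the names `PeriodicIMM`, `train`, `prob`, and `log_score` are hypothetical simplifications; this is not Glimmer's code or its actual weighting scheme, which is described in the papers cited below.

```python
import math
from collections import defaultdict

BASES = "ACGT"


class PeriodicIMM:
    """Toy 3-periodic interpolated Markov model over uppercase A/C/G/T sequences."""

    def __init__(self, max_order: int = 8, threshold: int = 400):
        self.max_order = max_order
        # Hypothetical rule: a context seen `threshold` times gets full weight.
        self.threshold = threshold
        # counts[phase][context][base]: how often `base` followed `context`
        # at codon position `phase` (0, 1 or 2).
        self.counts = [defaultdict(lambda: defaultdict(int)) for _ in range(3)]

    def train(self, genes):
        for gene in genes:
            for i, base in enumerate(gene):
                phase = i % 3  # genes are assumed to start in frame
                for k in range(min(i, self.max_order) + 1):
                    self.counts[phase][gene[i - k:i]][base] += 1

    def prob(self, phase: int, context: str, base: str) -> float:
        p = 0.25  # uniform fallback below order 0
        for k in range(min(len(context), self.max_order) + 1):
            ctx = context[len(context) - k:]  # last k bases of the context
            table = self.counts[phase].get(ctx)
            if not table:
                break  # unseen context: every longer context is unseen too
            total = sum(table.values())
            p_k = (table.get(base, 0) + 1) / (total + 4)  # add-one pseudocounts
            lam = min(1.0, total / self.threshold)  # simplified weight
            p = lam * p_k + (1 - lam) * p  # interpolate with lower-order estimate
        return p

    def log_score(self, seq: str) -> float:
        """Log-likelihood of `seq`, read in frame from codon position 0."""
        return sum(
            math.log(self.prob(i % 3, seq[max(0, i - self.max_order):i], base))
            for i, base in enumerate(seq)
        )
```

Because each order-k estimate and the uniform fallback are proper distributions, every interpolated step is too, so the probabilities for the four bases always sum to 1.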
Glimmer is the primary microbial gene finder at TIGR and has been used to annotate the complete genomes of over 80 bacterial species at TIGR and elsewhere. Its analyses of some of these genomes, and of others, are available at the Comprehensive Microbial Resource site.
For our eukaryotic gene finders, go to the GlimmerHMM site.

Glimmer 2.13's Accuracy

Organism           Notes  Confirmed    Found by      Found    Total genes  Total genes
                          by homology  Glimmer 2.13  (%)      annotated    predicted
A. ferrooxidans    2      2054         2026          98.6%    3215         3178
A. fulgidus        2      1129         1128          99.9%    2431         2475
B. anthracis       2      3458         3444          99.6%    5507         5395
B. subtilis        3      4063         3979          97.9%    5231         4747
B. wolbachia       2      712          710           99.7%    1299         1226
C. crescentus      2      2205         2186          99.1%    3763         3890
C. jejuni          1      1341         1340          99.9%    1886         1869
C. perfringens     2      2153         2144          99.6%    2974         2863
C. tepidum         2      1304         1299          99.6%    2281         2165
D. ethenogenes     2      1141         1127          98.8%    1591         1544
E. coli            2      861          855           99.3%    4174         4121
F. succinogenes    2      2113         2105          99.6%    3256         3210
G. sulfurreducens  2      2462         2433          98.8%    3468         3711
H. influenzae      2      1132         1131          99.9%    1740         1785
H. pylori          2      892          886           99.3%    1587         1678
L. monocytogenes   2      2084         2079          99.8%    2847         2778
M. capsulatus      2      2132         2093          98.2%    3002         3434
M. tuberculosis    2      2191         2177          99.4%    4245         4245
N. meningitidis    2      1202         1180          98.2%    2154         2494
P. fluorescens     2      4819         4790          99.4%    6148         6968
P. gingivalis      2      1254         1251          99.8%    1988         2052
P. ruminicola      2      2042         2040          99.9%    2872         2907
S. agalactiae      2      1487         1483          99.7%    2053         2083
S. gordonii        2      1702         1700          99.9%    2090         2086
S. pneumoniae      2      1425         1410          98.9%    2236         2115
T. denticola       3      1610         1603          99.6%    2786         2743
T. maritima        3      1101         1095          99.5%    1872         1988
T. neapolitana     3      1561         1556          99.7%    1906         1964
T. pallidum        3      576          571           99.1%    1039         1069
V. spinosum        2      3801         3791          99.7%    9111         7332
Wolbachia          2      746          745           99.9%    1271         1256

Average percentage found: 99.36%

The table above shows Glimmer 2.13's accuracy on 31 complete bacterial and archaeal genomes. The figures reflect Glimmer's default settings for most parameters; the minimum gene length varied from 90 bp to 150 bp across organisms. Most of the missed genes were very short, either below the minimum length or very close to it.

To determine accuracy, we consider only "confirmed genes," which we define as genes that have a significant database match to a gene in another organism.
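As a check on the arithmetic, the percentage column equals the number of confirmed genes found by Glimmer 2.13 divided by the number of confirmed genes, and the 99.36% average matches the unweighted mean of the 31 per-genome percentages. The short Python script below recomputes both from the table; the dictionary name `TABLE` and the helper `percent_found` are illustrative only.

```python
# (confirmed by homology, found by Glimmer 2.13, reported percent) per genome,
# copied from the accuracy table above.
TABLE = {
    "A. ferrooxidans": (2054, 2026, 98.6), "A. fulgidus": (1129, 1128, 99.9),
    "B. anthracis": (3458, 3444, 99.6), "B. subtilis": (4063, 3979, 97.9),
    "B. wolbachia": (712, 710, 99.7), "C. crescentus": (2205, 2186, 99.1),
    "C. jejuni": (1341, 1340, 99.9), "C. perfringens": (2153, 2144, 99.6),
    "C. tepidum": (1304, 1299, 99.6), "D. ethenogenes": (1141, 1127, 98.8),
    "E. coli": (861, 855, 99.3), "F. succinogenes": (2113, 2105, 99.6),
    "G. sulfurreducens": (2462, 2433, 98.8), "H. influenzae": (1132, 1131, 99.9),
    "H. pylori": (892, 886, 99.3), "L. monocytogenes": (2084, 2079, 99.8),
    "M. capsulatus": (2132, 2093, 98.2), "M. tuberculosis": (2191, 2177, 99.4),
    "N. meningitidis": (1202, 1180, 98.2), "P. fluorescens": (4819, 4790, 99.4),
    "P. gingivalis": (1254, 1251, 99.8), "P. ruminicola": (2042, 2040, 99.9),
    "S. agalactiae": (1487, 1483, 99.7), "S. gordonii": (1702, 1700, 99.9),
    "S. pneumoniae": (1425, 1410, 98.9), "T. denticola": (1610, 1603, 99.6),
    "T. maritima": (1101, 1095, 99.5), "T. neapolitana": (1561, 1556, 99.7),
    "T. pallidum": (576, 571, 99.1), "V. spinosum": (3801, 3791, 99.7),
    "Wolbachia": (746, 745, 99.9),
}


def percent_found(confirmed: int, found: int) -> float:
    """Share of homology-confirmed genes that Glimmer found, in percent."""
    return 100.0 * found / confirmed


for name, (confirmed, found, reported) in TABLE.items():
    # Each recomputed value should round to the figure printed in the table.
    assert round(percent_found(confirmed, found), 1) == reported, name

mean_reported = sum(r for _, _, r in TABLE.values()) / len(TABLE)
print(f"{len(TABLE)} genomes, mean of reported percentages: {mean_reported:.2f}%")
```

It prints `31 genomes, mean of reported percentages: 99.36%`.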

Notes:
The results above were obtained with three different training procedures, indicated by the number in the Notes column:
(1) Glimmer was trained on all non-overlapping open reading frames, extracted first with the long-orfs program that comes with the system.
(2) Glimmer was trained on all genes with a significant protein match in translated BLAST searches.
(3) Glimmer was trained on all annotated open reading frames.
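Training procedure (1) can be pictured with a simplified Python sketch of extracting long, non-overlapping ORFs. This is not the long-orfs algorithm itself; it assumes that an ORF runs from the first ATG after an in-frame stop codon to the next in-frame stop, scans all three frames on both strands, discards ORFs shorter than a hypothetical `min_len`, and drops any ORF that overlaps another long ORF. The function names are hypothetical.

```python
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def orfs_on_strand(seq: str, min_len: int):
    """Yield (start, end) of ATG..stop ORFs of at least min_len bp (end exclusive)."""
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon after the previous stop
            elif codon in STOPS:
                if start is not None and i + 3 - start >= min_len:
                    yield (start, i + 3)
                start = None


def drop_overlapping(orfs):
    """Keep only the ORFs that overlap no other ORF in the list."""
    kept = []
    for i, (s1, e1, _) in enumerate(orfs):
        overlaps = any(
            j != i and s1 < e2 and s2 < e1
            for j, (s2, e2, _) in enumerate(orfs)
        )
        if not overlaps:
            kept.append(orfs[i])
    return kept


def long_orfs(seq: str, min_len: int = 90):
    """Long, non-overlapping ORFs as (start, end, strand) on forward coordinates."""
    seq = seq.upper()
    n = len(seq)
    orfs = [(s, e, "+") for s, e in orfs_on_strand(seq, min_len)]
    # Map reverse-strand coordinates back onto the forward strand.
    orfs += [(n - e, n - s, "-")
             for s, e in orfs_on_strand(reverse_complement(seq), min_len)]
    return sorted(drop_overlapping(orfs))
```

The ORFs that survive would serve as the training set for the IMM, which is how the system can bootstrap itself on a new genome.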

The match counts above include both ORFs for which Glimmer finds only the correct stop codon and ORFs for which it finds both the start and stop codons at the correct coordinates. For most genomes, start codon predictions can be improved by running the Perl program RBS-finder on the original Glimmer output; this program moves start codons according to the positions of likely ribosome binding sites. Glimmer3 builds this RBS functionality into the code, along with many other improvements.
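The idea behind moving a start codon toward a likely ribosome binding site can be illustrated with a toy Python sketch. It is not RBS-finder's algorithm. It assumes a forward-strand gene whose stop codon begins at index `stop`, collects the in-frame ATG, GTG, and TTG codons back to the previous in-frame stop, and picks the one whose upstream region best matches the Shine-Dalgarno consensus AGGAGG, preferring the longer ORF on ties. The consensus string, the 4 to 20 bp search window, and all function names are assumptions chosen for illustration.

```python
SD_CONSENSUS = "AGGAGG"  # assumed Shine-Dalgarno consensus
START_CODONS = {"ATG", "GTG", "TTG"}
STOP_CODONS = {"TAA", "TAG", "TGA"}


def sd_score(upstream: str) -> int:
    """Most positions matching SD_CONSENSUS in any window of `upstream`."""
    n = len(SD_CONSENSUS)
    best = 0
    for i in range(len(upstream) - n + 1):
        window = upstream[i:i + n]
        best = max(best, sum(a == b for a, b in zip(window, SD_CONSENSUS)))
    return best


def candidate_starts(seq: str, stop: int) -> list:
    """In-frame start codons upstream of the stop codon at `stop`, nearest first."""
    starts = []
    i = stop - 3
    while i >= 0 and seq[i:i + 3] not in STOP_CODONS:
        if seq[i:i + 3] in START_CODONS:
            starts.append(i)
        i -= 3
    return starts


def choose_start(seq: str, stop: int, window=(4, 20)):
    """Return the candidate start whose upstream region best matches the SD motif.

    The region searched runs from window[1] to window[0] bases upstream
    of the start codon (hypothetical values).
    """
    best_start, best_score = None, -1
    for s in candidate_starts(seq, stop):
        upstream = seq[max(0, s - window[1]):max(0, s - window[0])]
        score = sd_score(upstream)
        # Candidates are visited from the stop outward, so ">=" keeps the
        # more upstream start (the longer ORF) when scores tie.
        if score >= best_score:
            best_start, best_score = s, score
    return best_start
```

For example, in a gene with in-frame ATGs at positions 18 and 39 where only the second is preceded by AGGAGG, `choose_start` moves the start to 39; without the motif, it keeps the longer ORF starting at 18.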

A sample of Glimmer 2.13 output for H. pylori is available, and the Glimmer 2.13 output format is explained in the readme file.

Speed

For genomes under 2 megabases (e.g., H. pylori and H. influenzae), training Glimmer 2.13 (the build-icm program) takes under 30 seconds on a Linux PC with a 600 MHz Pentium processor; for the largest genome in the table above, V. spinosum (8.5 Mb), it takes under 90 seconds. Finding all the genes (the glimmer2 program) then takes only about 20 seconds for small genomes and up to 5 minutes for larger ones.

Note to Glimmer users: it is always preferable to train Glimmer on a sample of genes from the same genome in which you are finding genes. This is easy to do for any bacterial genome: use the long-orfs program to extract long open reading frames, which can then be used to bootstrap the system. (This is explained in the readme files that come with Glimmer.) To search for genes in a short DNA fragment, Glimmer must instead be trained on a longer sequence; the best strategy is to train on a closely related genome.

Obtaining Glimmer

This software is OSI Certified Open Source Software.

The complete Glimmer 2.13 system is available for download as glimmer213.tar.gz.

After downloading, uncompress the distribution file by typing:

  % tar -xzf glimmer213.tar.gz
  
This creates a directory containing the source files, training data sets, and other supporting files. See the included "readme" files, or the online copy of the README, for more information.

References

For a description of Glimmer 1.0 and 2.0, see our papers:
A.L. Delcher, D. Harmon, S. Kasif, O. White, and S.L. Salzberg. Improved microbial gene identification with GLIMMER. Nucleic Acids Research 27:23 (1999), 4636-4641.
S. Salzberg, A. Delcher, S. Kasif, and O. White. Microbial gene identification using interpolated Markov models. Nucleic Acids Research 26:2 (1998), 544-548.

Acknowledgements

Glimmer is currently supported by the National Library of Medicine at NIH under grant R01-LM007938. It was previously supported by the National Science Foundation under grants IRI-9530462 and IIS-9902923, and by the National Institutes of Health under grant R01-LM06845.