Glimmer 2

This page describes the old version 2.13 of Glimmer. Glimmer version 3 produces fewer false-positive predictions and predicts start codons more accurately. For more about Glimmer3, including download information, see the Glimmer3 page. For performance comparisons between Glimmer3 and Glimmer2, see Glimmer3 vs Glimmer2.

Glimmer v2.13 can still be downloaded as glimmer213.tar.gz.

About Glimmer

Glimmer is a system for finding genes in microbial DNA, especially the genomes of bacteria, archaea, and viruses. Glimmer (Gene Locator and Interpolated Markov ModelER) uses interpolated Markov models (IMMs) to identify coding regions and distinguish them from noncoding DNA. The IMM approach, described in our Nucleic Acids Research paper on Glimmer 1.0 and in our subsequent paper on Glimmer 2.0, combines Markov models of 1st through 8th order, weighting each model according to its predictive power. Glimmer 1.0 and 2.0 use 3-periodic nonhomogeneous Markov models in their IMMs.
Glimmer is the primary microbial gene finder at TIGR, and has been used to annotate the complete genomes of over 80 bacterial species at TIGR and elsewhere. Its analyses of some of these genomes and others are available at the Comprehensive Microbial Resource site.
For our eukaryotic gene finders, go to the GlimmerHMM site.
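
To make the interpolation idea concrete, here is a minimal Python sketch. It is not Glimmer's actual implementation. It estimates the probability of the next base by blending fixed-order Markov models, starting from the shortest context and giving each longer context more influence the more often it was seen in training. The count-based weight `total / (total + k)`, the constant `k`, and the names `train_counts` and `imm_prob` are assumptions made for illustration; Glimmer's own weighting scheme is described in the papers listed under References, and the 3-periodic aspect of its models is omitted here.

```python
from collections import defaultdict

BASES = "ACGT"

def train_counts(seqs, max_order=8):
    """Count how often each base follows each context of length 0..max_order."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in seqs:
        for i, base in enumerate(seq):
            for order in range(min(i, max_order) + 1):
                counts[seq[i - order:i]][base] += 1
    return counts

def imm_prob(counts, context, base, max_order=8, k=10.0):
    """Blend estimates from short to long contexts (hypothetical weighting)."""
    prob = 1.0 / len(BASES)  # uniform fallback before any context is used
    context = context[-max_order:] if max_order else ""
    for order in range(len(context) + 1):
        ctx = context[len(context) - order:]
        total = sum(counts[ctx].values()) if ctx in counts else 0
        if total == 0:
            break  # a longer context cannot have been observed either
        weight = total / (total + k)  # more observations -> more trust
        estimate = counts[ctx].get(base, 0) / total
        prob = weight * estimate + (1.0 - weight) * prob
    return prob

# Toy usage: train on one short sequence, then score a base given its context.
counts = train_counts(["ATGGCGATGGCGATGGCG"], max_order=3)
p_g_after_atg = imm_prob(counts, "ATG", "G", max_order=3)
```

Because each step is a convex combination of two probability distributions, the blended values over the four bases still sum to 1. A gene finder built on this idea would train one such model on coding sequence and compare its scores against a model of noncoding sequence.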

Glimmer 2.13's Accuracy

Organism	Notes	Genes confirmed by homology	Found by Glimmer 2.13	% found	Total genes annotated	Total genes predicted
A. ferrooxidans	2	2054	2026	98.6%	3215	3178
A. fulgidus	2	1129	1128	99.9%	2431	2475
B. anthracis	2	3458	3444	99.6%	5507	5395
B. subtilis	3	4063	3979	97.9%	5231	4747
B. wolbachia	2	712	710	99.7%	1299	1226
C. crescentus	2	2205	2186	99.1%	3763	3890
C. jejuni	1	1341	1340	99.9%	1886	1869
C. perfringens	2	2153	2144	99.6%	2974	2863
C. tepidum	2	1304	1299	99.6%	2281	2165
D. ethenogenes	2	1141	1127	98.8%	1591	1544
E. coli	2	861	855	99.3%	4174	4121
F. succinogenes	2	2113	2105	99.6%	3256	3210
G. sulfurreducens	2	2462	2433	98.8%	3468	3711
H. influenzae	2	1132	1131	99.9%	1740	1785
H. pylori	2	892	886	99.3%	1587	1678
L. monocytogenes	2	2084	2079	99.8%	2847	2778
M. capsulatus	2	2132	2093	98.2%	3002	3434
M. tuberculosis	2	2191	2177	99.4%	4245	4245
N. meningitidis	2	1202	1180	98.2%	2154	2494
P. fluorescens	2	4819	4790	99.4%	6148	6968
P. gingivalis	2	1254	1251	99.8%	1988	2052
P. ruminicola	2	2042	2040	99.9%	2872	2907
S. agalactiae	2	1487	1483	99.7%	2053	2083
S. gordonii	2	1702	1700	99.9%	2090	2086
S. pneumoniae	2	1425	1410	98.9%	2236	2115
T. denticola	3	1610	1603	99.6%	2786	2743
T. maritima	3	1101	1095	99.5%	1872	1988
T. neapolitana	3	1561	1556	99.7%	1906	1964
T. pallidum	3	576	571	99.1%	1039	1069
V. spinosum	2	3801	3791	99.7%	9111	7332
Wolbachia	2	746	745	99.9%	1271	1256
Average Percentage Found			-	99.36%	-	-

The table above shows Glimmer 2.13's accuracy for 31 complete bacterial and archaeal genomes. The accuracy figures reflect Glimmer's default settings for most parameters; the minimum gene length varied from 90 bp to 150 bp depending on the organism. Most of the genes missed were very short, either below the minimum length or very close to it.

To determine accuracy here we consider only "confirmed genes," which we define as genes that have a significant database match to a gene in another organism.
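
As a quick check on how the percentage column is derived, the following Python sketch recomputes it for three rows of the table as the number of confirmed genes found divided by the number of genes confirmed by homology. The function name `percent_found` is ours and purely illustrative; the numbers are copied from the table.

```python
def percent_found(confirmed, found):
    """Share of homology-confirmed genes that were also predicted, in percent."""
    return 100.0 * found / confirmed

# (genes confirmed by homology, found by Glimmer 2.13), taken from the table
rows = {
    "A. fulgidus": (1129, 1128),
    "B. subtilis": (4063, 3979),
    "E. coli": (861, 855),
}

for organism, (confirmed, found) in rows.items():
    print(f"{organism}: {percent_found(confirmed, found):.1f}%")
```

This prints 99.9%, 97.9%, and 99.3%, matching the table. Note that the measure covers only confirmed genes, so it reflects how many known genes are recovered and says nothing about false-positive predictions.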

Notes:
The Notes column in the table indicates which training procedure was used to obtain each result:
(1) Glimmer was trained on the non-overlapping open reading frames extracted by the long-orfs program that comes with the system.
(2) Glimmer was trained on all genes with a significant protein match based on translated Blast searches.
(3) Glimmer was trained on all annotated open reading frames.

The match counts above include both ORFs for which Glimmer finds only the correct stop codon and ORFs for which it finds both the start and stop codons at the correct coordinates. The number of start codons found can be improved for most genomes by running the Perl program RBS-finder on the original Glimmer output. This program moves the start codons according to the position of likely ribosome binding sites. Glimmer3 will have this RBS functionality built into the code as well as many other improvements.

A sample of Glimmer 2.13 output for H. pylori is contained here. The Glimmer 2.13 output format is explained in this readme file.

Speed

For genomes under 2 megabases (e.g., H. pylori and H. influenzae), training Glimmer 2.13 (the build-icm program) takes under 30 seconds on a Linux PC with a 600 MHz Pentium processor. For the largest genome in the list above, V. spinosum (8.5 Mb), training takes under 90 seconds. Finding all the genes (the glimmer2 program) then takes only 20 seconds for small genomes and up to 5 minutes for larger ones.

Note to Glimmer users: it is always preferable to train Glimmer on a sample of genes from the same genome in which you are finding genes. This is easy to do with any bacterial genome, using the long-orfs program to extract long open reading frames that can be used to bootstrap the system. (This is explained in the readme files that come with Glimmer.) If you wish to search for genes in a short fragment of DNA, Glimmer needs to be trained on a longer sequence. The best strategy is to train on a closely related genome.
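
The bootstrapping idea can be illustrated with a simplified Python sketch that scans one strand of a sequence for long open reading frames. This is not the long-orfs program: it ignores the reverse strand and does no overlap filtering, and the function name `find_long_orfs`, the `min_len` parameter, and the ATG-only start rule are assumptions made for brevity.

```python
STOPS = {"TAA", "TAG", "TGA"}

def find_long_orfs(seq, min_len=90):
    """Return (start, end) pairs of forward-strand ORFs of at least min_len bases.

    Each ORF runs from the first ATG after the previous in-frame stop up to
    and including the next in-frame stop codon; end is exclusive.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # remember the first start codon in this stretch
            elif start is not None and codon in STOPS:
                if i + 3 - start >= min_len:
                    orfs.append((start, i + 3))
                start = None  # begin looking for the next ORF
    return sorted(orfs)

# Toy usage: a 96-base ORF made of ATG, thirty GCT codons, and a TAA stop.
toy = "ATG" + "GCT" * 30 + "TAA"
long_orfs = find_long_orfs(toy, min_len=90)
```

In a real run, the sequences of the ORFs found this way would serve as the training set, on the assumption that very long ORFs are likely to be genuine genes.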

Obtaining Glimmer

This software is OSI Certified Open Source Software.

The complete Glimmer 2.13 system is distributed as glimmer213.tar.gz.

After downloading, uncompress the distribution file by typing:

  % tar -xzf glimmer213.tar.gz

A directory will be created that contains the source files, training data sets, and other supporting files. See the included "readme" files for more information or look at this README.

References

For a description of Glimmer 1.0 and 2.0, see our papers:
A.L. Delcher, D. Harmon, S. Kasif, O. White, and S.L. Salzberg. Improved microbial gene identification with GLIMMER. Nucleic Acids Research 27:23 (1999), 4636-4641.
S. Salzberg, A. Delcher, S. Kasif, and O. White. Microbial gene identification using interpolated Markov models. Nucleic Acids Research 26:2 (1998), 544-548.

Acknowledgements

Glimmer is currently supported by the National Library of Medicine at NIH under grant R01-LM007938. It was previously supported by the National Science Foundation under grants IRI-9530462 and IIS-9902923, and by the National Institutes of Health under grant R01-LM06845.