Glimmer 2
This page describes the old version 2.13 of Glimmer. Glimmer version 3 has reduced the number of false-positive predictions and has improved the accuracy of start codon predictions. For more about Glimmer3 and download information, see the Glimmer3 page. For performance comparisons between Glimmer3 and Glimmer2, see Glimmer3 vs Glimmer2.
Glimmer v2.13 can still be downloaded as glimmer213.tar.gz.
About Glimmer
Glimmer is a system for finding genes in microbial DNA, especially the genomes of bacteria, archaea, and viruses. Glimmer (Gene Locator and Interpolated Markov ModelER) uses interpolated Markov models (IMMs) to identify coding regions and distinguish them from noncoding DNA. The IMM approach, described in our Nucleic Acids Research paper on Glimmer 1.0 and in our subsequent paper on Glimmer 2.0, combines Markov models from 1st through 8th order, weighting each model according to its predictive power. Glimmer 1.0 and 2.0 use 3-periodic nonhomogeneous Markov models in their IMMs.
Glimmer is the primary microbial gene finder at TIGR, and has been used to annotate the complete genomes of over 80 bacterial species at TIGR and elsewhere. Its analyses of some of these genomes and others are available at the Comprehensive Microbial Resource site.
For our eukaryotic gene finders, go to the GlimmerHMM site.
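To make the weighting idea concrete, here is a minimal Python sketch of interpolating between Markov models of increasing order. It is a simplification under stated assumptions, not Glimmer's actual implementation: it ignores the 3-periodic codon structure, it weights each order only by how often its context was seen in training, and the names `train_counts`, `imm_probability`, and the `full_weight_at` parameter are hypothetical.

```python
from collections import defaultdict


def train_counts(sequences, max_order=8):
    """For each context length 0..max_order, count which base follows each context."""
    counts = [defaultdict(lambda: defaultdict(int)) for _ in range(max_order + 1)]
    for seq in sequences:
        for i, base in enumerate(seq):
            # Near the start of a sequence only short contexts are available.
            for k in range(min(i, max_order) + 1):
                context = seq[i - k:i]  # the k bases preceding position i
                counts[k][context][base] += 1
    return counts


def imm_probability(counts, context, base, full_weight_at=400):
    """Estimate P(base | context) by blending orders 0..len(context).

    ASSUMPTION: the weight of an order grows linearly with how often its
    context was observed, reaching 1.0 at `full_weight_at` observations.
    This is an illustrative rule, not Glimmer's weighting scheme.
    """
    # Order 0: base frequencies with add-one smoothing over the 4 bases.
    order0 = counts[0].get("", {})
    prob = (order0.get(base, 0) + 1) / (sum(order0.values()) + 4)

    max_k = min(len(context), len(counts) - 1)
    for k in range(1, max_k + 1):
        table = counts[k].get(context[len(context) - k:])
        if not table:
            break  # context never seen; longer contexts were not seen either
        total = sum(table.values())
        weight = min(1.0, total / full_weight_at)
        # Blend this order's estimate with the lower-order result so far.
        prob = weight * (table.get(base, 0) / total) + (1 - weight) * prob
    return prob
```

Because each step mixes two distributions that sum to one, the interpolated values over the four bases also sum to one. Rarely seen long contexts contribute little, while well-supported ones dominate, which is the intuition behind weighting each model by its predictive power.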
Glimmer 2.13's Accuracy
The table above shows Glimmer 2.13's accuracy for 31 complete bacterial and archaeal genomes. Accuracy figures reflect Glimmer's default settings for most parameters, while the minimum gene length varies from 90 bp to 150 bp for different organisms. The majority of the genes missed were very short, either below the minimum or very close to it.
To determine accuracy here we consider only "confirmed genes," which we define as genes that have a significant database match to a gene in another organism.
The above results were obtained by different training procedures:
(1) Glimmer was trained by first extracting all non-overlapping open reading frames (using the long-orfs program that comes with the system).
(2) Glimmer was trained on all genes with a significant protein match based on translated Blast searches.
(3) Glimmer was trained on all annotated open reading frames.
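As an illustration of the first procedure, the following Python sketch extracts long open reading frames and keeps a non-overlapping subset that could serve as a bootstrap training set. It is not the long-orfs program: it scans the forward strand only, the start-codon set and the greedy longest-first overlap rule are assumptions made for this example, and the function names are hypothetical.

```python
START_CODONS = {"ATG", "GTG", "TTG"}  # ASSUMPTION: common bacterial start codons
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_len=90):
    """Return (start, end) pairs, 0-based and end-exclusive, for forward-strand ORFs.

    Each ORF runs from the first start codon after the previous in-frame stop
    through the next in-frame stop codon, and must span at least min_len bases.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if codon in STOP_CODONS:
                if start is not None and i + 3 - start >= min_len:
                    orfs.append((start, i + 3))
                start = None
            elif start is None and codon in START_CODONS:
                start = i
    return orfs


def non_overlapping(orfs):
    """Greedily keep the longest ORFs that do not overlap any ORF already kept."""
    kept = []
    for start, end in sorted(orfs, key=lambda o: o[1] - o[0], reverse=True):
        if all(end <= s or start >= e for s, e in kept):
            kept.append((start, end))
    return sorted(kept)
```

A full implementation would also scan the reverse complement; the sketch omits that to stay short.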
The number of matches above includes both those ORFs where Glimmer finds only the correct stop codon and those for which Glimmer finds both start and stop codons with the correct coordinates. The number of start codons found can be improved for most genomes by running the Perl program RBS-finder on the original Glimmer output. This program moves the start codons according to the position of likely ribosome binding sites. Glimmer3 will have this RBS functionality built into the code as well as many other improvements.
A sample of Glimmer 2.13 output for H. pylori is available here. The Glimmer 2.13 output format is explained in this readme file.
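The two kinds of match described above can be told apart with a small comparison routine. The Python sketch below is one way to do it, assuming each gene is given as a hypothetical (start, stop, strand) tuple; it does not parse Glimmer's output format, and the function name is invented for this example.

```python
def classify_matches(predicted, annotated):
    """Compare predicted genes with annotated genes.

    Both arguments are lists of (start, stop, strand) tuples (an assumed
    representation). A prediction sharing an annotated gene's stop position
    and strand counts as a match; it is "exact" if the start also agrees,
    otherwise "stop_only".
    """
    start_by_stop = {(stop, strand): start for start, stop, strand in annotated}
    matched = set()
    exact = stop_only = 0
    for start, stop, strand in predicted:
        key = (stop, strand)
        if key in start_by_stop and key not in matched:
            matched.add(key)
            if start_by_stop[key] == start:
                exact += 1
            else:
                stop_only += 1
    return {
        "exact": exact,
        "stop_only": stop_only,
        "missed": len(start_by_stop) - len(matched),
    }
```

For example, with three annotated genes and predictions that hit one exactly, one with a shifted start, and one elsewhere:

```python
annotated = [(100, 400, "+"), (900, 500, "-"), (1000, 1300, "+")]
predicted = [(100, 400, "+"), (870, 500, "-"), (2000, 2300, "+")]
result = classify_matches(predicted, annotated)
```

Here `result` counts one exact match, one stop-only match, and one missed gene.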
Speed
For genomes under 2 megabases (e.g., H. pylori and H. influenzae), training Glimmer 2.13 (the build-icm program) takes under 30 seconds on a Linux PC with a 600 MHz Pentium processor. For the largest genome in the list above, V. spinosum (8.5 Mb), training takes under 90 seconds. Finding all the genes (the glimmer2 program) then takes only 20 seconds for small genomes and up to 5 minutes for larger ones.
Note to Glimmer users: it is always preferable to train Glimmer on a sample of genes from the same genome that you are finding genes in. This is easy to do with any bacterial genome, using the long-orfs program to extract long open reading frames that can be used to bootstrap the system. (This is explained in the readme files that come with Glimmer.) If you wish to search for genes in a short fragment of DNA, Glimmer needs to be trained on a longer sequence. The best strategy is to train on a closely similar genome.
Obtaining Glimmer
This software is OSI Certified Open Source Software.
To download the complete Glimmer 2.13 system, just click here.
After downloading, uncompress the distribution file by typing:
% tar -xzf glimmer213.tar.gz
A directory will be created which contains the source files, training data sets, and other supporting files. See the included "readme" files for more information or look at this README.
References
For a description of Glimmer 1.0 and 2.0 see our papers:
A.L. Delcher, D. Harmon, S. Kasif, O. White, and S.L. Salzberg. Improved microbial gene identification with GLIMMER. Nucleic Acids Research, 27:23 (1999), 4636-4641.
S. Salzberg, A. Delcher, S. Kasif, and O. White. Microbial gene identification using interpolated Markov models. Nucleic Acids Research, 26:2 (1998), 544-548.
Acknowledgements
Glimmer is currently supported by the National Library of Medicine at NIH under grant R01-LM007938. It was previously supported by the National Science Foundation under grants IRI-9530462 and IIS-9902923, and by the National Institutes of Health under grant R01-LM06845.