Glimmer

Glimmer3 vs. Glimmer2

This page describes the differences between Glimmer version 2.13 and Glimmer version 3, and links to results comparing the accuracy of the two versions.

What Changed from Glimmer2 to Glimmer3

Glimmer3 makes several algorithmic changes to reduce the number of false-positive predictions and to improve the accuracy of start-site predictions. Some program parameters, options, and output formats have also changed. Specific differences include:

Glimmer2 used a set of rules to attempt to resolve overlaps between candidate orfs. When the overlap could not be resolved, both orfs were included in the prediction list, resulting in a high false-positive rate.
Glimmer3 uses a dynamic programming algorithm to select the highest-scoring set of predictions consistent with the maximum allowed overlap. This reduces the number of false-positive predictions with little or no increase in the number of false-negative predictions.
Glimmer3 scores orfs in the reverse direction, i.e., from stop to start. This improves the accuracy of scores near the start codon because the trailing context of the ICM is within the coding region.
The long-orfs program now uses an amino-acid distribution model to filter the set of candidate orfs before a set of long, non-overlapping orfs is selected.
The make system and directory structure have been revised to separate source, object, and executable files.
Program options are now specified before required parameters (Unix style), rather than after (DOS style).
The glimmer3 program produces two separate output files: a .detail file with information about all orfs (like the first part of Glimmer2 output); and a .predict file containing just the final predictions (like the last part of Glimmer2 output). glimmer3 requires a third parameter which is used to prefix the names of these files.
Glimmer3 prediction coordinates now include the stop codon, so the coordinate at the stop end of each gene will differ from the Glimmer2 value by 3.
The glimmer3 program will process a multi-fasta sequence file. The outputs for each sequence are preceded by the fasta-header line in both the .detail and .predict files.
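
The overlap resolution described above can be illustrated with a small sketch. This is not Glimmer3's actual code: it ignores strand, reading frame, and alternative start sites, and the names Orf, select_orfs, and max_overlap are hypothetical. It assumes each candidate orf already has a positive score and chooses the highest-scoring set in which neighboring orfs overlap by at most max_overlap bases, in the style of weighted interval scheduling.

```python
from typing import List, NamedTuple, Tuple


class Orf(NamedTuple):
    """Hypothetical candidate orf; coordinates are 1-based and inclusive."""
    start: int    # leftmost base
    end: int      # rightmost base
    score: float  # assumed positive


def select_orfs(orfs: List[Orf], max_overlap: int) -> Tuple[float, List[Orf]]:
    """Return (total score, chosen orfs) for the best-scoring compatible set.

    Orfs are processed in order of their right end. best[i] is the best total
    score of any compatible set whose last (rightmost-ending) orf is i.
    """
    ordered = sorted(orfs, key=lambda o: (o.end, o.start))
    n = len(ordered)
    if n == 0:
        return 0.0, []

    best = [0.0] * n
    prev = [-1] * n  # back-pointers for recovering the chosen set

    for i, cur in enumerate(ordered):
        best[i] = cur.score
        for j in range(i):
            # Bases shared by an earlier-ending orf and the current one.
            overlap = ordered[j].end - cur.start + 1
            if overlap <= max_overlap and best[j] + cur.score > best[i]:
                best[i] = best[j] + cur.score
                prev[i] = j

    # Trace back from the best final orf.
    i = max(range(n), key=best.__getitem__)
    total = best[i]
    chosen = []
    while i != -1:
        chosen.append(ordered[i])
        i = prev[i]
    chosen.reverse()
    return total, chosen


candidates = [
    Orf(1, 300, 10.0),
    Orf(250, 600, 8.0),
    Orf(290, 500, 12.0),
    Orf(590, 900, 7.0),
]
total, chosen = select_orfs(candidates, max_overlap=30)
print(total, chosen)
```

With max_overlap=30 the sketch keeps the orfs at 1-300, 290-500, and 590-900 and drops the one at 250-600, which overlaps its neighbors by more than 30 bases. This version runs in quadratic time for clarity; a production implementation would need to handle both strands and multiple possible start sites per orf.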

For more information on Glimmer3, see the Version 3.02 Release Notes.

Glimmer3 vs. Glimmer2.13 Accuracy

Below are links to some comparisons of the results of Glimmer3 and Glimmer2 on 30 microbial genomes from RefSeq at GenBank.

Table 1. Probability models trained on genes with annotated function. Predictions compared to the same set.
Table 2. Probability models trained on genes with annotated function. Predictions compared to all annotated genes.
Table 3. Probability models trained on the output of the long-orfs program. Predictions compared to genes with annotated function.
Table 4. Probability models trained on the output of the long-orfs program. Predictions compared to all annotated genes.
Table 5. Glimmer2.13 long-orfs output and Glimmer3 long-orfs output compared to all annotated genes.

Obtaining Glimmer

Glimmer is OSI Certified Open Source Software.

Click here for Glimmer3.02.
Click here for Glimmer2.13.

References

For descriptions of the three major versions of Glimmer see our papers:

S. Salzberg, A. Delcher, S. Kasif, and O. White. Microbial gene identification using interpolated Markov models, Nucleic Acids Research 26:2 (1998), 544-548.
A.L. Delcher, D. Harmon, S. Kasif, O. White, and S.L. Salzberg. Improved microbial gene identification with GLIMMER, Nucleic Acids Research 27:23 (1999), 4636-4641.
A.L. Delcher, K.A. Bratke, E.C. Powers, and S.L. Salzberg. Identifying bacterial genes and endosymbiont DNA with Glimmer, Bioinformatics 23:6 (2007), 673-679.

Acknowledgements

Glimmer is currently supported by the National Library of Medicine at NIH under grant R01-LM007938. It was previously supported by the National Science Foundation under grants IRI-9530462 and IIS-9902923, and by the National Institutes of Health under grant R01-LM06845.