Glimmer3 vs Glimmer2
This page describes the differences between Glimmer version 2.13 and
Glimmer version 3, and also gives some results comparing the
performance of the two versions.
What Changed from Glimmer2 to Glimmer3
Glimmer3 makes several algorithmic changes to reduce the number of
false positive predictions and to improve the accuracy of start-site
predictions. Changes have also been made to some program parameters and
options, and to the output formats. Some specific differences are:
- Glimmer2 used a set of rules to attempt to resolve
overlaps between candidate orfs. When the overlap could not be
resolved, both orfs were included in the prediction list, resulting in
a high false-positive rate.
Glimmer3 uses a dynamic programming algorithm to select the
highest-scoring set of predictions consistent with the maximum
allowed overlap. This reduces the number of false positive predictions
with little or no increase in the number of false negative predictions.
- Glimmer3 scores orfs in the reverse direction, i.e.,
from stop to start. This improves the accuracy of scores near the start
codon because the trailing context of the ICM is within the coding
region.
- The long-orfs program now uses an
amino-acid distribution model to filter the set of candidate orfs
before a set of long, non-overlapping orfs is selected.
- The make system and directory structure
have been revised to separate source, object and executable files.
- Program options are now specified before required
parameters (Unix style), rather than after (DOS style).
- The glimmer3 program produces two
separate output files: a .detail file with
information about all orfs (like the first part of Glimmer2 output);
and a .predict file containing just the final
predictions (like the last part of Glimmer2 output). glimmer3
requires a third parameter which is used to prefix the names of these
files.
- Glimmer3 prediction coordinates now include the stop
codon, so the coordinate at the stop end of each gene differs from the
Glimmer2 value by 3.
- The glimmer3 program will process a multi-fasta
sequence file. The outputs for each sequence are preceded by the
fasta-header line in both the .detail and
.predict files.
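The overlap-resolution idea in the first item above can be illustrated with a small dynamic program. This is only a minimal sketch, not Glimmer3's actual implementation: it ignores strand, reading frame and alternative start sites, and the function names, the `(start, end, score)` tuple layout and the example scores are all assumptions made for illustration. It also assumes every candidate is longer than the allowed overlap, which makes it sufficient to check overlaps between consecutive selections.

```python
def overlap(a, b):
    """Number of positions shared by two inclusive intervals (start, end, score)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


def select_orfs(orfs, max_overlap):
    """Pick the highest-scoring set of candidates whose pairwise overlap
    never exceeds max_overlap.

    orfs: list of (start, end, score) with start <= end (hypothetical layout).
    Returns (total_score, chosen_orfs_in_left_to_right_order).
    """
    for start, end, _ in orfs:
        # Assumption needed for correctness of the chain-style recurrence.
        if end - start + 1 <= max_overlap:
            raise ValueError("every candidate must be longer than max_overlap")
    if not orfs:
        return 0.0, []

    ordered = sorted(orfs, key=lambda o: (o[1], o[0]))  # sort by end position
    best = [0.0] * len(ordered)   # best[i]: best total for a set ending with orf i
    prev = [-1] * len(ordered)    # back-pointers for reconstructing the set

    for i, cur in enumerate(ordered):
        best[i] = cur[2]
        for j in range(i):
            if overlap(ordered[j], cur) <= max_overlap:
                candidate = best[j] + cur[2]
                if candidate > best[i]:
                    best[i] = candidate
                    prev[i] = j

    i = max(range(len(ordered)), key=best.__getitem__)
    total = best[i]
    chosen = []
    while i != -1:
        chosen.append(ordered[i])
        i = prev[i]
    chosen.reverse()
    return total, chosen


# Three made-up candidates: the first two overlap by 11 positions,
# the last two by 51 positions.
candidates = [(1, 100, 5.0), (90, 200, 6.0), (150, 300, 4.0)]
loose_total, loose_set = select_orfs(candidates, max_overlap=30)
strict_total, strict_set = select_orfs(candidates, max_overlap=5)
```

With the looser limit the first two candidates can be kept together; with the stricter limit the sketch falls back to the first and third, which do not overlap at all.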
For more information on Glimmer3, see the
Version 3.02 Release Notes.
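As an illustration of the output-format changes listed above, the following sketch reads the contents of a .predict file that covers several sequences and converts the stop-end coordinate back to the Glimmer2 convention. The record layout assumed here (an identifier, two coordinates and a signed frame, separated by whitespace, with the stop end given second) and the function names are assumptions for illustration; check them against the release notes before relying on them.

```python
def parse_predict(lines):
    """Group prediction records under the fasta-header line that precedes them.

    Assumed record layout: <id> <start> <end> <frame> [<score> ...]
    Returns {header_text: [(id, start, end, frame), ...]}.
    """
    results = {}
    current = ""  # records seen before any header are filed under ""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            current = line[1:]
            results.setdefault(current, [])
            continue
        fields = line.split()
        record = (fields[0], int(fields[1]), int(fields[2]), fields[3])
        results.setdefault(current, []).append(record)
    return results


def glimmer2_style_end(end, frame):
    """Exclude the 3-base stop codon from the stop-end coordinate.

    Forward-strand genes (frame "+1".."+3") end 3 bases earlier;
    reverse-strand genes (frame "-1".."-3") end 3 bases later.
    """
    return end - 3 if frame.startswith("+") else end + 3


# Made-up example content, not real Glimmer output.
example = [
    ">seq1 example sequence",
    "orf00001 337 2799 +1 8.04",
    "orf00002 4038 3022 -1 5.10",
]
parsed = parse_predict(example)
converted = [glimmer2_style_end(end, frame)
             for _, _, end, frame in parsed["seq1 example sequence"]]
```

Using the frame sign rather than comparing the two coordinates keeps the conversion correct for genes that wrap around the origin of a circular genome.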
Glimmer3 vs. Glimmer2.13 Accuracy
Below are links to some comparisons of the results of Glimmer3 and
Glimmer2 on 30 microbial genomes from
RefSeq at GenBank.
- Table 1. Probability models
trained on genes with annotated function. Predictions compared to the
same set.
- Table 2. Probability models trained on genes with annotated function.
Predictions compared to all annotated genes.
- Table 3. Probability models
trained on the output of the long-orfs
program. Predictions compared to genes with annotated function.
- Table 4. Probability models
trained on the output of the long-orfs
program. Predictions compared to all annotated genes.
- Table 5. Glimmer2.13
long-orfs output and Glimmer3
long-orfs output compared to all annotated
genes.
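This page does not say how the comparisons in the tables were computed. One common convention, shown here purely as a hedged sketch, is to count a prediction as matching an annotated gene when both share the same stop position and strand, and to count its start as correct when the start coordinate also agrees. The function name, the `(start, end, strand)` tuple layout and the example coordinates are assumptions for illustration.

```python
def compare_to_annotation(predicted, annotated):
    """Compare gene lists given as (start, end, strand), where end is the stop end.

    Assumes no two annotated genes share the same stop position and strand.
    """
    annotated_start = {(end, strand): start for start, end, strand in annotated}
    matched = 0
    correct_start = 0
    for start, end, strand in predicted:
        key = (end, strand)
        if key in annotated_start:
            matched += 1
            if annotated_start[key] == start:
                correct_start += 1
    return {
        "matched": matched,                    # same stop and strand
        "correct_start": correct_start,        # start agrees as well
        "missed": len(annotated) - matched,    # annotated but not predicted
        "extra": len(predicted) - matched,     # predicted but not annotated
    }


# Made-up coordinates for illustration only.
annotation = [(100, 400, "+"), (900, 500, "-"), (1000, 1300, "+")]
predictions = [(100, 400, "+"), (850, 500, "-"), (2000, 2300, "+")]
summary = compare_to_annotation(predictions, annotation)
```

In this example two predictions share a stop with an annotated gene, one of them also has the annotated start, one annotated gene is missed, and one prediction has no annotated counterpart.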
Obtaining Glimmer
Glimmer is OSI Certified Open Source Software.
Downloads are available for Glimmer3.02 and for Glimmer2.13.
References
For descriptions of the three major versions of Glimmer see our papers:
- S. Salzberg, A. Delcher, S. Kasif, and O. White.
Microbial gene identification using
interpolated Markov models, Nucleic Acids
Research 26:2 (1998), 544-548.
- A.L. Delcher, D. Harmon, S. Kasif, O. White, and S.L. Salzberg.
Improved microbial gene identification
with GLIMMER, Nucleic Acids
Research 27:23 (1999), 4636-4641.
- A.L. Delcher, K.A. Bratke, E.C. Powers, and S.L. Salzberg.
Identifying bacterial genes and endosymbiont
DNA with Glimmer, Bioinformatics 23:6 (2007), 673-679.
Acknowledgements
Glimmer is currently supported by the National
Library of Medicine at NIH under grant R01-LM007938. It was
previously supported by the National
Science Foundation under grants IRI-9530462 and IIS-9902923, and by
the National Institutes of Health
under grant R01-LM06845.