Glimmer3

Table 2: Train on Non-Hypothetical Genes & Test on All Annotated Genes

This table shows the accuracy of Glimmer3 predictions on
30 microbial genomes from RefSeq at GenBank.
For each genome, the ICM (interpolated context model) was trained
on the annotated genes that did not have the word "hypothetical"
in their function description, had no frameshifts or in-frame
stop codons, and were at least 90 bp long.
Glimmer3 and Glimmer2 were run with the same options (listed in
the Options column), and their predictions were compared with a
test set consisting of all annotated genes that had no frameshifts
or in-frame stops and were at least 90 bp long.
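The training and test sets share the same basic filters and differ only in the "hypothetical" criterion. The Python sketch below shows that selection logic. The `AnnotatedGene` record and its field names are hypothetical stand-ins for whatever a RefSeq/GenBank parser produces, and the case-insensitive match on "hypothetical" is an assumption.

```python
from dataclasses import dataclass

MIN_LENGTH_BP = 90  # minimum gene length used for both sets


@dataclass
class AnnotatedGene:
    """Hypothetical annotation record; field names are illustrative only."""
    gene_id: str
    length_bp: int
    description: str
    has_frameshift: bool = False
    has_inframe_stop: bool = False


def passes_basic_filters(gene: AnnotatedGene) -> bool:
    """No frameshift, no in-frame stop, and at least MIN_LENGTH_BP long."""
    return (not gene.has_frameshift
            and not gene.has_inframe_stop
            and gene.length_bp >= MIN_LENGTH_BP)


def select_test_set(genes):
    """All annotated genes that pass the basic filters."""
    return [g for g in genes if passes_basic_filters(g)]


def select_training_set(genes):
    """Test-set genes whose description lacks "hypothetical".

    Case-insensitive matching is an assumption, not from the source.
    """
    return [g for g in select_test_set(genes)
            if "hypothetical" not in g.description.lower()]


genes = [
    AnnotatedGene("a", 900, "DNA polymerase III subunit alpha"),
    AnnotatedGene("b", 450, "hypothetical protein"),
    AnnotatedGene("c", 60, "small protein"),          # too short
    AnnotatedGene("d", 1200, "transposase", has_frameshift=True),
]
print([g.gene_id for g in select_training_set(genes)])  # ['a']
print([g.gene_id for g in select_test_set(genes)])      # ['a', 'b']
```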
The Glimmer2 predictions used for comparison exclude those that were:
- marked as "NearReject", or
- contained within another prediction and had a lower score
than the containing prediction.
The "vs. Glimmer2.13" columns give the difference in each count
(Glimmer3 minus Glimmer2).
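The Glimmer2 filtering rule can be sketched in Python as follows. The `Prediction` type is hypothetical, and several details are assumptions because the source does not specify them: coordinates are linear (no wrap-around on a circular genome), containment ignores strand and frame, and each prediction is checked against every other prediction, including NearReject ones.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prediction:
    """Hypothetical Glimmer2 prediction record (linear coordinates, lo <= hi)."""
    pred_id: str
    lo: int
    hi: int
    score: float
    near_reject: bool = False


def contained_in(inner: Prediction, outer: Prediction) -> bool:
    """True if `inner` lies entirely within `outer` (strand and frame ignored)."""
    return inner is not outer and outer.lo <= inner.lo and inner.hi <= outer.hi


def filter_glimmer2(preds):
    """Drop NearReject predictions and those contained in a higher-scoring one."""
    kept = []
    for p in preds:
        if p.near_reject:
            continue
        if any(contained_in(p, q) and p.score < q.score for q in preds):
            continue
        kept.append(p)
    return kept


preds = [
    Prediction("orf1", 100, 1500, 8.5),
    Prediction("orf2", 300, 900, 2.0),     # inside orf1 with a lower score: dropped
    Prediction("orf3", 1400, 2000, 5.0),   # overlaps orf1 but is not contained: kept
    Prediction("orf4", 2100, 2400, 6.0, near_reject=True),  # dropped
]
print([p.pred_id for p in filter_glimmer2(preds)])  # ['orf1', 'orf3']
```

The pairwise check is quadratic, which keeps the rule easy to read; a sorted sweep would be faster on full genomes.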
In the Options column, -g is the minimum gene length and -o is
the maximum overlap allowed between genes.
The -g value is the length of the shortest gene in the training
set, and the -o value is the smaller of 110 and the maximum
overlap between genes in the training set.

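The option values can be derived from the training genes roughly as below. This Python sketch assumes genes are given as inclusive `(lo, hi)` coordinate pairs on a linear sequence, measures length as `hi - lo + 1`, and counts any pairwise overlap (including one gene lying inside another); these conventions are assumptions, and `glimmer_options` is a hypothetical helper, not part of Glimmer.

```python
def glimmer_options(genes, overlap_cap=110):
    """Return '-g MIN_LEN -o MAX_OVERLAP' for training genes given as (lo, hi).

    Hypothetical helper. Coordinates are inclusive on a linear sequence and
    length is hi - lo + 1 (both conventions are assumptions).
    """
    if not genes:
        raise ValueError("training set is empty")
    min_len = min(hi - lo + 1 for lo, hi in genes)

    # Scan genes by left end. Every earlier gene starts at or before the
    # current one, so the largest overlap with any of them is set by the
    # furthest right end seen so far.
    max_overlap = 0
    furthest_hi = None
    for lo, hi in sorted(genes):
        if furthest_hi is not None:
            max_overlap = max(max_overlap, min(furthest_hi, hi) - lo + 1)
            furthest_hi = max(furthest_hi, hi)
        else:
            furthest_hi = hi
    return f"-g {min_len} -o {min(max_overlap, overlap_cap)}"


print(glimmer_options([(1, 900), (871, 2000), (2101, 2211), (2150, 3000)]))  # -g 111 -o 62
print(glimmer_options([(1, 500), (300, 1000)]))  # -g 500 -o 110 (overlap 201 is capped)
```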
"Matches" are predictions that have the same reading frame and
stop codon as an annotated gene in the test set.
"Correct Starts" are matches that also have the same start codon
as the matched gene.
"Extra" are predictions that are not matches.
The Matches and Correct Starts percentages are relative to the
number of genes in the test set (# Genes).

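The three categories can be computed by keying the annotated genes on strand and stop-codon position, since a shared stop codon on the same strand implies the same reading frame. In this Python sketch the `Gene` tuple and its coordinate convention are assumptions, and it assumes at most one prediction per stop codon; percentages are taken relative to the test-set size, as in the table.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    """Hypothetical gene/prediction record."""
    strand: str  # '+' or '-'
    start: int   # coordinate of the first base of the start codon
    stop: int    # coordinate of the last base of the stop codon


def score_predictions(predictions, annotations):
    """Count Matches, Correct Starts and Extra predictions against a test set."""
    # Same strand + same stop position implies the same reading frame.
    by_stop = {(g.strand, g.stop): g for g in annotations}
    matches = correct_starts = extra = 0
    for p in predictions:
        ann = by_stop.get((p.strand, p.stop))
        if ann is None:
            extra += 1            # no annotated gene ends at this stop codon
        else:
            matches += 1
            if p.start == ann.start:
                correct_starts += 1
    n = len(annotations)          # "# Genes" in the table
    return {
        "matches": matches,
        "matches_pct": round(100 * matches / n, 1),
        "correct_starts": correct_starts,
        "correct_starts_pct": round(100 * correct_starts / n, 1),
        "extra": extra,
    }


annotations = [Gene("+", 100, 999), Gene("+", 1201, 2100), Gene("-", 5000, 4101)]
predictions = [
    Gene("+", 100, 999),    # match with the correct start
    Gene("+", 1171, 2100),  # match, but a different start codon
    Gene("+", 3001, 3900),  # no annotated gene with this stop: extra
]
print(score_predictions(predictions, annotations))
# {'matches': 2, 'matches_pct': 66.7, 'correct_starts': 1, 'correct_starts_pct': 33.3, 'extra': 1}
```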
Note that the test set includes genes labelled "hypothetical",
whose validity may be suspect. Moreover, some of these genomes
were originally annotated using Glimmer2, which artificially
inflates that program's number of matches.

| Organism | Length | GC% | # Genes | Glimmer3 Matches | Matches (%) | Glimmer3 Correct Starts | Correct Starts (%) | Glimmer3 Extra | vs. Glimmer2.13: Matches | vs. Glimmer2.13: Correct Starts | vs. Glimmer2.13: Extra | Options |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Archaeoglobus fulgidus | 2.18Mb | 48.6 | 2398 | 2323 | 96.9% | 1670 | 69.6% | 144 | 0 | -118 | -64 | -g 141 -o 67 |
| Bacillus anthracis | 5.23Mb | 35.4 | 5308 | 5075 | 95.6% | 4355 | 82.0% | 394 | +29 | +1114 | -101 | -g 111 -o 30 |
| Bacillus subtilis | 4.21Mb | 43.5 | 4095 | 3997 | 97.6% | 3438 | 84.0% | 449 | +25 | +1045 | -537 | -g 102 -o 93 |
| Campylobacter jejuni | 1.78Mb | 30.3 | 1836 | 1786 | 97.3% | 1634 | 89.0% | 115 | -2 | +162 | -61 | -g 111 -o 104 |
| Carboxydothermus hydrogenoformans | 2.40Mb | 42.0 | 2606 | 2458 | 94.3% | 2178 | 83.6% | 159 | -32 | +462 | -160 | -g 111 -o 70 |
| Caulobacter crescentus | 4.02Mb | 67.2 | 3737 | 3500 | 93.7% | 2254 | 60.3% | 246 | +45 | -287 | -233 | -g 123 -o 110 |
| Chlorobium tepidum | 2.15Mb | 56.5 | 2252 | 1909 | 84.8% | 1306 | 58.0% | 145 | -23 | -121 | -163 | -g 114 -o 30 |
| Clostridium perfringens | 3.03Mb | 28.6 | 2660 | 2629 | 98.8% | 2387 | 89.7% | 52 | +5 | +480 | -16 | -g 111 -o 30 |
| Colwellia psychrerythraea | 5.37Mb | 38.0 | 4902 | 4562 | 93.1% | 3828 | 78.1% | 212 | -79 | +357 | -165 | -g 93 -o 30 |
| Dehalococcoides ethenogenes | 1.47Mb | 48.9 | 1579 | 1485 | 94.0% | 1255 | 79.5% | 57 | +1 | +161 | -43 | -g 111 -o 30 |
| Escherichia coli | 4.64Mb | 50.8 | 4231 | 4126 | 97.5% | 3639 | 86.0% | 340 | +22 | +900 | -600 | -g 99 -o 110 |
| Geobacter sulfurreducens | 3.81Mb | 60.9 | 3438 | 3261 | 94.9% | 2673 | 77.7% | 170 | +22 | +789 | -375 | -g 126 -o 110 |
| Haemophilus influenzae | 1.83Mb | 38.1 | 1649 | 1636 | 99.2% | 1445 | 87.6% | 173 | -1 | +153 | -108 | -g 111 -o 101 |
| Helicobacter pylori | 1.67Mb | 38.9 | 1556 | 1521 | 97.8% | 1276 | 82.0% | 158 | -1 | +68 | -90 | -g 111 -o 68 |
| Listeria monocytogenes | 2.91Mb | 38.0 | 2819 | 2750 | 97.6% | 2476 | 87.8% | 60 | -3 | +591 | -45 | -g 111 -o 30 |
| Methylococcus capsulatus | 3.30Mb | 63.6 | 2958 | 2831 | 95.7% | 2066 | 69.8% | 405 | +51 | +477 | -347 | -g 93 -o 110 |
| Mycobacterium tuberculosis | 4.40Mb | 65.6 | 4189 | 3884 | 92.7% | 2355 | 56.2% | 425 | -1 | -293 | -365 | -g 111 -o 110 |
| Neisseria meningitidis | 2.27Mb | 51.5 | 2055 | 1842 | 89.6% | 1532 | 74.5% | 704 | +13 | +323 | -596 | -g 93 -o 89 |
| Porphyromonas gingivalis | 2.34Mb | 48.3 | 1909 | 1783 | 93.4% | 1281 | 67.1% | 302 | -20 | -76 | -221 | -g 114 -o 62 |
| Pseudomonas fluorescens | 7.07Mb | 63.3 | 6134 | 6013 | 98.0% | 4592 | 74.9% | 361 | +26 | +951 | -732 | -g 108 -o 110 |
| Pseudomonas putida | 6.18Mb | 61.5 | 5349 | 5118 | 95.7% | 3797 | 71.0% | 394 | +28 | +307 | -614 | -g 114 -o 101 |
| Ralstonia solanacearum | 3.72Mb | 67.0 | 3435 | 3319 | 96.6% | 2665 | 77.6% | 245 | +127 | +963 | -398 | -g 99 -o 110 |
| Staphylococcus epidermidis | 2.62Mb | 32.1 | 2487 | 2333 | 93.8% | 2110 | 84.8% | 87 | -7 | +461 | -32 | -g 111 -o 75 |
| Streptococcus agalactiae | 2.16Mb | 35.6 | 2122 | 2045 | 96.4% | 1863 | 87.8% | 76 | +8 | +336 | -29 | -g 114 -o 30 |
| Streptococcus pneumoniae | 2.16Mb | 39.7 | 2093 | 1918 | 91.6% | 1637 | 78.2% | 217 | -20 | +91 | -31 | -g 114 -o 47 |
| Thermotoga maritima | 1.86Mb | 46.2 | 1854 | 1813 | 97.8% | 1426 | 76.9% | 81 | -4 | +127 | -115 | -g 114 -o 56 |
| Treponema denticola | 2.84Mb | 37.9 | 2761 | 2584 | 93.6% | 2246 | 81.3% | 89 | -32 | +392 | -133 | -g 111 -o 68 |
| Treponema pallidum | 1.14Mb | 52.8 | 1034 | 986 | 95.4% | 651 | 63.0% | 143 | -15 | +11 | -243 | -g 111 -o 110 |
| Ureaplasma parvum | 0.75Mb | 25.5 | 614 | 607 | 98.9% | 524 | 85.3% | 13 | 0 | +14 | -16 | -g 111 -o 30 |
| Wolbachia endosymbiont | 1.08Mb | 34.2 | 805 | 790 | 98.1% | 650 | 80.7% | 374 | -4 | +32 | -95 | -g 126 -o 30 |
| Averages |  |  |  |  | 95.3% |  | 77.5% |  | +5.3 | +329 | -224 |  |

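The Averages row is consistent with an unweighted mean of the per-genome values, so each genome counts equally regardless of size. This Python check uses values transcribed from the table in row order; the helper names `mean` and `pct` are illustrative.

```python
# Per-genome values transcribed from the table, in row order (30 genomes).
match_pct = [96.9, 95.6, 97.6, 97.3, 94.3, 93.7, 84.8, 98.8, 93.1, 94.0,
             97.5, 94.9, 99.2, 97.8, 97.6, 95.7, 92.7, 89.6, 93.4, 98.0,
             95.7, 96.6, 93.8, 96.4, 91.6, 97.8, 93.6, 95.4, 98.9, 98.1]
correct_start_pct = [69.6, 82.0, 84.0, 89.0, 83.6, 60.3, 58.0, 89.7, 78.1, 79.5,
                     86.0, 77.7, 87.6, 82.0, 87.8, 69.8, 56.2, 74.5, 67.1, 74.9,
                     71.0, 77.6, 84.8, 87.8, 78.2, 76.9, 81.3, 63.0, 85.3, 80.7]
# "vs. Glimmer2.13" columns
match_delta = [0, 29, 25, -2, -32, 45, -23, 5, -79, 1,
               22, 22, -1, -1, -3, 51, -1, 13, -20, 26,
               28, 127, -7, 8, -20, -4, -32, -15, 0, -4]
correct_start_delta = [-118, 1114, 1045, 162, 462, -287, -121, 480, 357, 161,
                       900, 789, 153, 68, 591, 477, -293, 323, -76, 951,
                       307, 963, 461, 336, 91, 127, 392, 11, 14, 32]
extra_delta = [-64, -101, -537, -61, -160, -233, -163, -16, -165, -43,
               -600, -375, -108, -90, -45, -347, -365, -596, -221, -732,
               -614, -398, -32, -29, -31, -115, -133, -243, -16, -95]


def mean(xs):
    """Unweighted arithmetic mean: each genome counts equally."""
    return sum(xs) / len(xs)


def pct(part, whole):
    """Percentage of test-set genes, rounded to one decimal place."""
    return round(100 * part / whole, 1)


# One row reproduced: Archaeoglobus fulgidus, 2323 matches and 1670 correct starts of 2398 genes.
print(pct(2323, 2398), pct(1670, 2398))                              # 96.9 69.6
print(round(mean(match_pct), 1), round(mean(correct_start_pct), 1))  # 95.3 77.5
print(round(mean(match_delta), 1),
      round(mean(correct_start_delta)),
      round(mean(extra_delta)))                                      # 5.3 329 -224
```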
Notes:
- Not all of the genomes necessarily have carefully or accurately
annotated start sites, so the correct-start results should be
treated with caution.
- Because Ureaplasma uses NCBI translation table 4, in which TGA
encodes tryptophan rather than a stop, Glimmer3 must be run with
an appropriate option ("-z 4" or "-Z taa,tag"), and Glimmer2 must
be modified and recompiled as described in its readme files, in
order to obtain these results.
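To illustrate why the genetic code matters for Ureaplasma, this Python sketch measures an open reading frame under two NCBI translation tables: table 11 (stops TAA, TAG, TGA) and table 4 (stops TAA, TAG only). The `STOPS` mapping and `orf_length` function are hypothetical helpers that scan the forward strand only; they are not part of Glimmer, where the table is chosen with the options given in the note.

```python
# Stop codons for two NCBI translation tables (DNA alphabet, upper case).
# Hypothetical lookup for illustration only.
STOPS = {
    11: {"TAA", "TAG", "TGA"},  # Bacterial, Archaeal and Plant Plastid code
    4: {"TAA", "TAG"},          # includes the Mycoplasma/Spiroplasma code: TGA encodes Trp
}


def orf_length(seq, start, table):
    """Length in bp from `start` through the first in-frame stop codon
    (forward strand only), or None if no stop occurs before the end of `seq`."""
    stops = STOPS[table]
    for i in range(start, len(seq) - 2, 3):
        if seq[i:i + 3] in stops:
            return i + 3 - start
    return None


seq = "ATG" "GCT" "TGA" "AAA" "TGG" "TAA"
print(orf_length(seq, 0, 11))  # 9: translation stops at the in-frame TGA
print(orf_length(seq, 0, 4))   # 18: TGA is read as Trp; stops at TAA
```

Under table 11 the TGA would split this gene, which is why a program assuming the bacterial stop set would mispredict Ureaplasma genes.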