Glimmer3
|
Table 1: Train & Test on Non-Hypothetical Genes
|
This table shows the accuracy of Glimmer3 predictions on
30 microbial genomes from
RefSeq at GenBank.
For each genome, the gene set consists of those genes that do not have the word
"hypothetical" in their function description, have no
frameshifts or in-frame stop codons, and are at least 90bp long. The ICM (interpolated context model) was
trained on these genes and then Glimmer3 was run with
the indicated options. The results are compared to the
Glimmer2 predictions, excluding those that were:
- marked as "NearReject", or
- contained within another prediction and had a lower score
than the containing prediction.
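The two exclusion rules for Glimmer2 predictions can be sketched as follows. This is only an illustration: the `Prediction` record and its field names are hypothetical, coordinates are assumed to be 1-based and inclusive, and containment is checked against all predictions (including "NearReject" ones), which the text does not specify.

```python
from typing import NamedTuple


class Prediction(NamedTuple):
    left: int                  # leftmost genome coordinate (assumed 1-based, inclusive)
    right: int                 # rightmost genome coordinate (inclusive)
    score: float               # score reported for the prediction
    near_reject: bool = False  # True if the prediction was marked "NearReject"


def filter_predictions(preds):
    """Drop NearReject predictions and lower-scoring predictions nested in another."""
    kept = []
    for p in preds:
        if p.near_reject:
            continue
        # A prediction is dropped when some other prediction fully contains it
        # and has a strictly higher score.
        shadowed = any(
            q is not p
            and q.left <= p.left
            and p.right <= q.right
            and p.score < q.score
            for q in preds
        )
        if not shadowed:
            kept.append(p)
    return kept
```

The pairwise scan is quadratic in the number of predictions, which is simple to read but may be slow for large genomes; sorting by coordinate would be the usual refinement.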
The -g option is the minimum gene
length and the -o option is the maximum overlap allowed.
The -g value was obtained by finding the shortest gene in the training
set. The -o value was chosen as either the maximum overlap between
genes in the training set or 110, whichever was smaller.
Both Glimmer3 and Glimmer2 were run with the same options.
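As a rough sketch of how the -g and -o values described above could be derived from a training set, assuming genes are given as hypothetical `(left, right)` coordinate pairs that are 1-based and inclusive, and ignoring strand and circular wraparound:

```python
def choose_options(genes, overlap_cap=110):
    """Return (min_gene_len, max_overlap) for a list of (left, right) gene pairs."""
    genes = sorted(genes)
    # -g: length of the shortest gene in the training set.
    min_len = min(right - left + 1 for left, right in genes)

    # -o: largest overlap between any two training genes, capped at overlap_cap.
    max_overlap = 0
    furthest_right = 0  # rightmost end among the genes seen so far
    for left, right in genes:
        overlap = min(furthest_right, right) - left + 1
        max_overlap = max(max_overlap, overlap)
        furthest_right = max(furthest_right, right)
    return min_len, min(max_overlap, overlap_cap)
```

Whether the original gene lengths include the stop codon is not stated here, so the exact values in the Options column may differ from this sketch by a few bases.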
|
"Matches" are predictions that had the same
reading frame and stop codon as an annotated gene in the test set.
"Correct Starts" are predictions that are matches and also
have the same start codon as the matched gene.
"Extra" are predictions that are not matches. Note that
many of these would match genes labelled "hypothetical" that
were excluded from the test set.
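The three counts defined above can be illustrated with a short sketch. The `Gene` record is hypothetical; it assumes that a prediction on the same strand with the same stop position is in the same reading frame as the annotated gene. The percentage columns in the table appear to use # Genes as the denominator (for example, 875 / 1165 = 75.1% for Archaeoglobus fulgidus).

```python
from typing import NamedTuple


class Gene(NamedTuple):
    strand: str  # "+" or "-"
    start: int   # position of the start codon
    stop: int    # position of the stop codon


def score_predictions(annotated, predicted):
    """Return (matches, correct_starts, extra) for predictions against annotation."""
    # Index annotated genes by strand and stop position; sharing both implies
    # the same reading frame and the same stop codon.
    by_stop = {(g.strand, g.stop): g for g in annotated}
    matches = correct_starts = extra = 0
    for p in predicted:
        g = by_stop.get((p.strand, p.stop))
        if g is None:
            extra += 1          # not a match to any annotated gene
            continue
        matches += 1
        if p.start == g.start:  # a match that also agrees on the start codon
            correct_starts += 1
    return matches, correct_starts, extra
```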
|
The columns labelled "vs. Glimmer2.13" are the
Glimmer3 value minus the corresponding Glimmer2.13 value.
For example, an entry of "+2" in the "Matches" column means
that Glimmer3 had 2 more matches than Glimmer2.13 on the
genome for that row.
|
Organism | Length | GC% | # Genes | Matches | Matches % | Correct Starts | Correct Starts % | Extra | vs. Glimmer2.13: Matches | vs. Glimmer2.13: Correct Starts | vs. Glimmer2.13: Extra | Options |
Archaeoglobus fulgidus | 2.18Mb | 48.6 | 1165 | 1162 | 99.7% | 875 | 75.1% | 1305 | -2 | -33 | -62 | -g 141 -o 67 |
Bacillus anthracis | 5.23Mb | 35.4 | 3132 | 3129 | 99.9% | 2768 | 88.4% | 2340 | 0 | +763 | -72 | -g 111 -o 30 |
Bacillus subtilis | 4.21Mb | 43.5 | 1576 | 1567 | 99.4% | 1429 | 90.7% | 2879 | +3 | +449 | -515 | -g 102 -o 93 |
Campylobacter jejuni | 1.78Mb | 30.3 | 1233 | 1233 | 100.0% | 1149 | 93.2% | 668 | 0 | +124 | -63 | -g 111 -o 104 |
Carboxydothermus hydrogenoformans | 2.40Mb | 42.0 | 1753 | 1752 | 99.9% | 1590 | 90.7% | 865 | 0 | +428 | -192 | -g 111 -o 70 |
Caulobacter crescentus | 4.02Mb | 67.2 | 2192 | 2187 | 99.8% | 1552 | 70.8% | 1559 | +10 | +73 | -198 | -g 123 -o 110 |
Chlorobium tepidum | 2.15Mb | 56.5 | 1292 | 1289 | 99.8% | 949 | 73.5% | 765 | +2 | +35 | -188 | -g 114 -o 30 |
Clostridium perfringens | 3.03Mb | 28.6 | 1504 | 1503 | 99.9% | 1385 | 92.1% | 1178 | +1 | +269 | -12 | -g 111 -o 30 |
Colwellia psychrerythraea | 5.37Mb | 38.0 | 3063 | 3060 | 99.9% | 2663 | 86.9% | 1714 | -1 | +422 | -243 | -g 93 -o 30 |
Dehalococcoides ethenogenes | 1.47Mb | 48.9 | 1069 | 1059 | 99.1% | 929 | 86.9% | 483 | +2 | +146 | -44 | -g 111 -o 30 |
Escherichia coli | 4.64Mb | 50.8 | 3603 | 3553 | 98.6% | 3150 | 87.4% | 913 | +15 | +806 | -593 | -g 99 -o 110 |
Geobacter sulfurreducens | 3.81Mb | 60.9 | 2351 | 2340 | 99.5% | 1974 | 84.0% | 1091 | +13 | +617 | -366 | -g 126 -o 110 |
Haemophilus influenzae | 1.83Mb | 38.1 | 1170 | 1170 | 100.0% | 1054 | 90.1% | 639 | 0 | +130 | -109 | -g 111 -o 101 |
Helicobacter pylori | 1.67Mb | 38.9 | 915 | 914 | 99.9% | 805 | 88.0% | 765 | +1 | +57 | -92 | -g 111 -o 68 |
Listeria monocytogenes | 2.91Mb | 38.0 | 1966 | 1965 | 99.9% | 1797 | 91.4% | 845 | 0 | +442 | -48 | -g 111 -o 30 |
Methylococcus capsulatus | 3.30Mb | 63.6 | 2015 | 2005 | 99.5% | 1542 | 76.5% | 1231 | +21 | +384 | -317 | -g 93 -o 110 |
Mycobacterium tuberculosis | 4.40Mb | 65.6 | 2217 | 2205 | 99.5% | 1493 | 67.3% | 2104 | +3 | +46 | -369 | -g 111 -o 110 |
Neisseria meningitidis | 2.27Mb | 51.5 | 1232 | 1217 | 98.8% | 1042 | 84.6% | 1329 | +2 | +258 | -585 | -g 93 -o 89 |
Porphyromonas gingivalis | 2.34Mb | 48.3 | 1200 | 1198 | 99.8% | 933 | 77.8% | 887 | 0 | +84 | -241 | -g 114 -o 62 |
Pseudomonas fluorescens | 7.07Mb | 63.3 | 4535 | 4503 | 99.3% | 3577 | 78.9% | 1871 | +10 | +881 | -716 | -g 108 -o 110 |
Pseudomonas putida | 6.18Mb | 61.5 | 3633 | 3596 | 99.0% | 2825 | 77.8% | 1916 | +13 | +491 | -599 | -g 114 -o 101 |
Ralstonia solanacearum | 3.72Mb | 67.0 | 2512 | 2487 | 99.0% | 2061 | 82.0% | 1077 | +57 | +760 | -328 | -g 99 -o 110 |
Staphylococcus epidermidis | 2.62Mb | 32.1 | 1650 | 1649 | 99.9% | 1511 | 91.6% | 771 | +1 | +345 | -40 | -g 111 -o 75 |
Streptococcus agalactiae | 2.16Mb | 35.6 | 1441 | 1438 | 99.8% | 1336 | 92.7% | 683 | +3 | +258 | -24 | -g 114 -o 30 |
Streptococcus pneumoniae | 2.16Mb | 39.7 | 1359 | 1355 | 99.7% | 1214 | 89.3% | 780 | +1 | +169 | -52 | -g 114 -o 47 |
Thermotoga maritima | 1.86Mb | 46.2 | 1092 | 1090 | 99.8% | 892 | 81.7% | 804 | -1 | +105 | -118 | -g 114 -o 56 |
Treponema denticola | 2.84Mb | 37.9 | 1463 | 1463 | 100.0% | 1332 | 91.0% | 1210 | +3 | +286 | -168 | -g 111 -o 68 |
Treponema pallidum | 1.14Mb | 52.8 | 575 | 572 | 99.5% | 425 | 73.9% | 557 | -1 | +75 | -257 | -g 111 -o 110 |
Ureaplasma parvum | 0.75Mb | 25.5 | 327 | 327 | 100.0% | 300 | 91.7% | 293 | +1 | +25 | -17 | -g 111 -o 30 |
Wolbachia endosymbiont | 1.08Mb | 34.2 | 628 | 627 | 99.8% | 528 | 84.1% | 537 | 0 | +40 | -99 | -g 126 -o 30 |
Averages: | | | | | 99.6% | | 84.3% | | +5.2 | +297 | -224 | |
|
Notes:
- Not all the genomes necessarily have carefully/accurately annotated
start sites, so the results for number of correct starts may
be suspect.
- Since Ureaplasma uses NCBI translation table
4 (TGA is not a stop codon), Glimmer3 must be run with an
appropriate option ("-z 4" or "-Z taa,tag") and Glimmer2 must
be modified and recompiled as described in its readme files
in order to obtain these results.
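To make the second note concrete, here is a small sketch that assembles a Glimmer3 command line with the -g, -o and -z options mentioned on this page. The helper name and the file names are hypothetical, and the positional order (sequence file, ICM file, output tag) is an assumption that should be checked against the usage message of the installed Glimmer3; the sketch only builds the argument list and does not run the program.

```python
def glimmer3_command(sequence, icm, tag, min_gene_len, max_overlap, trans_table=None):
    """Build an argument list for glimmer3 (assumed order: sequence, ICM, tag)."""
    cmd = ["glimmer3", "-g", str(min_gene_len), "-o", str(max_overlap)]
    if trans_table is not None:
        # e.g. 4 for Ureaplasma, where TGA is not a stop codon
        cmd += ["-z", str(trans_table)]
    cmd += [sequence, icm, tag]
    return cmd


# Hypothetical file names; options taken from the Ureaplasma parvum row.
print(" ".join(glimmer3_command("ureaplasma.fasta", "ureaplasma.icm", "run1",
                                111, 30, trans_table=4)))
```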
|