Glimmer3
|
Table 4: Train on long-orfs Output & Test on
All Annotated Genes
|
This table shows the accuracy of Glimmer3 predictions on
30 microbial genomes from
RefSeq at GenBank.
The ICM (interpolated context model) training data for the Glimmer3 runs was obtained by:
- running the Glimmer3 version of the long-orfs program
(with the -t 1.15 option) to find an initial set of ORFs
- building an ICM from those ORFs and running Glimmer3
- using these initial Glimmer3 predictions to determine start
codon frequencies and build a ribosome-binding site model.
The Glimmer2 training data was created by running the Glimmer2
version of the long-orfs program.
|
Both Glimmer3 and Glimmer2 were run with
the same indicated options, and the results were compared to a test
set consisting of all annotated genes that have no frameshifts or
in-frame stops and are at least 90 bp long. Note that many genes
in this test set are hypothetical, which often means the only evidence
for them is that they were predicted by a gene-finding program
(which in many cases was Glimmer2). It comes as no surprise, therefore,
that for many genomes Glimmer3 has fewer matches than Glimmer2.
|
Glimmer2 predictions were filtered to remove those:
- marked as "NearReject", or
- contained within another prediction and having a lower score
than the containing prediction.
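
The filtering rule above can be sketched as follows. This is only an illustration, not the script used to produce the table: the `Prediction` record, its `note` field, and the function name are hypothetical, coordinates are treated as plain intervals without regard to strand or genome circularity, and it is assumed that "NearReject" predictions are discarded before containment is checked.

```python
from dataclasses import dataclass

@dataclass
class Prediction:
    start: int       # lower genome coordinate (inclusive)
    end: int         # higher genome coordinate (inclusive)
    score: float
    note: str = ""   # hypothetical flag field, e.g. "NearReject"

def filter_glimmer2(preds):
    """Drop NearReject predictions, then drop any prediction that lies
    inside another prediction with a strictly higher score."""
    candidates = [p for p in preds if p.note != "NearReject"]
    kept = []
    for p in candidates:
        dominated = any(
            q is not p
            and q.start <= p.start and p.end <= q.end  # p lies within q
            and p.score < q.score                      # and p scores lower
            for q in candidates
        )
        if not dominated:
            kept.append(p)
    return kept
```

A contained prediction with a higher score than its container is kept, since the rule only removes the lower-scoring one.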
The -g option is the minimum gene
length and the -o option is the maximum overlap allowed.
The -g value was obtained by finding the shortest gene in the test
set. The -o value was chosen as either the maximum overlap between
genes in the test set or 110, whichever was smaller.
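
As a rough sketch of how the two option values could be derived from a test set, the function below takes genes as `(start, end)` coordinate pairs. The function name and the input representation are assumptions made for illustration, and gene length is taken here as `end - start + 1`; whether the original measurement included the stop codon is not stated.

```python
def choose_options(genes, overlap_cap=110):
    """genes: list of (start, end) pairs with start <= end (inclusive).
    Returns (min_gene_length, max_overlap), with the overlap capped."""
    min_len = min(end - start + 1 for start, end in genes)

    max_overlap = 0
    ordered = sorted(genes)  # sort by start coordinate
    for i, (s1, e1) in enumerate(ordered):
        for s2, e2 in ordered[i + 1:]:
            if s2 > e1:
                break  # later genes start even further right: no overlap
            max_overlap = max(max_overlap, min(e1, e2) - s2 + 1)

    return min_len, min(max_overlap, overlap_cap)

def format_options(genes):
    g, o = choose_options(genes)
    return f"-g {g} -o {o}"
```

For example, `format_options([(1, 300), (250, 600), (590, 700), (1000, 1092)])` gives `-g 93 -o 51`: the shortest gene is 93 bp and the largest pairwise overlap is 51 bp, which is below the cap of 110.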
|
"Matches" are predictions that had the same
reading frame and stop codon as an annotated gene in the test set.
"Correct Starts" are predictions that are matches and also
have the same start codon as the matched gene.
"Extra" are predictions that are not matches.
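
These three definitions can be expressed as a small counting routine. The sketch below is illustrative only: the `Gene` tuple and function name are hypothetical, and it assumes that two genes on the same strand whose stop codons sit at the same coordinate are necessarily in the same reading frame, so a `(strand, stop)` key is enough to detect a match.

```python
from collections import namedtuple

# strand is "+" or "-"; start and stop are genome coordinates of the
# start codon and stop codon (hypothetical representation).
Gene = namedtuple("Gene", "strand start stop")

def score_predictions(annotated, predicted):
    """Count matches, correct starts, and extra predictions."""
    by_stop = {(g.strand, g.stop): g for g in annotated}
    matches = correct_starts = extra = 0
    for p in predicted:
        g = by_stop.get((p.strand, p.stop))
        if g is None:
            extra += 1              # no annotated gene shares this stop
        else:
            matches += 1            # same strand/frame and stop codon
            if p.start == g.start:
                correct_starts += 1  # start codon agrees as well
    return {"matches": matches, "correct_starts": correct_starts, "extra": extra}
```

Every prediction falls into exactly one of "match" or "extra", and correct starts are a subset of the matches.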
|
The columns labelled "vs. Glimmer2.13" are the
Glimmer3 value minus the corresponding Glimmer2.13 value.
For example, an entry of "+2" in the "Matches" column means
that Glimmer3 had 2 more matches than Glimmer2.13 on the
genome for that row.
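
A minimal sketch of how such a signed entry could be produced is shown below. The function name and the sample counts in the usage note are hypothetical; the individual Glimmer2.13 counts are not listed on this page, only the differences.

```python
def versus(glimmer3_value, glimmer2_value):
    """Format the Glimmer3-minus-Glimmer2.13 difference as it appears
    in the table: explicit sign for non-zero values, plain 0 otherwise."""
    diff = glimmer3_value - glimmer2_value
    return f"{diff:+d}" if diff != 0 else "0"
```

With made-up counts, `versus(102, 100)` yields `+2` and `versus(95, 100)` yields `-5`. For the "Extra" columns a negative entry is the favorable direction, since it means Glimmer3 made fewer predictions that matched no annotated gene.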
|
Organism | Length | GC% | # Genes | Glimmer3 Matches | Glimmer3 Matches % | Glimmer3 Correct Starts | Glimmer3 Correct Starts % | Glimmer3 Extra | vs. Glimmer2.13 Matches | vs. Glimmer2.13 Correct Starts | vs. Glimmer2.13 Extra | Options |
Archaeoglobus fulgidus | 2.18Mb | 48.6 | 2398 | 2337 | 97.5% | 1715 | 71.5% | 156 | +3 | -85 | -69 | -g 141 -o 67 |
Bacillus anthracis | 5.23Mb | 35.4 | 5308 | 5110 | 96.3% | 4379 | 82.5% | 434 | +7 | +1097 | -152 | -g 111 -o 30 |
Bacillus subtilis | 4.21Mb | 43.5 | 4095 | 4023 | 98.2% | 3438 | 84.0% | 559 | +11 | +1030 | -732 | -g 102 -o 93 |
Campylobacter jejuni | 1.78Mb | 30.3 | 1836 | 1790 | 97.5% | 1649 | 89.8% | 122 | -1 | +176 | -68 | -g 111 -o 104 |
Carboxydothermus hydrogenoformans | 2.40Mb | 42.0 | 2606 | 2466 | 94.6% | 2179 | 83.6% | 165 | -20 | +478 | -191 | -g 111 -o 70 |
Caulobacter crescentus | 4.02Mb | 67.2 | 3737 | 3514 | 94.0% | 2299 | 61.5% | 255 | -62 | -321 | -800 | -g 123 -o 110 |
Chlorobium tepidum | 2.15Mb | 56.5 | 2252 | 1964 | 87.2% | 1357 | 60.3% | 160 | -76 | -174 | -321 | -g 114 -o 30 |
Clostridium perfringens | 3.03Mb | 28.6 | 2660 | 2638 | 99.2% | 2408 | 90.5% | 55 | +5 | +493 | -26 | -g 111 -o 30 |
Colwellia psychrerythraea | 5.37Mb | 38.0 | 4902 | 4572 | 93.3% | 3818 | 77.9% | 241 | -96 | +334 | -198 | -g 93 -o 30 |
Dehalococcoides ethenogenes | 1.47Mb | 48.9 | 1579 | 1484 | 94.0% | 1237 | 78.3% | 86 | -4 | +137 | -50 | -g 111 -o 30 |
Escherichia coli | 4.64Mb | 50.8 | 4231 | 4124 | 97.5% | 3616 | 85.5% | 412 | +16 | +881 | -848 | -g 99 -o 110 |
Geobacter sulfurreducens | 3.81Mb | 60.9 | 3438 | 3284 | 95.5% | 2673 | 77.7% | 218 | -22 | +750 | -705 | -g 126 -o 110 |
Haemophilus influenzae | 1.83Mb | 38.1 | 1649 | 1639 | 99.4% | 1439 | 87.3% | 187 | 0 | +148 | -126 | -g 111 -o 101 |
Helicobacter pylori | 1.67Mb | 38.9 | 1556 | 1520 | 97.7% | 1284 | 82.5% | 178 | 0 | +87 | -101 | -g 111 -o 68 |
Listeria monocytogenes | 2.91Mb | 38.0 | 2819 | 2752 | 97.6% | 2480 | 88.0% | 80 | -7 | +597 | -46 | -g 111 -o 30 |
Methylococcus capsulatus | 3.30Mb | 63.6 | 2958 | 2845 | 96.2% | 2090 | 70.7% | 464 | -24 | +448 | -894 | -g 93 -o 110 |
Mycobacterium tuberculosis | 4.40Mb | 65.6 | 4189 | 3901 | 93.1% | 2435 | 58.1% | 476 | -81 | -267 | -640 | -g 111 -o 110 |
Neisseria meningitidis | 2.27Mb | 51.5 | 2055 | 1891 | 92.0% | 1493 | 72.7% | 901 | +10 | +253 | -860 | -g 93 -o 89 |
Porphyromonas gingivalis | 2.34Mb | 48.3 | 1909 | 1792 | 93.9% | 1278 | 66.9% | 399 | -23 | -79 | -336 | -g 114 -o 62 |
Pseudomonas fluorescens | 7.07Mb | 63.3 | 6134 | 6044 | 98.5% | 4677 | 76.2% | 419 | +11 | +987 | -2335 | -g 108 -o 110 |
Pseudomonas putida | 6.18Mb | 61.5 | 5349 | 5165 | 96.6% | 3881 | 72.6% | 461 | -71 | +297 | -1532 | -g 114 -o 101 |
Ralstonia solanacearum | 3.72Mb | 67.0 | 3435 | 3346 | 97.4% | 2700 | 78.6% | 322 | +452 | +1378 | -2295 | -g 99 -o 110 |
Staphylococcus epidermidis | 2.62Mb | 32.1 | 2487 | 2343 | 94.2% | 2123 | 85.4% | 94 | +4 | +474 | -28 | -g 111 -o 75 |
Streptococcus agalactiae | 2.16Mb | 35.6 | 2122 | 2052 | 96.7% | 1872 | 88.2% | 90 | +5 | +337 | -23 | -g 114 -o 30 |
Streptococcus pneumoniae | 2.16Mb | 39.7 | 2093 | 1929 | 92.2% | 1654 | 79.0% | 267 | -18 | +98 | -34 | -g 114 -o 47 |
Thermotoga maritima | 1.86Mb | 46.2 | 1854 | 1816 | 98.0% | 1427 | 77.0% | 90 | -3 | +137 | -119 | -g 114 -o 56 |
Treponema denticola | 2.84Mb | 37.9 | 2761 | 2594 | 94.0% | 2249 | 81.5% | 96 | -41 | +380 | -165 | -g 111 -o 68 |
Treponema pallidum | 1.14Mb | 52.8 | 1034 | 990 | 95.7% | 619 | 59.9% | 144 | -11 | -21 | -272 | -g 111 -o 110 |
Ureaplasma parvum | 0.75Mb | 25.5 | 614 | 605 | 98.5% | 526 | 85.7% | 16 | -5 | +13 | -7 | -g 111 -o 30 |
Wolbachia endosymbiont | 1.08Mb | 34.2 | 805 | 790 | 98.1% | 642 | 79.8% | 391 | -4 | +28 | -82 | -g 126 -o 30 |
Averages: | | | | | 95.8% | | 77.8% | | -1.5 | +336 | -468 | |
|
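
The percentage figures in the table are not defined in the text; the values shown are consistent with dividing each count by the "# Genes" figure for that genome. A minimal sketch of that arithmetic, using a hypothetical helper name and the Archaeoglobus fulgidus figures from the table:

```python
def percent_of_genes(count, n_genes):
    """Express a count as a percentage of the annotated test-set genes,
    rounded to one decimal place as in the table."""
    return f"{100 * count / n_genes:.1f}%"

# Archaeoglobus fulgidus: 2398 genes, 2337 matches, 1715 correct starts
matches_pct = percent_of_genes(2337, 2398)   # "97.5%"
starts_pct = percent_of_genes(1715, 2398)    # "71.5%"
```

Under this reading, the correct-start percentage is measured against all genes in the test set rather than against the matched genes only.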
Notes:
- Not all the genomes necessarily have carefully/accurately annotated
start sites, so the results for number of correct starts may
be suspect.
- Since Ureaplasma uses NCBI translation table
4 (TGA is not a stop codon), Glimmer3 must be run with an
appropriate option ("-z 4" or "-Z taa,tag") and Glimmer2 must
be modified and recompiled as described in its readme files
in order to obtain these results.
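
To illustrate why the translation table matters, the sketch below checks a coding sequence for in-frame stops under either stop-codon set. It is not part of Glimmer; the function name and the `STOP_CODONS` dictionary are hypothetical, and the input is assumed to be a forward-oriented ORF whose final codon is its terminating stop.

```python
# Stop codons under two NCBI translation tables.
STOP_CODONS = {
    11: {"TAA", "TAG", "TGA"},  # bacterial, archaeal and plant plastid code
    4: {"TAA", "TAG"},          # TGA is read as tryptophan, not as a stop
}

def has_inframe_stop(orf, table=11):
    """Return True if any codon before the final one is a stop codon.
    The last codon is skipped because it is expected to be the real stop."""
    stops = STOP_CODONS[table]
    seq = orf.upper()
    return any(seq[i:i + 3] in stops for i in range(0, len(seq) - 3, 3))
```

For the toy sequence `ATGTGAAAATAA`, `has_inframe_stop(seq, table=11)` is true because the second codon is TGA, while `has_inframe_stop(seq, table=4)` is false. A gene finder using the wrong stop-codon set would truncate such a gene at the TGA.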
|