Glimmer3

Table 1:  Train & Test on Non-Hypothetical Genes

This table shows the accuracy of Glimmer3 predictions on 30 microbial genomes from RefSeq at GenBank. For each genome, the gene set consists of the annotated genes that do not have the word "hypothetical" in their function description, have no frameshifts or in-frame stop codons, and are at least 90 bp long. The ICM model was trained on these genes, and Glimmer3 was then run with the indicated options. The results are compared to the Glimmer2 predictions, excluding any Glimmer2 predictions that were:
  1. marked as "NearReject", or
  2. contained within another prediction and scored lower than the containing prediction.
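The two selection steps above can be sketched in Python as follows. This is only an illustration under stated assumptions: the `Gene` and `Prediction` records and their field names are hypothetical, the "hypothetical" test is assumed to be case-insensitive, containment is assumed to be judged on coordinates alone, and NearReject predictions are assumed to be removed before the containment test. It is not the procedure actually used to build the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    """Hypothetical record for one annotated gene."""
    description: str
    length_bp: int
    has_frameshift: bool = False
    has_inframe_stop: bool = False


@dataclass(frozen=True)
class Prediction:
    """Hypothetical record for one Glimmer2 prediction (start <= end)."""
    start: int
    end: int
    score: float
    near_reject: bool = False


def select_genes(genes, min_length=90):
    """Keep genes that meet the criteria described in the text."""
    return [
        g for g in genes
        if "hypothetical" not in g.description.lower()  # assumed case-insensitive
        and not g.has_frameshift
        and not g.has_inframe_stop
        and g.length_bp >= min_length
    ]


def filter_glimmer2(predictions):
    """Drop NearReject predictions, then drop any prediction that lies
    inside another remaining prediction with a higher score."""
    kept = [p for p in predictions if not p.near_reject]
    result = []
    for p in kept:
        dominated = any(
            q is not p
            and q.start <= p.start and p.end <= q.end  # p lies inside q
            and p.score < q.score
            for q in kept
        )
        if not dominated:
            result.append(p)
    return result
```

Note that a prediction nested inside another one is kept when its score is higher than that of the containing prediction.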
The -g option is the minimum gene length and the -o option is the maximum overlap allowed. The -g value was obtained by finding the shortest gene in the training set. The -o value was chosen as either the maximum overlap between genes in the training set or 110, whichever was smaller. Both Glimmer3 and Glimmer2 were run with the same options.
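As a rough sketch of how the two option values could be derived from a training set, the Python below takes genes as hypothetical `(start, end)` pairs with 1-based inclusive coordinates and `start <= end`. The function names and the coordinate convention are assumptions made for illustration; only the rule itself (shortest gene for -g, largest overlap capped at 110 for -o) comes from the text.

```python
def max_pairwise_overlap(genes):
    """Largest overlap, in bp, between any two genes.

    Genes are (start, end) pairs, 1-based inclusive, with start <= end.
    """
    ordered = sorted(genes)
    best = 0
    for i, (_, end_i) in enumerate(ordered):
        for start_j, end_j in ordered[i + 1:]:
            if start_j > end_i:
                break  # every later gene starts even further to the right
            best = max(best, min(end_i, end_j) - start_j + 1)
    return best


def choose_options(genes, overlap_cap=110):
    """Build the option string from the training genes."""
    min_len = min(end - start + 1 for start, end in genes)
    max_olap = min(max_pairwise_overlap(genes), overlap_cap)
    return f"-g {min_len} -o {max_olap}"
```

For a made-up training set `[(1, 300), (281, 700), (650, 760), (900, 1040)]`, the shortest gene is 111 bp and the largest overlap is 51 bp, so `choose_options` returns `-g 111 -o 51`.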
"Matches" are predictions that had the same reading frame and stop codon as an annotated gene in the test set. "Correct Starts" are predictions that are matches and also have the same start codon as the matched gene. "Extra" are predictions that are not matches. Note that many of these would match genes labelled "hypothetical" that were excluded from the test set.
The columns labelled "vs. Glimmer2.13" are the Glimmer3 value minus the corresponding Glimmer2.13 value. For example, an entry of "+2" in the "Matches" column means that Glimmer3 had 2 more matches than Glimmer2.13 on the genome for that row.
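The definitions above can be made concrete with a short Python sketch. It assumes each gene is a hypothetical `(strand, start, stop)` tuple in which `stop` is the position of the stop codon, so that two genes on the same strand with the same `stop` necessarily share a reading frame. The percentage helper divides by the "# Genes" column; that denominator is inferred from the table values rather than stated in the text.

```python
def score_predictions(annotated, predicted):
    """Count matches, correct starts, and extra predictions.

    Genes are (strand, start, stop) tuples; the same strand and stop
    position imply the same reading frame and stop codon.
    """
    start_by_stop = {(strand, stop): start for strand, start, stop in annotated}
    matches = correct_starts = extra = 0
    for strand, start, stop in predicted:
        key = (strand, stop)
        if key in start_by_stop:
            matches += 1
            if start_by_stop[key] == start:
                correct_starts += 1
        else:
            extra += 1
    return {"matches": matches, "correct_starts": correct_starts, "extra": extra}


def percent_of_genes(count, n_genes):
    """Format a count as a percentage of the test-set size."""
    return f"{100 * count / n_genes:.1f}%"


def signed_delta(glimmer3_value, glimmer2_value):
    """Format the Glimmer3 value minus the Glimmer2.13 value."""
    diff = glimmer3_value - glimmer2_value
    return f"{diff:+d}" if diff else "0"
```

Using the Archaeoglobus fulgidus row, `percent_of_genes(1162, 1165)` gives `99.7%` and `percent_of_genes(875, 1165)` gives `75.1%`. The entry of -2 in the "Matches" difference column implies 1164 matches for Glimmer2.13 (a back-calculated value), and `signed_delta(1162, 1164)` returns `-2`.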

| Organism | Length | GC% | # Genes | Matches | Matches % | Correct Starts | Correct Starts % | Extra | Matches vs. Glimmer2.13 | Correct Starts vs. Glimmer2.13 | Extra vs. Glimmer2.13 | Options |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Archaeoglobus fulgidus | 2.18Mb | 48.6 | 1165 | 1162 | 99.7% | 875 | 75.1% | 1305 | -2 | -33 | -62 | -g 141 -o 67 |
| Bacillus anthracis | 5.23Mb | 35.4 | 3132 | 3129 | 99.9% | 2768 | 88.4% | 2340 | 0 | +763 | -72 | -g 111 -o 30 |
| Bacillus subtilis | 4.21Mb | 43.5 | 1576 | 1567 | 99.4% | 1429 | 90.7% | 2879 | +3 | +449 | -515 | -g 102 -o 93 |
| Campylobacter jejuni | 1.78Mb | 30.3 | 1233 | 1233 | 100.0% | 1149 | 93.2% | 668 | 0 | +124 | -63 | -g 111 -o 104 |
| Carboxydothermus hydrogenoformans | 2.40Mb | 42.0 | 1753 | 1752 | 99.9% | 1590 | 90.7% | 865 | 0 | +428 | -192 | -g 111 -o 70 |
| Caulobacter crescentus | 4.02Mb | 67.2 | 2192 | 2187 | 99.8% | 1552 | 70.8% | 1559 | +10 | +73 | -198 | -g 123 -o 110 |
| Chlorobium tepidum | 2.15Mb | 56.5 | 1292 | 1289 | 99.8% | 949 | 73.5% | 765 | +2 | +35 | -188 | -g 114 -o 30 |
| Clostridium perfringens | 3.03Mb | 28.6 | 1504 | 1503 | 99.9% | 1385 | 92.1% | 1178 | +1 | +269 | -12 | -g 111 -o 30 |
| Colwellia psychrerythraea | 5.37Mb | 38.0 | 3063 | 3060 | 99.9% | 2663 | 86.9% | 1714 | -1 | +422 | -243 | -g 93 -o 30 |
| Dehalococcoides ethenogenes | 1.47Mb | 48.9 | 1069 | 1059 | 99.1% | 929 | 86.9% | 483 | +2 | +146 | -44 | -g 111 -o 30 |
| Escherichia coli | 4.64Mb | 50.8 | 3603 | 3553 | 98.6% | 3150 | 87.4% | 913 | +15 | +806 | -593 | -g 99 -o 110 |
| Geobacter sulfurreducens | 3.81Mb | 60.9 | 2351 | 2340 | 99.5% | 1974 | 84.0% | 1091 | +13 | +617 | -366 | -g 126 -o 110 |
| Haemophilus influenzae | 1.83Mb | 38.1 | 1170 | 1170 | 100.0% | 1054 | 90.1% | 639 | 0 | +130 | -109 | -g 111 -o 101 |
| Helicobacter pylori | 1.67Mb | 38.9 | 915 | 914 | 99.9% | 805 | 88.0% | 765 | +1 | +57 | -92 | -g 111 -o 68 |
| Listeria monocytogenes | 2.91Mb | 38.0 | 1966 | 1965 | 99.9% | 1797 | 91.4% | 845 | 0 | +442 | -48 | -g 111 -o 30 |
| Methylococcus capsulatus | 3.30Mb | 63.6 | 2015 | 2005 | 99.5% | 1542 | 76.5% | 1231 | +21 | +384 | -317 | -g 93 -o 110 |
| Mycobacterium tuberculosis | 4.40Mb | 65.6 | 2217 | 2205 | 99.5% | 1493 | 67.3% | 2104 | +3 | +46 | -369 | -g 111 -o 110 |
| Neisseria meningitidis | 2.27Mb | 51.5 | 1232 | 1217 | 98.8% | 1042 | 84.6% | 1329 | +2 | +258 | -585 | -g 93 -o 89 |
| Porphyromonas gingivalis | 2.34Mb | 48.3 | 1200 | 1198 | 99.8% | 933 | 77.8% | 887 | 0 | +84 | -241 | -g 114 -o 62 |
| Pseudomonas fluorescens | 7.07Mb | 63.3 | 4535 | 4503 | 99.3% | 3577 | 78.9% | 1871 | +10 | +881 | -716 | -g 108 -o 110 |
| Pseudomonas putida | 6.18Mb | 61.5 | 3633 | 3596 | 99.0% | 2825 | 77.8% | 1916 | +13 | +491 | -599 | -g 114 -o 101 |
| Ralstonia solanacearum | 3.72Mb | 67.0 | 2512 | 2487 | 99.0% | 2061 | 82.0% | 1077 | +57 | +760 | -328 | -g 99 -o 110 |
| Staphylococcus epidermidis | 2.62Mb | 32.1 | 1650 | 1649 | 99.9% | 1511 | 91.6% | 771 | +1 | +345 | -40 | -g 111 -o 75 |
| Streptococcus agalactiae | 2.16Mb | 35.6 | 1441 | 1438 | 99.8% | 1336 | 92.7% | 683 | +3 | +258 | -24 | -g 114 -o 30 |
| Streptococcus pneumoniae | 2.16Mb | 39.7 | 1359 | 1355 | 99.7% | 1214 | 89.3% | 780 | +1 | +169 | -52 | -g 114 -o 47 |
| Thermotoga maritima | 1.86Mb | 46.2 | 1092 | 1090 | 99.8% | 892 | 81.7% | 804 | -1 | +105 | -118 | -g 114 -o 56 |
| Treponema denticola | 2.84Mb | 37.9 | 1463 | 1463 | 100.0% | 1332 | 91.0% | 1210 | +3 | +286 | -168 | -g 111 -o 68 |
| Treponema pallidum | 1.14Mb | 52.8 | 575 | 572 | 99.5% | 425 | 73.9% | 557 | -1 | +75 | -257 | -g 111 -o 110 |
| Ureaplasma parvum | 0.75Mb | 25.5 | 327 | 327 | 100.0% | 300 | 91.7% | 293 | +1 | +25 | -17 | -g 111 -o 30 |
| Wolbachia endosymbiont | 1.08Mb | 34.2 | 628 | 627 | 99.8% | 528 | 84.1% | 537 | 0 | +40 | -99 | -g 126 -o 30 |
| Averages: | | | | | 99.6% | | 84.3% | | +5.2 | +297 | -224 | |

Notes:
  • Not all of the genomes necessarily have carefully or accurately annotated start sites, so the counts of correct starts may be unreliable.
  • Since Ureaplasma uses NCBI translation table 4 (TGA is not a stop codon), Glimmer3 must be run with an appropriate option ("-z 4" or "-Z taa,tag") and Glimmer2 must be modified and recompiled as described in its readme files in order to obtain these results.
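
To illustrate why the translation table matters for Ureaplasma, the Python sketch below scans a sequence for its first in-frame stop codon under two stop-codon sets. The helper name and the example sequence are made up for illustration. Table 11 is used here as the assumed point of comparison (stops TAA, TAG, TGA), while table 4 treats TGA as a tryptophan codon, which corresponds to the "taa,tag" list in the note above.

```python
STOP_CODONS = {
    11: {"TAA", "TAG", "TGA"},  # NCBI table 11: bacterial and archaeal code
    4: {"TAA", "TAG"},          # NCBI table 4: TGA encodes Trp, not a stop
}


def first_inframe_stop(seq, table=11):
    """Return the 0-based index of the first in-frame stop codon in seq,
    reading from position 0, or None if there is none."""
    stops = STOP_CODONS[table]
    seq = seq.upper()
    for i in range(0, len(seq) - 2, 3):
        if seq[i:i + 3] in stops:
            return i
    return None
```

For the made-up sequence `ATGAAATGAGGCTAA` (codons ATG AAA TGA GGC TAA), the reading frame ends at the TGA at index 6 under table 11 but continues to the TAA at index 12 under table 4, so a gene finder using the wrong table would truncate the gene.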