Glimmer3

Table 2:  Train on Non-Hypothetical Genes & Test on All Annotated Genes

This table shows the accuracy of Glimmer3 predictions on 30 microbial genomes from RefSeq at GenBank. For each genome, the interpolated context model (ICM) was trained on the annotated genes that are at least 90 bp long, have no frameshifts or in-frame stop codons, and do not have the word "hypothetical" in their function description. Glimmer3 and Glimmer2 were both run with the options indicated in the table, and their predictions were compared to a test set consisting of all annotated genes that are at least 90 bp long and have no frameshifts or in-frame stops. The Glimmer3 results are also compared to the Glimmer2 predictions, excluding any Glimmer2 prediction that was:
  1. marked as "NearReject", or
  2. contained within another prediction and scored lower than the containing prediction.
The -g option sets the minimum gene length and the -o option sets the maximum overlap allowed between genes. The -g value is the length of the shortest gene in the training set. The -o value is the maximum overlap between genes in the training set or 110, whichever is smaller.
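As a minimal sketch of how these two values could be derived, the Python below takes training genes as hypothetical `(start, end)` pairs with 1-based inclusive coordinates. It implements only the rule as stated here; the function names are illustrative, and strand is ignored when measuring overlap.

```python
MAX_OVERLAP_CAP = 110  # cap on the -o value, as stated in the text


def min_gene_length(genes):
    """Length of the shortest gene; genes are (start, end) pairs, inclusive."""
    return min(end - start + 1 for start, end in genes)


def max_overlap(genes):
    """Largest overlap in bp between any two genes (0 if none overlap)."""
    best = 0
    furthest_end = None
    for start, end in sorted(genes):
        if furthest_end is not None:
            # Overlap with whichever earlier gene reaches furthest right.
            best = max(best, min(end, furthest_end) - start + 1)
            furthest_end = max(furthest_end, end)
        else:
            furthest_end = end
    return best


def glimmer_options(genes):
    """Return the -g and -o arguments as a list of strings."""
    g_value = min_gene_length(genes)
    o_value = min(max_overlap(genes), MAX_OVERLAP_CAP)
    return ["-g", str(g_value), "-o", str(o_value)]
```

For example, the genes `(1, 300)`, `(281, 500)` and `(600, 692)` have a shortest length of 93 and a largest overlap of 20, so the sketch yields `-g 93 -o 20`.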
"Matches" are predictions that had the same reading frame and stop codon as an annotated gene in the test set. "Correct Starts" are predictions that are matches and also have the same start codon as the matched gene. "Extra" are predictions that are not matches.
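These three definitions translate directly into a small counting routine. The Python below is a sketch under assumptions: each gene or prediction is a hypothetical `Orf` record holding a reading-frame label, a start position and a stop position, and percentages are taken relative to the number of annotated genes, which is consistent with the figures in the table.

```python
from typing import NamedTuple


class Orf(NamedTuple):
    """Hypothetical record for an annotated gene or a prediction."""
    frame: str   # reading-frame label, e.g. "+1" or "-2"
    start: int   # position of the start codon
    stop: int    # position of the stop codon


def percent(count, total):
    """Percentage rounded to one decimal place."""
    return round(100.0 * count / total, 1)


def score_predictions(annotated, predicted):
    """Count matches, correct starts and extra predictions."""
    # Index the test set by (frame, stop), the key that defines a match.
    by_stop = {(g.frame, g.stop): g for g in annotated}
    matches = correct_starts = extra = 0
    for p in predicted:
        gene = by_stop.get((p.frame, p.stop))
        if gene is None:
            extra += 1                 # not a match
        else:
            matches += 1               # same frame and stop codon
            if p.start == gene.start:
                correct_starts += 1    # match with the same start codon
    total = len(annotated)
    return {
        "matches": matches,
        "correct_starts": correct_starts,
        "extra": extra,
        "matches_pct": percent(matches, total) if total else None,
        "correct_starts_pct": percent(correct_starts, total) if total else None,
    }
```

With the Archaeoglobus fulgidus figures from the table, `percent(2323, 2398)` gives 96.9 and `percent(1670, 2398)` gives 69.6.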
It is important to note that the test set here includes genes labelled as "hypothetical" whose validity may be suspect. In fact, the annotation for some of these genomes was produced using Glimmer2, which will artificially inflate the number of matches for that program.
Genome                                                    Glimmer3 Predictions               vs. Glimmer2.13
Organism                           Length   GC%  # Genes     Matches  Correct Starts  Extra  Matches  Correct Starts  Extra  Options
Archaeoglobus fulgidus             2.18Mb  48.6     2398  2323 96.9%      1670 69.6%    144        0            -118    -64  -g 141 -o 67
Bacillus anthracis                 5.23Mb  35.4     5308  5075 95.6%      4355 82.0%    394      +29           +1114   -101  -g 111 -o 30
Bacillus subtilis                  4.21Mb  43.5     4095  3997 97.6%      3438 84.0%    449      +25           +1045   -537  -g 102 -o 93
Campylobacter jejuni               1.78Mb  30.3     1836  1786 97.3%      1634 89.0%    115       -2            +162    -61  -g 111 -o 104
Carboxydothermus hydrogenoformans  2.40Mb  42.0     2606  2458 94.3%      2178 83.6%    159      -32            +462   -160  -g 111 -o 70
Caulobacter crescentus             4.02Mb  67.2     3737  3500 93.7%      2254 60.3%    246      +45            -287   -233  -g 123 -o 110
Chlorobium tepidum                 2.15Mb  56.5     2252  1909 84.8%      1306 58.0%    145      -23            -121   -163  -g 114 -o 30
Clostridium perfringens            3.03Mb  28.6     2660  2629 98.8%      2387 89.7%     52       +5            +480    -16  -g 111 -o 30
Colwellia psychrerythraea          5.37Mb  38.0     4902  4562 93.1%      3828 78.1%    212      -79            +357   -165  -g 93 -o 30
Dehalococcoides ethenogenes        1.47Mb  48.9     1579  1485 94.0%      1255 79.5%     57       +1            +161    -43  -g 111 -o 30
Escherichia coli                   4.64Mb  50.8     4231  4126 97.5%      3639 86.0%    340      +22            +900   -600  -g 99 -o 110
Geobacter sulfurreducens           3.81Mb  60.9     3438  3261 94.9%      2673 77.7%    170      +22            +789   -375  -g 126 -o 110
Haemophilus influenzae             1.83Mb  38.1     1649  1636 99.2%      1445 87.6%    173       -1            +153   -108  -g 111 -o 101
Helicobacter pylori                1.67Mb  38.9     1556  1521 97.8%      1276 82.0%    158       -1             +68    -90  -g 111 -o 68
Listeria monocytogenes             2.91Mb  38.0     2819  2750 97.6%      2476 87.8%     60       -3            +591    -45  -g 111 -o 30
Methylococcus capsulatus           3.30Mb  63.6     2958  2831 95.7%      2066 69.8%    405      +51            +477   -347  -g 93 -o 110
Mycobacterium tuberculosis         4.40Mb  65.6     4189  3884 92.7%      2355 56.2%    425       -1            -293   -365  -g 111 -o 110
Neisseria meningitidis             2.27Mb  51.5     2055  1842 89.6%      1532 74.5%    704      +13            +323   -596  -g 93 -o 89
Porphyromonas gingivalis           2.34Mb  48.3     1909  1783 93.4%      1281 67.1%    302      -20             -76   -221  -g 114 -o 62
Pseudomonas fluorescens            7.07Mb  63.3     6134  6013 98.0%      4592 74.9%    361      +26            +951   -732  -g 108 -o 110
Pseudomonas putida                 6.18Mb  61.5     5349  5118 95.7%      3797 71.0%    394      +28            +307   -614  -g 114 -o 101
Ralstonia solanacearum             3.72Mb  67.0     3435  3319 96.6%      2665 77.6%    245     +127            +963   -398  -g 99 -o 110
Staphylococcus epidermidis         2.62Mb  32.1     2487  2333 93.8%      2110 84.8%     87       -7            +461    -32  -g 111 -o 75
Streptococcus agalactiae           2.16Mb  35.6     2122  2045 96.4%      1863 87.8%     76       +8            +336    -29  -g 114 -o 30
Streptococcus pneumoniae           2.16Mb  39.7     2093  1918 91.6%      1637 78.2%    217      -20             +91    -31  -g 114 -o 47
Thermotoga maritima                1.86Mb  46.2     1854  1813 97.8%      1426 76.9%     81       -4            +127   -115  -g 114 -o 56
Treponema denticola                2.84Mb  37.9     2761  2584 93.6%      2246 81.3%     89      -32            +392   -133  -g 111 -o 68
Treponema pallidum                 1.14Mb  52.8     1034   986 95.4%       651 63.0%    143      -15             +11   -243  -g 111 -o 110
Ureaplasma parvum                  0.75Mb  25.5      614   607 98.9%       524 85.3%     13        0             +14    -16  -g 111 -o 30
Wolbachia endosymbiont             1.08Mb  34.2      805   790 98.1%       650 80.7%    374       -4             +32    -95  -g 126 -o 30
Averages:                                                      95.3%           77.5%            +5.3            +329   -224
Notes:
  • Not all the genomes necessarily have carefully/accurately annotated start sites, so the results for number of correct starts may be suspect.
  • Since Ureaplasma uses NCBI translation table 4 (TGA is not a stop codon), Glimmer3 must be run with an appropriate option ("-z 4" or "-Z taa,tag") and Glimmer2 must be modified and recompiled as described in its readme files in order to obtain these results.
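To illustrate why the stop-codon setting matters for Ureaplasma, the Python sketch below scans an open reading frame under two stop-codon sets: the standard bacterial set (NCBI translation table 11) and the table 4 set, where TGA is not a stop. The function name and the example sequence are made up for illustration; this is not how Glimmer itself is implemented.

```python
STOPS_TABLE_11 = {"TAA", "TAG", "TGA"}  # standard bacterial code
STOPS_TABLE_4 = {"TAA", "TAG"}          # table 4: TGA is not a stop codon


def first_inframe_stop(seq, stops):
    """Return the 0-based offset of the first in-frame stop codon, or None.

    The sequence is read in frame from offset 0, codon by codon.
    """
    seq = seq.upper()
    for i in range(0, len(seq) - 2, 3):
        if seq[i:i + 3] in stops:
            return i
    return None


# Made-up example: the codons are ATG, TGA, AAA, TAA.
orf = "ATGTGAAAATAA"
table_11_stop = first_inframe_stop(orf, STOPS_TABLE_11)  # stops at TGA
table_4_stop = first_inframe_stop(orf, STOPS_TABLE_4)    # reads through to TAA
```

Under table 11 the example is cut short at the TGA codon (offset 3), while under table 4 it extends to the TAA codon (offset 9), so a gene finder using the wrong stop set would truncate genes in such a genome.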