Matrices & Candidates
About ESE Candidates
Most of the ESE candidates are hexamers. They were chosen using two methods: the RESCUE-ESE algorithm and ELPH.
Candidates are highlighted whenever they overlap any of the three 9-mers GAAGAAGAA, CGATCAACG and TGCTGCTGG, which we have found to be very effective ESEs in plants.
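A minimal Python sketch of the highlighting check. The page does not define "overlap", so the hypothetical helper `overlaps` treats a hexamer as overlapping a 9-mer when at least `min_overlap` consecutive bases agree at some alignment offset. With the default of 6, this reduces to the hexamer occurring inside the 9-mer.

```python
# The three plant 9-mers named in the text.
NINE_MERS = ("GAAGAAGAA", "CGATCAACG", "TGCTGCTGG")


def overlaps(hexamer: str, nine_mer: str, min_overlap: int = 6) -> bool:
    """Hypothetical overlap test: True if the two sequences share at least
    `min_overlap` aligned bases at some offset (an assumed definition)."""
    h, n = hexamer.upper(), nine_mer.upper()
    # `offset` is the 9-mer index where h[0] would sit; negative offsets let
    # the hexamer hang off the left end, large offsets off the right end.
    for offset in range(-(len(h) - min_overlap), len(n) - min_overlap + 1):
        start = max(offset, 0)
        end = min(offset + len(h), len(n))
        if end - start < min_overlap:
            continue
        if n[start:end] == h[start - offset:end - offset]:
            return True
    return False


def highlight(candidates, nine_mers=NINE_MERS, min_overlap=6):
    """Return the candidates that overlap at least one of the 9-mers."""
    return [c for c in candidates
            if any(overlaps(c, nm, min_overlap) for nm in nine_mers)]
```

For example, `highlight(["GAAGAA", "CCCCCC", "GCTGCT"])` keeps `GAAGAA` and `GCTGCT`. Lowering `min_overlap` would also flag hexamers that only share a prefix or suffix with a 9-mer.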
- RESCUE-ESE
The ESE candidates are identified in a manner similar to the RESCUE-ESE algorithm
(see http://genes.mit.edu/burgelab/rescue-ese/).
The ESE predictions include all hexamers that have both a significantly higher
frequency of occurrence in exons than in introns and a significantly higher
frequency of occurrence in exons with weak (non-consensus) splice sites
than in exons with strong (consensus) splice sites.
- weak exons: exons whose splice site scores fall in the bottom 25%
- strong exons: exons whose splice site scores fall in the top 25%
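The quartile split can be sketched as follows, assuming each exon has a single numeric splice site score where higher means closer to consensus. The function name `split_weak_strong` is hypothetical. Because the scores below distinguish the 5' and 3' ends, one would presumably run it separately for each splice site. Ties at the quartile boundary are broken arbitrarily by sort order.

```python
def split_weak_strong(exon_scores):
    """Split exons into weak (bottom 25%) and strong (top 25%) sets.

    exon_scores: dict mapping exon id -> splice site score
    (assumed: higher score = closer to the consensus splice site).
    """
    ranked = sorted(exon_scores, key=exon_scores.get)  # lowest score first
    q = len(ranked) // 4                               # size of one quartile
    weak = set(ranked[:q])
    strong = set(ranked[len(ranked) - q:])
    return weak, strong
```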
Each hexamer was assigned two scores:
- Delta(EI): scaled difference (in SD units) between its frequency of occurrence in exons and in introns
- Delta(5WS)/Delta(3WS): scaled difference (in SD units) between its frequency of occurrence at the 5'/3' end of weak exons and of strong exons
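The page does not give the exact scaling formula. One plausible reading, sketched here in Python, computes each hexamer's relative frequency in two sequence sets, takes the difference, and converts the 4096 differences into z-scores (SD units). This is an assumption and not necessarily the formula RESCUE-ESE uses; `hexamer_freqs` and `delta_scores` are hypothetical names.

```python
from itertools import product
from statistics import mean, pstdev

# All 4^6 = 4096 hexamers.
HEXAMERS = ["".join(letters) for letters in product("ACGT", repeat=6)]


def hexamer_freqs(seqs):
    """Relative frequency of every hexamer over all windows of `seqs`."""
    counts = dict.fromkeys(HEXAMERS, 0)
    total = 0
    for s in seqs:
        s = s.upper()
        for i in range(len(s) - 5):
            k = s[i:i + 6]
            if k in counts:  # skip windows containing N or other symbols
                counts[k] += 1
                total += 1
    return {k: (c / total if total else 0.0) for k, c in counts.items()}


def delta_scores(seqs_a, seqs_b):
    """Frequency difference (set A minus set B) per hexamer, standardized
    to SD units across all hexamers (assumed scaling)."""
    fa, fb = hexamer_freqs(seqs_a), hexamer_freqs(seqs_b)
    diffs = {k: fa[k] - fb[k] for k in HEXAMERS}
    vals = list(diffs.values())
    mu, sd = mean(vals), pstdev(vals)
    if sd == 0:
        return {k: 0.0 for k in HEXAMERS}
    return {k: (d - mu) / sd for k, d in diffs.items()}
```

Under this reading, `delta_scores(exon_seqs, intron_seqs)` would give Delta(EI), and `delta_scores(weak_5p_ends, strong_5p_ends)` would give Delta(5WS); the variable names are placeholders.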
The current ESE candidates are the hexamers that exceed a statistical significance threshold
of 1.5 SD units above the mean for both scores.
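The selection rule amounts to a set intersection. This sketch (hypothetical names; each score dictionary maps hexamer to score) keeps the hexamers whose score lies more than 1.5 SD above the mean in both dictionaries. If the scores are already z-scores, the cutoff is simply 1.5.

```python
from statistics import mean, pstdev


def above_threshold(scores, n_sd=1.5):
    """Hexamers whose score is more than n_sd SDs above the mean score."""
    vals = list(scores.values())
    cutoff = mean(vals) + n_sd * pstdev(vals)
    return {k for k, v in scores.items() if v > cutoff}


def select_candidates(delta_ei, delta_ws, n_sd=1.5):
    """Candidates must pass the threshold for Delta(EI) and for the
    weak-vs-strong score (Delta(5WS) or Delta(3WS))."""
    return above_threshold(delta_ei, n_sd) & above_threshold(delta_ws, n_sd)
```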
- ELPH
Another set of candidates was found by running ELPH on the data. Of all 4096 hexamers, we retained
only those that are statistically significantly represented in the data.
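ELPH is a Gibbs-sampling motif finder, and this sketch does not reproduce it or its significance criterion. As a stand-in illustration of "statistically significantly represented", the hypothetical function below compares each hexamer's observed count with the count expected under an independent-nucleotide background built from the data's own base composition. It keeps hexamers whose binomial z-score reaches an assumed cutoff `z_min`.

```python
from collections import Counter
from itertools import product
from math import prod, sqrt

ACGT = set("ACGT")


def overrepresented_hexamers(seqs, z_min=3.0):
    """Return {hexamer: z} for hexamers over-represented relative to an
    i.i.d. nucleotide background (assumed model, not ELPH's)."""
    seqs = [s.upper() for s in seqs]
    base_counts = Counter(b for s in seqs for b in s if b in ACGT)
    n_bases = sum(base_counts.values())
    if n_bases == 0:
        return {}
    base_freq = {b: base_counts[b] / n_bases for b in "ACGT"}

    # Observed hexamer counts over all windows made only of A/C/G/T.
    obs = Counter()
    n_windows = 0
    for s in seqs:
        for i in range(len(s) - 5):
            k = s[i:i + 6]
            if set(k) <= ACGT:
                obs[k] += 1
                n_windows += 1

    result = {}
    for k in ("".join(letters) for letters in product("ACGT", repeat=6)):
        p = prod(base_freq[b] for b in k)   # background probability
        expected = n_windows * p
        var = expected * (1 - p)            # binomial variance
        if var > 0:
            z = (obs[k] - expected) / sqrt(var)
            if z >= z_min:
                result[k] = z
    return result
```

The i.i.d. background is the simplest choice. A higher-order Markov background would better account for dinucleotide bias in exons, at the cost of more code.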
Many of these candidates overlap the ones determined by the RESCUE-ESE method.
For the Arabidopsis data, we used segmental genome duplication data
to further refine the potential candidates. For each significant hexamer
appearing as a motif in the data, we computed its position in each duplicated
gene pair (based on the ELPH score). A hexamer was retained as an ESE candidate
if the third positions of all codons inside the motif showed significant
conservation against synonymous mutations, compared to
the rest of the data. The results of this analysis can be consulted here.
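A rough Python sketch of the conservation test, under several assumptions the page does not spell out. Each duplicated pair is given as gap-free, in-frame, equal-length coding sequences. Motif hits are located by exact string match in the first copy rather than by ELPH score. A codon counts as "inside the motif" when its third base lies within a hit. Only codon pairs that encode the same amino acid under the standard genetic code are considered. Conservation inside versus outside motifs is compared with a pooled two-proportion z-test. All function names are hypothetical.

```python
from math import sqrt

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}


def third_position_counts(cds_a, cds_b, hexamer):
    """Return ([conserved, total] inside motif hits, [conserved, total]
    elsewhere) for third positions of synonymous codon pairs."""
    cds_a, cds_b = cds_a.upper(), cds_b.upper()
    in_motif = set()
    start = cds_a.find(hexamer)
    while start != -1:  # record every (possibly overlapping) hit
        in_motif.update(range(start, start + len(hexamer)))
        start = cds_a.find(hexamer, start + 1)

    inside, outside = [0, 0], [0, 0]
    for pos in range(0, len(cds_a) - len(cds_a) % 3, 3):
        ca, cb = cds_a[pos:pos + 3], cds_b[pos:pos + 3]
        aa = CODON_TABLE.get(ca)
        if aa is None or aa != CODON_TABLE.get(cb):
            continue  # skip non-synonymous pairs and non-ACGT codons
        bucket = inside if pos + 2 in in_motif else outside
        bucket[0] += ca[2] == cb[2]  # third base unchanged
        bucket[1] += 1
    return inside, outside


def conservation_z(pairs, hexamer):
    """Pooled two-proportion z-score: positive when third positions inside
    motif hits are more conserved than elsewhere."""
    ci = ni = co = no = 0
    for a, b in pairs:
        (c1, n1), (c2, n2) = third_position_counts(a, b, hexamer)
        ci += c1
        ni += n1
        co += c2
        no += n2
    if ni == 0 or no == 0:
        return 0.0
    p_pool = (ci + co) / (ni + no)
    se = sqrt(p_pool * (1 - p_pool) * (1 / ni + 1 / no))
    return 0.0 if se == 0 else (ci / ni - co / no) / se
```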