TopHat-Fusion is an enhanced version of TopHat with the ability to align reads across fusion points, which result from the breakage and rejoining of two different chromosomes or from rearrangements within a chromosome.

Open Source Software

Manual


Beginning users should take a look at the Getting started guide for a tutorial on running TopHat-Fusion.


How to use fusion mapping


After running tophat, you can run tophat-fusion-post to filter the fusion candidates.


Usage: tophat-fusion-post [options]* <index_base>
Options:
-v/--version Prints the version and exits.
-p/--num-threads The number of threads used. The default is 1.
--num-fusion-reads Fusions with at least this many supporting reads will be reported. The default is 3.
--num-fusion-pairs Fusions with at least this many supporting pairs will be reported. The default is 2.
--num-fusion-both The sum of supporting reads and pairs is at least this number for a fusion to be reported. The default is 0.
--max-num-fusions The maximum number of reported fusions. The default is 500.
--fusion-read-mismatches Reads support a fusion if they map across it with at most this many mismatches. The default is 2.
--fusion-multireads Reads that map to more than this many places will be ignored. The default is 2.
--non-human Use this option if your annotation is not the human annotation.
--fusion-pair-dist Pairs with this specified distance are counted as supporting evidence for the fusion. The default threshold for the inner distance is 250.
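
As a rough illustration of how the options above fit together on a command line, the following Python sketch assembles a tophat-fusion-post invocation. The helper build_post_command and the index name my_index are hypothetical placeholders; only the option names come from the list above. The command is printed rather than executed.

```python
import shlex

def build_post_command(index_base, **options):
    """Assemble a tophat-fusion-post argument list (hypothetical helper).

    Keyword names use underscores and are turned into long options,
    e.g. num_fusion_reads=3 becomes "--num-fusion-reads 3".
    """
    cmd = ["tophat-fusion-post"]
    for name, value in options.items():
        flag = "--" + name.replace("_", "-")
        if value is True:
            cmd.append(flag)                # switch without a value, e.g. --non-human
        else:
            cmd.extend([flag, str(value)])  # option followed by its value
    cmd.append(index_base)                  # <index_base> comes last, as in the usage line
    return cmd

# "my_index" is a placeholder, not a real index name.
cmd = build_post_command("my_index", num_threads=4, num_fusion_reads=3, num_fusion_pairs=2)
print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run if you want to launch the tool from a script.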

Additional Output


In addition to the files output by TopHat, the tophat (with --fusion-search) and tophat-fusion-post scripts produce a number of files in the directories in which they were invoked. Most of these are internal, intermediate files generated for use within the pipeline. The output files you will likely want to look at are:


tophat (with --fusion-search option)

  • accepted_hits.bam. A list of read alignments in SAM format. SAM is a compact short read alignment format that is increasingly being adopted. The formal specification is here.

    We assume that a fusion alignment involves two chromosomes or two different places on the same chromosome (a distant location or an inversion). A fusion alignment is therefore reported as two partial alignments (one before the fusion point and the other after) as follows:

    (1) SRR064286.11667197 chr12 2910488 255 16M chr20 62505047 0 ATATGTTTGAGAGGCT BCBCCBCCCBBBBBBB XF:Z:1 chr12-chr20 2910488 16M62500758F34M
    (2) SRR064286.11667197 chr20 62500758 255 34M = 62505047 0 GGCTGAGGAGGAGGAGCTCAGGGCTGAGCTTACC BB?B@;@B?AA?A?;?:@>>@@B>>;;B>?=9## XF:Z:2 chr12-chr20 2910488 16M62500758F34M

    Some fields are omitted for the sake of illustration. As shown above, 16 bp of the 50-bp read are mapped to chr12 and 34 bp are mapped to chr20. The whole fusion alignment is available in a custom field, XF, in both partial alignments. XF:Z:1 and XF:Z:2 indicate the first and the second partial alignments, respectively.


    A new CIGAR operator 'F' (Fusion) is introduced to indicate the presence of a fusion in read alignments. The following example shows a read alignment across a fusion between chromosome 20 and chromosome 12.

    XF:Z:1 chr12-chr20 2910488 16M62500758F34M

    The leftmost base of the read (SRR064286.11667197) is mapped to the 2910488th base of chromosome 12, and the first 16 bases of the read are mapped from 2910488 to 2910503 on chromosome 12. The 17th base of the read is mapped to the 62500758th base of chromosome 20, so the remaining 34 bases are mapped from 62500758 to 62500791 on chromosome 20. The precise fusion point is between 2910503 on chromosome 12 and 62500758 on chromosome 20. Note that lower-case letters such as m, n, i, and d have the opposite interpretation of their upper-case counterparts: the coordinate decreases instead of increasing.


    If the other end of a pair spans a fusion point, a custom field XP is used as follows:

    SRR064286.73586371 chr11 44890 50M XP:Z:chr1-chr10 14482

    The read (SRR064286.73586371) is mapped to chr11 at position 44890, while its mate spans a fusion point between chr1 and chr10.
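
The fusion CIGAR carried in the XF field (for example 16M62500758F34M) can be decoded with a few lines of Python. The sketch below is an illustration, not TopHat-Fusion's own code. The function name fusion_segments is hypothetical. It assumes that the number preceding F is the position on the partner chromosome, and that a lower-case operator walks the reference in the decreasing direction starting from the current position.

```python
import re

# One CIGAR element: a number followed by an operator letter.
_ELEMENT = re.compile(r"(\d+)([MmNnIiDdF])")

def fusion_segments(start, cigar):
    """Return the reference span (first, last) of every aligned M/m block."""
    pos = start
    spans = []
    for number, op in _ELEMENT.findall(cigar):
        number = int(number)
        if op == "F":
            pos = number                 # jump to the position on the partner chromosome
            continue
        step = 1 if op.isupper() else -1 # lower case: coordinate decreases (assumed)
        if op in "Mm":
            spans.append((pos, pos + step * (number - 1)))
        if op in "MmNnDd":
            pos += step * number         # these operators consume reference bases
        # I/i consume read bases only, so the reference position is unchanged
    return spans

print(fusion_segments(2910488, "16M62500758F34M"))
# [(2910488, 2910503), (62500758, 62500791)]
```

The printed spans match the worked example above: 16 bases on chromosome 12 ending at 2910503, then 34 bases on chromosome 20 starting at 62500758.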


  • fusions.out. A list of the fusions that tophat (with --fusion-search) finds before tophat-fusion-post is run, where each fusion is supported by at least one read alignment. Each row represents a fusion, with a detailed description given below. A row can be split into seven rows using @ as a line separator.

    chr20-chr17 49411707 59445685 ff 106 116 167 0 37 36 0.569598 @
    11 25 38 49 63 @
    CAGCGGGGCGCGCGAGCTCGCGCTCTTCCTGACCCCCGAGCCTGGGGCCG AGGTAGGGGACGGGGCTGTGGAGTTGGAGGAGAGGGTTCTCGCGGTTAGG @
    CCTGCTCCCTGAAGGTGTGGACTCAACGTCAGATGTCCCGTGTGTGCCAC AGGTACCTTTGACAGGAGCGTGACCCTGCTGGAGGTGTGCGGGAGCTGGC @
    106 106 106 106 106 106 106 106 106 106 106 106 106 106 98 90 84 79 79 78 74 68 65 64 63 63 59 59 56 55 52 51 20 15 12 10 8 0 0 0 0 0 0 0 0 0 0 0 0 0 @
    106 106 106 106 106 106 106 106 106 106 106 106 106 98 96 94 91 86 55 54 51 50 47 47 43 43 42 41 38 32 28 27 27 22 16 8 0 0 0 0 0 0 0 0 0 0 0 0 0 0 @
    -6:1 11:0 16:-3 18:1 14:6 14:6 15:7 23:0 5:21 31:5 18:19 36:-1 ...


    This is a fusion between the 49411707th base on chromosome 20 and the 59445685th base on chromosome 17.
    ff is the orientation of the two chromosomes: both are in the forward direction, like (chr 20) -----> -----> (chr 17).
    106 is the number of reads that span the fusion, 116 is the number of mate pairs that support the fusion, 167 is the number of mate pairs that support the fusion and whose one end spans the fusion.
    0 is the number of reads that contradict the fusion by mapping to only one of the chromosomes 20 and 17.
    37 and 36 are the number of bases on the left and right sides of a fusion, respectively, covered by spanning reads.
    The second row is likely to be dropped in the next version.
    The third row shows two 50-bp contigs around a fusion point on chromosome 20. The fourth row is similarly defined for chromosome 17.
    The fifth row is the depth of coverage by spanning reads on chromosome 20 from the 49411707th base to the 49411658th base. The sixth row is the depth of coverage by spanning reads on chromosome 17 from the 59445685th base to the 59445734th base.
    The seventh row lists distances (distance1:distance2) between a mate pair and the fusion, i.e., distance1 between the left end of a pair and the left side of the fusion, and distance2 between the right end of a pair and the right side of the fusion.

    The Linux command awk can be used to filter fusions as follows:
    awk '{if($5 > 100) print}' fusions.out | sed 's/@\t/\n/g'
    Here, $5 is the number of spanning reads, so this command shows the fusions supported by more than 100 spanning reads.
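
The fusions.out record layout described above can also be read from a script. The following Python sketch splits one record on @ and names the documented columns. The function name parse_fusion_record and the dictionary keys are hypothetical, the sample record is abbreviated from the example above, and the tab-separated layout around @ is an assumption based on the sed expression in the awk example.

```python
def parse_fusion_record(line):
    """Split one fusions.out record into named fields (names are illustrative)."""
    rows = [row.split() for row in line.split("@")]
    head = rows[0]
    left_chrom, right_chrom = head[0].split("-", 1)
    return {
        "chroms": (left_chrom, right_chrom),
        "positions": (int(head[1]), int(head[2])),
        "orientation": head[3],
        "spanning_reads": int(head[4]),
        "supporting_pairs": int(head[5]),
        "pairs_with_spanning_end": int(head[6]),
        "contradicting_reads": int(head[7]),
        "bases_covered": (int(head[8]), int(head[9])),
        "left_depth": [int(x) for x in rows[4]],    # fifth row
        "right_depth": [int(x) for x in rows[5]],   # sixth row
        # seventh row: distance1:distance2 for each supporting pair
        "pair_distances": [tuple(int(v) for v in p.split(":")) for p in rows[6]],
    }

# Abbreviated version of the sample record shown above (not real output).
sample = "\t@\t".join([
    "chr20-chr17\t49411707\t59445685\tff\t106\t116\t167\t0\t37\t36\t0.569598",
    "11 25 38 49 63",
    "CAGCGGGGCG AGGTAGGGGA",
    "CCTGCTCCCT AGGTACCTTT",
    "106 106 98 0",
    "106 98 96 0",
    "-6:1 11:0 16:-3",
])
fusion = parse_fusion_record(sample)
print(fusion["chroms"], fusion["positions"], fusion["spanning_reads"])
```

With records parsed this way, the awk filter corresponds to keeping the records for which fusion["spanning_reads"] > 100.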


tophat-fusion-post

  • result.html. A list of fusion candidates is given in HTML format. A sample list is found here.
  • result.txt. A text version of result.html.