News and updates
New releases and related tools will be announced through the Bowtie mailing list.
Langmead B, Trapnell C, Pop M, Salzberg SL. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biology 2009, 10:R25.
Kim D and Salzberg SL. TopHat-Fusion: an algorithm for discovery of novel fusion transcripts. Genome Biology 2011, 12:R72.
Kim D, Pertea G, Trapnell C, Pimentel H, Kelley R, Salzberg SL. TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions. Genome Biology 2013, 14:R36.
- Install quick-start
- Test the installation
- Preparing your reference
- Preparing your reads
- Running TopHat
- Examining your results
- bowtie2 (or bowtie)
- bowtie2-build (or bowtie-build)
- bowtie2-inspect (or bowtie-inspect)
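Assuming the list above names the Bowtie programs that TopHat needs to find on your PATH, a quick check along the following lines can confirm they are visible before you go further. The function name check_on_path is hypothetical and only for illustration; it is a minimal sketch, not part of TopHat or Bowtie.

```shell
#!/bin/sh
# Report which of the given programs cannot be found on PATH.
# Prints one "missing: NAME" line per absent program and returns
# non-zero if at least one program is missing.
check_on_path() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            missing=1
        fi
    done
    return $missing
}

# Check the Bowtie 2 programs; substitute bowtie, bowtie-build and
# bowtie-inspect if you use Bowtie 1 instead.
check_on_path bowtie2 bowtie2-build bowtie2-inspect ||
    echo "Install Bowtie 2 or adjust PATH before running TopHat"
```

If nothing is printed, all three programs were found.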
Installing a pre-compiled binary release
To make TopHat easy to install, we provide a few binary packages that save users from
the occasionally frustrating process of building TopHat themselves, which requires a suitable development environment and
the Boost libraries. To use a binary package, download the appropriate
one for your platform, unpack it, and make sure the TopHat binaries are in a directory listed in your PATH environment variable
(or create a symbolic link to the included tophat2 script somewhere in your PATH, as described below).
Note: to install and run this new version without overwriting a previous TopHat version already installed on your system, unpack the new version into a different directory from the old version. Then, instead of copying the new programs into a directory in your PATH, create a symbolic link in a directory in your shell's PATH that points to the tophat2 wrapper script in the new directory. For example, assuming the ~/bin directory is in your PATH and you unpack tophat-2.0.0.Linux_x86_64.tar.gz under your home directory:
cd
tar xvfz tophat-2.0.0.Linux_x86_64.tar.gz
cd ~/bin
ln -s ~/tophat-2.0.0.Linux_x86_64/tophat2 .
Now you can start the new version of TopHat with the tophat2 command, while the previous version, if present, can still be launched with the regular "tophat" command (assuming this is how you used it before).
Building TopHat from source
In order to build TopHat2 you must have the following installed on your system:
- the Boost C++ libraries
(we recommend version 1.47 or higher so you can use it for building Cufflinks as well)
Note: Starting with version 2.0.13 TopHat no longer requires the user to have the SAMtools library (libbam.a), headers or even the samtools program installed, as TopHat2 comes pre-packaged with a stable version of SAMtools which is known to work well with TopHat. Also, TopHat uses the SeqAn library that comes with the TopHat2 source code distribution. It is not necessary for the SeqAn library to be installed separately in order to run TopHat.
- Download a recent Boost source tarball, unpack it and cd to the newly unpacked Boost source directory.
- Prepare the build:
- Build Boost. Note that you can specify where to install Boost with the --prefix
option, which specifies the base path (prefix) where sub-directories ./include/ and ./lib/ will host the Boost
headers and library files respectively. The default Boost installation directory prefix is /usr/local.
Take note of this installation directory (if you specify your own) because you will need to provide it to the
--with-boost option of TopHat's ./configure script. Run the build and install command:
./bjam --prefix=<YOUR_BOOST_INSTALL_DIRECTORY> link=static \
runtime-link=static stage install
- Unpack the TopHat source tarball:
tar zxvf tophat-2.0.0.tar.gz
- Change to the TopHat directory:
- Configure TopHat using the ./configure script. If the Boost libraries were installed
somewhere other than under /usr/local, you will need to tell the installer where to find Boost
using the --with-boost option, specifying the base (prefix) install directory for the Boost library as discussed
in the Boost installation step above.
The --prefix option specifies where the TopHat programs will be installed; it should point to a base directory that
will end up with a ./bin/ subdirectory containing the new TopHat programs.
A common prefix choice is, again, /usr/local, which will cause the final tophat program and binaries to be installed in
the bin subdirectory of that prefix (/usr/local/bin).
Note: if you want to preserve (i.e. not overwrite) a previous TopHat installation and be able to use both the new version and the old version, you can specify a different prefix directory which doesn't even have to be in your shell's PATH (e.g. --prefix=/home/username/tophat_new/, which will install the new TopHat programs in /home/username/tophat_new/bin/). After installing the new TopHat, the wrapper script <tophat_prefix>/bin/tophat2 can be copied somewhere in your shell's PATH and can be used to launch this new version of TopHat (assuming the older version of TopHat is launched using the regular tophat script installed in a different directory). The tophat2 wrapper makes sure that the local execution PATH gives priority to the new binaries and it should not interfere with the TopHat programs from the previous installation.
./configure --prefix=/path/to/tophat_base_dir \
--with-boost=<YOUR_BOOST_INSTALL_DIRECTORY>
- Finally, make and install TopHat.
This will install tophat and its modules into the /path/to/tophat_base_dir/bin directory.
You may want to add that directory to your shell's PATH if it's not there already. Alternatively,
as mentioned above, you can copy the tophat2 wrapper script from /path/to/tophat_base_dir/bin
into a directory that is in your shell's PATH (this is especially useful if an older tophat
script from a previous TopHat installation is already in your shell's PATH).
If you choose this alternative, make sure you use the tophat2 command instead of tophat
for the rest of this tutorial.
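The source-build steps can be sketched as a small dry-run script. Several details here are assumptions rather than statements from this page: ./bootstrap.sh is Boost's usual preparation script, the unpacked directory is assumed to be named tophat-2.0.0 after the tarball, and make followed by make install is the conventional sequence after ./configure. The helper names run, build_boost and build_tophat are hypothetical. By default the script only prints the commands; set DRY_RUN=0 to execute them.

```shell
#!/bin/sh
# Dry-run helper: print each command, and execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    fi
}

# Build and install Boost; call this from inside the unpacked Boost
# source directory. $1 = Boost install prefix.
build_boost() {
    # Assumption: bootstrap.sh is the "prepare the build" step.
    run ./bootstrap.sh &&
    run ./bjam --prefix="$1" link=static runtime-link=static stage install
}

# Unpack, configure, build and install TopHat.
# $1 = TopHat install prefix, $2 = Boost install prefix.
build_tophat() {
    run tar zxvf tophat-2.0.0.tar.gz &&
    # Assumption: the tarball unpacks into a directory named tophat-2.0.0.
    run cd tophat-2.0.0 &&
    run ./configure --prefix="$1" --with-boost="$2" &&
    run make &&
    run make install
}

# Preview the TopHat build commands for the example prefix used above,
# with Boost assumed to be in its default location.
build_tophat /path/to/tophat_base_dir /usr/local
```

Chaining the steps with && stops the build at the first command that fails, so a configure error is not followed by a pointless make.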
Testing the installation
After you have installed the Bowtie/Bowtie2 and TopHat programs, you can test the pipeline on a simple test data set, which you can download here. This data is not meant to exhaustively test all the features of TopHat; it only verifies that the installation worked. Unpack the data, change to the test_data directory, and then run tophat:
tar zxvf test_data.tar.gz
cd test_data
tophat -r 20 test_ref reads_1.fq reads_2.fq
If TopHat ran successfully, you should see some lines of output, like this:
[Mon May 4 11:07:23 2009] Beginning TopHat run (v1.1.1)
-----------------------------------------------
[Mon May 4 11:07:23 2009] Preparing output location ./tophat_out/
[Mon May 4 11:07:23 2009] Checking for Bowtie index files
[Mon May 4 11:07:23 2009] Checking for reference FASTA file
[Mon May 4 11:07:23 2009] Checking for Bowtie
Bowtie version: 0.9.9.1
[Mon May 4 11:07:23 2009] Checking reads
seed length: 75bp
format: fastq
quality scale: phred
Splitting reads into 3 segments
[Mon May 4 11:07:23 2009] Mapping reads against test_ref with Bowtie
[Mon May 4 11:07:24 2009] Mapping reads against test_ref with Bowtie
[Mon May 4 11:07:24 2009] Mapping reads against test_ref with Bowtie
Splitting reads into 3 segments
[Mon May 4 11:07:24 2009] Mapping reads against test_ref with Bowtie
[Mon May 4 11:07:24 2009] Mapping reads against test_ref with Bowtie
[Mon May 4 11:07:24 2009] Mapping reads against test_ref with Bowtie
[Mon May 4 11:07:24 2009] Searching for junctions via coverage islands
[Mon May 4 11:07:24 2009] Searching for junctions via mate-pair closures
[Mon May 4 11:07:24 2009] Retrieving sequences for splices
[Mon May 4 11:07:24 2009] Indexing splices
[Mon May 4 11:07:24 2009] Mapping reads against segment_juncs with Bowtie
[Mon May 4 11:07:24 2009] Mapping reads against segment_juncs with Bowtie
[Mon May 4 11:07:24 2009] Mapping reads against segment_juncs with Bowtie
[Mon May 4 11:07:24 2009] Joining segment hits
[Mon May 4 11:07:24 2009] Mapping reads against segment_juncs with Bowtie
[Mon May 4 11:07:24 2009] Mapping reads against segment_juncs with Bowtie
[Mon May 4 11:07:24 2009] Mapping reads against segment_juncs with Bowtie
[Mon May 4 11:07:24 2009] Joining segment hits
[Mon May 4 11:07:24 2009] Reporting output tracks
-----------------------------------------------
Run complete [00:00:00 elapsed]
The tophat_out directory should now contain a file named junctions.bed. This file should contain a pair of junctions on the reference sequence "test_chromosome".
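One way to confirm the result is to count the junction records in junctions.bed. The sketch below assumes the file follows the usual BED convention, in which header lines begin with "track", "browser" or "#"; the function name count_junctions is hypothetical.

```shell
#!/bin/sh
# Count junction records in a BED file, skipping BED header lines.
count_junctions() {
    grep -v -c -E '^(track|browser|#)' "$1"
}

bed=tophat_out/junctions.bed
if [ -f "$bed" ]; then
    # grep exits non-zero when the count is 0, so fall back explicitly.
    n=$(count_junctions "$bed") || n=0
    echo "$bed contains $n junction(s)"
else
    echo "$bed not found; run the tophat test command first"
fi
```

For the test data set, the reported count should match the pair of junctions mentioned above.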
Preparing your reference
To find junctions with TopHat, you'll first need to install a Bowtie index for the organism in your RNA-Seq experiment. The Bowtie site provides pre-built indexes for human, mouse, fruit fly, and others. If there's no index for your organism, it's easy to build one yourself. If you have Bowtie 2 installed and want to use it with TopHat v2.0 or later, you must create Bowtie 2 indexes for your data (using bowtie2-build).
TopHat also requires a fasta file (.fa) for your reference. If this file is not found alongside the other index files, the program will use the Bowtie index you give it to build this file and save it to the output directory. This step can take up to an hour for a human-sized genome. To skip this step in future runs, you can move the fasta file from the tophat_out directory to the directory containing the Bowtie index files.
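As a rough illustration of the two points above, the following sketch builds a Bowtie 2 index from a reference FASTA file and keeps a copy of the FASTA next to the index files. It assumes TopHat looks for a file named after the index prefix with a .fa extension; the function names and the example file names are hypothetical.

```shell
#!/bin/sh
# Build a Bowtie 2 index and place the reference FASTA alongside it.
# Usage: prepare_reference genome.fa /path/to/index_prefix
prepare_reference() {
    fasta=$1
    prefix=$2
    # bowtie2-build takes the reference FASTA and the index prefix.
    bowtie2-build "$fasta" "$prefix" || return 1
    # Assumption: TopHat expects <prefix>.fa next to the index files.
    if [ ! -e "$prefix.fa" ]; then
        cp "$fasta" "$prefix.fa"
    fi
}

# Succeeds if the FASTA file for this index prefix is already in place.
has_reference_fasta() {
    [ -f "$1.fa" ]
}

# Example call (commented out because the paths are placeholders):
# prepare_reference my_genome.fa /path/to/my_genome
```

With the FASTA file in place, TopHat should not need to rebuild it from the index on each run.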
Preparing your reads
TopHat currently accepts reads in FASTA or FASTQ format, though FASTQ is recommended. You may need to convert your reads from another format to one of these. Maq's fq_all2std.pl converts many formats into FASTQ. For reads in SRF format, we recommend using the tools bundled with the Staden io_lib package.
Note: TopHat does not support mixing FASTA and FASTQ reads in the same input file, so don't run TopHat on FASTQ and FASTA files in the same run.
Running TopHat
TopHat will map your reads first by running Bowtie to identify places where reads map end to end. Since your reads came from spliced transcripts in an RNA-Seq experiment, Bowtie will identify "islands" in your reference genome where reads piled up. Many of these islands will be exons.
TopHat will then run a program to find splice junctions using the reads that did not get mapped to an island. So to identify junctions, you do not need to run Bowtie yourself, as TopHat will do it for you.
TopHat needs you to specify a path to the index files and an input file containing your reads. The first argument should be the full path to the directory containing the index plus the prefix of the index files. To start the TopHat pipeline, enter the command:
tophat /path/to/h_sapiens reads1.fq,reads2.fq,reads3.fq
Be sure to check out the TopHat manual, as the pipeline has a few options you might want to use to get better results or get them more quickly.
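When there are many read files, the comma-separated argument shown above can be assembled by a small helper. This is only a sketch: join_reads is a hypothetical function, and the script prints the resulting command instead of running it, so you can inspect it first.

```shell
#!/bin/sh
# Join file names with commas, matching the comma-separated list of
# read files used in the tophat command above.
join_reads() {
    # Inside the subshell, "$*" joins the arguments with the first
    # character of IFS, which is set to a comma here.
    (IFS=,; printf '%s\n' "$*")
}

index=/path/to/h_sapiens    # index directory plus index prefix
reads=$(join_reads reads1.fq reads2.fq reads3.fq)

# Print the command; remove "echo" to actually launch TopHat.
echo tophat "$index" "$reads"
# prints: tophat /path/to/h_sapiens reads1.fq,reads2.fq,reads3.fq
```

Running the join in a subshell keeps the IFS change from leaking into the rest of the script.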
Examining your output
TopHat produces several files of output. Because TopHat reports output in widely adopted formats, you can import it directly into a number of genome browsers and data viewers, including IGV, IGB, and the UCSC genome browser. TopHat can be run on free servers through Galaxy, which also provides a web-based genome/track browser for mapped reads produced from TopHat.