Overview
StringTie is a fast and highly efficient assembler of RNA-Seq alignments into potential transcripts. It uses a novel network flow algorithm as well as an optional de novo assembly step to assemble and quantitate full-length transcripts representing multiple splice variants for each gene locus. Its input can include not only alignments of short reads that can also be used by other transcript assemblers, but also alignments of longer sequences that have been assembled from those reads. In order to identify differentially expressed genes between experiments, StringTie's output can be processed by specialized software like Ballgown, Cuffdiff or other programs (DESeq2, edgeR, etc.).
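As a rough illustration of how assembly feeds into downstream expression analysis, the sketch below prints (rather than runs) the commands for a small assemble, merge, and re-estimate workflow. It uses the -G, --merge and -e options that appear in the release notes on this page; the -o output option, the sample names, and all file names are assumptions made for the example, not something this page prescribes.

```shell
#!/bin/sh
# Dry-run sketch: prints StringTie commands for a two-sample workflow
# instead of executing them. All file names are hypothetical placeholders.

REF=annotation.gtf    # hypothetical reference annotation file

# Step 1: assemble transcripts for one sample, guided by the annotation (-G).
assemble_cmd() {
    echo "stringtie $1.bam -G $REF -o $1.gtf"
}

# Step 2: merge the per-sample assemblies into one transcript set (--merge).
merge_cmd() {
    echo "stringtie --merge -G $REF -o merged.gtf $*"
}

# Step 3: re-estimate abundances for the merged transcripts only (-e).
estimate_cmd() {
    echo "stringtie -e -G merged.gtf -o $1.est.gtf $1.bam"
}

for s in sampleA sampleB; do assemble_cmd "$s"; done
merge_cmd sampleA.gtf sampleB.gtf
for s in sampleA sampleB; do estimate_cmd "$s"; done
```

The final -e step is the one whose output downstream tools typically consume, since it limits the reported transcripts to those in the file given with -G.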
News
- 12/3/2021 - v2.2.0 release highlights:
- the --mix option allows StringTie to take both short and long read alignments; when this option is used, the 2nd BAM (or CRAM) file in the command line must be a long reads alignment file (the 1st being the short-reads alignment file); the -L option should not be used in this case.
- improved transcriptome assembly on mixed data with annotation
- added support for CRAM input files, as StringTie is now built using HTSlib
- 3/9/2021 - v2.1.5 release changes:
- adjusted the calculation of coverage for long reads assembly
- new trimming procedure implemented
- 7/7/2020 - v2.1.4 release is a maintenance release with the following fixes:
- fixed an issue with --merge sometimes incorrectly processing the order of the reference sequences
- fixed an issue with -e sometimes entering an infinite loop
- fixed a compatibility issue between long transfrag and path assembled so far
- small bug fix in computing long read assemblies
- 5/12/2020 - v2.1.3 release new features and fixes:
- added the --viral option for long reads from viral data where splice sites do not follow consensus
- adjustments to the assembly of long read alignment data
- fixed an occasional issue with the --merge option
- made the -e option compatible with long reads (-L option)
- 4/21/2020 - v2.1.2 release new features to improve transcriptome assembly:
- introduced an option to load point features from a tab-delimited text file
- more sensitive assembly of transcripts from long read data
- 1/29/2020 - v2.1.1 release is a maintenance release.
- fixed an issue found in the trimming procedure
- 1/25/2020 - v2.1.0 release brings improvements, corrections and new features.
- more accurate assembly of long RNA reads from complex genomes with many overlapping genes
- improved start/end trimming for long read alignments
- new -R option for long read alignments (-L alternative) to output unassembled, cleaned, and non-redundant long read alignments (collapsing similar long read alignments at the same location).
- (experimental) verify consensus splice sites if genomic sequence is provided with -g
- 12/16/2019 - v2.0.6 release is a maintenance StringTie release fixing a regression bug (occasional crash/instability) affecting the previous release.
- 10/29/2019 - v2.0.4 release is a maintenance StringTie release:
- fixed an occasional instability problem occurring in some situations when long read alignments (-L option) were used in conjunction with guides (reference annotation, -G option).
- fixed a problem with the -e option sometimes adding newly assembled transcripts to the output that were not present in the -G reference transcripts file; this caused the prepDE.py script to fail in those cases.
- added a few more input checks in prepDE.py to verify if the GTF files being processed were generated as expected.
- 7/30/2019 - v2.0 release is a major StringTie release adding new features and improvements.
- added support for long read alignments (enabled with -L option)
- added a new "super-reads" module (found in the SuperReads_RNA directory of the source distribution) which can be used to perform de novo assembly and alignment of RNA-Seq reads, preparing them for assembly with StringTie
- overall improved handling of read alignments and their transcription strand assignment
- 5/2/2019 - v1.3.6 release is a maintenance build addressing a few issues found since the previous release:
- fixing a GFF/GTF sorting issue causing occasional errors when the --merge option was used
- addressing a float precision problem causing negative coverage/TPM/FPKM values in some cases
- various GFF/GTF parsing adjustments improving support for some reference annotation sources
- See entire release history here.
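The input ordering required by --mix in the v2.2.0 notes above is easy to get wrong, so a small wrapper can help. The following is a minimal sketch, not part of StringTie: mix_cmd is a hypothetical helper that only prints a command line, placing the short-read file first and the long-read file second, and rejecting -L. The -o output option and the file names are assumptions for illustration.

```shell
#!/bin/sh
# Sketch: compose a --mix command line with the argument order described
# in the v2.2.0 notes. The command is printed, not executed.

# mix_cmd SHORT_ALN LONG_ALN OUT_GTF [extra options...]
mix_cmd() {
    if [ "$#" -lt 3 ]; then
        echo "usage: mix_cmd SHORT_ALN LONG_ALN OUT_GTF [options]" >&2
        return 2
    fi
    short_aln=$1
    long_aln=$2
    out_gtf=$3
    shift 3

    extra=""
    for opt in "$@"; do
        # -L should not be combined with --mix, so reject it up front.
        if [ "$opt" = "-L" ]; then
            echo "error: -L should not be used together with --mix" >&2
            return 1
        fi
        extra="$extra $opt"
    done

    # Short-read alignments first, long-read alignments second.
    echo "stringtie --mix$extra -o $out_gtf $short_aln $long_aln"
}

mix_cmd short_reads.bam long_reads.bam mixed.gtf
```

Extra options such as a -G annotation can be appended after the three positional arguments, for example mix_cmd short_reads.bam long_reads.bam mixed.gtf -G annotation.gtf.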
Obtaining and installing StringTie
The current version of StringTie can be downloaded as a precompiled binary or as a source package:
- stringtie-2.2.3.tar.gz : source package
- stringtie-2.2.3.Linux_x86_64.tar.gz : Linux x86_64 binary package
- stringtie-2.2.3.OSX_x86_64.tar.gz : Apple OS X binary package (for OS X v10.7 and above)
In order to build and install StringTie from the source package the following steps can be taken:
- Unpack the downloaded StringTie source archive in a directory of your choice, e.g.:
    cd ~/src/
    tar xvfz ~/Downloads/stringtie-VER.tar.gz
  A directory called stringtie-VER (where VER is the current numeric version of the program) will be created in the current directory.
- Change directory and build the stringtie executable:
    cd stringtie-VER
    make release
- Optionally, the stringtie executable can be copied to one of the shell's PATH directories for easy access.
Alternatively, the source tree can be downloaded from GitHub and built in a similar fashion:
    git clone https://github.com/gpertea/stringtie
    cd stringtie
    make release
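The source build steps above can be collected into a small script. This is a sketch under stated assumptions: the archive is assumed to sit in ~/Downloads and to be unpacked under ~/src, as in the example paths above, and the run helper and DRY_RUN switch are hypothetical conveniences added here so the steps can be previewed before anything is executed.

```shell
#!/bin/sh
# Sketch of the source build steps. With DRY_RUN=1 (the default here) the
# commands are only printed; set DRY_RUN=0 to actually execute them.

VER=${VER:-2.2.3}                                          # version to build
ARCHIVE=${ARCHIVE:-$HOME/Downloads/stringtie-$VER.tar.gz}  # assumed download location
SRC_DIR=${SRC_DIR:-$HOME/src}                              # assumed unpack directory
DRY_RUN=${DRY_RUN:-1}

# Print the command in dry-run mode, otherwise execute it in this shell.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Unpack the archive, enter the source directory, and build the executable.
build_stringtie() {
    run cd "$SRC_DIR" &&
    run tar xvfz "$ARCHIVE" &&
    run cd "stringtie-$VER" &&
    run make release
}

build_stringtie
```

Running the script as-is only lists the four commands; running it with DRY_RUN=0 in the environment performs the build, stopping at the first step that fails.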
For evaluating and further processing the GTF output of StringTie, the utility gffcompare can be downloaded from the GFF utilities page.
Licensing and Contact Information
StringTie is free, open source software released under an MIT License.
You can contact us about StringTie at:
mpertea jhu edu
For technical issues, bug reports and code contributions please use StringTie's GitHub repository.
Publications
Shumate A, Wong B, Pertea G, Pertea M. Improved transcriptome assembly using a hybrid of long and short reads with StringTie. PLOS Computational Biology 18, 6 (2022), doi:10.1371/journal.pcbi.1009730
Kovaka S, Zimin AV, Pertea GM, Razaghi R, Salzberg SL, Pertea M. Transcriptome assembly from long-read RNA-seq alignments with StringTie2. Genome Biology 20, 278 (2019), doi:10.1186/s13059-019-1910-1
Pertea M, Kim D, Pertea GM, Leek JT, Salzberg SL. Transcript-level expression analysis of RNA-seq experiments with HISAT, StringTie and Ballgown. Nature Protocols 11, 1650-1667 (2016), doi:10.1038/nprot.2016.095
Pertea M, Pertea GM, Antonescu CM, Chang TC, Mendell JT, Salzberg SL. StringTie enables improved reconstruction of a transcriptome from RNA-seq reads. Nature Biotechnology (2015), doi:10.1038/nbt.3122