Release history

  • 12/3/2021 - v2.2.0 release highlights:
    • the --mix option allows StringTie to take both short-read and long-read alignments; when this option is used, the second BAM (or CRAM) file on the command line must be the long-read alignment file (the first being the short-read alignment file); the -L option should not be used in this case.
    • improved transcriptome assembly on mixed data with annotation
    • added support for CRAM input files, as StringTie is now built using HTSlib
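The argument order that --mix requires can be sketched as a small Python helper that only builds the command line. This is an illustrative sketch, not part of StringTie: the function name mix_command and all file names are hypothetical, and only options named in this release history (--mix, -L, -G, -o) are used.

```python
from typing import List, Optional


def mix_command(short_bam: str, long_bam: str, out_gtf: str,
                guide: Optional[str] = None,
                extra: Optional[List[str]] = None) -> List[str]:
    """Build an argv list for a mixed short/long read StringTie run.

    Hypothetical helper: it only assembles arguments and runs nothing.
    """
    extra = list(extra or [])
    # The release notes say -L should not be combined with --mix.
    if "-L" in extra:
        raise ValueError("-L should not be used together with --mix")
    cmd = ["stringtie", "--mix", "-o", out_gtf]
    if guide is not None:
        cmd += ["-G", guide]  # optional reference annotation
    cmd += extra
    # Order matters: short-read alignments first, long-read alignments second.
    cmd += [short_bam, long_bam]
    return cmd


if __name__ == "__main__":
    print(" ".join(mix_command("short.bam", "long.bam", "mixed.gtf")))
```

Keeping the two alignment files as the last positional arguments makes the required short-then-long order hard to get wrong when other options are added.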
  • 3/9/2021 - v2.1.5 release changes:
    • adjusted the calculation of coverage for long reads assembly
    • new trimming procedure implemented
  • 7/7/2020 - v2.1.4 release is a maintenance release with the following fixes:
    • fixed an issue with --merge sometimes incorrectly processing the order of the reference sequences
    • fixed an issue where the -e option sometimes caused an infinite loop
    • fixed a compatibility issue between a long transfrag and the path assembled so far
    • small bug fix in computing long read assemblies
  • 5/16/2020 - v2.1.3b release new features and fixes
    • added the --viral option for long reads from viral data where splice sites do not follow consensus
    • adjustments to the assembly of long read alignment data
    • fixed an occasional issue with the --merge option
    • made the -e option compatible with long reads (-L option)
  • 4/21/2020 - v2.1.2 release new features to improve transcriptome assembly:
    • introduced an option to load point features from a tab-delimited text
    • more sensitive assembly of transcripts from long read data
  • 1/29/2020 - v2.1.1 release is a maintenance release.
    • fixed an issue found in the trimming procedure
  • 1/25/2020 - v2.1.0 release brings improvements, corrections and new features.
    • improved start/end trimming for long read alignments
    • new -R option for long read alignments (-L alternative) to output unassembled, cleaned, and non-redundant long read alignments (collapsing similar long reads alignments at the same location).
    • (experimental) verify consensus splice sites if genomic sequence is provided with -g
  • 12/16/2019 - v2.0.6 release is a maintenance StringTie release fixing a regression bug (occasional crash/instability) affecting the previous release.
  • 10/29/2019 - v2.0.4 release is a maintenance StringTie release:
    • fixed an occasional instability problem occurring in some situations when long read alignments (-L option) were used in conjunction with guides (reference annotation, -G option).
    • fixed a problem with -e option sometimes adding newly assembled transcripts to the output, that were not present in the -G reference transcripts file; this caused the prepDE.py script to fail in those cases.
    • added a few more input checks in prepDE.py to verify if the GTF files being processed were generated as expected.
  • 7/30/2019 - v2.0 release is a major StringTie release adding new features and improvements.
    • added support for long read alignments (enabled with -L option)
    • added a new "super-reads" module (found in the SuperReads_RNA directory of the source distribution) which can be used to perform de-novo assembly and alignment of RNA-Seq reads preparing them for assembly with StringTie
    • overall improved handling of read alignments and their transcription strand assignment
  • 5/2/2019 - v1.3.6 release is a maintenance build addressing a few issues found since the previous release:
    • fixing a GFF/GTF sorting issue causing occasional errors when the --merge option was used
    • addressing a float precision problem causing negative coverage/TPM/FPKM values in some cases
    • various GFF/GTF parsing adjustments improving support for some reference annotation sources
  • 11/2/2018 - v1.3.5 release is a maintenance release providing a few minor changes and additions:
    • spliced alignments produced by minimap2 (in SAM format) are now supported; there is no need to pre-process them in order to add the XS tag, as the "ts" tag is now recognized as an alternative. Note: sorting of the SAM/BAM file is still required!
    • the default value for the -M option (maximum multi-mapping fraction) is now set to 1.0, such that transcripts assembled from only multi-mapped reads are no longer excluded (e.g. in case of multiple gene copies).
    • read alignments not having a transcription strand assigned (generally unspliced mappings) can now be automatically assigned the strand of the overlapping reference (guide) transcript, if any such overlap exists.
  • 2/25/2018 - v1.3.4 release
    • GTF/GFF parser adjustments and fixes; please note that gene features without explicit exons (and not parenting any transcripts) are no longer taken as single-exon transcripts
    • -v log output now includes more detailed bundle information
    • minor performance optimizations
  • 2/15/2017 - v1.3.3 release
    • fixed the computation of the TPM value, which was sometimes reported incorrectly in the presence of guides (-G option)
    • a few other minor fixes
  • 1/24/2017 - v1.3.2 release (currently v1.3.2d)
    • added options --fr and --rf to support stranded libraries
    • fixed a few cases where StringTie might crash due to particular input data
    • a few corrections and improvements for GFF parsing of reference annotation files
  • 11/12/2016 - v1.3.1 release
    • fixed an incorrect abundance estimation in some cases where reference transcripts were joined by bridging read alignments
    • fixed a duplication of some output transcripts in -e mode (abundance estimation only)
    • corrected negative coverage values which appeared due to rounding errors for some low coverage isoforms
    • fixed a Gencode GTF gene parsing issue (for older Gencode releases)
  • 9/7/2016 - v1.3.0 release
    • an improved abundance estimation method has been implemented
    • fixed a parsing issue for UCSC annotation files (GTF with improper gene_id values) which could have caused StringTie to crash in some cases
  • 8/11/2016 - The HISAT, StringTie and Ballgown protocol paper is published in Nature Protocols 11, 1650-1667 (2016) | doi:10.1038/nprot.2016.095
  • 7/25/2016 - v1.2.4 release
    This release provides a fix for a stability issue reported for v1.2.3, addresses a couple of GTF/GFF parsing problems, and implements some minor corrections in the output files.

  • 7/21/2016 - using DESeq2 and edgeR with StringTie's output
    Thanks to a productive summer internship of high school student David Miller, there is now a script available for converting the output of StringTie into read count tables (at gene and transcript levels), which can then be imported into packages like DESeq2 or edgeR in order to estimate differential expression for genes and transcripts. Please check the new manual section for this DESeq2 and edgeR integration.
  • 5/12/2016 - v1.2.3 release
    • StringTie now accepts SAM records (or BAM, with the new --bam option) streamed directly at stdin (when the input file is given as -), which allows easier integration with on-the-fly alignment filtering and processing tools generating SAM/BAM output; however such input must still be sorted by coordinate. This feature allows uses such as:
      • running StringTie only on a specific subset of alignments (a region) from an indexed BAM file by piping samtools view output into StringTie
      • running StringTie on alignments stored in highly compressed formats like CRAM by converting them on the fly to SAM/BAM records
    • StringTie now makes sure that the thread stack size is set to at least 8MB, thus preventing a stack overflow crash on systems configured with lower default stack size (e.g. Mac OS X systems).
    • fixed a bit vector allocation issue causing a crash on some data sets.
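The stdin input introduced in v1.2.3 can be illustrated with a short Python sketch that pipes one region of an indexed BAM file from samtools view into StringTie. This is a sketch under stated assumptions, not an official recipe: the helper names and file names are hypothetical, samtools and stringtie are assumed to be on the PATH, and the alignments are streamed as SAM with the header kept (-h).

```python
import subprocess
from typing import List, Tuple


def region_pipeline(bam: str, region: str, out_gtf: str) -> Tuple[List[str], List[str]]:
    """Return (samtools argv, stringtie argv) for assembling one region."""
    # -h keeps the SAM header in the streamed output.
    producer = ["samtools", "view", "-h", bam, region]
    # "-" tells StringTie to read the alignments from stdin.
    consumer = ["stringtie", "-", "-o", out_gtf]
    return producer, consumer


def run_pipeline(producer: List[str], consumer: List[str]) -> int:
    """Pipe producer stdout into consumer stdin; return the consumer's exit code."""
    with subprocess.Popen(producer, stdout=subprocess.PIPE) as src:
        with subprocess.Popen(consumer, stdin=src.stdout) as dst:
            # Drop our copy of the pipe so the producer stops if the consumer exits.
            src.stdout.close()
            dst.wait()
        src.wait()
    return dst.returncode
```

A region extracted from a coordinate-sorted, indexed BAM file stays coordinate-sorted, which satisfies the sorting requirement stated above.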
  • 2/18/2016 - v1.2.2 release
    Note: this version had a minor update and the download packages were rebuilt on 2/19/2016 after fixing a typo which slightly changed the output for the --merge mode when the -p option was used.
    • for -e ("estimate only") option, all the reference transcripts are now reported in the output GTF (with 0.0 coverage/FPKM/TPM if they were not found to be covered by any reads).
    • similarly, for the -A option, all the input reference genes are now reported in the gene abundance file (with 0.0 Coverage/FPKM/TPM for those not expressed in the sample).
    • minor fix in the estimation algorithm for partially covered transcripts.
    • slight adjustment of the columns reported in the -A gene abundance file (see header).
    • optimized memory usage and improved results for --merge mode in some cases with many input GTF files provided (hundreds).
    • --merge mode has more aggressive filters set for transcripts that are unlikely to be real such as low covered single exon intronic transcripts.
    • a new option -i is now available in --merge mode to allow keeping transcripts with retained introns (by default these transcripts are only reported when there is strong evidence - high expression coverage - to support them).
  • 1/14/2016 - v1.2.1 release
    • fixed a gene and transcript numbering issue that affected the output in some cases.
  • 1/3/2016 - v1.2.0 release
    • new feature: a "transcripts merge" usage mode of StringTie which is triggered by the new --merge option; with this option StringTie expects as input a list of GTF files and merges/assembles all these transcripts into a non-redundant set of transcripts. This performs a function similar to the CuffMerge script in the Cufflinks/Tuxedo suite -- please see the updated protocol in the manual.
    • an improved "estimation only" usage mode (-e option); in this mode only, StringTie now assigns reads to partially covered reference transcripts as well, therefore relaxing the previous requirement that reference transcripts need to be covered end-to-end in order to be found as expressed.
    • improvements in dealing with abnormally large bundles by filtering of likely spurious spliced alignments produced by some aligners (e.g. STAR); note that HISAT2, used with the --dta option, is now the recommended aligner to use for StringTie.
  • 11/9/2015 - v1.1.2 release
    • fixed a bug for the case when a reference transcript had a one bp overlap with a bundle of reads.
  • 10/25/2015 - v1.1.1 release
    • fixed a junction management issue which could have caused StringTie to crash in rare cases of very low coverage.
  • 10/20/2015 - minor update of v1.1.0
    • new option --version simply returns the version string at stdout
    • options --version, --help and -h send their output to stdout and exit with a 0 code.
  • 10/19/2015 - v1.1.0 release
    This StringTie release includes the following updates:
    • major memory usage improvements due to: changes of internal data structures, collapsing reads aligned in the same place and filtering of spurious spliced alignments within large bundles -- most RNA-seq data samples use much less than 1 GB of memory now.
    • TPM is now also reported for transcripts and genes (besides FPKM)
    • new -A option provides gene abundance estimates in a separate file
    • -s option which sets the coverage saturation is deprecated starting at this version
    • modifying the tag HI in the BAM alignment file to start at 0 (as previously required with STAR produced alignment files) is no longer needed starting at this version
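For readers comparing the two abundance measures reported since v1.1.0, the sketch below shows the standard relation between FPKM and TPM within a single sample: each FPKM value is rescaled so that the values sum to one million. It illustrates the general definition only and is not taken from StringTie's source code; the function name and the transcript IDs in the usage note are hypothetical.

```python
from typing import Dict


def fpkm_to_tpm(fpkm: Dict[str, float]) -> Dict[str, float]:
    """Rescale per-transcript FPKM values from one sample into TPM."""
    total = sum(fpkm.values())
    if total == 0:
        # Nothing is expressed, so every TPM is zero as well.
        return {tid: 0.0 for tid in fpkm}
    # TPM_i = FPKM_i / sum_j(FPKM_j) * 1e6
    return {tid: value / total * 1e6 for tid, value in fpkm.items()}
```

For example, fpkm_to_tpm({"t1": 1.0, "t2": 3.0}) assigns one quarter of the million to t1 and three quarters to t2.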
  • 5/18/2015 - v1.0.4 release
    This StringTie release includes the following updates:
    • fixed a strand assignment issue that was causing loss of strand information and the duplication of some transcript assemblies
    • improved coverage estimation for single exon transcripts partially overlapped by read alignments from a neighboring transcript
    • improved coverage estimation for overlapping transcripts on opposite strands
    • improved strand assignment to read alignments overlapping reference transcripts
    • added the -x option that can be used to instruct StringTie to not perform any transcript assembly on one or more reference sequences that are of no interest for the RNA-Seq analysis (e.g. one could use -x chrM if there is no interest in mitochondrial gene expression)
    • addressed a GFF parsing issue encountered for some annotation files (e.g. TAIR)
  • 4/3/2015 - v1.0.3 release
    This StringTie release includes the following updates:
    • fixed an output problem for the -B/-b option (Ballgown data) which caused e2t.ctab and i2t.ctab files to not provide the complete structure for some transcripts
    • fixed a memory de/allocation issue which could have caused StringTie to crash on some systems
    • now StringTie warns if the given annotation file does not provide any reference transcripts annotated on the genomic sequences on which the reads were mapped (naming convention mismatch)
    • added ref_gene_id and ref_gene_name attributes to the assembled transcripts if the corresponding gene annotation is available in the reference annotation file, for assembled transcripts which fully match a reference transcript
  • 3/11/2015 - v1.0.2 release
    This StringTie release includes the following changes:
    • fixed a linking issue on Ubuntu systems
    • fixed compilation errors with llvm (which also broke compilation on recent OS X versions)
    • removed the repeated warnings about transcript_id missing from Ensembl GTF non-transcript lines
    • now StringTie checks if the given read alignments are sorted by coordinate
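What a coordinate-sort check amounts to can be sketched in a few lines of Python. This is only an illustration of the idea, assuming each alignment is reduced to a (reference index, start position) pair with reference indices following the order of the file header; it is not StringTie's actual implementation, and the function name is hypothetical.

```python
from typing import Iterable, Optional, Tuple


def is_coordinate_sorted(records: Iterable[Tuple[int, int]]) -> bool:
    """Return True if (reference_index, start) pairs never go backwards."""
    previous: Optional[Tuple[int, int]] = None
    for record in records:
        # Tuples compare by reference index first, then by start position.
        if previous is not None and record < previous:
            return False
        previous = record
    return True
```

Equal consecutive positions are accepted, since several reads may start at the same coordinate.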
  • 2/21/2015 - v1.0.1 release
    This StringTie release mainly provides corrections and improvements for the Ballgown *.ctab files. The following changes were implemented:
    • if the -o option includes a directory path, StringTie will attempt to create all the directories which do not exist in the specified path.
    • the functionality of the -B option has changed now to only be a command line switch (i.e. no argument is expected), which will simply enable the creation of the Ballgown table files in the same directory as the one provided with the -o option.
    • the -b option was added as a variant of -B which simply allows the creation of the Ballgown table files in a different directory.
    • the new -e option can direct StringTie to skip the processing of loci and transcripts which do not overlap any of the reference transcripts provided with -G. This option is recommended when StringTie is used simply for estimating the abundance of the provided reference transcripts and for generating Ballgown table files (-B/-b option).
    • the -S option is being phased out and it is no longer maintained.
  • 2/18/2015 - StringTie paper published
    The StringTie paper is published online in Nature Biotechnology.
  • 1/17/2015 - v1.0.0 release
    This release improves the "guided" assembly procedure, which is called when reference annotation is provided with the -G option. StringTie takes a conservative approach to using gene and transcript annotations: it only predicts the presence of transcripts whose introns are each supported by at least one spliced read alignment. (Some competing methods, by contrast, output all transcripts in the annotation regardless of the supporting read alignments.)
  • 11/25/2014 - v0.99 release
    This release adds the -B option which provides support for differential expression analysis using Ballgown, a system developed by Alyssa Frazee and Jeff Leek. This option instructs StringTie to generate the *.ctab files with coverage data for the provided reference/merged transcripts, which can then be loaded and analyzed with Ballgown. Differential expression can also be done as before, using the Cuffdiff2 system from Cole Trapnell and colleagues.
  • 10/22/2014 - 0.98 release
    This release attempts to address some cases where possibly spurious read alignments might cause excessive memory usage.
    • a new filter is introduced in order to discard very long introns that are not supported by coverage
    • paired reads that are not reachable through the splice graph are now treated as unpaired
    • there is now an upper limit on the number of transfrags to be processed at the same time
  • 10/1/2014 - 0.97 update
    Changed the default value for the -c parameter, which controls the minimum read-per-base coverage, to 2.5.
  • 5/22/2014 - 0.97 release
    This release introduces a parameter (-S) for a more sensitive run of StringTie. Although StringTie was optimized on many simulated and real data sets to achieve the highest possible sensitivity while still maintaining high precision, one might be interested in exploring more transcripts expressed at lower levels in the data. One way to achieve this is by using the -S parameter, and/or the -f parameter, which adjusts the minimum expression of the lower expressed transcripts as a fraction of the most abundant transcript at the same locus.
    A higher precision can be obtained by using the -c parameter, which controls the minimum read-per-base coverage for the assembled transcripts.
  • 4/10/2014 - 0.96 release
    First public release of StringTie.

