About CIDANE
CIDANE is a novel framework for genome-based transcript reconstruction and quantification. CIDANE is engineered not only to assemble RNA-seq reads ab initio, but also to make use of the growing annotation of known splice sites, transcription start and end sites, or even full-length transcripts, available for most model organisms. To some extent, CIDANE is able to recover splice junctions that are invisible to existing bioinformatics tools.
News
- CIDANE is on Bitbucket - 12/07/15
- CIDANE's source code is hosted on Bitbucket here.
- CIDANE 0.0.1 released - 11/05/14
- This first public release of CIDANE is described in our manuscript submitted for publication, and is now available for download here.
Getting and installing CIDANE
This software is free, open source software released under the GNU General Public License. CIDANE has been developed and tested on a Linux x86_64 system. The latest version of CIDANE can be downloaded here. Before building CIDANE, make sure that the following libraries are installed on your system:
- SeqAn (develop version available here):
% git clone https://github.com/seqan/seqan.git SeqAn
% cd SeqAn
% git checkout develop
Go to http://www.seqan.de for further instructions.
- IBM ILOG CPLEX Optimizer. CIDANE has been tested with CPLEX version 12.4 using an academic license from IBM.
- GNU Scientific Library (available at http://www.gnu.org/software/gsl/)
To build CIDANE on your Linux system, perform the following steps:
- Extract files to the current directory:
% tar -vxzf CIDANE-0.0.1-alpha.tar.gz
- Change to the newly created directory CIDANE-0.0.1 and use CMake (available at http://www.cmake.org) to build CIDANE.
% cd CIDANE-0.0.1
% mkdir build
% cd build
% export FC=gfortran
% cmake -DSEQAN_INCLUDE_PATH=<YOUR_SEQAN_INCLUDE_DIR> -DCPLEX_ROOT_DIR=<YOUR_CPLEX_INSTALL_DIR> ../
% make
The GNU Fortran compiler (gfortran) is required to build the glmnet [1] library.
After a successful build, the executables CIDANE, exonRefine, and transemble are located in directory CIDANE-0.0.1/build/bin/.
References & Data
Our paper describing CIDANE will be uploaded as soon as it is accepted for publication. Flux Simulator's .PAR parameter files used to simulate the datasets therein can be downloaded here.
The alignments of all simulated reads can be downloaded from the University of Chicago cloud storage.
[1] Friedman J., Hastie T., Tibshirani R.: Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software 33(1), 1-22 (2010)
Funding
This work is supported in part by NIH grant and R01-HG006677 to S.L. Salzberg.
Workflow
Before running CIDANE (see below), read mappings and exon boundaries have to be compiled into segment counts using the tools exonRefine and transemble, both located in directory CIDANE-0.0.1/build/bin/:
- % exonRefine <exon-intron-file>
exonRefine generates a gtf/gff file <refined-exons-file> containing all exons refined into non-overlapping segments. Its input <exon-intron-file> describes, in gtf/gff format, known or inferred splice sites and, optionally, transcription start and end sites or full-length transcripts. Gene and transcript information is preserved and can be used during CIDANE's inference process.
- % transemble <read-mapping-file> <refined-exons-file> count
From the <refined-exons-file> generated by exonRefine in step 1 and the read mappings in sam/bam format, transemble generates a file count.cnt containing the computed segment counts.
- % CIDANE [mandatory parameters] <refined-exons-file> count.cnt
The transcripts inferred by CIDANE, along with their estimated abundances, are output in a file named transcripts.gtf.
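To make the two preprocessing steps above more concrete, here is a toy Python sketch of the underlying idea. It is not the code of exonRefine or transemble: it skips gtf/gff and sam/bam parsing, strands, and gene grouping, uses half-open [start, end) coordinates, and assumes that a read's "segment cover" is simply the set of refined segments its aligned blocks overlap. The names `refine_exons`, `segment_cover`, and `count_covers` are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) coordinates (assumption)


def refine_exons(exons: List[Interval]) -> List[Interval]:
    """Split possibly overlapping exons into non-overlapping segments.

    Every exon start and end becomes a breakpoint; consecutive breakpoints
    delimit a candidate segment, kept only if some exon fully covers it.
    """
    breakpoints = sorted({p for start, end in exons for p in (start, end)})
    segments = []
    for left, right in zip(breakpoints, breakpoints[1:]):
        if any(start <= left and right <= end for start, end in exons):
            segments.append((left, right))
    return segments


def segment_cover(read_blocks: List[Interval], segments: List[Interval]) -> Tuple[int, ...]:
    """Return indices of the segments overlapped by any aligned block of a read."""
    return tuple(
        i
        for i, (seg_start, seg_end) in enumerate(segments)
        if any(b_start < seg_end and seg_start < b_end for b_start, b_end in read_blocks)
    )


def count_covers(reads: List[List[Interval]], segments: List[Interval]) -> Dict[Tuple[int, ...], int]:
    """Tally how many reads share each segment cover; reads hitting no segment are dropped."""
    counts = Counter(segment_cover(blocks, segments) for blocks in reads)
    counts.pop((), None)
    return dict(counts)


if __name__ == "__main__":
    # Two overlapping exons with a shared start (alternative end) plus a downstream exon.
    exons = [(100, 200), (100, 250), (400, 500)]
    segs = refine_exons(exons)
    print(segs)  # [(100, 200), (200, 250), (400, 500)]
    reads = [
        [(150, 190)],              # unspliced read inside segment 0
        [(180, 200), (400, 420)],  # spliced read: segment 0 -> segment 2
        [(210, 250), (400, 410)],  # spliced read: segment 1 -> segment 2
        [(150, 190)],
    ]
    print(count_covers(reads, segs))  # {(0,): 2, (0, 2): 1, (1, 2): 1}
```

Splitting at every exon boundary means the two overlapping exons share segment 0 and differ only in segment 1, so reads can be attributed to shared or isoform-specific parts of a gene.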
Running CIDANE
The following detailed description can also be obtained by typing
% CIDANE --help
Usage
% CIDANE [options] <refined-exons-file> <segment-count-file>
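As a hedged illustration of how the required library options combine with the two positional arguments, the following Python helper only assembles an argument list. The helper name `cidane_command` and the example file names and values are hypothetical; only flags documented in the options table below are used, and values are passed through unchanged, so the library size must already be in millions.

```python
import shlex
from typing import List, Optional


def cidane_command(
    refined_exons: str,
    segment_counts: str,
    read_length: int,
    frag_length: float,
    frag_sdev: float,
    library_size_mio: float,
    outdir: Optional[str] = None,
) -> List[str]:
    """Build a CIDANE argument list (hypothetical helper, not part of CIDANE)."""
    cmd = [
        "CIDANE",
        "-r", str(read_length),         # read length
        "-fl", str(frag_length),        # fragment length
        "-sd", str(frag_sdev),          # standard deviation
        "-lib", str(library_size_mio),  # mapped fragments in millions
    ]
    if outdir is not None:
        cmd += ["-o", outdir]           # output directory (default: ./)
    # Options first, then the two positional files, as in the usage line.
    cmd += [refined_exons, segment_counts]
    return cmd


if __name__ == "__main__":
    argv = cidane_command("refined.gtf", "count.cnt", 100, 250, 50, 20.5)
    print(shlex.join(argv))  # CIDANE -r 100 -fl 250 -sd 50 -lib 20.5 refined.gtf count.cnt
    # To actually run it (requires CIDANE on PATH): subprocess.run(argv, check=True)
```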
Options
Option | Description |
---|---|
-h/--help | Displays this help message. |
Required library options (will be estimated automatically in future versions) | |
-r/--readlength | read length |
-fl/--fraglength | fragment length |
-sd/--sdev | standard deviation of the fragment length |
-lib/--librarysize | number of mapped fragments in millions (used for FPKM calculation) |
Miscellaneous options | |
-gl/--groupLoci | infer loci from the read mappings instead of from the gene_id attribute in the provided gtf. Note: this option currently deactivates option useTranscriptInformation. |
-m/--max | maximum size of loci considered. |
-th/--expThreshold | expression threshold for reporting predicted transcripts: either the minimum isoform fraction per gene or an FPKM value, depending on parameter absThreshold. In [0, inf); default: 0.1. |
-ta/--absThreshold | use an FPKM threshold instead of the isoform fraction per gene |
-nea/--notExonAware | deactivate exon awareness |
-uti/--useTranscriptInformation | incorporate annotated transcripts into model |
Faux segment cover options | |
-f/--noFaux | deactivate faux covers. |
-e/--pseudo | pseudo count for error normalization. Default: 1. |
-o/--outdir | output directory. Default: ./. |
-t/--tsstes | threshold of spliced alignments to exclude TSS/TES. Default: 0. |
Delayed transcript generation options | |
-cg/--useColGen | apply optional column generation step |
-mr/--maxNofRelaxedEdges | maximum number of edges relaxed in the SG during column generation. Default: 1. |
-u/--epsilon | upper bound on the error of the piecewise linear approximation. Default: 0.001. |
-slb/--unexplCovSumBound | lower bound on the summed intensity of unexplained covers. Default: 1. |
-rw/--cgRegWeight | factor by which lambda is scaled for transcripts built by column generation (CG) in the second glmnet run. Default: 1.3. |
-rl/--cgRedLambda | reduce lambda in CG. Default: 0.9. |
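The -lib/--librarysize value feeds the FPKM calculation, and -th/--expThreshold together with -ta/--absThreshold decides which transcripts are reported. The Python sketch below uses the textbook FPKM definition and assumes that the per-gene fraction is an isoform's share of the gene's summed FPKM and that the comparison is inclusive. CIDANE itself may differ, for example by using an effective transcript length derived from the fragment length distribution, so treat this as an approximation. `fpkm` and `report` are hypothetical names.

```python
from typing import Dict, List


def fpkm(fragments: float, length_bp: float, mapped_fragments_mio: float) -> float:
    """Fragments per kilobase of transcript per million mapped fragments.

    mapped_fragments_mio plays the role of the -lib/--librarysize value,
    already expressed in millions. Uses raw (not effective) length: an assumption.
    """
    return fragments / (length_bp / 1_000) / mapped_fragments_mio


def report(gene_fpkms: Dict[str, float], threshold: float = 0.1, absolute: bool = False) -> List[str]:
    """Return the transcript IDs of one gene that pass the reporting threshold.

    absolute=False: keep isoforms whose share of the gene's summed FPKM is >= threshold
    (assumed meaning of "min fraction per gene"); absolute=True: keep FPKM >= threshold.
    """
    total = sum(gene_fpkms.values())
    kept = []
    for tid, value in gene_fpkms.items():
        score = value if absolute else (value / total if total > 0 else 0.0)
        if score >= threshold:
            kept.append(tid)
    return kept


if __name__ == "__main__":
    lib = 20.0  # hypothetical library size: 20 million mapped fragments
    gene = {
        "t1": fpkm(500, 2000, lib),  # 12.5
        "t2": fpkm(40, 1000, lib),   # 2.0
        "t3": fpkm(10, 500, lib),    # 1.0
    }
    print(report(gene))                                # ['t1', 't2']
    print(report(gene, threshold=5.0, absolute=True))  # ['t1']
```

With the default fraction threshold of 0.1, t3 is dropped because it accounts for only about 6% of the gene's FPKM, even though its absolute FPKM is nonzero.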