CIDANE

About CIDANE

CIDANE is a novel framework for genome-based transcript reconstruction and quantification. CIDANE is engineered not only to assemble RNA-seq reads ab initio, but also to make use of the growing annotation of known splice sites, transcription start and end sites, or even full-length transcripts that is available for most model organisms. To some extent, CIDANE can recover splice junctions that are invisible to existing bioinformatics tools.

News

CIDANE is on Bitbucket - 12/07/15
- CIDANE's source code is hosted on Bitbucket here.
CIDANE 0.0.1 released - 11/05/14
- This first public release of CIDANE is described in our manuscript submitted for publication, and is now available for download here.

Getting and installing CIDANE

This software is free, open-source software released under the GNU General Public License. CIDANE has been developed and tested on a Linux x86_64 system. The latest version of CIDANE can be downloaded here. Before building CIDANE, make sure that the following libraries are installed on your system:

SeqAn (develop version available here):

% git clone https://github.com/seqan/seqan.git SeqAn
% cd SeqAn
% git checkout develop

Go to http://www.seqan.de for further instructions.

IBM ILOG CPLEX Optimizer. CIDANE has been tested with CPLEX version 12.4 using an academic license from IBM.
GNU Scientific Library (available at http://www.gnu.org/software/gsl/)

To build CIDANE on your Linux system, perform the following steps:

Extract files to the current directory:

    % tar -vxzf CIDANE-0.0.1-alpha.tar.gz

Change to the newly created directory CIDANE-0.0.1 and use CMake (available at http://www.cmake.org) to build CIDANE.

    % cd CIDANE-0.0.1 
    % mkdir build 
    % cd build 
    % export FC=gfortran
    % cmake -DSEQAN_INCLUDE_PATH=<YOUR_SEQAN_INCLUDE_DIR> -DCPLEX_ROOT_DIR=<YOUR_CPLEX_INSTALL_DIR> ../
    % make

The GNU Fortran compiler (gfortran) is required to build the glmnet [1] library.

This creates the binaries CIDANE, exonRefine, and transemble in the directory CIDANE-0.0.1/build/bin/.

References & Data

Our paper describing CIDANE will be uploaded as soon as it is accepted for publication. Flux Simulator's .PAR parameter files used to simulate the datasets therein can be downloaded here. The alignments of all simulated reads can be downloaded from the University of Chicago cloud storage.
[1] Friedman J., Hastie T., Tibshirani R.: Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software 33(1), 1-22 (2010).

Funding

This work is supported in part by NIH grant and R01-HG006677 to S.L. Salzberg.

Workflow

Before running CIDANE (see below), read mappings and exon boundaries have to be compiled into segment counts using the tools exonRefine and transemble, both located in the directory CIDANE-0.0.1/build/bin/:

```
 % exonRefine <exon-intron-file>
```
exonRefine generates a gtf/gff file <refined-exons-file> in which all exons are refined into non-overlapping segments. Its input, <exon-intron-file>, is a gtf/gff file describing known or inferred splice sites and, optionally, transcription start and end sites or full-length transcripts. Gene and transcript information is preserved and can be used during CIDANE's inference process.
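The refinement step can be pictured with a small Python sketch. It illustrates the idea under simplifying assumptions (exons given as inclusive, 1-based coordinates on a single chromosome and strand, as in GTF) and is not exonRefine's actual implementation; the function name `refine_exons` is hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # inclusive, 1-based coordinates, as in GTF


def refine_exons(exons: List[Interval]) -> List[Interval]:
    """Split (possibly overlapping) exons into non-overlapping segments.

    Hypothetical helper: every exon start and every position just past an
    exon end becomes a breakpoint; each stretch between two consecutive
    breakpoints that lies inside at least one exon is emitted as a segment.
    """
    breakpoints = sorted({s for s, _ in exons} | {e + 1 for _, e in exons})
    segments = []
    for left, right in zip(breakpoints, breakpoints[1:]):
        # Keep the stretch [left, right - 1] only if some exon covers it.
        if any(s <= left and right - 1 <= e for s, e in exons):
            segments.append((left, right - 1))
    return segments


# Two alternative exons sharing a start but with different donor sites,
# followed by a downstream exon.
print(refine_exons([(100, 200), (100, 250), (300, 400)]))
# → [(100, 200), (201, 250), (300, 400)]
```

Because no exon can begin or end strictly between two consecutive breakpoints, each stretch is either entirely covered by an exon or entirely intronic, which is why a single containment test per stretch suffices.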
```
 % transemble <read-mapping-file> <refined-exons-file> count 
```
From <refined-exons-file> (generated by exonRefine in the previous step) and the read mappings in sam/bam format, transemble generates a file count.cnt containing the computed segment counts.
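What segment counts might look like can be sketched as follows. The option names further down (faux segment covers, unexplained covers) suggest that reads are grouped by the set of segments they cover; this sketch assumes exactly that, treats each read as a list of aligned blocks, and ignores paired-end fragments. The functions `segment_cover` and `count_covers` are hypothetical, and the actual count.cnt format is not described in this manual.

```python
from collections import Counter
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # inclusive coordinates


def segment_cover(blocks: List[Interval], segments: List[Interval]) -> Tuple[int, ...]:
    """Return the sorted indices of all segments overlapped by a read's aligned blocks."""
    hit = {
        i
        for i, (s, e) in enumerate(segments)
        for bs, be in blocks
        if bs <= e and s <= be  # two closed intervals overlap
    }
    return tuple(sorted(hit))


def count_covers(reads: List[List[Interval]],
                 segments: List[Interval]) -> Dict[Tuple[int, ...], int]:
    """Tally reads by the combination of segments they cover (hypothetical)."""
    counts = Counter(segment_cover(blocks, segments) for blocks in reads)
    counts.pop((), None)  # drop reads that overlap no segment
    return dict(counts)


segments = [(100, 200), (201, 250), (300, 400)]
reads = [
    [(150, 199)],              # inside segment 0
    [(180, 200), (300, 320)],  # spliced: segment 0 -> segment 2
    [(230, 250), (300, 310)],  # spliced: segment 1 -> segment 2
    [(190, 220)],              # unspliced, spans segments 0 and 1
]
print(count_covers(reads, segments))
```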
```
 % CIDANE [mandatory parameters] <refined-exons-file> count.cnt  
```
The transcripts inferred by CIDANE, along with their estimated abundances, are output to a file named transcripts.gtf.
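The three steps can be chained from a script. The Python sketch below only assembles the command lines shown above and adds CIDANE's required library options (-r, -fl, -sd, -lib); the helper names `pipeline_commands` and `run_pipeline`, the file names, and the numeric values are illustrative assumptions. Because this manual does not say how exonRefine names its output, the path of the refined exons file is passed in explicitly.

```python
import subprocess
from pathlib import Path
from typing import List

BIN = Path("CIDANE-0.0.1/build/bin")  # build location described above


def pipeline_commands(exon_intron_gtf: str, reads_bam: str, refined_exons: str,
                      read_len: int, frag_len: int, frag_sd: float,
                      lib_millions: float) -> List[List[str]]:
    """Return the three workflow steps as argument lists (hypothetical helper).

    refined_exons must name the file written by exonRefine; its naming is not
    documented here, so it is supplied by the caller.
    """
    return [
        [str(BIN / "exonRefine"), exon_intron_gtf],
        [str(BIN / "transemble"), reads_bam, refined_exons, "count"],
        [str(BIN / "CIDANE"),
         "-r", str(read_len), "-fl", str(frag_len),
         "-sd", str(frag_sd), "-lib", str(lib_millions),
         refined_exons, "count.cnt"],
    ]


def run_pipeline(commands: List[List[str]]) -> None:
    """Run each step in order, stopping at the first one that fails."""
    for cmd in commands:
        subprocess.run(cmd, check=True)  # raises CalledProcessError on failure


# Example (values are placeholders):
# cmds = pipeline_commands("annotation.gtf", "reads.bam", "refined.gtf",
#                          read_len=100, frag_len=250, frag_sd=25, lib_millions=20)
# run_pipeline(cmds)
```

Building the argument lists separately from running them makes it easy to print or log the exact commands before launching a long job.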

Running CIDANE

The following detailed description can also be obtained by typing

% CIDANE --help

Usage

% CIDANE [options] <refined-exons-file> <segment-count-file>

Options

Required library options (will be estimated automatically in future versions)
`-h/--help`	Displays this help message.
`-r/--readlength`	read length
`-fl/--fraglength`	fragment length
`-sd/--sdev`	standard deviation of the fragment length
`-lib/--librarysize`	number of mapped fragments in millions (for FPKM calculation)
Miscellaneous options
`-gl/--groupLoci`	infer loci from the read mappings instead of from the gene_id attribute in the provided gtf. Note: this option currently deactivates option useTranscriptInformation.
`-m/--max`	maximum size of loci considered.
`-th/--expThreshold`	threshold for reporting predicted transcripts: either the minimum isoform fraction per gene or an FPKM value, depending on option absThreshold. In [0, inf), default: 0.1.
`-ta/--absThreshold`	use FPKM threshold instead of isoform fraction per gene
`-nea/--notExonAware`	deactivate exon awareness
`-uti/--useTranscriptInformation`	incorporate annotated transcripts into model
Faux segment cover options
`-f/--noFaux`	deactivate faux covers.
`-e/--pseudo`	pseudo count for error normalization. Default: 1.
`-o/--outdir`	output directory. Default: ./.
`-t/--tsstes`	threshold of spliced alignments to exclude TSS/TES. Default: 0.
Delayed transcript generation options
`-cg/--useColGen`	apply optional column generation step
`-mr/--maxNofRelaxedEdges`	max number of edges relaxed in the splice graph (SG) during column generation. Default: 1.
`-u/--epsilon`	(upper) bound on error by piecewise linear approximation. Default: 0.001.
`-slb/--unexplCovSumBound`	lower bound on sum of intensity of unexplained covers. Default: 1.
`-rw/--cgRegWeight`	factor by which lambda is scaled for transcripts built by column generation (CG) in the second glmnet run. Default: 1.3.
`-rl/--cgRedLambda`	reduce lambda in CG. Default: 0.9.
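The options -lib, -th and -ta interact when transcripts are filtered for reporting. The sketch below assumes the textbook FPKM definition (fragments per kilobase of transcript per million mapped fragments) and that the per-gene isoform fraction is computed from FPKM values. CIDANE may instead use effective transcript lengths or a strict comparison, so the helper names `fpkm` and `report` and these details are assumptions, not CIDANE's code.

```python
from typing import Dict, List


def fpkm(fragments: float, length_bp: int, lib_millions: float) -> float:
    """Fragments per kilobase of transcript per million mapped fragments.

    lib_millions corresponds to -lib/--librarysize (mapped fragments in millions).
    """
    return fragments / (length_bp / 1000.0) / lib_millions


def report(transcripts: Dict[str, float], threshold: float = 0.1,
           absolute: bool = False) -> List[str]:
    """Select the isoforms of one gene that pass the reporting threshold.

    transcripts maps transcript ids to FPKM values (hypothetical input).
    absolute=False: threshold is a minimum isoform fraction per gene (default).
    absolute=True:  threshold is an FPKM cutoff, as with -ta/--absThreshold.
    """
    total = sum(transcripts.values())
    kept = []
    for tid, value in transcripts.items():
        score = value if absolute else (value / total if total > 0 else 0.0)
        if score >= threshold:
            kept.append(tid)
    return kept


print(fpkm(500, 2000, 20))          # → 12.5
gene = {"t1": 9.0, "t2": 0.8, "t3": 0.2}
print(report(gene))                 # → ['t1']  (fractions 0.9, 0.08, 0.02)
print(report(gene, absolute=True))  # → ['t1', 't2', 't3']  (all FPKM >= 0.1)
```

With the default fraction-based threshold, a lowly expressed isoform is dropped when its gene is dominated by another isoform, whereas the absolute FPKM cutoff judges each transcript independently of its gene.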