About CIDANE

CIDANE is a novel framework for genome-based transcript reconstruction and quantification. CIDANE is engineered not only to assemble RNA-seq reads ab initio, but also to make use of the growing annotation of known splice sites, transcription start and end sites, or even full-length transcripts, available for most model organisms. To some extent, CIDANE is able to recover splice junctions that are invisible to existing bioinformatics tools.


News

  • CIDANE is on Bitbucket - 12/07/15
    • CIDANE's source code is hosted on Bitbucket here.
  • CIDANE 0.0.1 released - 11/05/14
    • This first public release of CIDANE is described in our manuscript submitted for publication, and is now available for download here.


Getting and installing CIDANE

CIDANE is free, open-source software released under the GNU General Public License. It has been developed and tested on a Linux x86_64 system. The latest version of CIDANE can be downloaded here. Before building CIDANE, make sure that the following libraries are installed on your system:

  • SeqAn (develop version available here):
    % git clone https://github.com/seqan/seqan.git SeqAn
    % cd SeqAn
    % git checkout develop
    
    Go to http://www.seqan.de for further instructions.
  • IBM ILOG CPLEX Optimizer. CIDANE has been tested with CPLEX version 12.4 using an academic license from IBM.
  • GNU Scientific Library (available at http://www.gnu.org/software/gsl/)
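
A quick way to see which command-line prerequisites are absent before starting the build is sketched below. The function name missing_tools and the tool list in the usage comment are illustrative assumptions, not part of CIDANE; gsl-config is the helper script that ships with the GNU Scientific Library. SeqAn (a header-only library) and CPLEX cannot be detected this way and still have to be located by hand.

```shell
# Hypothetical helper: print every tool from the argument list
# that cannot be found on PATH (prints nothing if all are present).
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Usage (tool list is an assumption based on the commands in this section):
#   missing_tools git cmake make gfortran gsl-config
```

An empty result only means the tools are on PATH; it does not check their versions.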

To build CIDANE on your Linux system, perform the following steps:

  1. Extract files to the current directory:

        % tar -vxzf CIDANE-0.0.1-alpha.tar.gz 
        
  2. Change to the newly created directory CIDANE-0.0.1 and use CMake (available at http://www.cmake.org) to build CIDANE.

        % cd CIDANE-0.0.1 
        % mkdir build 
        % cd build 
        % export FC=gfortran
        % cmake -DSEQAN_INCLUDE_PATH=<YOUR_SEQAN_INCLUDE_DIR> -DCPLEX_ROOT_DIR=<YOUR_CPLEX_INSTALL_DIR> ../
        % make 
       
    The GNU Fortran compiler (gfortran) is required to build the glmnet[1] library.

This will create the binaries CIDANE, exonRefine, and transemble in the directory CIDANE-0.0.1/build/bin/.
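
To confirm that the build produced all three binaries, a small check along the following lines may help. The function name check_cidane_bins is hypothetical; the default path is the build directory named above.

```shell
# Hypothetical helper: verify that the three binaries exist and are executable.
# Optional argument: the bin directory (defaults to the path from the build steps).
# Returns 0 if all three are present, 1 otherwise.
check_cidane_bins() {
    bin_dir="${1:-CIDANE-0.0.1/build/bin}"
    missing=0
    for tool in CIDANE exonRefine transemble; do
        if [ -x "$bin_dir/$tool" ]; then
            echo "found:   $bin_dir/$tool"
        else
            echo "missing: $bin_dir/$tool"
            missing=1
        fi
    done
    return "$missing"
}

# Usage:
#   check_cidane_bins CIDANE-0.0.1/build/bin && echo "build looks complete"
```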

References & Data

Our paper describing CIDANE will be uploaded as soon as it is accepted for publication. Flux Simulator's .PAR parameter files used to simulate the datasets therein can be downloaded here. The alignments of all simulated reads can be downloaded from the University of Chicago cloud storage.

[1] Friedman J., Hastie T., Tibshirani R.: Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software 33(1), 1-22 (2010).


Funding

This work is supported in part by NIH grant and R01-HG006677 to S.L. Salzberg.


Workflow


Before running CIDANE (see below), read mappings and exon boundaries have to be compiled into segment counts using the tools exonRefine and transemble, both located in the directory CIDANE-0.0.1/build/bin/:

  1.  % exonRefine <exon-intron-file>
    exonRefine generates a gtf/gff file <refined-exons-file> containing all exons refined into non-overlapping segments. Its input <exon-intron-file> describes in gtf/gff format known or inferred splice sites, and optionally transcription start and end sites or full-length transcripts. Gene and transcript information will be preserved and can be used during CIDANE's inference process.

  2.  % transemble <read-mapping-file> <refined-exons-file> count 
    From <refined-exons-file> generated by exonRefine in step 1 and the read mappings in sam/bam format, transemble generates a file count.cnt with calculated segment counts.

  3.  % CIDANE [mandatory parameters] <refined-exons-file> count.cnt  
    The transcripts inferred by CIDANE, along with their estimated abundances, are output in a file named transcripts.gtf.
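
The three steps above can be chained in a small wrapper, sketched here under several assumptions. The function names step and run_cidane_workflow and the DRY_RUN switch are hypothetical. The text does not state what name exonRefine gives its output, so the path of <refined-exons-file> must be passed in by the caller. By default the sketch only prints the commands; set DRY_RUN=0 to execute them.

```shell
# Print a command (default) or execute it when DRY_RUN=0.
step() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        printf '%% %s\n' "$*"
    else
        "$@"
    fi
}

# Arguments:
#   $1  <exon-intron-file>    (gtf/gff)
#   $2  <read-mapping-file>   (sam/bam)
#   $3  <refined-exons-file>  (assumption: caller knows where exonRefine wrote it)
#   $4+ CIDANE's mandatory parameters, passed through unchanged
run_cidane_workflow() {
    exon_file="$1"; reads="$2"; refined="$3"
    shift 3
    step exonRefine "$exon_file" &&
    step transemble "$reads" "$refined" count &&
    step CIDANE "$@" "$refined" count.cnt
}

# Usage (all file names and values are placeholders):
#   run_cidane_workflow known.gtf reads.bam refined.gtf -r 100 -fl 250 -sd 25 -lib 30
```

Stopping at the first failing step keeps CIDANE from running on a stale count.cnt.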



Running CIDANE


The following detailed description can also be obtained by typing

% CIDANE --help


Usage

% CIDANE [options] <refined-exons-file> <segment-count-file>

Options

  -h/--help                        Displays this help message.

Required library options (will be estimated automatically in future versions)

  -r/--readlength                  read length
  -fl/--fraglength                 fragment length
  -sd/--sdev                       standard deviation
  -lib/--librarysize               number of mapped fragments in millions (for FPKM calculation)

Miscellaneous options

  -gl/--groupLoci                  infer loci from mapping instead of gene_id in provided gtf. Note: this option currently deactivates option useTranscriptInformation.
  -m/--max                         maximal size of loci considered.
  -th/--expThreshold               threshold of predicted transcripts to be reported. Either min fraction per gene or FPKM depending on parameter absThreshold. In [0..inf], default: 0.1.
  -ta/--absThreshold               use FPKM threshold instead of isoform fraction per gene
  -nea/--notExonAware              deactivate exon awareness
  -uti/--useTranscriptInformation  incorporate annotated transcripts into model

Faux segment cover options

  -f/--noFaux                      deactivate faux covers.
  -e/--pseudo                      pseudo count for error normalization. Default: 1.
  -o/--outdir                      output directory. Default: ./.
  -t/--tsstes                      threshold of spliced alignments to exclude TSS/TES. Default: 0.

Delayed transcript generation options

  -cg/--useColGen                  apply optional column generation step
  -mr/--maxNofRelaxedEdges         max number of edges relaxed in SG during column generation. Default: 1.
  -u/--epsilon                     (upper) bound on error by piecewise linear approximation. Default: 0.001.
  -slb/--unexplCovSumBound         lower bound on sum of intensity of unexplained covers. Default: 1.
  -rw/--cgRegWeight                scale lambda for transcripts built by CG in second glmnet run. Default: 1.3.
  -rl/--cgRedLambda                reduce lambda in CG. Default: 0.9.
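
Because the library size is given in millions of mapped fragments, a raw fragment count has to be scaled before it is passed on. The sketch below does that conversion and prints, without running it, a CIDANE call carrying the four required library options. The function names frag_millions and cidane_cmd are hypothetical, all values in the usage comment are placeholders, and whether CIDANE accepts a fractional library size is an assumption that the help text does not confirm.

```shell
# Convert a raw count of mapped fragments into millions, two decimals.
frag_millions() {
    awk -v n="$1" 'BEGIN { printf "%.2f\n", n / 1000000 }'
}

# Print a CIDANE command line with the four required library options.
# Arguments: read length, fragment length, standard deviation,
#            raw mapped-fragment count, <refined-exons-file>, <segment-count-file>
cidane_cmd() {
    printf 'CIDANE -r %s -fl %s -sd %s -lib %s %s %s\n' \
        "$1" "$2" "$3" "$(frag_millions "$4")" "$5" "$6"
}

# Usage (placeholder values):
#   cidane_cmd 100 250 25 30500000 refined.gtf count.cnt
```

Printing the command first makes it easy to inspect the scaled value before committing to a run.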