

rddChecker - software for detecting RNA-DNA differences


rddChecker is a heuristic tool for identifying sites of RNA-DNA differences (RDDs) and candidate RNA editing sites from RNA-seq data.

Download rddChecker here. RDD and SNP sites inferred from the Illumina Human Body Map Project RNA-seq data can be found here.


A guide to using rddChecker


NAME

rddChecker  --   discover sites of RNA-DNA differences in RNA-seq alignment data

DESCRIPTION

rddChecker is a tool for predicting RNA-DNA differences that might be caused by post-transcriptional editing of the RNA. It first analyzes short read alignments to detect variable sites, then applies a series of filters to remove potential sequencing and alignment artifacts, as well as known single nucleotide polymorphisms (SNPs).

Variable sites are detected subject to the following constraints: i. at least R reads must cover the site (default: R=20); ii. the reads must show exactly two distinct bases at that position, one of which must match the reference genome (hg19 for human; mm9 for mouse); and iii. at least P percent of the reads must support the minority letter (default: P=20). This initial set of sites is compared against the human gene and transcript annotations to retain only sites within gene regions and to determine their orientation. Lastly, positions of known SNPs are removed, yielding a final set of variable sites that is as accurate and comprehensive as possible.
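
The three constraints above can be sketched as a small predicate over the bases observed at one position. This is an illustrative Python sketch, not rddChecker's own code: the function name `is_variable_site`, its parameters, and the representation of a site as a string of read bases are assumptions, as is reading "minority letter" as the less frequent of the two observed bases.

```python
from collections import Counter


def is_variable_site(bases, ref_base, min_reads=20, min_minor_pct=20.0):
    """Apply constraints i-iii to the read bases covering one position.

    bases    -- string (or list) of bases, one per read covering the site
    ref_base -- reference genome base at that position
    """
    # i. at least R reads must cover the site
    if len(bases) < min_reads:
        return False

    counts = Counter(b.upper() for b in bases)

    # ii. exactly two distinct bases, one of which matches the reference
    if len(counts) != 2 or ref_base.upper() not in counts:
        return False

    # iii. at least P percent of reads support the less frequent base
    minor_count = min(counts.values())
    return 100.0 * minor_count / len(bases) >= min_minor_pct
```

For instance, a site covered by 16 reads showing the reference base and 4 showing another base sits exactly at the default thresholds (20 reads, 20 percent) and passes, while 17 against 3 does not.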

Variable sites undergo further analysis to remove most, if not all, sequencing and alignment artifacts. Specifically, a candidate site is removed if: i. the reference letter, the alternate letter, or both are supported only by reads from a single strand; ii. the alternate letter appears at positions 1-2 in all supporting reads; iii. the observed difference is a likely systematic sequencing error, as detected by the program SysCall; iv. there is a close paralog elsewhere in the genome that shows the variation; or v. the site is located a short distance (1-3 bases) outside the boundary of an exon (these sites must also appear in an alternative exon).
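
Filters i and ii lend themselves to a short illustration. The sketch below is a hypothetical Python rendering, not the tool's implementation: the function name, the `"+"`/`"-"` strand encoding, and the assumption that read positions are 1-based and counted from the start of the read are all choices made here for the example. Filters iii to v depend on SysCall, genome-wide alignment, and exon annotations, and are not covered.

```python
def keep_candidate(ref_strands, alt_strands, alt_read_positions, end_window=2):
    """Return True if a candidate site survives filters i and ii.

    ref_strands        -- strand ("+" or "-") of each read supporting the reference letter
    alt_strands        -- strand of each read supporting the alternate letter
    alt_read_positions -- 1-based position of the alternate letter within each
                          supporting read (assumed to be counted from the read start)
    """
    both = {"+", "-"}

    # Filter i: each letter must be supported by reads from both strands.
    if not (both <= set(ref_strands) and both <= set(alt_strands)):
        return False

    # Filter ii: reject if the alternate letter falls within the first
    # `end_window` positions in every supporting read.
    if all(pos <= end_window for pos in alt_read_positions):
        return False

    return True
```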

USAGE
perl rddChecker.pl bamfile genome bwtindex syscall_path [-annot annotfile] [-snp snpfile] [-dir workdir] [-verbose]

rddChecker takes as input a set of short read alignments in BAM format, sorted by chromosome (for instance, one produced with the program TopHat), a reference genome and its bowtie index, and the local path to the SysCall executable. It produces a list of putative RDD sites. Optionally, specifying a gene annotation file in GTF format will restrict the sites to those found within gene regions, and specifying a SNP file will exclude sites that occur at known SNP positions. When a SNP file is given, the algorithm also detects a set of high-confidence SNPs, which can be used as controls. Full paths must be specified for all four required files.
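
When rddChecker is launched from a script, the argument order in the usage line and the full-path requirement are easy to get wrong. The following Python sketch assembles the argument list from the usage line above; the helper name `build_rddchecker_command` is hypothetical, and any paths passed to it are placeholders to be replaced with local ones. The resulting list could be handed to `subprocess.run`.

```python
import posixpath


def build_rddchecker_command(bamfile, genome, bwtindex, syscall_path,
                             annot=None, snp=None, workdir=None, verbose=False):
    """Build the argument list for the usage line shown above."""
    required = [bamfile, genome, bwtindex, syscall_path]

    # The guide asks for full paths for the four required arguments.
    for path in required:
        if not posixpath.isabs(path):
            raise ValueError("full path required: %r" % path)

    cmd = ["perl", "rddChecker.pl"] + required
    if annot:
        cmd += ["-annot", annot]
    if snp:
        cmd += ["-snp", snp]
    if workdir:
        cmd += ["-dir", workdir]
    if verbose:
        cmd.append("-verbose")
    return cmd
```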

Program parameters:

bamfile   read alignment file, in BAM format
genome   fasta file containing the genome
bwtindex   bowtie genome index (download here)
syscall_path   local path to the SysCall executable (download here)
annotfile   gene annotations, in GTF format
snpfile   list of known SNPs, one per line, with the format:
          chrom position ref_letter alt_letter database
workdir   work directory for storing the temporary and final results
verbose   use this option to preserve all temporary work files and logs generated by the filters
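
To check a SNP file before a run, each line can be parsed into its five fields. This is a minimal Python sketch that assumes the fields are separated by whitespace, which the format line above suggests but does not state; the names `KnownSnp` and `parse_snp_line` are hypothetical.

```python
from typing import NamedTuple


class KnownSnp(NamedTuple):
    chrom: str
    position: int
    ref_letter: str
    alt_letter: str
    database: str


def parse_snp_line(line):
    """Parse one 'chrom position ref_letter alt_letter database' record."""
    fields = line.split()  # assumption: whitespace-separated fields
    if len(fields) != 5:
        raise ValueError("expected 5 fields, got %d: %r" % (len(fields), line))
    chrom, position, ref_letter, alt_letter, database = fields
    return KnownSnp(chrom, int(position), ref_letter, alt_letter, database)
```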

Currently only the hg19 and mm9 genomes are directly supported; see the documentation file included with the release for instructions on how to modify the programs to work with other genomes. Note also that the programs create a SAM file by uncompressing the BAM file, in the directory containing the BAM file. Please allow for sufficient disk space in that directory.



Download and installation procedure

rddChecker was written for Linux platforms. It may require some modifications to run on another Unix platform. rddChecker is still under development, so please write to us with feedback and bug reports.

To install and run rddChecker, you will need to download and install several packages, including the samtools package for managing large short read alignment files, the bowtie executable and genome index, the SysCall program for detecting systematic sequencing errors, and the sim4db/leaff module for handling large sequence databases. You may also want to download and prepare a local copy of a SNP database and gene annotations, or you can use the files we provide on our web site.

  1. Download and install the auxiliary packages. Save the executables for samtools, bowtie and leaff somewhere in your PATH.

  2. Download and unpack the gzipped tar file rddChecker.tar.gz.

            gunzip < rddChecker.tar.gz | tar -xvf -
    
  3. The tar will unpack into a directory named rddChecker.[some_date]. (You'll see what the precise name is while tar is unpacking.) Make that directory current.

            cd rddChecker.*
    
  4. Follow the instructions in the included COMPILING file to compile.
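
Step 1 asks for samtools, bowtie and leaff to be reachable through PATH. A quick way to confirm this before a first run is sketched below in Python, using the standard library's `shutil.which`; the helper name `missing_executables` is hypothetical, and the default names assume the executables are installed under their usual names.

```python
import shutil


def missing_executables(names=("samtools", "bowtie", "leaff")):
    """Return the subset of `names` that cannot be found in PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_executables()
    if missing:
        print("Not found in PATH: " + ", ".join(missing))
    else:
        print("All auxiliary executables found.")
```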



References and data

The software and analysis of 16 human tissue samples from the Illumina Human Body Map Project have been described in:

