rddChecker - software for detecting RNA-DNA differences
rddChecker is heuristic software for identifying sites of RNA-DNA differences (RDDs) and candidate RNA editing sites from RNA-seq data.
Download rddChecker here. RDD and SNP sites inferred from the Illumina Human Body Map Project RNA-seq data can be found here.
A guide to using rddChecker
NAME
rddChecker -- discover sites of RNA-DNA differences in RNA-seq alignment data
DESCRIPTION
rddChecker is a tool for predicting RNA-DNA differences that might be caused by post-transcriptional editing of the RNA. It first analyzes short-read alignments to detect variable sites, then applies a series of filters to remove potential sequencing and alignment artifacts, as well as known single nucleotide polymorphisms (SNPs).
Variable sites are detected subject to the following constraints: i. at least R reads must cover the site (default: R=20); ii. the reads must show exactly two bases at that position, one of which matches the reference genome (hg19 for human; mm9 for mouse); and iii. at least P percent of the reads must support the minority letter (default: P=20). This initial set of sites is compared against the gene and transcript annotations to retain only sites within gene regions and to determine their orientation. Lastly, positions of known SNPs are removed to obtain a final set of variable sites that is as accurate and comprehensive as possible.
Variable sites then undergo further analysis to remove most, if not all, sequencing and alignment artifacts. Specifically, a candidate site is removed if: i. either or both of the reference and alternate letters are supported only by reads from a single strand; ii. the alternate letter appears at positions 1-2 in all supporting reads; iii. the observed difference is a likely systematic sequencing error, as detected by the program SysCall; iv. there is a close paralog elsewhere in the genome that shows the variation; or v. the site lies a short distance (1-3 bases) outside the boundary of an exon (such sites must also appear in an alternative exon).
USAGE
perl rddChecker.pl bamfile genome bwtindex syscall_path
[-annot annotfile] [-snp snpfile] [-dir workdir] [-verbose]
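The detection constraints i-iii and the first two artifact filters from the DESCRIPTION can be illustrated with a short Python sketch. This is not rddChecker's own code; the `ReadBase` record, the function names, and the convention that `read_pos` is counted from the first sequenced base of each read are assumptions made for illustration. The defaults mirror the documented R=20 and P=20, and extracting per-read observations from the BAM file is out of scope here.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ReadBase:
    """One read's observation at a candidate position (hypothetical record)."""
    base: str      # base shown by the read at this position
    strand: str    # "+" or "-", the strand the read aligned to
    read_pos: int  # 1-based position of this base within the read (assumed convention)


def detect_variable_site(ref_base: str, obs: List[ReadBase],
                         min_reads: int = 20,
                         min_pct: float = 20.0) -> Optional[str]:
    """Return the alternate base if the site passes constraints i-iii, else None."""
    # i. at least R reads must cover the site
    if len(obs) < min_reads:
        return None
    counts = Counter(o.base for o in obs)
    # ii. exactly two bases, one of which matches the reference
    if len(counts) != 2 or ref_base not in counts:
        return None
    alt = next(b for b in counts if b != ref_base)
    # iii. at least P percent of the reads support the minority letter
    if 100.0 * min(counts.values()) / len(obs) < min_pct:
        return None
    return alt


def passes_read_filters(ref_base: str, alt: str, obs: List[ReadBase]) -> bool:
    """Apply artifact filters i and ii; False means the site is removed."""
    # Filter i: each letter must be supported by reads from both strands
    for letter in (ref_base, alt):
        if len({o.strand for o in obs if o.base == letter}) < 2:
            return False
    # Filter ii: remove the site if the alternate letter occurs only at
    # positions 1-2 of every read that supports it
    if all(o.read_pos <= 2 for o in obs if o.base == alt):
        return False
    return True
```

Note that the minority letter in constraint iii may be either the reference or the alternate base, which is why the sketch takes the smaller of the two counts.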
rddChecker takes as input a set of short-read alignments in BAM format, sorted by chromosome (for instance, as produced by the program TopHat); a reference genome and its bowtie index; and the local path to the SysCall executable. It produces a list of putative RDD sites. Optionally, specifying a gene annotation file in GTF format and/or a SNP file restricts the output to sites that lie within gene regions and that do not occur at known SNP sites, respectively. When a SNP file is given, the algorithm also detects a set of high-confidence SNPs, which can be used as controls. Full paths must be specified for all four required arguments.
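Because all four required arguments must be given as full paths, a small wrapper that validates them before launching the Perl script can save a failed run. The helper below, `build_rddchecker_argv`, is hypothetical: it only assembles the argument list from the command line shown above, and it assumes `rddChecker.pl` is invoked from the release directory.

```python
import os
from typing import List, Optional


def build_rddchecker_argv(bamfile: str, genome: str, bwtindex: str,
                          syscall_path: str,
                          annotfile: Optional[str] = None,
                          snpfile: Optional[str] = None,
                          workdir: Optional[str] = None,
                          verbose: bool = False,
                          script: str = "rddChecker.pl") -> List[str]:
    """Assemble the rddChecker command line, refusing relative required paths."""
    required = {"bamfile": bamfile, "genome": genome,
                "bwtindex": bwtindex, "syscall_path": syscall_path}
    for name, path in required.items():
        if not os.path.isabs(path):
            raise ValueError(f"{name} must be a full path, got {path!r}")
    argv = ["perl", script, bamfile, genome, bwtindex, syscall_path]
    # Optional switches, in the order given by the synopsis
    if annotfile is not None:
        argv += ["-annot", annotfile]
    if snpfile is not None:
        argv += ["-snp", snpfile]
    if workdir is not None:
        argv += ["-dir", workdir]
    if verbose:
        argv.append("-verbose")
    return argv
```

The resulting list can be passed to `subprocess.run(argv, check=True)` so that a non-zero exit status from rddChecker raises an exception instead of passing silently.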
Program parameters:
bamfile        read alignment file, in BAM format
genome         FASTA file containing the genome
bwtindex       bowtie genome index (download here)
syscall_path   local path to the SysCall executable (download here)
annotfile      gene annotations, in GTF format
snpfile        list of known SNPs, one per line, with the format: chrom position ref_letter alt_letter database
workdir        work directory for storing the temporary and final results
verbose        preserve all temporary work files and logs generated by the filters
Currently only the hg19 and mm9 genomes are directly supported; see the documentation file included with the release for instructions on how to modify the programs to work with other genomes. Also, the programs will create a SAM file by uncompressing the BAM file in its directory, so please allow for sufficient disk space in that directory.
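The two optional inputs can be sketched as follows. This is an illustrative Python sketch, not the parser used by rddChecker: it assumes the SNP fields are whitespace-separated with 1-based positions, approximates a gene region by the span of each transcript's exon records in the GTF file, and reports the annotated strand(s) covering a site as its orientation. All function names are hypothetical.

```python
import re
from typing import Dict, Iterable, List, Set, Tuple

SnpTable = Dict[Tuple[str, int], Tuple[str, str, str]]
Spans = Dict[str, Tuple[str, int, int, str]]


def load_snps(lines: Iterable[str]) -> SnpTable:
    """Parse 'chrom position ref_letter alt_letter database' records."""
    snps: SnpTable = {}
    for lineno, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:  # tolerate blank lines
            continue
        if len(fields) != 5:
            raise ValueError(f"line {lineno}: expected 5 fields, got {len(fields)}")
        chrom, pos, ref, alt, db = fields
        snps[(chrom, int(pos))] = (ref.upper(), alt.upper(), db)
    return snps


_TID = re.compile(r'transcript_id "([^"]+)"')


def load_transcript_spans(gtf_lines: Iterable[str]) -> Spans:
    """Collapse GTF exon records into one (chrom, start, end, strand) span per transcript."""
    spans: Spans = {}
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue
        f = line.rstrip("\n").split("\t")  # GTF columns are tab-separated
        if len(f) < 9 or f[2] != "exon":
            continue
        m = _TID.search(f[8])
        if m is None:
            continue
        chrom, start, end, strand = f[0], int(f[3]), int(f[4]), f[6]
        tid = m.group(1)
        if tid in spans:  # widen the existing span to include this exon
            _, s, e, _ = spans[tid]
            start, end = min(s, start), max(e, end)
        spans[tid] = (chrom, start, end, strand)
    return spans


def site_strands(spans: Spans, chrom: str, pos: int) -> Set[str]:
    """Strands of all transcripts whose span covers chrom:pos (empty = outside genes)."""
    return {st for c, s, e, st in spans.values() if c == chrom and s <= pos <= e}


def restrict_sites(sites: Iterable[Tuple[str, int]], snps: SnpTable,
                   spans: Spans) -> List[Tuple[str, int, Set[str]]]:
    """Drop known SNP positions and sites outside any transcript span."""
    kept = []
    for chrom, pos in sites:
        if (chrom, pos) in snps:
            continue
        strands = site_strands(spans, chrom, pos)
        if strands:
            kept.append((chrom, pos, strands))
    return kept
```

A linear scan over transcripts is adequate for a handful of sites; for genome-wide candidate lists, an interval index per chromosome would be the natural replacement.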
Download and installation procedure
rddChecker was written for Linux platforms. It may require some
modifications to run on another Unix platform.
To install and run rddChecker, you will need to download and
install several packages, including the samtools package for
managing large short read alignment files, the bowtie executable
and genome index, the SysCall program for detecting systematic
sequencing errors, and the sim4db/leaff module for handling large
sequence databases. You may also want to download and prepare a
local copy of a SNP database and gene annotations, or you can use
the files we provide on our web site.
Download and install the auxiliary packages. Save the executables for
samtools, bowtie and leaff somewhere in your PATH.
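After installing the auxiliary packages, a quick way to confirm that the executables are reachable is Python's `shutil.which`. The sketch below checks only the three tools named above; SysCall is passed to rddChecker by path rather than looked up on the PATH, so it is not included. The function name is hypothetical.

```python
import shutil
from typing import Iterable, List

# Executables the installation notes say should be on PATH.
REQUIRED_TOOLS = ("samtools", "bowtie", "leaff")


def missing_tools(tools: Iterable[str] = REQUIRED_TOOLS) -> List[str]:
    """Return the names of tools that shutil.which cannot locate on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]
```

Calling `missing_tools()` returns an empty list when samtools, bowtie and leaff are all found; otherwise it lists the ones still to be installed or added to the PATH.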
Download and unpack the gzipped tar file rddChecker.tar.gz:
gunzip < rddChecker.tar.gz | tar -xvf -
The tar will unpack into a directory named rddChecker.[some_date].
(You'll see the precise name while tar is unpacking.)
Make that directory current:
cd rddChecker.*
Follow the instructions in the included COMPILING file to compile.
References and data
The software and analysis of 16 human tissue samples from the Illumina Human Body Map Project have been described in: