Diamund

Description
A guide to using Diamund
Download and installation procedures
Other links

Description

Diamund is a new, efficient algorithm for variant detection that compares DNA sequences directly to one another, without aligning them to the reference genome. When used to analyze exome sequences from family trios, or to compare normal and diseased samples from the same individual, this method produces a dramatically smaller list of candidate mutations than previous methods, without losing sensitivity to detect the true cause of a genetic disease.

A guide to using diamund

USAGE

perl diamund.pl {<patient>.fa|<patient>.fq[.gz|.gz2][,<patient_mates>.fq[.gz|.gz2]]} {<parent1>.fa|<parent1>.fq[.gz|.gz2][,<parent1_mates>.fq[.gz|.gz2]]} [{<parent2>.fa|<parent2>.fq[.gz|.gz2][,<parent2_mates>.fq[.gz|.gz2]]}] {-e <exon_file>.fa|-x <exon_kmer_file>} [options]

Usage examples:

diamund.pl patient.fa father.fa mother.fa -e hg19_exome.fa -g 1,0,1
or
diamund.pl patient.1.fq.gz,patient.2.fq.gz normal.1.fq.gz,normal.2.fq.gz -x 27mers-from-exome-sort
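
The command lines above can also be assembled programmatically. The following is a minimal Python sketch, not part of Diamund itself: the helper names (sample_arg, build_diamund_command) are hypothetical, and it assumes that exactly one of -e or -x is given and that -g takes one value per sample. Only options listed in this guide are used.

```python
from typing import List, Optional, Sequence


def sample_arg(reads: Sequence[str]) -> str:
    """Join one sample's files with a comma: a reads file plus an optional mates file."""
    if not 1 <= len(reads) <= 2:
        raise ValueError("a sample takes one file, or a reads file plus its mates file")
    return ",".join(reads)


def build_diamund_command(
    patient: Sequence[str],
    others: Sequence[Sequence[str]],
    exon_fasta: Optional[str] = None,
    exon_kmers: Optional[str] = None,
    sexes: Optional[Sequence[int]] = None,
    recessive: bool = False,
    output: Optional[str] = None,
) -> List[str]:
    """Build the argument list for diamund.pl (hypothetical helper)."""
    # Assumption: exactly one of -e / -x is supplied.
    if (exon_fasta is None) == (exon_kmers is None):
        raise ValueError("give exactly one of exon_fasta (-e) or exon_kmers (-x)")
    # The usage line allows one or two samples besides the patient.
    if not 1 <= len(others) <= 2:
        raise ValueError("expected one or two samples besides the patient")

    cmd = ["perl", "diamund.pl", sample_arg(patient)]
    cmd += [sample_arg(sample) for sample in others]
    cmd += ["-e", exon_fasta] if exon_fasta is not None else ["-x", exon_kmers]

    if sexes is not None:
        # Assumption: one 0/1 value per sample, in command-line order.
        if len(sexes) != 1 + len(others) or any(s not in (0, 1) for s in sexes):
            raise ValueError("sexes must hold one 0 or 1 per sample")
        cmd += ["-g", ",".join(str(s) for s in sexes)]
    if recessive:
        cmd.append("-r")
    if output is not None:
        cmd += ["-o", output]
    return cmd


trio = build_diamund_command(
    ["patient.fa"], [["father.fa"], ["mother.fa"]],
    exon_fasta="hg19_exome.fa", sexes=[1, 0, 1],
)
print(" ".join(trio))
# prints: perl diamund.pl patient.fa father.fa mother.fa -e hg19_exome.fa -g 1,0,1
```

The returned list can be passed to subprocess.run once diamund.pl and its auxiliary packages are installed.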

OPTIONS:

-e <exon_file> fasta file with the sequence of all exons

-x <exon_kmer_file> sorted exome kmer file (either the -e or the -x option must be present in the input)

-k|kmer <kmer_size> kmer size for jellyfish program

-r|recessive run program in recessive mode (default: dominant)

-t <no_of_threads> number of threads for the jellyfish program

-v <kmer_vector_file> file with vector kmers (if available)

-z <vector_file> fasta file with vectors

-f <filter_threshold> only keep kmers in patient that appear more than <filter_threshold> times

-n <read_no> only keep assemblies that have at least <read_no> assembled reads

-d <vecscreen_data> screen for vector contamination with vecscreen using the data from this directory

-g <[0|1](,[0|1])*> a comma-separated string of 0s and 1s giving the sex of each data sample, in the order the samples are listed; 1 is male and 0 is female

-l <read_len> read length (default:100)

-o <output_file> output file with variants called (default is STDOUT)

-s <sequence_depth>

-y <kmer_no> assume kmers occurring this many times in normals are errors (default: 0)

-c don't clean up the files

-m use mummer instead of kraken to align kmers to reads (default: no)
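
To make the roles of -k, -f and -y concrete, here is a toy Python sketch of direct k-mer comparison between a patient and a normal sample. It is an illustration only, not Diamund's implementation (which counts k-mers with Jellyfish): the function names are hypothetical, reverse complements are ignored, and -y is interpreted as "k-mers seen at most this many times in the normal sample are treated as errors", which is an assumption.

```python
from collections import Counter
from typing import Iterable, Set


def count_kmers(reads: Iterable[str], k: int) -> Counter:
    """Count every length-k substring across all reads."""
    counts: Counter = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def candidate_kmers(
    patient_reads: Iterable[str],
    normal_reads: Iterable[str],
    k: int,
    filter_threshold: int,
    kmer_no: int = 0,
) -> Set[str]:
    """Return patient k-mers that are well supported and absent from the normal sample.

    filter_threshold mirrors -f: keep patient k-mers seen more than this many times.
    kmer_no mirrors -y (assumed meaning): normal counts up to this value are
    treated as sequencing errors, so the k-mer still counts as absent.
    """
    patient = count_kmers(patient_reads, k)
    normal = count_kmers(normal_reads, k)
    return {
        kmer
        for kmer, n in patient.items()
        if n > filter_threshold and normal[kmer] <= kmer_no
    }


patient_reads = ["ACGTAC", "ACGTAC"]
normal_reads = ["ACGAAC"]
print(sorted(candidate_kmers(patient_reads, normal_reads, k=3, filter_threshold=1)))
# prints: ['CGT', 'GTA', 'TAC']
```

In this toy input, ACG is dropped because it also occurs in the normal sample; the remaining k-mers are the ones that would go on to assembly and alignment.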

Download and installation

Download and install local copies of each of the following auxiliary packages and data files:
[For the official/latest version of these packages please see the Other Links section.]

Auxiliary packages:
- Kraken: a system for assigning taxonomic labels to short DNA sequences;
- Samtools: a package for managing large short-read alignment files;
- Jellyfish: a tool for fast, memory-efficient counting of k-mers in DNA;
- Bowtie2: an ultrafast and memory-efficient tool for aligning sequencing reads;
- Minimo: a de novo assembler based on the AMOS infrastructure;
- Mummer: a system for rapidly aligning entire genomes.
Auxiliary data files:
- precomputed files with 27mers from vectors and from exomes
  [You can also prepare your own k-mer files to use with the -x and -v options, respectively.]
- a fasta file of the human hg19 genome and the bowtie2 index for hg19
- UniVec data
Download and unpack the gzipped tar file diamund.tar.gz.
tar -xzf diamund.tar.gz
The archive unpacks into a directory named Diamund containing 2 files:
- diamund.pl
- diamund.cfg - a tab-delimited file with 2 columns:
  - 1st col -- the names of the auxiliary software and data files needed by diamund.pl (do not change these).
  - 2nd col -- the path to the corresponding software/data file named in the first column; update this for your system.
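
After editing diamund.cfg, it can help to check that every path in the second column actually exists. The sketch below is a hypothetical helper, not shipped with Diamund; it assumes only the two-column, tab-delimited layout described above and makes no assumption about the entry names. Entries that are index prefixes rather than real files (for example a bowtie2 index base name) would be reported as missing by this simple check.

```python
import os
from typing import Dict, List


def read_diamund_cfg(path: str) -> Dict[str, str]:
    """Parse a two-column, tab-delimited config into {name: location}."""
    entries: Dict[str, str] = {}
    with open(path) as handle:
        for line_no, line in enumerate(handle, start=1):
            line = line.rstrip("\n")
            if not line.strip():
                continue  # skip blank lines
            fields = line.split("\t")
            if len(fields) != 2:
                raise ValueError(f"{path}:{line_no}: expected 2 tab-separated columns")
            name, location = fields
            entries[name.strip()] = location.strip()
    return entries


def missing_paths(entries: Dict[str, str]) -> List[str]:
    """Return the names whose configured location does not exist on disk."""
    return [name for name, location in entries.items() if not os.path.exists(location)]


if __name__ == "__main__":
    cfg = os.path.join("Diamund", "diamund.cfg")
    if os.path.exists(cfg):
        for name in missing_paths(read_diamund_cfg(cfg)):
            print(f"path not found for: {name}")
```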

An email address for feedback and bug reports will be available soon.

Diamund

DIrect Alignment for MUtatioN Discovery

Other links:
