

About FLASH

FLASH (Fast Length Adjustment of SHort reads) is a very fast and accurate software tool to merge paired-end reads from next-generation sequencing experiments. FLASH is designed to merge pairs of reads when the original DNA fragments are shorter than twice the length of reads. The resulting longer reads can significantly improve genome assemblies. They can also improve transcriptome assembly when FLASH is used to merge RNA-seq data.
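
The "shorter than twice the length of reads" condition can be made concrete with a little arithmetic. The sketch below is illustrative only and is not FLASH's implementation: the function name `expected_overlap` is hypothetical, and it assumes both reads have the same length and are sequenced inward from opposite ends of the fragment.

```python
def expected_overlap(read_len, fragment_len):
    """Return how many bases the two reads of a pair share.

    Assumes two reads of equal length, one from each end of the fragment.
    A result of 0 means the reads do not overlap and cannot be merged.
    """
    if fragment_len < read_len:
        raise ValueError("fragment is shorter than a single read")
    return max(0, 2 * read_len - fragment_len)


# 100-bp reads from fragments of various lengths.
for fragment_len in (150, 180, 199, 200, 250):
    overlap = expected_overlap(100, fragment_len)
    status = "overlapping" if overlap > 0 else "no overlap"
    print(f"fragment {fragment_len} bp: overlap {overlap} bp ({status})")
```

With 100-bp reads, a 180-bp fragment leaves a 20-bp overlap, while fragments of 200 bp or more leave none.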
 

Accuracy

FLASH merges reads from paired-end sequencing runs with very high accuracy.
FLASH accuracy on one million synthetic pairs of 100-bp reads, generated from fragments whose lengths are normally distributed with a mean of 180 bp and a standard deviation of 20 bp:
 
Parameters                  No error  1% error rate  2% error rate  3% error rate  5% error rate
Default                     99.73%    99.68%         98.43%         94.76%         77.91%
More aggressive             99.73%    99.68%         99.06%         98.30%         93.65%

Simulated reads used in the experiments are available here:
No error
1% error
2% error
3% error
5% error
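
For readers who want to produce comparable test data themselves, the following is a minimal sketch of drawing one synthetic pair with the stated parameters (100-bp reads, fragment lengths normally distributed with mean 180 bp and standard deviation 20 bp). It is not the generator used for the files above. The function names, the uniform choice of fragment position, the substitution-only error model, and the clamping of fragment lengths to at least one read length are all assumptions made for illustration.

```python
import random

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string over A, C, G, T."""
    return seq.translate(COMPLEMENT)[::-1]


def add_errors(seq, error_rate, rng):
    """Replace each base by a different base with probability error_rate."""
    out = []
    for base in seq:
        if rng.random() < error_rate:
            out.append(rng.choice([b for b in "ACGT" if b != base]))
        else:
            out.append(base)
    return "".join(out)


def simulate_pair(genome, read_len=100, mean=180.0, sd=20.0,
                  error_rate=0.01, rng=None):
    """Draw one fragment from genome and return (fragment, read1, read2).

    read1 is the first read_len bases of the fragment; read2 is the first
    read_len bases of its reverse complement. Requires len(genome) >= read_len.
    """
    if rng is None:
        rng = random.Random()
    frag_len = int(round(rng.gauss(mean, sd)))
    # Assumption: clamp so a fragment always holds at least one full read.
    frag_len = max(read_len, min(frag_len, len(genome)))
    start = rng.randrange(len(genome) - frag_len + 1)
    fragment = genome[start:start + frag_len]
    read1 = add_errors(fragment[:read_len], error_rate, rng)
    read2 = add_errors(reverse_complement(fragment)[:read_len], error_rate, rng)
    return fragment, read1, read2


rng = random.Random(0)
genome = "".join(rng.choice("ACGT") for _ in range(10_000))
fragment, read1, read2 = simulate_pair(genome, rng=rng)
print(len(fragment), len(read1), len(read2))
```

Because the true fragment is returned alongside the reads, a merged read produced by any tool can be compared against it to score accuracy.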

FLASH accuracy on real data:
 
  • 647,052 pairs of 101-bp reads from Staphylococcus aureus: 90.77%
  • 18,252,400 pairs of 101-bp reads from human: 91.02%

The reads are available from the GAGE site.

Time requirements

The latest version of FLASH includes a multi-threaded mode.
When run in single-threaded mode:
  • FLASH takes 120 seconds to process one million pairs of 100-bp reads on a server with 256GB of RAM and a six-core 2.4GHz AMD Opteron CPU.
  • FLASH takes 129 seconds to process one million pairs of 100-bp reads on a desktop with 2GB of RAM and a dual-core 3.00GHz Intel Xeon CPU.
  • Run time scales linearly with the read length and with the number of reads.
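
The linear scaling stated above allows a rough single-threaded run-time estimate for other datasets. The sketch below simply extrapolates from the 120-second server figure; the function name `estimate_seconds` is hypothetical, and actual times will depend on hardware, I/O, and parameter settings.

```python
def estimate_seconds(num_pairs, read_len,
                     ref_seconds=120.0, ref_pairs=1_000_000, ref_read_len=100):
    """Extrapolate run time linearly in both pair count and read length.

    The reference point (120 s for one million pairs of 100-bp reads) is the
    single-threaded server measurement quoted above.
    """
    return ref_seconds * (num_pairs / ref_pairs) * (read_len / ref_read_len)


# Rough estimate for the human dataset listed under "Accuracy".
seconds = estimate_seconds(18_252_400, 101)
print(f"about {seconds / 60:.0f} minutes")
```

Passing `ref_seconds=129.0` gives the corresponding estimate based on the desktop measurement.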

Impact of FLASH on genome assemblies

Merging read pairs with FLASH as a preprocessing step for genome assembly yields significantly higher N50 values for contigs and scaffolds. It also reduces the number of misassembled contigs.
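
For reference, N50 is the length of the shortest contig (or scaffold) in the smallest set of longest contigs that together cover at least half of the total assembly length. A minimal sketch of the standard definition follows; the function name `n50` and the example lengths are illustrative and are not taken from the FLASH experiments.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    if not lengths:
        raise ValueError("need at least one length")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down until half the total is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Illustrative lengths only.
print(n50([80, 70, 50, 40, 30, 20]))
```

A higher N50 means that more of the assembly is contained in long sequences, which is why it is commonly used to compare assemblies of the same genome.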

Publication

FLASH: Fast length adjustment of short reads to improve genome assemblies. T. Magoc and S. Salzberg. Bioinformatics 27:21 (2011), 2957-63.

Obtaining the Software

This software is OSI Certified Open Source Software.   
 
The FLASH source code and executables can be downloaded from SourceForge. Release packages can also be downloaded directly from here:

Questions/Comments/Requests

Send an e-mail to flash.comment@gmail.com.

Funding

This work has been supported in part by NIH grants R01-LM006845, R01-GM083873, and R01-HG006677 to S.L. Salzberg.