My Research
I have been working in the field of bioinformatics since 2002, beginning with my
collaborations with The Institute for Genomic Research (TIGR) and Celera Genomics.
The main goals of my research are:
1. developing algorithms and software for de novo genome assembly and annotation
from latest-generation sequencing data, and
2. applying this software to produce high-quality annotated assemblies of the
most challenging genomes.
I lead the development of the open-source MaSuRCA genome assembly package, which
currently produces accurate, high-quality assemblies from sequencing data generated
by Illumina, PacBio, and Oxford Nanopore instruments. To date, MaSuRCA has been used
to assemble more than 2,600 eukaryotic genomes submitted to NCBI GenBank. I played a
leading role in producing assemblies for many challenging genome projects, including
the 22 Gbp genome of loblolly pine (Pinus taeda), the 17 Gbp genome of bread wheat
(Triticum aestivum), and the 3 Gbp genome of Atlantic salmon (Salmo salar), as well
as many other plants and animals. In recent years, I have been the lead author of
several widely used bioinformatics tools, including the MUMmer4 sequence aligner, the
POLCA and JASPER assembly polishers, and the SAMBA scaffolder. Most recently, my
research has expanded to include algorithms for transcriptome assembly, protein
alignment, and genome annotation. My latest work is EviAnn, a novel automated genome
annotation package that outperforms all published annotation software when tested on
the same input data.
Funding
My research is currently supported by NIH grants R01-HG006677 (PI Steven Salzberg) and R35-GM130151 (PI Steven Salzberg). In the past, my research was also supported in part by NSF award IOS-1744309 (PI Steven Salzberg), USDA-NIFA award 2018-67015-28199 (PI Aleksey Zimin), and NSF award ABR-PG 1444893 (PI Aldwinckle, Cornell University; Zimin was PI on the subcontract to the University of Maryland).