LiftOn's tutorial

LiftOn is a homology-based lift-over tool designed to accurately map annotations in GFF or GTF between assemblies. It is built upon the fantastic Liftoff (credits to Dr. Alaina Shumate) and miniprot (credits to Dr. Heng Li), and employs a two-step protein maximization algorithm to improve the protein-coding gene lift-over process.

Why LiftOn❓

  1. Burgeoning number of genome assemblies: As of December 2023, NCBI lists 30,530 eukaryotic genomes (NCBI genome browser), but genome annotation lags behind. As more high-quality assemblies are generated, annotating them accurately becomes increasingly important.

  2. Improved protein-coding gene mapping: The popular Liftoff tool maps genes based only on DNA alignment. By adding protein-to-genome alignment, LiftOn further improves lifted-over protein-coding gene annotations. For example, LiftOn improves the currently released T2T-CHM13 annotation (JHU RefSeqv110 + Liftoff v5.1).

  3. Improved distant species lift-over: LiftOn extends lift-over beyond the same or closely related species to more distantly related ones. See the mouse_2_rat and drosophila_melanogaster_2_erecta lift-over sections.

LiftOn is free, it's open source, it's easy to install, and it's in Python!

Who is it for❓

  1. If you have sequenced and assembled a new genome and need to annotate it, LiftOn is the ideal choice for generating annotations.

  2. If you want to do comparative genomics analysis, run LiftOn to lift over and compare annotations!

  3. If you wish to utilize the finest CHM13 annotation, you can run LiftOn! We have also pre-generated the T2T_CHM13_LiftOn.gff3 file for your convenience.

What does LiftOn do❓

LiftOn is designed for individuals who would like to annotate a new assembly, referred to as target Genome \(T\).

The first step is to select a well-annotated genome along with its annotation, denoted as reference Genome \(R\) and Annotation \(R_A\).

LiftOn employs a two-step protein maximization algorithm (PM algorithm).

  1. The first module is the chaining algorithm. It starts by extracting protein sequences annotated by Liftoff and miniprot. LiftOn then aligns these sequences to full-length reference proteins. For each gene locus, LiftOn compares each section of the protein alignments from Liftoff and miniprot, chaining together the best combinations.

  2. The second module is the open-reading frame search (ORF search) algorithm. In the case of truncated protein-coding transcripts, this algorithm examines alternative frames to identify the ORF that produces the longest match with the reference protein.
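As a rough illustration of the chaining idea, the sketch below assumes each tool's protein alignment has already been reduced to a per-residue match mask over the full-length reference protein, and that the reference protein has been split into sections (for example, one per exon). For every section it keeps whichever source matches more reference residues. The function name `chain_best_sections` and this data layout are hypothetical and not part of LiftOn's API; the real algorithm must also keep the chosen pieces consistent in genomic coordinates, which this sketch ignores.

```python
from typing import Dict, List, Sequence, Tuple

# Half-open [start, end) interval in reference-protein coordinates.
Section = Tuple[int, int]


def chain_best_sections(
    sections: Sequence[Section],
    matches: Dict[str, Sequence[bool]],
) -> Tuple[List[str], int]:
    """For each section, pick the source that matches the most reference residues.

    matches[source][i] is True when reference residue i is identically matched
    in that source's protein alignment (a hypothetical, simplified representation).
    Returns the chosen source per section and the total number of matched residues.
    """
    if not matches:
        raise ValueError("need at least one alignment source")
    chosen: List[str] = []
    total = 0
    for start, end in sections:
        # Sorted iteration plus strict '>' means ties go to the alphabetically first source.
        best_source = ""
        best_score = -1
        for source in sorted(matches):
            score = sum(matches[source][start:end])
            if score > best_score:
                best_source, best_score = source, score
        chosen.append(best_source)
        total += best_score
    return chosen, total


if __name__ == "__main__":
    liftoff = [True] * 4 + [False] * 4 + [True] * 2
    miniprot = [False] * 2 + [True] * 6 + [False] * 2
    sections = [(0, 4), (4, 8), (8, 10)]
    print(chain_best_sections(sections, {"liftoff": liftoff, "miniprot": miniprot}))
    # (['liftoff', 'miniprot', 'liftoff'], 10)
```

Because `matches` is keyed by source name, the same loop would accept more than two protein-based sources. Breaking ties toward the alphabetically first source is an arbitrary choice of this sketch.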
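The ORF search step can be pictured with the sketch below, which assumes a spliced transcript sequence already in 5' to 3' orientation. It translates the transcript in all three forward frames, collects each stretch running from the first ATG to a stop codon, and keeps the one sharing the most residues with the reference protein. Scoring with Python's `difflib.SequenceMatcher` is a stand-in chosen for this example rather than the alignment LiftOn uses, and the helper names (`translate`, `candidate_orfs`, `best_orf`) are hypothetical.

```python
from difflib import SequenceMatcher
from typing import List, Optional, Tuple

# Standard genetic code, codons enumerated in TCAG order.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
_CODONS = [a + b + c for a in _BASES for b in _BASES for c in _BASES]
CODON_TABLE = dict(zip(_CODONS, _AMINO))


def translate(nt: str) -> str:
    """Translate complete codons; unknown codons (e.g. containing N) become 'X'."""
    nt = nt.upper()
    return "".join(CODON_TABLE.get(nt[i:i + 3], "X") for i in range(0, len(nt) - 2, 3))


def candidate_orfs(transcript: str) -> List[Tuple[int, str]]:
    """Return (frame, protein) for every first-ATG-to-stop stretch in the three forward frames."""
    orfs = []
    for frame in range(3):
        protein = translate(transcript[frame:])
        # Every piece except the last is followed by a stop codon.
        for segment in protein.split("*")[:-1]:
            start = segment.find("M")
            if start != -1:
                orfs.append((frame, segment[start:]))
    return orfs


def matched_residues(a: str, b: str) -> int:
    """Count residues in the matching blocks between two protein strings."""
    blocks = SequenceMatcher(None, a, b, autojunk=False).get_matching_blocks()
    return sum(block.size for block in blocks)


def best_orf(transcript: str, reference_protein: str) -> Optional[Tuple[int, int, str]]:
    """Return (score, frame, protein) for the ORF sharing the most residues with the reference."""
    best = None
    for frame, protein in candidate_orfs(transcript):
        score = matched_residues(protein, reference_protein)
        if best is None or score > best[0]:
            best = (score, frame, protein)
    return best


if __name__ == "__main__":
    # One extra leading base shifts the intact ORF into frame 1.
    print(best_orf("CATGAAAGTTCTGTAAGG", "MKVL"))  # (4, 1, 'MKVL')
```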

  • Input:
    1. target Genome \(T\) in FASTA

    2. reference Genome \(R\) in FASTA

    3. reference Annotation \(R_A\) in GFF3

  • Output:
    1. LiftOn annotation file in GFF3

    2. Protein sequence identities & mutation types

    3. Features with extra copies

    4. Unmapped features
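Protein sequence identity is one of the outputs listed above. Its exact formula is not given here, so the sketch below assumes one common convention: identical residues counted over all columns of a gapped pairwise alignment, with gap columns included in the denominator. `protein_identity` is a hypothetical helper, and LiftOn's own definition may differ.

```python
def protein_identity(aligned_ref: str, aligned_query: str) -> float:
    """Identical, non-gap columns divided by total alignment columns (gaps written as '-').

    Assumed convention for illustration only; not necessarily LiftOn's definition.
    """
    if len(aligned_ref) != len(aligned_query):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_ref:
        return 0.0
    identical = sum(1 for r, q in zip(aligned_ref, aligned_query) if r == q and r != "-")
    return identical / len(aligned_ref)


if __name__ == "__main__":
    # Columns: M/M, K/K, V/V, -/G, L/L, A/-  -> 4 identical out of 6.
    print(round(protein_identity("MKV-LA", "MKVGL-"), 3))  # 0.667
```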

Cite us

Chao, Kuan-Hao, Jakob M. Heinz, Celine Hoh, Alan Mao, Alaina Shumate, Mihaela Pertea, and Steven L. Salzberg. "Combining DNA and protein alignments to improve genome annotation with LiftOn." bioRxiv.

Shumate, Alaina, and Steven L. Salzberg. "Liftoff: accurate mapping of gene annotations." Bioinformatics 37.12 (2021): 1639-1643.

User support

Please go through the documentation below first. If you have questions about using the package, a bug report, or a feature request, please use the GitHub issue tracker here:

Key contributors

LiftOn was designed and developed by Kuan-Hao Chao. This documentation was written by Kuan-Hao Chao and Alan Mao. The LiftOn logo was designed by Alan Mao.


LiftOn's limitations

LiftOn's chaining algorithm currently uses only miniprot alignments to fix the Liftoff annotation. However, it could be extended to chain together multiple protein-based annotation files or assembled RNA-Seq transcripts.

Both DNA- and protein-based methods still have limitations. We are developing a module that merges the LiftOn annotation with released curated annotations to produce improved annotations.

The LiftOn chaining algorithm does not currently support multi-threading. Adding it is our next development target!