User Manual

LiftOn

====================================================================
An accurate homology lift-over tool between assemblies
====================================================================


   ██╗     ██╗███████╗████████╗ ██████╗ ███╗   ██╗
   ██║     ██║██╔════╝╚══██╔══╝██╔═══██╗████╗  ██║
   ██║     ██║█████╗     ██║   ██║   ██║██╔██╗ ██║
   ██║     ██║██╔══╝     ██║   ██║   ██║██║╚██╗██║
   ███████╗██║██║        ██║   ╚██████╔╝██║ ╚████║
   ╚══════╝╚═╝╚═╝        ╚═╝    ╚═════╝ ╚═╝  ╚═══╝

v1.0.0

usage: lifton [-h] [-E] [-EL] [-c] [-o FILE] [-u FILE] [-exclude_partial] [-mm2_options =STR] [-a A] [-s S] [-min_miniprot MIN_MINIPROT]
            [-max_miniprot MAX_MINIPROT] [-d D] [-flank F] [-V] [-D] [-t THREADS] [-m PATH] [-f TYPES] [-infer-genes] [-infer_transcripts] [-chroms TXT]
            [-unplaced TXT] [-copies] [-sc SC] [-overlap O] [-mismatch M] [-gap_open GO] [-gap_extend GE] [-polish] [-cds] -g GFF [-P FASTA] [-T FASTA] [-L gff]
            [-M gff] [-ad SOURCE]
            target reference

Lift features from one genome assembly to another

* Required input (sequences):
target                target fasta genome to lift genes to
reference             reference fasta genome to lift genes from

* Required input (Reference annotation):
-g GFF, --reference-annotation GFF
                        the reference annotation file to lift over in GFF or GTF format, or the name of an existing feature database; if a GFF/GTF file is
                        given, a database will be built automatically
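
As a minimal sketch of how the required inputs fit together, the Python snippet below assembles a `lifton` command line from the flags listed in the usage text (`-g`, `-o`, `-t`, then the positional `target` and `reference`). The helper name `build_lifton_command` and all file names are hypothetical placeholders; the snippet only builds and prints the command and does not run LiftOn.

```python
import shlex


def build_lifton_command(target, reference, annotation,
                         output="lifton.gff3", threads=1):
    """Assemble an argv list for lifton (hypothetical helper).

    Only flags that appear in the usage text are used: -g, -o, -t,
    followed by the positional target and reference FASTA files.
    """
    return [
        "lifton",
        "-g", annotation,     # reference annotation (GFF/GTF)
        "-o", output,         # output annotation file
        "-t", str(threads),   # number of parallel processes
        target,               # genome to lift genes to
        reference,            # genome to lift genes from
    ]


# File names below are placeholders, not files shipped with LiftOn.
cmd = build_lifton_command("target.fa", "reference.fa", "reference.gff3",
                           threads=4)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once LiftOn is installed; note that `target` comes before `reference` in the positional arguments.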

* Optional input (Reference sequences):
-P FASTA, --proteins FASTA
                        the reference protein sequences.
-T FASTA, --transcripts FASTA
                        the reference transcript sequences.

* Optional input (Liftoff annotation):
-L gff, --liftoff gff
                        the annotation generated by Liftoff, or the name of a Liftoff gffutils database; if an annotation file is given, a database will be
                        built automatically

* Optional input (miniprot annotation):
-M gff, --miniprot gff
                        the annotation generated by miniprot, or the name of a miniprot gffutils database; if an annotation file is given, a database will be
                        built automatically

* Output settings:
-o FILE, --output FILE
                        write output to FILE in same format as input; by default, output is written to "lifton.gff3"
-u FILE               write unmapped features to FILE; default is "unmapped_features.txt"
-exclude_partial      write partial mappings below the -s and -a thresholds to unmapped_features.txt; if not set, partial/low sequence identity mappings will be
                        included in the gff file with partial_mapping=True, low_identity=True in comments

* Miscellaneous settings:
-h, --help            show this help message and exit
-E, --evaluation      Run LiftOn in evaluation mode
-EL, --evaluation-liftoff-chm13
                        Run LiftOn in evaluation mode
-c, --write_chains    Write chaining files
-V, --version         show program version
-D, --debug           Run debug mode
-t THREADS, --threads THREADS
                        use t parallel processes to accelerate alignment; by default t=1
-m PATH               Minimap2 path
-f TYPES, --features TYPES
                        list of feature types to lift over
-infer-genes          use if annotation file only includes transcripts, exon/CDS features
-infer_transcripts    use if annotation file only includes exon/CDS features and does not include transcripts/mRNA
-chroms TXT           comma-separated file with corresponding chromosomes in the reference,target sequences
-unplaced TXT         text file with name(s) of unplaced sequences to map genes from after genes from chromosomes in chroms.txt are mapped; default is
                        "unplaced_seq_names.txt"
-copies               look for extra gene copies in the target genome
-sc SC                with -copies, minimum sequence identity in exons/CDS for which a gene is considered a copy; must be greater than -s; default is 1.0
-overlap O            maximum fraction [0.0-1.0] of overlap allowed between two features; by default O=0.1
-mismatch M           mismatch penalty in exons when finding best mapping; by default M=2
-gap_open GO          gap open penalty in exons when finding best mapping; by default GO=2
-gap_extend GE        gap extend penalty in exons when finding best mapping; by default GE=1
-polish
-cds                  annotate status of each CDS (partial, missing start, missing stop, in-frame stop codon)
-ad SOURCE, --annotation-database SOURCE
                        The source of the reference annotation (RefSeq / GENCODE / others).
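
The `-chroms` option above expects a comma-separated file pairing reference and target chromosome names. The Python sketch below writes such a file, assuming one `reference,target` pair per line (the help text gives the column order but not the exact layout, so the one-pair-per-line format is an assumption). The helper `write_chroms` and the sequence names are hypothetical.

```python
import csv
import io


def write_chroms(pairs, handle):
    """Write (reference_name, target_name) pairs as comma-separated lines.

    Assumption: one pair per line, reference name first, target name second.
    """
    writer = csv.writer(handle, lineterminator="\n")
    for ref_name, target_name in pairs:
        writer.writerow([ref_name, target_name])


# Placeholder sequence names for illustration only.
pairs = [("chr1", "target_chr1"), ("chr2", "target_chr2")]

buffer = io.StringIO()
write_chroms(pairs, buffer)
chroms_text = buffer.getvalue()
print(chroms_text, end="")
```

To produce a file for `-chroms`, the same helper can be called with a handle from `open("chroms.txt", "w", newline="")` instead of the in-memory buffer.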

Alignments:
-mm2_options =STR     space delimited minimap2 parameters. By default ="-a --end-bonus 5 --eqx -N 50 -p 0.5"
-a A                  designate a feature mapped only if it aligns with coverage ≥A; by default A=0.5
-s S                  designate a feature mapped only if its child features (usually exons/CDS) align with sequence identity ≥S; by default S=0.5
-min_miniprot MIN_MINIPROT
                        The minimum length ratio of a protein-coding transcript to the longest protein-coding transcript within a gene locus, as identified
                        exclusively by miniprot in the target genome, is set by default to MIN_MINIPROT=0.9.
-max_miniprot MAX_MINIPROT
                        The maximum length ratio of a protein-coding transcript to the longest protein-coding transcript within a gene locus, as identified
                        exclusively by miniprot in the target genome, is set by default to MAX_MINIPROT=1.5.
-d D                  distance scaling factor; alignment nodes separated by more than a factor of D in the target genome will not be connected in the graph; by
                        default D=2.0
-flank F              amount of flanking sequence to align as a fraction [0.0-1.0] of gene length. This can improve gene alignment where gene structure differs
                        between target and reference; by default F=0.0
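
The alignment thresholds above can be read as simple numeric checks. The Python sketch below is one interpretation of the help text, not LiftOn's actual implementation: the function names are hypothetical, the inclusive bounds on the miniprot length ratio are an assumption, and the source of the "longest" transcript length is left to the caller because the help text does not specify it.

```python
def is_mapped(coverage, identity, min_coverage=0.5, min_identity=0.5):
    """Apply the -a (coverage) and -s (sequence identity) thresholds.

    Defaults mirror the documented defaults A=0.5 and S=0.5.
    """
    return coverage >= min_coverage and identity >= min_identity


def flank_length(gene_length, flank_fraction=0.0):
    """Flanking sequence length for a -flank fraction of the gene length.

    Whether this amount is added on each side of the gene is not stated
    in the help text; this function only computes the fraction.
    """
    if not 0.0 <= flank_fraction <= 1.0:
        raise ValueError("flank fraction must be within [0.0, 1.0]")
    return int(gene_length * flank_fraction)


def within_miniprot_ratio(length, longest_length,
                          min_ratio=0.9, max_ratio=1.5):
    """Check a transcript length ratio against -min_miniprot/-max_miniprot.

    Assumption: both bounds are inclusive.
    """
    ratio = length / longest_length
    return min_ratio <= ratio <= max_ratio


# Illustrative values only; these are not results from a real run.
print(is_mapped(coverage=0.8, identity=0.6))
print(flank_length(2000, 0.5))
print(within_miniprot_ratio(950, 1000))
```

Raising `-a` or `-s` in this reading makes `is_mapped` stricter, which would move more features into the unmapped output.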