User Manual

LiftOn

====================================================================
An accurate homology lift-over tool between assemblies
====================================================================


   ██╗     ██╗███████╗████████╗ ██████╗ ███╗   ██╗
   ██║     ██║██╔════╝╚══██╔══╝██╔═══██╗████╗  ██║
   ██║     ██║█████╗     ██║   ██║   ██║██╔██╗ ██║
   ██║     ██║██╔══╝     ██║   ██║   ██║██║╚██╗██║
   ███████╗██║██║        ██║   ╚██████╔╝██║ ╚████║
   ╚══════╝╚═╝╚═╝        ╚═╝    ╚═════╝ ╚═╝  ╚═══╝

v1.0.9

usage: lifton [-h] [-E] [-EL] [-c] [--no-orf-search] [-o FILE] [-u FILE] [-exclude_partial]
            [-mm2_options =STR] [-mp_options =STR] [-a A] [-s S] [-min_miniprot MIN_MINIPROT]
            [-max_miniprot MAX_MINIPROT] [-d D] [-flank F] [-V] [-D] [-t THREADS] [-m PATH]
            [-f TYPES] [-infer-genes] [-infer_transcripts] [-chroms TXT] [-unplaced TXT]
            [-copies] [-sc SC] [-overlap O] [-mismatch M] [-gap_open GO] [-gap_extend GE]
            [-polish] [-cds] [-time] [--validate-output] [--validate-verbose] [--strict-gff]
            [--stream] [--inmemory-liftoff] [--locus-pipeline] [--native] [--serial-aligners]
            [--parallel-aligners] [--optimize] [--legacy-merge] [--full-dp-align] [--fast-align]
            [--gene-only] [--lift-gene-like] [--no-miniprot-rescue] [--miniprot-rescue]
            [--merge-strategy STRATEGY] [--id-spec ID_SPEC] [--force] [--verbose]
            [--no-auto-convert-gtf] -g GFF [-P FASTA] [-T FASTA] [-L gff] [-M gff] [-ad SOURCE]
            target reference

Lift features from one genome assembly to another

* Required input (sequences):
target                target fasta genome to lift genes to
reference             reference fasta genome to lift genes from

* Required input (Reference annotation):
-g GFF, --reference-annotation GFF
                        the reference annotation file to lift over in GFF or GTF format, or the name of an existing feature database; if an annotation file is
                        given, a database will be built automatically

* Optional input (Reference sequences):
-P FASTA, --proteins FASTA
                        the reference protein sequences.
-T FASTA, --transcripts FASTA
                        the reference transcript sequences.

* Optional input (Liftoff annotation):
-L gff, --liftoff gff
                        the annotation generated by Liftoff, or the name of a Liftoff gffutils database; if an annotation file is given, a database will be
                        built automatically

* Optional input (miniprot annotation):
-M gff, --miniprot gff
                        the annotation generated by miniprot, or the name of a miniprot gffutils database; if an annotation file is given, a database will be
                        built automatically

* gffutils parameters:
--merge-strategy {create_unique,merge,error,warning,replace}
                        strategy for merging features when building the database; default "create_unique"
--id-spec ID_SPEC     attribute to use as feature ID; default "ID"
--force               overwrite existing database
--verbose             enable verbose output

* Output settings:
-o FILE, --output FILE
                        write output to FILE in same format as input; by default, output is written to "lifton.gff3"
-u FILE               write unmapped features to FILE; default is "unmapped_features.txt"
-exclude_partial      write partial mappings below -s and -a threshold to unmapped_features.txt; if true partial/low sequence identity mappings will be included
                        in the gff file with partial_mapping=True, low_identity=True in comments

* Miscellaneous settings:
-h, --help            show this help message and exit
-E, --evaluation      Run LiftOn in evaluation mode
-EL, --evaluation-liftoff-chm13
                        Run LiftOn in evaluation mode
-c, --write_chains    Write chaining files
--no-orf-search       do not perform open reading frame (ORF) search
-V, --version         show program version
-D, --debug           Run debug mode
-t THREADS, --threads THREADS
                        use t parallel processes to accelerate alignment; by default t=1
-m PATH               Minimap2 path
-f TYPES, --features TYPES
                        list of feature types to lift over (an explicit -f overrides gene-like auto-detection)
-infer-genes          use if annotation file only includes transcripts, exon/CDS features; auto-enabled for GTF
-infer_transcripts    use if annotation file only includes exon/CDS features and does not include transcripts/mRNA; auto-enabled for GTF
-chroms TXT           comma-separated file with corresponding chromosomes in the reference,target sequences
-unplaced TXT         text file with name(s) of unplaced sequences to map genes from after genes from chromosomes in chroms.txt are mapped; default is
                        "unplaced_seq_names.txt"
-copies               look for extra gene copies in the target genome
-sc SC                with -copies, minimum sequence identity in exons/CDS for which a gene is considered a copy; must be greater than -s; default is 1.0
-overlap O            maximum fraction [0.0-1.0] of overlap allowed by 2 features; by default O=0.1
-mismatch M           mismatch penalty in exons when finding best mapping; by default M=2
-gap_open GO          gap open penalty in exons when finding best mapping; by default GO=2
-gap_extend GE        gap extend penalty in exons when finding best mapping; by default GE=1
-polish
-cds                  annotate status of each CDS (partial, missing start, missing stop, inframe stop codon)
-time, --measure_time enable time measurement for each step (writes time.txt)
-ad SOURCE, --annotation-database SOURCE
                        The source of the reference annotation (RefSeq / GENCODE / others)
--no-auto-convert-gtf disable automatic GTF -> GFF3 conversion

* Validation (exit-code only; does not change output bytes):
--strict-gff          run the NCBI GFF3 input-side validator on the reference annotation; exit non-zero on any spec violation
--validate-output     re-validate the generated output GFF3 after writing; print a structured report to stderr
--validate-verbose    with --validate-output, also print warnings (not just errors)

* Performance fast-paths (BYTE-IDENTICAL to the default output):
--stream              pipe miniprot output into an in-memory database; skip the miniprot.gff3 disk round-trip / SQLite re-ingest
--inmemory-liftoff    feed Liftoff's lifted features to the database in-process; skip the liftoff.gff3 disk write / re-ingest
--locus-pipeline      fan out per-locus work through a thread pool sized by --threads; emitted in submission order (byte-identical to -t 1)
--native              route miniprot through the in-process native facade + unlock in-process threading; falls back gracefully if mappy is absent
--serial-aligners     opt OUT of the (default) concurrent Liftoff||miniprot overlap; run them sequentially

* Output-changing flags (opt-outs that RESTORE pre-v1.0.9 behaviour):
--legacy-merge        restore the pre-promotion UNCONDITIONAL Liftoff/miniprot merge (default = verified best-of-outcome merge)
--full-dp-align       restore the exact giant-only full-DP alignment (default = band-everything anchor-windowed alignment)
--gene-only           restore the pre-v1.0.9 gene-only lift (default = lift all auto-detected gene-like top-level types)
--no-miniprot-rescue  disable the (default-ON) miniprot-only rescue pass for genes the DNA lift missed entirely

* Deprecated NO-OP aliases (kept for backward compatibility; have no effect):
--parallel-aligners   no-op (concurrent Step 4 is now default; use --serial-aligners to opt out)
--optimize            no-op (best-of-outcome merge is now default; use --legacy-merge to opt out)
--fast-align          no-op (band-everything alignment is now default; use --full-dp-align to opt out)
--lift-gene-like      no-op (gene-like lift is now default; use --gene-only to opt out)
--miniprot-rescue     no-op (miniprot-only rescue is now default; use --no-miniprot-rescue to opt out)

Alignments:
-mm2_options =STR     space delimited minimap2 parameters. By default ="-a --end-bonus 5 --eqx -N 50 -p 0.5"
-mp_options =STR      space delimited miniprot parameters. By default ""
-a A                  designate a feature mapped only if it aligns with coverage ≥A; by default A=0.5
-s S                  designate a feature mapped only if its child features (usually exons/CDS) align with sequence identity ≥S; by default S=0.5
-min_miniprot MIN_MINIPROT
                        The minimum length ratio of a protein-coding transcript to the longest protein-coding transcript within a gene locus, as identified
                        exclusively by miniprot in the target genome, is set by default to MIN_MINIPROT=0.9.
-max_miniprot MAX_MINIPROT
                        The maximum length ratio of a protein-coding transcript to the longest protein-coding transcript within a gene locus, as identified
                        exclusively by miniprot in the target genome, is set by default to MAX_MINIPROT=1.5.
-d D                  distance scaling factor; alignment nodes separated by more than a factor of D in the target genome will not be connected in the graph; by
                        default D=2.0
-flank F              amount of flanking sequence to align as a fraction [0.0-1.0] of gene length. This can improve gene alignment where gene structure differs
                        between target and reference; by default F=0.0
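
The -a and -s thresholds described above act together: a feature is designated as mapped only when its alignment coverage reaches A and the sequence identity of its child features reaches S. The following is a minimal sketch of that rule, not LiftOn's actual code; the function name is_mapped and its argument names are hypothetical, and only the default cutoffs (0.5) are taken from the help text.

```python
def is_mapped(coverage: float, child_identity: float,
              a: float = 0.5, s: float = 0.5) -> bool:
    """Return True when a feature passes both the -a and -s cutoffs.

    coverage       -- fraction of the feature that aligned (compared with -a)
    child_identity -- sequence identity of child features such as exons/CDS
                      (compared with -s)
    Both names are illustrative assumptions, not LiftOn identifiers.
    """
    return coverage >= a and child_identity >= s


if __name__ == "__main__":
    # Passes both default cutoffs.
    print(is_mapped(0.80, 0.95))
    # Fails the coverage cutoff even though identity is high.
    print(is_mapped(0.40, 0.99))
    # A stricter coverage cutoff, as if -a 0.9 had been given.
    print(is_mapped(0.80, 0.95, a=0.9))
```

Raising either cutoff on the command line (for example -a 0.9) makes the corresponding comparison stricter; features that fail are the ones affected by -exclude_partial.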

What's new in v1.0.9

Several v1.0.9 flags toggle behaviour that was introduced or promoted since v1.0.8. The tables below summarise, for each flag, whether it CHANGES the output annotation or is BYTE-NEUTRAL (same output bytes, different I/O or scheduling), and which flag restores the older behaviour.

Output-changing defaults (results differ vs v1.0.8 — opt-out restores old behaviour):

==================================  ======================  ======================================================================
Default behaviour (v1.0.9)          Opt-out flag            Effect
==================================  ======================  ======================================================================
Gene-like lift (lifts pseudogenes   --gene-only             CHANGES output (adds features). --lift-gene-like is a deprecated
/ ncRNA_gene / mobile elements,                             no-op alias.
not just gene)

Best-of-outcome Liftoff/miniprot    --legacy-merge          CHANGES output. Keeps the higher-identity of {merge, Liftoff} per
merge                                                       transcript. --optimize is a deprecated no-op alias.

Band-everything / windowed          --full-dp-align         CHANGES output on divergent inputs only (identity-exact
alignment                                                   same-species). --fast-align is a deprecated no-op alias.

Miniprot-only rescue (default-ON)   --no-miniprot-rescue    CHANGES output (adds lifton_rescue=miniprot_only genes the DNA lift
                                                            missed). --miniprot-rescue is a deprecated no-op alias. Env
                                                            LIFTON_MINIPROT_RESCUE=0/1 force-disables/enables.
==================================  ======================  ======================================================================
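
To make the opt-outs concrete, here is a hedged sketch of how a wrapper script might assemble a lifton command line that passes all four opt-out flags. The helper build_lifton_argv and the file names are hypothetical; the flags and the positional order (target before reference) are the ones shown in the usage line. Whether this reproduces v1.0.8 output exactly is not something this sketch can guarantee.

```python
import shlex

# Opt-out flags documented above; each restores one pre-v1.0.9 behaviour.
PRE_V109_OPT_OUTS = [
    "--gene-only",
    "--legacy-merge",
    "--full-dp-align",
    "--no-miniprot-rescue",
]


def build_lifton_argv(target, reference, annotation,
                      output="lifton.gff3", restore_pre_v109=False):
    """Build an argument list for lifton (hypothetical helper, not part of LiftOn)."""
    argv = ["lifton", "-g", annotation, "-o", output]
    if restore_pre_v109:
        argv.extend(PRE_V109_OPT_OUTS)
    # Positional arguments come last: target first, then reference.
    argv.extend([target, reference])
    return argv


if __name__ == "__main__":
    # File names below are placeholders for illustration only.
    argv = build_lifton_argv("target.fa", "reference.fa", "reference.gff3",
                             restore_pre_v109=True)
    print(shlex.join(argv))
```

Building the command as a list keeps each flag a separate argument, so it can be handed to subprocess.run without shell quoting concerns.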

Byte-neutral performance flags (identical output, faster / lighter):

============================  ====================================================================================
Flag                          Effect (BYTE-IDENTICAL to the default output)
============================  ====================================================================================
--stream                      Pipe miniprot output into an in-memory database; skip the miniprot.gff3 disk
                              round-trip and SQLite re-ingest.

--inmemory-liftoff            Feed Liftoff's lifted features to the database in-process; skip the liftoff.gff3
                              disk write and re-ingest.

--threads N --locus-pipeline  Fan out per-locus work across a thread pool; emitted in submission order so
                              --threads N == --threads 1 byte-for-byte.

--native                      Route miniprot through the in-process native facade and unlock in-process
                              threading; falls back gracefully if mappy is absent.

--serial-aligners             Opt out of the (default) concurrent Liftoff/miniprot overlap; useful on
                              core-constrained machines. --parallel-aligners is a deprecated no-op alias.
============================  ====================================================================================
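
The byte-for-byte guarantee described for --locus-pipeline rests on a general pattern: work may finish in any order, but results are collected in the order the tasks were submitted. The sketch below illustrates that pattern with Python's standard thread pool. It is an assumption-laden illustration rather than LiftOn's implementation; process_locus and run_in_submission_order are hypothetical names, and the sleep merely simulates uneven per-locus cost.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def process_locus(locus_id: str) -> str:
    """Hypothetical stand-in for per-locus work; returns one output line."""
    # Uneven, deterministic delay so tasks tend to finish out of order.
    time.sleep(0.001 * (sum(map(ord, locus_id)) % 5))
    return f"{locus_id}\tprocessed"


def run_in_submission_order(loci, threads):
    """Process loci on a thread pool and return results in input order."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        futures = [pool.submit(process_locus, locus) for locus in loci]
        # Iterating the futures list (not as_completed) preserves
        # submission order regardless of which task finishes first.
        return [future.result() for future in futures]


if __name__ == "__main__":
    loci = [f"gene{i}" for i in range(10)]
    serial = run_in_submission_order(loci, threads=1)
    parallel = run_in_submission_order(loci, threads=4)
    print(serial == parallel)
```

Collecting with as_completed instead would yield results in completion order, which can vary between runs and thread counts.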

Validation flags (change the exit code / emit a report; do NOT change output bytes): --strict-gff (validate the reference annotation on input), --validate-output and --validate-verbose (re-validate the written output GFF3). A standalone gff3-validate console script is also installed alongside lifton.
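
Because --strict-gff reports problems through the exit code, a pipeline can gate later steps on it. The following sketch shows one way to do that with the standard library; exits_cleanly is a hypothetical helper, the file names are placeholders, and the demonstration uses a stand-in child process so that it runs without lifton installed.

```python
import subprocess
import sys


def exits_cleanly(argv) -> bool:
    """Run a command and report whether it exited with status 0."""
    completed = subprocess.run(argv, check=False)
    return completed.returncode == 0


# Example argument list for an input validation run (placeholder file names).
# It is only defined here, not executed.
STRICT_GFF_ARGV = ["lifton", "--strict-gff", "-g", "reference.gff3",
                   "target.fa", "reference.fa"]

if __name__ == "__main__":
    # Stand-in commands that mimic a passing and a failing validator.
    print(exits_cleanly([sys.executable, "-c", "raise SystemExit(0)"]))
    print(exits_cleanly([sys.executable, "-c", "raise SystemExit(1)"]))
```

In a real pipeline, exits_cleanly(STRICT_GFF_ARGV) would be called in place of the stand-in commands, and the lift-over would proceed only when it returns True.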