User Manual
LiftOn
====================================================================
An accurate homology lift-over tool between assemblies
====================================================================
██╗ ██╗███████╗████████╗ ██████╗ ███╗ ██╗
██║ ██║██╔════╝╚══██╔══╝██╔═══██╗████╗ ██║
██║ ██║█████╗ ██║ ██║ ██║██╔██╗ ██║
██║ ██║██╔══╝ ██║ ██║ ██║██║╚██╗██║
███████╗██║██║ ██║ ╚██████╔╝██║ ╚████║
╚══════╝╚═╝╚═╝ ╚═╝ ╚═════╝ ╚═╝ ╚═══╝
v1.0.9
usage: lifton [-h] [-E] [-EL] [-c] [--no-orf-search] [-o FILE] [-u FILE] [-exclude_partial]
[-mm2_options =STR] [-mp_options =STR] [-a A] [-s S] [-min_miniprot MIN_MINIPROT]
[-max_miniprot MAX_MINIPROT] [-d D] [-flank F] [-V] [-D] [-t THREADS] [-m PATH]
[-f TYPES] [-infer-genes] [-infer_transcripts] [-chroms TXT] [-unplaced TXT]
[-copies] [-sc SC] [-overlap O] [-mismatch M] [-gap_open GO] [-gap_extend GE]
[-polish] [-cds] [-time] [--validate-output] [--validate-verbose] [--strict-gff]
[--stream] [--inmemory-liftoff] [--locus-pipeline] [--native] [--serial-aligners]
[--parallel-aligners] [--optimize] [--legacy-merge] [--full-dp-align] [--fast-align]
[--gene-only] [--lift-gene-like] [--no-miniprot-rescue] [--miniprot-rescue]
[--merge-strategy STRATEGY] [--id-spec ID_SPEC] [--force] [--verbose]
[--no-auto-convert-gtf] -g GFF [-P FASTA] [-T FASTA] [-L gff] [-M gff] [-ad SOURCE]
target reference
Lift features from one genome assembly to another
* Required input (sequences):
target target fasta genome to lift genes to
reference reference fasta genome to lift genes from
* Required input (Reference annotation):
-g GFF, --reference-annotation GFF
the reference annotation file to lift over in GFF or GTF format, or the name of an existing feature database; when an annotation file is
given, a database will be built automatically
* Optional input (Reference sequences):
-P FASTA, --proteins FASTA
the reference protein sequences.
-T FASTA, --transcripts FASTA
the reference transcript sequences.
* Optional input (Liftoff annotation):
-L gff, --liftoff gff
the annotation generated by Liftoff, or the name of a Liftoff gffutils database; when an annotation file is given, a
database will be built automatically
* Optional input (miniprot annotation):
-M gff, --miniprot gff
the annotation generated by miniprot, or the name of a miniprot gffutils database; when an annotation file is given,
a database will be built automatically
* gffutils parameters:
--merge-strategy {create_unique,merge,error,warning,replace}
strategy for merging features when building the database; default "create_unique"
--id-spec ID_SPEC attribute to use as feature ID; default "ID"
--force overwrite existing database
--verbose enable verbose output
* Output settings:
-o FILE, --output FILE
write output to FILE in same format as input; by default, output is written to "lifton.gff3"
-u FILE write unmapped features to FILE; default is "unmapped_features.txt"
-exclude_partial write partial mappings below the -s and -a thresholds to unmapped_features.txt; if not set, partial/low sequence identity mappings are included
in the gff file with partial_mapping=True, low_identity=True in comments
* Miscellaneous settings:
-h, --help show this help message and exit
-E, --evaluation Run LiftOn in evaluation mode
-EL, --evaluation-liftoff-chm13
Run LiftOn in evaluation mode
-c, --write_chains Write chaining files
--no-orf-search do not perform open reading frame (ORF) search
-V, --version show program version
-D, --debug Run debug mode
-t THREADS, --threads THREADS
use t parallel processes to accelerate alignment; by default t=1
-m PATH Minimap2 path
-f TYPES, --features TYPES
list of feature types to lift over (an explicit -f overrides gene-like auto-detection)
-infer-genes use if annotation file only includes transcripts, exon/CDS features; auto-enabled for GTF
-infer_transcripts use if annotation file only includes exon/CDS features and does not include transcripts/mRNA; auto-enabled for GTF
-chroms TXT comma-separated file with corresponding chromosomes in the reference,target sequences
-unplaced TXT text file with name(s) of unplaced sequences to map genes from after genes from chromosomes in chroms.txt are mapped; default is
"unplaced_seq_names.txt"
-copies look for extra gene copies in the target genome
-sc SC with -copies, minimum sequence identity in exons/CDS for which a gene is considered a copy; must be greater than -s; default is 1.0
-overlap O maximum fraction [0.0-1.0] of overlap allowed by 2 features; by default O=0.1
-mismatch M mismatch penalty in exons when finding best mapping; by default M=2
-gap_open GO gap open penalty in exons when finding best mapping; by default GO=2
-gap_extend GE gap extend penalty in exons when finding best mapping; by default GE=1
-polish
-cds annotate status of each CDS (partial, missing start, missing stop, inframe stop codon)
-time, --measure_time enable time measurement for each step (writes time.txt)
-ad SOURCE, --annotation-database SOURCE
The source of the reference annotation (RefSeq / GENCODE / others)
--no-auto-convert-gtf disable automatic GTF -> GFF3 conversion
* Validation (exit-code only; does not change output bytes):
--strict-gff run the NCBI GFF3 input-side validator on the reference annotation; exit non-zero on any spec violation
--validate-output re-validate the generated output GFF3 after writing; print a structured report to stderr
--validate-verbose with --validate-output, also print warnings (not just errors)
* Performance fast-paths (BYTE-IDENTICAL to the default output):
--stream pipe miniprot output into an in-memory database; skip the miniprot.gff3 disk round-trip / SQLite re-ingest
--inmemory-liftoff feed Liftoff's lifted features to the database in-process; skip the liftoff.gff3 disk write / re-ingest
--locus-pipeline fan out per-locus work through a thread pool sized by --threads; emitted in submission order (byte-identical to -t 1)
--native route miniprot through the in-process native facade + unlock in-process threading; falls back gracefully if mappy is absent
--serial-aligners opt OUT of the (default) concurrent Liftoff||miniprot overlap; run them sequentially
* Output-changing flags (opt-outs that RESTORE pre-v1.0.9 behaviour):
--legacy-merge restore the pre-promotion UNCONDITIONAL Liftoff/miniprot merge (default = verified best-of-outcome merge)
--full-dp-align restore the exact giant-only full-DP alignment (default = band-everything anchor-windowed alignment)
--gene-only restore the pre-v1.0.9 gene-only lift (default = lift all auto-detected gene-like top-level types)
--no-miniprot-rescue disable the (default-ON) miniprot-only rescue pass for genes the DNA lift missed entirely
* Deprecated NO-OP aliases (kept for backward compatibility; have no effect):
--parallel-aligners no-op (concurrent Step 4 is now default; use --serial-aligners to opt out)
--optimize no-op (best-of-outcome merge is now default; use --legacy-merge to opt out)
--fast-align no-op (band-everything alignment is now default; use --full-dp-align to opt out)
--lift-gene-like no-op (gene-like lift is now default; use --gene-only to opt out)
--miniprot-rescue no-op (miniprot-only rescue is now default; use --no-miniprot-rescue to opt out)
Alignments:
-mm2_options =STR space delimited minimap2 parameters. By default ="-a --end-bonus 5 --eqx -N 50 -p 0.5"
-mp_options =STR space delimited miniprot parameters. By default ""
-a A designate a feature mapped only if it aligns with coverage ≥A; by default A=0.5
-s S designate a feature mapped only if its child features (usually exons/CDS) align with sequence identity ≥S; by default S=0.5
-min_miniprot MIN_MINIPROT
The minimum length ratio of a protein-coding transcript to the longest protein-coding transcript within a gene locus, as identified
exclusively by miniprot in the target genome, is set by default to MIN_MINIPROT=0.9.
-max_miniprot MAX_MINIPROT
The maximum length ratio of a protein-coding transcript to the longest protein-coding transcript within a gene locus, as identified
exclusively by miniprot in the target genome, is set by default to MAX_MINIPROT=1.5.
-d D distance scaling factor; alignment nodes separated by more than a factor of D in the target genome will not be connected in the graph; by
default D=2.0
-flank F amount of flanking sequence to align as a fraction [0.0-1.0] of gene length. This can improve gene alignment where gene structure differs
between target and reference; by default F=0.0
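As a rough illustration of how the -a and -s thresholds interact, the sketch below classifies a lifted feature from its alignment coverage and the sequence identity of its child features. The class, the function and the example numbers are hypothetical and are not part of LiftOn; only the default thresholds (A=0.5, S=0.5) and the attribute names partial_mapping / low_identity come from the help text above, and tying each attribute to one threshold is an assumption of this sketch.

```python
from dataclasses import dataclass


@dataclass
class LiftedFeature:
    """Hypothetical summary of one lifted feature (not a LiftOn type)."""
    feature_id: str
    coverage: float   # fraction of the reference feature that aligned
    identity: float   # sequence identity across child features (exons/CDS)


def classify(feature, min_coverage=0.5, min_identity=0.5):
    """Return (is_mapped, flags) following the -a / -s rule in the help text.

    Assumption: coverage below -a is reported as partial_mapping and
    identity below -s as low_identity.
    """
    flags = {}
    if feature.coverage < min_coverage:
        flags["partial_mapping"] = True
    if feature.identity < min_identity:
        flags["low_identity"] = True
    return (not flags), flags


# A feature must pass both thresholds to count as mapped.
print(classify(LiftedFeature("gene1", coverage=0.90, identity=0.95)))
print(classify(LiftedFeature("gene2", coverage=0.40, identity=0.95)))
```

Raising -a or -s makes the filter stricter; the thresholds use ≥, so a feature sitting exactly on a threshold still counts as mapped.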
What's new in v1.0.9
Several v1.0.9 flags toggle behaviour that was introduced or promoted since v1.0.8. The tables below summarise, for each flag, whether it CHANGES the output annotation or is BYTE-NEUTRAL (same output bytes, different I/O or scheduling), and which flag restores the older behaviour.
Output-changing defaults (results differ vs v1.0.8 — opt-out restores old behaviour):
| Default behaviour (v1.0.9) | Opt-out flag | Effect |
|---|---|---|
| Gene-like lift (lifts pseudogenes / …) | --gene-only | CHANGES output (adds features). |
| Best-of-outcome Liftoff/miniprot merge | --legacy-merge | CHANGES output. Keeps the higher-identity of {merge, Liftoff} per transcript. |
| Band-everything / windowed alignment | --full-dp-align | CHANGES output on divergent inputs only (identity-exact on same-species inputs). |
| Miniprot-only rescue (default-ON) | --no-miniprot-rescue | CHANGES output (adds …). |
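The best-of-outcome rule ("keeps the higher-identity of {merge, Liftoff} per transcript") can be pictured with the minimal sketch below. It is not LiftOn's implementation: the function name, the dictionary inputs and the tie-break (ties keep the merged candidate) are assumptions made for illustration.

```python
def best_of_outcome(merged_identity, liftoff_identity):
    """Pick the higher-identity candidate for each transcript.

    Both arguments map transcript ID -> identity in [0, 1].
    Assumptions of this sketch: only transcripts present in
    merged_identity are considered, and ties keep the merged candidate.
    """
    chosen = {}
    for tx_id, merged in merged_identity.items():
        liftoff = liftoff_identity.get(tx_id)
        if liftoff is not None and liftoff > merged:
            chosen[tx_id] = ("liftoff", liftoff)
        else:
            chosen[tx_id] = ("merge", merged)
    return chosen


merged = {"tx1": 0.98, "tx2": 0.91}
liftoff = {"tx1": 0.95, "tx2": 0.97}
print(best_of_outcome(merged, liftoff))
# {'tx1': ('merge', 0.98), 'tx2': ('liftoff', 0.97)}
```

By contrast, the unconditional merge restored by --legacy-merge would correspond to always taking the "merge" entry.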
Byte-neutral performance flags (identical output, faster / lighter):
| Flag | Effect (BYTE-IDENTICAL to the default output) |
|---|---|
| --stream | Pipe miniprot output into an in-memory database; skip the miniprot.gff3 disk round-trip / SQLite re-ingest. |
| --inmemory-liftoff | Feed Liftoff's lifted features to the database in-process; skip the liftoff.gff3 disk write / re-ingest. |
| --locus-pipeline | Fan out per-locus work across a thread pool; emitted in submission order so output is byte-identical to -t 1. |
| --native | Route miniprot through the in-process native facade and unlock in-process threading; falls back gracefully if mappy is absent. |
| --serial-aligners | Opt out of the (default) concurrent Liftoff/miniprot overlap; useful on core-constrained machines. |
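The idea behind emitting per-locus results in submission order can be sketched with Python's standard thread pool. This is only an illustration of why ordered emission keeps multi-threaded output identical to a single-threaded run; process_locus and run_loci are hypothetical stand-ins, not LiftOn functions.

```python
from concurrent.futures import ThreadPoolExecutor


def process_locus(locus_id):
    """Stand-in for per-locus work; returns one output line."""
    return f"{locus_id}\tprocessed"


def run_loci(locus_ids, threads=1):
    """Process loci on a thread pool and return results in submission order."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # Executor.map yields results in the order of the inputs,
        # regardless of which worker finishes first.
        return list(pool.map(process_locus, locus_ids))


loci = [f"locus{i}" for i in range(20)]
# The multi-threaded result matches the single-threaded one line for line.
assert run_loci(loci, threads=4) == run_loci(loci, threads=1)
```

Because results are collected in input order rather than completion order, the number of threads affects only wall-clock time, not the order of the lines produced.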
Validation flags (change the exit code / emit a report; do NOT change output bytes):
--strict-gff (validate the reference annotation on input), --validate-output
and --validate-verbose (re-validate the written output GFF3). A standalone
gff3-validate console script is also installed alongside lifton.
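For scripted runs, a command line that combines the required inputs with the validation flags could be assembled as in the sketch below. The helper name and the file names (target.fa, reference.fa, reference.gff3) are hypothetical; the flags and the positional order (target, then reference) are taken from the usage text above. The sketch only builds the argument list and does not run lifton.

```python
import shlex


def build_lifton_command(target, reference, annotation, output="lifton.gff3",
                         strict_gff=False, validate_output=False,
                         validate_verbose=False):
    """Build an argv list for lifton using flags from the help text."""
    argv = ["lifton", "-g", annotation, "-o", output]
    if strict_gff:
        argv.append("--strict-gff")
    if validate_output:
        argv.append("--validate-output")
        # --validate-verbose is documented as an add-on to --validate-output.
        if validate_verbose:
            argv.append("--validate-verbose")
    # Positional arguments come last: target genome, then reference genome.
    argv += [target, reference]
    return argv


cmd = build_lifton_command("target.fa", "reference.fa", "reference.gff3",
                           strict_gff=True, validate_output=True)
print(shlex.join(cmd))
# lifton -g reference.gff3 -o lifton.gff3 --strict-gff --validate-output target.fa reference.fa
```

The resulting list can be passed to subprocess.run; since these flags are described as affecting only the exit code and the stderr report, a caller would check the return code rather than compare output files.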