Output files#
After running LiftOn, you will obtain a LiftOn GFF3 file (lifton.gff3) and a lifton_output/ directory, laid out as shown in the output directory hierarchy below. This page explains how to interpret these results.
LiftOn output directory hierarchy
# LiftOn output directory hierarchy
.
├── lifton.gff3
└── lifton_output
    │
    ├── score.txt
    │
    ├── liftoff
    │   ├── liftoff.gff3
    │   ├── liftoff.gff3_db
    │   └── unmapped_features.txt
    │
    ├── miniprot
    │   ├── miniprot.gff3
    │   └── miniprot.gff3_db
    │
    ├── intermediate_files
    │   ├── proteins.fa
    │   ├── reference_all_genes.fa
    │   ├── reference_all_to_target_all.sam
    │   ├── ref_feature.txt
    │   ├── ref_transcript.txt
    │   └── transcripts.fa
    │
    └── stats
        ├── extra_copy_features.txt
        ├── mapped_feature.txt
        ├── mapped_transcript.txt
        └── unmapped_features.txt
lifton.gff3#
This is the main output of LiftOn: an annotation file of the target genome in GFF3 format. The following is an example of a gene locus. For more details about GFF3, see GFF3 file format.
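As a quick sanity check on lifton.gff3, the feature types it contains can be tallied. The sketch below relies only on the general GFF3 layout (nine tab-separated columns, with the feature type in the third column, and comment or directive lines starting with "#"). The function name count_feature_types is a hypothetical helper, not part of LiftOn.

```python
from collections import Counter
from typing import Iterable


def count_feature_types(lines: Iterable[str]) -> Counter:
    """Count GFF3 records by feature type (third column)."""
    counts = Counter()
    for line in lines:
        # Skip blank lines, comments, and directives such as '##gff-version 3'.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        # A GFF3 record has exactly nine columns; anything else
        # (for example embedded FASTA sequence) is ignored here.
        if len(fields) != 9:
            continue
        counts[fields[2]] += 1
    return counts
```

The function accepts any iterable of lines, so an open file handle for lifton.gff3 can be passed directly. The resulting counts of genes, transcripts, and exons give a rough picture of how much was annotated.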
lifton_output/#
This directory contains all LiftOn outputs, including the following:
1. score.txt#
A TSV file summarizing the LiftOn results, with one row per transcript.
Column definition
Transcript ID
Liftoff transcript protein sequence identity: a value between 0 and 1
miniprot transcript protein sequence identity: a value between 0 and 1
LiftOn transcript DNA sequence identity: a value between 0 and 1
LiftOn transcript protein sequence identity: a value between 0 and 1
The source of the annotation: 'Liftoff', 'miniprot', or 'LiftOn_chaining_algorithm'
Mutation types: 'synonymous', 'nonsynonymous', 'inframe_insertion', 'inframe_deletion', 'frameshift', 'stop_codon_gain', 'stop_codon_loss', and 'start_codon_loss'
Transcript locus coordinate: <chromosome>:<start>-<end>
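The column definition above can be turned into a small reader for score.txt. This is a minimal sketch, not LiftOn's own code: it assumes the file is tab-separated with exactly eight columns in the order listed, and it skips any row whose identity columns are not numeric (for example a header line, if one exists). The names ScoreRow, parse_locus, and read_scores are hypothetical helpers. The mutation-types column is kept as raw text because its internal separator is not specified here.

```python
import csv
from dataclasses import dataclass
from typing import Iterable, Iterator, Tuple


@dataclass
class ScoreRow:
    transcript_id: str
    liftoff_protein_identity: float
    miniprot_protein_identity: float
    lifton_dna_identity: float
    lifton_protein_identity: float
    source: str
    mutation_types: str  # raw text; the separator between types is not assumed
    locus: str


def parse_locus(locus: str) -> Tuple[str, int, int]:
    """Split '<chromosome>:<start>-<end>' into (chromosome, start, end)."""
    # Split on the last ':' so chromosome names containing ':' still work.
    chrom, _, span = locus.rpartition(":")
    start, _, end = span.partition("-")
    return chrom, int(start), int(end)


def read_scores(lines: Iterable[str]) -> Iterator[ScoreRow]:
    """Yield one ScoreRow per well-formed line (assumed column order)."""
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for fields in reader:
        if len(fields) != 8:
            continue  # not a score row under the assumed layout
        try:
            identities = [float(value) for value in fields[1:5]]
        except ValueError:
            continue  # non-numeric identities, e.g. a header line
        yield ScoreRow(fields[0], *identities, *fields[5:8])
```

With rows parsed this way, it is straightforward to filter transcripts, for instance keeping those whose LiftOn protein identity is below 1.0 in order to review their mutation types.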
2. liftoff/#
The Liftoff GFF3 annotation generated during the LiftOn process.
3. miniprot/#
The miniprot GFF3 file generated during the LiftOn process.
4. intermediate_files/#
This directory stores all intermediate files: protein sequences (FASTA), gene-sequence-to-genome alignments (SAM), transcript sequences (FASTA), the type of each reference gene (coding or non-coding), and the type of each reference transcript (coding or non-coding).
5. stats/mapped_feature.txt#
A TSV file summarizing the features mapped from the reference genome to the target genome.
Column definition
Feature ID
Number of feature copies
Feature type: coding, non-coding, or other
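The three columns above lend themselves to a per-type summary. The following is a sketch under stated assumptions: the file is tab-separated, the columns appear in the order listed, and the copy number is an integer. The function name summarize_copies is hypothetical. Rows that do not fit this layout are skipped rather than guessed at.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def summarize_copies(lines: Iterable[str]) -> Dict[str, Tuple[int, int]]:
    """Map each feature type to (number of features, total copies)."""
    features = defaultdict(int)
    copies = defaultdict(int)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # not a three-column row under the assumed layout
        try:
            copy_number = int(fields[1])
        except ValueError:
            continue  # e.g. a header line, if one exists
        feature_type = fields[2]
        features[feature_type] += 1
        copies[feature_type] += copy_number
    return {t: (features[t], copies[t]) for t in features}
```

Because mapped_transcript.txt and extra_copy_features.txt are described with the same three-column shape, the same helper should apply to them under the same assumptions.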
6. stats/mapped_transcript.txt#
A TSV file summarizing the transcripts mapped from the reference genome to the target genome.
Column definition
Transcript ID
Number of transcript copies
Transcript type: coding, non-coding, or other
7. stats/extra_copy_features.txt#
Like mapped_feature.txt, this TSV file summarizes the number of additional gene copies and indicates whether each gene is coding or non-coding.
Column definition
Feature ID
Number of feature copies
Feature type: coding, non-coding, or other
8. stats/unmapped_features.txt#
A TSV file listing the IDs of unmapped genes and their types.
Column definition
Gene ID
Feature type: coding, non-coding, or other