Output files#

Running LiftOn produces a GFF3 annotation file, lifton.gff3, and a lifton_output/ directory. The directory hierarchy below shows where each file is written, and the rest of this page explains how to interpret the results.

LiftOn output directory hierarchy

.
├── lifton.gff3
└── lifton_output
    │
    ├── score.txt
    │
    ├── liftoff
    │   ├── liftoff.gff3
    │   ├── liftoff.gff3_db
    │   └── unmapped_features.txt
    │
    ├── miniprot
    │   ├── miniprot.gff3
    │   └── miniprot.gff3_db
    │
    ├── intermediate_files
    │   ├── proteins.fa
    │   ├── reference_all_genes.fa
    │   ├── reference_all_to_target_all.sam
    │   ├── ref_feature.txt
    │   ├── ref_transcript.txt
    │   └── transcripts.fa
    │
    └── stats
        ├── extra_copy_features.txt
        ├── mapped_feature.txt
        ├── mapped_transcript.txt
        └── unmapped_features.txt
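
As a quick sanity check after a run, the hierarchy above can be compared against what is actually on disk. The following is a minimal sketch, not part of LiftOn: the function name `missing_outputs` and the list `EXPECTED_OUTPUTS` are hypothetical, the list covers only a subset of the files shown above, and which files are always written may depend on the LiftOn version and options used.

```python
from pathlib import Path

# Relative paths copied from the directory hierarchy above (a subset).
# Assumption: the run directory is the one that contains lifton.gff3.
EXPECTED_OUTPUTS = [
    "lifton.gff3",
    "lifton_output/score.txt",
    "lifton_output/liftoff/liftoff.gff3",
    "lifton_output/miniprot/miniprot.gff3",
    "lifton_output/stats/mapped_feature.txt",
    "lifton_output/stats/mapped_transcript.txt",
    "lifton_output/stats/extra_copy_features.txt",
    "lifton_output/stats/unmapped_features.txt",
]


def missing_outputs(run_dir):
    """Return the expected relative paths that are not files under run_dir."""
    root = Path(run_dir)
    return [rel for rel in EXPECTED_OUTPUTS if not (root / rel).is_file()]
```

Calling `missing_outputs(".")` from the directory where LiftOn was run returns an empty list when every listed file is present.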

lifton.gff3#

This is the main output of LiftOn: an annotation file for the target genome in GFF3 format. The following is an example of a gene locus. For more details about GFF3, see the GFF3 file format.

lifton_output/#

This directory contains all LiftOn outputs, including the following:

1. score.txt#

A TSV file summarizing the LiftOn results for each transcript.

Column definition

Transcript ID

Liftoff transcript protein sequence identity: a value from 0 to 1

miniprot transcript protein sequence identity: a value from 0 to 1

LiftOn transcript DNA sequence identity: a value from 0 to 1

LiftOn transcript protein sequence identity: a value from 0 to 1

The source of the annotation: 'Liftoff', 'miniprot', or 'LiftOn_chaining_algorithm'

Mutation types: 'synonymous', 'nonsynonymous', 'inframe_insertion', 'inframe_deletion', 'frameshift', 'stop_codon_gain', 'stop_codon_loss', and 'start_codon_loss'

Transcript locus coordinate: <chromosome>:<start>-<end>
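
The column definitions above can be turned into a small reader for score.txt. This is a sketch under stated assumptions rather than an official parser: it assumes the columns appear in the order listed, that fields are tab-separated, and that any line without exactly eight fields can be skipped. The field names in `SCORE_COLUMNS` and the helper names are hypothetical and chosen only for illustration. Values are kept as strings, since the exact formatting of the identity and mutation columns is not specified here.

```python
import csv
from collections import Counter

# Hypothetical field names, following the column order listed above.
SCORE_COLUMNS = [
    "transcript_id",
    "liftoff_protein_identity",
    "miniprot_protein_identity",
    "lifton_dna_identity",
    "lifton_protein_identity",
    "source",
    "mutation_types",
    "locus",
]


def read_scores(lines):
    """Yield one dict per eight-field row, keyed by SCORE_COLUMNS."""
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) != len(SCORE_COLUMNS):
            continue  # skip blank lines or rows with an unexpected layout
        yield dict(zip(SCORE_COLUMNS, row))


def count_by_source(records):
    """Count transcripts per annotation source (for example 'Liftoff')."""
    return Counter(record["source"] for record in records)


def parse_locus(locus):
    """Split '<chromosome>:<start>-<end>' into (chromosome, start, end)."""
    chrom, _, span = locus.rpartition(":")
    start, _, end = span.partition("-")
    return chrom, int(start), int(end)
```

For example, `count_by_source(read_scores(open("lifton_output/score.txt")))` tallies how many transcripts came from each source.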

2. liftoff/#

The Liftoff GFF3 annotation generated during the LiftOn process.

3. miniprot/#

The miniprot GFF3 file generated during the LiftOn process.
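
Because liftoff.gff3, miniprot.gff3, and lifton.gff3 are all GFF3 files, one way to compare them is to count feature types in each. The sketch below relies only on the standard GFF3 layout (nine tab-separated columns, feature type in column 3, comment and directive lines starting with `#`). The function name `count_feature_types` is hypothetical.

```python
from collections import Counter


def count_feature_types(lines):
    """Count GFF3 column 3 (feature type) over all nine-column lines."""
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # blank line, comment, or directive such as ##gff-version
        fields = line.split("\t")
        if len(fields) != 9:
            continue  # not a standard GFF3 feature line
        counts[fields[2]] += 1
    return counts
```

Running it on each of the three files, for example `count_feature_types(open("lifton_output/miniprot/miniprot.gff3"))`, gives a rough view of how many genes, transcripts, and exons each step produced.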

4. intermediate_files/#

This directory stores all intermediate files, including protein sequences (FASTA), gene sequence to genome alignments (SAM), transcript sequences (FASTA), the type of each reference gene (coding or non-coding), and the type of each reference transcript (coding or non-coding).

5. stats/mapped_feature.txt#

A TSV file listing the features mapped from the reference genome to the target genome and the number of copies of each.

Column definition

Feature ID

Number of copies of the feature

Feature type: coding, non-coding, or other
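
The three columns above are enough to total mapped copies per feature type. The following sketch assumes tab-separated rows in the column order listed and an integer copy number in the second column; rows that do not fit (for instance a header line, if one is present) are skipped. The function name `copies_per_type` is hypothetical.

```python
import csv
from collections import defaultdict


def copies_per_type(lines):
    """Sum column 2 (copy number) grouped by column 3 (feature type)."""
    totals = defaultdict(int)
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) != 3:
            continue  # skip blank or unexpected lines
        _feature_id, copies, feature_type = row
        try:
            n = int(copies)
        except ValueError:
            continue  # non-numeric copy number, e.g. a header line
        totals[feature_type] += n
    return dict(totals)
```

Since mapped_transcript.txt and extra_copy_features.txt are described with the same three-column layout, the same function should apply to them under the same assumptions.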

6. stats/mapped_transcript.txt#

A TSV file listing the transcripts mapped from the reference genome to the target genome and the number of copies of each.

Column definition

Transcript ID

Number of copies of the transcript

Transcript type: coding, non-coding, or other

7. stats/extra_copy_features.txt#

Similar to mapped_feature.txt, this TSV file summarizes the number of additional copies of genes and indicates whether they are coding or non-coding.

Column definition

Feature ID

Number of copies of the feature

Feature type: coding, non-coding, or other

8. stats/unmapped_features.txt#

It is a TSV file summarizing unmapped gene IDs and their types.

Column definition

Gene ID

Feature type: coding, non-coding, or other
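
To see which genes failed to map, grouped by type, the two columns above can be read as follows. This is a minimal sketch assuming tab-separated rows with exactly two fields in the order listed; the function name `unmapped_by_type` is hypothetical.

```python
from collections import defaultdict


def unmapped_by_type(lines):
    """Group unmapped gene IDs (column 1) by feature type (column 2)."""
    groups = defaultdict(list)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 2:
            continue  # skip blank or unexpected lines
        gene_id, feature_type = fields
        groups[feature_type].append(gene_id)
    return dict(groups)
```

For example, `unmapped_by_type(open("lifton_output/stats/unmapped_features.txt")).get("coding", [])` lists the unmapped coding genes.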