Output files#

After running LiftOn, you will obtain a LiftOn GFF3 file and a lifton_output/ directory. More details are shown in the output directory hierarchy below. This page provides guidance on how to interpret your results.

LiftOn output directory hierarchy

.
├── lifton.gff3
└── lifton_output
   |
   ├── score.txt
   |
   ├── liftoff
   │   ├── liftoff.gff3
   │   ├── liftoff.gff3_db
   │   └── unmapped_features.txt
   |
   ├── miniprot
   │   ├── miniprot.gff3
   │   └── miniprot.gff3_db
   |
   ├── intermediate_files
   │   ├── proteins.fa
   │   ├── reference_all_genes.fa
   │   ├── reference_all_to_target_all.sam
   │   ├── ref_feature.txt
   │   ├── ref_transcript.txt
   │   └── transcripts.fa
   |
   └── stats
      ├── extra_copy_features.txt
      ├── mapped_feature.txt
      ├── mapped_transcript.txt
      └── unmapped_features.txt


lifton.gff3#

This is the main output of LiftOn: an annotation of the target genome in GFF3 format. The following is an example of a gene locus. For more details about GFF3, see GFF3 file format.
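As a quick sanity check on the annotation, the feature types in a GFF3 file can be tallied from its third column. The sketch below relies only on the general GFF3 layout (nine tab-separated columns, with lines starting with `#` treated as directives or comments). The function name `count_feature_types` and the sample records are made up for illustration and are not real LiftOn output.

```python
from collections import Counter


def count_feature_types(gff3_lines):
    """Tally the feature type (GFF3 column 3) of every record."""
    counts = Counter()
    for line in gff3_lines:
        line = line.rstrip("\n")
        # Skip blank lines and directives/comments, which start with '#'.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != 9:
            continue  # not a well-formed GFF3 record
        counts[fields[2]] += 1
    return counts


# Illustrative records only (hypothetical IDs and coordinates).
example = [
    "##gff-version 3",
    "chr1\texample\tgene\t100\t900\t.\t+\t.\tID=gene1",
    "chr1\texample\tmRNA\t100\t900\t.\t+\t.\tID=tx1;Parent=gene1",
    "chr1\texample\tCDS\t100\t400\t.\t+\t0\tID=cds1;Parent=tx1",
    "chr1\texample\tCDS\t600\t900\t.\t+\t2\tID=cds1;Parent=tx1",
]
feature_counts = count_feature_types(example)
```

To run it on real output, pass an open file handle instead of the list, for example `count_feature_types(open("lifton.gff3"))`.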



lifton_output/#

This directory contains all LiftOn outputs, including the following:

1. score.txt#

This TSV file summarizes the LiftOn results for each transcript.

Column definition

  1. Transcript ID

  2. Liftoff transcript protein sequence identity (0 to 1)

  3. miniprot transcript protein sequence identity (0 to 1)

  4. LiftOn transcript DNA sequence identity (0 to 1)

  5. LiftOn transcript protein sequence identity (0 to 1)

  6. The source of the annotation:

    • 'Liftoff', 'miniprot', or 'LiftOn_chaining_algorithm'.

  7. Mutation types

    • 'synonymous', 'nonsynonymous', 'inframe_insertion', 'inframe_deletion', 'frameshift', 'stop_codon_gain', 'stop_codon_loss', and 'start_codon_loss'.

  8. Transcript locus coordinate: <chromosome>:<start>-<end>
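The column layout above can be read into named fields with a few lines of Python. This is a minimal sketch under several assumptions that are not stated in this page: the file has no header row, every row has exactly eight tab-separated fields in the listed order, and the four identity columns always hold numbers. The names `ScoreRow` and `parse_score_line` are hypothetical, and the sample line is invented.

```python
from typing import NamedTuple


class ScoreRow(NamedTuple):
    transcript_id: str
    liftoff_protein_identity: float
    miniprot_protein_identity: float
    lifton_dna_identity: float
    lifton_protein_identity: float
    source: str
    mutation: str  # kept as the raw string from column 7
    chrom: str
    start: int
    end: int


def parse_score_line(line: str) -> ScoreRow:
    """Parse one row of score.txt (assumed: 8 tab-separated fields)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 8:
        raise ValueError(f"expected 8 tab-separated fields, got {len(fields)}")
    # Locus format is <chromosome>:<start>-<end>; split on the last ':'
    # so chromosome names that themselves contain ':' are preserved.
    chrom, _, span = fields[7].rpartition(":")
    start, _, end = span.partition("-")
    return ScoreRow(
        transcript_id=fields[0],
        liftoff_protein_identity=float(fields[1]),
        miniprot_protein_identity=float(fields[2]),
        lifton_dna_identity=float(fields[3]),
        lifton_protein_identity=float(fields[4]),
        source=fields[5],
        mutation=fields[6],
        chrom=chrom,
        start=int(start),
        end=int(end),
    )


# Invented example row, for illustration only.
example_line = (
    "tx1\t0.95\t0.90\t1.0\t0.98\t"
    "LiftOn_chaining_algorithm\tnonsynonymous\tchr1:1000-5000\n"
)
row = parse_score_line(example_line)
```

Rows parsed this way can then be filtered, for instance to list transcripts whose `lifton_protein_identity` falls below a chosen threshold.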


2. liftoff/#

The Liftoff GFF3 annotation generated during the LiftOn process.


3. miniprot/#

The miniprot GFF3 file generated during the LiftOn process.


4. intermediate_files/#

This directory stores all intermediate files: protein sequences (FASTA), gene-sequence-to-genome alignments (SAM), transcript sequences (FASTA), the type of each reference gene (coding or non-coding), and the type of each reference transcript (coding or non-coding).


5. stats/mapped_feature.txt#

This TSV file summarizes how many copies of each feature were mapped from the reference genome to the target genome.

Column definition

  1. Feature ID

  2. Number of copies of the feature

  3. Feature type: coding, non-coding, or other
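The three columns above lend themselves to a simple per-type total. The following sketch sums the copy counts by feature type, assuming a headerless file with exactly three tab-separated fields per row and an integer in the second column. The function name `copies_by_type` and the sample rows are hypothetical. Because mapped_transcript.txt and extra_copy_features.txt are described with the same three-column layout, the same function should apply to them under the same assumptions.

```python
from collections import defaultdict


def copies_by_type(lines):
    """Sum the copy counts (column 2) per type (column 3)."""
    totals = defaultdict(int)
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # ignore blank lines
        # Assumed layout: <ID> <number of copies> <type>
        feature_id, copies, feature_type = line.split("\t")
        totals[feature_type] += int(copies)
    return dict(totals)


# Invented rows, for illustration only.
example_rows = [
    "gene1\t1\tcoding\n",
    "gene2\t3\tcoding\n",
    "gene3\t1\tnon-coding\n",
    "gene4\t2\tother\n",
]
totals = copies_by_type(example_rows)
```

On real output, the list can be replaced by an open handle to the file, such as `open("lifton_output/stats/mapped_feature.txt")`.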


6. stats/mapped_transcript.txt#

This TSV file summarizes how many copies of each transcript were mapped from the reference genome to the target genome.

Column definition

  1. Transcript ID

  2. Number of copies of the transcript

  3. Transcript type: coding, non-coding, or other


7. stats/extra_copy_features.txt#

Similar to mapped_feature.txt, this TSV file summarizes the number of additional copies of genes and indicates whether they are coding or non-coding.

Column definition

  1. Feature ID

  2. Number of copies of the feature

  3. Feature type: coding, non-coding, or other


8. stats/unmapped_features.txt#

This TSV file lists the IDs of genes that could not be mapped, together with their types.

Column definition

  1. Gene ID

  2. Feature type: coding, non-coding, or other
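To see which genes were lost and of what kind, the two columns above can be grouped by type. This is a sketch only, assuming a headerless file with exactly two tab-separated fields per row. The function name `unmapped_ids_by_type` and the sample rows are hypothetical.

```python
def unmapped_ids_by_type(lines):
    """Group unmapped gene IDs (column 1) by their type (column 2)."""
    groups = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # ignore blank lines
        # Assumed layout: <gene ID> <type>
        gene_id, feature_type = line.split("\t")
        groups.setdefault(feature_type, []).append(gene_id)
    return groups


# Invented rows, for illustration only.
example_unmapped = [
    "geneA\tcoding\n",
    "geneB\tnon-coding\n",
    "geneC\tcoding\n",
]
unmapped = unmapped_ids_by_type(example_unmapped)
```

The length of each list gives the number of unmapped genes of that type, and the coding list is usually the first place to look when checking annotation completeness.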