Quick Start Guide: variant

This page summarizes how to use OpenSpliceAI's variant subcommand to assess the impact of genomic variants on splice sites.


Before You Begin

  • Install OpenSpliceAI: Ensure you have installed OpenSpliceAI and its dependencies as described in the Installation page.

  • Check Example Scripts: We provide an example script at examples/variant/variant.sh.

  • Prepare your input files:
    • VCF File: A variant call format file containing SNPs or small INDELs.

    • Reference Genome (FASTA): Must match the reference used in the VCF.

    • Annotation File: Gene annotations to filter variants by genomic region.

    • Trained Model: One or more OpenSpliceAI model checkpoints (PyTorch or Keras).
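Because the reference genome must match the one used in the VCF, it can help to confirm that chromosome names agree before running the tool (for example "chr1" versus "1"). The following is a minimal sketch of such a check, not part of OpenSpliceAI; the helper names (fasta_contigs, vcf_contigs, missing_contigs) are hypothetical, and it assumes an uncompressed VCF and a plain-text FASTA.

```python
import io


def fasta_contigs(handle):
    """Collect sequence names from FASTA headers (text after '>' up to the first whitespace)."""
    names = set()
    for line in handle:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                names.add(fields[0])
    return names


def vcf_contigs(handle):
    """Collect CHROM values from the data lines of an uncompressed VCF."""
    names = set()
    for line in handle:
        if line.startswith("#") or not line.strip():
            continue  # skip header lines and blank lines
        names.add(line.split("\t", 1)[0])
    return names


def missing_contigs(fasta_handle, vcf_handle):
    """Return VCF chromosome names that have no matching FASTA record, sorted."""
    return sorted(vcf_contigs(vcf_handle) - fasta_contigs(fasta_handle))


# Small in-memory demo with made-up records; one VCF line uses "1" instead of "chr1".
demo_fasta = io.StringIO(">chr1 demo sequence\nACGT\n>chr2\nGGCC\n")
demo_vcf = io.StringIO(
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "1\t100\t.\tA\tG\t.\t.\t.\n"
    "chr2\t5\t.\tC\tT\t.\t.\t.\n"
)
demo_missing = missing_contigs(demo_fasta, demo_vcf)
print(demo_missing)  # ['1']
```

With real files, pass open file handles instead, for example open("hg19.fa") and open("input.vcf"). An empty result means every chromosome named in the VCF also appears in the FASTA. It does not prove that the two come from the same genome build.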


One-liner Start

  1. Variants: input.vcf

  2. Reference FASTA: hg19.fa

  3. Annotation File: grch37.txt

  4. Model: A pre-trained OpenSpliceAI model or directory of models

Run:

openspliceai variant \
  -R data/ref_genome/homo_sapiens/GRCh37/hg19.fa \
  -A examples/data/grch37.txt \
  -m models/spliceai-mane/400nt/ \
  -f 400 \
  -t pytorch \
  -I data/vcf/input.vcf \
  -O examples/variant/output.vcf

This command:

  • Loads the VCF variants and checks them against the reference genome.

  • Predicts donor/acceptor scores for both the wild-type and mutant sequences within ±50 nt of each variant.

  • Outputs an annotated VCF (output.vcf) with delta scores and positions for donor/acceptor gain or loss.
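To make the idea of a delta score concrete, the sketch below compares per-position scores for the wild-type and mutant sequences and reports the largest gain and loss within a window around the variant. It assumes OpenSpliceAI follows the convention of the original SpliceAI (gain = mutant minus wild-type, loss = wild-type minus mutant, positions reported relative to the variant). The function delta_scores and the score values are hypothetical illustrations, not the tool's own code or output, and the sketch handles only substitutions where both score tracks have the same length.

```python
def delta_scores(ref_scores, alt_scores, variant_index, window=50):
    """Largest gain and loss for one splice-site type within +/- window of the variant.

    ref_scores / alt_scores: per-position probabilities for the wild-type and
    mutant sequences (assumed to be aligned and of equal length).
    """
    if len(ref_scores) != len(alt_scores):
        raise ValueError("this sketch only handles equal-length score tracks")
    lo = max(0, variant_index - window)
    hi = min(len(ref_scores), variant_index + window + 1)
    positions = range(lo, hi)
    # Position where the mutant score rises the most (gain) or falls the most (loss).
    gain_at = max(positions, key=lambda i: alt_scores[i] - ref_scores[i])
    loss_at = max(positions, key=lambda i: ref_scores[i] - alt_scores[i])
    return {
        "gain": round(alt_scores[gain_at] - ref_scores[gain_at], 2),
        "gain_offset": gain_at - variant_index,
        "loss": round(ref_scores[loss_at] - alt_scores[loss_at], 2),
        "loss_offset": loss_at - variant_index,
    }


# Made-up donor scores: the variant at index 3 weakens the donor one base
# upstream and creates a new candidate donor one base downstream.
ref_donor = [0.01, 0.02, 0.90, 0.03, 0.01]
alt_donor = [0.01, 0.02, 0.25, 0.03, 0.60]
demo = delta_scores(ref_donor, alt_donor, variant_index=3, window=2)
print(demo)
```

In this toy case the donor loss is 0.65 at offset -1 and the donor gain is 0.59 at offset +1. The same calculation would be repeated for the acceptor scores.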


Next Steps

  • Review: Inspect the appended INFO fields in the VCF for delta scores and their positions.

  • Further Analysis: Filter or rank variants by largest delta scores to prioritize functional splicing impacts.
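Ranking by delta score can be sketched in plain Python as follows. The layout of the INFO annotation is an assumption here: the sketch assumes a SpliceAI-style entry of the form KEY=ALLELE|SYMBOL|DS_AG|DS_AL|DS_DG|DS_DL|DP_AG|DP_AL|DP_DG|DP_DL, with the key name "OpenSpliceAI". Check the ##INFO header lines of your own output.vcf and adjust INFO_KEY and SCORE_COLUMNS if they differ. The function names and the demo records are hypothetical.

```python
import io

INFO_KEY = "OpenSpliceAI"    # assumption: confirm against the ##INFO header of your output VCF
SCORE_COLUMNS = slice(2, 6)  # assumption: four delta scores follow ALLELE|SYMBOL


def max_delta(info, key=INFO_KEY):
    """Largest delta score in one INFO string, or None if the annotation is absent."""
    for entry in info.split(";"):
        name, _, value = entry.partition("=")
        if name != key or not value:
            continue
        best = None
        for annotation in value.split(","):  # one annotation per allele/gene pair
            for raw in annotation.split("|")[SCORE_COLUMNS]:
                try:
                    score = float(raw)
                except ValueError:
                    continue  # e.g. '.' for a missing score
                if best is None or score > best:
                    best = score
        return best
    return None


def rank_variants(handle, threshold=0.0, key=INFO_KEY):
    """Return (score, chrom, pos, ref, alt) tuples with score >= threshold, highest first."""
    ranked = []
    for line in handle:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 8:
            continue  # not a complete VCF record
        score = max_delta(cols[7], key)
        if score is not None and score >= threshold:
            ranked.append((score, cols[0], int(cols[1]), cols[3], cols[4]))
    ranked.sort(key=lambda row: row[0], reverse=True)
    return ranked


# Made-up records for illustration only; the third has no splice annotation.
demo_vcf = io.StringIO(
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "1\t100\t.\tA\tG\t.\t.\tOpenSpliceAI=G|GENE1|0.01|0.00|0.72|0.05|-3|10|2|-7\n"
    "1\t200\t.\tC\tT\t.\t.\tOpenSpliceAI=T|GENE2|0.10|0.02|0.00|0.00|5|1|0|0\n"
    "2\t300\t.\tG\tA\t.\t.\tDP=12\n"
)
top = rank_variants(demo_vcf, threshold=0.2)
print(top)  # [(0.72, '1', 100, 'A', 'G')]
```

For a real run, replace the in-memory demo with open("examples/variant/output.vcf"). The threshold of 0.2 is only an example value; choose a cutoff appropriate to your analysis.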


Congratulations! You have gone through all subcommands of OpenSpliceAI.