OpenSpliceAI is an open-source, efficient, and modular framework for splice site prediction. It is a reimplementation and extension of SpliceAI (Jaganathan et al., 2019), built on the modern PyTorch framework. OpenSpliceAI provides researchers with a user-friendly suite of tools for studying transcript splicing, from creating training datasets and training models to predicting splice sites and assessing the impact of genetic variants.
Key Features
Modern, Retrainable Framework: Built on Python 3 and PyTorch, OpenSpliceAI overcomes the limitations of older TensorFlow/Keras implementations. Its modular design enables fast and efficient prediction, as well as easy retraining on species-specific data with just a few commands.
Updated and Cross-Species Models: OpenSpliceAI includes a pre-trained human model, OSAIMANE-10000nt, updated from GRCh37 to GRCh38 using the latest MANE annotations, along with models for mouse, thale cress (Arabidopsis), honey bee, and zebrafish. This versatility empowers researchers to study splicing across diverse species.
Variant Impact Prediction: OpenSpliceAI not only predicts splice sites but also assesses the impact of genetic variants (SNPs and INDELs) on splicing. Its variant subcommand calculates “delta” scores that quantify changes in splice site strength and predicts cryptic splice sites.
Efficiency and Scalability: Optimized for improved processing speeds, lower memory usage, and efficient GPU utilization, OpenSpliceAI can handle large genomic regions and whole-genome predictions on a single GPU.
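To make the idea of a delta score concrete, the sketch below follows the definition from the original SpliceAI paper: for each of the four event types (acceptor gain, acceptor loss, donor gain, donor loss), the score is the largest change in predicted probability between the alternate and reference sequences within a window around the variant, reported together with the offset at which that change occurs. It is not OpenSpliceAI's actual code. The helper names are hypothetical, and the ±50 nt window, the DS_AG/DS_AL/DS_DG/DS_DL labels, and the restriction to equal-length reference and alternate tracks (i.e. SNPs only, no INDEL alignment) are assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def _max_change(gain_track: List[float], base_track: List[float],
                center: int, dist: int) -> Tuple[float, int]:
    """Largest (gain_track[i] - base_track[i]) within +/- dist of center.

    Returns the score rounded to two decimals and its offset from the variant.
    """
    lo = max(0, center - dist)
    hi = min(len(gain_track), center + dist + 1)
    # max() with a key returns the first index that attains the maximum
    best = max(range(lo, hi), key=lambda i: gain_track[i] - base_track[i])
    return round(gain_track[best] - base_track[best], 2), best - center


def delta_scores(ref_acc: List[float], alt_acc: List[float],
                 ref_don: List[float], alt_don: List[float],
                 center: int, dist: int = 50) -> Dict[str, Tuple[float, int]]:
    """Hypothetical helper: SpliceAI-style delta scores for an SNP (assumed window of 50 nt)."""
    if not (len(ref_acc) == len(alt_acc) == len(ref_don) == len(alt_don)):
        raise ValueError("this sketch only handles equal-length tracks (SNPs)")
    return {
        "DS_AG": _max_change(alt_acc, ref_acc, center, dist),  # acceptor gain
        "DS_AL": _max_change(ref_acc, alt_acc, center, dist),  # acceptor loss
        "DS_DG": _max_change(alt_don, ref_don, center, dist),  # donor gain
        "DS_DL": _max_change(ref_don, alt_don, center, dist),  # donor loss
    }


# Toy example: 11 positions, variant at index 5
ref_acc = [0.0] * 11
alt_acc = [0.0] * 11
alt_acc[7] = 0.9   # a new acceptor appears 2 nt downstream of the variant
ref_don = [0.0] * 11
ref_don[3] = 0.8   # an existing donor 2 nt upstream...
alt_don = [0.0] * 11
alt_don[3] = 0.1   # ...is weakened by the variant
scores = delta_scores(ref_acc, alt_acc, ref_don, alt_don, center=5)
print(scores["DS_AG"], scores["DS_DL"])  # (0.9, 2) (0.7, -2)
```

Taking the maximum over a window, rather than looking only at the variant position, is what lets this kind of score flag cryptic sites created or destroyed a few nucleotides away from the variant itself.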
Who Should Use OpenSpliceAI?
Human Genomics Researchers: Use the newly retrained OpenSpliceAI model, OSAIMANE-10000nt, for highly accurate splice site predictions based on the latest human annotations.
Comparative and Non-Human Genomics: Whether you’re studying mouse, zebrafish, honey bee, or thale cress, OpenSpliceAI offers models pre-trained on multiple species, as well as the ability to train your own, ensuring broad applicability.
Variant Analysts: If you need to predict how genetic variants affect splicing, OpenSpliceAI’s variant subcommand provides detailed delta scores and positional information to assess functional impacts.
What OpenSpliceAI Does
Data Preprocessing (create-data): Converts genome FASTA and annotation (GFF/GTF) files into one-hot encoded datasets (HDF5 format) for training and testing.
Model Training (train): Trains deep residual convolutional neural networks on the preprocessed datasets. OpenSpliceAI supports training from scratch and employs adaptive learning rate schedulers and early stopping.
Transfer Learning (transfer): Fine-tunes a pre-trained human model for other species, reducing training time and improving performance on species with limited data.
Model Calibration (calibrate): Adjusts model output probabilities to better reflect true splice site likelihoods, enhancing prediction accuracy.
Prediction (predict): Uses trained models to generate splice site predictions from FASTA sequences, outputting BED files with donor and acceptor site coordinates.
Variant Analysis (variant): Annotates VCF files with delta scores and positions to evaluate the impact of genetic variants on splicing.
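As a rough illustration of the kind of arrays the create-data step produces, the sketch below one-hot encodes a DNA sequence and builds per-position splice site labels. The A/C/G/T column order, the all-zero encoding for N, the optional N padding of the flanks, and the three-class label layout (neither, acceptor, donor) follow the original SpliceAI convention and are assumptions here; the helper names are hypothetical, and writing the arrays to HDF5 is not shown.

```python
from typing import Iterable, List

# Assumed SpliceAI-style mapping: A, C, G, T -> unit vectors; any other base (e.g. N) -> all zeros.
_ONE_HOT = {
    "A": [1, 0, 0, 0],
    "C": [0, 1, 0, 0],
    "G": [0, 0, 1, 0],
    "T": [0, 0, 0, 1],
}


def one_hot_encode(seq: str, flank: int = 0) -> List[List[int]]:
    """Hypothetical helper: one-hot encode seq, padding each side with `flank` Ns."""
    padded = "N" * flank + seq.upper() + "N" * flank
    # list(...) copies each row so rows never alias the shared table entries
    return [list(_ONE_HOT.get(base, [0, 0, 0, 0])) for base in padded]


def encode_labels(length: int, acceptors: Iterable[int],
                  donors: Iterable[int]) -> List[List[int]]:
    """Hypothetical helper: per-position labels as [neither, acceptor, donor]."""
    labels = [[1, 0, 0] for _ in range(length)]
    for i in acceptors:
        labels[i] = [0, 1, 0]
    for i in donors:
        labels[i] = [0, 0, 1]
    return labels


print(one_hot_encode("ACgN"))
print(encode_labels(4, acceptors=[1], donors=[3]))
```

The flank padding reflects the fact that the model sees context on both sides of each predicted position; for a model with a 10,000 nt context such as OSAIMANE-10000nt, this would presumably mean 5,000 nt on each side, although that split is an assumption rather than something stated on this page.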
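Calibration methods vary. One widely used post-hoc technique is temperature scaling, which divides the model's output logits by a single fitted temperature T before the softmax. This page does not say whether the calibrate subcommand uses exactly this approach, so treat the following as an illustrative sketch only: it fits T by a simple grid search that minimizes negative log-likelihood on held-out examples, and it assumes a three-class (neither, acceptor, donor) output. The function names and the grid range are hypothetical.

```python
import math
from typing import List, Optional, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Softmax of logits / temperature, computed stably by subtracting the max."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def mean_nll(all_logits: Sequence[Sequence[float]], labels: Sequence[int],
             temperature: float) -> float:
    """Average negative log-likelihood of the true labels at a given temperature."""
    return -sum(math.log(softmax(z, temperature)[y])
                for z, y in zip(all_logits, labels)) / len(labels)


def fit_temperature(all_logits: Sequence[Sequence[float]], labels: Sequence[int],
                    grid: Optional[List[float]] = None) -> float:
    """Pick the temperature on a grid (default 0.5 to 5.0) that minimizes NLL."""
    if grid is None:
        grid = [0.5 + 0.05 * i for i in range(91)]
    return min(grid, key=lambda t: mean_nll(all_logits, labels, t))


# Toy held-out set: the model is always very confident in class 0 ("neither"),
# but half of the true labels are class 1 ("acceptor"), so it is overconfident.
val_logits = [[4.0, 0.0, 0.0], [4.0, 0.0, 0.0]]
val_labels = [0, 1]
T = fit_temperature(val_logits, val_labels)
print(T > 1.0)  # True: a temperature above 1 softens the overconfident outputs
```

Because every logit is divided by the same T, the ranking of the classes, and therefore the predicted class at each position, does not change; only the confidence attached to it does.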
Cite Us
If you use OpenSpliceAI in your research, please cite our work as well as the original SpliceAI paper:
Kuan-Hao Chao, Alan Mao, Anqi Liu, Mihaela Pertea, and Steven L. Salzberg. "OpenSpliceAI: An efficient, modular implementation of SpliceAI enabling easy retraining on non-human species." bioRxiv (coming soon).
Kishore Jaganathan, Sofia Kyriazopoulou Panagiotopoulou, Jeremy F. McRae, Siavash Fazel Darbandi, David Knowles, Yang I. Li, Jack A. Kosmicki, Juan Arbelaez, Wenwu Cui, Grace B. Schwartz, Eric D. Chow, Efstathios Kanterakis, Hong Gao, Amirali Kia, Serafim Batzoglou, Stephan J. Sanders, and Kyle Kai-How Farh. "Predicting splicing from primary sequence with deep learning." Cell, 2019.
User Support & Contributors
If you have questions, encounter issues, or would like to request a new feature, please use our GitHub issue tracker at: https://github.com/Kuanhao-Chao/OpenSpliceAI/issues
OpenSpliceAI was developed by Kuan-Hao Chao, Alan Mao, and collaborators at Johns Hopkins University. For further details on usage, methods, and performance, please refer to the full documentation and online methods sections.
Next Steps
Check out the Installation Guide to get started with OpenSpliceAI. For a quick overview of the main commands and subcommands, see the Quick Start Guide.