The SpliceAI-toolkit is a flexible framework designed for easy retraining of the SpliceAI model with new datasets. It comes with models pre-trained on various species, including humans (MANE database), mice, thale cress (Arabidopsis), honey bees, and zebrafish. Additionally, the SpliceAI-toolkit is capable of processing genetic variants in VCF format to predict their impact on splicing.
Why SpliceAI-toolkit❓
Easy-to-retrain framework: The SpliceAI-toolkit moves away from the outdated Python 2.7 and older versions of TensorFlow and Keras. It is built on Python 3.7 and PyTorch, which simplifies retraining significantly. Say goodbye to compatibility issues and retrain your models with just two simple commands.
Retrained on a new dataset: SpliceAI is great, but SpliceAI-toolkit makes it even better! The newly pretrained SpliceAI-Human model is updated from the GRCh37 to the GRCh38 human genome assembly and integrates the latest MANE (Matched Annotation from NCBI and EMBL-EBI) annotations, so your research is supported by up-to-date genomic data.
Retrained on various species: Concerned that the SpliceAI model does not generalize to your study species because you are not studying humans? No problem! The SpliceAI-toolkit is released with models pretrained on various species, including human MANE, mouse, thale cress, honey bee, and zebrafish.
Predict the impact of genetic variants on splicing: Similar to SpliceAI, the SpliceAI-toolkit can take genetic variants in VCF format and predict the impact of these variants on splicing with any of the pretrained models.
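As a rough illustration of what "impact on splicing" can mean, the sketch below reads one VCF data line, applies the substitution to a short sequence window, and summarizes how a per-position score track changes between the reference and the alternate allele. It is not the toolkit's implementation: the helper names, the window coordinates, and the score values are all hypothetical, and the gain/loss summary is simply one common way to compare two score tracks. In a real run, the two tracks would come from scoring the reference and mutated sequences with one of the pretrained models.

```python
def parse_vcf_line(line):
    """Extract the fields that describe a variant from one VCF data line."""
    chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
    return {"chrom": chrom, "pos": int(pos), "ref": ref, "alt": alt}


def apply_snv(sequence, offset, ref, alt):
    """Return the sequence with a single-base substitution at a 0-based offset."""
    if sequence[offset] != ref:
        raise ValueError("reference allele does not match the sequence")
    return sequence[:offset] + alt + sequence[offset + 1:]


def delta_scores(ref_scores, alt_scores):
    """Largest score gain and largest score loss across aligned positions."""
    diffs = [a - r for r, a in zip(ref_scores, alt_scores)]
    return {"gain": max(max(diffs), 0.0), "loss": max(-min(diffs), 0.0)}


# Hypothetical single-nucleotide variant (columns: CHROM POS ID REF ALT ...).
variant = parse_vcf_line("chr1\t1005\t.\tG\tA\t.\t.\t.")

# Hypothetical window; window_start is the 1-based coordinate of its first base.
window_start = 1001
window = "CAGGGTAAGT"
mutated = apply_snv(window, variant["pos"] - window_start,
                    variant["ref"], variant["alt"])

# Made-up donor scores standing in for model output on each sequence.
ref_donor = [0.0, 0.0, 0.0, 0.0, 0.9, 0.0, 0.0, 0.0, 0.0, 0.0]
alt_donor = [0.0, 0.0, 0.0, 0.0, 0.1, 0.0, 0.0, 0.0, 0.3, 0.0]

print(mutated)
print(delta_scores(ref_donor, alt_donor))
```

The sketch handles only single-base substitutions, so the reference and alternate tracks stay aligned position by position. Insertions and deletions would need extra bookkeeping to line the two tracks up.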
SpliceAI-toolkit is open-source, free, and combines the ease of Python with the power of PyTorch for accurate splicing predictions.
Who is it for❓
If you want to study splicing in humans, just use the newly pretrained human SpliceAI-MANE! Better annotation, better results!
If you want to do splicing research in other species, the SpliceAI-toolkit has you covered! It comes with models pretrained on various species! And you can easily train your own SpliceAI with your own genome & annotation data.
If you are interested in predicting the impact of genetic variants on splicing, SpliceAI-toolkit is the perfect tool for you!
What does SpliceAI-toolkit do❓
The spliceai-toolkit create-data command takes a genome and an annotation file as input and generates a dataset for training and testing your SpliceAI model.
The spliceai-toolkit train command uses the created dataset to train your own SpliceAI model.
The spliceai-toolkit fine-tune command fine-tunes the pretrained human model on your own created dataset, so you do not have to retrain a SpliceAI model from the ground up. It tailors the model to better generalize to your specific species.
The spliceai-toolkit predict command takes any gene sequence and scores every position, determining whether it is a donor, an acceptor, or neither.
The spliceai-toolkit variant command takes a VCF file and predicts the impact of genetic variants on splicing.
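The per-position output of the predict command can be pictured with a small sketch: each position gets three raw scores, which are turned into probabilities and mapped to a label. This is an illustration only, not the toolkit's API. The class order, the 0.5 threshold, the function names, and the numbers are assumptions chosen for the example.

```python
import math

# Assumed class order for this sketch; check the toolkit's documentation
# for the actual output layout.
LABELS = ("neither", "acceptor", "donor")


def softmax(logits):
    """Convert the raw scores for one position into probabilities summing to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def call_sites(per_position_logits, threshold=0.5):
    """Label each position, falling back to 'neither' when the top class is weak.

    Returns (label, top_probability) pairs; top_probability is the probability
    of the highest-scoring class, even when the label falls back to 'neither'.
    """
    calls = []
    for logits in per_position_logits:
        probs = softmax(logits)
        best = max(range(len(probs)), key=probs.__getitem__)
        label = LABELS[best] if probs[best] >= threshold else "neither"
        calls.append((label, probs[best]))
    return calls


# Hypothetical raw scores for four positions (made-up numbers).
example_logits = [
    [4.0, 0.1, 0.2],
    [0.3, 3.5, 0.1],
    [0.2, 0.1, 5.0],
    [1.0, 0.9, 1.1],
]

calls = call_sites(example_logits)
for position, (label, prob) in enumerate(calls):
    print(position, label, round(prob, 3))
```

The threshold is a design choice: a higher value gives fewer but more confident donor and acceptor calls, while a lower value recovers weaker sites at the cost of more false positives.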
Cite us
Kuan-Hao Chao, Alan Mao, Anqi Liu, Mihaela Pertea, and Steven L. Salzberg. "SpliceAI-toolkit" bioRxiv.
Kishore Jaganathan, Sofia Kyriazopoulou Panagiotopoulou, Jeremy F. McRae, Siavash Fazel Darbandi, David Knowles, Yang I. Li, Jack A. Kosmicki, Juan Arbelaez, Wenwu Cui, Grace B. Schwartz, Eric D. Chow, Efstathios Kanterakis, Hong Gao, Amirali Kia, Serafim Batzoglou, Stephan J. Sanders, and Kyle Kai-How Farh. "Predicting splicing from primary sequence with deep learning" Cell.
User support
Please go through the documentation below first. If you have questions about using the package, a bug report, or a feature request, please use the GitHub issue tracker here:
https://github.com/Kuanhao-Chao/spliceAI-toolkit/issues
Key contributors
SpliceAI-toolkit was designed and developed by Kuan-Hao Chao. This documentation was written by Kuan-Hao Chao. The SpliceAI-toolkit logo was designed by Kuan-Hao Chao.
Table of contents
Subcommands usage
Pretrained models

