Train your own OpenSpliceAI Models
OpenSpliceAI offers a streamlined pipeline for training splice site prediction models on different species. Whether you aim to build a model using the Human MANE annotations or train a model for mouse, our framework provides a modular, efficient, and reproducible process.
The training workflow typically involves:
Data Preprocessing: Use the create-data subcommand to convert your reference genome and gene annotation files into HDF5-formatted training and test datasets (Quick Start Guide: create-data).
Model Training: Train your model from scratch using the train subcommand, which employs a deep residual CNN architecture with adaptive learning rate scheduling and early stopping for optimal performance (Quick Start Guide: train).
(Optional) Model Calibration: Fine-tune the output probabilities using the calibrate subcommand, ensuring that the predicted splice site probabilities accurately reflect true likelihoods (Quick Start Guide: calibrate).
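To make the optional calibration step more concrete, the sketch below shows temperature scaling, one common way to bring predicted probabilities in line with observed accuracy. This is an assumption for illustration: the text above does not say which method the calibrate subcommand uses. The function names, the toy logits, the three-class labelling, and the search grid are all hypothetical and are not part of OpenSpliceAI.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities after dividing by a temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def mean_nll(logit_rows, labels, temperature):
    """Mean negative log-likelihood of the true labels at a given temperature."""
    total = 0.0
    for logits, label in zip(logit_rows, labels):
        total -= math.log(softmax(logits, temperature)[label])
    return total / len(labels)


def fit_temperature(logit_rows, labels, candidates=None):
    """Pick the temperature with the lowest held-out NLL by grid search."""
    if candidates is None:
        # Hypothetical search grid: 0.5, 0.55, ..., 4.0
        candidates = [0.5 + 0.05 * i for i in range(71)]
    return min(candidates, key=lambda t: mean_nll(logit_rows, labels, t))


# Toy held-out set with three classes (assumed here: neither, acceptor, donor).
# The model is overconfident: very large logits, yet one of four calls is wrong.
val_logits = [
    [6.0, 0.0, 0.0],
    [0.0, 6.0, 0.0],
    [0.0, 0.0, 6.0],
    [6.0, 0.0, 0.0],
]
val_labels = [0, 1, 2, 1]  # the last example is misclassified

best_t = fit_temperature(val_logits, val_labels)
raw = softmax(val_logits[0])
calibrated = softmax(val_logits[0], best_t)
print(f"temperature={best_t:.2f} "
      f"raw_max={max(raw):.3f} calibrated_max={max(calibrated):.3f}")
```

On this toy data the fitted temperature is greater than 1, so the top probability drops from above 0.99 to roughly 0.75, which matches the three-out-of-four accuracy of the held-out set. Dividing every logit by the same positive temperature never changes which class ranks highest, so calibration of this kind adjusts confidence without changing the predicted label.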
For species-specific examples, see the following:
OpenSpliceAI Examples
Train Your Own Model – Human (MANE) — Train the Human (MANE) model.
Train Your Own Model – Mouse — Train a model using mouse data.

