Train your own OpenSpliceAI Models

OpenSpliceAI offers a streamlined pipeline for training splice site prediction models on different species. Whether you want to build a model from the human MANE annotations or train one for mouse, the framework provides a modular, efficient, and reproducible process.

The training workflow typically involves:

  • Data Preprocessing: Use the create-data subcommand to convert your reference genome and gene annotation files into HDF5-formatted training and test datasets (Quick Start Guide: create-data).

  • Model Training: Train your model from scratch using the train subcommand, which employs a deep residual CNN architecture with adaptive learning rate scheduling and early stopping for optimal performance (Quick Start Guide: train).

  • (Optional) Model Calibration: Adjust the output probabilities using the calibrate subcommand, so that the predicted splice site probabilities more accurately reflect true likelihoods (Quick Start Guide: calibrate).
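
The order of the steps above can be sketched in a few lines of Python. This is only an illustration, not part of OpenSpliceAI: it assumes the command-line entry point is named `openspliceai` and that each subcommand accepts a standard `--help` flag, and the helper `help_commands` is hypothetical. It deliberately prints commands rather than running them, and it shows no real arguments, since those are documented in the Quick Start Guide for each subcommand.

```python
import shlex

# Assumption: the CLI entry point is named "openspliceai"; adjust if your install differs.
CLI = "openspliceai"

# Subcommand names are taken from the workflow above, in the order they are run.
# The second field marks whether the step is required.
STEPS = [
    ("create-data", True),   # build HDF5 training and test datasets
    ("train", True),         # train the model from scratch
    ("calibrate", False),    # optional: calibrate the output probabilities
]


def help_commands(include_optional=True):
    """Return one `--help` invocation per workflow step, in run order.

    Hypothetical helper for illustration; `--help` is assumed to be supported.
    """
    return [
        [CLI, name, "--help"]
        for name, required in STEPS
        if required or include_optional
    ]


if __name__ == "__main__":
    # Print the commands instead of executing them.
    for argv in help_commands():
        print(shlex.join(argv))
```

Running each printed command is a quick way to see which options a step expects before preparing the genome, annotation, and dataset paths.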

For worked examples, see:

OpenSpliceAI Examples