Quick Start Guide: transfer

This guide walks you through the essential steps for using the transfer subcommand to fine-tune your own OpenSpliceAI model. Starting from a pre-trained model (for example, a human-trained model), you fine-tune on HDF5 datasets generated by the create-data subcommand to obtain a deep learning model tailored to splice site prediction in your species of interest.


Before You Begin

  • Installation: Follow the instructions on the Installation page to install OpenSpliceAI along with all necessary dependencies.

  • Pre-trained Model: Obtain a pre-trained OpenSpliceAI model in .pt format (e.g., model_10000nt_rs10.pt).

  • Dataset Preparation: Generate the training and testing datasets for your species of interest using the create-data subcommand; see Quick Start Guide: create-data for details. You will need dataset_train.h5 and dataset_test.h5.

  • Check Example Scripts: An example script is provided at examples/transfer/transfer_example.sh.
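The checklist above can be verified with a short pre-flight script before you start training. This is a minimal sketch, not part of OpenSpliceAI: the check_inputs function and the DATA_DIR variable are hypothetical names introduced here, and the file names are the ones used in this guide.

```shell
# Pre-flight check (illustrative; not part of OpenSpliceAI).

# Report each file that is missing or empty in a directory.
# Returns non-zero if at least one file fails the check.
check_inputs() {
    local dir="$1"; shift
    local missing=0
    local f
    for f in "$@"; do
        if [ ! -s "$dir/$f" ]; then
            echo "missing or empty: $dir/$f" >&2
            missing=1
        fi
    done
    return "$missing"
}

# File names are taken from this guide; DATA_DIR is a hypothetical variable
# that defaults to the current directory.
DATA_DIR="${DATA_DIR:-.}"
if check_inputs "$DATA_DIR" dataset_train.h5 dataset_test.h5 model_10000nt_rs10.pt; then
    echo "all inputs found"
else
    echo "fix the files listed above before running transfer"
fi
```

The check only confirms that the files exist and are non-empty; it does not validate their contents.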


One-liner Start

Make sure you have the following files, either generated with the create-data subcommand or downloaded directly from GitHub:

  1. Training Dataset: dataset_train.h5

  2. Testing Dataset: dataset_test.h5

  3. Pre-trained Model: Download a pre-trained model from the OpenSpliceAI GitHub repository. For example: model_10000nt_rs10.pt

Execute the following command to initiate transfer learning:

openspliceai transfer \
   --train-dataset dataset_train.h5 \
   --test-dataset dataset_test.h5 \
   --pretrained-model model_10000nt_rs10.pt \
   --flanking-size 10000 \
   --unfreeze-all \
   --epochs 10 \
   --early-stopping \
   --project-name new_species_transfer \
   --output-dir ./transfer_out/

This command will:

  • Load the pre-trained model (model_10000nt_rs10.pt).

  • Unfreeze all layers (using --unfreeze-all).

  • Fine-tune the model on your custom dataset for up to 10 epochs (with --early-stopping set, training may end sooner), saving logs and checkpoints in the transfer_out/ directory.
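Once the run finishes, you may want to locate the newest checkpoint in the output directory. The helper below is a rough sketch under one assumption that this guide does not confirm: that checkpoints are written as .pt files somewhere under transfer_out/. The latest_checkpoint function is a hypothetical name, not an OpenSpliceAI command.

```shell
# Illustrative helper, not part of OpenSpliceAI.
# Assumption: checkpoints are saved as .pt files under the output directory.

# Print the most recently modified .pt file under a directory
# (prints nothing if no .pt file exists).
latest_checkpoint() {
    find "$1" -type f -name '*.pt' -exec ls -t {} + | head -n 1
}

OUT_DIR="./transfer_out"
if [ -d "$OUT_DIR" ]; then
    ckpt="$(latest_checkpoint "$OUT_DIR")"
    echo "latest checkpoint: ${ckpt:-none found}"
else
    echo "$OUT_DIR does not exist yet; run the transfer command first"
fi
```

If your checkpoints use a different extension or naming scheme, adjust the -name pattern accordingly.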

Note

The model produced by this example is not optimized for splice site prediction, because it is fine-tuned on only a small subset of the data. The example is intended solely to demonstrate the transfer-learning process. For a fully optimized, pre-trained model, please refer to the Released OpenSpliceAI models guide.


Next Steps

After completing transfer learning, consider the following actions:

  • Explore transfer Options: Review the transfer documentation to discover additional customization options for your transfer-learning process.

  • Calibration (Optional): Enhance the reliability of your model’s probability outputs by following Quick Start Guide: calibrate.

  • Prediction: To deploy your newly trained model for splice site prediction, see Quick Start Guide: predict.

  • Advanced Options: Experiment with further training parameters (such as adjusting the number of epochs or the patience value) to optimize model performance.
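As one way to experiment with the number of epochs, the sketch below repeats the command from this guide with different --epochs values, giving each run its own project name and output directory. It uses only flags that appear in this guide. The run_transfer function, the RUNNER variable, the epoch values 5, 10, and 20, and the transfer_e* naming are all hypothetical choices made for illustration. By default the script only prints the commands; set RUNNER to an empty string to execute them.

```shell
# Illustrative sweep over --epochs; not part of OpenSpliceAI.
# RUNNER defaults to "echo" (dry run: print each command).
# Export RUNNER="" to actually launch the runs.
RUNNER="${RUNNER-echo}"

run_transfer() {
    # $1 = number of epochs; used in the project name and output directory
    # so that runs do not overwrite each other.
    $RUNNER openspliceai transfer \
        --train-dataset dataset_train.h5 \
        --test-dataset dataset_test.h5 \
        --pretrained-model model_10000nt_rs10.pt \
        --flanking-size 10000 \
        --unfreeze-all \
        --epochs "$1" \
        --early-stopping \
        --project-name "transfer_e$1" \
        --output-dir "./transfer_out_e$1/"
}

# Hypothetical epoch values; choose ones that suit your dataset.
for epochs in 5 10 20; do
    run_transfer "$epochs"
done
```

Keeping a dry-run default makes it easy to review the exact commands before committing compute time to them.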