Sequence-Based Therapeutic Peptide Classification with Augmented Negative Sampling

Preprint Created on 11 Jun 2026 bioRxiv

Therapeutic peptides offer high target specificity, low toxicity, and the ability to modulate protein-protein interactions, yet experimental characterization of their function remains costly and slow. Computational prediction of therapeutic function directly from sequence could accelerate peptide screening and enable generative design pipelines, but it requires reliable discrimination between therapeutic and non-therapeutic peptides. Existing multi-label predictors cover few functions, rely on limited datasets, and exhibit high false positive rates (FPRs), which limits their practical utility. We present a lightweight CNN classifier trained on the most comprehensive therapeutic peptide database to date (54,655 peptides, 48 functional categories). A key contribution is a statistically motivated negative sampling strategy that uses Markov models to generate diverse synthetic decoys at multiple difficulty levels. On this controlled decoy benchmark, the FPR drops from over 60% for previous models to 2.1% for our approach. Our fine-tuned five-model ensemble achieves 78.9% Micro F1 and 54.6% Macro F1 while requiring only amino acid sequences as input. Analysis of a sparse, L1-constrained variant of our model shows that convolutional filters capture conserved functional motifs as well as statistically improbable non-therapeutic patterns, and that downstream layers combine these signals, providing mechanistic evidence that the network learns biologically meaningful structure. In a generalization task on the TPpred-LE benchmark, our model achieves 55.3% Micro F1 and 38.6% Macro F1, comparable to TPpred-LE trained on its native dataset (57.9%/38.1%), while predicting four times more therapeutic functions with four times fewer parameters. Code and models will be made available at https://github.com/terra-quantum-public/tq-therapep-ai.
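To make the negative sampling idea concrete, the following is a minimal sketch of how decoys could be drawn from a Markov model fitted to known peptide sequences. It is not the authors' implementation: the abstract does not state the model order or how difficulty levels are defined, so this sketch assumes that the Markov order k controls difficulty (order 0 reproduces only amino acid composition, while higher orders preserve more local sequence context and so should yield harder decoys). The function names `fit_markov`, `sample_decoy`, and `generate_decoys`, the uniform fallback for unseen contexts, and the toy input strings are all hypothetical choices made for illustration.

```python
import random
from collections import defaultdict

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def fit_markov(sequences, order):
    """Count residue occurrences conditioned on the preceding `order` residues.

    Contexts near the start of a sequence are shorter than `order`.
    With order 0 every context is the empty string, so the table
    reduces to plain amino acid frequencies.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        for i, residue in enumerate(seq):
            context = seq[max(0, i - order):i]
            counts[context][residue] += 1
    return counts


def sample_decoy(counts, order, length, rng):
    """Sample one sequence of the given length from the fitted counts."""
    out = []
    for _ in range(length):
        # out[-0:] would return the whole list, so order 0 is handled explicitly.
        context = "".join(out[-order:]) if order > 0 else ""
        table = counts.get(context)
        if not table:
            # Assumption: fall back to a uniform draw for unseen contexts.
            out.append(rng.choice(AMINO_ACIDS))
            continue
        residues = list(table)
        weights = [table[r] for r in residues]
        out.append(rng.choices(residues, weights=weights, k=1)[0])
    return "".join(out)


def generate_decoys(sequences, order, n, seed=0):
    """Draw up to n distinct decoys that do not appear in the input set."""
    rng = random.Random(seed)
    counts = fit_markov(sequences, order)
    known = set(sequences)
    lengths = [len(s) for s in sequences]  # reuse the empirical length distribution
    decoys = set()
    attempts = 0
    while len(decoys) < n and attempts < 100 * n:
        attempts += 1
        candidate = sample_decoy(counts, order, rng.choice(lengths), rng)
        if candidate not in known:
            decoys.add(candidate)
    return sorted(decoys)


if __name__ == "__main__":
    # Toy input strings for illustration only, not data from the paper.
    toy_peptides = [
        "GIGKFLHSAKKFGKAFVGEIMNS",
        "KWKLFKKIEKVGQNIRDGIIKAGPAVAVVGQATQIAK",
        "FLPLIGRVLSGIL",
    ]
    for k in (0, 1, 2):  # assumed difficulty levels
        print(k, generate_decoys(toy_peptides, order=k, n=3, seed=k))
```

Under these assumptions, a benchmark in the spirit of the one described would mix decoys from several orders, label them as having no therapeutic function, and measure how often a classifier assigns any function to them. Rejecting candidates that coincide with known peptides avoids mislabeling true positives as negatives.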

Ellerbrock, R., Valentini, A., Paul, A. C., Mukhopadhyay, S., Perelshtein, M. R.
