Polynomial Trajectory Compression for Protein Language Model Embeddings

Preprint, created on 07 Jun 2026, bioRxiv

Protein language models (PLMs) generate rich, layer-wise embeddings that capture diverse biological information but are expensive to store and compute at scale. In this work, we propose a compact surrogate representation for PLM embeddings across transformer layers using low-dimensional PCA projections and cubic polynomial trajectories. This approach enables efficient storage and on-demand reconstruction of protein-level embeddings at any layer without rerunning the PLM. We evaluate our method on two downstream tasks, protein-protein interaction and subcellular localization, using the ESM-35M and ESM-3B PLMs. We show that the surrogate embeddings achieve high reconstruction fidelity while significantly reducing storage and computational requirements. The surrogate embeddings also retain downstream task prediction performance relative to the original embeddings. Our approach provides a scalable and practical solution for large-scale protein embedding storage and reuse.
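The idea described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The abstract does not say how the PCA basis is fit or how layers are parameterized, so the sketch assumes a single PCA basis shared across all layers, a layer index rescaled to [0, 1], and an ordinary least-squares cubic fit per protein and per PCA component. The function names `fit_surrogate` and `reconstruct`, the array layout, and the default `k=8` are hypothetical.

```python
import numpy as np

def fit_surrogate(emb, k=8, degree=3):
    """Fit a PCA + polynomial surrogate (illustrative sketch only).

    emb: array of shape (n_layers, n_proteins, d) holding one
         protein-level embedding per layer (assumed layout).
    Returns the PCA mean (d,), the PCA basis (k, d), and the polynomial
    coefficients of shape (degree + 1, n_proteins, k).
    """
    n_layers, n_proteins, d = emb.shape
    flat = emb.reshape(-1, d)
    mean = flat.mean(axis=0)
    # PCA via SVD of the centered data; rows of vt are principal directions.
    _, _, vt = np.linalg.svd(flat - mean, full_matrices=False)
    basis = vt[:k]
    # Project every layer's embeddings into the k-dimensional PCA space.
    z = (emb - mean) @ basis.T                       # (n_layers, n_proteins, k)
    # Assumption: layers are equally spaced on [0, 1].
    t = np.linspace(0.0, 1.0, n_layers)
    design = np.vander(t, degree + 1, increasing=True)  # columns 1, t, t^2, t^3
    # One least-squares fit per (protein, component), solved in a single call.
    coef, *_ = np.linalg.lstsq(design, z.reshape(n_layers, -1), rcond=None)
    return mean, basis, coef.reshape(degree + 1, n_proteins, k)

def reconstruct(mean, basis, coef, layer, n_layers):
    """Rebuild approximate embeddings at one layer from stored coefficients."""
    t = layer / (n_layers - 1)
    powers = t ** np.arange(coef.shape[0])           # 1, t, t^2, t^3
    z = np.tensordot(powers, coef, axes=1)           # (n_proteins, k)
    return z @ basis + mean                          # back to d dimensions
```

Under these assumptions, each protein is stored as `(degree + 1) * k` coefficients instead of `n_layers * d` embedding values, plus one basis and mean shared by all proteins. Calling `reconstruct(mean, basis, coef, layer, n_layers)` then yields an approximation of the embeddings at the requested layer without rerunning the model.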

Sahni, H., Chen, X., Estrada, T.
