Folding the unfoldable 2: using AlphaFold and ESMFold to explore spurious proteins

Preprint, bioRxiv. Created on 11 Jun 2026.

Motivation: Spurious protein sequences, which result from gene prediction errors, should in theory not yield folded structures. AlphaFold2 was previously shown to predict short spurious sequences with high pLDDT scores, making it unlikely to distinguish real proteins from spurious proteins, which are usually short. We evaluate whether newer structure prediction methods (ESMFold and AlphaFold3) similarly predict short sequences with high pLDDT or whether they better discriminate between spurious and real proteins.

Results: All three structure prediction methods (ESMFold, AlphaFold2, and AlphaFold3) predict short spurious sequences from AntiFam with unexpectedly high pLDDT scores; however, discrimination between spurious and real proteins improves for sequences longer than 100 amino acids. By analysing sequences with disparate pTM and pLDDT scores, we identified two likely spurious shadow ORFs in Swiss-Prot and one potentially non-spurious AntiFam entry. Using the structure prediction scores, we developed a Gaussian Process Model and evaluated its performance on AlphaFold DB, identifying potential spurious proteins at scale. While limited on its own, this model can increase confidence in spurious protein identification when combined with other methods.

Orr, A. K., Bateman, A.
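
The abstract mentions analysing sequences whose pTM and pLDDT scores are disparate, but it does not say how that disparity was measured. The following is a minimal sketch of one possible approach, assuming the gap is taken as mean per-residue pLDDT (rescaled to 0-1) minus pTM. The names `Prediction`, `score_gap`, and `flag_disparate`, the cutoff `min_gap=0.3`, and the example values are all hypothetical and are not taken from the preprint.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Prediction:
    """Scores for one predicted structure (values used below are made up)."""
    name: str
    plddt: list[float]  # per-residue pLDDT, on a 0-100 scale
    ptm: float          # predicted TM-score, on a 0-1 scale


def score_gap(pred: Prediction) -> float:
    """Mean pLDDT rescaled to 0-1, minus pTM (an assumed disparity measure)."""
    return mean(pred.plddt) / 100.0 - pred.ptm


def flag_disparate(preds, min_gap=0.3):
    """Return names whose absolute gap is at least min_gap (hypothetical cutoff)."""
    return [p.name for p in preds if abs(score_gap(p)) >= min_gap]


# Illustrative inputs only: high local confidence with low pTM vs. agreeing scores.
examples = [
    Prediction("seq_a", [90.0, 88.0, 92.0], 0.35),
    Prediction("seq_b", [85.0, 80.0, 90.0], 0.82),
]
flagged = flag_disparate(examples)
print(flagged)
```

In this sketch only `seq_a` is flagged, because its high mean pLDDT is not matched by its pTM. A flag of this kind would only mark a sequence for closer inspection; it is not by itself evidence that a sequence is spurious or real.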
