Single-cell foundation models are large, self-supervised deep learning networks pretrained on millions of cellular transcriptomes. These models promise cell representations that transfer across diverse biological domains and that, on specific tasks, outperform narrowly scoped models. A central assumption is that more pretraining data translates into better downstream performance. Despite its centrality, this assumption remains largely untested. Here, we measured downstream performance on gold-standard benchmarking tasks across massive reductions of the pretraining dataset, and found that performance was largely insensitive to pretraining data size once finetuning was allowed. This trend reveals a finetuning masking effect: finetuning offsets the differences in representation quality induced by pretraining, making the benefit of additional pretraining scale largely invisible under current benchmark settings. These findings challenge current benchmarking standards, which rely on closed-ended finetuning tasks that are too narrow to expose the full representational value of pretraining. They also call into question data scaling as the main driving force in single-cell foundation-model development, at least when models are evaluated through common narrow tasks. We propose that the next generation of foundation models should be assessed less by performance on highly optimised finetuning tasks and more by their ability to support open-ended biological inference, their performance under frozen-representation evaluation, and their zero-shot capability.
Shakeel, M. H., Shen, M., Mangiola, S.
