Accurate annotation of intermediate cell states remains a major challenge in single-cell RNA sequencing (scRNA-seq), particularly in continuous differentiation systems such as erythropoiesis. Existing reference-based methods often lack the resolution required to distinguish early and transitional erythroid progenitors and may generalise poorly across datasets and modalities. Here, we present a supervised framework for erythroid lineage annotation based on expert-curated training data that integrates bulk and single-cell transcriptomic information. Starting from a human bone marrow scRNA-seq atlas, we refined erythroid annotations by introducing previously unresolved progenitor stages, including burst-forming unit-erythroid (BFU-E), colony-forming unit-erythroid (CFU-E), and pro-erythroblast (ProE), guided by canonical marker genes and bulk RNA-seq references. We trained and benchmarked four classical machine learning models and identified LightGBM as the best-performing approach, achieving a validation macro F1-score of 0.821 and balanced accuracy of 0.826. On a held-out test set, the model showed strong performance across most erythroid stages, with errors largely confined to adjacent differentiation states. The classifier was further transferred to independent bulk RNA-seq samples and an external bone marrow scRNA-seq dataset, where it recovered expected erythroid progression and refined coarse-grained annotations into higher-resolution cell states. Together, these results show that expert-curated supervised learning can improve erythroid cell state annotation in scRNA-seq and provide a practical framework for studying differentiation hierarchies in settings where finely resolved public references are limited.
Enderti, A., Stranieri, N., Riva, S. G., Hughes, J. R.
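The two headline metrics in the abstract, macro F1-score and balanced accuracy, both average per-class scores with equal weight, so rare progenitor stages count as much as abundant ones. The following is a minimal sketch of how they are computed, not the authors' evaluation code. The toy labels and the helper names `per_class_scores`, `macro_f1`, and `balanced_accuracy` are assumptions made for illustration. Only the stage names (BFU-E, CFU-E, ProE) come from the abstract.

```python
from collections import Counter


def per_class_scores(y_true, y_pred):
    """Return {label: (recall, f1)} for every label present in y_true."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fn[truth] += 1  # the true class was missed
            fp[pred] += 1   # the predicted class gained a false positive
    scores = {}
    for label in sorted(set(y_true)):
        # tp + fn >= 1 because the label occurs in y_true
        recall = tp[label] / (tp[label] + fn[label])
        denom = 2 * tp[label] + fp[label] + fn[label]
        f1 = 2 * tp[label] / denom if denom else 0.0
        scores[label] = (recall, f1)
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over the classes in y_true."""
    scores = per_class_scores(y_true, y_pred)
    return sum(f1 for _, f1 in scores.values()) / len(scores)


def balanced_accuracy(y_true, y_pred):
    """Unweighted mean of per-class recall."""
    scores = per_class_scores(y_true, y_pred)
    return sum(recall for recall, _ in scores.values()) / len(scores)


# Hypothetical toy labels: both errors fall on a neighbouring stage.
y_true = ["BFU-E", "BFU-E", "CFU-E", "CFU-E", "ProE", "ProE"]
y_pred = ["BFU-E", "CFU-E", "CFU-E", "CFU-E", "ProE", "CFU-E"]

toy_macro_f1 = macro_f1(y_true, y_pred)
toy_balanced_accuracy = balanced_accuracy(y_true, y_pred)
print(f"macro F1 = {toy_macro_f1:.3f}, balanced accuracy = {toy_balanced_accuracy:.3f}")
```

In this toy case every class has an F1 of 2/3, so the macro average is also 2/3. Plain accuracy would give the same value here only because the classes are equally sized. With imbalanced stages the two metrics diverge, which is presumably why class-balanced metrics are reported.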
