Heterologous gene expression is widely used across biology and medicine, and often relies on codon optimization to increase protein yields. Here we uncover missplicing as a common and largely unrecognized failure mode of heterologous expression. Using systematically designed libraries comprising over 5,000 synthetic reporter genes and natural human cDNAs, we find that the majority of gene variants expressed in a human cell line are at least partially spliced, and in many variants the spliced isoform dominates, reducing protein output or ablating expression entirely. By analysing sequence determinants of expression across multiple human cell lines, we uncover a hierarchical architecture of regulatory control, where GC content establishes baseline mRNA levels, local sequence features influence splicing, and tissue-specific codon adaptation to tRNA pools fine-tunes translation efficiency. These findings enable us to develop predictive models of expression and splicing, benchmark current optimization strategies, and design a splice-aware optimization algorithm that substantially improves transgene performance.
Ahmad, M., Bellido Molias, F., Cano Aroca, L., Mordstein, C., Watson, S., Bhuiyan, N., Clarke, M. J., Kimchi-Sarfaty, C., Katneni, U., Gaunt, E. R., Hurst, L. D., Netuschil, N., Hofmeister, T., Liss, M., Kudla, G.
