The ability to derive neural-level language coding models holds great scientific and clinical potential. Current approaches are limited by the scale and ethological validity of input data; applications requiring large, rare, or naturalistic samples in particular would benefit from the ability to infer neural coding from incidental everyday speech. Here we present a novel pipeline designed to leverage spontaneous and incidental naturalistic speech. This pipeline performs transcription, segmentation, and video-assisted diarization, as well as alignment and spike detection of neural data. We apply this pipeline to a dataset derived from 21 patients (6+ days each, over 800 hours and 5 million words total). We benchmark both encoding and decoding models against extensive and rare ground-truth control datasets consisting of human-curated word-level temporal alignment and manually sorted spikes. We further validate our approach by quantifying representational drift, the effect of dataset size, and differences between six brain areas. Together, these findings demonstrate that incidental natural speech is sufficiently processed in the brain to enable the estimation of neural-level embeddings.
