Automated analysis of dairy cow vocalizations has largely relied on supervised classifiers evaluated within a single farm, a setting that inflates apparent performance and gives no measure of how far predictions can be trusted. We address this with a three-layer framework that separates acoustic structure discovery, proxy-state inference, and reliability assessment, evaluated on 569 annotated clips from three commercial dairy farms. A frozen self-supervised speech encoder, latent-space segmentation, and stability-guided clustering convert continuous recordings into discrete acoustic units without behavioral labels. Proxy-state signal is then tested under audio-only, audio-plus-context, and leave-one-farm-out (LOFO) protocols designed to separate transferable acoustic structure from farm-specific shortcuts. The results suggest that cross-farm generalizability differs substantially across biologically distinct vocalization categories. Non-vocal physiological sounds transfer across farms (LOFO macro-F1 = 0.763) and calibrate well (expected calibration error reduced from 0.087 to 0.023), whereas resource-related calls collapse to a majority-class baseline (macro-F1 = 0.500) and distress-related calls degrade under farm holdout. Selective prediction improves the retained-set score of the multiclass functional proxy (0.407 to 0.430), and an end-to-end convolutional baseline matches or exceeds the framework on raw accuracy for the easier targets yet yields a roughly two- to six-fold larger calibration error and offers no abstention. Random cross-validation consistently overstates cross-farm utility. These findings show that acoustic models for livestock monitoring require reliability-aware evaluation rather than flat classification.
Kate, M., Neethirajan, S.
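As an illustration of two evaluation ideas named in the abstract, the following is a minimal sketch of a leave-one-farm-out split and an equal-width-bin expected calibration error. It is not the authors' implementation: the function names, the 10-bin default, and the synthetic farm labels and scores are all assumptions made for this example.

```python
import numpy as np


def leave_one_farm_out(farm_ids):
    """Yield (held_out_farm, train_idx, test_idx), holding out one farm per fold."""
    farm_ids = np.asarray(farm_ids)
    for farm in np.unique(farm_ids):
        test_mask = farm_ids == farm
        yield farm, np.flatnonzero(~test_mask), np.flatnonzero(test_mask)


def expected_calibration_error(confidences, correct, n_bins=10):
    """Sample-weighted mean of |accuracy - mean confidence| over equal-width bins."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Interior edges only, so a confidence of exactly 1.0 lands in the last bin.
    bin_ids = np.digitize(confidences, edges[1:-1])
    ece = 0.0
    for b in range(n_bins):
        in_bin = bin_ids == b
        if in_bin.any():
            gap = abs(correct[in_bin].mean() - confidences[in_bin].mean())
            ece += in_bin.mean() * gap  # weight by the fraction of samples in the bin
    return float(ece)


if __name__ == "__main__":
    # Synthetic stand-in data (hypothetical): three farms, random confidences.
    rng = np.random.default_rng(0)
    farms = np.array(["farm_a"] * 40 + ["farm_b"] * 30 + ["farm_c"] * 30)
    conf = rng.uniform(0.5, 1.0, size=farms.size)
    hit = rng.uniform(size=farms.size) < conf  # correct with probability = confidence
    for farm, train_idx, test_idx in leave_one_farm_out(farms):
        ece = expected_calibration_error(conf[test_idx], hit[test_idx])
        print(f"held out {farm}: {test_idx.size} clips, ECE = {ece:.3f}")
```

In a real pipeline the classifier would be refit on `train_idx` in every fold before scoring the held-out farm; scikit-learn's `LeaveOneGroupOut` produces the same kind of split. Comparing these per-farm scores with random cross-validation scores is what would expose the farm-specific shortcuts the abstract describes.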