Optimal Practice Schedules in a Dual-Rate Model of Motor Adaptation, and Their Recovery by Reinforcement Learning

A clinician guiding a stroke patient through a 45-minute rehabilitation session, a coach planning a training day, and a teacher choosing the order of practice problems all face the same question: ``Given everything practiced so far, what should the next trial be?'' The motor-learning literature offers two coarse answers, blocked and interleaved (``random'') practice, with a well-known dissociation: blocked practice gives faster acquisition but worse retention, while interleaved practice gives the opposite. We argue that this dissociation is not a fixed property of practice schedules but a shadow of a richer structure. In particular, for a learner whose memory has a fast shared component and slower context-specific components, the best schedule should be a function of the learner's current internal state and the time remaining before the retention probe. We make this precise in a minimal two-context fast--slow learner model whose optimal schedules can be computed exactly for short sessions and approximated by a structured beam-search upper bound for longer ones. The optimal schedule is not blocked, not interleaved, and not a single rule; it is a family of schedules determined by how much retention is weighted relative to acquisition. The family has three regimes (alternating, mixed, and blocked-with-late-correction), and for long sessions the optimal schedule has an interpretable structure: exploit one context, repair the neglected one, then interleave to lock in retention. We then investigate whether a reinforcement-learning teacher, observing only the learner's actions and errors without access to the learner's internal memory states, can learn these optimal policies from interaction alone.
Comparing these learned policies against the exact optima, we show that a model-free agent (PPO) recovers the short-horizon schedules and the long-horizon block--repair--interleave motif in the intermediate regime. The benchmark also exposes a sharp failure in the acquisition-dominated regime, where PPO collapses to pure blocking and misses a sparse terminal correction. A warm-start diagnostic shows that this failure is a genuine metastability of policy gradients rather than a tuning artifact: blocked-plus-switch and pure-blocked schedules act as competing attractors between which PPO cannot stabilize. A hyperparameter sweep over observation history reveals that the agent requires very little behavioral context to plan optimally, which indicates that partial observability is not a major barrier to finding optimal practice schedules. Finally, we discuss the implications of our framework for motor adaptation and contextual interference, offering practical insights into how instructors can design finite practice sessions to favor long-term retention.
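
To make the setup concrete, the following is a minimal Python sketch of a two-context fast--slow learner and an exhaustive search over short schedules. It is an illustration under stated assumptions, not the authors' model: the abstract gives no equations, so the linear two-state update, every parameter value, the opposing targets, the retention delay, and the names `step`, `score`, and `best_schedule` are all hypothetical.

```python
from itertools import product

# All values below are illustrative assumptions, not taken from the paper.
A_FAST, B_FAST = 0.6, 0.3     # fast shared state: retention factor, learning rate
A_SLOW, B_SLOW = 0.98, 0.05   # slow context-specific states: retention, learning rate
TARGETS = (1.0, -1.0)         # hypothetical opposing targets for contexts 0 and 1


def step(state, context):
    """Run one practice trial in `context`; return (new_state, error)."""
    fast, slow = state
    error = TARGETS[context] - (fast + slow[context])
    new_fast = A_FAST * fast + B_FAST * error
    # Every slow state decays; only the practiced context learns from the error.
    new_slow = tuple(
        A_SLOW * s + (B_SLOW * error if c == context else 0.0)
        for c, s in enumerate(slow)
    )
    return (new_fast, new_slow), error


def score(schedule, weight, delay=20):
    """Negative cost of a schedule; `weight` in [0, 1] is the emphasis on retention."""
    state = (0.0, (0.0, 0.0))
    acquisition_cost = 0.0
    for context in schedule:
        state, error = step(state, context)
        acquisition_cost += error ** 2
    fast, slow = state
    # Assumed retention probe: passive decay for `delay` trials, then test both contexts.
    fast *= A_FAST ** delay
    slow = tuple(s * A_SLOW ** delay for s in slow)
    retention_cost = sum((TARGETS[c] - (fast + slow[c])) ** 2 for c in (0, 1))
    mean_acquisition = acquisition_cost / len(schedule)
    return -((1.0 - weight) * mean_acquisition + weight * retention_cost)


def best_schedule(horizon, weight):
    """Exhaustively search all 2**horizon schedules (feasible only for short sessions)."""
    return max(product((0, 1), repeat=horizon), key=lambda s: score(s, weight))


if __name__ == "__main__":
    for w in (0.0, 0.5, 1.0):
        print(w, best_schedule(8, w))
```

Sweeping `weight` from 0 to 1 in this toy version shows how the best short schedule can change with the relative emphasis on retention. Whether it reproduces the three regimes described above depends entirely on the assumed parameters.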

Jeter, R., Todorov, D., Molkov, Y.
