Robots should use model-based control when the expected benefit of planning exceeds its computational and time cost, and fall back to model-free control otherwise. Richard S. Sutton, University of Alberta, and Andrew G. Barto, University of Massachusetts Amherst, distinguish these classes in their foundational work on reinforcement learning, describing model-based control as using an internal model to simulate consequences and model-free control as relying on cached action values learned from experience. Empirical and theoretical work by Nathaniel D. Daw, New York University, and Samuel J. Gershman, Harvard University, further shows that biological systems arbitrate between these strategies depending on uncertainty, novelty, and task demands.
Computational cost and time constraints
When latency or energy budgets are tight, the overhead of search and simulation makes model-free policies preferable. In industrial or embedded systems operating under strict power or real-time constraints, relying on learned policies reduces reaction time and conserves resources. Conversely, when decisions are rare or high-stakes, or when compute and time are plentiful, model-based control improves robustness by enabling counterfactual reasoning and recovery from unforeseen states. Model misspecification remains a caveat: an imperfect model can produce misleading plans, so model validation and uncertainty quantification are critical before switching to planning.
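The budget-based trade-off above can be sketched as a simple gating rule. This is an illustrative sketch, not a standard implementation; the function name `choose_controller` and all of its parameters (deadline, energy budget, planning cost estimates, expected planning gain) are hypothetical names chosen for this example.

```python
def choose_controller(deadline_s, energy_budget_j,
                      plan_time_s, plan_energy_j, expected_planning_gain):
    """Pick 'model_based' only when planning fits both the latency and
    energy budgets AND its expected gain is positive; otherwise fall
    back to the cached model-free policy."""
    if plan_time_s > deadline_s or plan_energy_j > energy_budget_j:
        return "model_free"      # planning would blow the real-time or power budget
    if expected_planning_gain <= 0.0:
        return "model_free"      # search is affordable but not worth its cost
    return "model_based"

# Tight 10 ms control loop: 200 ms of planning cannot fit the deadline.
print(choose_controller(0.010, 5.0, 0.200, 1.0, 0.8))   # model_free
# Rare, high-stakes decision with slack compute and time: plan.
print(choose_controller(5.0, 50.0, 0.200, 1.0, 0.8))    # model_based
```

The hard budget checks come first by design: in an embedded system a plan that misses its deadline is worthless regardless of how good it would have been.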
Uncertainty, novelty, and social context
Switching favors model-based methods in novel or nonstationary environments where cached values are stale. Neuroscience and computational studies led by Nathaniel D. Daw show that humans increase their reliance on planning under uncertainty, a principle transferable to robots deployed in changing terrains or culturally sensitive settings. Human-robot interaction introduces cultural nuance: social norms vary across territories and communities, so robots that can simulate local consequences can avoid breaches of etiquette or safety, whereas routine domestic tasks benefit from efficient model-free habits.
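One way to operationalize "cached values are stale" is to monitor the recent prediction error of the model-free value estimates and switch to planning when it grows. The sketch below assumes this simple moving-average heuristic; the class name `NoveltyArbiter`, the window size, and the threshold are all illustrative assumptions, not a published arbitration rule.

```python
from collections import deque

class NoveltyArbiter:
    """Switch to planning when cached values look stale: if the running
    mean of recent value-prediction errors exceeds a threshold, the
    environment has likely changed and model-free estimates are unreliable."""

    def __init__(self, window=20, threshold=0.5):
        self.errors = deque(maxlen=window)   # sliding window of |error|
        self.threshold = threshold

    def observe(self, predicted_value, observed_return):
        # Record how far the cached estimate was from what actually happened.
        self.errors.append(abs(predicted_value - observed_return))

    def controller(self):
        if not self.errors:
            return "model_based"   # no evidence yet: treat the setting as novel
        mean_err = sum(self.errors) / len(self.errors)
        return "model_based" if mean_err > self.threshold else "model_free"
```

In a stable environment the errors stay small and the arbiter settles on the cheap model-free habit; after a terrain or context shift, errors spike and planning takes over until the cached values are relearned.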
Practical implementations use a meta-control or arbitration layer that estimates the expected value of computation and chooses the controller that maximizes net utility. This arbitration should incorporate estimated planning gains, model confidence, latency, energy cost, and the social or environmental stakes of errors. Mis-switching is costly in both directions: excessive planning wastes resources and slows response, while over-reliance on cached policies can fail catastrophically in novel situations. Designing transparent switching criteria improves safety, explainability, and public trust, especially in domains such as healthcare, transportation, and environmental monitoring, where territorial regulations and cultural expectations shape acceptable robot behavior.
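The arbitration layer described above can be sketched as a net-utility comparison. This is a minimal sketch under a strong simplifying assumption: that planning gain, model confidence, costs, and stakes can be reduced to scalars combined linearly. The function names and the particular utility form are hypothetical, chosen only to make the trade-off concrete.

```python
def expected_value_of_computation(planning_gain, model_confidence,
                                  latency_cost, energy_cost, stakes):
    """Net utility of planning: the expected gain from simulation,
    discounted by confidence in the model and scaled by the stakes of
    an error, minus the resource costs of running the planner."""
    benefit = stakes * model_confidence * planning_gain
    cost = latency_cost + energy_cost
    return benefit - cost

def arbitrate(**kwargs):
    # Plan only when the discounted benefit outweighs the resource cost.
    evc = expected_value_of_computation(**kwargs)
    return "model_based" if evc > 0.0 else "model_free"

# High-stakes decision with a trusted model: planning pays off.
print(arbitrate(planning_gain=1.0, model_confidence=0.9,
                latency_cost=0.2, energy_cost=0.1, stakes=1.0))  # model_based
# Same costs, but a routine low-stakes task: stay with the habit.
print(arbitrate(planning_gain=1.0, model_confidence=0.9,
                latency_cost=0.2, energy_cost=0.1, stakes=0.1))  # model_free
```

Discounting the benefit by model confidence encodes the misspecification caveat from earlier: a planner driven by an untrusted model should rarely win the arbitration, no matter how large its nominal gain.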