
Tuesday, April 20, 2021

MBRL Tuning for Partially Understood Environments

The piece below is very technical, but I do like some of its background statements, such as 'solving tasks in a partially understood environment ...', and the idea of optimizing agents to resolve elements of understanding, which exemplifies the situations we are often in. I am not saying I understand this yet, but I am working through it now for wider application, as part of my broader study of practical reinforcement learning.

The Importance of Hyperparameter Optimization for Model-based Reinforcement Learning

Nathan Lambert, Baohe Zhang, Raghu Rajan, André Biedenkapp, Apr 19, 2021, from BAIR, Berkeley

Model-based reinforcement learning (MBRL) is a variant of the iterative learning framework, reinforcement learning, that includes a structured component of the system that is solely optimized to model the environment dynamics. Learning a model is broadly motivated from biology, optimal control, and more – it is grounded in natural human intuition of planning before acting. This intuitive grounding, however, results in a more complicated learning process. In this post, we discuss how model-based reinforcement learning is more susceptible to parameter tuning and how AutoML can help in finding very well performing parameter settings and schedules. Below, left is the expected behavior of an agent maximizing velocity on a “Half Cheetah” robotic task, and to the right is what our paper with hyperparameter tuning finds.


Model-based reinforcement learning (MBRL) is an iterative framework for solving tasks in a partially understood environment. There is an agent that repeatedly tries to solve a problem, accumulating state and action data. With that data, the agent creates a structured learning tool – a dynamics model – to reason about the world. With the dynamics model, the agent decides how to act by predicting into the future. With those actions, the agent collects more data, improves said model, and hopefully improves future actions.  ... " 
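
To make the loop described in the excerpt concrete, here is a minimal sketch in Python. It is not the authors' code. The toy point-mass environment, the reward, the linear least-squares dynamics model, and the random-shooting planner are all assumptions chosen for brevity, and every function name (true_step, fit_model, plan, run_mbrl) is hypothetical. It only illustrates the cycle: collect data, fit a dynamics model, plan by predicting into the future, then collect more data.

```python
import numpy as np

# Hypothetical toy environment (not from the post): a 1-D point mass whose
# state is (position, velocity) and whose action is a force in [-1, 1].
def true_step(state, action):
    pos, vel = state
    vel = vel + 0.1 * action
    pos = pos + 0.1 * vel
    return np.array([pos, vel])

def reward(state):
    # Hypothetical task: bring the position close to 1.0.
    # Works on a single state or on a batch of states (last axis = state).
    return -(state[..., 0] - 1.0) ** 2

def fit_model(states, actions, next_states):
    # Dynamics model: least-squares fit of next_state ~ [state, action, 1] @ W.
    X = np.hstack([states, actions[:, None], np.ones((len(states), 1))])
    W, *_ = np.linalg.lstsq(X, next_states, rcond=None)
    return W

def predict(W, state, action):
    # One-step prediction of the learned model for a single state and action.
    return np.concatenate([state, [action, 1.0]]) @ W

def plan(W, state, rng, horizon, n_candidates):
    # Random-shooting planner: sample action sequences, roll each one out
    # inside the learned model, and return the first action of the best one.
    seqs = rng.uniform(-1.0, 1.0, size=(n_candidates, horizon))
    states = np.tile(state, (n_candidates, 1))
    returns = np.zeros(n_candidates)
    ones = np.ones((n_candidates, 1))
    for t in range(horizon):
        feats = np.hstack([states, seqs[:, t:t + 1], ones])
        states = feats @ W
        returns += reward(states)
    return seqs[np.argmax(returns), 0]

def run_mbrl(n_iters=5, steps=30, horizon=10, n_candidates=200, seed=0):
    rng = np.random.default_rng(seed)
    S, A, S_next = [], [], []  # accumulated state and action data
    returns, W = [], None
    for _ in range(n_iters):
        state, total = np.zeros(2), 0.0
        for _ in range(steps):
            if W is None:
                # No model yet: act randomly to gather initial data.
                action = rng.uniform(-1.0, 1.0)
            else:
                action = plan(W, state, rng, horizon, n_candidates)
            next_state = true_step(state, action)
            S.append(state)
            A.append(action)
            S_next.append(next_state)
            total += reward(next_state)
            state = next_state
        # Refit the dynamics model on all data collected so far.
        W = fit_model(np.array(S), np.array(A), np.array(S_next))
        returns.append(float(total))
    return W, returns

if __name__ == "__main__":
    W, returns = run_mbrl()
    print("learned model weights:\n", W)
    print("return per iteration:", returns)
```

Even in this toy version there are several knobs (horizon, n_candidates, steps per iteration), which are the sort of parameters one would have to tune in such a setup. Because the toy dynamics are exactly linear, the fitted model here can match them closely; a real task would not be so forgiving.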
