
Friday, January 03, 2020

Upside Down Reinforcement Learning

New to me, in DSC: a new way to look at reinforcement learning. See also earlier posts and links on Inverse Reinforcement Learning (IRL). How closely are the two related? Reading.

Reimagining Reinforcement Learning – Upside Down
Posted by William Vorhies

Summary:  For all the hype around winning game play and self-driving cars, traditional Reinforcement Learning (RL) has yet to deliver as a reliable tool for ML applications.  Here we explore the main drawbacks as well as an innovative approach to RL that dramatically reduces the training compute requirement and time to train.

Ever since Reinforcement Learning (RL) was recognized as a legitimate third style of machine learning, alongside supervised and unsupervised learning, we’ve been waiting for the killer app that proves its value.

Yes, RL has had some press-worthy wins in game play (AlphaGo), self-driving cars (not here yet), drone control, and even dialogue systems like personal assistants, but the big breakthrough isn’t here yet.  ....

RL ought to be our go-to solution for any problem requiring sequential decisions, and these individual successes might make you think RL is ready for prime time, but the reality is that it’s not.

Shortcomings of Reinforcement Learning

Romain Laroche, a Principal Researcher in RL at Microsoft, points out several critical shortcomings. The most severe problems to overcome, he says, are these two (a toy sketch after the quotes illustrates the first):

“They are largely unreliable. Even worse, two runs with different random seeds can yield very different results because of the stochasticity in the reinforcement learning process.”
“They require billions of samples to obtain their results and extracting such astronomical numbers of samples in real world applications isn’t feasible.”
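To make the first point concrete, here is a toy sketch, entirely an illustration of my own rather than anything from the article or Laroche’s work: the same epsilon-greedy learner on a two-armed bandit, run with two different random seeds, can end up with noticeably different value estimates, and sometimes a different preferred arm.

```python
# Toy illustration of seed sensitivity in RL-style learning.
# The bandit, its arm means, and all parameters are assumptions
# chosen for the example, not taken from the article.
import numpy as np

def run(seed, steps=200, eps=0.1):
    rng = np.random.default_rng(seed)
    true_means = [0.50, 0.55]        # arm 1 is only slightly better
    q = np.zeros(2)                  # estimated value per arm
    counts = np.zeros(2)
    for _ in range(steps):
        # epsilon-greedy: explore with probability eps, else exploit
        a = rng.integers(2) if rng.random() < eps else int(np.argmax(q))
        r = rng.normal(true_means[a], 1.0)   # noisy reward
        counts[a] += 1
        q[a] += (r - q[a]) / counts[a]       # incremental mean update
    return q

for seed in (0, 1):
    print(f"seed={seed} -> value estimates {run(seed).round(2)}")
```

With rewards this noisy and a gap this small, the two runs can easily disagree about which arm is better; scaled up to deep RL with billions of samples and far more sources of randomness, that disagreement is the unreliability Laroche describes.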

In fact, if you read our last blog about the barriers to continuously improving AI closely, you will have seen that the compute required to improve the most advanced algorithms is rapidly approaching the point of becoming uneconomic, and that the most compute-hungry of the examples tracked by OpenAI is AlphaGo Zero, an RL game-play algorithm requiring orders of magnitude more compute than the next closest deep learning application.

Laroche’s recent research has focused on the reliability problem, and he is making some headway, but unless the compute-requirement problem is also solved, RL can’t take its rightful place as an important ML tool.

Upside Down Reinforcement Learning (UDRL) 

Two recent papers out of AI research organizations in Switzerland describe a unique and unexpected approach: literally turning the RL learning process upside down (Upside-Down Reinforcement Learning, or UDRL). Jürgen Schmidhuber and his colleagues write:

“Traditional Reinforcement Learning (RL) algorithms either predict rewards with value functions or maximize them using policy search. We study an alternative: Upside-Down Reinforcement Learning (Upside-Down RL or UDRL), that solves RL problems primarily using supervised learning techniques.”  ...
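To make the quoted idea concrete, here is a minimal sketch in PyTorch of a UDRL-style behavior function. The class and argument names (BehaviorFunction, desired_return, desired_horizon) are illustrative assumptions, not the paper’s reference code; the point is simply that the network maps an observation plus a command (desired return, desired horizon) to an action, and is trained with ordinary supervised learning on actions from previously stored episodes.

```python
# A minimal sketch of the UDRL idea: learn to map
# (observation, desired return, desired horizon) -> action
# with plain supervised learning. All names and hyperparameters
# here are assumptions for illustration, not the paper's code.
import torch
import torch.nn as nn

class BehaviorFunction(nn.Module):
    def __init__(self, obs_dim, n_actions, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + 2, hidden),  # +2 for the (return, horizon) command
            nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, obs, desired_return, desired_horizon):
        command = torch.stack([desired_return, desired_horizon], dim=-1)
        return self.net(torch.cat([obs, command], dim=-1))

def train_step(model, optimizer, obs, actions, returns, horizons):
    # Supervised learning on replayed experience: the "label" is the
    # action actually taken, and the command is the return/horizon
    # that actually followed from that time step.
    logits = model(obs, returns, horizons)
    loss = nn.functional.cross_entropy(logits, actions)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Shape check with random stand-in data:
model = BehaviorFunction(obs_dim=4, n_actions=2)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
train_step(model, opt,
           torch.randn(32, 4),                  # observations
           torch.randint(0, 2, (32,)),          # actions taken
           torch.rand(32) * 10,                 # returns that followed
           torch.randint(1, 50, (32,)).float()) # steps remaining
```

At evaluation time you would issue an ambitious command, a high desired return over a chosen horizon, and let the trained network produce the actions that, in past experience, tended to achieve commands like it. No value function or policy gradient is needed, which is what makes the approach “upside down.”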
