
Thursday, April 12, 2018

Reinforcement Learning

Lengthy musings on the capabilities of reinforcement learning, which in one way feels very attractive, though as a former classical-optimization-oriented analyst I have my doubts. Still an intriguing, considerable, and technical piece about the challenge.

Deep Reinforcement Learning Doesn't Work Yet
" This mostly cites papers from Berkeley, Google Brain, DeepMind, and OpenAI from the past few years, because that work is most visible to me. I’m almost certainly missing stuff from older literature and other institutions, and for that I apologize - I’m just one guy, after all.

Introduction
Once, on Facebook, I made the following claim.

Whenever someone asks me if reinforcement learning can solve their problem, I tell them it can’t. I think this is right at least 70% of the time.

Deep reinforcement learning is surrounded by mountains and mountains of hype. And for good reasons! Reinforcement learning is an incredibly general paradigm, and in principle, a robust and performant RL system should be great at everything. Merging this paradigm with the empirical power of deep learning is an obvious fit. Deep RL is one of the closest things that looks anything like AGI, and that’s the kind of dream that fuels billions of dollars of funding.

Unfortunately, it doesn’t really work yet.

Now, I believe it can work. If I didn’t believe in reinforcement learning, I wouldn’t be working on it. But there are a lot of problems in the way, many of which feel fundamentally difficult. The beautiful demos of learned agents hide all the blood, sweat, and tears that go into creating them.

Several times now, I’ve seen people get lured by recent work. They try deep reinforcement learning for the first time, and without fail, they underestimate deep RL’s difficulties. Without fail, the “toy problem” is not as easy as it looks. And without fail, the field destroys them a few times, until they learn how to set realistic research expectations.

This isn’t the fault of anyone in particular. It’s more of a systemic problem. It’s easy to write a story around a positive result. It’s hard to do the same for negative ones. The problem is that the negative ones are the ones that researchers run into the most often. In some ways, the negative cases are actually more important than the positives.

In the rest of the post, I explain why deep RL doesn’t work, cases where it does work, and ways I can see it working more reliably in the future. I’m not doing this because I want people to stop working on deep RL. I’m doing this because I believe it’s easier to make progress on problems if there’s agreement on what those problems are, and it’s easier to build agreement if people actually talk about the problems, instead of independently re-discovering the same issues over and over again. ... "
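
As a concrete footnote to the "incredibly general paradigm" claim in the excerpt: every RL problem reduces to the same agent-environment interaction loop. Below is a minimal Python sketch of that loop. The CoinFlipEnv environment, random_policy, and reward scheme are hypothetical toy stand-ins of my own, not anything from the excerpted post; in practice only the simulator and the (learned) policy change across games, robotics, or operations problems.

# Minimal sketch of the generic agent-environment loop underlying RL.
# CoinFlipEnv and random_policy are hypothetical toy examples, not from
# the cited post; a real setting swaps in a simulator and a learned policy.
import random

class CoinFlipEnv:
    """Toy environment: the agent guesses a hidden coin flip each episode."""
    def reset(self):
        self.secret = random.randint(0, 1)
        return 0  # observation carries no information in this toy case

    def step(self, action):
        reward = 1.0 if action == self.secret else 0.0
        return 0, reward, True  # (observation, reward, episode_done)

def random_policy(observation):
    # A learned policy (e.g. a deep network) would map observations to
    # actions; here we simply act uniformly at random.
    return random.randint(0, 1)

env = CoinFlipEnv()
episodes, total_reward = 200, 0.0
for _ in range(episodes):
    obs, done = env.reset(), False
    while not done:
        action = random_policy(obs)
        obs, reward, done = env.step(action)
        total_reward += reward
print(f"average reward per episode: {total_reward / episodes:.2f}")  # ~0.5

The generality Irpan's argument turns on is visible here: nothing in the loop constrains what the environment or reward is, which is exactly why the hype is so tempting, and why the hard part is everything the loop hides.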
