The Eponymous Pickle: Simulation plus Randomness Producing Learning

Tuesday, July 31, 2018

Simulation plus Randomness Producing Learning

Where else can this be used? Note in particular how the randomness is controlled.

OpenAI Demonstrates Complex Manipulation Transfer from Simulation to Real World

By adding randomness to a relatively simple simulation, OpenAI's robot hand learned to perform complex in-hand manipulation

By Evan Ackerman in IEEE Spectrum

In-hand manipulation is one of those things that’s fairly high on the list of “skills that are effortless for humans but extraordinarily difficult for robots.” Without even really thinking about it, we’re able to adaptively coordinate four fingers and a thumb with our palm and friction and gravity to move things around in one hand without using our other hand—you’ve probably done this a handful (heh) of times today already, just with your cellphone.

It takes us humans years of practice to figure out how to do in-hand manipulation robustly, but robots don’t have that kind of time. Learning through practice and experience is still the way to go for complex tasks like this, and the challenge is finding a way to learn faster and more efficiently than just giving a robot hand something to manipulate over and over until it learns what works and what doesn’t, which would probably take about a hundred years.

Rather than wait a hundred years, researchers at OpenAI have used reinforcement learning to train a convolutional neural network to control a five-fingered Shadow hand to manipulate objects, all in just 50 hours. They’ve managed this by doing it in simulation, a technique that is notoriously “doomed to succeed,” but by carefully randomizing the simulation to better match real-world variability, a real Shadow hand was able to successfully perform in-hand manipulation on real objects without any retraining at all. ...

Full paper.
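To make the "randomizing the simulation" idea concrete, here is a minimal sketch of per-episode parameter randomization. This is not OpenAI's code: the parameter names (`friction`, `object_mass`, `action_delay`), the ranges, and the helper functions are all hypothetical and chosen only for illustration. The one point it tries to show is that the randomness is controlled: each training episode gets a fresh draw of simulator settings from explicit ranges, and a seed makes the whole sequence reproducible.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SimParams:
    """One randomized simulator configuration (hypothetical fields)."""
    friction: float     # surface friction coefficient
    object_mass: float  # object mass in kg
    action_delay: int   # actuator latency, in control steps


# Illustrative ranges only; real values would come from measuring the hardware.
RANGES = {
    "friction": (0.5, 1.5),
    "object_mass": (0.03, 0.3),
    "action_delay": (0, 3),
}


def sample_params(rng: random.Random) -> SimParams:
    """Draw one configuration uniformly from the ranges above."""
    return SimParams(
        friction=rng.uniform(*RANGES["friction"]),
        object_mass=rng.uniform(*RANGES["object_mass"]),
        action_delay=rng.randint(*RANGES["action_delay"]),  # inclusive bounds
    )


def randomized_episodes(n: int, seed: int) -> List[SimParams]:
    """Return one parameter draw per episode, reproducible from the seed."""
    rng = random.Random(seed)
    return [sample_params(rng) for _ in range(n)]


if __name__ == "__main__":
    for params in randomized_episodes(3, seed=0):
        print(params)
```

In a real training loop each `SimParams` draw would be applied to the simulator before the episode starts, so the policy never sees exactly the same physics twice and has to learn behavior that works across the whole range.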
