
Saturday, July 20, 2019

Rewarding Autonomous AIs

Thoughtful piece.  Can a human just provide some sort of reward function?  Or is creating that alone very hard, especially if we include some measures of risk as well?  The latter we found to be hard in practice, at least if you are honest about risk.  This also clouds some of the "the future is unsupervised" claims I have heard recently.  What exactly does "unsupervised" mean?  More than just a simple lack of a measure of success, we discovered.
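To make the point concrete, here is a purely illustrative sketch (hypothetical names and weights, not from the article) of what a hand-written reward with a risk term might look like for the driving example quoted below. Even this toy version needs several weights that a human has to guess and tune, which is part of why specifying a reward function by hand is hard.

```python
import numpy as np

def reward(speed, progress, lateral_deviations,
           w_speed=0.2, w_progress=1.0, w_risk=0.5):
    """Weighted speed + weighted progress along the track, minus a crude risk penalty."""
    risk = np.std(lateral_deviations)   # rough proxy for erratic, risky driving
    return w_speed * speed + w_progress * progress - w_risk * risk

# Example: fast but erratic driving can score worse than slower, steadier driving.
print(reward(speed=9.0, progress=2.0, lateral_deviations=[0.0, 1.5, -1.5, 1.0]))
print(reward(speed=5.0, progress=4.0, lateral_deviations=[0.0, 0.1, -0.1, 0.0]))
```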

Stanford researchers teach robots what humans want
Researchers are developing better, faster ways of providing human guidance to autonomous robots.

By Taylor Kubota

Told to optimize for speed while racing down a track in a computer game, a car pushes the pedal to the metal … and proceeds to spin in a tight little circle. Nothing in the instructions told the car to drive straight, and so it improvised.

Researchers are trying to make it easier for humans to tell autonomous systems, such as vehicles and robots, what they want them to do. 

This example – funny in a computer game but not so much in life – is among those that motivated Stanford University researchers to build a better way to set goals for autonomous systems.

Dorsa Sadigh, assistant professor of computer science and of electrical engineering, and her lab have combined two different ways of setting goals for robots into a single process, which performed better than either of its parts alone in both simulations and real-world experiments. The researchers presented the work June 24 at the Robotics: Science and Systems conference.

“In the future, I fully expect there to be more autonomous systems in the world and they are going to need some concept of what is good and what is bad,” said Andy Palan, graduate student in computer science and co-lead author of the paper. “It’s crucial, if we want to deploy these autonomous systems in the future, that we get that right.”

The team’s new system for providing instruction to robots – known as reward functions – combines demonstrations, in which humans show the robot what to do, and user preference surveys, in which people answer questions about how they want the robot to behave.

“Demonstrations are informative but they can be noisy. On the other hand, preferences provide, at most, one bit of information, but are way more accurate,” said Sadigh. “Our goal is to get the best of both worlds, and combine data coming from both of these sources more intelligently to better learn about humans’ preferred reward function.”  ... 
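As a rough sketch of the general idea (my own illustration, not the authors' code): learn the weights w of a linear reward r(trajectory) = w·φ(trajectory) by combining a Boltzmann-rational model of demonstrations with a Bradley-Terry model of pairwise preference answers, so both data sources constrain the same w.

```python
import numpy as np

def demo_log_lik(w, demo_feats, candidate_feats, beta=1.0):
    """Demonstrations: P(demo) proportional to exp(beta * w.phi(demo)) over a candidate set."""
    logits = beta * candidate_feats @ w
    log_z = np.log(np.exp(logits - logits.max()).sum()) + logits.max()  # stable log-partition
    return float(np.sum(beta * demo_feats @ w - log_z))

def pref_log_lik(w, pref_pairs):
    """Preferences: P(A preferred to B) = sigmoid(w.(phi(A) - phi(B)))."""
    return float(sum(-np.log1p(np.exp(-w @ (fa - fb))) for fa, fb in pref_pairs))

def fit_reward_weights(demo_feats, candidate_feats, pref_pairs, steps=500, lr=0.1):
    """Maximize the combined log-likelihood by finite-difference gradient ascent,
    keeping w on the unit sphere so the scale of the reward stays fixed."""
    dim = demo_feats.shape[1]
    w = np.ones(dim) / np.sqrt(dim)
    eps = 1e-5
    objective = lambda v: demo_log_lik(v, demo_feats, candidate_feats) + pref_log_lik(v, pref_pairs)
    for _ in range(steps):
        grad = np.array([(objective(w + eps * e) - objective(w - eps * e)) / (2 * eps)
                         for e in np.eye(dim)])
        w = w + lr * grad
        w /= np.linalg.norm(w)          # project back to unit norm
    return w

# Toy usage: two trajectory features, e.g. (speed, progress). Both the demonstration
# and the preference answer favor progress, so its learned weight should dominate.
if __name__ == "__main__":
    candidates = np.array([[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]])
    demos = np.array([[0.0, 1.0]])                            # demonstrated trajectory features
    prefs = [(np.array([0.0, 1.0]), np.array([1.0, 0.0]))]    # "progress" preferred over "speed"
    print(fit_reward_weights(demos, candidates, prefs))
```

The split of labor matches the quote above: demonstrations give a noisy but information-rich signal through the candidate-set likelihood, while each preference answer contributes roughly one accurate bit through the pairwise term.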
