My long-time background is in systems optimization. The post below makes quite intriguing claims that could be very useful. Ultimately, it is very technical.
"Reinforcement learning is supervised learning on optimized data" by Ben Eysenbach, Aviral Kumar, and Abhishek Gupta, Oct 13, 2020
"The two most common perspectives on Reinforcement learning (RL) are optimization and dynamic programming. Methods that compute the gradients of the non-differentiable expected reward objective, such as the REINFORCE trick, are commonly grouped into the optimization perspective, whereas methods that employ TD-learning or Q-learning are dynamic programming methods. While these methods have shown considerable success in recent years, these methods are still quite challenging to apply to new problems. In contrast, deep supervised learning has been extremely successful, and we may hence ask: Can we use supervised learning to perform RL?..."
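To make the contrast in the excerpt concrete for myself, here is a minimal sketch on a three-armed bandit. It is my own illustration, not the authors' method: the reward values, the function names, and the "keep the top fraction, then refit" rule are all assumptions chosen for the example. The first function is the standard score-function (REINFORCE) gradient estimate for a softmax policy. The second is one simple way to read "supervised learning on optimized data": sample actions, keep only the best-rewarded ones, and fit the policy to them by maximum likelihood.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over a 1-D array of logits."""
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()


def reinforce_gradient(logits, rewards, rng, n_samples=1000):
    """Score-function (REINFORCE) estimate of d E[reward] / d logits."""
    probs = softmax(logits)
    actions = rng.choice(len(logits), size=n_samples, p=probs)
    # For a softmax policy, d log pi(a) / d logits = onehot(a) - probs.
    scores = np.eye(len(logits))[actions] - probs
    return (rewards[actions][:, None] * scores).mean(axis=0)


def filtered_imitation_step(logits, rewards, rng, n_samples=1000, keep_frac=0.2):
    """Hypothetical 'supervised' update: imitate the best-rewarded samples."""
    probs = softmax(logits)
    actions = rng.choice(len(logits), size=n_samples, p=probs)
    sampled_rewards = rewards[actions]
    # "Optimize the data": keep samples at or above the (1 - keep_frac) quantile.
    threshold = np.quantile(sampled_rewards, 1.0 - keep_frac)
    kept = actions[sampled_rewards >= threshold]
    # Maximum-likelihood fit of a categorical policy is the empirical
    # frequency of the kept actions (smoothed to avoid log(0)).
    counts = np.bincount(kept, minlength=len(logits)).astype(float) + 1e-3
    return np.log(counts / counts.sum())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    rewards = np.array([0.1, 0.5, 1.0])  # assumed fixed reward per arm
    logits = np.zeros(3)                 # start from a uniform policy
    print("REINFORCE gradient estimate:", reinforce_gradient(logits, rewards, rng))
    for _ in range(3):
        logits = filtered_imitation_step(logits, rewards, rng)
    print("Policy after filtered imitation:", softmax(logits))
```

Both routes push probability toward the highest-reward arm. The gradient version needs a step size and many noisy updates, while the filtered version reduces each update to an ordinary fitting problem, which seems to be the appeal the excerpt is pointing at.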