Technical, but quite an interesting point. Optimality may be a good thing, but how do I embed it in useful real-time decisions? Notably, the work also accounts for noise, which is often a key consideration. This is worth a look.
Algorithm for Optimal Decision-Making Under Heavy-Tailed Noisy Rewards
Chung-Ang University (South Korea), November 17, 2022
Researchers at South Korea's Chung-Ang University (CAU) and Ulsan Institute of Science and Technology created an algorithm that achieves minimax optimality (minimum loss under the worst-case, maximum-loss scenario) with minimal prior data. The work addresses the sub-optimal performance that algorithms designed for stochastic multi-armed bandit (MAB) problems show when rewards are heavy-tailed. CAU's Kyungjae Lee said the researchers proposed two methods: a minimax optimal robust upper confidence bound (MR-UCB) method and a minimax optimal robust adaptively perturbed exploration (MR-APE) method. The team derived gap-dependent and gap-independent upper bounds on the cumulative regret, then evaluated both methods in simulations with Pareto and Fréchet noise. They found that MR-UCB outperformed other exploration techniques, showing stronger robustness under heavy-tailed noise, including with a greater number of actions; both MR-UCB and MR-APE also solved heavy-tailed synthetic and real-world stochastic MAB problems.
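The summary does not give the MR-UCB index itself, so the sketch below swaps in an older, well-known technique for the same setting: the "robust UCB" with a truncated empirical mean from Bubeck, Cesa-Bianchi and Lugosi (2013). It is not the CAU method. It only shows the general idea: clip extreme samples so a single heavy-tailed reward cannot swamp the mean estimate, then add a confidence bonus and pick the arm with the highest index. The arm means, the moment bound `u`, the exponent `eps` and the horizon are all hypothetical values chosen for illustration. Centred Pareto noise (shape 1.5: finite mean, infinite variance) stands in for the heavy-tailed noise mentioned above.

```python
import math
import random


def truncated_mean(samples, u, eps, log_inv_delta):
    """Truncated empirical mean (Bubeck, Cesa-Bianchi & Lugosi, 2013).

    The s-th sample is kept only if |x| <= (u * s / log(1/delta)) ** (1 / (1 + eps));
    larger samples are treated as outliers and contribute 0 to the sum.
    """
    total = 0.0
    for s, x in enumerate(samples, start=1):
        threshold = (u * s / log_inv_delta) ** (1.0 / (1.0 + eps))
        if abs(x) <= threshold:
            total += x
    return total / len(samples)


def robust_ucb(pull, n_arms, horizon, u, eps):
    """Robust UCB with a truncated-mean estimator (a swapped-in technique, not MR-UCB).

    pull(k) returns one noisy reward from arm k; u is an ASSUMED bound on
    E|X|^(1+eps) for every arm. Returns the arm chosen in each round.
    """
    history = [[] for _ in range(n_arms)]
    choices = []
    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1  # play each arm once so every history is non-empty
        else:
            log_inv_delta = 2.0 * math.log(t)  # confidence level delta = t^-2

            def index(k):
                s = len(history[k])
                mean = truncated_mean(history[k], u, eps, log_inv_delta)
                bonus = 4.0 * u ** (1.0 / (1.0 + eps)) * (log_inv_delta / s) ** (eps / (1.0 + eps))
                return mean + bonus

            arm = max(range(n_arms), key=index)
        history[arm].append(pull(arm))
        choices.append(arm)
    return choices


# Hypothetical simulation: three arms with centred Pareto noise.
rng = random.Random(0)
ALPHA = 1.5                       # Pareto shape: finite mean, infinite variance
NOISE_MEAN = ALPHA / (ALPHA - 1)  # mean of random.paretovariate(ALPHA), used to centre the noise
means = [0.0, 1.0, 3.0]           # hypothetical true arm means; arm 2 is best


def pull(k):
    return means[k] + rng.paretovariate(ALPHA) - NOISE_MEAN


horizon = 1000
choices = robust_ucb(pull, len(means), horizon, u=10.0, eps=0.4)  # u, eps are assumptions
regret = sum(max(means) - means[a] for a in choices)  # cumulative pseudo-regret
print("pulls per arm:", [choices.count(k) for k in range(len(means))])
print("cumulative pseudo-regret:", regret)
```

The regret computed here is the same quantity the researchers bound: the summed gap between the best arm's mean and the mean of each arm actually played. The confidence bonus in this older method is conservative, so on short horizons the pull counts may stay fairly even. Tighter, minimax-optimal bounds of the kind the CAU team reports are aimed at exactly that weakness.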