The Eponymous Pickle: MineCraft


Sunday, May 07, 2023

Dream First, Learn Later: DECKARD is an AI Approach

Interesting, new to me.

Dream First, Learn Later: DECKARD is an AI Approach That Uses LLMs for Training Reinforcement learning (RL) Agents

By Ekrem Çetinkaya, May 4, 2023

Reinforcement learning (RL) is a popular approach to training autonomous agents that can learn to perform complex tasks by interacting with their environment. RL enables them to learn the best action in different conditions and adapt to their environment using a reward system.

A major challenge in RL is how to explore the vast state space of many real-world problems efficiently. This challenge arises because RL agents learn by interacting with their environment through exploration. Think of an agent that tries to play Minecraft. If you have heard of the game, you know how complicated Minecraft's crafting tree looks. There are hundreds of craftable objects, and you often need to craft one item before you can craft another. So, it is a really complex environment.
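
To make the "craft one to craft another" dependency concrete, here is a minimal sketch in Python. The recipe table is a hypothetical, heavily simplified stand-in, not real game data and not anything from the DECKARD paper; the function name `crafting_order` is likewise an assumption for illustration. It only shows how prerequisites force an ordering on what the agent must obtain.

```python
from typing import Dict, List, Set

# Hypothetical, simplified recipe table: item -> items needed before crafting it.
# Items missing from the table are treated as raw resources gathered from the world.
RECIPES: Dict[str, List[str]] = {
    "planks": ["log"],
    "stick": ["planks"],
    "crafting_table": ["planks"],
    "wooden_pickaxe": ["planks", "stick", "crafting_table"],
    "stone_pickaxe": ["cobblestone", "stick", "crafting_table"],
}


def crafting_order(target: str, recipes: Dict[str, List[str]]) -> List[str]:
    """Return the items to obtain, prerequisites first (depth-first post-order).

    Assumes the recipe table has no cycles.
    """
    order: List[str] = []
    seen: Set[str] = set()

    def visit(item: str) -> None:
        if item in seen:
            return
        seen.add(item)
        for dependency in recipes.get(item, []):
            visit(dependency)
        order.append(item)  # appended only after all of its prerequisites

    visit(target)
    return order


plan = crafting_order("stone_pickaxe", RECIPES)
print(plan)
```

Even in this toy table, a single target pulls in five other items. An agent exploring at random has to stumble on every step of that chain in the right order, which is why the full crafting tree is so hard to explore.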

As the environment can have a large number of possible states and actions, it can become difficult for the agent to find the optimal policy through random exploration alone. The agent must balance exploiting the current best policy against exploring new parts of the state space to potentially find a better one. Finding efficient methods that balance exploration and exploitation is an active area of research in RL.
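
One of the simplest ways to trade off exploration and exploitation is the epsilon-greedy rule: with a small probability the agent tries a random action, and otherwise it takes the action it currently believes is best. The sketch below applies it to a toy multi-armed bandit. It is an illustration only, not the method used by DECKARD; the reward means, noise level, and function name are all assumptions chosen for the example.

```python
import random
from typing import List, Tuple


def epsilon_greedy_bandit(
    true_means: List[float], epsilon: float, steps: int, seed: int = 0
) -> Tuple[List[float], List[int]]:
    """Run epsilon-greedy on a bandit whose arms pay Gaussian rewards.

    Returns the estimated mean reward per arm and how often each arm was pulled.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms
    estimates = [0.0] * n_arms

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: pick an arm uniformly at random.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the arm with the highest current estimate.
            arm = max(range(n_arms), key=lambda a: estimates[a])

        reward = rng.gauss(true_means[arm], 1.0)  # assumed unit-variance noise
        counts[arm] += 1
        # Incremental update of the running average reward for this arm.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]

    return estimates, counts


estimates, counts = epsilon_greedy_bandit([0.1, 0.5, 0.9], epsilon=0.1, steps=5000)
print(counts)
```

With only three arms, random exploration is cheap. In an environment like Minecraft the number of states and actions is far larger, so undirected exploration of this kind wastes most of its budget, which is the motivation for guiding it with prior knowledge.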


It’s known that practical decision-making systems need to use prior knowledge about a task efficiently. With prior information about the task, the agent can adapt its policy more effectively and avoid getting stuck in sub-optimal policies. However, most reinforcement learning methods currently train from scratch, without any pre-training or external knowledge.

But why is that the case? In recent years, there has been growing interest in using large language models (LLMs) to aid RL agents in exploration by providing external knowledge. This approach has shown promise, but there are still many challenges to overcome, such as grounding the LLM knowledge in the environment and dealing with the accuracy of LLM outputs.

So, should we give up on using LLMs to aid RL agents? If not, how can we fix those problems and then use them again to guide RL agents? The answer has a name, and it’s DECKARD.

Overview of DECKARD. Source: https://arxiv.org/abs/2301.12050

DECKARD is trained for Minecraft, as crafting a specific item in Minecraft can be a challenging task if one lacks expert knowledge of the game. This has been demonstrated by studies that have shown that achieving a goal in Minecraft can be made easier through the use of dense rewards or expert demonstrations. As a result, item crafting in Minecraft has become a persistent challenge in the field of AI. ...

Monday, December 05, 2022

Recent Game playing example

Games can be useful to test learning and alternative process options.

Why Researchers Are Teaching AI to Play Minecraft By Popular Science, December 2, 2022

An artist's approximation of a nuclear power plant model made in Minecraft. Using Video Pre-Training (VPT), the new AI program could construct items in Minecraft previously unattainable to bots reliant only on reinforcement learning.

OpenAI has developed a Minecraft-playing bot that can build pixelated tools and buildings in the game that require more than 20,000 consecutive actions via a combination of imitation and reinforcement learning. The bot, trained on 70,000 hours of human gameplay, is the first to build "diamond tools," which take human players 20 minutes and 24,000 actions, on average, to construct.

Imitation learning requires each step to be hand-labeled, but the researchers used a separate neural network to handle labeling via Video Pre-Training.... The researchers said the use of imitation and reinforcement learning in combination could pave the way for advancements in self-driving vehicles and nuclear fusion research.

From Popular Science


Friday, July 09, 2021

Challenge for Learning from Human Feedback using Minecraft

A Berkeley BAIR challenge competition, using a common gaming environment. It has been a long time since I looked at Minecraft. A short extract of the idea is below, with a more complete look at the link. It seems a novel and broader look at contextual learning.

BASALT: A Benchmark for Learning from Human Feedback by Rohin Shah Jul 8, 2021

TL;DR: We are launching a NeurIPS competition and benchmark called BASALT: a set of Minecraft environments and a human evaluation protocol that we hope will stimulate research and investigation into solving tasks with no pre-specified reward function, where the goal of an agent must be communicated through demonstrations, preferences, or some other form of human feedback. Sign up to participate in the competition!

Motivation

Deep reinforcement learning takes a reward function as input and learns to maximize the expected total reward. An obvious question is: where did this reward come from? How do we know it captures what we want? Indeed, it often doesn’t capture what we want, with many recent examples showing that the provided specification often leads the agent to behave in an unintended way.

Our existing algorithms have a problem: they implicitly assume access to a perfect specification, as though one has been handed down by God. Of course, in reality, tasks don’t come pre-packaged with rewards; those rewards come from imperfect human reward designers.

For example, consider the task of summarizing articles. Should the agent focus more on the key claims, or on the supporting evidence? Should it always use a dry, analytic tone, or should it copy the tone of the source material? If the article contains toxic content, should the agent summarize it faithfully, mention that toxic content exists but not summarize it, or ignore it completely? How should the agent deal with claims that it knows or suspects to be false? A human designer likely won’t be able to capture all of these considerations in a reward function on their first try, and, even if they did manage to have a complete set of considerations in mind, it might be quite difficult to translate these conceptual preferences into a reward function the environment can directly calculate. ...

Conclusion

We hope that BASALT will be used by anyone who aims to learn from human feedback, whether they are working on imitation learning, learning from comparisons, or some other method. It mitigates many of the issues with the standard benchmarks used in the field. The current baseline has lots of obvious flaws, which we hope the research community will soon fix.

Note that, so far, we have worked on the competition version of BASALT. We aim to release the benchmark version shortly. You can get started now, by simply installing MineRL from pip and loading up the BASALT environments. The code to run your own human evaluations will be added in the benchmark release.

If you would like to use BASALT in the very near future and would like beta access to the evaluation code, please email the lead organizer, Rohin Shah, at rohinmshah@berkeley.edu.

This post is based on the paper “The MineRL BASALT Competition on Learning from Human Feedback”, accepted at the NeurIPS 2021 Competition Track. Sign up to participate in the competition!

Saturday, September 14, 2019

Facebook Proposes Building Assistant with Minecraft

The idea seemed a bit odd at first, but it brings together concepts used in agent modeling, where you build a simulation in which agents interact with other agents (or people) and then use the results of those interactions to train a model of the world. Minecraft could be used to create such a sim-world, though it's perhaps not the best vehicle. Recall Facebook's assistant M, mentioned here previously, which I don't think was successful, and which perhaps drives this idea.

Facebook is using Minecraft to build an AI assistant By Isobel Asher Hamilton

Facebook is hoping it can train an AI assistant to understand a broad range of human commands with a little help from one of the biggest games in the world — Minecraft. Paper here.

A group of Facebook researchers published a paper in July explaining why they think Minecraft is the perfect place for an AI to learn about human communication. The key lies in the fact that Minecraft is what's known as a "sandbox" game, where players can roam around with relatively free rein as to what they want to do or build, while also following a set of relatively simple rules.

The researchers also hope that the natural curiosity of Minecraft players will give the AI plenty of humans to practise with. "Since we work in a game environment, players may enjoy interacting with the assistants as they are developed, yielding a rich resource for human-in-the-loop research," the paper says. Minecraft has 91 million monthly active users, so the potential pool of humans who could help train the AI is pretty vast. ...

Monday, April 23, 2018

Minecraft Content as Business

A virtual world made of process rather than place? And adding a business model? Good descriptive piece. And how Microsoft is supporting the idea.

Inside Microsoft’s Quest To Turn Minecraft Content Into A Business
Microsoft has paid out $7 million to Minecraft content makers since last June. But it’s just beginning to build out its marketplace. By Jared Newman

Stefan Panic and Joe Arsenault used to dream of building things in Minecraft for a living. But until recently, they couldn’t quite figure out how.

While Panic worked odd jobs and lived with his father, and Arsenault climbed the corporate ladder at Best Buy, they ran a 50-person volunteer collective called Noxcrew that built intricate environments and mini-games within Minecraft. To date, their work has been downloaded more than 1 million times. Yet all their attempts at making money, from Patreon donations to ad-supported download pages, have only been enough to cover their hosting costs.

Noxcrew’s World Of Horses Ranch lets players train and ride horses on virtual tracks.

In late 2016, Microsoft reached out with an offer that changed everything: The company wanted to discuss an official marketplace for Minecraft creations, which would allow groups like Noxcrew to sell their work directly to players. Like most of the other creators that Microsoft invited into the program, Panic and Arsenault trusted their guts and gave up their day jobs. Now, they make Minecraft content full-time, with help from 15 paid contractors.

“We got the Marketplace opportunity, and it turned from a hobbyist community into an actual business,” Panic says. ...
