Benchmarking Suite
MineDojo features a massively multitask benchmarking suite with 3142 tasks in total. This section explains how to access the entire suite, create a task instance, and more.
Overview
The benchmarking suite includes Programmatic tasks and Creative tasks. Programmatic tasks can be assessed programmatically from the ground-truth simulator states. Creative tasks do not have well-defined or easily automated success criteria. The suite also includes a special task, “Playthrough: Defeat the Ender Dragon”. This task holds a unique position in our benchmark because killing the dragon means “beating the game” in the traditional sense of the phrase, and is considered the most significant achievement for a new player. See our paper for more details about this task.
| Task Category | Number of Tasks |
|---|---|
| Programmatic | 1581 |
| Creative | 1560 |
| Playthrough | 1 |
We pair all tasks with natural language descriptions of task goals (a.k.a. prompts), such as “obtain 8 bone in swampland” and “make a football stadium”. Most tasks also have step-by-step guidance generated by GPT-3. Users can access the instructions for all tasks through:
all_ids: list[str] = minedojo.tasks.ALL_TASK_IDS
all_instructions: dict[str, tuple[str, str | None]] = minedojo.tasks.ALL_TASK_INSTRUCTIONS
where all_instructions maps each task_id to a tuple (task_prompt, task_guidance); task_guidance is None for tasks without guidance.
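As an illustration of working with this mapping, the sketch below filters prompts by keyword and handles missing guidance. It runs on a small stand-in dictionary rather than on `minedojo.tasks.ALL_TASK_INSTRUCTIONS` itself; the helper names `find_tasks` and `guidance_or_default` are hypothetical and not part of MineDojo, and the first guidance string is abbreviated.

```python
from typing import Dict, List, Optional, Tuple

Instructions = Dict[str, Tuple[str, Optional[str]]]

# Stand-in with the same shape as minedojo.tasks.ALL_TASK_INSTRUCTIONS:
# task_id -> (task_prompt, task_guidance or None).
# The guidance for creative:88 is abbreviated to its first step.
sample_instructions: Instructions = {
    "creative:88": (
        "Build a Haunted House with zombie inside.",
        "1. Find a good location for your haunted house.",
    ),
    "creative:630": ("make an automated mining machine.", None),
}


def find_tasks(instructions: Instructions, keyword: str) -> List[str]:
    """Return the task IDs whose prompt contains `keyword` (case-insensitive)."""
    needle = keyword.lower()
    return [
        task_id
        for task_id, (prompt, _guidance) in instructions.items()
        if needle in prompt.lower()
    ]


def guidance_or_default(
    instructions: Instructions, task_id: str, default: str = "(no guidance)"
) -> str:
    """Return the guidance for `task_id`, or `default` when it is None."""
    _prompt, guidance = instructions[task_id]
    return guidance if guidance is not None else default
```

In real use, the same helpers should accept `minedojo.tasks.ALL_TASK_INSTRUCTIONS` in place of `sample_instructions`, assuming it has the shape described above.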
Programmatic Tasks
The 1581 Programmatic tasks are divided into four categories: (1) Survival: surviving for a designated number of days, (2) Harvest: finding, obtaining, cultivating, or manufacturing hundreds of materials and objects, (3) Tech Tree: crafting and using a hierarchy of tools, and (4) Combat: fighting various monsters and creatures to test the agent’s reflexes and martial skills.
All Programmatic tasks can be created through the unified API minedojo.make(). As introduced in Getting Started, it requires the arguments task_id and image_size. All Programmatic task IDs and instructions can be accessed by
all_programmatic_ids = minedojo.tasks.ALL_PROGRAMMATIC_TASK_IDS
all_programmatic_instructions = minedojo.tasks.ALL_PROGRAMMATIC_TASK_INSTRUCTIONS
The prompt and guidance for a given task can be accessed by
task_prompt, task_guidance = minedojo.tasks.ALL_PROGRAMMATIC_TASK_INSTRUCTIONS[task_id]
Note
Users can find the specification file for all Programmatic tasks here and a listing of all Programmatic tasks here.
Survival
This task group tests the ability to stay alive in the game. Surviving in Minecraft is nontrivial because the agent grows hungry as time passes and its health bar drops gradually. Hostile mobs such as zombies and skeletons spawn at night and are very dangerous if the agent has no armor to protect itself or weapons to fight back. We provide two Survival tasks with different levels of difficulty: one starts from scratch without any assistance, and the other starts with initial weapons and food. The agent receives a reward of 1 per day survived and a reward of 100 upon success.
# start from scratch without any assistance
env = minedojo.make(task_id="survival", image_size=image_size)
# start with initial weapons and food
env = minedojo.make(task_id="survival_sword_food", image_size=image_size)
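The Survival reward scheme can be written out as a small calculation. This is a sketch only: it assumes the per-day reward and the success bonus simply add up over an episode, which the text above suggests but does not state, and `survival_return` is a hypothetical helper rather than a MineDojo function.

```python
def survival_return(days_survived: int, succeeded: bool) -> int:
    """Episode return: 1 per day survived, plus 100 on success (assumed additive)."""
    if days_survived < 0:
        raise ValueError("days_survived must be non-negative")
    return days_survived + (100 if succeeded else 0)
```

Under this assumption, an agent that survives three days and then dies collects a return of 3, while one that survives seven days and is credited with success collects 107.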
Harvest
This task group tests the agent’s ability to collect useful resources such as minerals (iron, diamond, obsidian), food (beef, pumpkin, carrots, milk), and other useful items (wool, oak wood, coal). We construct these tasks by enumerating the Cartesian product of target items to collect, initial inventory, and world conditions (terrain, weather, lighting, etc.) so that they cover a spectrum of difficulty. For instance, harvesting wool is relatively easy if the agent has shears in its initial inventory and a sheep nearby, but more difficult if the agent has to craft the shears from raw materials and explore extensively to find a sheep. We filter out impossible combinations (such as farming certain plants in the desert) from the Cartesian product. We provide 895 Harvest tasks in total.
Taking “harvest wool” as an example, the code below demonstrates how to instantiate tasks with four different difficulties.
# without initial tool or initial sheep
env = minedojo.make(task_id="harvest_wool", image_size=image_size)
# without initial tool but with a sheep spawned nearby
env = minedojo.make(task_id="harvest_wool_with_sheep", image_size=image_size)
# with initial tool but without initial sheep
env = minedojo.make(task_id="harvest_wool_with_shears", image_size=image_size)
# with both initial tool and sheep spawned nearby
env = minedojo.make(task_id="harvest_wool_with_shears_and_sheep", image_size=image_size)
Although they all aim for wool, these four instances cover a spectrum of difficulty. At the most difficult level, the agent needs to craft the required tool and also explore the terrain to find a sheep. By contrast, at the easiest level, the agent only needs to recognize a sheep, approach it, and collect wool. The agent receives a sparse reward upon success. We provide examples of how to engineer a dense reward using lidar in Privileged Observations.
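The four wool variants above can be viewed as a small Cartesian product over two binary conditions. The sketch below rebuilds the four task IDs that way. The naming pattern is inferred from these four examples only and is not guaranteed to hold for other Harvest tasks; `harvest_wool_task_id` is a hypothetical helper.

```python
from itertools import product


def harvest_wool_task_id(with_shears: bool, with_sheep: bool) -> str:
    """Build a harvest-wool task ID from the two conditions (pattern inferred)."""
    extras = []
    if with_shears:
        extras.append("shears")
    if with_sheep:
        extras.append("sheep")
    if not extras:
        return "harvest_wool"
    return "harvest_wool_with_" + "_and_".join(extras)


# Enumerate every (with_shears, with_sheep) combination.
wool_task_ids = [
    harvest_wool_task_id(with_shears, with_sheep)
    for with_shears, with_sheep in product([False, True], repeat=2)
]
```

Each resulting ID can be passed to `minedojo.make()` as in the block above. Checking an ID against `minedojo.tasks.ALL_PROGRAMMATIC_TASK_IDS` first is a reasonable safeguard, since some combinations are filtered out of the suite.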
Tech Tree
Minecraft includes several levels of tools and armor with different properties and difficulties to unlock. To progress to a higher level of tools and armor, the agent needs to develop systematic and compositional skills to navigate the technology tree (e.g. wood → stone → iron → diamond). In this task group, the agent is asked to craft and use a hierarchy of tools starting from a less advanced level. For example, one task asks the agent to craft a wooden sword from bare hands, and another asks it to craft a gold helmet. An agent that can complete these tasks should be able to transfer similar exploration strategies to different tech levels. We provide 213 Tech Tree tasks in total.
We include five main tech levels, namely “wood”, “stone”, “iron”, “gold”, and “diamond”, ordered by increasing difficulty to unlock. In addition, we include three special technologies: “archery”, “explosives”, and “redstone”. These tasks cover a full spectrum of difficulty. In an easy setting, the agent starts with bare hands and must unlock a wooden sword, which is straightforward to solve by repeatedly chopping trees to collect enough wood. In a much more difficult setting, the agent must craft a diamond sword from bare hands, which requires disciplined exploration to find enough diamonds. The code block below demonstrates how to create these two example tasks. The agent receives a reward of 1 once it obtains the target item and a reward of 10 once it uses the target item. The task counts as a success if the agent uses the target item.
# unlock wood sword from bare hand
env = minedojo.make(task_id="techtree_from_barehand_to_wooden_sword", image_size=image_size)
# unlock diamond sword from bare hand
env = minedojo.make(task_id="techtree_from_barehand_to_diamond_sword", image_size=image_size)
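The Tech Tree reward and success rule can be summarized in a few lines. This sketch assumes each reward is granted once and that the two add up; neither detail is spelled out above. The helper names are hypothetical and not part of MineDojo.

```python
def techtree_return(obtained: bool, used: bool) -> int:
    """Return 1 for obtaining the target item plus 10 for using it (assumed additive)."""
    reward = 0
    if obtained:
        reward += 1
    if used:
        reward += 10
    return reward


def techtree_success(used: bool) -> bool:
    """The task counts as a success only once the agent uses the target item."""
    return used
```

Under these assumptions, an agent that crafts the sword but never uses it ends with a return of 1 and no success, while one that also uses it ends with 11.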
Note
Although we support chainmail armor, we don’t consider it a technology level because it is unobtainable through crafting.
Combat
Combat tasks test the agent’s reflexes and martial skills in fights against various monsters and creatures. As with the Harvest task group, we generate these tasks by enumerating the Cartesian product of the target entity to combat, initial inventory, and world conditions to cover a spectrum of difficulty. We include 471 Combat tasks in total.
Combat tasks vary along four axes: the hostility of the entities to combat, initial weapons and armor, battleground terrain, and lighting conditions. For example, entities can be friendly such as sheep and pigs, neutral such as endermen, or hostile such as zombies and creepers. Initial weapons and armor range from wooden equipment to diamond equipment. Battleground terrain includes plain, forest, hill, and desert. Lighting conditions include day and night. The agent receives a sparse reward upon success.
| Axis | Range |
|---|---|
| Hostility of Entities | Friendly, Neutral, Hostile |
| Equipment | Wooden, Leather, Iron, Diamond |
| Battleground Terrain | Plain, Forest, Hill, Desert |
| Lighting Conditions | Day, Night |
The code block below demonstrates how to create two example tasks.
# combat a husk in desert with iron equipment
env = minedojo.make(task_id="combat_husk_desert_iron_armors_iron_sword_shield", image_size=image_size)
# combat a bat in forest with leather armors, a shield, and a wooden sword
env = minedojo.make(task_id="combat_bat_forest_leather_armors_wooden_sword_shield", image_size=image_size)
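To make the Cartesian-product construction concrete, the sketch below enumerates the axis values listed in the table above. It illustrates the enumeration idea only. It does not reproduce the 471 Combat tasks, which are generated over specific target entities and initial inventories, and the dictionary keys are hypothetical labels.

```python
from itertools import product

# Axis values copied from the table above; the keys are illustrative labels.
COMBAT_AXES = {
    "hostility": ["Friendly", "Neutral", "Hostile"],
    "equipment": ["Wooden", "Leather", "Iron", "Diamond"],
    "terrain": ["Plain", "Forest", "Hill", "Desert"],
    "lighting": ["Day", "Night"],
}

# One dict per combination of axis values.
combat_settings = [
    dict(zip(COMBAT_AXES, values)) for values in product(*COMBAT_AXES.values())
]
```

The list holds 3 × 4 × 4 × 2 = 96 axis combinations. Actual task IDs should be taken from `minedojo.tasks.ALL_PROGRAMMATIC_TASK_IDS` rather than constructed from these labels.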
Creative Tasks
We collect Creative tasks in three ways: (1) manual brainstorming, (2) mining from YouTube tutorials, and (3) querying GPT-3. See our paper for the detailed collection process. Creative tasks have no well-defined reward function or success criteria.
Like Programmatic tasks, all Creative tasks can be created through the unified API minedojo.make(). Unlike Programmatic tasks, whose task IDs carry semantic meaning, Creative task IDs are formatted as creative:{task_index}. All Creative task IDs and instructions can be accessed by
all_creative_ids = minedojo.tasks.ALL_CREATIVE_TASK_IDS
all_creative_instructions = minedojo.tasks.ALL_CREATIVE_TASK_INSTRUCTIONS
The prompt and guidance for a given task can be accessed by
task_prompt, task_guidance = minedojo.tasks.ALL_CREATIVE_TASK_INSTRUCTIONS[task_id]
Note
Users can find a listing of all Creative tasks here.
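Because Creative task IDs follow the creative:{task_index} format, they are easy to build and parse. The sketch below shows one way to do so; both helper names are hypothetical, and the valid range of indices is not checked here.

```python
def creative_task_id(task_index: int) -> str:
    """Format a Creative task ID as creative:{task_index}."""
    return f"creative:{task_index}"


def parse_creative_task_id(task_id: str) -> int:
    """Extract the integer index from an ID of the form creative:{task_index}."""
    prefix, separator, index = task_id.partition(":")
    if prefix != "creative" or not separator or not index.isdigit():
        raise ValueError(f"not a Creative task ID: {task_id!r}")
    return int(index)
```

For example, `creative_task_id(88)` yields the ID used for the haunted house task below. Membership in `minedojo.tasks.ALL_CREATIVE_TASK_IDS` remains the authoritative check that an ID exists.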
After instantiating a Creative task instance, it is also possible to query its collection (“manual”, “youtube”, or “gpt3”):
collection: Literal["manual", "youtube", "gpt3"] = env.collection
We describe the collections below.
Manually Brainstormed
We brainstormed and authored 216 Creative tasks, such as “build a haunted house with zombie inside”. Such a task can be created through
env = minedojo.make(task_id="creative:88", image_size=image_size)
We can also access its metadata:
>>> env.collection
manual
>>> env.task_prompt
Build a Haunted House with zombie inside.
>>> env.task_guidance
1. Find a good location for your haunted house. You'll want to make sure it's in a dark, spooky area.
2. Start by building the basic structure of your house. Make it as big or small as you want, but keep in mind that you'll need to add plenty of details to make it look truly haunted.
3. Add some windows and doors. Be sure to make them look creepy!
4. Now it's time to start decorating the inside of your house. Add furniture, paintings, and other spooky details.
5. Finally, add your zombie! You can either build one using blocks, or find a ready-made one in a resource pack.
Mined from YouTube Tutorial Videos
We identify our YouTube dataset as a rich source of tasks, because many human players demonstrate and narrate creative missions in tutorial playlists. We collect 1042 task ideas from the collective wisdom of a large number of veteran Minecraft players, such as “make an automated mining machine” and “grow cactus up to the sky”.
To create such a task, run
env = minedojo.make(task_id="creative:630", image_size=image_size)
An important attribute of these tasks is source, which stores the corresponding YouTube tutorial video ID, start timestamp, and end timestamp. Users can use this information to retrieve videos from our YouTube dataset.
>>> env.source
{'end': 484, 'start': 0, 'youtube_id': 'kRukK0L9QYI'}
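As a sketch of how the source dictionary might be used, the snippet below computes the clip length and builds a standard YouTube watch URL that starts at the clip's beginning. It assumes the start and end values are in seconds, which the output above suggests but does not state; the helper names are hypothetical.

```python
# The dictionary printed by env.source above.
source = {"end": 484, "start": 0, "youtube_id": "kRukK0L9QYI"}


def clip_duration(source: dict) -> int:
    """Length of the referenced clip, assuming timestamps are in seconds."""
    return source["end"] - source["start"]


def watch_url(source: dict) -> str:
    """YouTube watch URL that starts playback at the clip's start timestamp."""
    return (
        f"https://www.youtube.com/watch?v={source['youtube_id']}"
        f"&t={source['start']}s"
    )
```

With the values above, `clip_duration(source)` is 484, so the clip would run for roughly eight minutes under the seconds assumption.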
We can also access its metadata:
>>> env.collection
youtube
>>> env.task_prompt
make an automated mining machine.
>>> env.task_guidance
None
Note
The narrated transcripts of YouTube tutorial videos can already serve as guidance, so we don’t run GPT-3 to generate guidance for Creative tasks mined from YouTube.
Playthrough Task
To create the special Playthrough task, run:
env = minedojo.make(task_id="playthrough", image_size=image_size)
We can query the task prompt and guidance through
>>> env.task_prompt
Defeat the Ender Dragon and obtain the trophy dragon egg.
>>> env.task_guidance
First, you need to find a place to build your base. You'll need a crafting table, a furnace, and a lot of cobblestone. Once you have your base set up, you need to find a way to get to the Nether. The easiest way to do this is by building a Nether Portal.
Once you're in the Nether, you need to find the Ender Dragon. The easiest way to do this is by following the path of endermen. They will lead you right to the dragon.
Once you find the dragon, you need to kill it. You can do this by hitting it with your sword or by shooting it with arrows. Once the dragon is dead, you can collect the dragon egg.
The Playthrough task has a time limit:
>>> env.time_limit
115200
This time limit is estimated from human statistics. Specifically, human players take roughly eight hours to defeat the dragon. At four actions per second, this gives a maximum of 8 × 3600 × 4 = 115200 steps.
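The step budget follows directly from the two figures quoted above, as this short calculation shows. The constant names are illustrative only.

```python
HUMAN_PLAYTHROUGH_HOURS = 8   # rough human time to defeat the dragon
SECONDS_PER_HOUR = 3600
ACTIONS_PER_SECOND = 4        # agent action rate stated above

# Maximum number of environment steps for the Playthrough task.
time_limit = HUMAN_PLAYTHROUGH_HOURS * SECONDS_PER_HOUR * ACTIONS_PER_SECOND
```

The result matches the value of `env.time_limit` shown above.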
Congratulations! You are now familiar with the MineDojo benchmarking suite. The following sections show how to train agents to solve these tasks.