First Agent in MineDojo

In this section, we walk through the steps of creating your first agent in the MineDojo benchmarking suite.

Step 1: Import MineDojo

After following the installation instructions to install MineDojo, let’s import it:

import minedojo

If everything is working correctly, you should see the following log message:

[INFO:minedojo.tasks] Loaded 1572 Programmatic tasks, 1558 Creative tasks, and 1 special task: "Playthrough". Totally 3131 tasks loaded.

This means MineDojo has successfully loaded 3131 tasks: 1572 Programmatic tasks, 1558 Creative tasks, and one special task, “Playthrough”. See our paper for more details about task definitions.

Step 2: Create an Environment Instance

MineDojo provides a unified API, minedojo.make(), to create an environment instance. It can create four types of simulation: Programmatic tasks, Creative tasks, meta tasks, and an open-ended world. This tutorial covers only the first two. See Task Customization and Simulation Customization for how to create the other two.

Create a Programmatic Task

minedojo.make() requires two arguments to create a Programmatic task: task_id and image_size. task_id is a string specifying the task ID. You can query the IDs of all Programmatic tasks from minedojo.tasks.ALL_PROGRAMMATIC_TASK_IDS. image_size specifies the size of the RGB images observed by agents. It can be either a single integer for a square image, or a tuple of two integers giving the height and width of the image, matching the (3, H, W) shape of the rgb observation.

The following code creates a Programmatic task with ID harvest_milk and an image size of 160x256.

env = minedojo.make(task_id="harvest_milk", image_size=(160, 256))

You can access task-related attributes such as task_prompt and task_guidance:

>>> env.task_prompt
obtain milk from a cow
>>> env.task_guidance
1. Find a cow.
2. Right-click the cow with an empty bucket.

You can also access these attributes without creating an environment instance:

task_prompt, task_guidance = minedojo.tasks.ALL_PROGRAMMATIC_TASK_INSTRUCTIONS["harvest_milk"]
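Because the instruction table is indexed by task ID and yields a (prompt, guidance) pair, it can be searched without launching Minecraft. The sketch below assumes the table behaves like a regular dict. find_tasks is a hypothetical helper, not part of MineDojo, and the small sample table (including the made-up "example_task" entry) only stands in for the real one.

```python
def find_tasks(instructions, keyword):
    """Return sorted task IDs whose prompt mentions `keyword` (case-insensitive)."""
    keyword = keyword.lower()
    return sorted(
        task_id
        for task_id, (prompt, _guidance) in instructions.items()
        if keyword in prompt.lower()
    )


# Stand-in table for illustration. With MineDojo installed you would pass
# minedojo.tasks.ALL_PROGRAMMATIC_TASK_INSTRUCTIONS instead (assumed dict-like).
sample = {
    "harvest_milk": (
        "obtain milk from a cow",
        "1. Find a cow.\n2. Right-click the cow with an empty bucket.",
    ),
    "example_task": ("a made-up prompt for illustration", "no guidance"),
}

matches = find_tasks(sample, "milk")
print(matches)
```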

The environment instance returned by minedojo.make() follows the OpenAI Gym API, so you can interact with it in the conventional way.

obs = env.reset()

next_obs, reward, done, info = env.step(action)
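Put together, the reset and step calls form the usual Gym rollout loop. The following is a minimal sketch rather than an official MineDojo utility: run_episode and CountdownEnv are hypothetical names, and the toy environment exists only so the loop can run without Minecraft.

```python
def run_episode(env, policy, max_steps=100):
    """Roll out one episode on a Gym-style env and return (total_reward, steps)."""
    obs = env.reset()
    total_reward, steps = 0.0, 0
    for _ in range(max_steps):
        action = policy(obs)
        obs, reward, done, info = env.step(action)
        total_reward += reward
        steps += 1
        if done:
            break
    return total_reward, steps


class CountdownEnv:
    """Toy stand-in for a MineDojo env: reward 1 per step, done after `horizon` steps."""

    def __init__(self, horizon=5):
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return {"t": self.t}

    def step(self, action):
        self.t += 1
        done = self.t >= self.horizon
        return {"t": self.t}, 1.0, done, {}


total, steps = run_episode(CountdownEnv(horizon=5), policy=lambda obs: 0)
print(total, steps)
```

With a real environment, a random agent could be run as run_episode(env, lambda obs: env.action_space.sample()), since Gym spaces provide sample().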

Create a Creative Task

Like Programmatic tasks, Creative tasks are created by invoking minedojo.make() with the same arguments. The only difference is that task_id is no longer a descriptive name such as harvest_milk. Instead, it has the format creative:{task_index}. You can query the IDs of all Creative tasks from minedojo.tasks.ALL_CREATIVE_TASK_IDS.

The following code creates the 256th task from our Creative suite. Because task indices start at 0, its task ID is creative:255.

env = minedojo.make(task_id="creative:255", image_size=(160, 256))
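The off-by-one between a task's ordinal position and its ID is easy to get wrong, so it may help to wrap it. This is a small sketch built only on the creative:{task_index} format described above. Both function names are hypothetical and not part of MineDojo.

```python
def creative_task_id(n):
    """Return the ID of the n-th Creative task, counting from 1."""
    if n < 1:
        raise ValueError("n must be >= 1")
    return f"creative:{n - 1}"


def creative_task_index(task_id):
    """Return the zero-based index encoded in a 'creative:{task_index}' ID."""
    prefix, sep, index = task_id.partition(":")
    if prefix != "creative" or not sep or not index.isdigit():
        raise ValueError(f"not a Creative task ID: {task_id!r}")
    return int(index)


print(creative_task_id(256))                 # creative:255
print(creative_task_index("creative:255"))   # 255
```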

Let’s see what the task prompt and guidance are:

>>> env.task_prompt
Build a replica of the Great Pyramid of Giza
>>> env.task_guidance
1. Find a desert biome.
2. Find a spot that is 64 blocks wide and 64 blocks long.
3. Make a foundation that is 4 blocks high.
4. Make the first layer of the pyramid using blocks that are 4 blocks wide and 4 blocks long.
5. Make the second layer of the pyramid using blocks that are 3 blocks wide and 3 blocks long.
6. Make the third layer of the pyramid using blocks that are 2 blocks wide and 2 blocks long.
7. Make the fourth layer of the pyramid using blocks that are 1 block wide and 1 block long.
8. Make the capstone of the pyramid using a block that is 1 block wide and 1 block long.

You can also access these attributes without creating an environment instance:

task_prompt, task_guidance = minedojo.tasks.ALL_CREATIVE_TASK_INSTRUCTIONS["creative:255"]


Due to the modding framework that MineDojo uses, it may pull new assets on the fly. This behavior may cause network issues on clusters that run multiple parallel environment instances over a poor network connection. Please refer to this discussion for more information.

Basic Observation and Action Spaces

MineDojo provides a comprehensive and unified observation space for training multitask, open-ended agents. The basic observations described here are only a subset of the full observation space, but they are enough to train a competitive agent. We now walk through the basic observation and action spaces.

Basic Observation

You can use RGB, compass, GPS, voxel (surrounding blocks), and terrain id to train good agents. Each one is specified below.

>>> obs_space = env.observation_space

>>> obs_space["rgb"]
Box(low=0, high=255, shape=(3, H, W))    # you can configure H and W in argument `image_size`
>>> obs_space["location_stats"]["yaw"]
Box(low=-180.0, high=180.0, shape=(1,))    # yaw of the agent, part of compass
>>> obs_space["location_stats"]["pitch"]
Box(low=-180.0, high=180.0, shape=(1,))    # pitch of the agent, part of compass
>>> obs_space["location_stats"]["pos"]
Box(low=-640000.0, high=640000.0, shape=(3,))    # x y z location
>>> obs_space["voxels"]["block_name"]
Text(3, 3, 3)    # names of 3x3x3 blocks surrounding the agent
>>> obs_space["location_stats"]["biome_id"]
Box(low=0, high=167, shape=())    # id of the current terrain
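To show how these keys are read from an actual observation, here is a minimal sketch that gathers them into a flat summary. summarize_obs is a hypothetical helper, and the fake observation uses plain Python lists in place of the arrays a real environment returns. The indexing is assumed to work the same way on both.

```python
def summarize_obs(obs):
    """Collect pose and surroundings from a MineDojo-style observation dict."""
    stats = obs["location_stats"]
    x, y, z = (float(v) for v in stats["pos"])
    return {
        "position": (x, y, z),
        "yaw": float(stats["yaw"][0]),      # shape (1,), so take element 0
        "pitch": float(stats["pitch"][0]),  # shape (1,), so take element 0
        "biome_id": int(stats["biome_id"]),
        # Index [1][1][1] is the middle cell of the 3x3x3 voxel grid.
        "center_block": str(obs["voxels"]["block_name"][1][1][1]),
    }


# Hand-written stand-in for an observation (values are made up).
fake_obs = {
    "location_stats": {
        "pos": [10.5, 64.0, -3.0],
        "yaw": [90.0],
        "pitch": [0.0],
        "biome_id": 1,
    },
    "voxels": {"block_name": [[["air"] * 3 for _ in range(3)] for _ in range(3)]},
}

summary = summarize_obs(fake_obs)
print(summary)
```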

Basic Action

The basic action space mainly consists of movement and camera control with other primitive functional actions such as “attack” and “use”. Actions related to inventory manipulation are covered in Action Space.

>>> env.action_space
MultiDiscrete([  3   3   4  25  25   8 244  36])

Much like a game controller, the full action space is composed of several discrete sub-spaces, each with a different number of choices. Only the first five dimensions are needed to control movement and the camera.

action = env.action_space.no_op()    # 8-len vector

# forward and backward
action[0] = 1    # move forward
action[0] = 2    # move backward
# move left and right
action[1] = 1    # move left
action[1] = 2    # move right
# jump
action[2] = 1
# control camera pitch, discretized into 15-degree intervals
action[3] = 0    # change camera pitch by -180 degrees
action[3] = 24    # change camera pitch by 180 degrees
# control camera yaw, discretized into 15-degree intervals
action[4] = 0    # change camera yaw by -180 degrees
action[4] = 24    # change camera yaw by 180 degrees
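With 25 bins spaced 15 degrees apart from -180 to 180, index 12 corresponds to no camera change. The following sketch converts between degrees and bin indices under exactly that reading of the comments above. camera_bin and camera_degrees are hypothetical helpers, not MineDojo functions.

```python
CAMERA_STEP_DEG = 15  # bin width, from the comments above
CAMERA_BINS = 25      # size of dimensions 3 and 4 of the action space


def camera_bin(delta_deg):
    """Map a camera change in degrees to the nearest bin index, clamped to range."""
    index = round(delta_deg / CAMERA_STEP_DEG) + CAMERA_BINS // 2
    return max(0, min(CAMERA_BINS - 1, index))


def camera_degrees(index):
    """Return the camera change in degrees encoded by a valid bin index."""
    if not 0 <= index < CAMERA_BINS:
        raise ValueError(f"bin index out of range: {index}")
    return (index - CAMERA_BINS // 2) * CAMERA_STEP_DEG


print(camera_bin(-180), camera_bin(0), camera_bin(30), camera_bin(180))
```

For example, action[4] = camera_bin(30) would request a yaw change of 30 degrees, and values beyond 180 degrees in either direction are clamped to the outermost bins.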

To invoke “attack” or “use”, set the sixth dimension:

action[5] = 1    # action "use"
action[5] = 3    # action "attack"
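The index conventions above can be collected into one small builder so that agent code refers to names instead of raw numbers. This is a sketch under stated assumptions: make_action is a hypothetical helper, it returns a plain list instead of the array produced by env.action_space.no_op(), and it assumes that 0 means "do nothing" in the non-camera dimensions and that 12 is the neutral camera bin.

```python
MOVE = {None: 0, "forward": 1, "backward": 2}
STRAFE = {None: 0, "left": 1, "right": 2}
FUNCTIONAL = {None: 0, "use": 1, "attack": 3}
NEUTRAL_CAMERA_BIN = 12  # assumed: middle of the 25 camera bins, i.e. 0 degrees


def make_action(move=None, strafe=None, jump=False,
                pitch_bin=NEUTRAL_CAMERA_BIN, yaw_bin=NEUTRAL_CAMERA_BIN,
                functional=None):
    """Build an 8-element action list from named choices."""
    return [
        MOVE[move],
        STRAFE[strafe],
        1 if jump else 0,
        pitch_bin,
        yaw_bin,
        FUNCTIONAL[functional],
        0,  # dimension 6: left at 0 here (assumed no-op)
        0,  # dimension 7: left at 0 here (assumed no-op)
    ]


attack_forward = make_action(move="forward", functional="attack")
print(attack_forward)
```

With a real environment it is safer to start from env.action_space.no_op() and overwrite entries in place, since that does not depend on the assumed defaults.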

We recommend reading through Observation Space and Action Space in Core API to better understand the observation and action spaces.

Congratulations! You have successfully built your first agent in MineDojo. Move to the next page to see how to use our Knowledge Base.