gym-sokoban's Introduction

gym-sokoban

Sokoban is Japanese for warehouse keeper and a traditional video game. The game is a transportation puzzle in which the player has to push all boxes in the room onto the storage locations (targets). The possibility of making irreversible mistakes makes these puzzles especially challenging for Reinforcement Learning algorithms, which mostly lack the ability to think ahead.
The repository implements the game Sokoban based on the rules presented in DeepMind's paper Imagination Augmented Agents for Deep Reinforcement Learning. Rooms are generated randomly, which allows training Deep Neural Networks without overfitting to a set of predefined rooms.

[Images: Example Game 1, Example Game 2, Example Game 3]

1 Installation

Via PIP

pip install gym-sokoban

From Repository

git clone git@github.com:mpSchrader/gym-sokoban.git
cd gym-sokoban
pip install -e .

Check out the examples on how to use an external gym environment.

2 Game Environment

2.1 Room Elements

Every room consists of five main elements: walls, floor, boxes, box targets, and a player. Boxes and the player have different states depending on whether they overlap with a box target.

Type | State | Graphic | TinyWorld
Wall | Static | Wall | Wall
Floor | Empty | Floor | Floor
Box Target | Empty | BoxTarget | BoxTarget
Box | Off Target | BoxOffTarget | BoxOffTarget
Box | On Target | BoxOnTarget | BoxOnTarget
Player | Off Target | PlayerOffTarget | PlayerOffTarget
Player | On Target | PlayerOnTarget | PlayerOnTarget

2.2 Actions

The game provides 9 actions to interact with the environment: Push and Move actions in the directions Up, Down, Left, and Right, plus a No Operation action. No Operation is a void action that does not change anything in the environment. The action numbers map to the actions as follows:

Action ID
No Operation 0
Push Up 1
Push Down 2
Push Left 3
Push Right 4
Move Up 5
Move Down 6
Move Left 7
Move Right 8

Move simply moves the player if the field in the given direction is free, meaning it is not blocked by a box or a wall.

Push tries to move an adjacent box if the field behind the box is free. This means no chain pushing of boxes is possible. If there is no box on the adjacent field, the push action is handled the same way as the move action in the same direction.
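The Move and Push rules can be sketched on a small character grid. This is only an illustration under stated assumptions: the grid encoding, the `step` function, and the `DIRECTIONS` table are hypothetical names, not the repository's implementation. The player is tracked separately from the grid, targets are ignored, and the room is assumed to be fully enclosed by walls.

```python
# Illustrative sketch only; names and grid encoding are assumptions.
WALL, FLOOR, BOX = "#", " ", "$"

# Row/column offsets in the order Up, Down, Left, Right.
DIRECTIONS = {0: (-1, 0), 1: (1, 0), 2: (0, -1), 3: (0, 1)}


def step(grid, player, action):
    """Return the new player position; mutates grid when a box is pushed.

    Action IDs follow the table above: 0 no-op, 1-4 push, 5-8 move.
    """
    if action == 0:
        return player  # No Operation changes nothing
    is_push = action <= 4
    d_row, d_col = DIRECTIONS[(action - 1) % 4]
    nxt = (player[0] + d_row, player[1] + d_col)
    if grid[nxt[0]][nxt[1]] == FLOOR:
        return nxt  # free field: a move, or a push with no adjacent box
    if is_push and grid[nxt[0]][nxt[1]] == BOX:
        behind = (nxt[0] + d_row, nxt[1] + d_col)
        if grid[behind[0]][behind[1]] == FLOOR:  # no chain pushing
            grid[behind[0]][behind[1]] = BOX
            grid[nxt[0]][nxt[1]] = FLOOR
            return nxt
    return player  # blocked by a wall or an immovable box
```

With a one-row corridor such as `"# $ #"`, Move Right (8) from the left cell is blocked by the box, while Push Right (4) shifts the box one field and moves the player into its place.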

2.3 Rewards

Finishing the game by pushing all boxes onto the targets gives a reward of 10 in the last step. Pushing a box onto a target gives a reward of 1, and pushing a box off a target gives a reward of -1. In addition, a reward of -0.1 is given for every step, which penalizes solutions with many steps.

Reason Reward
Perform Step -0.1
Push Box on Target 1.0
Push Box off Target -1.0
Push all boxes on targets 10.0
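The table lists the individual reward values but not how they combine within one step. The sketch below assumes they are simply summed, so the step penalty also applies to steps that earn a bonus. The function name and its parameters are hypothetical.

```python
def step_reward(on_target_before, on_target_after, num_boxes):
    """Reward for one step, assuming the table rows are summed.

    on_target_before / on_target_after: number of boxes on targets
    before and after the step (hypothetical bookkeeping).
    """
    reward = -0.1  # penalty for performing a step
    # +1.0 for each box pushed onto a target, -1.0 for each pushed off.
    reward += 1.0 * (on_target_after - on_target_before)
    if on_target_after == num_boxes:
        reward += 10.0  # all boxes are on targets
    return reward
```

Under this assumption, a step that pushes the last box onto its target is rewarded with the sum of the step penalty, the box bonus, and the finishing bonus.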

2.4 Level Generation

Every time a Sokoban environment is loaded or reset, a new room is randomly generated. The generation consists of three phases: Topology Generation, Placement of Targets and Players, and Reverse Playing.

2.4.1 Topology Generation

The basic topology of the room, consisting of walls and empty floor, is generated by a random walk that changes its direction with probability 0.35. At every step, a pattern of fields centered at the current position is set to empty space. The patterns used can be found in Figure 2.

Figure 2: Masks for creating a topology
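A minimal sketch of such a random walk is given below. The masks here are hypothetical stand-ins, since the actual patterns are those of Figure 2 and are not reproduced in the text. The function name, the room encoding (1 = wall, 0 = empty), and the choice to keep an outer wall are also assumptions.

```python
import random

# Hypothetical masks, given as offsets around the current position.
MASKS = [
    [(0, 0), (0, 1), (0, -1)],         # horizontal bar
    [(0, 0), (1, 0), (-1, 0)],         # vertical bar
    [(0, 0), (0, 1), (1, 0), (1, 1)],  # 2x2 block
]
DIRECTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]


def generate_topology(height, width, steps, p_change=0.35, seed=None):
    """Carve empty floor into a walled room with a random walk."""
    rng = random.Random(seed)
    room = [[1] * width for _ in range(height)]  # 1 = wall, 0 = empty
    row = rng.randrange(1, height - 1)
    col = rng.randrange(1, width - 1)
    direction = rng.choice(DIRECTIONS)
    for _ in range(steps):
        # Change the walking direction with probability p_change.
        if rng.random() < p_change:
            direction = rng.choice(DIRECTIONS)
        # Advance one field, staying inside the outer wall.
        row = min(max(row + direction[0], 1), height - 2)
        col = min(max(col + direction[1], 1), width - 2)
        # Clear a randomly chosen pattern centered at the position.
        for d_row, d_col in rng.choice(MASKS):
            r, c = row + d_row, col + d_col
            if 1 <= r <= height - 2 and 1 <= c <= width - 2:
                room[r][c] = 0
    return room
```

Passing a fixed `seed` makes the generated topology reproducible, which is convenient when debugging room generation.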

2.4.2 Placement of Elements

During this phase, the player and all n box targets are placed on randomly chosen empty spaces.
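The placement step can be sketched as sampling distinct empty cells. The function name, the room encoding (0 = empty, 1 = wall), and the error message are assumptions made for illustration and may differ from the repository's checks.

```python
import random


def place_player_and_targets(room, num_boxes, seed=None):
    """Pick distinct empty cells for one player and num_boxes targets."""
    rng = random.Random(seed)
    # Assumed encoding: 0 = empty floor, 1 = wall.
    free = [(r, c)
            for r, row in enumerate(room)
            for c, cell in enumerate(row)
            if cell == 0]
    needed = num_boxes + 1  # one cell for the player, one per target
    if len(free) < needed:
        raise RuntimeError(f"need {needed} empty cells, found {len(free)}")
    player, *targets = rng.sample(free, needed)
    return player, targets
```

Small rooms can run out of empty cells, so a caller may want to regenerate the topology when this raises.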

2.4.3 Reverse Playing

This is the crucial phase to ensure a solvable room. Sokoban is now played in reverse, where the player can move and pull boxes. The goal of this phase is to find the room state with the highest room score, using a Depth First Search. For every room explored during the search, a room score is calculated from BoxSwaps and BoxDisplacement. The score is a heuristic measure of the difficulty of the room. BoxSwaps counts the number of times the player changes the box to pull. BoxDisplacement is the Manhattan Distance between a specific box and its origin box target. As long as at least one box is on a target, the RoomScore is always 0.
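The text names the two ingredients of the score but not how they are combined. The sketch below assumes the score is the product of BoxSwaps and the BoxDisplacement summed over all boxes; that rule, the function names, and the data layout are all assumptions.

```python
def box_displacement(boxes, origin_targets):
    """Sum of Manhattan distances between each box and its origin target."""
    return sum(abs(b_row - t_row) + abs(b_col - t_col)
               for (b_row, b_col), (t_row, t_col) in zip(boxes, origin_targets))


def room_score(box_swaps, boxes, origin_targets):
    """Heuristic difficulty; assumed to be swaps times total displacement.

    boxes[i] is the current position of the box that started on
    origin_targets[i] (hypothetical layout).
    """
    target_cells = set(origin_targets)
    # The score is 0 while at least one box sits on any target.
    if any(box in target_cells for box in boxes):
        return 0
    return box_swaps * box_displacement(boxes, origin_targets)
```

For example, two boxes displaced by 2 and 3 fields after 2 swaps would score 2 * (2 + 3) = 10 under this assumption.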

2.5 Configuration

Sokoban has many different variations, such as room size, number of boxes, rendering modes, or rules.

2.5.1 Rendering Modes

Besides the regular Sokoban rendering, each configuration can be rendered as TinyWorld, in which each grid cell is a single pixel, so the image size in pixels equals the grid size. To get an environment rendered as a tiny world, add tiny_ in front of the rendering mode, e.g. env.render('tiny_rgb_array', scale=scale_tiny). The scale parameter increases the size of the rendered tiny world observation. Using scale with the rendering modes human or rgb_array does not influence the output size. Available rendering modes are:

Mode Description
rgb_array Good-looking 2D RGB image
human Displays the current state on screen
tiny_rgb_array Each pixel describing one element in the room
tiny_human Displays the tiny rgb_array on screen
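To illustrate what scaling a tiny world observation could mean, the sketch below repeats every pixel of a small image `scale` times in both directions. This is an assumption about the effect of the scale parameter, not the repository's rendering code, and `upscale` is a hypothetical helper working on nested lists.

```python
def upscale(tiny_image, scale):
    """Repeat every pixel scale x scale times (nearest-neighbour)."""
    scaled = []
    for row in tiny_image:
        # Widen the row first, then repeat it vertically.
        wide_row = [pixel for pixel in row for _ in range(scale)]
        scaled.extend(list(wide_row) for _ in range(scale))
    return scaled
```

Under this reading, a 10x10 tiny world rendered with a scale of 4 would become a 40x40 image.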

2.5.2 Size Variations

The available room configurations are shown in the table below.

Room Id Grid-Size Pixels #Boxes Example TinyWorld
Sokoban-v0 10x10 160x160 3 Sokoban-v0 Sokoban-v0
Sokoban-v1 10x10 160x160 4 Sokoban-v1 Sokoban-v1
Sokoban-v2 10x10 160x160 5 Sokoban-v2 Sokoban-v2
Sokoban-small-v0 7x7 112x112 2 Sokoban-small-v0 Sokoban-small-v0
Sokoban-small-v1 7x7 112x112 3 Sokoban-small-v1 Sokoban-small-v1
Sokoban-large-v0 13x11 208x176 3 Sokoban-large-v0 Sokoban-large-v0
Sokoban-large-v1 13x11 208x176 4 Sokoban-large-v1 Sokoban-large-v1
Sokoban-large-v2 13x11 208x176 5 Sokoban-large-v2 Sokoban-large-v2
Sokoban-huge-v0 13x13 208x208 5 Sokoban-huge-v0 Sokoban-huge-v0

Please note that the larger rooms might take some time to be created, especially on a laptop.

2.5.3 Other Variations

Besides the regular game of Sokoban, this repository implements or will implement variations that might make the game easier or more complicated. Unless noted otherwise, the variations do not implement a TinyWorld version.

Variation | Summary | Expected Difficulty | Example | Tiny World | Status | Details
Fixed Targets | Every box has to be pushed onto the target with the same color. | More difficult | Fixed-Targets | Yes | Implemented | ReadMe
Multiple Player | There are two players in the room. In every round, either of the two players can be moved; there is no fixed order of moves between them. | More difficult | TwoPlayer | Yes | Implemented | ReadMe
Push&Pull | The player can not only push the boxes but also pull them, so no irreversible moves exist. | Easier | PushAndPull-Targets | Yes | Implemented | ReadMe
Boxoban | Uses Sokoban puzzles pregenerated by DeepMind. | Similar | PushAndPull-Targets | Yes | Implemented | ReadMe

3 Cite

If you are using this repository for your research please cite it with the following information:

@misc{SchraderSokoban2018,
  author = {Schrader, Max-Philipp B.},
  title = {gym-sokoban},
  year = {2018},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/mpSchrader/gym-sokoban}},
  commit = {#CommitId}
}

4 Connect & Contribute

4.1 Connect

Feel free to get in touch with me to talk about this or other projects, either by creating an issue or by messaging me on LinkedIn.

If you reached the end and liked the project, please show your appreciation by starring this project.

4.2 Contribute

Feel free to contribute to this project by forking the repo and implementing whatever you are missing. Alternatively, open a new issue if you need help or want a feature added.

gym-sokoban's People

Contributors

ben-bay, guyfreund, jaromiru, maximilianigl, mjanschek, mpschrader, olloxan, zjowowen

gym-sokoban's Issues

Number states of the env

How can I get the number of states of the env? I am trying to use policy iteration and value iteration to solve this!

How to use the game?

Hi, I used "gym.make(Sokoban)" to create the environment, but I got the error "No registered env with id: Sokoban-v0". How can I deal with it?

2 resets in PushAndPullSokobanEnv init

When using the PushAndPullSokobanEnv, the init has a reset at the end, even though the base class already has a reset.
This causes env.reset() not to return to the same environment, even when using a constant seed.

To fix this, remove the extra reset in the init of PushAndPullSokobanEnv.

Baseline agent for Sokoban

Hello,

Is there any RL baseline configuration for the Sokoban gym environment? I want to compare against a working learning agent on this game to see if my approach is doing well enough.

Thank you.

0 move doesn't work in PushAndPullSokobanEnv

Since there is no special case for 0 and it is treated as a regular push, it crashes: there is no CHANGE_COORDINATES value that fits.

To fix, change if action < 5 to elif (0 < action) and (action < 5)

max time steps

Hi!
I would like to know if there is any way of modifying the max time steps so that the environment doesn't disappear early.

environment reward for each step

I read your code and found that reward = -0.1 + sum_component_rewards. So instead of returning 1 for Push Box on Target, the env returns 0.9, and instead of returning 10 for winning, the env returns -0.1 + 1 + 10 = 10.9.

Reward calculation

Hi,
The function _calc_reward() in sokoban_env.py is only called when the action is < 4. So if I start training and the actions are >= 4, the penalty for steps is 0 until the chosen action is < 4 for the first time. I don't know if this has any effect in the long run (e.g. whether rewards are calculated correctly in general).

Runtime issues

Hi,

Thank you for open sourcing this awesome project!

Is there a mechanism for avoiding runtime issues like:

env = gym.make('TinyWorld-Sokoban-small-v0')  # Error
# RuntimeError: Not enough free spots (#3) to place 1 player and 2 boxes.

There is also a possible issue when resetting the environment:

env = gym.make('TinyWorld-Sokoban-small-v0')
for _ in range(int(1e3)):
    env.reset()
# RuntimeError: Not enough free spots (#3) to place 1 player and 2 boxes.
# or
# RuntimeWarning: Generated Model with score == 0 

A simple workaround for the issue with reset could be:

def callback_sokoban_reset(f):
    def callback():
        try:
            return f()
        except (RuntimeWarning, RuntimeError):
            print("[SOKOBAN] Runtime error retry . . .")
            return callback()
    return callback

env = gym.make('TinyWorld-Sokoban-small-v0')
env.reset = callback_sokoban_reset(env.reset)

but there may be a more efficient way of dealing with these issues.
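One possible variant of the workaround bounds the number of retries with a loop instead of recursing without limit, so a room configuration that can never be generated fails with a clear error. This is a generic sketch: `retry_on_runtime_issue` and `max_attempts` are hypothetical names, and it only catches warnings that are actually raised as exceptions.

```python
import functools


def retry_on_runtime_issue(func, max_attempts=10):
    """Wrap func so RuntimeError/RuntimeWarning trigger a bounded retry."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        last_error = None
        for _ in range(max_attempts):
            try:
                return func(*args, **kwargs)
            except (RuntimeError, RuntimeWarning) as error:
                last_error = error  # remember the cause and try again
        raise RuntimeError(
            f"gave up after {max_attempts} attempts") from last_error
    return wrapper
```

It would be applied the same way as the callback above, for example env.reset = retry_on_runtime_issue(env.reset).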

Issue running ' env.render(mode='human')'

Hello! Firstly thanks for creating this repo and sokoban package!

I'm running into some initial issues in trying to sanity check the install and environment. When I try to run Random_Sampling.py, I run into an error:
"ImportError: cannot import name 'rendering' from 'gym.envs.classic_control' " , and it errors on line: env.render(mode='human').

Any thoughts on what might be causing the issue or how I could fix it?

Thanks!

Action space?

hello!
Thanks so much for your code. I am wondering what the difference is between actions 0-3 and actions 4-7. I found they have the same consequence in my test.

Feature Request: upgrade from gym to gymnasium

Hi, this repository is currently listed in the gymnasium third party environments but we are cleaning the list up to only include maintained gymnasium-compatible repositories.

Would it be possible for it to be upgraded from gym to gymnasium? Gymnasium is the maintained version of openai gym and is compatible with current RL training libraries (rllib and tianshou have already migrated, and stable-baselines3 will soon).

For information about upgrading and compatibility, see migration guide and gym compatibility. The main difference is the API has switched to returning truncated and terminated, rather than done, in order to give more information and mitigate edge case issues. The documentation explains how to easily convert your code.
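As a rough illustration of the API change described above, the sketch below converts an old-style (observation, reward, done, info) tuple into the five-value form with terminated and truncated. It assumes that a time-limit ending is flagged in info under the key "TimeLimit.truncated"; that key name and the helper itself are assumptions and may not match what this environment reports.

```python
def to_terminated_truncated(step_result, truncated_key="TimeLimit.truncated"):
    """Split the old done flag into terminated and truncated.

    truncated_key is an assumed info-dict key marking time-limit endings.
    """
    observation, reward, done, info = step_result
    hit_time_limit = bool(info.get(truncated_key, False))
    terminated = done and not hit_time_limit  # the task itself ended
    truncated = done and hit_time_limit       # cut off by a step limit
    return observation, reward, terminated, truncated, info
```

A training loop written for the new API would then stop an episode when either terminated or truncated is true.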

Success Rate?

Hello

I am amazed by your work. I am wondering if you tested the Sokoban game with standard RL methods (Q-learning, A2C, etc.), and if you have success rates for this kind of game?

check, if game is done

Hi,
could it be that the function "_check_if_done()" in the file sokoban_env.py does not consider self.max_steps when evaluating whether a game is done?
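For illustration, a done check that takes the step limit into account could look like the sketch below. The function and parameter names are hypothetical and this is not the code in sokoban_env.py.

```python
def check_if_done(boxes_on_target, num_boxes, num_env_steps, max_steps):
    """Episode ends when all boxes are placed or the step limit is hit."""
    all_boxes_on_target = boxes_on_target == num_boxes
    max_steps_reached = num_env_steps >= max_steps
    return all_boxes_on_target or max_steps_reached
```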

long import / build time

How can we make the package load and the environment start up quicker?

There are some nested for loops; could we potentially replace or parallelize them?

Can the optimal policy for Sokoban be retrieved from the reverse playing?

I am not really familiar with the code in this Github repository, in particular, with the code that generates the rooms. However, I was told that the optimal policy for each room can be generated from the "reverse playing" algorithm that is used to ensure that the rooms are solvable. So, is this true? If yes, what's the easiest way to do it? (Of course, if I get familiar with the code, I will be able to answer this question, but if you can immediately answer it, that would save me some time: I am looking for some environments where the optimal policy is known, so I was wondering if I could use your code to compute the optimal policy).

Cannot run the example

Context:

  1. Install gym-sokoban from the source:
    python install -e .

  2. Running the following code in a Jupyter notebook:

import gym 
import gym_sokoban

env = gym.make('Sokoban-v0')
#env.reset()

env.render(mode='human')

action = env.action_space.sample()
observation, reward, done, info = env.step(action)
  3. Error:
---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
Cell In[9], line 1
----> 1 env = gym.make('Sokoban-v0')
      2 #env.reset()
      4 env.render(mode='human')

File ~/miniconda3/envs/llama3/lib/python3.9/site-packages/gym/envs/registration.py:640, in make(id, max_episode_steps, autoreset, apply_api_compatibility, disable_env_checker, **kwargs)
    637     render_mode = None
    639 try:
--> 640     env = env_creator(**_kwargs)
    641 except TypeError as e:
    642     if (
    643         str(e).find("got an unexpected keyword argument 'render_mode'") >= 0
    644         and apply_human_rendering
    645     ):

File ~/miniconda3/envs/llama3/lib/python3.9/site-packages/gym_sokoban/envs/sokoban_env_variations.py:14, in __init__(self)

File ~/miniconda3/envs/llama3/lib/python3.9/site-packages/gym_sokoban/envs/sokoban_env.py:48, in SokobanEnv.__init__(self, dim_room, max_steps, num_boxes, num_gen_steps, reset)
     44 self.observation_space = Box(low=0, high=255, shape=(screen_height, screen_width, 3), dtype=np.uint8)
     46 if reset:
     47     # Initialize Room
---> 48     _ = self.reset()
...
    408 else:
    409     # Writing: check that the directory to write to does exist
    410     dn = os.path.dirname(fn)

FileNotFoundError: No such file: '/home/xy/miniconda3/envs/llama3/lib/python3.9/site-packages/gym_sokoban/envs/surface/box.png'
  4. After checking the site-packages folder, I found there is no gym_sokoban folder. There is only an egg-link file:
gym-sokoban.egg-link

I am not sure how I can fix the above issue and get it running.

cannot re-register id

If I try to run a script twice which imports "gym_sokoban", I get the error "cannot re-register id: Sokoban-v0"

I have to restart my kernel to be able to run it again.

Any suggestions?

Integrate boxoban level

Recently DeepMind published 1,000,000 pre-generated levels (Repo). We would love to integrate these levels as a new variation.
