

Multiagent-RL

Introduction

This repository contains the code used in the undergraduate thesis in Mechatronics Engineering at the University of Brasilia, entitled "Behavior selection for multiple autonomous agents with reinforcement learning in stochastic environments" (Portuguese only).

The idea is to have multiple simulated robotic agents learning to select appropriate behaviors in a stochastic environment. The uncertainty of a state is handled through Bayesian Programming, and the agents learn by applying Q-learning with function approximation.

Currently, the approach is tested on a predator-prey problem using a modified version of the Pac-Man game with added uncertainties. This simplified multi-agent setting aims to answer the following question: can the ghosts learn to catch Pac-Man?

Installation

The Pac-Man AI Projects provide six Pac-Man-like simulators that are free to use for educational purposes. The one we use is Project 5: Classification, which provides an arena mimicking the complete Pac-Man game, including various ghosts.

This project requires a few Python packages, installed as described below.

GNU/Unix

This assumes a GNU/Unix distribution (Ubuntu), but everything is in Python, so the setup shouldn't be too different on other platforms.

Install by running the following commands.

sudo apt-get install python python-dev python-pip python-tk libzmq-dev python-matplotlib
sudo pip install pyzmq

Mac OS X

Installing on OS X requires a special setup using Homebrew and Xcode.

Xcode can be installed from the App Store. Then run the following commands.

ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"
brew install python --with-tcl-tk --enable-threads --with-x11
pip install matplotlib numpy pyzmq

Documentation

All documentation is kept in our wiki, with usage examples and module explanations.

Publications

Contributors

gnramos, matheusportela, skalwalker


Issues

Is normalization interfering with learning?

The weights vector determines the importance of each feature in the estimated value of the current state. However, by normalizing it per action, we actually change each feature's importance relative to the other actions.

Possibilities:

  • Remove normalization altogether.
  • Normalize all weights instead of only the weights of one action (see the sketch below).
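
For illustration, a minimal sketch of the second option, assuming the action -> feature -> weight structure proposed in the next issue; the function name is hypothetical:

import math

# Hypothetical: scale every weight by the global L2 norm, so the relative
# importance between actions is preserved instead of normalizing per action.
def normalize_all(weights):
    # weights: action -> feature -> weight
    norm = math.sqrt(sum(w ** 2 for feats in weights.values() for w in feats.values()))
    if norm == 0:
        return weights
    return {a: {f: w / norm for f, w in feats.items()}
            for a, feats in weights.items()}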

Weights structure must map actions to features and then to weights.

Currently, weights is a dictionary that maps each action (or behavior) to an array of weights, one per feature, i.e. action -> [feature].

Instead, it must map an action to a feature and then to the weight, i.e. action -> feature -> weight. Then, should the feature order change between the game where the policy was learned and the one where it is loaded, each action will still map correctly to its weights.
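
For illustration, a minimal sketch of the proposed structure; the feature names and the helpers get_weight and set_weight are hypothetical:

# Hypothetical: weights keyed by action and then by feature name, so the
# mapping survives changes in feature ordering between games.
weights = {
    'North': {'ghost_distance': 0.3, 'food_distance': -1.2},
    'South': {'ghost_distance': 0.1, 'food_distance': -0.8},
}

def get_weight(weights, action, feature):
    # Default to 0.0 for (action, feature) pairs never seen before.
    return weights.get(action, {}).get(feature, 0.0)

def set_weight(weights, action, feature, value):
    weights.setdefault(action, {})[feature] = value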

Change ghost rewards.

Currently, ghosts receive the negative of Pac-Man's reward. This means that at each game step in which Pac-Man does nothing, each ghost receives +1 reward, so the ghosts can simply wait while Pac-Man wanders around and penalizes itself.

It would be better for the ghosts to receive the following rewards (see the sketch after this list):

  • +500: Capturing the Pacman
  • -500: Getting captured
  • -1: Otherwise
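
A minimal sketch of this reward scheme; the constant and function names are hypothetical, not the simulator's API:

CAPTURE_REWARD = 500     # ghost catches Pac-Man
CAPTURED_PENALTY = -500  # ghost is eaten by Pac-Man
STEP_PENALTY = -1        # every other step, so waiting is no longer profitable

def ghost_reward(captured_pacman, was_captured):
    # Reward for a single ghost at one game step.
    if captured_pacman:
        return CAPTURE_REWARD
    if was_captured:
        return CAPTURED_PENALTY
    return STEP_PENALTY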

Run tests in different conditions.

Testing under different conditions lets us understand how well policies learned with the same algorithm perform in different scenarios. We must test from Pac-Man alone up to 4 ghosts on the same map.

Communicate food position.

Currently, the simulator does not communicate any food positions. Communicating them is desirable, since food influences the game score.

Unify all output files.

Currently, three files are generated during a simulation: learn_scores.txt, test_scores.txt, and pacman_policy.txt. These files should be unified into a single file, such as the one below (a parsing sketch follows the example).

[learn_scores]
10.1
1.1
-25.3
[test_scores]
0.0
1500.2
[pacman_policy]
0.1,-20.5,30.2
2,70,-100
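
A minimal parsing sketch for such a section-based file; the function name is hypothetical:

def parse_results(path):
    # Sections are introduced by [name] headers; each following non-empty
    # line belongs to the most recent section.
    sections = {}
    current = None
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            if line.startswith('[') and line.endswith(']'):
                current = line[1:-1]
                sections[current] = []
            elif current is not None:
                sections[current].append(line)
    return sections

# Example: sections['learn_scores'] would be ['10.1', '1.1', '-25.3'].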

Check why Pacman gets stuck in corners.

By visual inspection, Pac-Man sometimes stops at a corner and takes a while to escape from it. This behavior may be due to an incorrect implementation of flee_behavior or to a sequence of Stop actions.

Refactor out features model.

Currently, all features are implemented in the agents module, which is the wrong place for them. Let's create a new module for them.

BFS paths are recalculated at each game step.

After turning walls into a property, paths are recalculated whenever the walls change. However, the walls arrive at every game step in the message sent by the simulator, so the MAS recomputes the paths every step. This severely degrades system performance.
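
One possible fix, sketched below under the assumption that wall layouts can be compared for equality (the class and method names are hypothetical): cache the paths and only rerun BFS when the walls actually differ from the previous step.

class PathCache(object):
    def __init__(self):
        self._walls = None
        self._paths = None

    def paths_for(self, walls, compute_paths):
        # Recompute only when the wall layout differs from the cached one.
        if walls != self._walls:
            self._walls = walls
            self._paths = compute_paths(walls)  # expensive BFS over the maze
        return self._paths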

Remove unnecessary files

Several scripts, modules, and packages were used for testing purposes but are still committed to GitHub, such as example_predator_prey.py, bayes, and references. These files must be removed from the project.

Stop learning when testing the agent.

In testing mode, the exploration rate drops to 0; however, the agent keeps using rewards to learn. We must change this so that, during test mode, agents no longer learn.
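
A minimal sketch of this change (class and method names are hypothetical): gate every learning update behind a flag that is switched off in test mode.

class LearningAgent(object):
    def __init__(self):
        self.is_learning = True  # set to False when entering test mode

    def observe(self, state, action, reward, next_state):
        # In test mode the reward is ignored and the policy stays fixed.
        if not self.is_learning:
            return
        self.update(state, action, reward, next_state)

    def update(self, state, action, reward, next_state):
        pass  # existing Q-learning update would go here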

Agent policy restarts at each game.

Since an INIT message destroys the current agents and builds new ones, it also destroys the learned policy. Hence, in every game a new agent plays without any previous knowledge.

Solution: separate INIT (agent creation) from START (game beginning). Then, INIT happens once in the simulation.py script, whereas START happens at the beginning of each game.
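
A minimal sketch of this separation (message fields and function names are hypothetical): INIT builds the agents once, START only resets per-game state, so learned weights survive between games.

def handle_message(msg, agents, create_agent):
    if msg['type'] == 'INIT':     # sent once, from the simulation.py script
        agents.clear()
        for agent_id in msg['agent_ids']:
            agents[agent_id] = create_agent(agent_id)
    elif msg['type'] == 'START':  # sent at the beginning of every game
        for agent in agents.values():
            agent.reset_game_state()  # per-game reset only; the policy is kept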

Tutorial

One of the main goals is that this code be reused, so a tutorial with lots of examples is required.

  1. How does this work again? A simple explanation of how to use the system and how it works in broad terms (how the systems communicate).
  2. Wait, what? A simple explanation of how this code is organized and which "conceptual blocks" (adapter system, controller system) are associated with which code files.
  3. It's simple, we kill the Pac-Man. A simple explanation of how to create and include a new ghost agent (say, one whose only action is to turn right at corners).
  4. I know Kung Fu. A simple tutorial on how to use a new learning method for the agent (different from Q-learning). It assumes, of course, that the method is already implemented; it just needs to be incorporated.
  5. A Brave New World. A simple tutorial on how to use a different simulator (like Platformer AI).
  6. Hasta la vista, Baby. Ideas on how to adapt the system to communicate through ROS.

Plot histogram.

It is useful to plot a histogram of the data collected after running several trials that generated different policies. This statistic may show whether most policies perform well or underperform.
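
A minimal plotting sketch using matplotlib, which is already a project dependency; the function name and labels are hypothetical:

import matplotlib.pyplot as plt

def plot_score_histogram(scores, bins=20):
    # One score per learned policy, e.g. the average test score of each trial.
    plt.hist(scores, bins=bins)
    plt.xlabel('Final game score')
    plt.ylabel('Number of policies')
    plt.title('Performance distribution across learned policies')
    plt.show()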

Implement Q-learning with function approximation.

Currently, agents can't learn proper policies because the state space is quite large. Therefore, it is quite difficult for an agent to reach the same state over and over again (a condition necessary for proper learning).

Q-learning with function approximation fits this scenario better and should give the agents usable learning capabilities.
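
For reference, a minimal sketch of Q-learning with linear function approximation, where Q(s, a) is a weighted sum of feature values and the weights are updated from the temporal-difference error. This is a generic textbook form, not the project's actual implementation:

from collections import defaultdict

# weights: action -> feature -> weight, as proposed in the weights-structure issue
weights = defaultdict(dict)

def q_value(features, action):
    # Q(s, a) as a linear combination of the state's feature values.
    return sum(weights[action].get(f, 0.0) * v for f, v in features.items())

def q_update(features, action, reward, next_features, legal_actions,
             alpha=0.1, gamma=0.9):
    # One update step: w <- w + alpha * delta * feature_value.
    next_q = max(q_value(next_features, a) for a in legal_actions)
    delta = reward + gamma * next_q - q_value(features, action)
    for f, v in features.items():
        weights[action][f] = weights[action].get(f, 0.0) + alpha * delta * v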

Learning agent that can finish the game.

I'm having some trouble creating an agent that can win the game while taking the ghosts into account. Thus, let's take a step back and simplify things.

  • Remove all ghosts from the layout
  • Implement intelligence that allows the Pacman to finish the game by eating all food
