

Multiagent-RL

Introduction

This repository contains the code used in the undergraduate thesis in Mechatronics Engineering at the University of Brasilia, entitled "Behavior selection for multiple autonomous agents with reinforcement learning in stochastic environments" (Portuguese only).

The idea is to have multiple simulated robotic agents learning to select appropriate behaviors in a stochastic environment. The uncertainty of a state is handled through Bayesian Programming, and the agents learn by applying Q-learning with function approximation.

Currently, the approach is tested on a predator-prey problem using a modified version of the Pac-Man game with added uncertainties. This simplified multi-agent setting aims to answer the following question: can the ghosts learn to catch Pac-Man?

Installation

The Pac-Man AI Projects provide six Pac-Man-like simulators that are free to use for educational purposes. The one we use is Project 5: Classification, which provides an arena mimicking the complete Pac-Man game, including various ghosts.

This project requires a few Python packages, installed as described below.

GNU/Unix

This assumes a GNU/Unix distribution (Ubuntu), but everything is in Python, so the setup shouldn't be too different on other platforms.

Install by running the following commands.

sudo apt-get install python python-dev python-pip python-tk libzmq-dev python-matplotlib
sudo pip install pyzmq

Mac OS X

Installing on OS X requires a special setup using Homebrew and Xcode.

Xcode can be installed from the App Store. Then run the following commands.

ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"
brew install python --with-tcl-tk --enable-threads --with-x11
pip install matplotlib numpy pyzmq

Documentation

All documentation is kept in our wiki, with usage examples and module explanations.

Publications

Contributors

gnramos, matheusportela, skalwalker


Issues

Is normalization interfering with learning?

The weights vector determines the importance of each feature in the estimated value of the current state. However, by normalizing it per action, we actually change each feature's importance relative to the other actions.

Possibilities:

  • Remove normalization altogether.
  • Normalize all weights instead of only the weights of one action (see the sketch below).
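
For illustration, a minimal sketch of the second option, assuming the action -> feature -> weight structure proposed in the next issue; the function name is hypothetical:

import math

# Hypothetical: scale every weight by the global L2 norm, so the relative
# importance between actions is preserved instead of normalizing per action.
def normalize_all(weights):
    # weights: action -> feature -> weight
    norm = math.sqrt(sum(w ** 2 for feats in weights.values() for w in feats.values()))
    if norm == 0:
        return weights
    return {a: {f: w / norm for f, w in feats.items()}
            for a, feats in weights.items()}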

Weights structure must map actions to features and then to weights.

Currently, weights is a dictionary that maps each action (or behavior) to an array of weights, one per feature, i.e. action -> [feature].

Instead, it must map an action to a feature and then to the weight, i.e. action -> feature -> weight. Then, should the feature order change between the game where the policy was learned and the one where it is loaded, each action will still map correctly to its weights.
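
For illustration, a minimal sketch of the proposed structure; the feature names and the helpers get_weight and set_weight are hypothetical:

# Hypothetical: weights keyed by action and then by feature name, so the
# mapping survives changes in feature ordering between games.
weights = {
    'North': {'ghost_distance': 0.3, 'food_distance': -1.2},
    'South': {'ghost_distance': 0.1, 'food_distance': -0.8},
}

def get_weight(weights, action, feature):
    # Default to 0.0 for (action, feature) pairs never seen before.
    return weights.get(action, {}).get(feature, 0.0)

def set_weight(weights, action, feature, value):
    weights.setdefault(action, {})[feature] = value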

Change ghost rewards.

Currently, ghosts receive the negative of Pac-Man's reward. This means that at each game step in which Pac-Man does nothing, each ghost receives +1 reward, so the ghosts can simply wait while Pac-Man wanders around and penalizes itself.

It would be better for the ghosts to receive the following rewards (see the sketch after this list):

  • +500: Capturing the Pacman
  • -500: Getting captured
  • -1: Otherwise
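
A minimal sketch of this reward scheme; the constant and function names are hypothetical, not the simulator's API:

CAPTURE_REWARD = 500     # ghost catches Pac-Man
CAPTURED_PENALTY = -500  # ghost is eaten by Pac-Man
STEP_PENALTY = -1        # every other step, so waiting is no longer profitable

def ghost_reward(captured_pacman, was_captured):
    # Reward for a single ghost at one game step.
    if captured_pacman:
        return CAPTURE_REWARD
    if was_captured:
        return CAPTURED_PENALTY
    return STEP_PENALTY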

Run tests in different conditions.

Testing under different conditions lets us understand how well policies learned with the same algorithm perform in different scenarios. We must test from Pac-Man alone up to 4 ghosts on the same map.

Communicate food position.

Currently, the simulator does not communicate any food positions. Communicating them is desirable, since food influences the game score.

Unify all output files.

Currently, three files are generated during a simulation: learn_scores.txt, test_scores.txt, and pacman_policy.txt. These files should be unified into a single file, such as the one below (a parsing sketch follows the example).

[learn_scores]
10.1
1.1
-25.3
[test_scores]
0.0
1500.2
[pacman_policy]
0.1,-20.5,30.2
2,70,-100
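
A minimal parsing sketch for such a section-based file; the function name is hypothetical:

def parse_results(path):
    # Sections are introduced by [name] headers; each following non-empty
    # line belongs to the most recent section.
    sections = {}
    current = None
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            if line.startswith('[') and line.endswith(']'):
                current = line[1:-1]
                sections[current] = []
            elif current is not None:
                sections[current].append(line)
    return sections

# Example: sections['learn_scores'] would be ['10.1', '1.1', '-25.3'].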

Check why Pacman gets stuck in corners.

By visual inspection, Pac-Man sometimes stops at a corner and takes a while to escape from it. This behavior may be due to an incorrect implementation of flee_behavior or to a sequence of Stop actions.

Refactor out features model.

Currently, all features are implemented in the agents module, which is the wrong place for them. Let's create a new module for them.

BFS paths are recalculated at each game step.

After turning walls into a property, paths are recalculated whenever the walls change. However, the walls arrive at every game step in the message sent by the simulator, so the MAS recomputes the paths every step. This severely degrades system performance.
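
One possible fix, sketched below under the assumption that wall layouts can be compared for equality (the class and method names are hypothetical): cache the paths and only rerun BFS when the walls actually differ from the previous step.

class PathCache(object):
    def __init__(self):
        self._walls = None
        self._paths = None

    def paths_for(self, walls, compute_paths):
        # Recompute only when the wall layout differs from the cached one.
        if walls != self._walls:
            self._walls = walls
            self._paths = compute_paths(walls)  # expensive BFS over the maze
        return self._paths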

Remove unnecessary files

Several scripts, modules, and packages were used for testing purposes but are still committed to GitHub, such as example_predator_prey.py, bayes, and references. These files must be removed from the project.

Stop learning when testing the agent.

In testing mode, the exploration rate drops to 0; however, the agent keeps using rewards to learn. We must change this so that, during test mode, agents no longer learn.
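
A minimal sketch of this change (class and method names are hypothetical): gate every learning update behind a flag that is switched off in test mode.

class LearningAgent(object):
    def __init__(self):
        self.is_learning = True  # set to False when entering test mode

    def observe(self, state, action, reward, next_state):
        # In test mode the reward is ignored and the policy stays fixed.
        if not self.is_learning:
            return
        self.update(state, action, reward, next_state)

    def update(self, state, action, reward, next_state):
        pass  # existing Q-learning update would go here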

Agent policy restarts at each game.

Since an INIT message destroys the current agents and builds new ones, it also destroys the learned policy. Hence, in every game a new agent plays without any previous knowledge.

Solution: separate INIT (agent creation) from START (game beginning). Then, INIT happens once in the simulation.py script, whereas START happens at the beginning of each game.
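
A minimal sketch of this separation (message fields and function names are hypothetical): INIT builds the agents once, START only resets per-game state, so learned weights survive between games.

def handle_message(msg, agents, create_agent):
    if msg['type'] == 'INIT':     # sent once, from the simulation.py script
        agents.clear()
        for agent_id in msg['agent_ids']:
            agents[agent_id] = create_agent(agent_id)
    elif msg['type'] == 'START':  # sent at the beginning of every game
        for agent in agents.values():
            agent.reset_game_state()  # per-game reset only; the policy is kept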

Tutorial

One of the main goals is that this code be reused, so a tutorial with lots of examples is required.

  1. How does this work again? A simple explanation of how to use the system and how it works in broad terms (how the systems communicate).
  2. Wait, what? A simple explanation of how this code is organized and which "conceptual blocks" (adapter system, controller system) are associated with which code files.
  3. It's simple, we kill the Pac-Man. A simple explanation of how to create and include a new ghost agent (say, one whose only action is to turn right at corners).
  4. I know Kung Fu. A simple tutorial on how to use a new learning method for the agent (different from Q-learning). It assumes, of course, that the method is already implemented; it just needs to be incorporated.
  5. A Brave New World. A simple tutorial on how to use a different simulator (like Platformer AI).
  6. Hasta la vista, Baby. Ideas on how to adapt the system to communicate through ROS.

Plot histogram.

It is useful to plot a histogram of the data collected after running several trials that generated different policies. This statistic may show whether most policies perform well or underperform.
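
A minimal plotting sketch using matplotlib, which is already a project dependency; the function name and labels are hypothetical:

import matplotlib.pyplot as plt

def plot_score_histogram(scores, bins=20):
    # One score per learned policy, e.g. the average test score of each trial.
    plt.hist(scores, bins=bins)
    plt.xlabel('Final game score')
    plt.ylabel('Number of policies')
    plt.title('Performance distribution across learned policies')
    plt.show()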

Implement Q-learning with function approximation.

Currently, agents can't learn proper policies because the state space is quite large. Therefore, it is quite difficult for an agent to reach the same state over and over again (a condition necessary for proper learning).

Q-learning with function approximation fits this scenario better and should give the agents usable learning capabilities.
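
For reference, a minimal sketch of Q-learning with linear function approximation, where Q(s, a) is a weighted sum of feature values and the weights are updated from the temporal-difference error. This is a generic textbook form, not the project's actual implementation:

from collections import defaultdict

# weights: action -> feature -> weight, as proposed in the weights-structure issue
weights = defaultdict(dict)

def q_value(features, action):
    # Q(s, a) as a linear combination of the state's feature values.
    return sum(weights[action].get(f, 0.0) * v for f, v in features.items())

def q_update(features, action, reward, next_features, legal_actions,
             alpha=0.1, gamma=0.9):
    # One update step: w <- w + alpha * delta * feature_value.
    next_q = max(q_value(next_features, a) for a in legal_actions)
    delta = reward + gamma * next_q - q_value(features, action)
    for f, v in features.items():
        weights[action][f] = weights[action].get(f, 0.0) + alpha * delta * v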

Learning agent that can finish the game.

I'm having some trouble creating an agent that can win the game while taking the ghosts into account. Thus, let's take a step back and simplify things.

  • Remove all ghosts from the layout
  • Implement intelligence that allows the Pacman to finish the game by eating all food
