
prisonerdilemma's Introduction

Prisoner's dilemma : Machine Learning bots to solve the Prisoner's dilemma [help needed]

Python implementation of the Prisoner's dilemma with the classical strategies, and two machine learning agents: QLearning and DeepQLearning.

Why

I wanted to solve the Prisoner's dilemma using machine learning bots. Two agents are already available: QLearning and DeepQLearning (Deepmind). Both agents work on simple learning tasks:

  • learn how to play against a given strategy (titfortat, grim, ...).
  • learn to play against another bot in the non-iterated version of the game.
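To make the non-iterated task concrete, here is a minimal sketch, assuming a stateless (single-state) Q-learning update against an opponent that always plays the same move. It is a deliberate simplification and not the repository's QLearning agent: the function name `train_one_shot`, the hyperparameters, and the "C"/"D" encoding are all hypothetical choices made for illustration.

```python
import random

# Commonly used payoff values (reward to the learner only).
T, R, P, S = 5, 3, 1, 0
PAYOFF = {("C", "C"): R, ("C", "D"): S, ("D", "C"): T, ("D", "D"): P}


def train_one_shot(opponent_move, episodes=2000, alpha=0.1, epsilon=0.1, seed=0):
    """Stateless Q-learning for the one-shot game against a fixed opponent move.

    With no next state, the update reduces to Q(a) <- Q(a) + alpha * (r - Q(a)).
    All parameter values here are illustrative assumptions.
    """
    rng = random.Random(seed)
    q = {"C": 0.0, "D": 0.0}
    for _ in range(episodes):
        if rng.random() < epsilon:
            action = rng.choice(["C", "D"])  # explore
        else:
            action = max(q, key=q.get)       # exploit the current estimate
        reward = PAYOFF[(action, opponent_move)]
        q[action] += alpha * (reward - q[action])
    return q


for opponent in ("C", "D"):
    q_values = train_one_shot(opponent)
    print(opponent, max(q_values, key=q_values.get), q_values)
```

Because defecting pays more than cooperating whatever the opponent does in a single round, the learned values should favour "D" in both cases, which is why this version of the task is easy to learn.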

However, both agents (QLearning/DeepQLearning) still fail to converge on cooperation in the iterated version of the game. Both would score better if they learned to cooperate with each other, but they rarely do so. They frequently end up playing the "titfortat" strategy, but get trapped in a vicious circle where they alternate between "defect" and "cooperate".
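The alternating pattern can be reproduced without any learning at all. The sketch below assumes two plain tit-for-tat players where one of them opens with a defection; the names `tit_for_tat` and `play_echo` are hypothetical and not part of this library. It only illustrates how a single defection keeps echoing back and forth; it does not explain why the learning agents end up there.

```python
COOPERATE, DEFECT = "C", "D"


def tit_for_tat(opponent_history):
    """Cooperate on the first round, then copy the opponent's previous move."""
    return opponent_history[-1] if opponent_history else COOPERATE


def play_echo(rounds, first_move_a=DEFECT):
    """Let two tit-for-tat players meet; player A's opening move is forced."""
    history_a, history_b = [], []
    for turn in range(rounds):
        move_a = first_move_a if turn == 0 else tit_for_tat(history_b)
        move_b = tit_for_tat(history_a)
        history_a.append(move_a)
        history_b.append(move_b)
    return "".join(history_a), "".join(history_b)


moves_a, moves_b = play_echo(6)
print(moves_a)  # DCDCDC
print(moves_b)  # CDCDCD
```

If player A opens with "C" instead, both players cooperate on every round, so the cycle depends entirely on that one initial defection.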

Where to start

Installation

pip install numpy pandas

You can start with the Jupyter Notebook to see how to use the library and some example games, or check the test.py file for further example games.

TO DO
  • Make two machine learning bots cooperate in the iterated version of the game (and understand why they do not do so for now).

I would gladly accept your help on this issue. Thanks.

About the Prisoner's dilemma

Read more about the Prisoner's dilemma here

Two members of a criminal gang are arrested and imprisoned. 
Each prisoner is in solitary confinement with no means of speaking to 
or exchanging messages with the other.
The police admit they don't have enough evidence 
to convict the pair on the principal charge. 
They plan to sentence both to a year in prison on a lesser charge.
Simultaneously, the police offer each prisoner a Faustian bargain. 
Each prisoner is given the opportunity either to betray (defect) the other,
by testifying that the other committed the crime,
or to cooperate with the other by remaining silent. 
Here's how it goes:
- If A and B both betray each other
    each of them serves 2 years in prison
- If A betrays B but B remains silent
    A will be set free and B will serve 3 years in prison
- If A and B both remain silent
    both of them will only serve 1 year in prison

Parameters :

The commonly used values are T = 5, R = 3, P = 1, and S = 0. This is a positive valuation of the game: the fewer years in prison, the more points.

  • 3 years of prison => 0 points (S for Sucker)
  • 2 years of prison => 1 point (P for Penalty)
  • 1 year of prison => 3 points (R for Reward)
  • 0 years of prison => 5 points (T for Temptation)
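As a rough sketch, the valuation above can be written as a payoff table keyed by the two players' moves. The names `PAYOFFS`, `is_prisoners_dilemma` and `score` are hypothetical, and so is the "C"/"D" encoding; the library may represent this differently. The check `T > R > P > S` together with `2R > T + S` is the usual textbook condition for the iterated game and is not stated in this README.

```python
# Commonly used values: Temptation, Reward, Penalty, Sucker.
T, R, P, S = 5, 3, 1, 0

# (move of A, move of B) -> (points for A, points for B)
PAYOFFS = {
    ("C", "C"): (R, R),  # both stay silent: 1 year each
    ("C", "D"): (S, T),  # A stays silent, B betrays: 3 years for A, B goes free
    ("D", "C"): (T, S),  # A betrays, B stays silent: A goes free, 3 years for B
    ("D", "D"): (P, P),  # both betray: 2 years each
}


def is_prisoners_dilemma(t, r, p, s):
    """Textbook conditions: T > R > P > S, and 2R > T + S for the iterated game."""
    return t > r > p > s and 2 * r > t + s


def score(moves_a, moves_b):
    """Sum the points of both players over a sequence of simultaneous moves."""
    total_a = total_b = 0
    for move_a, move_b in zip(moves_a, moves_b):
        points_a, points_b = PAYOFFS[(move_a, move_b)]
        total_a += points_a
        total_b += points_b
    return total_a, total_b


print(is_prisoners_dilemma(T, R, P, S))  # True
print(score("CCCCCC", "CCCCCC"))         # (18, 18)
print(score("DCDCDC", "CDCDCD"))         # (15, 15)
```

Over six rounds, two players who take turns exploiting each other earn 15 points each, less than the 18 points each from steady cooperation. That gap is what the second condition, `2R > T + S`, guarantees.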

prisonerdilemma's People

Contributors

benderv
