Git Product home page Git Product logo

pomdp_solver's Introduction

The purpose of this project is to create a more efficient means for a solution for Partially Observable Markov Decision Processes (POMDPs) .We wish to find the existence of a best possible solution if it exists otherwise we wish to provide an excellent approximation for the same.Potential applications inlcude a better quantification of uncertainty commonly encountered in Reinforcement Learning Applications ranging from co-operative usage in drones to designing better agents for more interactive gaming to more dynamic environments such as Weather Modelling and disease prediction.

One of the crucial issues that has ensured a very low adaptation of POMDPs despite their wide range of applicability involves the issue of scalability to real-world systems with an enormous range of uncertain states and observations.

We wish to tackle this process steadily by increasing the complexity of our testing environments as a step by step process.Therefore we choose to begin with what is arguably the classical training ground for POMDPs (analogous to the knapsack problem in Dynamic Problem or one of the standard datasets used extensively in ML research )

The Tiger Problem which is our training ground broadly involves the following situations:

A tiger is put with equal probability behind one of two doors, while treasure is put behind the other one. You are standing in front of the two closed doors and need to decide which one to open. If you open the door with the tiger, you will get hurt by the tiger (negative reward), but if you open the door with the treasure, you receive a positive reward. Instead of opening a door right away, you also have the option to wait and listen for tiger noises. But listening is neither free nor entirely accurate. You might hear the tiger behind the left door while it is actually behind the right door and vice versa.

The states of the system are tiger behind the left door (tiger-left) and tiger behind the right door (tiger-right).

Available actions are: open the left door (open-left), open the right door (open-right) or to listen (listen).

Rewards associated with these actions depend on the resulting state: +10 for opening the correct door (the door with treasure), -100 for opening the door with the tiger. A reward of -1 is the cost of listening.

As a result of listening, there are two observations: either you hear the tiger on the right (tiger-right), or you hear it on the left (tiger-left).

Now the tiger problem has already been solved by various methods and one standard method used today extensively in a wide range of situations is the Point Based Value Iteration.We shall take the performance of this algorithm as the benchmark in our journey to find a possibly better algorithm

Our implementation is python based and we base it on the excellent interface and implementation given by Malayandi at https://github.com/malayandi/Tiger-Problem-POMDP

The first approach that we are going to use is to model the POMDP as a hidden markov model acting on a point process .We choose to do this for the following reasons: 1.The approximation of POMDP as a hidden markov model has proved to pretty useful to researchers in the past 2 The point based decomposition used in Point Based Value Iteration inspires the usage of a point process moreover considering the dependence on the past histories we prefer a self exciting or Hawkes Process

pomdp_solver's People

Contributors

rsn870 avatar

Watchers

James Cloos avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.