Markov Decision Process (MDP): Solution by Exact and MaxEnt Methods

Stuart Truax, 2022-06

This repository has several implementations of solution methods for Markov decision processes in a gridworld.

The methods include:

  1. Exact methods:
     • Value iteration
     • Policy iteration
     • Linear programming
  2. Value iteration with maximum entropy (MaxEnt) regularization
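To make the difference between the exact and MaxEnt variants concrete, here is a minimal sketch of the two backups applied to a list of action values for a single state. It assumes the commonly used "soft" formulation, in which the hard maximum over actions is replaced by a temperature-scaled log-sum-exp; the source does not say which formulation the repository implements. The function names and the `temperature` parameter are hypothetical.

```python
import math


def hard_backup(q_values):
    """Standard Bellman backup: the state value is the best action value."""
    return max(q_values)


def soft_backup(q_values, temperature=1.0):
    """Assumed MaxEnt backup: temperature * log(sum(exp(Q / temperature)))."""
    m = max(q_values)  # subtract the max for numerical stability
    return m + temperature * math.log(
        sum(math.exp((q - m) / temperature) for q in q_values)
    )


def soft_policy(q_values, temperature=1.0):
    """Boltzmann policy: probabilities proportional to exp(Q / temperature)."""
    m = max(q_values)
    weights = [math.exp((q - m) / temperature) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]


q = [1.0, 2.0, 0.5]  # illustrative action values for one state
print(hard_backup(q))
print(soft_backup(q, temperature=1.0))
print(soft_backup(q, temperature=0.01))
print(soft_policy(q, temperature=1.0))
```

Under this assumption, the soft value is never smaller than the hard value and approaches it as the temperature goes to zero, while the resulting policy is stochastic rather than a single best action.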

A Markov decision process (MDP) is defined by:

  • $S$ : a set of states

  • $A$ : a set of actions

  • $H$ : a finite time horizon, with time steps $t = 0, \dots, H$

  • $T$ : $S \times A \times S \times \{0,\dots,H\} \rightarrow [0,1]$, a transition probability function $P(s_{t+1} = s' \mid s_t = s, a_t = a)$

  • $R$ : $S \times A \times S \times \{0,\dots,H\} \rightarrow \mathbb{R}$, a reward function $R(s,a,s')$

  • $\gamma$: a discount factor.
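The tuple above maps naturally onto a small container type. The following is a minimal sketch, not the repository's actual data structure: the class name `MDP`, its field names, the `validate` helper, and the two-state toy example are all hypothetical. It also assumes time-independent transitions and rewards, matching the way $P(s' \mid s, a)$ and $R(s,a,s')$ are written above.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

State = str
Action = str


@dataclass(frozen=True)
class MDP:
    states: List[State]                                         # S
    actions: List[Action]                                       # A
    horizon: int                                                # H
    transition: Dict[Tuple[State, Action], Dict[State, float]]  # (s, a) -> {s': P(s' | s, a)}
    reward: Callable[[State, Action, State], float]             # R(s, a, s')
    gamma: float                                                # discount factor

    def validate(self, tol: float = 1e-9) -> None:
        """Raise ValueError unless every P(. | s, a) is a probability distribution."""
        for s in self.states:
            for a in self.actions:
                probs = self.transition[(s, a)]
                if any(p < 0.0 for p in probs.values()):
                    raise ValueError(f"negative probability for {(s, a)}")
                if abs(sum(probs.values()) - 1.0) > tol:
                    raise ValueError(f"probabilities for {(s, a)} do not sum to 1")


# Hypothetical two-state example: "move" succeeds with probability 0.9.
toy = MDP(
    states=["A", "B"],
    actions=["stay", "move"],
    horizon=3,
    transition={
        ("A", "stay"): {"A": 1.0},
        ("A", "move"): {"A": 0.1, "B": 0.9},
        ("B", "stay"): {"B": 1.0},
        ("B", "move"): {"A": 0.9, "B": 0.1},
    },
    reward=lambda s, a, s_next: 1.0 if s_next == "B" else 0.0,
    gamma=0.95,
)
toy.validate()
```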

The desired result is an optimal policy $\pi^{*}: S \times \{0,\dots,H\} \rightarrow A$, which gives the optimal action $a^{*}$ to take in each state $s$ at each time step $t$.

Equivalently, the goal is to find a policy $\pi$ that maximizes the expected discounted return:

$$ \max_{\pi} \mathbb{E}\left[\sum_{t=0}^{H}\gamma^{t} R(s_{t},a_t,s_{t+1}) \,\middle|\, \pi\right] $$
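For a finite horizon, this objective can be maximized by backward induction, which is value iteration run from $t = H$ down to $t = 0$ with $V_{H+1}(s) = 0$ and $V_t(s) = \max_a \sum_{s'} P(s' \mid s, a)\,[R(s,a,s') + \gamma V_{t+1}(s')]$. The sketch below is illustrative only and is not taken from the repository: the function name `backward_induction`, the dictionary-based transition table, and the two-state toy problem are assumptions, and transitions and rewards are taken to be time-independent.

```python
def backward_induction(states, actions, P, R, gamma, H):
    """Return (values, policy), each indexed as [t][s] for t = 0..H.

    P maps (s, a) to a dict {s_next: probability}; R(s, a, s_next) is a float.
    """
    v_next = {s: 0.0 for s in states}  # V_{H+1}(s) = 0: nothing is earned after the horizon
    values, policy = {}, {}
    for t in range(H, -1, -1):
        v_t, pi_t = {}, {}
        for s in states:
            # Expected one-step reward plus discounted value of the successor state.
            q = {
                a: sum(
                    p * (R(s, a, s_next) + gamma * v_next[s_next])
                    for s_next, p in P[(s, a)].items()
                )
                for a in actions
            }
            best = max(q, key=q.get)
            v_t[s], pi_t[s] = q[best], best
        values[t], policy[t] = v_t, pi_t
        v_next = v_t
    return values, policy


# Hypothetical deterministic two-state problem: reward 1 for landing in "B".
states = ["A", "B"]
actions = ["stay", "move"]
P = {
    ("A", "stay"): {"A": 1.0},
    ("A", "move"): {"B": 1.0},
    ("B", "stay"): {"B": 1.0},
    ("B", "move"): {"A": 1.0},
}


def R(s, a, s_next):
    return 1.0 if s_next == "B" else 0.0


values, policy = backward_induction(states, actions, P, R, gamma=0.9, H=2)
print(values[0])
print(policy[0])
```

In this toy problem the best plan is to reach "B" and stay there, so `values[0]` equals $1 + 0.9 + 0.9^2 = 2.71$ in both states.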

Results

The solution methods are applied to a "gridworld": a discretized X-Y plane in which the voids and boundaries are impassable. The reward for being in a given state is indicated by its color, and rewards can be negative.
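As an illustration of how impassable voids and boundaries can enter the transition model, the sketch below builds deterministic moves on a tiny grid. The grid layout, the `#` marker for voids, the four action names, and the rule that a blocked move leaves the agent in place are all assumptions made for this example; the repository's actual gridworld may differ.

```python
# Hypothetical layout: "." is a free cell, "#" is a void.
GRID = [
    "..#",
    "...",
]

# Row/column offsets for each action.
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def is_free(grid, r, c):
    """True if (r, c) lies inside the grid and is not a void."""
    return 0 <= r < len(grid) and 0 <= c < len(grid[0]) and grid[r][c] != "#"


def next_state(grid, state, action):
    """Deterministic move; a blocked move leaves the agent where it is."""
    r, c = state
    dr, dc = MOVES[action]
    nr, nc = r + dr, c + dc
    return (nr, nc) if is_free(grid, nr, nc) else (r, c)


# The state set is every free cell of the grid.
grid_states = [
    (r, c) for r, row in enumerate(GRID) for c, ch in enumerate(row) if ch != "#"
]

print(grid_states)
print(next_state(GRID, (0, 0), "up"))     # blocked by the boundary
print(next_state(GRID, (0, 1), "right"))  # blocked by the void
print(next_state(GRID, (0, 0), "down"))   # free move
```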

Below is a comparison of the value and policy maps generated for the MDP gridworld by the exact solution methods:

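| Solution Method | Value Function of Solution | Policy Function of Solution |
|---|---|---|
| Linear programming | (figure not shown) | (figure not shown) |
| Value iteration | (figure not shown) | (figure not shown) |
| Policy iteration | (figure not shown) | (figure not shown) |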

The solution methods yield roughly the same trends in the value and policy functions.
