
ergo_mdp

Ergodic economics simulations using MDP formalisms

This needs a bit of work; it's still broken.

"...Apparently so, but suppose you throw a coin enough times... suppose one day, it lands on its edge."

Legacy of Kain: Soul Reaver II

Episodic MDPs, unlike their non-episodic counterparts, have proven ergodic properties

Huang, Bojun. "Steady State Analysis of Episodic Reinforcement Learning." Advances in Neural Information Processing Systems 33 (2020).

Peters, Ole. "The ergodicity problem in economics." Nature Physics 15.12 (2019): 1216-1221.

Moldovan, Teodor Mihai, and Pieter Abbeel. "Safe exploration in Markov decision processes." Proceedings of the 29th International Conference on Machine Learning. 2012.

\lim_{T \to \infty} \frac{1}{T}\,\mathbb{E}\left[\sum_{t = 1}^{T} R(s_t,a_t)\right] = V_\pi(s_0)

x' = \begin{cases} x + 0.5x, & p = \frac{1}{2} \\ x - 0.4x, & p = \frac{1}{2} \end{cases}
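
To see why this gamble is a problem, here is a minimal simulation sketch in plain Python (independent of the actual code in this repository): the ensemble average grows by 5% per round, while the time-average growth rate of any single trajectory is about -5% per round, so an individual player almost surely goes broke.

```python
import math
import random

def time_average_growth(rounds=10_000, x0=1.0, seed=0):
    """Simulate one player of the +50%/-40% coin game and return the
    per-round geometric (time-average) growth factor of their wealth."""
    rng = random.Random(seed)
    log_x = math.log(x0)
    for _ in range(rounds):
        # Win: wealth * 1.5, lose: wealth * 0.6, each with probability 1/2.
        log_x += math.log(1.5) if rng.random() < 0.5 else math.log(0.6)
    return math.exp((log_x - math.log(x0)) / rounds)

# Ensemble average per round: 0.5 * 1.5 + 0.5 * 0.6 = 1.05   (wealth "should" grow 5%)
# Time average per round:     sqrt(1.5 * 0.6)       ~ 0.95   (wealth actually shrinks)
print("expected growth factor:", 0.5 * 1.5 + 0.5 * 0.6)
print("simulated time-average growth factor:", time_average_growth())
```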


Taleb's take on this

\begin{aligned}
R\left((x,win),null\right) &= 0.5x \\
R\left((x,lose),null\right) &= -0.4x \\
R\left((x,choose),stop\right) &= 0 \\
P\left((x,win) \mid (x,choose),play\right) &= 0.5 \\
P\left((x,lose) \mid (x,choose),play\right) &= 0.5 \\
P\left((x,stopped) \mid (x,choose),stop\right) &= 1 \\
P\left((x+0.5x,choose) \mid (x,win),null\right) &= 1 \\
P\left((x-0.4x,choose) \mid (x,lose),null\right) &= 1
\end{aligned}
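
A minimal sketch of the same MDP written out in Python (the function names and the (wealth, phase) state encoding are illustrative choices, not this repository's API):

```python
def reward(state, action):
    """Reward for the coin-game MDP; a state is a (wealth, phase) pair."""
    x, phase = state
    if phase == "win":
        return 0.5 * x
    if phase == "lose":
        return -0.4 * x
    return 0.0  # choosing, stopping and being stopped yield no immediate reward

def transitions(state, action):
    """Return a list of (next_state, probability) pairs."""
    x, phase = state
    if phase == "choose" and action == "play":
        return [((x, "win"), 0.5), ((x, "lose"), 0.5)]
    if phase == "choose" and action == "stop":
        return [((x, "stopped"), 1.0)]
    if phase == "win":        # only action is "null"
        return [((x + 0.5 * x, "choose"), 1.0)]
    if phase == "lose":       # only action is "null"
        return [((x - 0.4 * x, "choose"), 1.0)]
    return [(state, 1.0)]     # "stopped" is absorbing
```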

I would argue that most MDPs of interest are clearly non-ergodic. An MDP combined with a stochastic policy \pi is ergodic if every deterministic policy results in a Markov Reward Process that is ergodic. Almost all RL algorithms assume ergodicity. Value Iteration, the prime example, simply passes rewards back through the state space as expectations over next states (see the sketch below).
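
A generic sketch of that backup (the state, action and transition arguments are placeholders for any finite MDP, not code from this repository); note that every update is an expectation over next states, i.e. an ensemble average:

```python
def value_iteration(states, actions, transitions, reward, gamma=0.99, tol=1e-8):
    """Plain value iteration over a finite MDP (in-place updates).

    `actions(s)` lists the actions available in s and `transitions(s, a)`
    returns (next_state, probability) pairs, as in the sketch above.
    """
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            # The backup averages over next states -- an ensemble average,
            # which is exactly where the implicit ergodicity assumption enters.
            best = max(
                reward(s, a) + gamma * sum(p * V[s2] for s2, p in transitions(s, a))
                for a in actions(s)
            )
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            return V
```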

Equivalently, we can say that an agent with a stochastic policy should be able to visit all states. The problem with assuming ergodicity is that it makes agents overoptimistic, as it implicitly assumes that every kind of error and bad luck is eventually recoverable. If I train as if ergodicity holds, a 99% chance of losing everything versus a 1% chance of winning big will average out, and an agent might actually go for the high payout.
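
To make that concrete with made-up numbers: suppose "winning big" multiplies wealth by 200 and "losing everything" sets it to zero. The ensemble average still says the bet is worth taking:

\mathbb{E}[x'] = 0.99 \cdot 0 + 0.01 \cdot 200x = 2x > x

An expectation-maximising agent therefore accepts the gamble, even though 99% of its trajectories end in ruin.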

To work around the lack of ergodicity, we make absorbing states extremely unrewarding. If you break your little toy helicopter, you get a massive negative reward. The penalty has to be big enough that, faced with the choice between getting further away on average and breaking down every so often, the agent considers breaking down every so often unacceptable.
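
That workaround looks roughly like the sketch below; the penalty value and the `is_absorbing` predicate are placeholders, not anything defined in this repository:

```python
CRASH_PENALTY = -1e6  # must dwarf anything the agent could gain by taking the risk

def shaped_reward(state, action, base_reward, is_absorbing):
    """Wrap a reward function so that landing in an absorbing failure state
    (a broken helicopter, a bankrupt player) is catastrophically penalised."""
    r = base_reward(state, action)
    if is_absorbing(state):
        r += CRASH_PENALTY
    return r
```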

Generally, until now, tinkering with the reward function has been considered enough. The agent learns to avoid those absorbing states, so that, eventually, the ergodic property is reclaimed.

The problem with this approach is that it is not trivial to design these arbitrary reward functions.

[Figure: Percentages of winners and losers]

[Figure: Wealth of winners and losers]

[Figure: Percentages of winners and losers]

[Figure: Wealth of winners and losers]

[Figure: Tree]

Well, the model is bonkers. The vast majority of the population goes broke; the probability of being extremely wealthy becomes smaller and smaller (while the few who are wealthy grow wealthier as time moves forward). At the very end, because you cannot subdivide an individual into fewer than one point and hand them infinite wealth, the whole wealth model collapses.
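
The kind of experiment behind the plots above can be reproduced with a short sketch like the following (again independent of the repository's own simulation code): simulate a population playing the coin game and compare how many individuals end up ahead with what the mean wealth suggests.

```python
import random

def simulate_population(n_agents=10_000, rounds=100, x0=1.0, seed=0):
    rng = random.Random(seed)
    wealth = []
    for _ in range(n_agents):
        x = x0
        for _ in range(rounds):
            x *= 1.5 if rng.random() < 0.5 else 0.6  # the +50%/-40% coin game
        wealth.append(x)
    winners = sum(w > x0 for w in wealth)
    wealth.sort()
    print(f"agents ahead of their starting wealth: {100 * winners / n_agents:.1f}%")
    print(f"mean wealth: {sum(wealth) / n_agents:.2f}, median wealth: {wealth[n_agents // 2]:.4f}")

simulate_population()
```

Only a small minority come out ahead, yet the mean is dragged upward by a handful of extreme winners, which is what the plots show.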

So what is the

