
Comments (2)

Atan03 commented on July 1, 2024

Thank you for your comprehensive explanation! It has been immensely helpful to me. I now have a solid understanding of the formula. Following your guidance, I successfully implemented OGDA on NFGs and achieved convergence. I am now closing this issue.


ryan-dorazio commented on July 1, 2024

Hi @Atan03! I'm glad you are using my code and am happy to answer your question.

> It seems that setting $\alpha =0$ in MMD causes it to degenerate into MD.

Exactly! MMD with $\alpha =0$ becomes mirror descent-ascent with dilated entropy.
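Just to spell that out (roughly, and with notation I'm introducing here: $g_t$ the current gradient, $\eta$ the stepsize, $\psi$ the dgf, and $B_\psi$ its Bregman divergence), the MMD step can be written as

$$ x_{t+1} = \underset{x \in \mathcal{X}}{\arg\min}\; \eta \langle g_t, x\rangle + \eta\alpha\,\psi(x) + B_\psi(x; x_t), $$

so with $\alpha = 0$ the middle regularization term drops out and you are left with the standard mirror descent step.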

> do I only need to replace the term involving negative entropy in MMD with L2-norm and replace the softmax with projection onto the probability simplex?

It is true that you would need to change both the negative entropy and softmax terms, but you would also need to change the gradient computation, specifically the part you pointed out with regards to num_children. The subtraction of num_children is a trick that is only valid for dilated entropy, not for dilation with the l2 norm.
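If it helps, here is a minimal, untested sketch of the Euclidean projection onto the probability simplex that would take the place of the softmax (the function name `project_to_simplex` is just illustrative, not something from the OpenSpiel code):

```python
import numpy as np


def project_to_simplex(v):
  """Euclidean projection of a vector v onto the probability simplex.

  Standard sort-based algorithm: find the threshold tau such that
  max(v - tau, 0) sums to one, then clip.
  """
  u = np.sort(v)[::-1]                                 # sort descending
  cssv = np.cumsum(u) - 1.0                            # cumulative sums minus the target mass
  rho = np.nonzero(u * np.arange(1, len(v) + 1) > cssv)[0][-1]
  tau = cssv[rho] / (rho + 1.0)
  return np.maximum(v - tau, 0.0)
```

Note that, unlike the softmax, the projection can put exactly zero mass on some actions (e.g. `project_to_simplex(np.array([0.3, 0.9, -0.2]))` gives `[0.2, 0.8, 0.0]`).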

Below I can run through the math on why this is the case.

Say we want to compute the partial derivative $\frac{d\psi}{d (s,a)}$ of dilated entropy $\psi$ with respect to the sequence $(s,a)$ corresponding to a state $s$ and action $a$. Basically, there are two types of terms we need to be concerned with:

  1. The dilated entropy at state $s$
  2. All the children states of the sequence $C(s,a)$. These are all the possible states that can be reached after selecting action $a$ at state $s$.

Mathematically we have:

$$ \frac{d\psi}{d (s,a)} = \underset{(1)}{\frac{d}{d (s,a)}\, x_{p(s)}\,\psi_s\left(\frac{x_{(s, \cdot)}}{x_{p(s)}}\right)} + \underset{(2)}{\frac{d}{d (s,a)} \sum_{j \in C(s,a)} x_{(s,a)}\,\psi_{j}\left( \frac{x_{(j,\cdot)}}{x_{(s,a)}}\right)} $$

I've introduced some new notation; here is a breakdown:

  • $x_{(s,\cdot)}$ is the slice of the sequence form corresponding to state $s$
  • $p(s)$ is the parent sequence of state $s$
  • $\psi_s$ is the local dgf (distance-generating function) at state $s$; in our case we are picking negative entropy
  • $\pi_s = \frac{x_{(s,\cdot)}}{x_{p(s)}}$ is the policy at state $s$.

The first part (1), by the chain rule, just reduces to the partial derivative of negative entropy (or whatever dgf you use) with respect to the policy at state $s$,
$\frac{\partial}{\partial \pi_s(a)} \psi_s(\pi_s).$
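For concreteness, with negative entropy $\psi_s(\pi) = \sum_a \pi(a)\log\pi(a)$ this works out to $\frac{\partial}{\partial \pi_s(a)}\psi_s(\pi_s) = \log\pi_s(a) + 1$, the same $\log(\pi) + \mathbf{1}$ gradient that shows up again in the child terms below.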

The second part is more interesting for us. If we just look at one child state $j \in C(s,a)$, then by the product rule and chain rule we get:
$\frac{d}{d(s,a)}x_{(s,a)}\psi_j\left(\frac{x_{(j,\cdot)}}{x_{(s,a)}}\right) = \psi_j\left(\pi_j\right) -\langle \nabla \psi_j(\pi_j), \pi_j \rangle.$
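To make the product and chain rule explicit, write $u = x_{(s,a)}$ and $c = x_{(j,\cdot)}$, so the term is $u\,\psi_j(c/u)$ and

$$ \frac{d}{du}\, u\,\psi_j\!\left(\frac{c}{u}\right) = \psi_j\!\left(\frac{c}{u}\right) + u\left\langle \nabla\psi_j\!\left(\frac{c}{u}\right),\, -\frac{c}{u^2} \right\rangle = \psi_j(\pi_j) - \langle \nabla \psi_j(\pi_j), \pi_j\rangle. $$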

If we plug in negative entropy as $\psi_j$, we get
$\sum_a\pi_j(a)\log(\pi_j(a)) - \langle \log(\pi_j)+\mathbf{1}, \pi_j \rangle = -\sum_a \pi_j(a) = -1.$

Therefore, each child contributes a $-1$! Hence we only need to subtract the number of children, that is, the size of the set $C(s,a)$. If you were to use the squared l2 norm you would instead have to compute $\psi_j\left(\pi_j\right) -\langle \nabla \psi_j(\pi_j), \pi_j \rangle$ directly, which should just be $-\frac{1}{2}||\pi_j||^2$.
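If you want to sanity-check those two identities numerically, here is a quick throwaway numpy snippet (not from the OpenSpiel code, just an illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
pi = rng.random(5)
pi /= pi.sum()  # a random policy, i.e. a point on the probability simplex

# Negative entropy: psi(pi) = sum_a pi(a) log pi(a), grad psi(pi) = log(pi) + 1.
neg_entropy = np.sum(pi * np.log(pi))
grad = np.log(pi) + 1.0
print(neg_entropy - grad @ pi)       # -1.0, so each child just contributes -1

# Squared l2 norm: psi(pi) = 0.5 * ||pi||^2, grad psi(pi) = pi.
half_sq = 0.5 * np.dot(pi, pi)
print(half_sq - np.dot(pi, pi))      # equals -0.5 * ||pi||^2
print(-0.5 * np.dot(pi, pi))         # same number, for comparison
```

In the l2 case the per-child correction depends on the child's policy itself, which is exactly why the flat num_children subtraction no longer applies.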
