
Comments (2)

Atan03 commented on July 1, 2024

Thank you for your comprehensive explanation! It has been immensely helpful to me. I now have a solid understanding of the formula. Following your guidance, I successfully implemented OGDA on NFGs and achieved convergence. I am now closing this issue.


ryan-dorazio commented on July 1, 2024

Hi @Atan03! I'm glad you are using my code and am happy to answer your question.

> It seems that setting $\alpha =0$ in MMD causes it to degenerate into MD.

Exactly! MMD with $\alpha =0$ becomes mirror descent-ascent with dilated entropy.
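Just to spell that out (roughly, and with notation I'm introducing here: $g_t$ the current gradient, $\eta$ the stepsize, $\psi$ the dgf, and $B_\psi$ its Bregman divergence), the MMD step can be written as

$$ x_{t+1} = \underset{x \in \mathcal{X}}{\arg\min}\; \eta \langle g_t, x\rangle + \eta\alpha\,\psi(x) + B_\psi(x; x_t), $$

so with $\alpha = 0$ the middle regularization term drops out and you are left with the standard mirror descent step.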

> do I only need to replace the term involving negative entropy in MMD with L2-norm and replace the softmax with projection onto the probability simplex?

It is true that you would need to change both the negative entropy and softmax terms, but you would also need to change the gradient computation, specifically the part you pointed out with regards to num_children. The subtraction of num_children is a trick that is only valid for dilated entropy, not for dilation with the l2 norm.
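If it helps, here is a minimal, untested sketch of the Euclidean projection onto the probability simplex that would take the place of the softmax (the function name `project_to_simplex` is just illustrative, not something from the OpenSpiel code):

```python
import numpy as np


def project_to_simplex(v):
  """Euclidean projection of a vector v onto the probability simplex.

  Standard sort-based algorithm: find the threshold tau such that
  max(v - tau, 0) sums to one, then clip.
  """
  u = np.sort(v)[::-1]                                 # sort descending
  cssv = np.cumsum(u) - 1.0                            # cumulative sums minus the target mass
  rho = np.nonzero(u * np.arange(1, len(v) + 1) > cssv)[0][-1]
  tau = cssv[rho] / (rho + 1.0)
  return np.maximum(v - tau, 0.0)
```

Note that, unlike the softmax, the projection can put exactly zero mass on some actions (e.g. `project_to_simplex(np.array([0.3, 0.9, -0.2]))` gives `[0.2, 0.8, 0.0]`).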

Below I can run through the math on why this is the case.

Say we want to compute the partial derivative $\frac{d\psi}{d (s,a)}$ of dilated entropy $\psi$ with respect to the sequence $(s,a)$ corresponding to a state $s$ and action $a$. Basically, there are two types of terms we need to be concerned with:

  1. The dilated entropy at state $s$
  2. All the children states of the sequence $C(s,a)$. These are all the possible states that can be reached after selecting action $a$ at state $s$.

Mathematically we have:

$$ \frac{d\psi}{d (s,a)} = \underset{(1)}{\frac{d}{d (s,a)}\, x_{p(s)}\,\psi_s\left(\frac{x_{(s, \cdot)}}{x_{p(s)}}\right)} + \underset{(2)}{\frac{d}{d (s,a)} \sum_{j \in C(s,a)} x_{(s,a)}\,\psi_{j}\left( \frac{x_{(j,\cdot)}}{x_{(s,a)}}\right)} $$

I've introduced some new notation; here is a breakdown:

  • $x_{(s,\cdot)}$ is the slice of the sequence form corresponding to state $s$
  • $p(s)$ is the parent sequence of state $s$
  • $\psi_s$ is the local dgf (distance-generating function) at state $s$; in our case we are picking negative entropy
  • $\pi_s = \frac{x_{(s,\cdot)}}{x_{p(s)}}$ is the policy at state $s$.

The first part (1), by the chain rule, just reduces to the partial derivative of negative entropy (or whatever dgf you use) with respect to the policy at state $s$,
$\frac{\partial}{\partial \pi_s(a)} \psi_s(\pi_s).$
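For concreteness, with negative entropy $\psi_s(\pi) = \sum_a \pi(a)\log\pi(a)$ this works out to $\frac{\partial}{\partial \pi_s(a)}\psi_s(\pi_s) = \log\pi_s(a) + 1$, the same $\log(\pi) + \mathbf{1}$ gradient that shows up again in the child terms below.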

The second part is more interesting for us. If we just look at one child state $j \in C(s,a)$, then by the product rule and chain rule we get:
$\frac{d}{d(s,a)}x_{(s,a)}\psi_j\left(\frac{x_{(j,\cdot)}}{x_{(s,a)}}\right) = \psi_j\left(\pi_j\right) -\langle \nabla \psi_j(\pi_j), \pi_j \rangle.$
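To make the product and chain rule explicit, write $u = x_{(s,a)}$ and $c = x_{(j,\cdot)}$, so the term is $u\,\psi_j(c/u)$ and

$$ \frac{d}{du}\, u\,\psi_j\!\left(\frac{c}{u}\right) = \psi_j\!\left(\frac{c}{u}\right) + u\left\langle \nabla\psi_j\!\left(\frac{c}{u}\right),\, -\frac{c}{u^2} \right\rangle = \psi_j(\pi_j) - \langle \nabla \psi_j(\pi_j), \pi_j\rangle. $$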

If we plug in negative entropy as $\psi_j$, we get
$\sum_a\pi_j(a)\log(\pi_j(a)) - \langle \log(\pi_j)+\mathbf{1}, \pi_j \rangle = -\sum_a \pi_j(a) = -1.$

Therefore, each child contributes a $-1$! Hence we only need to subtract the number of children, that is, the size of the set $C(s,a)$. If you were to use the squared l2 norm you would instead have to compute $\psi_j\left(\pi_j\right) -\langle \nabla \psi_j(\pi_j), \pi_j \rangle$ directly, which should just be $-\frac{1}{2}||\pi_j||^2$.
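If you want to sanity-check those two identities numerically, here is a quick throwaway numpy snippet (not from the OpenSpiel code, just an illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
pi = rng.random(5)
pi /= pi.sum()  # a random policy, i.e. a point on the probability simplex

# Negative entropy: psi(pi) = sum_a pi(a) log pi(a), grad psi(pi) = log(pi) + 1.
neg_entropy = np.sum(pi * np.log(pi))
grad = np.log(pi) + 1.0
print(neg_entropy - grad @ pi)       # -1.0, so each child just contributes -1

# Squared l2 norm: psi(pi) = 0.5 * ||pi||^2, grad psi(pi) = pi.
half_sq = 0.5 * np.dot(pi, pi)
print(half_sq - np.dot(pi, pi))      # equals -0.5 * ||pi||^2
print(-0.5 * np.dot(pi, pi))         # same number, for comparison
```

In the l2 case the per-child correction depends on the child's policy itself, which is exactly why the flat num_children subtraction no longer applies.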
