Comments (2)
Thank you for your comprehensive explanation! It has been immensely helpful to me. Now, I have a comprehensive understanding of the formula. Following your guidance, I successfully implemented the OGDA on NFG and achieved convergence results. I am now closing this issue.
from open_spiel.
Hi @Atan03! I'm glad you are using my code and am happy to answer your question.
It seems that setting $\alpha = 0$ in MMD causes it to degenerate into MD.
Exactly! MMD with $\alpha = 0$ reduces to MD.
do I only need to replace the term involving negative entropy in MMD with L2-norm and replace the softmax with projection onto the probability simplex?
It is true that you would need to change both the negative entropy and softmax terms, but you would also need to change the gradient computation, specifically the part you pointed out regarding `num_children`. The trick of subtracting `num_children` is only valid for dilated entropy, not for dilation with the l2 norm.
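As a minimal sketch of the "replace the softmax with projection onto the probability simplex" part of the question, here are the two maps side by side for a single decision point. The function names `softmax` and `project_onto_simplex` are hypothetical helpers written for this illustration, not part of the OpenSpiel API, and the projection uses the standard sort-based algorithm.

```python
import numpy as np


def softmax(v):
    """Map a score vector to the simplex (the entropy-regularized case)."""
    v = np.asarray(v, dtype=float)
    z = np.exp(v - np.max(v))  # shift by the max for numerical stability
    return z / z.sum()


def project_onto_simplex(v):
    """Euclidean projection of a 1-D vector onto the probability simplex."""
    v = np.asarray(v, dtype=float)
    u = np.sort(v)[::-1]                  # entries in decreasing order
    cssv = np.cumsum(u) - 1.0             # cumulative sums minus the target mass
    ks = np.arange(1, v.size + 1)
    rho = ks[u - cssv / ks > 0][-1]       # number of entries that stay positive
    theta = cssv[rho - 1] / rho           # shift that makes the result sum to 1
    return np.maximum(v - theta, 0.0)


scores = np.array([1.2, 0.3, -0.5])
soft = softmax(scores)                    # strictly positive everywhere
proj = project_onto_simplex(scores)       # can put exact zeros on some actions
```

Unlike softmax, the projection can return a policy on the boundary of the simplex. Swapping this step alone is not enough in the dilated setting, since the `num_children` term in the gradient also has to change.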
Below I can run through the math on why this is the case.
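Say we want to compute the partial derivative $\frac{\partial \psi(x)}{\partial x_{(s,a)}}$ of the dilated entropy $\psi$ with respect to the sequence $x_{(s,a)}$. This sequence appears in two places:
- The dilated entropy at state $s$
- All the children states of the sequence, $C(s,a)$. These are all the possible states that can be reached after selecting action $a$ at state $s$.

Mathematically we have:

$$\frac{\partial \psi(x)}{\partial x_{(s,a)}} = \underbrace{\frac{\partial}{\partial x_{(s,a)}}\left[x_{p(s)}\,\psi_s\!\left(\frac{x_{(s,\cdot)}}{x_{p(s)}}\right)\right]}_{(1)} + \underbrace{\sum_{s' \in C(s,a)} \frac{\partial}{\partial x_{(s,a)}}\left[x_{(s,a)}\,\psi_{s'}\!\left(\frac{x_{(s',\cdot)}}{x_{(s,a)}}\right)\right]}_{(2)}$$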
I've introduced some new notation; here is a breakdown:
- $x_{(s,\cdot)}$ is the slice of the sequence form corresponding to state $s$
- $p(s)$ is the parent sequence of state $s$
- $\psi_s$ is the local dgf at state $s$; in our case we are picking negative entropy
- $\pi_s = \frac{x_{(s,\cdot)}}{x_{p(s)}}$ is the policy at state $s$.
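The first part (1), by the chain rule, just reduces to the partial derivative of negative entropy (or whatever dgf you use) with respect to the policy at state $s$:

$$\frac{\partial}{\partial x_{(s,a)}}\left[x_{p(s)}\,\psi_s\!\left(\frac{x_{(s,\cdot)}}{x_{p(s)}}\right)\right] = \frac{\partial \psi_s(\pi_s)}{\partial \pi_s(a)}$$

The second part is more interesting for us. If we just look at one child state $s' \in C(s,a)$, whose parent sequence is $p(s') = (s,a)$, the product and chain rules give:

$$\frac{\partial}{\partial x_{(s,a)}}\left[x_{(s,a)}\,\psi_{s'}\!\left(\frac{x_{(s',\cdot)}}{x_{(s,a)}}\right)\right] = \psi_{s'}(\pi_{s'}) - \langle \nabla \psi_{s'}(\pi_{s'}),\, \pi_{s'} \rangle$$

If we plug in negative entropy as $\psi_{s'}$, so that $\psi_{s'}(\pi_{s'}) = \sum_{a'} \pi_{s'}(a') \log \pi_{s'}(a')$ and $\frac{\partial \psi_{s'}(\pi_{s'})}{\partial \pi_{s'}(a')} = \log \pi_{s'}(a') + 1$, this becomes:

$$\sum_{a'} \pi_{s'}(a') \log \pi_{s'}(a') - \sum_{a'} \pi_{s'}(a') \left(\log \pi_{s'}(a') + 1\right) = -\sum_{a'} \pi_{s'}(a') = -1$$

Therefore, each child contributes a $-1$ to the gradient, which is exactly the subtraction of `num_children`.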
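The derivation can be sanity-checked numerically. The sketch below is only an illustration on a made-up one-player tree (a root state `s0` with two actions, where the first action leads to two child states `s1` and `s2`); the layout of `x` and all function names are assumptions for this example, not OpenSpiel code. It compares a finite-difference partial derivative of the dilated dgf with respect to $x_{(s_0,a_0)}$ against the closed forms, treating the sequence-form entries as free variables. For the half squared l2 norm, the same derivation gives a child term of $-\tfrac{1}{2}\lVert \pi_{s'} \rVert^2$, which depends on the child policy, so no constant `num_children` shortcut applies.

```python
import numpy as np


def neg_entropy(p):
    """Local dgf: sum_i p_i log p_i."""
    return float(np.sum(p * np.log(p)))


def half_sq_l2(p):
    """Local dgf: 0.5 * ||p||^2."""
    return 0.5 * float(np.dot(p, p))


def dilated_dgf(x, local_dgf):
    """Dilated dgf on the toy tree.

    Assumed layout: x = [x_(s0,a0), x_(s0,a1), x_(s1,.), x_(s2,.)],
    where s1 and s2 are the two children of the sequence (s0, a0).
    """
    value = local_dgf(x[0:2])            # root: parent sequence has weight 1
    parent = x[0]                        # x_(s0,a0), parent of s1 and s2
    for child in (x[2:4], x[4:6]):
        value += parent * local_dgf(child / parent)
    return value


def partial_wrt_first(x, local_dgf, eps=1e-6):
    """Central finite difference of the dilated dgf w.r.t. x_(s0,a0)."""
    e = np.zeros_like(x)
    e[0] = eps
    return (dilated_dgf(x + e, local_dgf) - dilated_dgf(x - e, local_dgf)) / (2 * eps)


# Behavioral policies at each state, then the matching sequence form.
pi_s0 = np.array([0.6, 0.4])
pi_s1 = np.array([0.3, 0.7])
pi_s2 = np.array([0.5, 0.5])
x = np.concatenate([pi_s0, pi_s0[0] * pi_s1, pi_s0[0] * pi_s2])

num_children = 2  # |C(s0, a0)|

# Negative entropy: log(pi) + 1 from part (1), then -1 per child.
fd_entropy = partial_wrt_first(x, neg_entropy)
closed_entropy = np.log(pi_s0[0]) + 1 - num_children

# Half squared l2 norm: pi from part (1), then -0.5 * ||pi_child||^2 per child.
fd_l2 = partial_wrt_first(x, half_sq_l2)
closed_l2 = pi_s0[0] - 0.5 * (pi_s1 @ pi_s1 + pi_s2 @ pi_s2)
```

Both finite-difference values should agree with their closed forms up to numerical error. Only the entropy case collapses to a count of children; the l2 case needs the child policies themselves.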