Specifically, I want to know how the performance of the algorithm can be evaluated, similar to how reinforcement learning algorithms are evaluated by the reward obtained from the environment. Taking Leduc poker as an example, how can we show that the algorithm is effective? Also, after training is complete, what should we save: the RL model or the policy? I am not completely familiar with this, so I hope you can give some advice. Thank you!
from open_spiel.
Hi @Root970103,
If you're using PSRO or some form of fictitious play, the thing you save is either the average strategy, or the entire set of policies coupled with the meta-strategy. The latter can be turned into one policy using the policy_aggregator (if the game is small enough).
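To make the aggregation idea concrete, here is a minimal sketch of mixing a set of tabular policies with meta-strategy weights. This only illustrates the concept: OpenSpiel's actual policy_aggregator additionally weights by reach probabilities, and the function below (aggregate_policies) is a hypothetical name, not the library's API.

```python
def aggregate_policies(policies, meta_strategy):
    """Mix tabular policies into one policy, weighted by the meta-strategy.

    policies: list of dicts {info_state: {action: prob}}
    meta_strategy: list of mixing weights summing to 1.

    NOTE: this naive per-state mixture ignores reach-probability
    weighting, which a real aggregator (e.g. OpenSpiel's
    policy_aggregator) accounts for.
    """
    aggregated = {}
    for policy, weight in zip(policies, meta_strategy):
        for info_state, action_probs in policy.items():
            state_probs = aggregated.setdefault(info_state, {})
            for action, prob in action_probs.items():
                state_probs[action] = state_probs.get(action, 0.0) + weight * prob
    return aggregated

# Two pure policies mixed 60/40 at a shared information state "s".
p0 = {"s": {"a": 1.0, "b": 0.0}}
p1 = {"s": {"a": 0.0, "b": 1.0}}
avg = aggregate_policies([p0, p1], [0.6, 0.4])
# avg["s"] is {"a": 0.6, "b": 0.4}
```

The point of turning everything into one policy is that downstream evaluation (NashConv, head-to-head play) only needs a single mapping from information states to action distributions.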
A good place to start is this example: https://github.com/deepmind/open_spiel/blob/master/open_spiel/python/examples/psro_v2_example.py
Hope this helps, but please don't hesitate to ask more questions if it's not clear.
Thank you for your reply! I have run this example script and observed the changes in NashConv.
I also wonder whether I can use the trained model (or policy) against other algorithms in the Leduc poker environment. For example, if I want to test the trained model against CFR, should the entire set of policies or the aggregated policy be used?
In addition, in an adversarial scenario, is it appropriate to use the Q-value to evaluate the algorithms?
It's very kind of you to give this advice.
I also wonder whether I can use the trained model (or policy) against other algorithms in the Leduc poker environment. For example, if I want to test the trained model against CFR, should the entire set of policies or the aggregated policy be used?
Yes, you can extract the policy (that is what the NashConv computation needs) and simulate it against CFR's policy.
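As a hedged illustration of what "simulate the policy against CFR's policy" means, here is a toy head-to-head evaluation in a one-shot zero-sum matrix game (matching pennies stands in for Leduc; a real evaluation would roll out full episodes with pyspiel, which this sketch does not use).

```python
import random

def sample(probs):
    """Sample an action from a {action: prob} distribution."""
    r, acc = random.random(), 0.0
    for action, p in probs.items():
        acc += p
        if r < acc:
            return action
    return action  # guard against floating-point rounding

def evaluate(policy_a, policy_b, payoff, episodes=10000, seed=0):
    """Average payoff to player A over repeated play of a matrix game."""
    random.seed(seed)
    total = 0.0
    for _ in range(episodes):
        total += payoff[(sample(policy_a), sample(policy_b))]
    return total / episodes

# Matching pennies: playing uniformly guarantees expected value 0
# against any opponent policy, so the estimate should be near zero.
payoff = {("H", "H"): 1, ("H", "T"): -1, ("T", "H"): -1, ("T", "T"): 1}
uniform = {"H": 0.5, "T": 0.5}
biased = {"H": 0.9, "T": 0.1}
v = evaluate(uniform, biased, payoff)
```

Averaging returns over many simulated episodes like this is a common complement to NashConv: NashConv measures distance to equilibrium, while head-to-head returns measure performance against a specific opponent such as a CFR policy.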
In addition, in an adversarial scenario, is it appropriate to use the Q-value to evaluate the algorithms?
Q-values are just estimates of the value of a state-action pair. You can turn them into a policy by choosing argmax_a Q(s, a), but the resulting policy is deterministic. So if the environment requires any kind of mixing, you would lose that by taking the argmax over Q-values.
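A small sketch of why that matters, using matching pennies as an assumed toy example (not from the thread): against any deterministic argmax policy, the opponent's best response wins every round, while the mixed 50/50 policy cannot be exploited at all.

```python
# Matching pennies payoffs for player 1; player 2 gets the negation.
PAYOFF = {("H", "H"): 1, ("H", "T"): -1, ("T", "H"): -1, ("T", "T"): 1}
ACTIONS = ("H", "T")

def best_response_value(policy):
    """Opponent's best achievable expected payoff against `policy`.

    `policy` is player 1's {action: prob} distribution; the opponent
    picks whichever pure action maximizes its own (negated) payoff.
    """
    return max(
        sum(-PAYOFF[(a, b)] * p for a, p in policy.items())
        for b in ACTIONS
    )

q_values = {"H": 0.1, "T": -0.1}                  # some learned Q-estimates
greedy = {max(q_values, key=q_values.get): 1.0}   # argmax policy: always "H"
mixed = {"H": 0.5, "T": 0.5}

best_response_value(greedy)  # 1.0: the deterministic policy is fully exploitable
best_response_value(mixed)   # 0.0: the mixed policy concedes nothing
```

This is essentially what NashConv quantifies: the gain a best responder can extract, which is why exploitability-style metrics, rather than raw Q-values, are the usual yardstick in adversarial games.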