datamllab / rlcard

Reinforcement Learning / AI Bots in Card (Poker) Games - Blackjack, Leduc, Texas, DouDizhu, Mahjong, UNO.
Home Page: http://www.rlcard.org
License: MIT License
########## Evaluation ##########
Timestep: 629402 Average reward is 0.458
########## Evaluation ##########
Timestep: 1258220 Average reward is 0.46
########## Evaluation ##########
Timestep: 1888626 Average reward is 0.514
########## Evaluation ##########
Timestep: 2516620 Average reward is 0.506
########## Evaluation ##########
Timestep: 3144764 Average reward is 0.492
########## Evaluation ##########
Timestep: 3774566 Average reward is 0.468
########## Evaluation ##########
Timestep: 4402996 Average reward is 0.422
This is my doudizhu_nfsp_result log. The more it trains, the worse the result gets. Why?
Line 270 in rlcard/rlcard/agents/dqn_agent.py seems to have a typo. In class Normalizer, method append, line 270 is:
self.std = np.mean(self.state_memory, axis=0)
Should np.mean be np.std ?
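For reference, a minimal sketch of what the corrected append could look like, assuming the Normalizer keeps a list of observed states (the real class in dqn_agent.py may differ in detail):

```python
import numpy as np

class Normalizer:
    """Minimal sketch of a running state normalizer (not the real class)."""
    def __init__(self):
        self.state_memory = []
        self.mean = None
        self.std = None

    def append(self, state):
        self.state_memory.append(state)
        self.mean = np.mean(self.state_memory, axis=0)
        # the reported bug: this line also used np.mean instead of np.std
        self.std = np.std(self.state_memory, axis=0)

    def normalize(self, state):
        # small epsilon guards against division by zero early in training
        return (state - self.mean) / (self.std + 1e-8)
```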
agent = dqn_agent_pytorch.DQNAgent("dqn {}".format(i), action_num=env.action_num, state_shape=env.state_shape, mlp_layers=[128,128])
When trying to initialize some DQN PyTorch agents, I get the above error.
Am I doing something wrong here, and how can I solve the issue?
Edit:
In general are there some guides on how to build my own games and how to use pytorch for training?
Dou Dizhu needs three players. How can I play together with two AIs in this game?
PyTorch implementation for agents
Limit Hold'em: escabeche, SmooCT, Hyperborean
http://rlcard.org/games.html#mahjong
'It is commonly played but 4 players. ' should be 'It is commonly played by 4 players. '
How do I make a baccarat game with the rlcard framework?
Human-sized games could be too complex for the algorithms. We will implement smaller versions of games like Dou Dizhu, Mahjong, and UNO to make them feasible for research. Thanks for the feedback from the anonymous reviewers.
Implement best response with step and step_back
In the function determine_role of the doudizhu game, you choose index 0 as the landlord by default. Why not add an action named "determine landlord", so we can train the agent to decide which player should be the landlord? I'm just confused.
It looks like nfsp_agent samples the best-response network in evaluation mode. I copied this behavior in the PyTorch implementation. However, Theorem 7 in [1] argues that it is the average strategy profile that converges to a Nash equilibrium. Sampling the best-response network produces a deterministic pure strategy, while the average policy network produces a stochastic behavioural strategy. This is discussed in Section 4.2 of [2]. Also, it looks like DeepMind's implementation [3] samples the average policy network in evaluation mode.
Am I missing something?
References:
[1] Heinrich et al. (2015) "Fictitious Self-Play in Extensive-Form Games"
[2] Heinrich and Silver (2016) "Deep Reinforcement Learning from Self-Play in Imperfect Information Games"
[3] Lanctot et al. (2019) "OpenSpiel: A Framework for Reinforcement Learning in Games"
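To illustrate the distinction, here is a minimal sketch (the names are hypothetical, not the real NFSPAgent API): the best-response head yields a deterministic greedy action, while the average-policy head samples a stochastic behavioural strategy.

```python
import numpy as np

class NFSPEvalSketch:
    """Toy illustration of the two NFSP evaluation modes discussed above."""
    def __init__(self, avg_policy_probs, br_q_values):
        self.avg_policy_probs = avg_policy_probs  # average strategy pi(s, .)
        self.br_q_values = br_q_values            # best-response Q(s, .)

    def eval_step(self, mode='average_policy'):
        if mode == 'best_response':
            # deterministic pure strategy: greedy over Q-values
            return int(np.argmax(self.br_q_values))
        # stochastic behavioural strategy: sample from the average policy
        return int(np.random.choice(len(self.avg_policy_probs),
                                    p=self.avg_policy_probs))
```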
I notice that DQNAgent->_build_model has only two fully connected layers. Isn't that too simple to get good performance? Why not use a conv net?
Hey there,
I installed rlcard via pip install rlcard. When trying the example uno.py I had to make some modifications, as the code installed with pip was not the most recent. I got the example to run after small changes (pulling in the most recent code samples).
Question
However, after a long training run I still have a very, very small reward:
timestep | 3939973
reward | 0.004
timestep | 3949951
reward | -0.04
What training parameters do you use?
How long do you train?
What am I missing?
Does anyone here have different results?
I did not change any params:
with tf.Session() as sess:
    # Initialize a global step
    global_step = tf.Variable(0, name='global_step', trainable=False)
    # Set up the agents
    agent = DQNAgent(sess,
                     scope='dqn',
                     action_num=env.action_num,
                     replay_memory_size=20000,
                     replay_memory_init_size=memory_init_size,
                     state_shape=env.state_shape,
                     mlp_layers=[512, 512])
    random_agent = RandomAgent(action_num=eval_env.action_num)
    env.set_agents([agent, random_agent, random_agent, random_agent])
    eval_env.set_agents([agent, random_agent, random_agent, random_agent])
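To measure progress on the setup above, a simple helper can average the payoffs over a batch of evaluation games. This is a hedged sketch assuming the env.run(is_training=False) interface used by the example scripts; the function and parameter names are my own.

```python
def evaluate(eval_env, num_games=1000):
    """Average the first player's payoff over num_games evaluation games."""
    total = 0.0
    for _ in range(num_games):
        # env.run plays one full game; payoffs[0] is our agent's reward
        _, payoffs = eval_env.run(is_training=False)
        total += payoffs[0]
    return total / num_games
```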
Hey buddy, could you give an example for No-Limit Texas Hold'em to help me understand how the Normalizer works?
Title stands: when can I use Gin Rummy? There are crashes and issues with the reward computation in its current state.
I have finished code for the card game Gin Rummy. How do I submit it if that is ok with you?
Note that DQN training on it was very poor (essentially nothing was learned). I have an option to specify an extremely simple version where the actions are essentially just discarding cards, and the player scores 1 if there are no kings or queens in the hand, else scores 0. This got to an average reward of 0.7 halfway through training, but then fell to 0.2 and stayed there.
I am not sure that I am using the training methods correctly. I just modified how Mahjong did DQN learning.
When running the install script given in the readme, it produces the following error:
Could not find a version that satisfies the requirement tensorflow<2.0,>=1.14
When I set self.init_chips=2000 (1000 BBs) in game.py, the code runs extremely slowly. I found out it's because of the action-space setting: the number of legal actions is too large (from 2 to 2000).
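One common workaround for a blow-up like this is to abstract the raise sizes into a small number of discrete buckets instead of one action per chip amount. A minimal sketch (the function name and bucketing scheme are my own, not part of rlcard):

```python
def abstract_raise_actions(min_raise, max_raise, num_buckets=10):
    """Collapse the raise range into at most num_buckets discrete sizes,
    evenly spaced between min_raise and max_raise (endpoints included)."""
    if max_raise <= min_raise:
        return [min_raise]
    step = (max_raise - min_raise) / (num_buckets - 1)
    # dedupe with a set in case rounding makes neighbors collide
    return sorted({int(round(min_raise + i * step)) for i in range(num_buckets)})
```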
Why is Uno in the list when Skip-Bo https://en.wikipedia.org/wiki/Skip-Bo and Phase 10 https://en.wikipedia.org/wiki/Phase_10 aren't, when in fact they are popular as well?
Here are some games that are similar to Dou Dizhu and Uno that may be interesting
How do I use the pretrained models, such as the NFSP agents? I want to play Dou Dizhu with 3 players, all of them loading the pretrained model.
Hey, could you please release this library on PyPI, so people can just do pip install rlcard
instead of having to clone the repo first? It makes it easier to use your code.
Specifically, I'm going to release an RL library on PyPI soon that uses various RL environment libraries. I'd like it to use rlcard in addition to others, but to depend on rlcard it would have to either be included with my package (which is undesirable) or be installed from pip per a requirements.txt file (and thus hosted on PyPI).
Your actual environments only depend on numpy and matplotlib. When you install rlcard via pip, it's because you're using the environments as part of a larger thing (in my case as a dependency of a package I'm going to release), not because you want to reproduce experiments with sample code.
The specific problem I have is that, as previously mentioned, I'm releasing a large library that depends on rlcard. Having that library in turn depend on tensorflow, tensorflow probability and sonnet is undesirable for me, as it will be for many people who'd like to use rlcard environments (the main use case), especially since you restrict TF to 1.14 or 1.15.
Can you remove those as requirements of rlcard in the PyPI release? Per the above, I think removing the demo code from the PyPI release, or having people separately install an appropriate version of tensorflow etc., would be the normal approach in a situation like this.
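The usual packaging mechanism for this is extras: keep only the light dependencies in install_requires and move the heavy ones into extras_require, so `pip install rlcard` stays minimal and something like `pip install rlcard[tensorflow]` opts into the demo/agent stack. This is a hedged sketch of such a setup.py fragment; the extra names are assumptions, not rlcard's actual configuration.

```python
# Sketch of a setup.py dependency split (extra names are hypothetical).
install_requires = [
    'numpy',        # the environments themselves only need these two
    'matplotlib',
]
extras_require = {
    # `pip install rlcard[tensorflow]` pulls in the demo/agent deps
    'tensorflow': ['tensorflow>=1.14,<2.0'],
    'torch': ['torch>=1.3'],
}
# setup(..., install_requires=install_requires, extras_require=extras_require)
```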
Hi, I want to ask how to save and load the DQNAgent and NFSPAgent such that we can reuse it once the training is finished.
Thanks!
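In the absence of a built-in save/load helper, one generic pattern is to extract the trained parameters from the agent and pickle them yourself; this sketch is not rlcard's own API. (For the TensorFlow agents, tf.train.Saver's save method is the standard counterpart of the restore call used to load the pretrained models.)

```python
import pickle

def save_agent_params(params, path):
    """params: e.g. a dict mapping variable names to weight arrays
    pulled out of the trained agent."""
    with open(path, 'wb') as f:
        pickle.dump(params, f)

def load_agent_params(path):
    """Load the pickled parameters back, to be pushed into a freshly
    constructed agent of the same architecture."""
    with open(path, 'rb') as f:
        return pickle.load(f)
```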
I want to push rlcard/tests/games/test_gin_rummy_games.py from my GitHub repo to the main dev repo. When I try to do that it seems that I have only the option to push all my changes. However, I just want to push this single file (which you requested that I do).
Right now, my GitHub repo has a lot of files that I did not intend to commit from my local repo. I am still learning git. I would think you don't care what is in my GitHub repo except for the pushes that I request. I now have incomplete versions of files I am working on locally that got committed to my GitHub repo and which shouldn't be pushed to the dev repo.
Could a rule-based AI for Dou Dizhu help the agent train with reasonable actions and improve the rate of convergence?
Can I use the agents below for the game Dou Dizhu?
First, add my own Rule based model Agent, then
agent_CFR = cfr_agent()
agent_RuleBased = MyRuleAgent()
agent_NFSP = nfsp_agent()
env = rlcard.make('doudizhu')
env.set_agents([agent_CFR, agent_RuleBased, agent_NFSP])
I'm facing an issue when trying to install torch; my workaround is to comment out torch in setup.py.
I saw this error when running pip install -e .:
ERROR: Could not find a version that satisfies the requirement torch>=1.3 (from rlcard==0.1.6) (from versions: 0.1.2, 0.1.2.post1, 0.1.2.post2)
ERROR: No matching distribution found for torch>=1.3 (from rlcard==0.1.6)
blackjack env works fine without torch.
Setup Versions:
conda 4.7.11
Python 3.7.5
Would it be a good idea to include Gong Zhu https://en.wikipedia.org/wiki/Gong_Zhu and Sheng Ji https://en.wikipedia.org/wiki/Sheng_ji in this, as they are popular among Chinese communities (and both are trick-taking games)?
File "/rlcard/agents/cfr_agent.py", line 72, in traverse_tree
utility = self.traverse_tree(new_probs, player_id)
File "/rlcard/rlcard/agents/cfr_agent.py", line 71, in traverse_tree
self.env.step(action)
File "/rlcard/rlcard/envs/env.py", line 62, in step
next_state, player_id = self.game.step(self.decode_action(action))
File "/rlcard/envs/doudizhu.py", line 94, in decode_action
for legal_action in legal_actions:
TypeError: 'NoneType' object is not iterable
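A defensive fix along these lines might help, assuming decode_action maps an abstract action id to a matching legal action. This is a standalone sketch with hypothetical parameter names, not the actual code in rlcard/envs/doudizhu.py:

```python
def decode_action(action_id, action_list, legal_actions):
    """Map an abstract action id to a concrete legal action, guarding
    against the None legal_actions seen in the traceback above."""
    if legal_actions is None:
        # the traceback shows legal_actions can be None mid-traversal;
        # fall back to an empty list instead of iterating over None
        legal_actions = []
    abstract = action_list[action_id]
    for legal_action in legal_actions:
        if legal_action == abstract:
            return legal_action
    # no exact match: return the first legal action if any, else None
    return legal_actions[0] if legal_actions else None
```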
Rummy is still very popular in Europe and America; it might be a good idea to include one of these variants in your system. A side note: Rummy is "the Western version of Mahjong".
I am working on a GUI interface for my Gin Rummy program.
Is it ok with you for me to submit it?
There are two parts. Do you want the smaller part submitted first or both parts submitted at once?
The first part is a simple GUI program with 8 Python files. It does not interface with the rlcard environment. It has a menu bar, a preferences window, an about window, and a main window with 52 cards laid out in a 4-by-13 grid. A card can be clicked and its name is printed to the console. A card can be right-clicked or shift-tapped to flip it over.
The second part has 22 Python files. It interfaces with the rlcard environment of Gin Rummy.
Hi rlcard team, awesome work!
I'd like to know whether I can get perfect information for the game, and how. E.g., can I get all the card information of the three players in Dou Dizhu?
Thanks!
Hello, friend. How can I obtain training data for Dou Dizhu? Could you release a simple demo that lets a human play Dou Dizhu against the AI? Thanks a lot!
Save the total game tree? If we do that, how do we load it? And the same problem applies to Deep CFR.
When I want to find out how to save the agent model, I cannot find the model-saving code, yet the pretrained model leduc_holdem_nfsp exists.
saver = tf.train.Saver(tf.model_variables())
saver.restore(self.sess, tf.train.latest_checkpoint(check_point_path))
So where is saver.save?
I am confused by the state encoding of Uno. According to the documentation, the default state is encoded into 7 feature planes with each plane having a one-hot encoding of all possible cards. Planes 0 to 2 represent the player's hand, as seen in the example below. However, Plane 0 is just the inverse of Plane 1 and Plane 2 is always all zeros. The same pattern is repeated for Planes 4 to 6. Is there any reason for this?
[[[1 1 1 1 1 1 1 1 1 1 1 1 1 1 1]
[1 1 1 1 1 1 1 1 1 1 1 1 1 1 1]
[1 1 1 1 1 1 1 1 1 1 1 1 1 1 1]
[1 1 1 1 1 1 1 1 0 1 1 1 1 1 1]]
[[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 1 0 0 0 0 0 0]]
[[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]]
[[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 1 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]]
[[1 1 1 1 1 1 1 1 1 1 1 1 1 1 1]
[1 1 1 1 1 1 1 1 1 1 1 1 1 1 1]
[1 1 1 1 1 1 1 1 1 1 1 1 1 1 1]
[1 1 1 1 1 1 1 1 1 1 1 1 1 1 1]]
[[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]]
[[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]]]
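One possible explanation, if the planes one-hot encode the copy count of each card (plane k = "this card appears exactly k times in the hand"): the planes then necessarily sum to 1 at every card position, so plane 0 is the complement of the others, and plane 2 is all zeros whenever no card is held twice. A small check of that hypothesis:

```python
import numpy as np

def is_one_hot_over_copies(planes):
    """True if, at every card position, exactly one plane is set,
    i.e. the planes form a one-hot encoding over copy counts."""
    planes = np.asarray(planes)
    return bool(np.all(planes.sum(axis=0) == 1))
```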
<Wrap models. You need to inherit the Model class in rlcard/models/model.py. Then put all the models for the players into a list. Rewrite the get_agent function and return this list.> I cannot find the get_agent function.
<Load the model in the environment. To load the model, modify load_pretrained_models in the corresponding game environment in rlcard/envs. Use the registered name to load the model.> I cannot find the load_pretrained_models function.
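For what it's worth, the wrapping pattern described in the quoted docs might look roughly like this; the base class here is a stand-in and the exact method name (get_agent vs get_agents) depends on the installed rlcard version, so check rlcard/models/model.py locally:

```python
class Model:
    """Stand-in for rlcard's base Model class."""
    def get_agents(self):
        raise NotImplementedError

class MyModel(Model):
    """Wrap one trained agent per player seat into a list."""
    def __init__(self, agents):
        self._agents = list(agents)

    def get_agents(self):
        return self._agents
```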
For example, the InfoSet number of Dou Dizhu is 10^53 ~ 10^83 and the Avg. InfoSet size is 10^23. How should these be interpreted and how are they calculated? Thanks a lot :)
It seems that '33334444' is legal as the four_two_pair type, and '3333444555' is legal as the trio_pair_chain_2 type, but '34445555' is illegal as the trio_solo_chain_2 type. Is this a bug?
For example, as a human player, I choose 5553. What is my encoded action in [0, 308]?
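Conceptually, the encoding is a lookup into a fixed table of abstract action strings. This is only an illustration with a tiny stand-in table; rlcard bundles the real 309-entry Dou Dizhu action list in its JSON data, and the card order in each string there may be normalized (e.g. sorted by rank), so '5553' might be stored in a different order:

```python
# Tiny stand-in table for illustration; NOT the real 309-entry list.
ACTION_LIST = ['pass', '3', '34567', '5553']
ACTION_SPACE = {action: idx for idx, action in enumerate(ACTION_LIST)}

def encode_action(action_str):
    """Map a hand string to its integer action id via the table."""
    return ACTION_SPACE[action_str]
```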
You have implemented the Deep CFR algorithm in your code, but there is no example for it.