datamllab / rlcard

Reinforcement Learning / AI Bots in Card (Poker) Games - Blackjack, Leduc, Texas, DouDizhu, Mahjong, UNO.
Home Page: http://www.rlcard.org
License: MIT License
########## Evaluation ##########
Timestep: 629402 Average reward is 0.458
########## Evaluation ##########
Timestep: 1258220 Average reward is 0.46
########## Evaluation ##########
Timestep: 1888626 Average reward is 0.514
########## Evaluation ##########
Timestep: 2516620 Average reward is 0.506
########## Evaluation ##########
Timestep: 3144764 Average reward is 0.492
########## Evaluation ##########
Timestep: 3774566 Average reward is 0.468
########## Evaluation ##########
Timestep: 4402996 Average reward is 0.422
This is my doudizhu_nfsp_result log. The more it trains, the worse the result gets. Why?
Line 270 in rlcard/rlcard/agents/dqn_agent.py seems to have a typo. In class Normalizer, method append, line 270 is:
self.std = np.mean(self.state_memory, axis=0)
Should np.mean be np.std ?
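For reference, a minimal sketch of what the corrected append could look like, assuming the Normalizer keeps a list of observed states (the real class in dqn_agent.py may differ in detail):

```python
import numpy as np

class Normalizer:
    """Minimal sketch of a running state normalizer (not the real class)."""
    def __init__(self):
        self.state_memory = []
        self.mean = None
        self.std = None

    def append(self, state):
        self.state_memory.append(state)
        self.mean = np.mean(self.state_memory, axis=0)
        # the reported bug: this line also used np.mean instead of np.std
        self.std = np.std(self.state_memory, axis=0)

    def normalize(self, state):
        # small epsilon guards against division by zero early in training
        return (state - self.mean) / (self.std + 1e-8)
```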
agent = dqn_agent_pytorch.DQNAgent("dqn {}".format(i), action_num=env.action_num, state_shape=env.state_shape, mlp_layers=[128,128])
When trying to initialize some DQN PyTorch agents, I get the above error.
Am I doing something wrong here, and how can I solve the issue?
Edit:
In general are there some guides on how to build my own games and how to use pytorch for training?
Dou Dizhu needs three players. How can I play together with two AIs in this game?
PyTorch implementation for agents
Limit Hold'em: escabeche, SmooCT, Hyperborean
http://rlcard.org/games.html#mahjong
'It is commonly played but 4 players. ' should be 'It is commonly played by 4 players. '
How do I make a baccarat game with the rlcard framework?
Human-sized games could be too complex for the algorithms. We will implement smaller versions of games like Dou Dizhu, Mahjong, and UNO to make them feasible for research. Thanks for the feedback from the anonymous reviewers.
Implement best response with step and step_back
In the function determine_role of the doudizhu game, you choose index 0 as the landlord by default. Why not add an action named "determine landlord", so we can train the agent to decide which player should be the landlord? I'm just confused.
It looks like nfsp_agent samples the best-response network in evaluation mode. I copied this behavior in the PyTorch implementation. However, Theorem 7 in [1] argues that it is the average strategy profile that converges to a Nash equilibrium. Sampling the best-response network produces a deterministic pure strategy, while the average policy network produces a stochastic behavioural strategy. This is discussed in Section 4.2 of [2]. Also, it looks like DeepMind's implementation [3] samples the average policy network in evaluation mode.
Am I missing something?
References:
[1] Heinrich et al. (2015) "Fictitious Self-Play in Extensive-Form Games"
[2] Heinrich and Silver (2016) "Deep Reinforcement Learning from Self-Play in Imperfect Information Games"
[3] Lanctot et al. (2019) "OpenSpiel: A Framework for Reinforcement Learning in Games"
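To illustrate the distinction, here is a minimal sketch (the names are hypothetical, not the real NFSPAgent API): the best-response head yields a deterministic greedy action, while the average-policy head samples a stochastic behavioural strategy.

```python
import numpy as np

class NFSPEvalSketch:
    """Toy illustration of the two NFSP evaluation modes discussed above."""
    def __init__(self, avg_policy_probs, br_q_values):
        self.avg_policy_probs = avg_policy_probs  # average strategy pi(s, .)
        self.br_q_values = br_q_values            # best-response Q(s, .)

    def eval_step(self, mode='average_policy'):
        if mode == 'best_response':
            # deterministic pure strategy: greedy over Q-values
            return int(np.argmax(self.br_q_values))
        # stochastic behavioural strategy: sample from the average policy
        return int(np.random.choice(len(self.avg_policy_probs),
                                    p=self.avg_policy_probs))
```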
I notice that DQNAgent->_build_model has only two fully connected layers. Isn't that too simple to get good performance? Why not use a conv net?
Hey there,
I installed rlcard via pip install rlcard. When trying the example uno.py I had to make some modifications, as the code installed with pip was not the most recent. I got the example to run after small changes (pulling in the most recent code samples).
Question
However, after a long training run I still have a very, very small reward:
timestep | 3939973
reward | 0.004
timestep | 3949951
reward | -0.04
What training parameters do you use?
How long do you train?
What am I missing?
Does anyone here have different results?
I did not change any params:
with tf.Session() as sess:
    # Initialize a global step
    global_step = tf.Variable(0, name='global_step', trainable=False)
    # Set up the agents
    agent = DQNAgent(sess,
                     scope='dqn',
                     action_num=env.action_num,
                     replay_memory_size=20000,
                     replay_memory_init_size=memory_init_size,
                     state_shape=env.state_shape,
                     mlp_layers=[512, 512])
    random_agent = RandomAgent(action_num=eval_env.action_num)
    env.set_agents([agent, random_agent, random_agent, random_agent])
    eval_env.set_agents([agent, random_agent, random_agent, random_agent])
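To measure progress on the setup above, a simple helper can average the payoffs over a batch of evaluation games. This is a hedged sketch assuming the env.run(is_training=False) interface used by the example scripts; the function and parameter names are my own.

```python
def evaluate(eval_env, num_games=1000):
    """Average the first player's payoff over num_games evaluation games."""
    total = 0.0
    for _ in range(num_games):
        # env.run plays one full game; payoffs[0] is our agent's reward
        _, payoffs = eval_env.run(is_training=False)
        total += payoffs[0]
    return total / num_games
```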
Hey buddy, could you give an example for No-Limit Texas Hold'em to help me understand how the Normalizer works?
Title stands: when can I use Gin Rummy? There are crashes and issues with the reward computation in its current state.
I have finished code for the card game Gin Rummy. How do I submit it if that is ok with you?
Note that DQN training on it was very poor (essentially nothing was learned). I have an option to specify an extremely simple version where the actions are essentially just discarding cards, and the player scores 1 if there are no kings or queens in the hand, else scores 0. This got to an average reward of 0.7 halfway through training, but then fell to 0.2 and stayed there.
I am not sure that I am using the training methods correctly. I just modified how Mahjong did DQN learning.
When running the install script given in the readme, it produces the following error:
Could not find a version that satisfies the requirement tensorflow<2.0,>=1.14
When I set self.init_chips=2000 (1000 BBs) in game.py, the code runs extremely slowly. I found out it's because of the action-space setting: the number of legal actions is too large (from 2 to 2000).
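One common workaround for a blow-up like this is to abstract the raise sizes into a small number of discrete buckets instead of one action per chip amount. A minimal sketch (the function name and bucketing scheme are my own, not part of rlcard):

```python
def abstract_raise_actions(min_raise, max_raise, num_buckets=10):
    """Collapse the raise range into at most num_buckets discrete sizes,
    evenly spaced between min_raise and max_raise (endpoints included)."""
    if max_raise <= min_raise:
        return [min_raise]
    step = (max_raise - min_raise) / (num_buckets - 1)
    # dedupe with a set in case rounding makes neighbors collide
    return sorted({int(round(min_raise + i * step)) for i in range(num_buckets)})
```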
Why is Uno in the list when Skip-Bo https://en.wikipedia.org/wiki/Skip-Bo and Phase 10 https://en.wikipedia.org/wiki/Phase_10 aren't, when in fact they are popular as well?
Here are some games that are similar to Dou Dizhu and Uno that may be interesting
How do I use the pretrained models, such as the NFSP agents? I want to play Dou Dizhu with 3 players, all of them loading the pretrained model.
Hey, could you please release this library on PyPI, so people can just do pip install rlcard
instead of having to clone the repo first? It makes it easier to use your code.
Specifically, I'm going to release an RL library on PyPI soon that uses various RL environment libraries. I'd like it to use rlcard in addition to others, but to depend on rlcard it would have to either be included with my package (which is undesirable) or be installed from pip per a requirements.txt file (and thus hosted on PyPI).
Your actual environments only depend on numpy and matplotlib. When you install rlcard via pip, it's because you're using the environments as part of a larger thing (in my case as a dependency of a package I'm going to release), not because you want to reproduce experiments with sample code.
The specific problem I have is that, as previously mentioned, I'm releasing a large library that depends on rlcard. Having that library in turn depend on tensorflow, tensorflow probability and sonnet is undesirable for me, as it will be for many people who'd like to use rlcard environments (the main use case), especially since you restrict TF to 1.14 or 1.15.
Can you remove those as requirements of rlcard in the PyPI release? Per the above, I think removing the demo code from the PyPI release, or having people separately install an appropriate version of tensorflow etc., would be the normal approach in a situation like this.
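The usual packaging mechanism for this is extras: keep only the light dependencies in install_requires and move the heavy ones into extras_require, so `pip install rlcard` stays minimal and something like `pip install rlcard[tensorflow]` opts into the demo/agent stack. This is a hedged sketch of such a setup.py fragment; the extra names are assumptions, not rlcard's actual configuration.

```python
# Sketch of a setup.py dependency split (extra names are hypothetical).
install_requires = [
    'numpy',        # the environments themselves only need these two
    'matplotlib',
]
extras_require = {
    # `pip install rlcard[tensorflow]` pulls in the demo/agent deps
    'tensorflow': ['tensorflow>=1.14,<2.0'],
    'torch': ['torch>=1.3'],
}
# setup(..., install_requires=install_requires, extras_require=extras_require)
```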
Hi, I want to ask how to save and load the DQNAgent and NFSPAgent such that we can reuse it once the training is finished.
Thanks!
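In the absence of a built-in save/load helper, one generic pattern is to extract the trained parameters from the agent and pickle them yourself; this sketch is not rlcard's own API. (For the TensorFlow agents, tf.train.Saver's save method is the standard counterpart of the restore call used to load the pretrained models.)

```python
import pickle

def save_agent_params(params, path):
    """params: e.g. a dict mapping variable names to weight arrays
    pulled out of the trained agent."""
    with open(path, 'wb') as f:
        pickle.dump(params, f)

def load_agent_params(path):
    """Load the pickled parameters back, to be pushed into a freshly
    constructed agent of the same architecture."""
    with open(path, 'rb') as f:
        return pickle.load(f)
```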
I want to push rlcard/tests/games/test_gin_rummy_games.py from my GitHub repo to the main dev repo. When I try to do that it seems that I have only the option to push all my changes. However, I just want to push this single file (which you requested that I do).
Right now, my GitHub repo has a lot of files that I did not intend to commit from my local repo. I am still learning git. I would think you don't care what is in my GitHub repo except for the pushes that I request. I now have incomplete versions of files I am working on locally that got committed to my GitHub repo and which shouldn't be pushed to the dev repo.
Could a rule-based AI for Dou Dizhu help the agent train with reasonable actions and improve the rate of convergence?
Can I use the agents below for the game Dou Dizhu?
First, add my own Rule based model Agent, then
agent_CFR = cfr_agent()
agent_RuleBased = MyRuleAgent()
agent_NFSP = nfsp_agent()
env = rlcard.make('doudizhu')
env.set_agents([agent_CFR, agent_RuleBased, agent_NFSP])
I'm facing an issue when trying to install torch; my workaround is to comment out torch in setup.py.
I saw this error when running pip install -e .:
ERROR: Could not find a version that satisfies the requirement torch>=1.3 (from rlcard==0.1.6) (from versions: 0.1.2, 0.1.2.post1, 0.1.2.post2)
ERROR: No matching distribution found for torch>=1.3 (from rlcard==0.1.6)
blackjack env works fine without torch.
Setup Versions:
conda 4.7.11
Python 3.7.5
Would it be a good idea to include Gong Zhu https://en.wikipedia.org/wiki/Gong_Zhu and Sheng Ji https://en.wikipedia.org/wiki/Sheng_ji in this, as they are popular among Chinese communities (and both are trick-taking games)?
File "/rlcard/agents/cfr_agent.py", line 72, in traverse_tree
utility = self.traverse_tree(new_probs, player_id)
File "/rlcard/rlcard/agents/cfr_agent.py", line 71, in traverse_tree
self.env.step(action)
File "/rlcard/rlcard/envs/env.py", line 62, in step
next_state, player_id = self.game.step(self.decode_action(action))
File "/rlcard/envs/doudizhu.py", line 94, in decode_action
for legal_action in legal_actions:
TypeError: 'NoneType' object is not iterable
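A defensive fix along these lines might help, assuming decode_action maps an abstract action id to a matching legal action. This is a standalone sketch with hypothetical parameter names, not the actual code in rlcard/envs/doudizhu.py:

```python
def decode_action(action_id, action_list, legal_actions):
    """Map an abstract action id to a concrete legal action, guarding
    against the None legal_actions seen in the traceback above."""
    if legal_actions is None:
        # the traceback shows legal_actions can be None mid-traversal;
        # fall back to an empty list instead of iterating over None
        legal_actions = []
    abstract = action_list[action_id]
    for legal_action in legal_actions:
        if legal_action == abstract:
            return legal_action
    # no exact match: return the first legal action if any, else None
    return legal_actions[0] if legal_actions else None
```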
Rummy is still very popular in Europe and America; it might be a good idea to include one of these variants in your system. A side note: Rummy is "the Western version of Mahjong".
I am working on a GUI interface for my Gin Rummy program.
Is it ok with you for me to submit it?
There are two parts. Do you want the smaller part submitted first or both parts submitted at once?
The first part is a simple GUI program with 8 Python files. It does not interface with the rlcard environment. It has a menu bar, a preferences window, an about window, and a main window with 52 cards laid out in a 4-by-13 grid. A card can be clicked and its name is printed to the console. A card can be right-clicked or shift-tapped to flip it over.
The second part has 22 Python files. It interfaces with the rlcard environment of Gin Rummy.
Hi rlcard team, awesome work!
I'd like to know whether I can get perfect information for the game, and how. E.g., can I get all the card information of the three players in Dou Dizhu?
Thanks!
Hello, friend. How can I obtain training data for Dou Dizhu? Could you release a simple demo that lets a human play Dou Dizhu against the AI? Thanks a lot!
Save the total game tree? If we do that, how do we load it? And the same problem applies to Deep CFR.
When I want to find out how to save the agent model, I cannot find the model-saving code, yet the pretrained model leduc_holdem_nfsp exists.
saver = tf.train.Saver(tf.model_variables())
saver.restore(self.sess, tf.train.latest_checkpoint(check_point_path))
So where is saver.save?
I am confused by the state encoding of Uno. According to the documentation, the default state is encoded into 7 feature planes with each plane having a one-hot encoding of all possible cards. Planes 0 to 2 represent the player's hand, as seen in the example below. However, Plane 0 is just the inverse of Plane 1 and Plane 2 is always all zeros. The same pattern is repeated for Planes 4 to 6. Is there any reason for this?
[[[1 1 1 1 1 1 1 1 1 1 1 1 1 1 1]
[1 1 1 1 1 1 1 1 1 1 1 1 1 1 1]
[1 1 1 1 1 1 1 1 1 1 1 1 1 1 1]
[1 1 1 1 1 1 1 1 0 1 1 1 1 1 1]]
[[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 1 0 0 0 0 0 0]]
[[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]]
[[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 1 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]]
[[1 1 1 1 1 1 1 1 1 1 1 1 1 1 1]
[1 1 1 1 1 1 1 1 1 1 1 1 1 1 1]
[1 1 1 1 1 1 1 1 1 1 1 1 1 1 1]
[1 1 1 1 1 1 1 1 1 1 1 1 1 1 1]]
[[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]]
[[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]]]
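One possible explanation, if the planes one-hot encode the copy count of each card (plane k = "this card appears exactly k times in the hand"): the planes then necessarily sum to 1 at every card position, so plane 0 is the complement of the others, and plane 2 is all zeros whenever no card is held twice. A small check of that hypothesis:

```python
import numpy as np

def is_one_hot_over_copies(planes):
    """True if, at every card position, exactly one plane is set,
    i.e. the planes form a one-hot encoding over copy counts."""
    planes = np.asarray(planes)
    return bool(np.all(planes.sum(axis=0) == 1))
```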
<Wrap models. You need to inherit the Model class in rlcard/models/model.py. Then put all the models for the players into a list. Rewrite the get_agent function and return this list.> I cannot find the get_agent function.
<Load the model in the environment. To load the model, modify load_pretrained_models in the corresponding game environment in rlcard/envs. Use the registered name to load the model.> I cannot find the load_pretrained_models function.
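For what it's worth, the wrapping pattern described in the quoted docs might look roughly like this; the base class here is a stand-in and the exact method name (get_agent vs get_agents) depends on the installed rlcard version, so check rlcard/models/model.py locally:

```python
class Model:
    """Stand-in for rlcard's base Model class."""
    def get_agents(self):
        raise NotImplementedError

class MyModel(Model):
    """Wrap one trained agent per player seat into a list."""
    def __init__(self, agents):
        self._agents = list(agents)

    def get_agents(self):
        return self._agents
```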
For example, the InfoSet number of Dou Dizhu is 10^53 ~ 10^83 and the Avg. InfoSet size is 10^23. How should these be interpreted and how are they calculated? Thanks a lot :)
It seems that '33334444' is legal as the four_two_pair type, and '3333444555' is legal as the trio_pair_chain_2 type, but '34445555' is illegal as the trio_solo_chain_2 type. Is this a bug?
For example, as a human player, I choose 5553. What is my encoded action in [0, 308]?
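Conceptually, the encoding is a lookup into a fixed table of abstract action strings. This is only an illustration with a tiny stand-in table; rlcard bundles the real 309-entry Dou Dizhu action list in its JSON data, and the card order in each string there may be normalized (e.g. sorted by rank), so '5553' might be stored in a different order:

```python
# Tiny stand-in table for illustration; NOT the real 309-entry list.
ACTION_LIST = ['pass', '3', '34567', '5553']
ACTION_SPACE = {action: idx for idx, action in enumerate(ACTION_LIST)}

def encode_action(action_str):
    """Map a hand string to its integer action id via the table."""
    return ACTION_SPACE[action_str]
```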
You have implemented the Deep CFR algorithm in your code, but there is no example for it.