alpha-gobang-zero's People

Contributors

haormj, zhiyiyo

alpha-gobang-zero's Issues

Why is there a negative sign before value in the backup function of the Node class? I can't understand it.

# This is your source code
class Node:
    def __update(self, value: float):
        """ Update the node's visit count `N(s, a)` and cumulative average reward `Q(s, a)`

        Parameters
        ----------
        value: float
            value used to update the node's internal data
        """
        self.Q = (self.N * self.Q + value)/(self.N + 1)
        self.N += 1

    def backup(self, value: float):
        """ Backpropagation """
        if self.parent:
            self.parent.backup(-value)  # here is the negative sign; why?

        self.__update(value)

Thanks in advance!
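The sign flip looks like the usual negamax convention for two-player zero-sum games: each node's Q is presumably stored from the perspective of the player who made the move leading to that node, and the move into the parent was made by the opponent, so an outcome worth +v to one player is worth -v to the other. The sketch below illustrates that convention on a three-node chain. The `Node` class here is a simplified, hypothetical stand-in (with `_update` instead of the name-mangled `__update`), not the project's actual class.

```python
from typing import Optional


class Node:
    """Simplified MCTS node (hypothetical stand-in for the project's Node).

    Q is the mean value of this node from the perspective of the player
    who made the move leading to it.
    """

    def __init__(self, parent: Optional["Node"] = None) -> None:
        self.parent = parent
        self.N = 0      # visit count N(s, a)
        self.Q = 0.0    # running mean value Q(s, a)

    def _update(self, value: float) -> None:
        # Incremental mean: Q_new = (N * Q + value) / (N + 1)
        self.Q = (self.N * self.Q + value) / (self.N + 1)
        self.N += 1

    def backup(self, value: float) -> None:
        # The move into the parent was made by the opponent, so the same
        # outcome counts with the opposite sign one level up.
        if self.parent:
            self.parent.backup(-value)
        self._update(value)


# Three-level chain: root -> child -> grandchild
root = Node()
child = Node(parent=root)
grandchild = Node(parent=child)

# Suppose the player who moved into `grandchild` wins (+1).
grandchild.backup(1.0)
print(root.Q, child.Q, grandchild.Q)  # 1.0 -1.0 1.0
```

Values alternate in sign level by level, so when a parent selects the child with the highest Q, it is choosing the move that is best for the player actually making that move.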

AlphaZeroMCTS get_action node backup is a bit hard to understand

Hello zhiyiYo,

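First of all, thank you very much for open-sourcing this project. I have recently become interested in this topic, so I read the relevant code and made a few small adjustments; my goal is to use this AI to beat my partner. The code in question is here: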

https://github.com/zhiyiYo/Alpha-Gobang-Zero/blob/master/alphazero/alpha_zero_mcts.py#L72

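This line is presumably meant to determine whether the winner of the game is the side that called get_action. However, board.current_player changes with every do_action call, so I think it should be changed to: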

value = 1 if winner == chess_board.current_player else -1
node.backup(value)

Of course, I don't fully understand the code myself, so I'm not sure whether this change is correct.
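Whether the comparison should use the leaf `board` or the root `chess_board` can be checked with a toy model. The sketch below assumes that `do_action` hands the turn to the opponent after every move, that the winner at a terminal position is the player who just moved, and that `backup` flips the sign at each level as in the `backup` code quoted in the first issue. `ToyBoard`, `other` and `credited_to_mover` are hypothetical helpers; the toy board only tracks whose turn it is and does no win detection.

```python
from dataclasses import dataclass

BLACK, WHITE = 1, 2


def other(player: int) -> int:
    """Return the opponent of the given player."""
    return WHITE if player == BLACK else BLACK


@dataclass
class ToyBoard:
    """Hypothetical stand-in for ChessBoard: only tracks whose turn it is."""
    current_player: int = BLACK

    def copy(self) -> "ToyBoard":
        return ToyBoard(self.current_player)

    def do_action(self, action: int) -> None:
        # Assumed behaviour: the turn passes to the opponent after every move.
        self.current_player = other(self.current_player)


def credited_to_mover(depth: int, use_root_player: bool) -> int:
    """Value the leaf node receives via node.backup(-value) when the
    player who made the last move (at the given depth) has just won."""
    root_board = ToyBoard()                 # plays the role of chess_board
    board = root_board.copy()               # plays the role of board
    for action in range(depth):
        board.do_action(action)
    winner = other(board.current_player)    # the player who just moved
    perspective = root_board if use_root_player else board
    value = 1 if winner == perspective.current_player else -1
    return -value                           # what the leaf node is updated with


for depth in (1, 2, 3):
    print(depth, credited_to_mover(depth, False), credited_to_mover(depth, True))
```

Under these assumptions, comparing against `board.current_player` credits +1 to the winning move at every depth, while comparing against `chess_board.current_player` gives the wrong sign whenever the leaf is an odd number of moves below the root.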

How can I automatically save the train_losses and games files during training?

Hello,
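How can I make training automatically save the train_losses and games JSON files to the log folder after every completed game?
Like the author, I would also like to record the train loss and the games (the game records). However, I found that the `except BaseException as e` branch is only triggered, and save_model only reached, when check_frequency=0.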
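One way to get per-game logs without relying on the exception handler is to write both lists to JSON right after each self-play game. The helper below is a hypothetical sketch: `save_training_log`, the `log` directory and the file names are assumptions, not the project's actual API. It also assumes the game records are JSON-serializable (for example lists of move indices); NumPy arrays would need `.tolist()` first.

```python
import json
import os
from typing import Any, List


def save_training_log(log_dir: str, train_losses: List[float], games: List[Any]) -> None:
    """Hypothetical helper: write train losses and game records to JSON.

    The files are overwritten on every call, so calling it after each
    self-play game keeps the on-disk logs current even if training stops.
    """
    os.makedirs(log_dir, exist_ok=True)
    for name, data in (("train_losses.json", train_losses), ("games.json", games)):
        path = os.path.join(log_dir, name)
        tmp_path = path + ".tmp"
        with open(tmp_path, "w", encoding="utf-8") as f:
            json.dump(data, f, ensure_ascii=False)
        # Replace in one step so an interrupted write does not leave a
        # truncated file behind (os.replace is atomic on POSIX).
        os.replace(tmp_path, path)


# Usage sketch inside a training loop (all names are illustrative):
# for i in range(n_games):
#     games.append(play_one_game())
#     train_losses.append(train_step())
#     save_training_log("log", train_losses, games)
```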

fix setting board length error

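The project defaults to a 9×9 board, but in practice that board is too small and games against the AI often end in a draw. So for now I would like to use a 15×15 board instead, but an error is reported during training. It is a fairly minor problem; I will submit a PR shortly.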

In MCTS, why is a negative value passed to backpropagation?

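In the last line here, why is a negative value passed during backpropagation? Why is the estimate of the current node's win rate made negative?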

class AlphaZeroMCTS:
    """ Monte Carlo tree search based on a policy-value network """

    def __init__(self, policy_value_net: PolicyValueNet, c_puct: float = 4, n_iters=1200, is_self_play=False) -> None:
        ...

    def get_action(self, chess_board: ChessBoard) -> Union[Tuple[int, np.ndarray], int]:
        """ Return the next action for the current position

        Parameters
        ----------
        chess_board: ChessBoard
            the chess board

        Returns
        -------
        action: int
            the best action in the current position

        pi: `np.ndarray` of shape `(board_len^2, )`
            the probability of each action in the action space, only returned when `is_self_play=True`
        """
        for i in range(self.n_iters):
            # Copy the board
            board = chess_board.copy()

            # Keep descending and updating the board until a leaf node is reached
            node = self.root
            while not node.is_leaf_node():
                action, node = node.select()
                board.do_action(action)

            # Check whether the game is over; if not, expand the leaf node
            is_over, winner = board.is_game_over()
            p, value = self.policy_value_net.predict(board)
            if not is_over:
                # Add Dirichlet noise
                if self.is_self_play:
                    p = 0.75*p + 0.25 * \
                        np.random.dirichlet(0.03*np.ones(len(p)))
                node.expand(zip(board.available_actions, p))
            elif winner is not None:
                value = 1 if winner == board.current_player else -1
            else:
                value = 0

            # Backpropagation
            node.backup(-value)
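The self-play branch above mixes the network prior with Dirichlet noise as `0.75*p + 0.25*Dir(0.03)`. Because both terms are probability vectors and the weights sum to 1, the result is still a valid distribution, and the small concentration parameter makes the noise sparse, so typically only a few moves receive most of the extra exploration mass. The standalone sketch below uses NumPy's `Generator.dirichlet` (rather than the legacy `np.random.dirichlet`) so runs are reproducible; `add_dirichlet_noise`, the uniform prior and the 9×9 action space are illustrative assumptions.

```python
from typing import Optional

import numpy as np


def add_dirichlet_noise(p: np.ndarray, alpha: float = 0.03, eps: float = 0.25,
                        rng: Optional[np.random.Generator] = None) -> np.ndarray:
    """Return (1 - eps) * p + eps * noise, with noise ~ Dirichlet(alpha, ..., alpha)."""
    rng = np.random.default_rng() if rng is None else rng
    noise = rng.dirichlet(alpha * np.ones(len(p)))
    return (1 - eps) * p + eps * noise


board_len = 9                                        # illustrative board size
prior = np.full(board_len ** 2, 1 / board_len ** 2)  # illustrative uniform prior
noisy = add_dirichlet_noise(prior, rng=np.random.default_rng(0))
print(noisy.sum())            # 1 up to floating-point rounding
print(np.sort(noisy)[-3:])    # the largest few probabilities after mixing
```

Since the noise is non-negative, every move keeps at least 75% of its original prior probability, so the network's preferences are perturbed but never erased.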

Problem encountered when training a new model

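Hello, I trained with your default parameters, but after one game of self-play it reported that training had finished and that the current model had been saved to XXX.
What could be causing this? From what I checked, execution reaches the line self.dataset.append(self.__self_play()) in train.py and does not continue past it.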

Figure

Hello, how do you draw these figures? I am writing a DQN paper and need to draw figures like the ones in your blog. Thanks.
