zhiyiyo / alpha-gobang-zero



A gobang robot based on reinforcement learning.

Home Page: https://www.cnblogs.com/zhiyiYo/p/14683450.html

License: GNU General Public License v3.0

Python 99.93% QMake 0.07%

alphazero gobang pyqt5 pytorch reinforcement-learning

alpha-gobang-zero's Issues

Why is there a negative sign before value in the backup function of the Node class? I can't understand it.

# This is your source code
class Node:
    ...  # other members omitted

    def __update(self, value: float):
        """ Update the node's visit count `N(s, a)` and its cumulative average reward `Q(s, a)`

        Parameters
        ----------
        value: float
            value used to update the node's internal statistics
        """
        self.Q = (self.N * self.Q + value)/(self.N + 1)
        self.N += 1

    def backup(self, value: float):
        """ Backpropagation """
        if self.parent:
            self.parent.backup(-value)  # there is the negative sign, why?

        self.__update(value)

Thanks in advance!
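The sign flip matches the usual negamax convention in two-player zero-sum tree search: the players alternate between tree levels, so a result worth +v to one player is worth -v to the player who moves at the parent. The minimal sketch below uses a stripped-down, hypothetical `Node` (only `parent`, `N` and `Q`) rather than the project's class, with the same running-mean update as the excerpt:

```python
class Node:
    """Hypothetical minimal MCTS node, not the project's class."""

    def __init__(self, parent=None):
        self.parent = parent
        self.N = 0    # visit count N(s, a)
        self.Q = 0.0  # mean value Q(s, a)

    def _update(self, value: float) -> None:
        # Incremental running mean, same formula as __update in the excerpt
        self.Q = (self.N * self.Q + value) / (self.N + 1)
        self.N += 1

    def backup(self, value: float) -> None:
        # The parent level belongs to the other player, so flip the sign
        if self.parent:
            self.parent.backup(-value)
        self._update(value)


root = Node()
child = Node(parent=root)
grandchild = Node(parent=child)
grandchild.backup(1.0)
print(grandchild.Q, child.Q, root.Q)  # 1.0 -1.0 1.0
```

Each node therefore stores its value from the point of view of one fixed player, and adjacent levels always disagree in sign.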

AlphaZeroMCTS get_action node backup is a bit hard to understand

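Hi zhiyiYo,

First of all, thank you very much for open-sourcing this project. I have recently become interested in this topic, so I read through the code and made a few small tweaks, with the goal of using this AI to beat my partner. About the following code: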

https://github.com/zhiyiYo/Alpha-Gobang-Zero/blob/master/alphazero/alpha_zero_mcts.py#L72

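This is presumably meant to check whether the winner of the game is the player who chose the action. Since board.current_player changes with every do_action, I think it should be changed to: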

value = 1 if winner == chess_board.current_player else -1
node.backup(value)

Of course, I don't fully understand it myself, so I'm not sure whether this change is correct.
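Whether the comparison is right depends on whose perspective `value` is in. As the question notes, do_action switches current_player, so when a move ends the game the simulated board already reports the opponent as the player to move. The sketch below assumes that convention, that `value` is measured from the view of the player to move, and a hypothetical 0/1 player encoding; `terminal_value` is an illustrative helper, not a project function:

```python
from typing import Optional

BLACK, WHITE = 0, 1  # hypothetical player encoding


def terminal_value(winner: Optional[int], current_player: int) -> float:
    """Value of a finished game from the view of the player to move."""
    if winner is None:  # draw
        return 0.0
    return 1.0 if winner == current_player else -1.0


# BLACK just completed five in a row, so the board now says WHITE is to move.
value = terminal_value(winner=BLACK, current_player=WHITE)
leaf_update = -value  # what a negated backup call, node.backup(-value), stores in the leaf
print(value, leaf_update)  # -1.0 1.0
```

Under these assumptions, comparing against the simulated board's current_player and then negating the value in backup credits the winning move with +1.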

How can I automatically save the train_losses and games files during training?

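Hello,
How can I make training automatically save the train_losses and games JSON files to the log folder after each completed game?
Like the author, I would also like to record the training loss and the games (the game records), but I found that save_model is only reached when `except BaseException as e` is triggered, which only happens when check_frequency=0.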
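One way to get this behaviour is to write both logs after every self-play game instead of only in the exception handler. A minimal sketch, assuming the losses and game records are plain JSON-serialisable lists; `save_logs`, the file names and the loop are hypothetical and not taken from the project's train.py:

```python
import json
import tempfile
from pathlib import Path


def save_logs(log_dir, train_losses, games):
    """Write the loss history and game records as JSON (hypothetical helper)."""
    log_dir = Path(log_dir)
    log_dir.mkdir(parents=True, exist_ok=True)
    with open(log_dir / "train_losses.json", "w", encoding="utf-8") as f:
        json.dump(train_losses, f)
    with open(log_dir / "games.json", "w", encoding="utf-8") as f:
        json.dump(games, f)


# Hypothetical training loop: save after every finished game, not only on exit.
log_dir = Path(tempfile.mkdtemp()) / "log"
train_losses, games = [], []
for game_id in range(3):
    games.append([game_id * 10 + k for k in range(5)])  # placeholder move list
    train_losses.append(1.0 / (game_id + 1))            # placeholder loss value
    save_logs(log_dir, train_losses, games)
```

Rewriting the whole file each time keeps the logs valid JSON even if training is interrupted between games.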

Fix error when setting the board length

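The project defaults to a 9×9 board, but in practice that board is too small and games against the AI often end in a draw, so for now I wanted to use a 15×15 board instead. However, an error is raised during training. It is a fairly minor issue; I will submit a PR shortly.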

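In MCTS, why is a negative value passed to backpropagation?

In the last line here, why is the value negated during backpropagation? Why is the estimate of the current node's win rate set to a negative value?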

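# PolicyValueNet and ChessBoard are project classes; with postponed evaluation
# the annotations below are not evaluated, so they need not be imported here.
from __future__ import annotations

from typing import Tuple, Union

import numpy as np


class AlphaZeroMCTS:
    """ Monte Carlo tree search guided by a policy-value network """

    def __init__(self, policy_value_net: PolicyValueNet, c_puct: float = 4, n_iters=1200, is_self_play=False) -> None:
        ...

    def get_action(self, chess_board: ChessBoard) -> Union[Tuple[int, np.ndarray], int]:
        """ Return the next action for the current position

        Parameters
        ----------
        chess_board: ChessBoard
            the chess board

        Returns
        -------
        action: int
            the best action in the current position

        pi: `np.ndarray` of shape `(board_len^2, )`
            probability of each action in the action space, only returned when `is_self_play=True`
        """
        for i in range(self.n_iters):
            # Copy the board
            board = chess_board.copy()

            # Keep descending and updating the board until a leaf node is reached
            node = self.root
            while not node.is_leaf_node():
                action, node = node.select()
                board.do_action(action)

            # Check whether the game is over; if not, expand the leaf node
            is_over, winner = board.is_game_over()
            p, value = self.policy_value_net.predict(board)
            if not is_over:
                # Add Dirichlet noise
                if self.is_self_play:
                    p = 0.75*p + 0.25 * \
                        np.random.dirichlet(0.03*np.ones(len(p)))
                node.expand(zip(board.available_actions, p))
            elif winner is not None:
                value = 1 if winner == board.current_player else -1
            else:
                value = 0

            # Backpropagation
            node.backup(-value)

        # ... (remainder of get_action not shown in the issue)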
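The noise line in the excerpt mixes the network prior with a Dirichlet sample, weighting them 0.75/0.25 with concentration 0.03. Because both terms are probability vectors, the mixture is still a valid distribution. A standalone sketch of that expression; the function name, the `eps`/`alpha` parameter names and the use of NumPy's `Generator` API are assumptions, not the project's code:

```python
import numpy as np


def add_dirichlet_noise(p, eps=0.25, alpha=0.03, rng=None):
    """Mix Dirichlet noise into prior probabilities p: (1 - eps) * p + eps * noise."""
    rng = np.random.default_rng() if rng is None else rng
    noise = rng.dirichlet(alpha * np.ones(len(p)))  # non-negative, sums to 1
    return (1 - eps) * p + eps * noise


p = np.full(9, 1 / 9)  # uniform prior over a hypothetical 3x3 board
mixed = add_dirichlet_noise(p, rng=np.random.default_rng(0))
print(mixed.sum())  # sums to 1 up to floating-point rounding
```

A small `alpha` concentrates the noise on a few random moves, which is how self-play is pushed to explore moves the network currently rates low.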

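Problem when training a new model

Hello, I trained with your default parameters, but after one game of self-play it reported that training had finished and that the current model had been saved to XXX.
What could be the reason? From what I checked, execution reaches the line self.dataset.append(self.__self_play()) in train.py and does not continue past it.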

Figure

Hello, how do you draw these figures? I am writing a DQN paper and need to draw figures like the ones in your blog. Thanks!
