Opponent modeling is a necessary element of multi-agent reinforcement learning, where other agents with competing goals adapt their strategies; it remains challenging precisely because those strategies change over time. In this line of research, learning to act in ways that are rewarded is taken as a sign of intelligence. Inspired by recent progress in deep reinforcement learning, this work presents a neural model that jointly learns the agent's own policy and the opponent's actions, encoding observations of the opponent with a graph neural network (GNN).

My main argument is that once an agent relies on acquiring a valuable skill through learning, there is a selective advantage in that skill becoming innate: the agent acquires a generative policy that enables it to predict the opponent's behavior and actions. I further propose a new approach to learning in this domain: combining a graph neural network with actor-critic reinforcement learning (specifically Proximal Policy Optimization, PPO), in which the agent uses the GNN to store the opponent's actions in a graph structure. Because graph neural networks apply a separate network to each agent and aggregate incoming edge information, this mixture of algorithms lets the model automatically discover different strategy patterns of the opponent without re-learning. This is extensive work in progress, built on an earlier PhD thesis and related studies, and it aims to show that graph neural networks can form a successful learning algorithm that models and exploits opponent strategies.
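To make the architecture concrete, here is a minimal NumPy sketch of the two pieces the abstract describes: a per-agent GNN encoder that aggregates incoming edge messages over a graph of opponent observations, and PPO-style actor-critic heads on top of the resulting embedding. All names (`OpponentGNN`, `ActorCritic`, `ppo_clip_loss`), dimensions, and the forward-only design are illustrative assumptions, not the repository's actual implementation.

```python
import numpy as np

# Hypothetical dimensions -- not taken from the repository.
OBS_DIM, HID, N_ACTIONS = 4, 8, 3
rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

class OpponentGNN:
    """One message-passing step: a separate weight matrix per agent
    (as the text describes), then mean-aggregation of incoming edges."""
    def __init__(self, n_agents):
        self.W_node = [rng.standard_normal((OBS_DIM, HID)) * 0.1
                       for _ in range(n_agents)]
        self.W_msg = rng.standard_normal((HID, HID)) * 0.1

    def forward(self, obs, edges):
        # obs: list of per-agent observation vectors; edges: (src, dst) pairs.
        h = [relu(obs[i] @ self.W_node[i]) for i in range(len(obs))]
        out = []
        for i in range(len(obs)):
            msgs = [h[j] @ self.W_msg for (j, k) in edges if k == i]
            out.append(h[i] + (np.mean(msgs, axis=0) if msgs else 0.0))
        return np.stack(out)  # shape: (n_agents, HID)

class ActorCritic:
    """Policy and value heads over a node embedding (forward pass only)."""
    def __init__(self):
        self.W_pi = rng.standard_normal((HID, N_ACTIONS)) * 0.1
        self.W_v = rng.standard_normal((HID, 1)) * 0.1

    def forward(self, h):
        return softmax(h @ self.W_pi), float(h @ self.W_v)

def ppo_clip_loss(ratio, advantage, eps=0.2):
    """Standard PPO clipped surrogate objective (to be minimized)."""
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps)
    return -np.minimum(ratio * advantage, clipped * advantage).mean()
```

As a usage sketch, one would encode the current observations of both agents, read the policy distribution off the controlled agent's node, and update the actor with `ppo_clip_loss` on collected trajectories; the training loop itself is omitted here.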
ayashabbar / generative-graph-neural-network-ppo-opponent-modeling-in-multi-agent-reinforcement-learning
License: MIT License