bahriddin / deepmine
AI Research project
Our aim is to train an artificial intelligence player for Abalone game based on Alpha Zero's algorithm.
Abalone is an award-winning two-player abstract strategy board game. Players are represented by opposing black and white marbles on a hexagonal board, with the objective of pushing six of the opponent's marbles off the edge of the board. The hexagonal board, the larger space of possible actions, and the more complex rules make this game harder to implement than Go.
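To make the hexagonal geometry concrete, here is a minimal sketch (our assumption, not a fixed design decision) of addressing board cells with axial coordinates, where every cell has exactly six neighbours:

```python
# Axial coordinates (q, r) for a hexagonal grid -- one common way to
# represent Abalone's board; the final representation may differ.
AXIAL_DIRECTIONS = [(1, 0), (1, -1), (0, -1), (-1, 0), (-1, 1), (0, 1)]

def neighbours(q, r):
    """The six cells adjacent to (q, r) on a hexagonal grid."""
    return [(q + dq, r + dr) for dq, dr in AXIAL_DIRECTIONS]
```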
We want to:
In order to prevent the game from being so complicated that the training process cannot be completed within three months, we have looked for other games as alternatives, such as Oware.
Related work:
Silver, D., Schrittwieser, J., Simonyan, K., Antonoglou, I., Huang, A., Guez, A., ... & Chen, Y. (2017). Mastering the game of go without human knowledge. Nature, 550(7676), 354.
Ozcan, E., & Hulagu, B. (2004). A simple intelligent agent for playing abalone game: Abla. In Proc. of the 13th Turkish Symposium on Artificial Intelligence and Neural Networks (pp. 281-290).
Lemmens, N. P. P. M. (2005, June). Constructing an abalone game-playing agent. In Bachelor Conference Knowledge Engineering, Universiteit Maastricht.
There are two teams in an M × N world. Initially, the agents' positions and the walls are placed symmetrically. Each team has the same number K of agents. Each team has to eliminate the enemy agents. The kill condition is as follows:
Some of the complex situations:
All of these actions have the same cost:
So turning around takes two steps: either two left turns or two right turns.
Each team's goal is to eliminate all members of the opposing team. Team members play turn by turn, so if we call the teams A and B, the move order is A1-B1-A2-B2-...
If neither team has won after (N+M) × K × c moves (c = const), the game is stopped and a draw is recorded.
This is a partially observable game: agents have exact knowledge of what they can see, and if an enemy is closer than 4 blocks they hear it and can infer that an enemy is nearby.
In simple shooting/fighting team games set in complex worlds (like Counter-Strike, Quake, or tank games), can we create a smart AI team that beats humans?
The Effect of Different Reinforcement Learning Algorithms on the Performance of AI for Survival Game in MineCraft.
The game is based on mob_fun.py, a demo of the mob spawner block: it creates an arena, lines it with mob spawners of a given type, and then tries to keep an agent alive.
Mobs will continue to spawn throughout the game.
The agent loses health when it is hit by the mobs. If its health drops to 0, the game ends. Some apples are distributed randomly in the arena, and the agent scores points by eating them. The goal of our agent is to survive and score as many points as possible. We may change some of the rules to make the game more suitable for our experiments.
Reinforcement learning algorithms used to train the agent
| Simple Rules (Control) | DQN | Double DQN | Prioritized Replay | Dueling DQN | Policy Gradient | Actor Critic |
|---|---|---|---|---|---|---|
| 3 times | 3 times | 3 times | 3 times | 3 times | 3 times | 3 times |
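As a reference for the value-based entries in the table, a DQN-style agent combines a learned Q-function with experience replay and epsilon-greedy exploration. A minimal, framework-free sketch of those two ingredients (the names `ReplayBuffer` and `epsilon_greedy` are our own, not from mob_fun.py):

```python
import random
from collections import deque

class ReplayBuffer:
    """Uniform experience replay -- a core ingredient of DQN."""
    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Sample a random minibatch of past transitions for training.
        return random.sample(self.buffer, batch_size)

def epsilon_greedy(q_values, epsilon):
    """Pick a random action with probability epsilon, else the greedy one."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

The variants in the table (Double DQN, prioritized replay, dueling networks) all modify either how the buffer is sampled or how the Q-targets are computed, on top of this same skeleton.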
Gomoku, also called Gobang or Five in a Row, is an abstract strategy board game. It is traditionally played with Go pieces (black and white stones) on a Go board, using 15×15 of the 19×19 grid intersections. Players alternate turns placing a stone of their color on an empty intersection. The winner is the first player to form an unbroken chain of five stones horizontally, vertically, or diagonally.
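The winning condition above translates directly into code. A small sketch (the function name and board encoding are our own choices) that checks whether the stone just placed completes an unbroken chain of five:

```python
def five_in_a_row(board, row, col):
    """Check whether the stone just placed at (row, col) completes an
    unbroken chain of five or more stones of the same colour."""
    colour = board[row][col]
    n = len(board)
    for dr, dc in ((0, 1), (1, 0), (1, 1), (1, -1)):  # the 4 line directions
        count = 1
        for sign in (1, -1):  # extend in both directions along the line
            r, c = row + sign * dr, col + sign * dc
            while 0 <= r < n and 0 <= c < n and board[r][c] == colour:
                count += 1
                r, c = r + sign * dr, c + sign * dc
        if count >= 5:
            return True
    return False
```

Checking only the four lines through the last move keeps the test O(1) per move instead of rescanning the whole board.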
We want to start researching from the following questions:
Methods:
Baseline:
Some naive strategies used by human beginners.
Related work:
Silver, D., Schrittwieser, J., Simonyan, K., Antonoglou, I., Huang, A., Guez, A., ... & Chen, Y. (2017). Mastering the game of go without human knowledge. Nature, 550(7676), 354.
Zhao, D., Zhang, Z., & Dai, Y. (2012). Self-teaching adaptive dynamic programming for Gomoku. Neurocomputing, 78(1), 23-29.
Shao, K., Zhao, D., Tang, Z., & Zhu, Y. (2016, November). Move prediction in Gomoku using deep learning. In Chinese Association of Automation (YAC), Youth Academic Annual Conference of (pp. 292-297). IEEE.
Tan, Q., & Hu, X. CS221 Project Final Report Gomoku Game Agent.
The authors of the recently published AlphaZero research stated that their technique can easily be generalised to other problems without significant human effort, and that it outperformed other state-of-the-art alternatives [1]. They tested the system on three board games: Go, Chess and Shogi. We would like to test it on different game(s) with comparable state-space and game-tree complexities.
This list can be modified in the later stages of the research.
We would like to find an answer to our research questions using at least one of these board games:
*Gomoku*, also called Gobang or Five in a Row, is an abstract strategy board game. It is traditionally played with Go pieces (black and white stones) on a Go board, using 15×15 of the 19×19 grid intersections. Because pieces are not moved or removed from the board, Gomoku may also be played as a paper and pencil game. The game is known in several countries under different names.
-- Wikipedia
Its state-space complexity on a 15×15 board is about 3^225, while its game-tree complexity is about 10^70 [3].
Abalone is an award-winning two-player abstract strategy board game designed by Michel Lalet and Laurent Lévi in 1987. Players are represented by opposing black and white marbles on a hexagonal board with the objective of pushing six of the opponent's marbles off the edge of the board.
-- Wikipedia
Its state-space complexity is about 6.5 × 10^23, while its average branching factor is 60 and its game-tree complexity is about 5 × 10^154 [4].
Try to imagine a future scenario in which several unmanned vehicles are scheduled to pick up and transport passengers. The map of the city is stored in each vehicle. The problem is how to plan a globally optimal (or near-optimal) path for each vehicle so that no passenger has to wait too long. Also, we don't want the vehicles to waste energy on unnecessary wandering.
We define several steps that can help us to solve this problem gradually.
In the first step, we will ignore time, and there is only one agent. All passengers are waiting at their positions from the beginning. They don't care about when they will be picked up; in other words, they will wait forever. This is clearly a classic planning problem, and under these conditions the optimal path can be found by A*.
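As a baseline for this single-agent step, a textbook A* on a 4-connected grid with a Manhattan-distance heuristic might look like this (the grid encoding and function signature are illustrative, not our final implementation):

```python
import heapq

def astar(grid, start, goal):
    """A* on a 4-connected grid; grid[r][c] == 1 marks a wall.
    Returns the length of the shortest path, or None if unreachable."""
    def h(p):  # Manhattan distance: admissible on a 4-connected grid
        return abs(p[0] - goal[0]) + abs(p[1] - goal[1])
    rows, cols = len(grid), len(grid[0])
    open_set = [(h(start), 0, start)]   # (f, g, position)
    best = {start: 0}                   # cheapest known g per position
    while open_set:
        _, g, cur = heapq.heappop(open_set)
        if cur == goal:
            return g
        if g > best.get(cur, float("inf")):
            continue  # stale queue entry
        r, c = cur
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                ng = g + 1
                if ng < best.get((nr, nc), float("inf")):
                    best[(nr, nc)] = ng
                    heapq.heappush(open_set, (ng + h((nr, nc)), ng, (nr, nc)))
    return None
```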
In this step, we will put several agents in the scenario, so we will face the real multi-agent problem. We will try different algorithms to find the best schedule, which minimises each passenger's waiting time, and the shortest paths, which save the most fuel. Meanwhile, the agents need to avoid collisions with each other.
Finally, we take time into consideration. We will allow the passengers to place their orders at any time, so our agents need to replan their schedules at any time step. We also introduce the concept of a time window: if a passenger waits too long, he/she may lose patience and cancel the order. The agents' goal is to make every passenger happy if possible; if not, to minimise the loss of passengers while also minimising energy consumption.
To decide whether a plan is valid/optimal, we need to consider:
K-robust
The plan must be K-conflict free. A K-conflict occurs iff there exists ∆ ∈ [0, K] such that two or more agents are located in the same location at time steps t and t + ∆.
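The K-conflict definition can be checked mechanically. A sketch for two agents (representing a path as a list of locations indexed by time step is our assumption):

```python
def has_k_conflict(path_a, path_b, k):
    """Two paths (lists of locations indexed by time step) have a K-conflict
    iff one agent occupies a location at time t that the other occupies at
    some time t + delta with 0 <= delta <= k (in either direction)."""
    for t, loc in enumerate(path_a):
        for delta in range(k + 1):
            if t + delta < len(path_b) and path_b[t + delta] == loc:
                return True
            if t - delta >= 0 and path_b[t - delta] == loc:
                return True
    return False
```

With k = 0 this reduces to the usual vertex-conflict test; larger k forbids one agent from entering a cell too soon after another leaves it.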
The loss of passengers
As mentioned above, passengers will cancel their orders when they lose patience. This is the most important metric in step 3: the plan should minimise the loss of passengers before any other metric is considered.
Total waiting time of passengers
We don't want our passengers to wait too long, even though the waiting time is within their tolerance.
Total path length
This metric reflects the energy consumption by all unmanned vehicles.
Others
E.g. the variance of waiting time across passengers.
We create a map representing a city road network, which will be stored in the agents' memory as grid.
Passengers are represented by blocks located at arbitrary positions on the map. They can be picked up by the agents and dropped off when they arrive at their destinations.
Players (agents) in Minecraft play the role of the unmanned vehicles in the real scenario; they perform one of seven possible actions at each time step.
Completeness
The method must be guaranteed to find a solution whenever one exists.
Optimality
The method must return the optimal solution if one exists.
Time complexity
Measured as the number of states (nodes) expanded during problem solving: the fewer nodes we need to expand, the less time we need to get the result.
Space complexity
The memory the method needs to solve the problem.
Scalability
Including scalability in the number of agents and passengers and in the size of the map.
It is a combination of the Multi-Agent Path Finding (MAPF) problem and a scheduling/transportation problem. We aim to create a smart AI that plans a globally optimal (or near-optimal) path for each agent.
We have found some previous work that focuses on the MAPF problem. We will adapt it to the problem we defined, compare the performance of all implementations, and draw conclusions.
A* based solution
The A* family of algorithms are natural solvers for MAPF. It is the benchmark for other algorithms.
Standley’s enhancements
Three methods that substantially improve the basic A* setting were introduced by (Standley 2010).
Independence detection (ID)
The basic idea of ID is to divide the agents into independent groups. Two groups of agents are independent if there is an optimal solution for each group such that the two solutions do not conflict.
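The grouping step of ID can be sketched as follows; note that the full algorithm also replans the merged groups jointly, which this fragment omits (the names and the path representation are illustrative):

```python
def independence_groups(paths, conflict):
    """Simple-ID grouping sketch: start with every agent in its own group
    and merge two groups whenever their individually planned paths conflict.
    `paths` maps agent -> path; `conflict(p, q)` tests two paths."""
    groups = [{a} for a in paths]
    merged = True
    while merged:
        merged = False
        for i in range(len(groups)):
            for j in range(i + 1, len(groups)):
                if any(conflict(paths[a], paths[b])
                       for a in groups[i] for b in groups[j]):
                    groups[i] |= groups.pop(j)  # merge the two groups
                    merged = True
                    break
            if merged:
                break
    return groups
```

Because each merged group is solved separately, the exponential cost of joint search is paid only for agents that actually interact.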
Conflict avoidance table (CAT)
The CAT stores the location and time of every agent in every group. Then, when an MAPF solver is applied for a given group, ties between nodes with the same f-value are broken in favor of the node that has fewer entries in the CAT.
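A sketch of building the CAT and using it to break ties (the `Counter`-based representation and the simplified node encoding are our own simplifications):

```python
from collections import Counter

def build_cat(paths):
    """Conflict avoidance table: counts how many agents occupy each
    (location, time) pair across all given paths."""
    return Counter((loc, t) for path in paths for t, loc in enumerate(path))

def prefer_fewer_conflicts(tied_nodes, cat):
    """Break ties among nodes with equal f-value in favour of the node
    whose (location, time) pair has fewer CAT entries.
    Each node here is simply a (location, time) pair."""
    return min(tied_nodes, key=lambda node: cat[node])
```

Tie-breaking this way does not change optimality; it only steers the solver toward nodes less likely to conflict with other groups' plans.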
Operator decomposition (OD)
While the former two ideas are general and can be used by any MAPF solver, OD is specific for an A*-based solver. OD reduces the branching factor by introducing intermediate states between the regular states. Intermediate states are generated by applying an operator to a single agent only.
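A sketch of OD-style successor generation, where a state records whose turn it is to move and only that agent branches (the state encoding is our assumption):

```python
def od_successors(state, moves):
    """Operator decomposition sketch: a state is (positions, next_agent).
    Instead of branching over joint moves of all k agents (b**k successors),
    apply a move to one agent at a time, yielding intermediate states with
    branching factor b each."""
    positions, agent = state
    for move in moves(positions[agent]):
        new_positions = list(positions)
        new_positions[agent] = move
        # Advance the turn; a "regular" state is reached when agent wraps to 0.
        yield (tuple(new_positions), (agent + 1) % len(positions))
```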
K-Robust Multi-Agent Path Finding (Atzmon)
https://aaai.org/ocs/index.php/SOCS/SOCS17/paper/viewFile/15797/15072
Conflict-Based Search For Optimal Multi-Agent Path Finding (Sharon)
https://www.aaai.org/ocs/index.php/AAAI/AAAI12/paper/viewFile/5062/5239
Multi-Agent Pathfinding as a Combinatorial Auction (Amir)
https://www.aaai.org/ocs/index.php/AAAI/AAAI15/paper/viewFile/9572/9496
A review of dynamic vehicle routing problems (Pillac)
https://hal.archives-ouvertes.fr/hal-00739779/document
Finding Optimal Solutions to Cooperative Pathfinding Problems (Standley)
https://pdfs.semanticscholar.org/2529/f40c4f79ef24165dbb1a8327770e37cced2d.pdf
Thermodynamical approach to the traveling salesman problem: An efficient simulation algorithm
https://link-springer-com.ezp.lib.unimelb.edu.au/article/10.1007/BF00940812