dd_qnet's Introduction

Double Dueling Q Net

Requirements
Environment setup
Project structure
- Environments
- Models
Usage
Pretrain models

Requirements

Basic requirements:

python2
gym-gazebo
keras
tensorflow

Environment setup

The gym-gazebo environments can be found here.

Copy the custom folder to gym-gazebo/gym_gazebo/envs.
Copy launch, models, and worlds to gym-gazebo/gym_gazebo/envs/assets, skipping duplicated files.
Replace the __init__.py file in the gym-gazebo/gym_gazebo folder with the __init__.py file here.
In the gym-gazebo/gym_gazebo/envs/assets/launch/MazeColor.launch file, replace the path in the line <arg name="world_file" default="/home/cloud/gym-gazebo/gym_gazebo/envs/assets/worlds/maze_color.world"/> with your own path to the maze_color.world file.
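
The last step can also be scripted instead of edited by hand. The following is a minimal sketch, not part of the repository: the helper name set_world_file and the example path under /home/me are hypothetical, and it assumes the launch file contains the world_file line exactly as quoted above.

```python
import re


def set_world_file(launch_text, world_path):
    """Return launch_text with the default of the world_file <arg> replaced.

    Hypothetical helper: it only rewrites the default="..." value of the
    <arg name="world_file" .../> element and leaves everything else as-is.
    """
    pattern = r'(<arg\s+name="world_file"\s+default=")[^"]*(")'
    new_text, count = re.subn(
        pattern,
        # A function replacement avoids backslash escapes in world_path.
        lambda m: m.group(1) + world_path + m.group(2),
        launch_text,
    )
    if count == 0:
        raise ValueError("no world_file <arg> found in the launch text")
    return new_text


# Example on an in-memory copy of the line quoted in the step above.
example = (
    "<launch>\n"
    '  <arg name="world_file" default="/home/cloud/gym-gazebo/gym_gazebo/'
    'envs/assets/worlds/maze_color.world"/>\n'
    "</launch>\n"
)
updated = set_world_file(
    example,
    "/home/me/gym-gazebo/gym_gazebo/envs/assets/worlds/maze_color.world",
)
```

To apply it to the real file, read MazeColor.launch into a string, pass it through set_world_file, and write the result back.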

Project structure

Environments

The environment files for the different training and testing situations are all stored in the custom_envs folder. To use an environment, copy the code in gazebo_turtlebot_maze_color*.py to gazebo_turtlebot_maze_color.py in gym-gazebo/gym_gazebo/envs/custom.

List of environments:

gazebo_turtlebot_maze_color.py: environment for training the CNN model, which uses both image and laser data as state input. To change the target position, change self.num_target = 1 on line 52 to another index (from 0 to 2).

gazebo_turtlebot_maze_color_laser_only.py: environment for training the laser model. The environment has 5 target positions, one of which is picked at random at the start and after each episode ends.

gazebo_turtlebot_maze_color_laser_only_ver2.py: environment for training the laser model. The environment has 5 hints at 5 corners of the maze, so the laser model can learn to go to a corner and turn back.

gazebo_turtlebot_maze_color_laser-image.py: environment for testing the laser model with the image processing strategy. It has 5 different reward and hint positions, which can be changed by setting self.num_target = 1 on line 59 of the code to another index (from 0 to 4).
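
Switching between these environments means overwriting gazebo_turtlebot_maze_color.py in the custom folder with one of the variants. A minimal sketch of that copy step follows. The helper name select_env is hypothetical, and both paths are supplied by the caller because they depend on where gym-gazebo is installed.

```python
import os
import shutil


def select_env(variant_path, custom_dir):
    """Copy one environment variant over gazebo_turtlebot_maze_color.py.

    variant_path: path to a gazebo_turtlebot_maze_color*.py file from
                  the custom_envs folder.
    custom_dir:   path to gym-gazebo/gym_gazebo/envs/custom.
    Returns the path of the file that was overwritten.
    """
    target = os.path.join(custom_dir, "gazebo_turtlebot_maze_color.py")
    shutil.copyfile(variant_path, target)
    return target
```

Note that this overwrites the target file, so keep a copy of the current environment if it has local changes.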

Models

All the main code is stored in the src folder. We have 2 types of model:

ddq_model.py: the model uses a CNN architecture with image and laser data as the input state. It uses the architecture proposed in this paper.
laser_model.py: the model uses a DNN with laser data as the input state.
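
For readers unfamiliar with the "double dueling" part of the project name, here is a small NumPy sketch of the two ideas as they are commonly formulated. It is not taken from ddq_model.py or laser_model.py. The function names, argument shapes, and gamma value are assumptions made for illustration.

```python
import numpy as np


def dueling_q(value, advantage):
    """Combine a state value and per-action advantages into Q-values.

    value:     array of shape (batch, 1)
    advantage: array of shape (batch, n_actions)
    Uses the common mean-subtracted form Q = V + A - mean(A).
    """
    return value + advantage - advantage.mean(axis=1, keepdims=True)


def double_q_target(reward, done, q_online_next, q_target_next, gamma=0.99):
    """Double Q-learning target for a batch of transitions.

    The online network picks the next action, and the target network
    evaluates it. done is 1.0 for terminal transitions, 0.0 otherwise.
    """
    best = np.argmax(q_online_next, axis=1)
    next_q = q_target_next[np.arange(len(best)), best]
    return reward + gamma * (1.0 - done) * next_q


# Tiny worked example with hand-checkable numbers.
q = dueling_q(np.array([[1.0]]), np.array([[1.0, 2.0, 3.0]]))
targets = double_q_target(
    reward=np.array([1.0, 0.0]),
    done=np.array([0.0, 1.0]),
    q_online_next=np.array([[1.0, 5.0], [2.0, 0.0]]),
    q_target_next=np.array([[10.0, 20.0], [30.0, 40.0]]),
    gamma=0.5,
)
```

In the example, q is [[0, 1, 2]] (the mean advantage 2 is subtracted), and targets is [11, 0]: the first transition bootstraps from the target network's value 20 for the action the online network prefers, and the second is terminal.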

Usage

Image CNN model

Replace the code in gym-gazebo/.../custom/gazebo_turtlebot_maze_color.py with the code in gazebo_turtlebot_maze_color.py to use this model.

Train:

python qlearning.py

For more parameters: python qlearning.py --help

Test model: python test.py <from_pretrain_dir> <epsilon>

python test.py ddq_model 0.01
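
The <epsilon> argument is presumably the exploration rate of an epsilon-greedy policy: with probability epsilon the agent takes a random action, otherwise the action with the highest Q-value. That reading is an assumption, since test.py is not shown here. A minimal sketch of such a policy, with a hypothetical function name:

```python
import numpy as np


def epsilon_greedy(qvalues, epsilon, rng):
    """Pick an action index from a 1-D array of Q-values.

    With probability epsilon a uniformly random action is returned,
    otherwise the index of the largest Q-value.
    """
    qvalues = np.asarray(qvalues)
    if rng.rand() < epsilon:
        return int(rng.randint(len(qvalues)))
    return int(np.argmax(qvalues))


rng = np.random.RandomState(0)
# epsilon = 0.0 never explores, so the greedy action is always returned.
greedy_action = epsilon_greedy([0.1, 0.7, 0.2], 0.0, rng)
```

Under this reading, a small value such as 0.01 keeps the test run almost fully greedy while still allowing an occasional random action.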

DNN laser model

Replace the code in gym-gazebo/.../custom/gazebo_turtlebot_maze_color.py with the code in gazebo_turtlebot_maze_color_laser_only.py or gazebo_turtlebot_maze_color_laser_only_ver2.py to use this model.

Train:

python laser_learning.py

For more parameters: python qlearning --help

Test model: python test_laser_only.py <from_pretrain_dir> <epsilon>

python test_laser_only.py laser-only 0.1

Use the laser model with the hint detection strategy

Replace the code in gym-gazebo/.../custom/gazebo_turtlebot_maze_color.py with the code in gazebo_turtlebot_maze_color_laser-image.py to use this model.

Run: python test_laser_image.py <from_pretrain_dir> <epsilon>

python test_laser_image.py laser-only 0.0

Pretrain models

Some of our pretrained models can be found here.

They can be used as pretrained models or to continue training, via the --from_pretrain or --continue_from parameters in the learning files. For example: python laser_learn --continue_from laser-only --output_dir laser-only
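
As a rough illustration of how such options are typically declared, here is an argparse sketch. Only the flag names come from this page (the --num_pretrain_step definition is the one reported from utils.py in the issue below). The function name build_parser, the other defaults, and the help texts are assumptions, not the repository's actual code.

```python
import argparse


def build_parser():
    """Hypothetical parser covering the options named on this page."""
    parser = argparse.ArgumentParser(
        description="Sketch of the training options (not the real utils.py)"
    )
    # Definition as quoted from utils.py in the issue report.
    parser.add_argument("--num_pretrain_step", type=int, default=10000)
    # The defaults and help texts below are assumptions.
    parser.add_argument("--from_pretrain", type=str, default=None,
                        help="directory of a pretrained model to start from")
    parser.add_argument("--continue_from", type=str, default=None,
                        help="directory of a run to continue training")
    parser.add_argument("--output_dir", type=str, default=None,
                        help="directory where the model is saved")
    return parser


# Mirrors the example command above.
args = build_parser().parse_args(
    ["--continue_from", "laser-only", "--output_dir", "laser-only"]
)
```

Run the learning scripts with --help to see the options they actually accept.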

dd_qnet's Issues

Environment for training laser model (gazebo_turtlebot_maze_color_laser_only)

Hi, when I train the laser model using gazebo_turtlebot_maze_color_laser_only.py, I get an error.
In utils.py there is parser.add_argument("--num_pretrain_step", type=int, default=10000). As I understand this definition, actions are selected randomly for the first 10,000 steps, and after that they are selected from the network.
But once the step count exceeds 10000, I get the following error in the terminal:

`Done epoch in 10 steps, 10 random steps, Total reward: -191, Total step: 4981, Average loss: 0.0
Setting target 1th
Traceback (most recent call last):
  File "laser_learn.py", line 48, in <module>
    action = qnet.get_actions(state.reshape(1, -1))[0]
  File "/home/nuc/gym-gazebo/dd_qnet/src/laser_model.py", line 55, in get_actions
    qvalues = self.get_qvalues(states)
  File "/home/nuc/gym-gazebo/dd_qnet/src/laser_model.py", line 47, in get_qvalues
    predicted = self.model.predict(states)
  File "/home/nuc/.local/lib/python3.5/site-packages/keras/engine/training.py", line 1152, in predict
    x, _, _ = self._standardize_user_data(x)
  File "/home/nuc/.local/lib/python3.5/site-packages/keras/engine/training.py", line 754, in _standardize_user_data
    exception_prefix='input')
  File "/home/nuc/.local/lib/python3.5/site-packages/keras/engine/training_utils.py", line 136, in standardize_input_data
    str(data_shape))
ValueError: Error when checking input: expected input_1 to have shape (100,) but got array with shape (1,)`

Looking forward to your reply, thank you.
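
A note on reading the traceback: the ValueError says the network expects 100 input values per sample, while state.reshape(1, -1) produced a batch with a single value, so the state passed in at that step held only one element. Why that happens is not visible from the log. One way to narrow it down is to check the state shape before calling the network, as in the sketch below. The helper name as_batch is hypothetical, and the expected size of 100 is taken from the error message.

```python
import numpy as np


def as_batch(state, expected_dim):
    """Reshape one state into a (1, expected_dim) batch, or fail loudly.

    Raises a ValueError that reports the original shape, which makes it
    easier to find where a malformed state came from.
    """
    original_shape = np.shape(state)
    batch = np.asarray(state, dtype=np.float32).reshape(1, -1)
    if batch.shape[1] != expected_dim:
        raise ValueError(
            "expected %d values per state, got shape %s"
            % (expected_dim, original_shape)
        )
    return batch
```

Calling as_batch(state, 100) right before qnet.get_actions would show whether the environment's reset or step call is returning something other than the 100-value observation.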

mattbui / dd_qnet