Using deep reinforcement learning to design a broadband acoustic cloak. Created under the supervision of Dr. Feruza Amirkulova and Dr. Peter Gerstoft. With the help of: Linwei Zhou, Peter Lai, and Amaris De La Rosa.
This is the first code written for this RL project, using a static dataset. Somehow it performs better than the methods we have now. It is simply a critic network that learns the resulting change in mean TSCS caused by an action. Optimized configurations are discovered by sampling random actions from a starting configuration and choosing the one that suppresses the scattering the most.
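For context, the search procedure described above could be sketched roughly as follows; `critic`, `sample_action`, and `apply_action` are placeholders for whatever the static-dataset code actually uses, not names from the repo.

```python
import numpy as np

def greedy_step(critic, config, sample_action, apply_action, n_candidates=100):
    """Pick the random candidate action with the lowest predicted change in mean TSCS."""
    best_action, best_score = None, np.inf
    for _ in range(n_candidates):
        action = sample_action()
        # The critic predicts the change in mean TSCS caused by this action;
        # more negative means more scattering suppression.
        score = critic(config, action)
        if score < best_score:
            best_action, best_score = action, score
    return apply_action(config, best_action)
```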
Create some code which allows us to save (state, action, reward, next_state, done) tuples to a database of some kind, and also include the ability to save images. A sketch of one possible logger follows the test cases below.
Try using some RL techniques to test different hyperparameters on this data.
Test cases:
Able to save data.
Able to easily read the data back into Python.
Run experiments on the data to determine which hyperparameters are best; specifically the number of hidden layers, neurons per layer, gamma, and optimizer type.
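Something along these lines might serve as a starting point for the logger; the class name, the pickle file format, and the optional image field are assumptions for illustration, not what the repo currently does.

```python
import pickle
from collections import namedtuple

Transition = namedtuple('Transition', ('state', 'action', 'reward', 'next_state', 'done'))

class TransitionLogger:
    """Accumulates transitions (and optional configuration images) and saves them to disk."""
    def __init__(self, path='transitions.pkl'):
        self.path = path
        self.buffer = []

    def add(self, state, action, reward, next_state, done, image=None):
        # Optionally attach a rendered image of the cylinder configuration.
        self.buffer.append((Transition(state, action, reward, next_state, done), image))

    def save(self):
        with open(self.path, 'wb') as f:
            pickle.dump(self.buffer, f)

    @staticmethod
    def load(path='transitions.pkl'):
        with open(path, 'rb') as f:
            return pickle.load(f)
```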
Current behavior is that states are passed to the Actor network, which generates an action and scales it to a specified range. To simplify the code and add the ability to have differently scaled actions, we need to use gym env action spaces (see the sketch after the test cases below).
Test cases:
Change environments to gym.Env
Scale actions to action range specified in gym action space
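A minimal sketch of what the gym-based environment and action scaling could look like, assuming a Box action space with one (dx, dy) displacement per cylinder; the bounds, shapes, and the omitted step/reset methods are illustrative assumptions only.

```python
import gym
import numpy as np
from gym import spaces

class TSCSEnv(gym.Env):
    """Cylinder-configuration environment with a continuous Box action space (step/reset omitted)."""
    def __init__(self, n_cylinders=4, max_step=0.5):
        super().__init__()
        # One (dx, dy) displacement per cylinder.
        self.action_space = spaces.Box(low=-max_step, high=max_step,
                                       shape=(2 * n_cylinders,), dtype=np.float32)
        self.observation_space = spaces.Box(low=-np.inf, high=np.inf,
                                            shape=(2 * n_cylinders,), dtype=np.float32)

def scale_action(raw_action, action_space):
    """Map a tanh-squashed actor output in [-1, 1] to the Box bounds."""
    low, high = action_space.low, action_space.high
    return low + (raw_action + 1.0) * 0.5 * (high - low)
```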
Currently the code operates as one agent interacting with one environment, which slows training down significantly. If we have multiple environments generating data asynchronously and update the agent at each learning step, we will speed up training time.
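One possible way to parallelize data collection with plain multiprocessing is sketched below; `TSCSEnv`, the old-style 4-tuple step return, and how the policy is shared between processes are all assumptions that would need adjusting for our actual code.

```python
import multiprocessing as mp

def worker(env_fn, policy_fn, queue, n_steps):
    """Run one environment and push transitions onto a shared queue."""
    env = env_fn()
    state = env.reset()
    for _ in range(n_steps):
        action = policy_fn(state)
        next_state, reward, done, _ = env.step(action)
        queue.put((state, action, reward, next_state, done))
        state = env.reset() if done else next_state

def collect(env_fn, policy_fn, n_workers=4, n_steps=1000):
    """Collect transitions from several environments running in parallel."""
    queue = mp.Queue()
    procs = [mp.Process(target=worker, args=(env_fn, policy_fn, queue, n_steps))
             for _ in range(n_workers)]
    for p in procs:
        p.start()
    # Drain the queue before joining to avoid blocking on a full queue.
    transitions = [queue.get() for _ in range(n_workers * n_steps)]
    for p in procs:
        p.join()
    return transitions
```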
Currently the way we apply an action to a configuration is to simply add the action vector to the coordinates of the current configuration. If the resulting configuration is invalid (overlapping cylinders or cylinders beyond the walls), we reject it, revert to the original configuration, and give a negative reward. This system probably causes the agent to see fewer states, since every illegal move returns the environment to the same state it was in before the move. We need a way to apply partial actions to the environment in a consistent and time-efficient manner.
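One candidate for partial actions is to shrink the displacement until the configuration becomes valid; `is_valid` is assumed here to wrap the existing overlap and wall checks.

```python
import numpy as np

def apply_partial_action(config, action, is_valid, n_tries=10):
    """Return the largest fraction of `action` that keeps the configuration valid."""
    for i in range(n_tries, -1, -1):
        fraction = i / n_tries
        candidate = config + fraction * action
        if is_valid(candidate):
            return candidate, fraction
    # A zero step should always be valid, but fall back to the original config just in case.
    return config, 0.0
```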
Try creating a better reward function which is universal across all wavenumber ranges. So far I have been getting OK results with simple reward functions, but maybe there is a better solution.
You can modify the reward function (getReward) in the env.py file in the DDPG folder.
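As a starting point, a reward based on the relative suppression of the mean TSCS is scale-free across wavenumber ranges. The attribute names (`self.TSCS`, `self.initial_TSCS`), the method signature, and the invalid-move penalty below are assumptions about env.py, not its current contents.

```python
def getReward(self, invalid):
    """Reward relative TSCS suppression so the scale is comparable across wavenumber ranges."""
    if invalid:
        return -1.0  # penalty for overlapping cylinders or leaving the walls
    mean_tscs = self.TSCS.mean()
    # Relative suppression, roughly in [0, 1] regardless of the absolute TSCS scale.
    return float(1.0 - mean_tscs / self.initial_TSCS.mean())
```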
In models.py in the DDPG folder you can create two more models (ImageActor, ImageCritic) which are able to process images in addition to the standard state data; a sketch is given after the notes below.
Create a new ImageDDPG object which inherits from DDPG and overrides any methods you need to change.
Notes:
We already have a function in env.py which produces an image from a configuration of cylinders. Call env.getImage(env.config) to produce an image.
You will also need to modify the way we store data; I suggest adding two additional fields to the namedtuple on line 51 in ddpg.py.
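A rough sketch of what ImageActor could look like, fusing a small CNN over the configuration image with the existing state vector; the layer sizes and action scaling are illustrative assumptions, and ImageCritic would follow the same pattern with the action concatenated into the fully connected part.

```python
import torch
import torch.nn as nn

class ImageActor(nn.Module):
    """Actor that combines CNN features of the configuration image with the state vector."""
    def __init__(self, state_dim, action_dim, action_range):
        super().__init__()
        self.action_range = action_range
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.fc = nn.Sequential(
            nn.Linear(32 + state_dim, 128), nn.ReLU(),
            nn.Linear(128, action_dim), nn.Tanh())

    def forward(self, state, image):
        img_features = self.conv(image)               # (batch, 32)
        x = torch.cat([img_features, state], dim=-1)  # fuse image and state features
        return self.fc(x) * self.action_range         # scale tanh output to action range
```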
Currently the way our DDPG explores is that the actor generates an action, represented by an 8-by-1 vector, and noise sampled from a normal distribution with mean 0 and scale epsilon is added to it. This is an OK way to explore, but perhaps there is a better way. I found a paper by OpenAI which uses parameter noise for exploration: https://openai.com/blog/better-exploration-with-parameter-noise/
Read this paper and implement it on our DDPG by adding new noisy neural networks to the models.py file.
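A very simple version of parameter-space noise (without the adaptive sigma scheme from the paper) is to perturb a copy of the actor's weights before each rollout; the actor is assumed here to be a standard nn.Module.

```python
import copy
import torch

def make_perturbed_actor(actor, sigma=0.05):
    """Return a copy of the actor whose parameters are perturbed with Gaussian noise."""
    noisy_actor = copy.deepcopy(actor)
    with torch.no_grad():
        for param in noisy_actor.parameters():
            param.add_(torch.randn_like(param) * sigma)
    return noisy_actor
```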
Currently, with DDQN using discrete actions and a step size of 0.5, the lowest scattering the agent can find is ~0.45. This is not low enough. With a continuous action space we may get better results.
As described above, when an action produces an invalid configuration (overlapping cylinders or cylinders beyond the walls), we reject it, revert to the original configuration, and give a negative reward. This probably causes the agent to see fewer states, since every illegal move returns the environment to the state it was in before the move.
To solve this issue, we can first train the agent to learn how to output valid actions. Instead of going back to the original state when an invalid action is given, we can execute the invalid action and give the agent a penalty. This way the agent can learn from its own mistakes.
After this training, we can transfer the weights to a new agent. This way the number of invalid actions is minimized. We can use this new agent to speed up training and reduce the exploration problem. A rough sketch of the weight transfer is shown below.
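The weight transfer itself could be as simple as the following, assuming the agent exposes actor and critic modules in the usual DDPG layout; the attribute names and checkpoint path are placeholders.

```python
import torch

def transfer_weights(pretrained_agent, new_agent, path='pretrained.pt'):
    """Save the pretrained actor/critic weights and load them into a fresh agent."""
    torch.save({'actor': pretrained_agent.actor.state_dict(),
                'critic': pretrained_agent.critic.state_dict()}, path)
    checkpoint = torch.load(path)
    new_agent.actor.load_state_dict(checkpoint['actor'])
    new_agent.critic.load_state_dict(checkpoint['critic'])
```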
Attempting to increase the number of cylinders in the environment with a single agent shows no sign of convergence. Perhaps this is because the problem becomes much more complex when we increase the number of design parameters.
If we increase the number of agents, maybe the problem will become simple enough for each agent to solve.