Adaptive Systems

The project description is available on my university page under Proactive Troubleshooting [Aug 2018-Present].

I will update this README as I get more time; the individual files, however, have extensive documentation.

dqn.py

This file gives a template for constructing a Deep Q-Learning network. You can specify the number of hidden units you want, but for the moment it supports only one hidden layer.
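For reference, a minimal sketch of what such a one-hidden-layer Q-network looks like; PyTorch and the layer sizes are my assumptions here, not necessarily what dqn.py uses:

```python
# Minimal one-hidden-layer Q-network sketch (framework and sizes assumed,
# not taken from dqn.py).
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    def __init__(self, state_dim, n_actions, n_hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, n_hidden),  # the single hidden layer
            nn.ReLU(),
            nn.Linear(n_hidden, n_actions),  # one Q-value per discrete action
        )

    def forward(self, state):
        return self.net(state)
```

A network for CartPole, for instance, would be `QNetwork(state_dim=4, n_actions=2)`.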

IRL/irl_finite_space.py

This file implements finite-space IRL as put forth by Andrew Ng and Stuart Russell in Algorithms for Inverse Reinforcement Learning. I used a pulp-based linear program solver, but many people prefer the cvxopt package as well. For a sample reward structure the following result was obtained:

[Figure: IRL in Finite Space]
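As a rough illustration of that LP, here is a pulp sketch of the paper's finite-state formulation. The function name, the L1 penalty weight `lam`, and the bound `r_max` are illustrative and need not match irl_finite_space.py:

```python
# Sketch of the Ng & Russell finite-state IRL linear program using pulp.
# Assumes every state has at least one non-optimal action.
import numpy as np
import pulp

def irl_lp(P, policy, gamma=0.9, lam=1.0, r_max=1.0):
    """P: (n_actions, n_states, n_states) transition matrices.
    policy: policy[s] = observed optimal action in state s."""
    n_actions, n_states, _ = P.shape
    # Transition matrix under the observed optimal policy
    P_pi = np.array([P[policy[s], s, :] for s in range(n_states)])
    inv = np.linalg.inv(np.eye(n_states) - gamma * P_pi)

    prob = pulp.LpProblem("finite_irl", pulp.LpMaximize)
    R = [pulp.LpVariable(f"R_{s}", -r_max, r_max) for s in range(n_states)]
    t = [pulp.LpVariable(f"t_{s}") for s in range(n_states)]           # min over actions
    u = [pulp.LpVariable(f"u_{s}", lowBound=0) for s in range(n_states)]  # |R_s|

    # Maximize the margin of the optimal policy, with an L1 penalty on R
    prob += pulp.lpSum(t) - lam * pulp.lpSum(u)
    for s in range(n_states):
        prob += R[s] <= u[s]
        prob += -R[s] <= u[s]
        for a in range(n_actions):
            if a == policy[s]:
                continue
            # Row vector (P_a1(s,:) - P_a(s,:)) (I - gamma * P_pi)^{-1}
            coeff = (P[policy[s], s, :] - P[a, s, :]) @ inv
            expr = pulp.lpSum(coeff[j] * R[j] for j in range(n_states))
            prob += expr >= 0     # observed policy must be optimal
            prob += t[s] <= expr  # t_s becomes the min over non-optimal actions
    prob.solve(pulp.PULP_CBC_CMD(msg=False))
    return np.array([pulp.value(r) for r in R])
```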

dqn_sin_stability.py

This is the most recent code I am stuck on (among many other codes :P ). This should ideally be a DQN implementation for stabilizing a sine function, or a general function. For any given continuous values of a noisy sine output, the agent should choose a noise-correction scheme which smoothly approximates the sine function, or the function in consideration. Both the noise and the correction values can be continuous real values, which makes this problem non-trivial.
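To make the setting concrete, here is a toy version of the task; the noise model and names are my own illustration, not the repo's environment:

```python
# Toy version of the stabilization task (noise model and names assumed).
import numpy as np

rng = np.random.default_rng(0)

def noisy_sin(t):
    # The observation: the true signal plus continuous-valued noise
    return np.sin(t) + rng.normal(scale=0.3)

# At each step the agent sees noisy_sin(t) and must emit a real-valued
# correction c_t so that noisy_sin(t) + c_t tracks sin(t). Because both
# the noise and c_t live in a continuum, a discrete-action DQN struggles.
```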

acla_with_approxq.py

Lately I realized that the function-stabilization problem I was trying to handle couldn't be done without considering a continuous action space. As a first step towards the Continuous Actor Critic Learning Algorithm (CACLA) proposed by van Hasselt and Wiering in Reinforcement Learning in Continuous Action Spaces, I tried the ACLA algorithm, which, as you'd expect, is just a slight variation of the DQN form. The file acla_with_approxq.py is an implementation of ACLA with value function approximation. I tried out the following configurations:

  1. Actor and Critic with experience replays [this training was by far the slowest I ever saw: progress of only 180 episodes over 8 hours on an AMD Ryzen Threadripper 2990WX with 128 GB RAM]
  2. Combinations of fixing Q targets in Critic and Actor
  3. Using dropout in the Actor
  4. Combinations of stochastic/batch updates of critic/actor

Out of all these, updating the Critic with experience replay without target scaling, and the Actor in batch mode with target scaling, gave the best performance. The environment was CartPole-v1. Timesteps of 500 were achieved within 100 episodes; a further learning-rate decay for both actor and critic is expected to speed up this convergence.

[Figure: Experience Replay Critic + Batch Update Actor]

The actor still has a high variance even though I used a one-step TD error as the advantage. However, this variance seems to die off once convergence is achieved, something I'm still trying to explain to myself.
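For context, the core ACLA update looks roughly like the following; PyTorch, the network handles, and the optimizers are assumptions on top of the paper's rule (move the actor toward the taken action only when the TD error is positive):

```python
# Sketch of one ACLA step with value-function approximation (PyTorch and
# all handles assumed; replay buffers and fixed targets omitted).
import torch
import torch.nn.functional as F

def acla_step(actor, critic, opt_actor, opt_critic,
              s, a, r, s_next, done, gamma=0.99):
    # Critic: one-step TD target and TD(0) error
    with torch.no_grad():
        target = r + gamma * critic(s_next) * (1.0 - done)
    delta = target - critic(s)
    critic_loss = delta.pow(2).mean()
    opt_critic.zero_grad(); critic_loss.backward(); opt_critic.step()

    # Actor: reinforce the taken action only when it beat the critic's estimate
    if delta.item() > 0:
        actor_loss = F.cross_entropy(actor(s), a)  # push probability toward a
        opt_actor.zero_grad(); actor_loss.backward(); opt_actor.step()
```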

acla_with_mc_returns.py

This file performs an actor-critic learning algorithm with Monte Carlo estimates of the returns. Several experiments were performed and were found consistent with the stochastic behavior of the gradients. The stochastic parameter updates were best with SGD with learning-rate scheduling and Nesterov accelerated gradient. However, a full-batch gradient descent beat the SGD by a large margin and converged within 500 episodes for CartPole-v1.
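The return estimates themselves are straightforward to compute backwards over a finished episode; a sketch (the function name is illustrative):

```python
# Discounted Monte Carlo returns over a finished episode, computed backwards.
def mc_returns(rewards, gamma=0.99):
    G, returns = 0.0, []
    for r in reversed(rewards):
        G = r + gamma * G
        returns.append(G)
    return returns[::-1]  # returns[t] is the return from timestep t onward

# e.g. mc_returns([1, 1, 1], gamma=0.9) -> [2.71, 1.9, 1.0]
```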

cacla.py

So finally I was able to write the Continuous Actor Critic Learning Algorithm. I benchmarked this against the continuous CartPole environment. The training started pretty low on enthusiasm, but to my surprise the algorithm hit 1189 timesteps in the 420th episode! I used ⌈δ_t / √var_t⌉ updates towards an action, where var_t is the running variance of the TD(0) error and δ_t is the TD(0) error at time t. Gaussian exploration was used. The Actor was trained in full-batch mode; the Critic used an experience replay with fixed targets updated every copy_epochs episodes.

[Figure: CACLA]

It is worth appreciating the reduction in the Actor's variance over time and the corresponding increase in the timesteps.
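A sketch of that CACLA+Var step; PyTorch, the handles, and the hyperparameter `beta` are assumptions, and the replay buffer and fixed targets are omitted for brevity:

```python
# One CACLA+Var step: Gaussian exploration around the actor's output, and
# ceil(delta / sqrt(var)) actor updates toward the explored action when the
# TD(0) error delta is positive. Handles and hyperparameters are assumed.
import math
import torch

def cacla_step(actor, critic, opt_actor, opt_critic,
               s, a, r, s_next, done, var, gamma=0.99, beta=0.001):
    # a is the explored action that was applied to the environment:
    # a = actor(s) + sigma * noise, with Gaussian noise.
    with torch.no_grad():
        target = r + gamma * critic(s_next) * (1.0 - done)
    delta = (target - critic(s)).item()            # TD(0) error
    critic_loss = (critic(s) - target).pow(2).mean()
    opt_critic.zero_grad(); critic_loss.backward(); opt_critic.step()

    var = (1 - beta) * var + beta * delta ** 2     # running variance of delta
    if delta > 0:
        for _ in range(math.ceil(delta / math.sqrt(var))):
            # Regress the actor's output toward the explored action
            actor_loss = (actor(s) - a).pow(2).mean()
            opt_actor.zero_grad(); actor_loss.backward(); opt_actor.step()
    return var
```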

Preliminary Results

I ran an experiment where a noisy sine function was to be stabilized. The noise came from another time-dependent function with 4 unique levels. The environment is defined in env_definition.py. This can be interpreted as filtering a function that comes out of a convolution. The file experiment_sin_stability.py gives the continuous actor critic learning algorithm used here. The following results were obtained:

[Figure: Training status]

The hits + partial hits measure the percentage of points in the domain where the algorithm brought the noisy curve within a [-0.1, +0.1] deviation of the actual value after correction. A maximum hit rate of 75.5% was observed.

[Figure: Preliminary results]
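The metric itself is simple to state in code; a sketch, with illustrative array names:

```python
# Percentage of domain points where the corrected curve lies within +/-0.1
# of the true value (the "hits"); array names are illustrative.
import numpy as np

def hit_rate(true_vals, corrected_vals, tol=0.1):
    hits = np.abs(np.asarray(corrected_vals) - np.asarray(true_vals)) <= tol
    return 100.0 * hits.mean()
```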

The original curve is the one observed with convolution; the corrected one is the one which the agent gives out after correction.

