
partition-hrl's People

Contributors

lorenzosteccanella

partition-hrl's Issues

junctions between options

When an action leads to another element of the partition of states, add an extra term to the reward given to the option. This extra term is the maximum over all options of the value of the state where the option ended up.
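A minimal sketch of this extra term, assuming each option exposes a value function over states. The names `option_reward`, `option_values`, `prev_region` and `new_region` are hypothetical and not taken from the repository, and no discount is applied to the bonus because the issue does not mention one.

```python
def option_reward(env_reward, prev_region, new_region, next_state, option_values):
    """Reward given to the running option for one transition.

    option_values is a hypothetical dict mapping an option name to a
    callable state -> value. All names here are illustrative assumptions.
    """
    if new_region == prev_region:
        # Still inside the same element of the partition: no extra term.
        return env_reward
    # The action crossed into another element of the partition: add the
    # maximum, over all options, of the value of the state reached.
    bonus = max(value(next_state) for value in option_values.values())
    return env_reward + bonus


# Toy usage with two made-up options and integer states.
values = {"go_left": lambda s: 0.2 * s, "go_right": lambda s: 0.5 * s}
same_region = option_reward(1.0, "A", "A", 4, values)
crossed = option_reward(1.0, "A", "B", 4, values)
```

Here `same_region` is the plain environment reward, while `crossed` also includes the best option value at the state reached.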

bug with protocol 2 and 3

The following code in __main__.py causes a bug when running protocols 2 and 3.

os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"  # see issue #152
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"

tf.enable_eager_execution()
# TODO: tf.enable_eager_execution is deprecated; use tf.compat.v1.enable_eager_execution instead.


# Make sure that no other graph is loaded
tf.reset_default_graph()
# TODO: tf.reset_default_graph is deprecated; use tf.compat.v1.reset_default_graph instead.

How can we execute it only for protocol 4 (when agent = agent_a2c)?
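One possible answer, as a sketch only: move the TensorFlow setup into a function and call it only when the protocol selects agent_a2c. It assumes the protocol is a dict with an "agent" key holding the agent name; that layout and the function names `configure_tensorflow` and `maybe_configure_tensorflow` are assumptions, not the repository's actual code. The `tf.compat.v1` calls are the replacements named in the deprecation warnings.

```python
import os


def configure_tensorflow():
    """TensorFlow setup that only the A2C agent needs."""
    os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
    os.environ["CUDA_VISIBLE_DEVICES"] = "-1"
    # Imported lazily so that the other protocols never touch TensorFlow.
    import tensorflow as tf
    tf.compat.v1.enable_eager_execution()
    tf.compat.v1.reset_default_graph()


def maybe_configure_tensorflow(protocol, configure=configure_tensorflow):
    """Run the setup only when the protocol uses agent_a2c.

    The "agent" key is an assumed protocol layout; returns True if the
    setup was executed.
    """
    if protocol.get("agent") == "agent_a2c":
        configure()
        return True
    return False
```

In __main__.py this would be called once, right after the protocol is loaded and before any agent is built.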

SIL

Remember this idea:
Instead of updating by selecting randomly among all the trajectories in the buffer, you can make a convex update only with the best trajectory.

In any case, a good trajectory is a trajectory that makes a good transition. This is checked in the code with the condition obs_equal(self.terminal_state, o_r_d_i[0]["agent"]) (see file agent_a2c, class option, function compute_total_reward).

Maybe the buffer should only keep this subset of trajectories.
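A rough sketch of the selection step only, assuming each buffered trajectory is stored as a dict with a total reward and a flag saying whether it made a good transition. The keys "total_reward" and "good_transition" and the function names are hypothetical; how the update itself is made convex is left open.

```python
def keep_good_trajectories(buffer):
    """Keep only the trajectories that made a good transition."""
    return [traj for traj in buffer if traj["good_transition"]]


def best_trajectory(buffer):
    """Return the good trajectory with the highest total reward, or None."""
    good = keep_good_trajectories(buffer)
    if not good:
        return None
    return max(good, key=lambda traj: traj["total_reward"])


# Toy buffer with made-up numbers.
buffer = [
    {"total_reward": 5.0, "good_transition": False},
    {"total_reward": 2.0, "good_transition": True},
    {"total_reward": 3.5, "good_transition": True},
]
best = best_trajectory(buffer)
```

With this toy buffer the trajectory with reward 5.0 is ignored because it never made a good transition.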

implement ShowRender key_press function and render function of wrapper

The wrapper for gym-minigrid does not have a render function for the moment, so the basic render function is called. I need to include the agent and option view when pressing the right key.

I made a class ShowRenderMinigrid() with nothing in key_press. TODO: implement it once the render function in the wrapper is done.

wrapper for function `step`

Keep me updated once you have implemented the new step function, which returns a vector with two entries:

  1. A low-level representation for the option
  2. Your abstract text-based representation.

You can make a wrapper for that, add a new key in protocol 7 (for instance a boolean named "text_based_abstraction"), and apply the new wrapper to the environment in main.py if "text_based_abstraction" is True.
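The idea above could look roughly like the sketch below. It is written in plain Python so that it runs on its own; in the project the wrapper would presumably subclass the gym wrapper class instead. `TextAbstractionWrapper`, `CounterEnv`, `make_env` and the two mapping functions are all hypothetical names, and the toy environment only stands in for the real one.

```python
class TextAbstractionWrapper:
    """Make step/reset return [low_level_obs, abstract_text_obs]."""

    def __init__(self, env, low_level_fn, abstract_fn):
        self.env = env
        self.low_level_fn = low_level_fn  # observation for the option
        self.abstract_fn = abstract_fn    # text-based abstract observation

    def _split(self, obs):
        return [self.low_level_fn(obs), self.abstract_fn(obs)]

    def reset(self):
        return self._split(self.env.reset())

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        return self._split(obs), reward, done, info


class CounterEnv:
    """Toy stand-in environment: the observation is a running counter."""

    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += action
        return self.t, 0.0, self.t >= 3, {}


def make_env(protocol):
    """Apply the wrapper only when the protocol asks for it."""
    env = CounterEnv()
    if protocol.get("text_based_abstraction", False):
        env = TextAbstractionWrapper(
            env,
            low_level_fn=lambda obs: obs,
            abstract_fn=lambda obs: "room %d" % (obs // 2),
        )
    return env
```

Protocols that do not set the key keep the unwrapped environment, so the existing protocols are unaffected.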

bug with protocols 1 & 2

The function utils.obs_equal raises errors because I pass it a None input.
This is certainly due to the initialization of the policies (no state is needed at the beginning). Let's see how we can improve that.
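One way to handle this, sketched under assumptions: wrap the comparison so that a None observation never reaches it. The decorator `none_safe` is hypothetical, and treating None as "not equal to anything" (including another None) is a design choice that may not match what the policies expect.

```python
import functools


def none_safe(equal_fn):
    """Wrap an equality function so that None inputs return False."""

    @functools.wraps(equal_fn)
    def wrapper(obs_a, obs_b):
        if obs_a is None or obs_b is None:
            # No state yet (e.g. a freshly initialized policy).
            return False
        return equal_fn(obs_a, obs_b)

    return wrapper


# Stand-in for the real comparison, which is not shown in this issue.
@none_safe
def toy_obs_equal(obs_a, obs_b):
    return obs_a == obs_b
```

The alternative is to initialize the policies with a real state so that the comparison is never called with None.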

check obs_a2c_stacked_frames_from_cluster please

Hello Lorenzo,
Can you check whether you are happy with the options' observation returned by the function get_option_obs, or whether you prefer the one from the parent class?
I left you some comments in the file.
Thanks !

question for Lorenzo

Is it normal that the shape of value below is value = [[number]]?
I have to return value[0][0] to get a number and not a list...

    def get_value(self, state):

        value = self.main_model_nn.prediction_critic([state])
        return value[0][0]

(See file agent_a2c.py.)
Thanks!
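A likely explanation, offered as an assumption since prediction_critic itself is not shown here: the critic is called on a batch, and a batch of one state with one scalar output per state comes back as [[number]]. The toy critic below is a made-up stand-in that only reproduces that shape.

```python
def toy_prediction_critic(batch_of_states):
    """Stand-in critic: one single-element row per state in the batch."""
    return [[0.1 * sum(state)] for state in batch_of_states]


def get_value(state):
    # The state is wrapped in a list to form a batch of size one, so the
    # result has one row (the batch) containing one entry (the value).
    value = toy_prediction_critic([state])
    return value[0][0]


batch_output = toy_prediction_critic([[1.0, 2.0], [3.0, 4.0]])
single_value = get_value([1.0, 2.0])
```

Under that assumption, indexing with value[0][0] is the expected way to get the scalar back.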

downsampling

ABSTRACT LEVEL (HIGH LEVEL)
Problem: downsample well enough that the observation changes whenever the agent enters a new abstract state.

Solution: make a new wrapper:

  1. Convert to grayscale with (for instance) 100 distinct values. The right number of values still has to be found.
  2. Take the average of colors in every region
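The two steps above can be sketched with NumPy as follows. This is only an illustration: the function name `downsample`, the square region size, and the order of operations (average per region first, then quantize, so that the output stays within the chosen number of values) are assumptions, and the frame is assumed to be an RGB array with values in 0..255.

```python
import numpy as np


def downsample(frame, region=4, levels=100):
    """Grayscale, average over square regions, quantize to `levels` values.

    frame: array of shape (height, width, 3) with values in [0, 255].
    Returns an integer array of shape (height // region, width // region).
    """
    height, width, _ = frame.shape
    if height % region or width % region:
        raise ValueError("frame size must be a multiple of the region size")
    gray = frame.astype(np.float64).mean(axis=2)
    # Split into (rows of regions, region height, cols of regions, region width).
    blocks = gray.reshape(height // region, region, width // region, region)
    means = blocks.mean(axis=(1, 3))
    bins = np.floor(means / 256.0 * levels).astype(np.int64)
    return np.clip(bins, 0, levels - 1)


# Toy frame: a white top-left region on a black background.
frame = np.zeros((8, 8, 3), dtype=np.uint8)
frame[:4, :4, :] = 255
abstract_obs = downsample(frame, region=4, levels=100)
```

Both `region` and `levels` are the knobs to tune until two different abstract states never map to the same observation.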

make a simple baseline

Make a manager and an option in the tabular case (for example both can follow Q Learning strategy), running on gridenvs (or any other gridworld environment).
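As a starting point, here is a small tabular Q-learning sketch that could serve for either the manager or an option. The class `QLearner`, the helper `run_chain`, and the five-state chain used to exercise it are illustrative assumptions and do not use gridenvs.

```python
import random
from collections import defaultdict


class QLearner:
    """Tabular Q-learning with an epsilon-greedy policy."""

    def __init__(self, n_actions, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
        self.n_actions = n_actions
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.rng = random.Random(seed)
        self.q = defaultdict(lambda: [0.0] * n_actions)

    def act(self, state):
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.n_actions)
        values = self.q[state]
        best = max(values)
        # Break ties at random so that early episodes explore.
        return self.rng.choice([a for a, v in enumerate(values) if v == best])

    def update(self, state, action, reward, next_state, done):
        target = reward if done else reward + self.gamma * max(self.q[next_state])
        self.q[state][action] += self.alpha * (target - self.q[state][action])


def run_chain(learner, length=5, episodes=200, max_steps=50):
    """Toy chain: action 1 moves right, action 0 moves left; goal at the end."""
    for _ in range(episodes):
        state = 0
        for _ in range(max_steps):
            action = learner.act(state)
            next_state = state + 1 if action == 1 else max(0, state - 1)
            done = next_state == length - 1
            reward = 1.0 if done else 0.0
            learner.update(state, action, reward, next_state, done)
            state = next_state
            if done:
                break


learner = QLearner(n_actions=2)
run_chain(learner)
```

For the baseline, the manager would use one such table over abstract states with options as its actions, and each option another table over low-level states.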

IW

Implement it in agent.py (class Policy).
See function _update_states.

Run experiments in the cluster

Run some experiments and try to tune the hyperparameters, for instance the rewards/penalties, the learning rate, the batch size, etc.
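A simple way to enumerate the runs to launch is a full grid over the candidate values, sketched below. The helper `hyperparameter_grid`, the parameter names and the values are placeholders, not settings taken from the project.

```python
from itertools import product


def hyperparameter_grid(search_space):
    """Return one dict per combination of the candidate values."""
    names = sorted(search_space)
    combos = product(*(search_space[name] for name in names))
    return [dict(zip(names, combo)) for combo in combos]


# Placeholder search space; the real ranges still have to be chosen.
search_space = {
    "learning_rate": [1e-4, 1e-3],
    "batch_size": [16, 32, 64],
    "penalty": [-1.0, -0.1],
}
configs = hyperparameter_grid(search_space)
```

Each entry of `configs` can then be turned into one job on the cluster.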
