aig-upf / partition-hrl

Python 99.30% Shell 0.70%

partition-hrl's Issues

What do we do at the abstract level when all options have the same value?

Adapt what we did with the tree:
-> select, with higher probability, the nodes that lead to more nodes.
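A minimal sketch of this tie-breaking rule, assuming we know, for each option, how many nodes (abstract states) it leads to. The function name, the `n_reachable` dictionary and the "+1" smoothing are hypothetical choices for illustration, not the project's code:

```python
import random
from typing import Dict, Hashable, Optional


def select_option(values: Dict[Hashable, float],
                  n_reachable: Dict[Hashable, int],
                  rng: Optional[random.Random] = None) -> Hashable:
    """Greedy option choice; ties are broken in favour of options leading to more nodes."""
    rng = rng or random.Random()
    best = max(values.values())
    tied = [option for option, value in values.items() if value == best]
    if len(tied) == 1:
        return tied[0]
    # Hypothetical weighting: number of nodes the option leads to, plus 1 so that
    # an option with no known successors can still be selected.
    weights = [n_reachable.get(option, 0) + 1 for option in tied]
    return rng.choices(tied, weights=weights, k=1)[0]
```

When one option is strictly better it is always chosen; only exact ties are randomized. With float values, comparing against `best` with `math.isclose` may be more robust than `==`.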

Junctions between options

When an action leads to another element of the state partition, add an extra term to the reward given to the option. This extra term is the maximum, over all options, of the value of the state where the option ended up.
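A minimal sketch of the junction term, assuming the values of all options at the reached state are available as a dictionary. The function name and arguments are hypothetical (this is not the project's `compute_total_reward`), and no discount factor is applied since the issue does not mention one:

```python
from typing import Dict, Hashable


def option_reward_with_junction(reward: float,
                                left_partition: bool,
                                next_state_values: Dict[Hashable, float]) -> float:
    """Reward given to an option, with a bonus when it reaches another partition element.

    next_state_values maps each option to its value at the state where the option ended up.
    """
    if not left_partition or not next_state_values:
        return reward
    # Junction term: the best value any option can obtain from the reached state.
    return reward + max(next_state_values.values())
```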

Bug with protocols 2 and 3

The following code in __main__.py causes a bug when running protocols 2 and 3.

import os

import tensorflow as tf

os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"  # see issue #152
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"  # hide all GPUs: run on CPU

# tf.enable_eager_execution is deprecated; use the tf.compat.v1 name instead
tf.compat.v1.enable_eager_execution()

# Just to be sure that no other graph is loaded
tf.compat.v1.reset_default_graph()

How can we execute it only for protocol 4 (when agent = agent_a2c)?
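One possible approach, sketched under the assumption that `__main__.py` knows the agent name as a string (for example from a hypothetical `parameters["agent"]` entry): move the setup into a function that runs only for `agent_a2c` and imports TensorFlow lazily, so protocols 2 and 3 never load it.

```python
import os


def setup_tensorflow(agent_name: str) -> bool:
    """Run the TensorFlow setup only for the A2C agent (protocol 4).

    Returns True if the setup was executed, False otherwise.
    """
    if agent_name != "agent_a2c":
        # Other agents (e.g. protocols 2 and 3) never touch TensorFlow.
        return False

    # Same environment settings as in __main__.py: hide all GPUs.
    os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
    os.environ["CUDA_VISIBLE_DEVICES"] = "-1"

    # Lazy import: TensorFlow is only loaded when the A2C agent needs it.
    import tensorflow as tf

    tf.compat.v1.enable_eager_execution()
    tf.compat.v1.reset_default_graph()
    return True
```

Usage (hypothetical key name): `setup_tensorflow(parameters["agent"])` near the top of `__main__.py`, before any agent is built.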

SIL

Remember this idea:
Instead of updating by selecting randomly among all the trajectories in the buffer, you can make a convex update only with the best trajectory.

In any case, a good trajectory is one that makes a good transition, i.e. one that reaches the option's terminal state. This is checked in the code with the condition obs_equal(self.terminal_state, o_r_d_i[0]["agent"]) (see file agent_a2c, class option, function compute_total_reward).

Maybe the buffer should only contain this subset of trajectories.
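A minimal sketch of such a buffer, assuming observations can be compared with plain equality. `obs_equal` here is a simplified stand-in for `utils.obs_equal`, and the class names and capacity are hypothetical. The buffer keeps only trajectories that end in the terminal state and can return the best one (highest return) for an update that uses only that trajectory:

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional


def obs_equal(a: Any, b: Any) -> bool:
    # Simplified stand-in for utils.obs_equal: plain equality, never equal to None.
    return a is not None and b is not None and a == b


@dataclass
class Trajectory:
    observations: List[Any]
    rewards: List[float]

    @property
    def total_return(self) -> float:
        return sum(self.rewards)


@dataclass
class SuccessfulTrajectoryBuffer:
    terminal_state: Any
    capacity: int = 100
    trajectories: List[Trajectory] = field(default_factory=list)

    def add(self, trajectory: Trajectory) -> bool:
        """Store the trajectory only if it ends in the option's terminal state."""
        if not trajectory.observations:
            return False
        if not obs_equal(self.terminal_state, trajectory.observations[-1]):
            return False
        self.trajectories.append(trajectory)
        if len(self.trajectories) > self.capacity:
            self.trajectories.pop(0)  # drop the oldest successful trajectory
        return True

    def best(self) -> Optional[Trajectory]:
        """Return the successful trajectory with the highest return, if any."""
        if not self.trajectories:
            return None
        return max(self.trajectories, key=lambda t: t.total_return)
```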

Implement the ShowRender key_press function and the render function of the wrapper

The wrapper for gym-minigrid does not have its own render function yet, so the default render function is called. I need to include the agent and option views when the right key is pressed.

I made a class ShowRenderMinigrid() with nothing in key_press. TODO: implement key_press once the render function of the wrapper is done.
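A minimal sketch of what key_press could do once the wrapper can render several views. Everything here is hypothetical: the key bindings, the view names, and the assumption that the wrapper will expose `render(view=...)`, which does not exist yet:

```python
class ShowRenderMinigrid:
    """Hypothetical sketch: choose which view the wrapper renders via key presses."""

    # Hypothetical key bindings; the real bindings are not specified in the issue.
    KEY_TO_VIEW = {"a": "agent", "o": "option", "e": "environment"}

    def __init__(self, env):
        self.env = env
        self.view = "environment"  # default: the basic render of the environment

    def key_press(self, key: str) -> None:
        # Switch the active view; unknown keys leave it unchanged.
        self.view = self.KEY_TO_VIEW.get(key, self.view)

    def render(self):
        # Assumes the wrapper will expose render(view=...).
        return self.env.render(view=self.view)
```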

Wrapper for the function `step`

Keep me updated when you have implemented the new step function that returns a vector with two entries:

A low-level representation for the option.
Your abstract text-based representation.

You can make a wrapper for that, add a new key in protocol 7 (for instance a boolean named "text_based_abstraction"), and apply the new wrapper to the environment in main.py if "text_based_abstraction" is True.
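A minimal sketch of this wrapper, assuming the environment follows the older gym API where `step` returns `(obs, reward, done, info)`, and that a hypothetical `to_text` function maps a raw observation to its text abstraction. A plain class is used to keep the sketch self-contained; subclassing `gym.Wrapper` would be the usual choice in the project:

```python
from typing import Any, Callable, Dict, List, Tuple


class TextAbstractionWrapper:
    """Hypothetical wrapper: observations become [low_level_obs, text_abstraction]."""

    def __init__(self, env: Any, to_text: Callable[[Any], str]):
        self.env = env
        self.to_text = to_text  # hypothetical: raw observation -> text

    def reset(self) -> List[Any]:
        obs = self.env.reset()
        return [obs, self.to_text(obs)]

    def step(self, action: Any) -> Tuple[List[Any], float, bool, Dict]:
        obs, reward, done, info = self.env.step(action)
        # Entry 0: low-level representation for the option.
        # Entry 1: abstract text-based representation.
        return [obs, self.to_text(obs)], reward, done, info


def maybe_wrap(env: Any, protocol: Dict[str, Any],
               to_text: Callable[[Any], str]) -> Any:
    # Apply the wrapper only when the protocol enables it (defaults to False).
    if protocol.get("text_based_abstraction", False):
        return TextAbstractionWrapper(env, to_text)
    return env
```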

Bug with protocols 1 and 2

The function utils.obs_equal raises errors because I call it with a None input.
This is most likely due to the initialization of the policies (no state is needed at the beginning). Let's see how we can improve that.
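One way to make the comparison robust is a None-safe version of `obs_equal`. This is a sketch, assuming observations are NumPy arrays (or array-like); it is not the current `utils.obs_equal`:

```python
import numpy as np


def obs_equal(obs_1, obs_2) -> bool:
    """None-safe equality check for observations (hypothetical replacement)."""
    # Before a policy has seen its first state, one side may still be None:
    # two Nones count as equal, None vs. an actual observation does not.
    if obs_1 is None or obs_2 is None:
        return obs_1 is None and obs_2 is None
    # np.array_equal checks both shape and values.
    return bool(np.array_equal(obs_1, obs_2))
```

Treating `None` as "not equal to any real observation" means an uninitialized policy simply never matches its terminal state, instead of crashing.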

Please check obs_a2c_stacked_frames_from_cluster

Hello Lorenzo,
Can you check whether you are happy with the options' observation returned by the function
get_option_obs, or whether you prefer the one from the parent class?
I left you some comments in the file.
Thanks!

Question for Lorenzo

Is it normal that value below has the shape value = [[number]]?
I have to return value[0][0] to get a number and not a list...

    def get_value(self, state):
        # prediction_critic receives a batch containing a single state ([state]),
        # so the output is nested; [0][0] extracts that state's scalar value.
        value = self.main_model_nn.prediction_critic([state])
        return value[0][0]

(see file agent_a2c.py)
Thanks!
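The `[[number]]` shape is what one would expect if the critic is a network with a single output unit evaluated on a batch: one row per state, one column per output. The sketch below illustrates this with a hypothetical NumPy stand-in for `prediction_critic` (it is not the project's model):

```python
import numpy as np


def prediction_critic(states):
    # Hypothetical stand-in: a critic with one output unit returns an array
    # of shape (batch_size, 1), here simply the sum of each state.
    return np.array([[float(np.sum(s))] for s in states])


def get_value(state) -> float:
    value = prediction_critic([state])  # batch of one state -> shape (1, 1)
    return float(value[0, 0])


def get_values(states) -> np.ndarray:
    # For several states, drop only the trailing output dimension.
    return prediction_critic(states)[:, 0]
```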

New protocol to run experiments on the cluster (no display)

Make a new protocol without the state display so that it can be run on the cluster.

Downsampling

ABSTRACT LEVEL (HIGH LEVEL)
Problem: design a downsampling such that the observation changes when the agent enters a new abstract state.

Solution: make a new wrapper that:

Converts the image to gray scale with (for instance) 100 different values. The right number of values still has to be found.
Takes the average of the colors in every region.
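A minimal sketch of this downsampling, assuming RGB observations of shape (H, W, 3) with values in [0, 255]. The region size, the default of 100 levels, and the use of the plain channel mean as gray scale (rather than luminance weights) are assumptions:

```python
import numpy as np


def downsample(obs: np.ndarray, region_shape=(10, 10), n_levels: int = 100) -> np.ndarray:
    """Abstract observation: gray-scale, average per region, quantize to n_levels values."""
    gray = obs.astype(np.float64).mean(axis=2)  # gray scale: mean of the color channels
    rh, rw = region_shape
    h, w = gray.shape
    # Crop so the image splits evenly into regions, then average each region.
    gray = gray[: h - h % rh, : w - w % rw]
    regions = gray.reshape(gray.shape[0] // rh, rh, gray.shape[1] // rw, rw).mean(axis=(1, 3))
    # Map [0, 255] onto the integer levels 0 .. n_levels - 1.
    return np.floor(regions * n_levels / 256.0).astype(np.int64)
```

Quantizing after the region average (rather than before) keeps the output on exactly `n_levels` discrete values, so two observations compare equal only when every region falls in the same level.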
