partition-hrl's Issues
what do we do at the abstract level when all options have the same value?
Adapt what we did with the tree:
-> select with higher probability the nodes which lead to more nodes.
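A minimal sketch of this tie-breaking rule, assuming we know for each option how many nodes it leads to. The names `pick_option` and `n_successors` are hypothetical and not taken from the repository.

```python
import random


def pick_option(values, n_successors, rng=random):
    """Pick an option index; ties in value favor options that lead to more nodes.

    values: one value per option.
    n_successors: hypothetical count of nodes reachable through each option.
    """
    best = max(values)
    tied = [i for i, v in enumerate(values) if v == best]
    if len(tied) == 1:
        return tied[0]
    # +1 keeps every tied option selectable even when it has no known successor.
    weights = [n_successors[i] + 1 for i in tied]
    return rng.choices(tied, weights=weights, k=1)[0]
```

When all options have the same value, every option is tied and the choice is driven only by the successor counts.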
junctions between options
When an action leads to another element of the partition of states, add an extra term to the reward given to the option. This extra term is the maximum over all options of the value of the state where the option ended up.
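The extra term could look like the sketch below. It assumes each option stores its values in a dict keyed by state, which is a hypothetical layout chosen only for illustration.

```python
def reward_with_junction_bonus(reward, end_state, option_values, crossed_partition):
    """Add the junction term to an option's reward.

    option_values: list of dicts (one per option) mapping state -> value
                   (hypothetical storage; unknown states count as 0.0).
    crossed_partition: True when the action led to another element of the partition.
    """
    if not crossed_partition:
        return reward
    # Maximum over all options of the value of the state where the option ended up.
    bonus = max(values.get(end_state, 0.0) for values in option_values)
    return reward + bonus
```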
bug with protocol 2 and 3
The following code in __main__.py causes a bug when running protocols 2 and 3:
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID" # see issue #152
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"
tf.enable_eager_execution()
# todo fix this The name tf.enable_eager_execution is deprecated. Please use tf.compat.v1.enable_eager_execution instead
# Just to be sure that we don't have some others graph loaded
tf.reset_default_graph()
# todo fix this: The name tf.reset_default_graph is deprecated. Please use tf.compat.v1.reset_default_graph instead.
How can we execute it only for protocol 4 (when agent = agent_a2c)?
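One possible way, sketched under the assumption that the agent name is available as a string before the agent is built: move the TensorFlow setup into a function and call it only for agent_a2c. The function names below are hypothetical; the `tf.compat.v1` calls are the replacements suggested by the deprecation messages above.

```python
import os


def needs_tensorflow(agent_name):
    # Assumption: only the A2C agent (protocol 4) relies on TensorFlow.
    return agent_name == "agent_a2c"


def setup_tensorflow():
    # Set the CUDA variables before TensorFlow is imported.
    os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
    os.environ["CUDA_VISIBLE_DEVICES"] = "-1"
    # Imported lazily so the other protocols never load TensorFlow.
    import tensorflow as tf
    tf.compat.v1.enable_eager_execution()
    # Just to be sure that no other graph is loaded.
    tf.compat.v1.reset_default_graph()


def maybe_setup_tensorflow(agent_name):
    """Run the TensorFlow setup only when the selected agent needs it."""
    if needs_tensorflow(agent_name):
        setup_tensorflow()
        return True
    return False
```

In __main__.py this would replace the unconditional block with a single call such as `maybe_setup_tensorflow(agent_name)`, where `agent_name` stands for however the protocol names its agent.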
SIL
Remember this idea:
Instead of updating by selecting randomly among all the trajectories in the buffer, you can make a convex update only with the best trajectory.
In any case, a good trajectory is a trajectory that makes a good transition. This is checked in the code with the condition obs_equal(self.terminal_state, o_r_d_i[0]["agent"]) (see file agent_a2c, class option, function compute_total_reward).
Maybe we should only consider in the buffer this subset of trajectories.
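A rough sketch of the selection step only (not the update itself), assuming the buffer holds (trajectory, total_reward) pairs and that a predicate tells whether a trajectory made a good transition. Both the buffer layout and the names are hypothetical.

```python
def best_good_trajectory(buffer, made_good_transition):
    """Return the highest-reward trajectory among those that made a good transition.

    buffer: list of (trajectory, total_reward) pairs (hypothetical layout).
    made_good_transition: predicate on a trajectory; in the real code this role
        is played by the obs_equal(...) condition mentioned above.
    Returns None when no trajectory qualifies.
    """
    good = [(traj, ret) for traj, ret in buffer if made_good_transition(traj)]
    if not good:
        return None
    return max(good, key=lambda item: item[1])[0]
```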
implement ShowRender key_press function and render function of wrapper
The wrapper for gym-minigrid does not have a render function for the moment, so the basic render function is called. I need to include the agent and option views when pressing the right key.
I made a class ShowRenderMinigrid() with nothing in key_press -> todo: implement this when the render function in the wrapper is done
wrapper for function `step`
Keep me updated when you have implemented the new `step` function that returns a vector with two entries:
- A low level representation for the option
- Your abstract text based representation.
You can make a wrapper for that, add a new key in protocol 7 (for instance a boolean named "text_based_abstraction"), and apply the new wrapper to the environment in main.py if "text_based_abstraction" is True.
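A minimal sketch of such a wrapper, assuming a Gym-style environment whose step returns (obs, reward, done, info) and a protocol stored as a dict. `TextAbstractionWrapper`, `describe`, `make_env` and `CountingEnv` are hypothetical names used only for illustration.

```python
class TextAbstractionWrapper:
    """Wraps any env exposing reset() and step(action) (Gym-style, assumed)."""

    def __init__(self, env, describe):
        self.env = env
        self.describe = describe  # hypothetical: maps a raw observation to text

    def _pair(self, obs):
        # Two entries: low-level representation for the option, abstract text.
        return (obs, self.describe(obs))

    def reset(self):
        return self._pair(self.env.reset())

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        return self._pair(obs), reward, done, info


def make_env(env, protocol, describe):
    """Apply the wrapper only when the protocol asks for it."""
    if protocol.get("text_based_abstraction", False):
        return TextAbstractionWrapper(env, describe)
    return env


class CountingEnv:
    """Tiny stand-in environment, only here to exercise the wrapper."""

    def reset(self):
        return 0

    def step(self, action):
        return action + 1, 1.0, False, {}
```

For example, `make_env(CountingEnv(), {"text_based_abstraction": True}, str).step(2)` returns the pair `(3, "3")` as its observation, followed by the unchanged reward, done flag and info.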
bug with protocols 1 & 2
The function utils.obs_equal raises errors because I feed it a None input.
This is certainly due to the initialization of policies (no state needed at the beginning). Let's see how we can improve that.
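One possible stopgap, sketched under the assumption that a missing observation should simply count as "not equal". The helper name is hypothetical, and the existing comparison function is passed in rather than reimplemented.

```python
def obs_equal_or_false(obs_a, obs_b, obs_equal):
    """Call obs_equal only when both observations exist.

    obs_equal: the existing comparison function (e.g. utils.obs_equal).
    Assumption: a None observation (policy not yet initialized with a state)
    is treated as different from everything.
    """
    if obs_a is None or obs_b is None:
        return False
    return obs_equal(obs_a, obs_b)
```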
check obs_a2c_stacked_frames_from_cluster please
Hello Lorenzo,
Can you check whether you are happy with the options' observation returned by the function get_option_obs, or whether you prefer the one from the parent class?
I left you some comments in the file.
Thanks!
question for Lorenzo
Is it normal that the shape of value below is value = [[number]]?
I have to return value[0][0] to get a number and not a list...
def get_value(self, state):
    value = self.main_model_nn.prediction_critic([state])
    return value[0][0]
(see file agent_a2c.py)
Thanks!
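A possible explanation, assuming prediction_critic follows the usual batch convention (one row per input state, one value per row): feeding a batch of one state then gives back a 1 x 1 nested result. The stand-in critic below is hypothetical and only reproduces that shape.

```python
def prediction_critic(states):
    """Stand-in critic (hypothetical): one row per state, one value per row."""
    return [[float(sum(state))] for state in states]


def get_value(state):
    value = prediction_critic([state])  # batch of size 1 -> [[number]]
    return value[0][0]                  # first row, first (only) output
```

If that assumption holds, value[0][0] is the expected way to extract the scalar.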
new protocol to run experiments on the cluster (remove display)
Make a new protocol that removes the state display so that it can be run on the cluster.
downsampling
ABSTRACT LEVEL (HIGH LEVEL)
Problem: Make a good downsampling so that the observation is different when the agent is in a new abstract state
Solution: make a new wrapper:
- Make a gray scaling with (for instance) 100 different values. Find the right number of different values...
- Take the average of colors in every region
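A rough sketch of these two steps in plain Python, assuming the observation is a grid of (r, g, b) pixels with channels in 0..255. It averages the gray intensity per region first and then quantizes, which is one possible ordering; `downsample`, `block` and `n_levels` are hypothetical names.

```python
def downsample(rgb_image, block, n_levels=100):
    """Average the gray intensity over block x block regions, then quantize.

    rgb_image: rows of (r, g, b) tuples, channels in 0..255 (assumed format).
    Returns a tuple of tuples of ints in 0..n_levels-1, so the result is
    hashable and can be compared or used as a key for an abstract state.
    """
    height, width = len(rgb_image), len(rgb_image[0])
    result = []
    for top in range(0, height, block):
        row_out = []
        for left in range(0, width, block):
            cells = [sum(rgb_image[y][x]) / 3.0
                     for y in range(top, min(top + block, height))
                     for x in range(left, min(left + block, width))]
            mean_gray = sum(cells) / len(cells)               # in [0, 255]
            row_out.append(int(mean_gray * n_levels / 256))   # in [0, n_levels - 1]
        result.append(tuple(row_out))
    return tuple(result)
```

The number of levels and the region size are the two knobs to tune so that two different abstract states never collapse to the same downsampled observation.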
make a simple baseline
Make a manager and an option in the tabular case (for example, both can follow a Q-learning strategy), running on gridenvs (or any other gridworld environment).
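A minimal sketch of a tabular Q-learning learner that both the manager and the option could reuse, assuming hashable states and integer actions. The class name and default hyperparameters are hypothetical.

```python
import random
from collections import defaultdict


class TabularQ:
    """Epsilon-greedy tabular Q-learning over hashable states."""

    def __init__(self, n_actions, alpha=0.1, gamma=0.99, epsilon=0.1, rng=None):
        self.n_actions = n_actions
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.rng = rng or random.Random()
        # Unseen states start with a value of 0.0 for every action.
        self.q = defaultdict(lambda: [0.0] * n_actions)

    def act(self, state):
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.n_actions)  # explore
        values = self.q[state]
        return values.index(max(values))               # exploit (first best action)

    def update(self, state, action, reward, next_state, done):
        # One-step Q-learning target; no bootstrap on terminal transitions.
        target = reward if done else reward + self.gamma * max(self.q[next_state])
        self.q[state][action] += self.alpha * (target - self.q[state][action])
```

The manager would use abstract states and options as its actions, while each option would use low-level states and primitive actions.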
IW
implement it in agent.py (class Policy)
see function _update_states
Run experiments on the cluster
Run some experiments and try to tune the hyperparameters, for instance the rewards/penalties, the learning rate, the batch size, etc.