lucasalegre / mbcd
Code for the paper "Minimum-Delay Adaptation in Non-Stationary Reinforcement Learning via Online High-Confidence Change-Point Detection"
License: MIT License
Hi,
As you can see from the screenshot I linked, there could be a problem in the calculation of the log probability in the function get_logprob2(self, x, means, variances) at line 65 of mbcd.py.
The log probability consistently goes above zero after a few thousand steps. Shouldn't it lie in the interval (-inf, 0]?
As the debugger shows, log_prob is 0.92 and prob is 2.52.
Thank you very much for your time.
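One possible reading, offered as an assumption rather than a confirmed diagnosis: if get_logprob2 evaluates a Gaussian density, the value is a probability density, not a probability, and a density can exceed 1 when the variance is small. The sketch below uses a hypothetical helper, diag_gaussian_log_density, with made-up inputs to show the log-density of a diagonal Gaussian turning positive as the variance shrinks.

```python
import numpy as np

def diag_gaussian_log_density(x, mean, variance):
    """Log-density of a diagonal Gaussian (hypothetical helper, not the repo's code)."""
    k = x.shape[-1]
    return -0.5 * (
        k * np.log(2 * np.pi)
        + np.log(variance).sum(-1)
        + (np.power(x - mean, 2) / variance).sum(-1)
    )

# Evaluate the density exactly at the mean, where it is largest.
x = np.zeros(1)
mean = np.zeros(1)

wide = diag_gaussian_log_density(x, mean, np.array([1.0]))     # unit variance
narrow = diag_gaussian_log_density(x, mean, np.array([0.01]))  # small variance

print(wide < 0)    # log-density is negative for unit variance
print(narrow > 0)  # log-density is positive once the variance is small enough
```

If this reading applies, a positive log_prob after the model's predicted variances have shrunk during training would not by itself indicate a bug.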
Hi,
When executing the code, some imports in the file mbcd/models/bnn.py cannot be resolved:
from drl_cd.models.utils import get_required_argument, TensorStandardScaler
from drl_cd.models.fc import FC

from drl_cd.utils.logger import Progress, Silent
The package drl_cd doesn't exist. I suppose it has to be replaced by mbcd. Is this right?
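If the replacement suggested above is indeed the right one (an assumption; the source does not confirm it), the rename could be sketched as follows. The helper rename_package is hypothetical and only touches the package name at the start of import statements.

```python
import re

def rename_package(source, old="drl_cd", new="mbcd"):
    """Rewrite 'from old...' / 'import old...' statements to use the new package name."""
    pattern = re.compile(
        r"^(\s*(?:from|import)\s+)" + re.escape(old) + r"(?=[.\s]|$)",
        re.MULTILINE,
    )
    # Keep the 'from ' / 'import ' prefix and swap only the package name.
    return pattern.sub(lambda m: m.group(1) + new, source)

original = (
    "from drl_cd.models.utils import get_required_argument, TensorStandardScaler\n"
    "from drl_cd.models.fc import FC\n"
    "\n"
    "from drl_cd.utils.logger import Progress, Silent\n"
)
fixed = rename_package(original)
print(fixed)
```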
Furthermore, in the file experiments/mbcd_run.py at lines 70 and 71:
70 model.deepRLCD.save_current()
71 model.deepRLCD.save_models()
the attribute deepRLCD can't be found and the execution stops.
Traceback (most recent call last):
  File "/home/valerio/PycharmProjects/mbcd/experiments/mbcd_run.py", line 95, in <module>
    main(config)
  File "/home/valerio/PycharmProjects/mbcd/experiments/mbcd_run.py", line 71, in main
    model.deepRLCD.save_current()
AttributeError: 'SAC' object has no attribute 'deepRLCD'
Hi,
I have some doubts about the likelihood estimation performed by the function get_logprob2(self, x, means, variances) at line 77 of mbcd.py.
According to the comment at line 76, the variable log_prob should have size [num_networks, batch_size], but the result of line 77 is a scalar. The following lines (line 80 onward) assume that the result is a matrix:
76 ## [ num_networks, batch_size ]
77 log_prob = -1/2 * (k*np.log(2*np.pi) + np.log(variance).sum(-1) + (np.power(x-mean, 2)/variance).sum(-1)) # [1,]
78
79 ## [ batch_size ]
80 prob = np.exp(log_prob).sum(axis=0)
81
82 ## [ batch_size ]
83 log_prob = np.log(prob + 1e-8) # Avoid log of zero
Is that a problem?
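Whether line 77 yields a matrix or a scalar depends on the shapes of its inputs. The sketch below is a rough check under assumed shapes (x as [batch_size, k], means and variances as [num_networks, batch_size, k]); these shapes and the helper name gaussian_log_prob are illustrative assumptions, not taken from the repository.

```python
import numpy as np

def gaussian_log_prob(x, mean, variance):
    """Same expression as the quoted line 77, wrapped in a hypothetical helper."""
    k = x.shape[-1]
    return -0.5 * (
        k * np.log(2 * np.pi)
        + np.log(variance).sum(-1)
        + (np.power(x - mean, 2) / variance).sum(-1)
    )

num_networks, batch_size, k = 5, 3, 4  # assumed sizes for illustration
rng = np.random.default_rng(0)
x = rng.normal(size=(batch_size, k))
means = rng.normal(size=(num_networks, batch_size, k))
variances = rng.uniform(0.5, 2.0, size=(num_networks, batch_size, k))

# Batched inputs: x broadcasts against the ensemble axis, the sum removes only k.
batched = gaussian_log_prob(x, means, variances)
print(batched.shape)  # (5, 3)

# One-dimensional inputs: the sum over the last axis leaves a scalar.
single = gaussian_log_prob(x[0], means[0, 0], variances[0, 0])
print(np.shape(single))  # ()
```

Under these assumptions, a scalar result would point to one-dimensional mean and variance arguments reaching line 77, in which case the later sum over axis 0 would not behave as the comments describe.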
Furthermore, I would like some clarification about the var_mean formulas at lines 85 and 86.
The code contains two formulas, which lead to different results.
85 var_mean = np.var(means, axis=0).max(axis=-1)
86 var_mean = np.linalg.norm(np.std(means, axis=0), axis=-1)
Which of the two is the right one to use?
Thank you in advance
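To make the difference between the two var_mean formulas concrete, here is a small sketch on a hand-made means array, assumed to have shape [num_networks, batch_size, k]. The first formula takes the largest per-dimension variance across the ensemble; the second takes the Euclidean norm of the per-dimension standard deviations.

```python
import numpy as np

# Two networks, one sample, two output dimensions (made-up values).
means = np.array([
    [[0.0, 0.0]],
    [[2.0, 4.0]],
])

# Formula from line 85: maximum variance over output dimensions.
var_max = np.var(means, axis=0).max(axis=-1)

# Formula from line 86: norm of the standard deviations over output dimensions.
std_norm = np.linalg.norm(np.std(means, axis=0), axis=-1)

print(var_max)   # [4.]
print(std_norm)  # [2.23606798]
```

Note that if both assignments run in sequence, the second overwrites the first, so only the norm-based value would be used afterwards.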
Hi,
I'm trying to get deterministic results once I fix a seed.
At the moment I'm passing the seed argument when execution starts, and I set the 'n_cpu_tf_sess' variable to 1 as suggested in the code comments.
With this setup, the results are repeatable from timestep 0 up to 'learning_starts' timesteps. Once policy learning starts after the initial exploration phase, the results are no longer deterministic.
Is there anything else I need to set up in order to get deterministic results?
Thank you for your time,
Valerio
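One way to pin down where two seeded runs stop agreeing is to compare their logged values step by step. The sketch below is a hypothetical debugging helper with made-up reward traces; first_divergence and the sample numbers are not part of the repository.

```python
def first_divergence(run_a, run_b, tol=0.0):
    """Return the first index where two logged traces differ, or None if they match."""
    for t, (a, b) in enumerate(zip(run_a, run_b)):
        if abs(a - b) > tol:
            return t
    # Traces agree on the shared prefix; a length mismatch still counts as divergence.
    if len(run_a) != len(run_b):
        return min(len(run_a), len(run_b))
    return None

# Made-up per-step rewards from two runs launched with the same seed.
rewards_run_1 = [0.1, 0.5, 0.2, 0.9, 0.4]
rewards_run_2 = [0.1, 0.5, 0.2, 0.7, 0.3]

t = first_divergence(rewards_run_1, rewards_run_2)
print(t)
```

Comparing the reported index with learning_starts would confirm whether the divergence coincides exactly with the first policy update.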
Hi,
I have a doubt about the following method.
Line 637 in 4be85b9
self.rollout_schedule attribute.
At line 649 we initialize for the first time a new attribute named self._next_idx. The attribute is not used anywhere else, so its value is never read. Maybe the line was intended to be self.replay_buffer._next_idx = len(self.replay_buffer)?
Thank you in advance for your time