lucasalegre / mbcd

Code for the paper "Minimum-Delay Adaptation in Non-Stationary Reinforcement Learning via Online High-Confidence Change-Point Detection"

License: MIT License

Python 100.00%

mbcd's Issues

Out of bound log prob

Hi,
as the linked screenshot shows, there may be a problem with how the log probability is calculated in the function def get_logprob2(self, x, means, variances): at line 65 of mbcd.py.
After a few thousand steps, the log probability is consistently above zero. Shouldn't it lie in the interval (-inf, 0]?
In the debugger, log_prob is 0.92 and prob is 2.52.
Thank you very much for your time.
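
A note for readers of this issue: a value above zero is not necessarily a bug if the quantity is a log density rather than the log of a discrete probability. A continuous density can exceed 1 when the variance is small, so its logarithm can be positive. The sketch below illustrates this with a diagonal Gaussian. The helper name diag_gaussian_logpdf and the numbers are illustrative assumptions and are not taken from mbcd.py, and this does not confirm what happens in the repository.

```python
import math


def diag_gaussian_logpdf(x, mean, variance):
    """Log density of a diagonal Gaussian at x (hypothetical helper, not from mbcd.py)."""
    k = len(x)
    log_det = sum(math.log(v) for v in variance)
    mahalanobis = sum((xi - mi) ** 2 / vi for xi, mi, vi in zip(x, mean, variance))
    return -0.5 * (k * math.log(2 * math.pi) + log_det + mahalanobis)


# Unit variance: the density at the mean is below 1, so the log density is negative.
wide = diag_gaussian_logpdf([0.0], [0.0], [1.0])
# Small variance: the density at the mean is above 1, so the log density is positive.
narrow = diag_gaussian_logpdf([0.0], [0.0], [0.01])
print(wide, narrow)
```

If the learned variances shrink as the models fit the data, positive values like the reported 0.92 would be consistent with this behaviour.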

Errors on imports

Hi,
when I run the code, some imports in the file mbcd/models/bnn.py cannot be resolved:

from drl_cd.models.utils import get_required_argument, TensorStandardScaler
from drl_cd.models.fc import FC

from drl_cd.utils.logger import Progress, Silent

The package drl_cd does not exist. I suppose it should be replaced with mbcd. Is that right?
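
If the substitution suggested above is the right fix (this is the poster's guess, not something confirmed here), it could be applied mechanically. The sketch below rewrites only import statements and leaves other occurrences of the name alone. The function name rewrite_imports is hypothetical.

```python
import re


def rewrite_imports(source: str, old: str = "drl_cd", new: str = "mbcd") -> str:
    """Replace the top-level package name in `import` / `from ... import` lines only."""
    pattern = re.compile(rf"^(\s*(?:from|import)\s+){re.escape(old)}\b", re.MULTILINE)
    return pattern.sub(rf"\g<1>{new}", source)


sample = (
    "from drl_cd.models.utils import get_required_argument, TensorStandardScaler\n"
    "from drl_cd.models.fc import FC\n"
    "from drl_cd.utils.logger import Progress, Silent\n"
)
fixed = rewrite_imports(sample)
print(fixed)
```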

Furthermore, in the file experiments/mbcd_run.py, at lines 70 and 71:

70 model.deepRLCD.save_current()
71 model.deepRLCD.save_models()

the attribute deepRLCD cannot be found, and execution stops:

Traceback (most recent call last):
  File "/home/valerio/PycharmProjects/mbcd/experiments/mbcd_run.py", line 95, in <module>
    main(config)
  File "/home/valerio/PycharmProjects/mbcd/experiments/mbcd_run.py", line 71, in main
    model.deepRLCD.save_current()
AttributeError: 'SAC' object has no attribute 'deepRLCD'

Log-likelihood doubt

Hi,
I have some doubts about the likelihood estimation performed by the function def get_logprob2(self, x, means, variances): at line 77 of mbcd.py.
According to the comment at line 76, the variable log_prob should have shape [num_networks, batch_size], but line 77 produces a scalar. The following lines (for example, line 80) assume that the result is a matrix:

76    ## [ num_networks, batch_size ]
77    log_prob = -1/2 * (k*np.log(2*np.pi) + np.log(variance).sum(-1) + (np.power(x-mean, 2)/variance).sum(-1))  # [1,]
78
79    ## [ batch_size ]
80    prob = np.exp(log_prob).sum(axis=0)
81
82    ## [ batch_size ]
83    log_prob = np.log(prob + 1e-8)  # Avoid log of zero

Is that a problem?
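
For readers following the shape question, here is a minimal NumPy sketch of how the shapes behave. It assumes x has shape [batch_size, dim] and that means and variances have shape [num_networks, batch_size, dim]. The function names are hypothetical. The mixture step averages over networks with the log-sum-exp trick, which is an assumption that differs from the sum and the 1e-8 offset in the quoted lines 80 and 83.

```python
import numpy as np


def per_network_log_prob(x, means, variances):
    """Diagonal-Gaussian log density; the result keeps all leading axes of the inputs."""
    k = x.shape[-1]
    return -0.5 * (
        k * np.log(2 * np.pi)
        + np.log(variances).sum(-1)
        + (np.square(x - means) / variances).sum(-1)
    )


def ensemble_log_prob(x, means, variances):
    """Log of the average density over networks, shape [batch_size]."""
    log_prob = per_network_log_prob(x, means, variances)  # [num_networks, batch_size]
    m = log_prob.max(axis=0)                              # stabilizer for log-sum-exp
    return m + np.log(np.exp(log_prob - m).mean(axis=0))


rng = np.random.default_rng(0)
means = rng.normal(size=(5, 4, 3))      # 5 networks, batch of 4, 3 dimensions
variances = np.full((5, 4, 3), 0.5)
x = rng.normal(size=(4, 3))

matrix = per_network_log_prob(x, means, variances)               # shape (5, 4)
scalar = per_network_log_prob(x[0], means[0, 0], variances[0, 0])  # shape ()
mixture = ensemble_log_prob(x, means, variances)                 # shape (4,)
print(matrix.shape, scalar.shape, mixture.shape)
```

With a single vector for x, mean and variance, the same formula collapses to a scalar, which matches the "# [1,]" comment on line 77. Whether the repository passes batched or single inputs at that point is not shown in the issue.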

Furthermore, I would like to ask for clarification about the var_mean formulas at lines 85 and 86.
The code contains two formulas, which lead to different results:

85    var_mean = np.var(means, axis=0).max(axis=-1)
86    var_mean = np.linalg.norm(np.std(means, axis=0), axis=-1)

Which of the two is the right one to use?
Thank you in advance
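
To make the difference between the two quoted formulas concrete, the sketch below evaluates both on a small hand-made ensemble. The array values are made up for illustration, and the variable names var_max and std_norm are hypothetical.

```python
import numpy as np

# Hypothetical ensemble predictions with shape [num_networks, batch_size, dim].
means = np.array([
    [[0.0, 0.0]],
    [[2.0, 4.0]],
])

# Formula from line 85: largest per-dimension variance across networks.
var_max = np.var(means, axis=0).max(axis=-1)
# Formula from line 86: Euclidean norm of the per-dimension standard deviations.
std_norm = np.linalg.norm(np.std(means, axis=0), axis=-1)

print(var_max, std_norm)
```

The first formula reports only the worst dimension, in squared units. The second combines all dimensions: its square equals the sum of the per-dimension variances. If both lines are active in the order quoted, the second assignment overwrites the first, so only the line 86 value would be used afterwards.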

Reproducibility issues

Hi,
I'm trying to get deterministic results after fixing a seed.
At the moment I pass the seed argument when execution starts, and I set the 'n_cpu_tf_sess' variable to 1, as suggested in the code comments.
With this setup, the results are deterministic only until policy learning starts: from timestep 0 to 'learning_starts', runs are repeatable, but once policy learning begins after the initial exploration phase, the results are no longer deterministic.
Is there anything else I need to set up in order to get deterministic results?
Thank you for your time,

Valerio
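
As a starting point for anyone debugging this, the sketch below seeds the Python and NumPy global generators and checks that two runs draw the same numbers. The helper names seed_everything and sample are hypothetical. The sketch deliberately does not cover TensorFlow's own seeding or GPU nondeterminism, either of which may be the actual cause here, and it makes no claim about how the repository handles seeds internally.

```python
import os
import random

import numpy as np


def seed_everything(seed: int) -> None:
    """Seed the Python and NumPy global generators (hypothetical helper)."""
    # PYTHONHASHSEED only affects interpreters started after this point.
    os.environ["PYTHONHASHSEED"] = str(seed)
    random.seed(seed)
    np.random.seed(seed)


def sample():
    """Draw from both global generators to check repeatability."""
    return random.random(), np.random.rand(3)


seed_everything(0)
first = sample()
seed_everything(0)
second = sample()
print(first[0] == second[0], np.array_equal(first[1], second[1]))
```

A check like this, placed just before and just after 'learning_starts', can help narrow down which source of randomness diverges first.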

Doubt on rollout length calculation

Hi,
I have a doubt about the following method.

mbcd/mbcd/sac_mbcd.py

Line 637 in 4be85b9

def set_rollout_length(self):

If I'm not mistaken, this method should set the rollout length based on the self.rollout_schedule attribute.
I don't understand why line 649 initializes, for the first time, a new attribute named self._next_idx. The attribute is not used anywhere else, so its value has no effect. Was the line perhaps intended to be self.replay_buffer._next_idx = len(self.replay_buffer)?

Thank you in advance for your time
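
For context on what a schedule-driven rollout length could look like, here is a minimal sketch. It assumes the schedule is a tuple (min_epoch, max_epoch, min_length, max_length) and that the length grows linearly between the two epochs. Both the layout and the function name rollout_length are assumptions for illustration, not the contents of sac_mbcd.py.

```python
def rollout_length(epoch, schedule):
    """Linearly interpolate the rollout length for `epoch`.

    schedule = (min_epoch, max_epoch, min_length, max_length); this layout is an assumption.
    """
    min_epoch, max_epoch, min_length, max_length = schedule
    if epoch <= min_epoch:
        return int(min_length)
    if epoch >= max_epoch:
        return int(max_length)
    # Fraction of the schedule completed, strictly between 0 and 1 here.
    frac = (epoch - min_epoch) / (max_epoch - min_epoch)
    return int(min_length + frac * (max_length - min_length))


schedule = (20, 100, 1, 5)
lengths = [rollout_length(e, schedule) for e in (0, 60, 100, 200)]
print(lengths)
```

Nothing in a computation of this kind needs a buffer index, which is consistent with the observation above that self._next_idx looks out of place in the method.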
