
Comments (11)

vitchyr commented on August 24, 2024

Yes, it's not currently implemented, but your suggestions should work. After that, just call train (and you'll probably want to change it so that you can pass in the epoch to start at). I think the annoying part will just be loading an existing progress.csv rather than overriding it (assuming that's what you want to do).
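For anyone trying this, a minimal sketch of what resuming could look like, assuming the snapshot dict from _get_snapshot() was saved to disk; the file name, the snapshot key names, and the start_epoch argument are assumptions based on this thread, not an existing rlkit API:

    # Hedged sketch of resuming from a saved rlkit snapshot; key names may
    # differ by rlkit version, and start_epoch is a hypothetical argument.
    import torch

    snapshot = torch.load('params.pkl')  # dict saved earlier, e.g. via torch.save(_get_snapshot(), ...)
    policy = snapshot['trainer/policy']
    qf1, qf2 = snapshot['trainer/qf1'], snapshot['trainer/qf2']
    target_qf1 = snapshot['trainer/target_qf1']
    target_qf2 = snapshot['trainer/target_qf2']

    # Rebuild the SACTrainer / algorithm with these networks, point the logger
    # at the old log directory (appending to progress.csv rather than
    # overwriting it), then resume, e.g.:
    # algorithm.train(start_epoch=last_epoch + 1)  # start_epoch is hypothetical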


dwiel commented on August 24, 2024

@yusukeurakami did you ever figure this out?


yusukeurakami commented on August 24, 2024

@vitchyr
I have a question related to resuming training. I am trying to load a pre-trained SAC policy and re-train it. I loaded all five networks (qf1, qf2, target_qf1, target_qf2, policy) and started training, but policy_loss and qf_loss exploded (e+8 to e+10).
I thought it was because I forgot to save log_alpha, so I saved/loaded it and restarted training, but it still doesn't work. Do you have any thoughts about this phenomenon?
When I turn off the automatic entropy tuning, I can resume with no problem.


vitchyr commented on August 24, 2024

Hmmm, it's a bit hard to say without knowing more. I imagine if you look at the values of alpha itself, it blows up rather quickly. There can only really be three reasons, since the alpha loss depends on three quantities:

  1. log_pi. Are you loading highly off-policy data? Perhaps the log-likelihood of some old action is extremely low given the loaded policy.
  2. self.target_entropy. Any chance the target entropy is something really wonky? For example, if you're doing discrete actions, then it should be positive.
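For context, the automatic entropy tuning step looks roughly like this (paraphrased from rlkit's SACTrainer, so treat it as a sketch rather than the exact source):

    # Rough sketch of SAC's automatic entropy (alpha) tuning, paraphrased from
    # rlkit's SACTrainer; not copied verbatim from the repository.
    import torch
    from torch import optim

    log_alpha = torch.zeros(1, requires_grad=True)
    alpha_optimizer = optim.Adam([log_alpha], lr=3e-4)
    target_entropy = -6.0  # e.g. -action_dim for a 6-D continuous action space

    def entropy_tuning_step(log_pi):
        # log_pi: log-probabilities of actions sampled from the current policy
        alpha_loss = -(log_alpha * (log_pi + target_entropy).detach()).mean()
        alpha_optimizer.zero_grad()
        alpha_loss.backward()
        alpha_optimizer.step()
        # alpha grows whenever log_pi + target_entropy > 0 on average,
        # i.e. when the policy's entropy is below the target.
        return log_alpha.exp()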


richardrl commented on August 24, 2024

@yusukeurakami If you're using Adam, you'll also want to reload the optimizer parameters or set the learning rate lower.
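A standard PyTorch pattern for checkpointing and restoring that optimizer state alongside the network (the variable names and file name here are just illustrative):

    # Checkpoint the optimizer state together with the network so a resumed run
    # does not start with fresh Adam moment estimates; names are illustrative.
    import torch

    # saving
    torch.save({
        'policy': policy.state_dict(),
        'policy_optimizer': policy_optimizer.state_dict(),
    }, 'checkpoint.pt')

    # resuming
    checkpoint = torch.load('checkpoint.pt')
    policy.load_state_dict(checkpoint['policy'])
    policy_optimizer.load_state_dict(checkpoint['policy_optimizer'])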

Is the Q function output exploding?


yusukeurakami commented on August 24, 2024

@vitchyr @richardrl Thank you for your reply.
I'm now re-running the experiment to get those values.


yusukeurakami commented on August 24, 2024

@vitchyr @richardrl
This is how the reward and alpha actually look when I restart training. As you can see, alpha blows up immediately after I resume training, the reward drops and never recovers, and qf1_loss and qf2_loss go crazy as well.

[plots: reward, alpha, and qf loss curves before and after resuming training]

  1. log_pi: I am doing domain randomization in my original environment, but the environments used when I resumed training come from the same distribution as in the first training, so the data shouldn't be that off-policy. I even saved and loaded the entire replay buffer before resuming. It looks fine for 10 updates or so, but then it goes crazy as well.

  2. self.target_entropy: My environment is continuous and has 6 actuators. self.target_entropy is -6 at all times. Does that sound right? (See the sketch below this list.)

  3. optimizer: Yes, I am using Adam, but I've already tried reloading the optimizer parameters. It mitigated the symptom a little, but ended with the same phenomenon.

  4. Q-function output: As you can see in the graphs above, since qf1_loss is going crazy, I think the Q values are also exploding.
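Regarding point 2, the usual heuristic for continuous actions sets the target entropy to minus the action dimensionality, which matches -6 for six actuators (as far as I know, rlkit computes something close to this by default when no target entropy is given):

    # Common default for SAC's target entropy with continuous actions;
    # derived from the action space shape, analogous to rlkit's default.
    import numpy as np

    action_space_shape = (6,)  # six actuators
    target_entropy = -np.prod(action_space_shape).item()  # -6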

F.Y.I.

  1. I am saving the model with torch.save instead of pickle, as follows. I don't think this changes anything, but I'm reporting it just in case.

    snapshot = self._get_snapshot()
    torch.save(snapshot, save_path)  # save_path: output file path

  2. My other hyperparameters are as follows:
    if args.algo == 'sac':
        algorithm = "SAC"
    variant = dict(
        algorithm=algorithm,
        version="normal",
        layer_size=100,
        replay_buffer_size=int(1E6),
        algorithm_kwargs=dict(
            num_epochs=6000,
            num_eval_steps_per_epoch=512,  # 512
            num_trains_per_train_loop=1000,  # 1000
            num_expl_steps_per_train_loop=512,  # 512
            min_num_steps_before_training=512,  # 1000
            max_path_length=512,  # 512
            batch_size=128,
        ),
        trainer_kwargs=dict(
            discount=0.99,
            soft_target_tau=5e-3,
            target_update_period=1,
            policy_lr=1E-3,
            qf_lr=1E-3,
            reward_scale=0.1,
            use_automatic_entropy_tuning=True,
        ),
    )
  3. I've changed some parameters in batch_rl_algorithm.py:
            num_train_loops_per_epoch=10,
            min_num_steps_before_training=0,

It would be great if you have any advice.


vitchyr commented on August 24, 2024


yusukeurakami commented on August 24, 2024

> It looks like you set use_automatic_entropy_tuning=False in the original settings, but somehow the entropy tuning is set to True when you load it. How are you resuming training? Are you sure you're using the same hyperparameters for the restarted SAC?

Sorry, I pasted the wrong hyperparameters... I've edited them above, but they were as follows. I am sure I did not change my hyperparameters when I resumed training.

        reward_scale=0.1,
        use_automatic_entropy_tuning=True,


vitchyr commented on August 24, 2024


nanbaima commented on August 24, 2024

Does anyone know whether it would also be possible to resume an env and the dataset pre-trained on that env (with the same state size), but instead of using the previous reward, simply change it and continue training from the previous dataset? In other words: resume training, but with new rewards being added.
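One possible approach (not something rlkit provides out of the box, just a sketch): wrap the environment so it emits the new reward, then resume with the previously saved networks and replay buffer. Note that transitions already stored in the old replay buffer would still carry the old rewards unless you recompute them. The wrapper and new_reward_fn below are illustrative:

    # Hedged sketch: swapping in a new reward while reusing a pre-trained setup.
    # The wrapper and new_reward_fn are illustrative, not part of rlkit.
    import gym

    class NewRewardWrapper(gym.Wrapper):
        def __init__(self, env, new_reward_fn):
            super().__init__(env)
            self.new_reward_fn = new_reward_fn

        def step(self, action):
            obs, _, done, info = self.env.step(action)
            reward = self.new_reward_fn(obs, action, info)  # replace the old reward
            return obs, reward, done, info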
