
skrl's People

Contributors

alessandroassirelli98, juhannc, lopatovsky, simonbogh, toni-sm


skrl's Issues

[Error] PPO on a single environment

Sorry, this might be a trivial question.

I am trying the recurrent PPO examples with my own single environment. In every case, I get this size error:

File "ppo_lstm", line 66, in compute
rnn_input = states.view(-1, self.sequence_length, states.shape[-1])  # (N, L, Hin): N=batch_size, L=sequence_length
RuntimeError: shape '[-1, 128, 23]' is invalid for input of size 736

23 is the observation size; 23 × 32 (the mini-batch size) = 736.
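If it helps, my understanding (I may be wrong) is that the flattened mini-batch has to contain whole sequences, so its size must be a multiple of sequence_length; a quick check with the numbers from the error above:

# numbers taken from the error above
obs_size = 23
mini_batch_size = 32
sequence_length = 128

flat_elements = mini_batch_size * obs_size    # 736, the "input of size 736"
per_sequence = sequence_length * obs_size     # 2944 elements per (L, Hin) block
print(flat_elements % per_sequence)           # non-zero, so view(-1, 128, 23) cannot work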

I registered the environment in order to vectorize it, setting num_envs to 1, but I am not 100% sure it worked. At least gym.vector.make() didn't raise an error:

gym.envs.registration.register(id="PendulumNoVel-v1", entry_point="Custom_Working_Single_File_Env_With_Step_And_Reset:CustomEnv")
env = gym.vector.make("PendulumNoVel-v1", num_envs=1, asynchronous=False)
env = wrap_env(env)

I also tried without vectorizing, but that didn't work either.
SAC works fine without vectorizing, of course.

Basic information

  • skrl version: 1.0.0-rc1
  • Python version: 3.8
  • OS: Win11
  • Torch + gym

Is there a way to automatically save the best result along with the checkpoints?

Hi @Toni-SM ,

[Screenshot from 2022-05-24 10-36-00: FrankaReach training reward curve]

Above you can see the training result for the FrankaReach task. As you can see, the reward drops after a certain step. But since the best result from the beginning is not automatically saved, one has to check TensorBoard and select the appropriate checkpoint from the logs. Is there a way to automatically save the best result along with the checkpoints, like in Isaac Gym?
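For now I am considering a manual workaround along these lines (just a sketch, not existing skrl functionality: keep track of the best mean reward seen so far and save the policy weights whenever it improves):

import torch

best_reward = float("-inf")

def save_if_best(mean_reward, policy, path="best_policy.pt"):
    """Overwrite the saved checkpoint whenever the mean reward improves."""
    global best_reward
    if mean_reward > best_reward:
        best_reward = mean_reward
        torch.save(policy.state_dict(), path)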

I notice that MAPPO is supported in 0.11.0

Hi, I notice that MAPPO is supported in 0.11.0, and I'm really eager to use the MAPPO algorithm in NVIDIA Isaac Sim, but this feature does not seem to be available yet. Could you please tell me when MAPPO will be usable? In addition, if I create a MAPPO class in skrl.agents.torch myself, would it work?
Many thanks!

Example Pendulum-v0 deprecated for gym 0.21.0+

Running the first OpenAI Pendulum-v0 example from the documentation gives a deprecation error.
https://skrl.readthedocs.io/en/latest/intro/examples.html

Error
gym.error.DeprecatedEnv: Env Pendulum-v0 not found (valid versions include ['Pendulum-v1'])

It looks like it has been renamed to Pendulum-v1 in gym v0.21.0.
openai/gym@d199778#diff-4fc33321bcd3c321db321c28fee8b7ae2b0101d0e24c2d5d4d911ae647061110

Workaround for now: change line 49 of the example from env = gym.make("Pendulum-v0") to env = gym.make("Pendulum-v1") if using gym v0.21.0 or later.

Tested on:

  • skrl 0.1.0
  • gym 0.21.0
  • isaacgym 1.0rc3
  • python 3.7.12

Loading the .pth file trained in the Isaac Gym environment

Hi,

I really like the feature in skrl that enables simultaneous deployment of agents (https://skrl.readthedocs.io/en/latest/intro/examples.html). However, it would be great if skrl could, for example, also load a .pth file trained with the Isaac Gym library (https://github.com/NVIDIA-Omniverse/IsaacGymEnvs) for benchmarking purposes. Currently, one gets the following errors:


  File "ppo_ant.py", line 135, in <module>
    models_ppo["policy"].load("./ant_best.pth")
  File "/home/user/skrl/skrl/models/torch/base.py", line 302, in load
    self.load_state_dict(torch.load(path))
  File "/home/user/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1483, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for Policy:
	Missing key(s) in state_dict: "log_std_parameter", "net.0.weight", "net.0.bias", "net.2.weight", "net.2.bias", "net.4.weight", "net.4.bias", "net.6.weight", "net.6.bias". 
	Unexpected key(s) in state_dict: "running_mean_std", "reward_mean_std", "model", "epoch", "optimizer", "frame", "last_mean_rewards", "env_state". 
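For what it's worth, the unexpected keys above suggest that the IsaacGymEnvs (rl_games) checkpoint is a dictionary whose network weights live under the "model" key, so something like the following might be a starting point (just a sketch; the rl_games parameter names would still need to be remapped to skrl's layer names):

import torch

# load the rl_games-style checkpoint and extract only the network weights
checkpoint = torch.load("./ant_best.pth", map_location="cpu")
print(checkpoint.keys())          # 'model', 'running_mean_std', 'optimizer', 'epoch', ...
state_dict = checkpoint["model"]  # raw weights; key names still differ from skrl's Policy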

Environment reset function

Hi, I am looking into using skrl+isaacgym as future research tools. Many thanks to the authors for providing such a quality library.

I am a bit confused by the implementation of IsaacGymPreview4Wrapper and the trainers here. The following are the reset function of the wrapper and its usage in the trainer:

def reset(self) -> Tuple[torch.Tensor, Any]:
    """Reset the environment

    :return: Observation, info
    :rtype: torch.Tensor and any other info
    """
    if self._reset_once:
        self._obs_dict = self._env.reset()
        self._reset_once = False
    return self._obs_dict["obs"], {}
# reset environments
with torch.no_grad():
    if terminated.any() or truncated.any():
        states, infos = self.env.reset()
    else:
        states.copy_(next_states)

It seems that, when using multiple environments, once one of them terminates, all of them get reset? Or is there some mechanism on the Isaac Gym side that deals with this case, so that only the terminated ones get reset?

If I am correct (all of them get reset if one of them terminates), why is it designed like this? Not many algorithms can take advantage of multiple environments, but PPO implementations usually do not do this.

Thank you in advance for any explanation!

No module named omni.isaac.contrib_envs and omni.isaac.orbit_envs

Description

I am using the latest Orbit with skrl 1.1.0, and I am trying to run the example code provided in your docs (like torch_ant_ppo.py), but I get No module named omni.isaac.contrib_envs. After searching the commits on Orbit, I found that they renamed the Gym Envs related extensions to Tasks. So maybe we should also use this under load_isaac_orbit_env, or even change the function name?

   # import orbit extensions
    import omni.isaac.contrib_tasks  # type: ignore
    import omni.isaac.orbit_tasks  # type: ignore
    from omni.isaac.orbit_tasks.utils import parse_env_cfg  # type: ignore

What skrl version are you using?

1.0.0

What ML framework/library version are you using?

pytorch 2.0.1

Additional system information

3.10.14

Evaluating policy on real-world setup

Hello,

We have trained a policy that we would like to test on a real-world setup. Does SKRL have any built-in support for this, or do you have any recommended method of doing this?

-Anton

Media

web_viewer~9.mp4
https://user-images.githubusercontent.com/22400377/157323911-40729895-6175-48d2-85d7-c1b30fe0ee9c.mp4

reaching_franka.mp4
https://user-images.githubusercontent.com/22400377/190899202-6b80c48d-fc49-48e9-b277-24814d0adab1.mp4
reaching_franka_camera.mp4
https://user-images.githubusercontent.com/22400377/190899205-752f654e-9310-4696-a6b2-bfa57d5325f2.mp4
reaching_franka_training_omniverse_isaacgym.png
https://user-images.githubusercontent.com/22400377/190921341-6feb255a-04d4-4e51-bc7a-f939116dd02d.png
reaching_franka_omniverse_isaacgym.mp4 (slow)
https://user-images.githubusercontent.com/22400377/190926792-6e788eaf-1600-4b13-b8c8-e0e0a09e4827.mp4
reaching_franka_omniverse_isaacgym.mp4
https://user-images.githubusercontent.com/22400377/211668430-7cd4668b-e79a-46a9-bdbc-3212388b6b6d.mp4
reaching_franka_training_isaacgym.png
https://user-images.githubusercontent.com/22400377/193546966-bcf966e6-98d8-4b41-bc15-bd7364a79381.png
reaching_franka_isaacgym.mp4
https://user-images.githubusercontent.com/22400377/193537523-e0f0f8ad-2295-410c-ba9a-2a16c827a498.mp4

reaching_iiwa_python.mp4
https://user-images.githubusercontent.com/22400377/212192766-9698bfba-af27-41b8-8a11-17ed3d22c020.mp4
reaching_iiwa_ros_ros2.mp4
https://user-images.githubusercontent.com/22400377/212192817-12115478-e6a8-4502-b33f-b072664b1959.mp4
reaching_iiwa_training_omniverse_isaacgym.png
https://user-images.githubusercontent.com/22400377/212194442-f6588b98-38af-4f29-92a3-3c853a7e31f4.png
reaching_iiwa_omniverse_isaacgym.mp4
https://user-images.githubusercontent.com/22400377/211668313-7bcbcd41-cde5-441e-abb4-82fff7616f06.mp4

reaching_franka_trained_checkpoints.zip
https://github.com/Toni-SM/skrl/files/9595293/trained_checkpoints.zip

reaching_iiwa_trained_checkpoints.zip
https://github.com/Toni-SM/skrl/files/10406561/trained_checkpoints.zip
reaching_iiwa_omniverse_isaacgym_simulation_files.zip
https://github.com/Toni-SM/skrl/files/10409551/simulation_files.zip

py36_linux_frankx.zip
py37_linux_frankx.zip
py38_linux_frankx.zip
py39_linux_frankx.zip

"pygame.error:display Surface quit" in train.eval under render_mode="human"

Change sarsa_gym_taxi.py or sarsa_gymnasium_taxi.py as follows to reproduce the problem (the changes are combined in the sketch after this list):

  1. env = gym.make("Taxi-v3", render_mode="human") # set the render mode
  2. cfg_trainer = {"timesteps": 100, "headless": True} # set a smaller timestep
  3. trainer.eval() # add the eval method after trainer.train()
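Putting the three changes together (everything else, e.g. the model and agent setup, is the unchanged example code):

env = gym.make("Taxi-v3", render_mode="human")   # 1. set the render mode
env = wrap_env(env)

# ... unchanged agent and model setup from sarsa_gym_taxi.py ...

cfg_trainer = {"timesteps": 100, "headless": True}  # 2. a smaller number of timesteps
trainer = SequentialTrainer(cfg=cfg_trainer, env=env, agents=agent)

trainer.train()
trainer.eval()  # 3. evaluating right after training raises "pygame.error: display Surface quit"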

Expected outcome:
Show the environment under the train method and the eval method

Actual outcome:
The environment renders fine during the train method, but the eval method raises the following error:
Traceback (most recent call last):
  File ".\sarsa_gym_taxi.py", line 82, in <module>
    trainer.eval()
  File "C:\Users\hp\Anaconda3\envs\skrl-env\lib\site-packages\skrl\trainers\torch\sequential.py", line 145, in eval
    self.single_agent_eval()
  File "C:\Users\hp\Anaconda3\envs\skrl-env\lib\site-packages\skrl\trainers\torch\base.py", line 211, in single_agent_eval
    states, infos = self.env.reset()
  File "C:\Users\hp\Anaconda3\envs\skrl-env\lib\site-packages\skrl\envs\torch\wrappers.py", line 471, in reset
    observation, info = self._env.reset()
  File "C:\Users\hp\Anaconda3\envs\skrl-env\lib\site-packages\gym\wrappers\time_limit.py", line 68, in reset
    return self.env.reset(**kwargs)
  File "C:\Users\hp\Anaconda3\envs\skrl-env\lib\site-packages\gym\wrappers\order_enforcing.py", line 42, in reset
    return self.env.reset(**kwargs)
  File "C:\Users\hp\Anaconda3\envs\skrl-env\lib\site-packages\gym\wrappers\env_checker.py", line 47, in reset
    return self.env.reset(**kwargs)
  File "C:\Users\hp\Anaconda3\envs\skrl-env\lib\site-packages\gym\envs\toy_text\taxi.py", line 277, in reset
    self.render()
  File "C:\Users\hp\Anaconda3\envs\skrl-env\lib\site-packages\gym\envs\toy_text\taxi.py", line 290, in render
    return self._render_gui(self.render_mode)
  File "C:\Users\hp\Anaconda3\envs\skrl-env\lib\site-packages\gym\envs\toy_text\taxi.py", line 366, in _render_gui
    self.window.blit(self.background_img, cell)
pygame.error: display Surface quit

Comment:
Because the display is fine if eval is run without train, I think the problem might be related to pygame being stopped at the end of train and started again in eval. Thanks for any advice.

[Bug] Indexing issue in memory sampling function

Hi @Toni-SM ,

First of all, thank you for this excellent, well-documented library!

I might be off here, but when playing with the sample method of the RandomMemory class, I've encountered the following PyTorch error:
IndexError: too many indices for tensor of dimension 2
raised in the sample_by_index method of the Memory class.

This error comes from trying to index self.tensors_view[name] with a list of tensors (i.e., batch) when indexes is of type torch.Tensor. When indexes is a list, it works fine.

A quick fix is to return
[[self.tensors_view[name][[batch]] for name in names] for batch in batches]
instead of
[[self.tensors_view[name][batch] for name in names] for batch in batches]
when indexes is of type torch.Tensor.

I hope this issue doesn't come from my end (I apologize in advance if that's the case), as I may be using these methods wrong.

Error Running Orbit Example

(orbit2) kaito@comet:~/Documents/Expt/Orbit/Project_Code$ orbit -p ppo_lift_franka.py 
[INFO] Using python from: /home/kaito/mambaforge-pypy3/envs/orbit2/bin/python                                                                                                                                     
Traceback (most recent call last):
  File "ppo_lift_franka.py", line 5, in <module>
    from skrl.models.torch import Model, GaussianMixin, DeterministicMixin
  File "/home/kaito/mambaforge-pypy3/envs/orbit2/lib/python3.7/site-packages/skrl/models/torch/__init__.py", line 1, in <module>
    from skrl.models.torch.base import Model
  File "/home/kaito/mambaforge-pypy3/envs/orbit2/lib/python3.7/site-packages/skrl/models/torch/base.py", line 4, in <module>
    import gymnasium
  File "/home/kaito/mambaforge-pypy3/envs/orbit2/lib/python3.7/site-packages/gymnasium/__init__.py", line 12, in <module>
    from gymnasium.envs.registration import (
  File "/home/kaito/mambaforge-pypy3/envs/orbit2/lib/python3.7/site-packages/gymnasium/envs/__init__.py", line 382, in <module>
    load_plugin_envs()
  File "/home/kaito/mambaforge-pypy3/envs/orbit2/lib/python3.7/site-packages/gymnasium/envs/registration.py", line 565, in load_plugin_envs
    for plugin in metadata.entry_points(group=entry_point):
TypeError: entry_points() got an unexpected keyword argument 'group'

I've tried to run the Orbit example using skrl. I created a new conda environment via Orbit and the issue still persists. Any fixes?

A little bug on environment wrapper

Discussed in #70

Originally posted by 403forbiddennn April 19, 2023
In the Isaac Gym wrapper class, the render method is inappropriately overridden by your wrapper and thus cannot render successfully. For example, the render method of IsaacGymPreview3Wrapper is:

def render(self, *args, **kwargs) -> None:
    """Render the environment
    """
    pass

which overrides the render() in VecTask.
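A possible fix (just a sketch of the idea, not necessarily how it should land in skrl) would be to forward the call to the wrapped environment instead of discarding it:

def render(self, *args, **kwargs) -> None:
    """Render the environment by delegating to the wrapped VecTask"""
    return self._env.render(*args, **kwargs)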

Wall clock time in Isaac Gym benchmarks?

NVIDIA Isaac Gym

Environment        PPO
Allegro Hand [1]   3942.69
Ant                5466.3  +/- 279.61
Anymal             61.86   +/- 1.81
Anymal Terrain     19.82   +/- 0.57
Ball Balance       288.07  +/- 25.54
Cartpole [2]       494.34  +/- 0.87
Franka Cabinet     3134.0  +/- 182.99
Humanoid           6474.34 +/- 696.27
Ingenuity          7066.82 +/- 488.97
Quadcopter         1237.75 +/- 127.05
Shadow Hand        7898.38 +/- 180.75

Environment        AMP
Humanoid           295.65  +/- 0.86

The following charts show the episode's mean length in timesteps (left) and the mean total reward (right)

[Charts: Allegro Hand (PPO), Ant (PPO), Anymal (PPO), Anymal Terrain (PPO), Ball Balance (PPO), Cartpole (PPO), Franka Cabinet (PPO), Humanoid (PPO), Humanoid (AMP: imitating different pre-recorded human animations), Ingenuity (PPO), Quadcopter (PPO), Shadow Hand (PPO)]

Originally posted by @Toni-SM in #32 (comment)

Hi, I was looking through the benchmark results here (above) for Isaac Gym and was wondering if you could also provide the wall-clock time, or if you have information about how long it took to train each of them? For Isaac Gym training, that is the critical variable for me in understanding performance, due to the ability to vary the number of environments. Thanks :)

How to update the actions only once during the whole episode

Hi,

thank you for this great open-source library. Currently, I am trying to use PPO from this library in conjunction with Isaac Gym.

More specifically, I am trying to find a way to update the actions only once during the whole episode, meaning one action from the action buffer should be sampled at the beginning of the episode and should remain constant until the episode ends. Is there a way to do this?
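In case it clarifies what I mean, here is a rough sketch of the behavior I am after (the names and buffers are illustrative, not skrl or Isaac Gym API): latch the action produced at the first step of each episode and reuse it until the episode ends.

import torch

class ActionLatch:
    """Keep the action sampled at the start of each episode constant until reset."""
    def __init__(self, num_envs, action_dim, device):
        self.latched = torch.zeros((num_envs, action_dim), device=device)
        self.needs_latch = torch.ones(num_envs, dtype=torch.bool, device=device)

    def __call__(self, policy_actions, reset_buf):
        # adopt the new policy action only for environments that just started an episode
        self.latched = torch.where(self.needs_latch.unsqueeze(-1), policy_actions, self.latched)
        # environments that terminate this step latch a fresh action on their next step
        self.needs_latch = reset_buf.bool()
        return self.latched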

Mean rewards are not calculated properly

Description

The mean rewards are computed by appending the mean of all stored cumulative rewards to the self.tracking_data dictionary:
self.tracking_data["Reward / Total reward (mean)"].append(np.mean(track_rewards)). Then, every time the data is meant to be written, the mean of all the rewards stored in self.tracking_data["Reward / Total reward (mean)"] is written via
self.writer.add_scalar(k, np.mean(v), timestep), and the tracking data is cleared. The issue is that a point is appended at every step for which there is data inside self._track_rewards (the cumulative-rewards storage). As a result, all the cumulative rewards added to storage since the last write are averaged, appended to the tracking data, and averaged again on write.

E.g., say each episode is 3 steps, only 1 env instance is running, and writing is done every 9 steps:

step1: self._track_rewards = [] self.tracking_data["Reward / Total reward (mean)"]=[]
step2: self._track_rewards = [] self.tracking_data["Reward / Total reward (mean)"]=[]
step3: Episode finishes with cumulative reward -30: self._track_rewards = [-30] self.tracking_data["Reward / Total reward (mean)"]=[-30]
step4: self._track_rewards = [-30] self.tracking_data["Reward / Total reward (mean)"]=[-30, -30]
step5: self._track_rewards = [-30] self.tracking_data["Reward / Total reward (mean)"]=[-30, -30, -30]
step6: Episode finishes with cumulative reward -4: self._track_rewards = [-30, -4] self.tracking_data["Reward / Total reward (mean)"]=[-30, -30, -30, -17]
step7: self._track_rewards = [-30, -4] self.tracking_data["Reward / Total reward (mean)"]=[-30, -30, -30, -17, -17]
step8: self._track_rewards = [-30, -4] self.tracking_data["Reward / Total reward (mean)"]=[-30, -30, -30, -17, -17, -17]
step9: Episode finishes with cumulative reward -10: self._track_rewards = [-30, -4, -10] self.tracking_data["Reward / Total reward (mean)"]=[-30, -30, -30, -17, -17, -17, -14.67]

At the end of step 9 the mean cumulative reward of the past 3 episodes is -14.67, but the value being written to TensorBoard is about -22.2: VERY DIFFERENT.

SOLUTION: call self._track_rewards.clear() every time data is added to self.tracking_data["Reward / Total reward (mean)"].
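In code, the proposed fix would look roughly like this (a sketch built around the attribute names mentioned above; the surrounding logic is paraphrased, not the exact skrl source):

import numpy as np

# inside the agent's tracking step (paraphrased)
if len(self._track_rewards):
    track_rewards = np.array(self._track_rewards)
    self.tracking_data["Reward / Total reward (mean)"].append(np.mean(track_rewards))
    self._track_rewards.clear()  # proposed fix: each finished episode is counted exactly once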

What skrl version are you using?

1.0.0

What ML framework/library version are you using?

pytorch

Additional system information

No response

The demo of AnymalTerrain had bad result in skrl

Hi! Thanks for this wonderful library.
I got poor results with AnymalTerrain in skrl. The PPO in rl_games obtains a useful policy after 1500 iterations (36000 timesteps), but PPO in skrl (same hyperparameters, task file and timesteps) performs much worse, with low reward.
I'm quite confused by this result.

ModuleNotFoundError: No module named 'skrl.envs.wrappers'

Hello,

I tried running one of the Isaac Gym examples, and the line from skrl.envs.wrappers.torch import wrap_env raises ModuleNotFoundError: No module named 'skrl.envs.wrappers'. I checked the repo and it looks like wrap_env doesn't exist there anymore.

Retraining the policy on real-world setup

Hello,

I have seen your script "environment.py" in the discussion below, which gives a rough baseline for evaluating the trained policy in a real-world setup. I would like to ask whether there is a way to extend this script so that the trained policy can be retrained in the real-world setup, in order to minimize the existing sim2real gap.

Discussed in #10

Originally posted by AntonBock May 2, 2022
Hello,

We have trained a policy that we would like to test on a real-world setup. Does SKRL have any built-in support for this, or do you have any recommended method of doing this?

-Anton

Failing to wrap Isaac Gym Preview 4 Environments if No Wrapper Type is Given

Description

When running the getting started tutorial:

import isaacgymenvs

# import the environment wrapper
from skrl.envs.wrappers.torch import wrap_env

# create/load the environment using the easy-to-use API from NVIDIA
env = isaacgymenvs.make(seed=0,
                        task="Cartpole",
                        num_envs=512,
                        sim_device="cuda:0",
                        rl_device="cuda:0",
                        graphics_device_id=0,
                        headless=False)

# wrap the environment
env = wrap_env(env)  # or 'env = wrap_env(env, wrapper="isaacgym-preview4")'

env = wrap_env(env) does not work by default.

What skrl version are you using?

1.2.0

What ML framework/library version are you using?

PyTorch (2.3.1+cu121)

Additional system information

Python 3.8.19 on Linux

Documentation -> Wrapper Tag: "isaaclab" is wrong, write "isaac-orbit" instead

Description

In the documentation at Wrapping-API, for both PyTorch and JAX, the indicated wrapper tag is "isaaclab".

Instead, looking at the code, the tag actually used is "isaac-orbit" (and it works).
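For reference, this is the call that works with 1.1.0, as opposed to the tag given in the documentation:

from skrl.envs.wrappers.torch import wrap_env

env = wrap_env(env, wrapper="isaac-orbit")  # works; wrapper="isaaclab" produces the error shown below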

Here is a snapshot of the error that occurred:
[screenshot]

What skrl version are you using?

1.1.0

What ML framework/library version are you using?

Pytorch Version: 2.2.2+cu118, Pip 24.0, conda 24.5.0, Isaaclab 4.0.0

Additional system information

No response

Random action only samples from the first action space dimension

Description

Random actions are generated by taking the low and high values of the first dimension of the action space and then uniformly sampling from [low, high] for each dimension of an action.

self._random_distribution = torch.distributions.uniform.Uniform(
    low=torch.tensor(self.action_space.low[0], device=self.device, dtype=torch.float32),
    high=torch.tensor(self.action_space.high[0], device=self.device, dtype=torch.float32))

The issue is that if I have, for example, the action space gym.spaces.Box(low=[-5, -3], high=[5, 3]), any sampled action[1] will be in [-5, 5] instead of [-3, 3].

SOLUTION IS:

self._random_distribution = torch.distributions.uniform.Uniform(
    low=torch.tensor(self.action_space.low, device=self.device, dtype=torch.float32),
    high=torch.tensor(self.action_space.high, device=self.device, dtype=torch.float32))
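A quick standalone check of the difference, using the example bounds above:

import torch

low = torch.tensor([-5.0, -3.0])
high = torch.tensor([5.0, 3.0])

# current behavior: only the first dimension's bounds are used for every dimension
buggy = torch.distributions.uniform.Uniform(low[0], high[0])
print(buggy.sample((4, 2)))   # both columns lie in [-5, 5]

# proposed fix: per-dimension bounds
fixed = torch.distributions.uniform.Uniform(low, high)
print(fixed.sample((4,)))     # column 0 in [-5, 5], column 1 in [-3, 3]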

What skrl version are you using?

1.0.0

What ML framework/library version are you using?

pytorch

Additional system information

No response

Getting blank stderr while training with OIGE

Description

I get duplicate progress bars in stderr. This code used to work perfectly fine in the previous version of OIGE. I upgraded to 2023.1.0 yesterday and have been seeing this behavior since then.

[screenshot: duplicated progress bars in stderr]
The training seems to work correctly though. No error and the policy is trained as expected.

I do get this before the training starts though. Not sure if this is related to the problem.
[screenshot: output printed before training starts]

What skrl version are you using?

1.0.0

What ML framework/library version are you using?

PyTorch Version: 2.0.1, OIGE 2023.1.0a0

Additional system information

Linux (Ubuntu 22.04)

How to implement the curriculum learning using the existing data

Hi,

I would like to implement so-called curriculum learning using skrl, where I initialize the training with pre-recorded data and gradually decrease the usage of this pre-recorded data.
The part that I do not understand is how the code is structured. Taking "FrankaCabinet" as an example:


agent = PPO(models=models_ppo,
            memory=memory, 
            cfg=cfg_ppo, 
            observation_space=env.observation_space, 
            action_space=env.action_space,
            device=device)

# Configure and instantiate the RL trainer
cfg_trainer = {"timesteps": 24000, "headless": True}
trainer = SequentialTrainer(cfg=cfg_trainer, env=env, agents=agent)

# start training
trainer.train()

The above code is used to initialize the agent and start the training. Assuming I have the pre-recorded joint trajectory of the Franka arm as a NumPy array, I would like to overwrite the action (the output of the agent) with this array to guide the robot arm towards the desired behavior. However, done this way, the whole training would be messed up, as the overwritten actions do not come from the policy. So, by simply overwriting the action values, the pre-recorded NumPy array cannot be used appropriately.

Do you have advice/tips for this case?
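To make the question more concrete, what I have in mind is something like the following (purely illustrative, not skrl API): blend the pre-recorded action with the policy action using a coefficient that decays over training.

import numpy as np

def blended_action(policy_action, recorded_action, timestep, total_timesteps, end_fraction=0.5):
    """Linearly decay the influence of the pre-recorded data over the first part of training."""
    alpha = max(0.0, 1.0 - timestep / (end_fraction * total_timesteps))
    return alpha * recorded_action + (1.0 - alpha) * policy_action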

Rename isaacgym env loaders

The function load_isaacgym_env_preview3 works with Isaac Gym Preview 3 and 4 (well, mostly; see #20).
It will probably also (mostly) work with upcoming versions, given all the work NVIDIA put into separating the envs from the isaacgym package.
Renaming load_isaacgym_env_preview3 to something like load_isaacgym_env_preview3_4 or load_isaacgym_env_preview3_and_up would quickly become unwieldy.

Thus, I'd propose a new naming scheme which is clearer for users of Preview 4 (and possibly other upcoming previews).

My proposal is renaming load_isaacgym_env_preview3 to load_isaacgym_env_preview and load_isaacgym_env_preview2 to load_isaacgym_env_preview_legacy.

Obviously, combining the code of both functions into one which can handle all current and future preview releases would be the easiest solution, for end-users at least.
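As a rough sketch of that combined option (illustrative only; it simply dispatches to the existing loaders):

def load_isaacgym_env(task_name: str = "", preview: int = 4, **kwargs):
    """Single entry point that dispatches to the existing preview-specific loaders."""
    if preview <= 2:
        return load_isaacgym_env_preview2(task_name=task_name, **kwargs)
    return load_isaacgym_env_preview3(task_name=task_name, **kwargs)  # preview 3 and later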

I'd be happy to implement those changes but wanted to discuss it first.

Cheers 🎉

PS: All this might change once Isaac Gym is out of preview, or a future preview breaks everything again.

Support gym's new step API

gym version 0.25.0 updates the step API.
step is now supposed to return terminated: bool and truncated: bool instead of done: bool.

A quick and dirty fix would be the following.
However, the release notes of gym (and a promised blog post to be released) mention that done is not equal to termination.
As I'm not yet sure how much of an impact that would have on skrl, I'm opening this issue.

PS: @Toni-SM, I tried to open a new card in the project, but I don't have the rights to do so.
I'd be honored if you would consider adding me to the project! :)

diff --git a/skrl/envs/torch/wrappers.py b/skrl/envs/torch/wrappers.py
index 62110e0..ffd6489 100644
--- a/skrl/envs/torch/wrappers.py
+++ b/skrl/envs/torch/wrappers.py
@@ -271,6 +271,11 @@ class GymWrapper(Wrapper):
         except Exception as e:
             print("[WARNING] Failed to check for a vectorized environment: {}".format(e))

+        if hasattr(self._env, "new_step_api"):
+            self._new_step_api = self._env.new_step_api
+        else:
+            self._new_step_api = False
+
     @property
     def state_space(self) -> gym.Space:
         """State space
@@ -359,13 +364,16 @@ class GymWrapper(Wrapper):
         :return: The state, the reward, the done flag, and the info
         :rtype: tuple of torch.Tensor and any other info
         """
-        observation, reward, done, info = self._env.step(self._tensor_to_action(actions))
+        if self._new_step_api:
+            observation, reward, done, _, info = self._env.step(self._tensor_to_action(actions))
+        else:
+            observation, reward, done, info = self._env.step(self._tensor_to_action(actions))
         # convert response to torch
         return self._observation_to_tensor(observation), \
                torch.tensor(reward, device=self.device, dtype=torch.float32).view(self.num_envs, -1), \
                torch.tensor(done, device=self.device, dtype=torch.bool).view(self.num_envs, -1), \
                info

     def reset(self) -> torch.Tensor:
         """Reset the environment

[Question] TD3_DEFAULT_CONFIG

Hi @Toni-SM

While using the TD3 agent on the Pendulum-v1 env with the default config, I noticed that "smooth_regularization_noise" is None in the default config.

However, that config produces the error below.

skrl/agents/torch/td3/td3.py in _update(self, timestep, timesteps)
    393     # target policy smoothing
    394     next_actions, _, _ = self.target_policy.act({"states": sampled_next_states, **rnn_policy}, role="target_policy")
--> 395     noises = torch.clamp(self._smooth_regularization_noise.sample(next_actions.shape),
    396                          min=-self._smooth_regularization_clip,
    397                          max=self._smooth_regularization_clip)

AttributeError: 'NoneType' object has no attribute 'sample'

I added the line cfg_td3["smooth_regularization_noise"] = GaussianNoise(mean=0, std=1) to avoid that error, and it works.
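For reference, the workaround as a snippet (the GaussianNoise import is from skrl's noise resources; std=1 is simply the value I tried, not necessarily a recommended default):

from skrl.agents.torch.td3 import TD3_DEFAULT_CONFIG
from skrl.resources.noises.torch import GaussianNoise

cfg_td3 = TD3_DEFAULT_CONFIG.copy()
cfg_td3["smooth_regularization_noise"] = GaussianNoise(mean=0, std=1)  # avoids the NoneType error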

Could you check this error and kindly let me know what would be a suitable default value for smooth_regularization_noise in TD3?

dm_control wrapper

Description

Hi there,

I have been using skrl with OIGE, but when I try the "Getting Started" code for dm_control :

# import the environment wrapper and the deepmind suite
from skrl.envs.wrappers.torch import wrap_env
from dm_control import suite

# load the environment
env = suite.load(domain_name="cartpole", task_name="swingup")

# wrap the environment
env = wrap_env(env)  # or 'env = wrap_env(env, wrapper="dm")'

I get the following error:

[skrl:INFO] Environment class: dm_env._environment.Environment
INFO:skrl:Environment class: dm_env._environment.Environment
[skrl:INFO] Environment wrapper: DeepMind
INFO:skrl:Environment wrapper: DeepMind
Traceback (most recent call last):
  File "/home/elle/code/envs/sk_test/lib/python3.10/site-packages/skrl/envs/wrappers/torch/base.py", line 24, in __init__
    self._action_space = self._env.single_action_space
AttributeError: 'Environment' object has no attribute 'single_action_space'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/elle/code/devel/skrl_play/test123.py", line 9, in <module>
    env = wrap_env(env)  # or 'env = wrap_env(env, wrapper="dm")'
  File "/home/elle/code/envs/sk_test/lib/python3.10/site-packages/skrl/envs/wrappers/torch/__init__.py", line 103, in wrap_env
    return DeepMindWrapper(env)
  File "/home/elle/code/envs/sk_test/lib/python3.10/site-packages/skrl/envs/wrappers/torch/deepmind_envs.py", line 19, in __init__
    super().__init__(env)
  File "/home/elle/code/envs/sk_test/lib/python3.10/site-packages/skrl/envs/wrappers/torch/base.py", line 27, in __init__
    self._action_space = self._env.action_space
AttributeError: 'Environment' object has no attribute 'action_space'. Did you mean: 'action_spec'?

This is with Python 3.10.14, dm_control==1.0.20 and skrl==1.1.0. It seems that, through the inheritance of DeepMindWrapper from the base Wrapper class, self._env.action_space is being called before self._spec_to_space can take effect.

Also just as a heads up, when I try with skrl==1.2.0, I now get this:

Traceback (most recent call last):
  File "/home/elle/code/devel/skrl_play/test123.py", line 9, in <module>
    env = wrap_env(env)  # or 'env = wrap_env(env, wrapper="dm")'
  File "/home/elle/code/envs/sk_test/lib/python3.10/site-packages/skrl/envs/wrappers/torch/__init__.py", line 154, in wrap_env
    raise ValueError(f"Unknown wrapper type: {wrapper}")
ValueError: Unknown wrapper type: ['dm_env._environment.Environment']

Thank you!

What skrl version are you using?

1.1.0

What ML framework/library version are you using?

torch==2.3.1

Additional system information

Python 3.10.14

A wrong report in base.py

There seems to be a bug when I try to run the example program:

Traceback (most recent call last):
  File "/home/cwj/my_project/RoboticLab/my_imply/skrl_test/test.py", line 97, in <module>
    device=device)
  File "/home/cwj/my_project/RoboticLab/skrl/skrl/agents/torch/ppo/ppo.py", line 128, in __init__
    self.memory.create_tensor(name="states", size=self.observation_space, dtype=torch.float32)
  File "/home/cwj/my_project/RoboticLab/skrl/skrl/memories/torch/base.py", line 105, in create_tensor
    tensor.fill_(torch.nan)
AttributeError: module 'torch' has no attribute 'nan'

In PyTorch 1.9 or lower there seems to be no torch.nan; replacing it with math.nan solves the problem.

In skrl/memories/torch/base.py, line 105:

tensor.fill_(math.nan)
