
skrl's People

Contributors

alessandroassirelli98, juhannc, lopatovsky, simonbogh, toni-sm


skrl's Issues

[Error] PPO on a single environment

Sorry, this might be a trivial question.

I am trying the recurrent PPO examples with my own single environment. In every case, I get this size error:

File "ppo_lstm", line 66, in compute
rnn_input = states.view(-1, self.sequence_length, states.shape[-1])  # (N, L, Hin): N=batch_size, L=sequence_length
RuntimeError: shape '[-1, 128, 23]' is invalid for input of size 736

23 is the observation size; 23 × 32 (the mini-batch size) = 736.
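If it helps, my understanding (I may be wrong) is that the flattened mini-batch has to contain whole sequences, so its size must be a multiple of sequence_length; a quick check with the numbers from the error above:

# numbers taken from the error above
obs_size = 23
mini_batch_size = 32
sequence_length = 128

flat_elements = mini_batch_size * obs_size    # 736, the "input of size 736"
per_sequence = sequence_length * obs_size     # 2944 elements per (L, Hin) block
print(flat_elements % per_sequence)           # non-zero, so view(-1, 128, 23) cannot work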

I registered the environment in order to vectorize it, setting num_envs to 1, but I am not 100% sure it worked. At least gym.vector.make() didn't raise an error:

gym.envs.registration.register(id="PendulumNoVel-v1", entry_point="Custom_Working_Single_File_Env_With_Step_And_Reset:CustomEnv")
env = gym.vector.make("PendulumNoVel-v1", num_envs=1, asynchronous=False)
env = wrap_env(env)

I also tried without vectorizing, but that didn't work either.
SAC works fine without vectorizing, of course.

Basic information

  • skrl version: 1.0.0-rc1
  • Python version: 3.8
  • OS: Win11
  • Torch + gym

Is there a way to automatically save the best result along with the checkpoints?

Hi @Toni-SM ,

[Screenshot from 2022-05-24 10-36-00: FrankaReach training reward curve]

Above you can see the training result for the FrankaReach task. As you can see, the reward drops after a certain step. But since the best result from the beginning is not automatically saved, one has to check TensorBoard and select the appropriate checkpoint from the logs. Is there a way to automatically save the best result along with the checkpoints, like in Isaac Gym?
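For now I am considering a manual workaround along these lines (just a sketch, not existing skrl functionality: keep track of the best mean reward seen so far and save the policy weights whenever it improves):

import torch

best_reward = float("-inf")

def save_if_best(mean_reward, policy, path="best_policy.pt"):
    """Overwrite the saved checkpoint whenever the mean reward improves."""
    global best_reward
    if mean_reward > best_reward:
        best_reward = mean_reward
        torch.save(policy.state_dict(), path)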

I notice that MAPPO is supported in 0.11.0

Hi, I notice that MAPPO is supported in 0.11.0, and I'm really eager to use the MAPPO algorithm in NVIDIA Isaac Sim, but this feature does not seem to be available yet. Could you please tell me when MAPPO will be usable? In addition, if I create a MAPPO class in skrl.agents.torch myself, would it work?
Many thanks!

Example Pendulum-v0 deprecated for gym 0.21.0+

Running the first OpenAI Pendulum-v0 example from the documentation gives a deprecation error.
https://skrl.readthedocs.io/en/latest/intro/examples.html

Error
gym.error.DeprecatedEnv: Env Pendulum-v0 not found (valid versions include ['Pendulum-v1'])

It looks like it has been renamed to Pendulum-v1 in gym v0.21.0.
openai/gym@d199778#diff-4fc33321bcd3c321db321c28fee8b7ae2b0101d0e24c2d5d4d911ae647061110

Workaround for now: change line 49 of the example from env = gym.make("Pendulum-v0") to env = gym.make("Pendulum-v1") if using gym v0.21.0 or later.

Tested on:

  • skrl 0.1.0
  • gym 0.21.0
  • isaacgym 1.0rc3
  • python 3.7.12

Loading the .pth file trained in the Isaac Gym environment

Hi,

I really like the feature in skrl that enables simultaneous deployment of agents (https://skrl.readthedocs.io/en/latest/intro/examples.html). However, it would be great if skrl could, for example, also load a .pth file trained with the Isaac Gym library (https://github.com/NVIDIA-Omniverse/IsaacGymEnvs) for benchmarking purposes. Currently, one gets the following errors:


  File "ppo_ant.py", line 135, in <module>
    models_ppo["policy"].load("./ant_best.pth")
  File "/home/user/skrl/skrl/models/torch/base.py", line 302, in load
    self.load_state_dict(torch.load(path))
  File "/home/user/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1483, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for Policy:
	Missing key(s) in state_dict: "log_std_parameter", "net.0.weight", "net.0.bias", "net.2.weight", "net.2.bias", "net.4.weight", "net.4.bias", "net.6.weight", "net.6.bias". 
	Unexpected key(s) in state_dict: "running_mean_std", "reward_mean_std", "model", "epoch", "optimizer", "frame", "last_mean_rewards", "env_state". 
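For what it's worth, the unexpected keys above suggest that the IsaacGymEnvs (rl_games) checkpoint is a dictionary whose network weights live under the "model" key, so something like the following might be a starting point (just a sketch; the rl_games parameter names would still need to be remapped to skrl's layer names):

import torch

# load the rl_games-style checkpoint and extract only the network weights
checkpoint = torch.load("./ant_best.pth", map_location="cpu")
print(checkpoint.keys())          # 'model', 'running_mean_std', 'optimizer', 'epoch', ...
state_dict = checkpoint["model"]  # raw weights; key names still differ from skrl's Policy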

Environment reset function

Hi, I am looking into using skrl+isaacgym as future research tools. Many thanks to the authors for providing such a quality library.

I am a bit confused by the implementation of IsaacGymPreview4Wrapper and the trainers here. The following are the reset function of the wrapper and its usage in the trainer:

def reset(self) -> Tuple[torch.Tensor, Any]:
    """Reset the environment

    :return: Observation, info
    :rtype: torch.Tensor and any other info
    """
    if self._reset_once:
        self._obs_dict = self._env.reset()
        self._reset_once = False
    return self._obs_dict["obs"], {}
# reset environments
with torch.no_grad():
    if terminated.any() or truncated.any():
        states, infos = self.env.reset()
    else:
        states.copy_(next_states)

It seems that, when using multiple environments, once one of them terminates, all of them get reset? Or is there some mechanism on the Isaac Gym side that deals with this case, so that only the terminated ones get reset?

If I am correct (all of them get reset if one of them terminates), why is it designed like this? Not many algorithms can take advantage of multiple environments, but PPO implementations usually do not do this.

Thank you in advance for any explanation!

No module named omni.isaac.contrib_envs and omni.isaac.orbit_envs

Description

I am using the latest Orbit with skrl 1.1.0, and I am trying to run the example code provided in your docs (like torch_ant_ppo.py), but I get No module named omni.isaac.contrib_envs. After searching the commits on Orbit, I found that they renamed the Gym Envs related extensions to Tasks. So maybe we should also use this under load_isaac_orbit_env, or even change the function name?

   # import orbit extensions
    import omni.isaac.contrib_tasks  # type: ignore
    import omni.isaac.orbit_tasks  # type: ignore
    from omni.isaac.orbit_tasks.utils import parse_env_cfg  # type: ignore

What skrl version are you using?

1.0.0

What ML framework/library version are you using?

pytorch 2.0.1

Additional system information

3.10.14

Evaluating policy on real-world setup

Hello,

We have trained a policy that we would like to test on a real-world setup. Does SKRL have any built-in support for this, or do you have any recommended method of doing this?

-Anton

Media

web_viewer~9.mp4
https://user-images.githubusercontent.com/22400377/157323911-40729895-6175-48d2-85d7-c1b30fe0ee9c.mp4

reaching_franka.mp4
https://user-images.githubusercontent.com/22400377/190899202-6b80c48d-fc49-48e9-b277-24814d0adab1.mp4
reaching_franka_camera.mp4
https://user-images.githubusercontent.com/22400377/190899205-752f654e-9310-4696-a6b2-bfa57d5325f2.mp4
reaching_franka_training_omniverse_isaacgym.png
https://user-images.githubusercontent.com/22400377/190921341-6feb255a-04d4-4e51-bc7a-f939116dd02d.png
reaching_franka_omniverse_isaacgym.mp4 (slow)
https://user-images.githubusercontent.com/22400377/190926792-6e788eaf-1600-4b13-b8c8-e0e0a09e4827.mp4
reaching_franka_omniverse_isaacgym.mp4
https://user-images.githubusercontent.com/22400377/211668430-7cd4668b-e79a-46a9-bdbc-3212388b6b6d.mp4
reaching_franka_training_isaacgym.png
https://user-images.githubusercontent.com/22400377/193546966-bcf966e6-98d8-4b41-bc15-bd7364a79381.png
reaching_franka_isaacgym.mp4
https://user-images.githubusercontent.com/22400377/193537523-e0f0f8ad-2295-410c-ba9a-2a16c827a498.mp4

reaching_iiwa_python.mp4
https://user-images.githubusercontent.com/22400377/212192766-9698bfba-af27-41b8-8a11-17ed3d22c020.mp4
reaching_iiwa_ros_ros2.mp4
https://user-images.githubusercontent.com/22400377/212192817-12115478-e6a8-4502-b33f-b072664b1959.mp4
reaching_iiwa_training_omniverse_isaacgym.png
https://user-images.githubusercontent.com/22400377/212194442-f6588b98-38af-4f29-92a3-3c853a7e31f4.png
reaching_iiwa_omniverse_isaacgym.mp4
https://user-images.githubusercontent.com/22400377/211668313-7bcbcd41-cde5-441e-abb4-82fff7616f06.mp4

reaching_franka_trained_checkpoints.zip
https://github.com/Toni-SM/skrl/files/9595293/trained_checkpoints.zip

reaching_iiwa_trained_checkpoints.zip
https://github.com/Toni-SM/skrl/files/10406561/trained_checkpoints.zip
reaching_iiwa_omniverse_isaacgym_simulation_files.zip
https://github.com/Toni-SM/skrl/files/10409551/simulation_files.zip

py36_linux_frankx.zip
py37_linux_frankx.zip
py38_linux_frankx.zip
py39_linux_frankx.zip

"pygame.error:display Surface quit" in train.eval under render_mode="human"

Change sarsa_gym_taxi.py or sarsa_gymnasium_taxi.py as follows to reproduce the problem (the changes are combined in the sketch after this list):

  1. env = gym.make("Taxi-v3", render_mode="human") # set the render mode
  2. cfg_trainer = {"timesteps": 100, "headless": True} # set a smaller timestep
  3. trainer.eval() # add the eval method after trainer.train()
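Putting the three changes together (everything else, e.g. the model and agent setup, is the unchanged example code):

env = gym.make("Taxi-v3", render_mode="human")   # 1. set the render mode
env = wrap_env(env)

# ... unchanged agent and model setup from sarsa_gym_taxi.py ...

cfg_trainer = {"timesteps": 100, "headless": True}  # 2. a smaller number of timesteps
trainer = SequentialTrainer(cfg=cfg_trainer, env=env, agents=agent)

trainer.train()
trainer.eval()  # 3. evaluating right after training raises "pygame.error: display Surface quit"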

Expected outcome:
Show the environment under the train method and the eval method

Actual outcome:
The environment renders fine during the train method, but the eval method raises the following error:
Traceback (most recent call last):
  File ".\sarsa_gym_taxi.py", line 82, in <module>
    trainer.eval()
  File "C:\Users\hp\Anaconda3\envs\skrl-env\lib\site-packages\skrl\trainers\torch\sequential.py", line 145, in eval
    self.single_agent_eval()
  File "C:\Users\hp\Anaconda3\envs\skrl-env\lib\site-packages\skrl\trainers\torch\base.py", line 211, in single_agent_eval
    states, infos = self.env.reset()
  File "C:\Users\hp\Anaconda3\envs\skrl-env\lib\site-packages\skrl\envs\torch\wrappers.py", line 471, in reset
    observation, info = self._env.reset()
  File "C:\Users\hp\Anaconda3\envs\skrl-env\lib\site-packages\gym\wrappers\time_limit.py", line 68, in reset
    return self.env.reset(**kwargs)
  File "C:\Users\hp\Anaconda3\envs\skrl-env\lib\site-packages\gym\wrappers\order_enforcing.py", line 42, in reset
    return self.env.reset(**kwargs)
  File "C:\Users\hp\Anaconda3\envs\skrl-env\lib\site-packages\gym\wrappers\env_checker.py", line 47, in reset
    return self.env.reset(**kwargs)
  File "C:\Users\hp\Anaconda3\envs\skrl-env\lib\site-packages\gym\envs\toy_text\taxi.py", line 277, in reset
    self.render()
  File "C:\Users\hp\Anaconda3\envs\skrl-env\lib\site-packages\gym\envs\toy_text\taxi.py", line 290, in render
    return self._render_gui(self.render_mode)
  File "C:\Users\hp\Anaconda3\envs\skrl-env\lib\site-packages\gym\envs\toy_text\taxi.py", line 366, in _render_gui
    self.window.blit(self.background_img, cell)
pygame.error: display Surface quit

Comment:
Because the display is fine if eval is run without train, I think the problem might be related to pygame being stopped at the end of train and started again in eval. Thanks for any advice.

[Bug] Indexing issue in memory sampling function

Hi @Toni-SM ,

First of all, thank you for this excellent, well-documented library!

I might be off here, but when playing with the sample method of the RandomMemory class, I've encountered the following PyTorch error:
IndexError: too many indices for tensor of dimension 2
raised in the sample_by_index method of the Memory class.

This error comes from trying to index self.tensors_view[name] with a list of tensors (i.e., batch) when indexes is of type torch.Tensor. When indexes is a list, it works fine.

A quick fix is to return
[[self.tensors_view[name][[batch]] for name in names] for batch in batches]
instead of
[[self.tensors_view[name][batch] for name in names] for batch in batches]
when indexes is of type torch.Tensor.

I hope this issue doesn't come from my end (I apologize in advance if that's the case), as I may be using these methods wrong.

Error Running Orbit Example

(orbit2) kaito@comet:~/Documents/Expt/Orbit/Project_Code$ orbit -p ppo_lift_franka.py 
[INFO] Using python from: /home/kaito/mambaforge-pypy3/envs/orbit2/bin/python                                                                                                                                     
Traceback (most recent call last):
  File "ppo_lift_franka.py", line 5, in <module>
    from skrl.models.torch import Model, GaussianMixin, DeterministicMixin
  File "/home/kaito/mambaforge-pypy3/envs/orbit2/lib/python3.7/site-packages/skrl/models/torch/__init__.py", line 1, in <module>
    from skrl.models.torch.base import Model
  File "/home/kaito/mambaforge-pypy3/envs/orbit2/lib/python3.7/site-packages/skrl/models/torch/base.py", line 4, in <module>
    import gymnasium
  File "/home/kaito/mambaforge-pypy3/envs/orbit2/lib/python3.7/site-packages/gymnasium/__init__.py", line 12, in <module>
    from gymnasium.envs.registration import (
  File "/home/kaito/mambaforge-pypy3/envs/orbit2/lib/python3.7/site-packages/gymnasium/envs/__init__.py", line 382, in <module>
    load_plugin_envs()
  File "/home/kaito/mambaforge-pypy3/envs/orbit2/lib/python3.7/site-packages/gymnasium/envs/registration.py", line 565, in load_plugin_envs
    for plugin in metadata.entry_points(group=entry_point):
TypeError: entry_points() got an unexpected keyword argument 'group'

I've tried to run the Orbit example using skrl. I created a new conda environment via Orbit and the issue still persists. Any fixes?

A little bug on environment wrapper

Discussed in #70

Originally posted by 403forbiddennn April 19, 2023
In the Isaac Gym wrapper class, the render method is inappropriately overridden by your wrapper and thus cannot render successfully. For example, the render method of IsaacGymPreview3Wrapper is:

def render(self, *args, **kwargs) -> None:
    """Render the environment
    """
    pass

which overrides the render() in VecTask.
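A possible fix (just a sketch of the idea, not necessarily how it should land in skrl) would be to forward the call to the wrapped environment instead of discarding it:

def render(self, *args, **kwargs) -> None:
    """Render the environment by delegating to the wrapped VecTask"""
    return self._env.render(*args, **kwargs)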

Wall clock time in Isaac Gym benchmarks?

NVIDIA Isaac Gym

Environment        PPO
Allegro Hand [1]   3942.69
Ant                5466.3  +/- 279.61
Anymal             61.86   +/- 1.81
Anymal Terrain     19.82   +/- 0.57
Ball Balance       288.07  +/- 25.54
Cartpole [2]       494.34  +/- 0.87
Franka Cabinet     3134.0  +/- 182.99
Humanoid           6474.34 +/- 696.27
Ingenuity          7066.82 +/- 488.97
Quadcopter         1237.75 +/- 127.05
Shadow Hand        7898.38 +/- 180.75

Environment        AMP
Humanoid           295.65  +/- 0.86

The following charts show the episode's mean length in timesteps (left) and the mean total reward (right)

[Charts: Allegro Hand (PPO), Ant (PPO), Anymal (PPO), Anymal Terrain (PPO), Ball Balance (PPO), Cartpole (PPO), Franka Cabinet (PPO), Humanoid (PPO), Humanoid (AMP: imitating different pre-recorded human animations), Ingenuity (PPO), Quadcopter (PPO), Shadow Hand (PPO)]

Originally posted by @Toni-SM in #32 (comment)

Hi, I was looking through the benchmark results here (above) for Isaac Gym and was wondering if you could also provide the wall-clock time, or if you have information about how long it took to train each of them? For Isaac Gym training, that is the critical variable for me in understanding performance, due to the ability to vary the number of environments. Thanks :)

How to update the actions only once during the whole episode

Hi,

thank you for this great open-source library. Currently, I am trying to use PPO from this library in conjunction with Isaac Gym.

More specifically, I am trying to find a way to update the actions only once during the whole episode, meaning one action from the action buffer should be sampled at the beginning of the episode and should remain constant until the episode ends. Is there a way to do this?
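In case it clarifies what I mean, here is a rough sketch of the behavior I am after (the names and buffers are illustrative, not skrl or Isaac Gym API): latch the action produced at the first step of each episode and reuse it until the episode ends.

import torch

class ActionLatch:
    """Keep the action sampled at the start of each episode constant until reset."""
    def __init__(self, num_envs, action_dim, device):
        self.latched = torch.zeros((num_envs, action_dim), device=device)
        self.needs_latch = torch.ones(num_envs, dtype=torch.bool, device=device)

    def __call__(self, policy_actions, reset_buf):
        # adopt the new policy action only for environments that just started an episode
        self.latched = torch.where(self.needs_latch.unsqueeze(-1), policy_actions, self.latched)
        # environments that terminate this step latch a fresh action on their next step
        self.needs_latch = reset_buf.bool()
        return self.latched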

Mean rewards are not calculated properly

Description

The mean rewards are computed by appending the mean of all stored cumulative rewards to the self.tracking_data dictionary:
self.tracking_data["Reward / Total reward (mean)"].append(np.mean(track_rewards)). Then, every time the data is meant to be written, the mean of all the rewards stored in self.tracking_data["Reward / Total reward (mean)"] is written via
self.writer.add_scalar(k, np.mean(v), timestep), and the tracking data is cleared. The issue is that a point is appended at every step for which there is data inside self._track_rewards (the cumulative-rewards storage). As a result, all the cumulative rewards added to storage since the last write are averaged, appended to the tracking data, and averaged again on write.

E.g., say each episode is 3 steps, only 1 env instance is running, and writing is done every 9 steps:

step1: self._track_rewards = [] self.tracking_data["Reward / Total reward (mean)"]=[]
step2: self._track_rewards = [] self.tracking_data["Reward / Total reward (mean)"]=[]
step3: Episode finishes with cumulative reward -30: self._track_rewards = [-30] self.tracking_data["Reward / Total reward (mean)"]=[-30]
step4: self._track_rewards = [-30] self.tracking_data["Reward / Total reward (mean)"]=[-30, -30]
step5: self._track_rewards = [-30] self.tracking_data["Reward / Total reward (mean)"]=[-30, -30, -30]
step6: Episode finishes with cumulative reward -4: self._track_rewards = [-30, -4] self.tracking_data["Reward / Total reward (mean)"]=[-30, -30, -30, -17]
step7: self._track_rewards = [-30, -4] self.tracking_data["Reward / Total reward (mean)"]=[-30, -30, -30, -17, -17]
step8: self._track_rewards = [-30, -4] self.tracking_data["Reward / Total reward (mean)"]=[-30, -30, -30, -17, -17, -17]
step9: Episode finishes with cumulative reward -10: self._track_rewards = [-30, -4, -10] self.tracking_data["Reward / Total reward (mean)"]=[-30, -30, -30, -17, -17, -17, -14.67]

At the end of step 9 the mean cumulative reward of the past 3 episodes is -14.67, but the value being written to TensorBoard is about -22.2: VERY DIFFERENT.

SOLUTION: call self._track_rewards.clear() every time data is added to self.tracking_data["Reward / Total reward (mean)"].
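In code, the proposed fix would look roughly like this (a sketch built around the attribute names mentioned above; the surrounding logic is paraphrased, not the exact skrl source):

import numpy as np

# inside the agent's tracking step (paraphrased)
if len(self._track_rewards):
    track_rewards = np.array(self._track_rewards)
    self.tracking_data["Reward / Total reward (mean)"].append(np.mean(track_rewards))
    self._track_rewards.clear()  # proposed fix: each finished episode is counted exactly once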

What skrl version are you using?

1.0.0

What ML framework/library version are you using?

pytorch

Additional system information

No response

The demo of AnymalTerrain had bad result in skrl

Hi! Thanks for this wonderful library.
I got poor results with AnymalTerrain in skrl. The PPO in rl_games obtains a useful policy after 1500 iterations (36000 timesteps), but PPO in skrl (same hyperparameters, task file and timesteps) performs much worse, with low reward.
I'm quite confused by this result.

ModuleNotFoundError: No module named 'skrl.envs.wrappers'

Hello,

I tried running one of the Isaac Gym examples, and the line from skrl.envs.wrappers.torch import wrap_env raises ModuleNotFoundError: No module named 'skrl.envs.wrappers'. I checked the repo and it looks like wrap_env doesn't exist there anymore.

Retraining the policy on real-world setup

Hello,

I have seen your script "environment.py" in the discussion below, which gives a rough baseline for evaluating the trained policy in a real-world setup. I would like to ask whether there is a way to extend this script so that the trained policy can be retrained in the real-world setup, in order to minimize the existing sim2real gap.

Discussed in #10

Originally posted by AntonBock May 2, 2022
Hello,

We have trained a policy that we would like to test on a real-world setup. Does SKRL have any built-in support for this, or do you have any recommended method of doing this?

-Anton

Failing to wrap Isaac Gym Preview 4 Environments if No Wrapper Type is Given

Description

When running the getting started tutorial:

import isaacgymenvs

# import the environment wrapper
from skrl.envs.wrappers.torch import wrap_env

# create/load the environment using the easy-to-use API from NVIDIA
env = isaacgymenvs.make(seed=0,
                        task="Cartpole",
                        num_envs=512,
                        sim_device="cuda:0",
                        rl_device="cuda:0",
                        graphics_device_id=0,
                        headless=False)

# wrap the environment
env = wrap_env(env)  # or 'env = wrap_env(env, wrapper="isaacgym-preview4")'

env = wrap_env(env) does not work by default.

What skrl version are you using?

1.2.0

What ML framework/library version are you using?

PyTorch (2.3.1+cu121)

Additional system information

Python 3.8.19 on Linux

Documentation -> Wrapper Tag: "isaaclab" is wrong, write "isaac-orbit" instead

Description

In the documentation at Wrapping-API, for both PyTorch and JAX, the indicated wrapper tag is "isaaclab".

Instead, looking at the code, the tag actually used is "isaac-orbit" (and it works).
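For reference, this is the call that works with 1.1.0, as opposed to the tag given in the documentation:

from skrl.envs.wrappers.torch import wrap_env

env = wrap_env(env, wrapper="isaac-orbit")  # works; wrapper="isaaclab" produces the error shown below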

Here is a snapshot of the error that occurred:
[screenshot]

What skrl version are you using?

1.1.0

What ML framework/library version are you using?

Pytorch Version: 2.2.2+cu118, Pip 24.0, conda 24.5.0, Isaaclab 4.0.0

Additional system information

No response

Random action only samples from the first action space dimension

Description

Random actions are generated by taking the low and high values of the first dimension of the action space and then uniformly sampling from [low, high] for each dimension of an action.

self._random_distribution = torch.distributions.uniform.Uniform(
    low=torch.tensor(self.action_space.low[0], device=self.device, dtype=torch.float32),
    high=torch.tensor(self.action_space.high[0], device=self.device, dtype=torch.float32))

The issue is that if I have, for example, the action space gym.spaces.Box(low=[-5, -3], high=[5, 3]), any sampled action[1] will be in [-5, 5] instead of [-3, 3].

SOLUTION IS:

self._random_distribution = torch.distributions.uniform.Uniform(
    low=torch.tensor(self.action_space.low, device=self.device, dtype=torch.float32),
    high=torch.tensor(self.action_space.high, device=self.device, dtype=torch.float32))
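A quick standalone check of the difference, using the example bounds above:

import torch

low = torch.tensor([-5.0, -3.0])
high = torch.tensor([5.0, 3.0])

# current behavior: only the first dimension's bounds are used for every dimension
buggy = torch.distributions.uniform.Uniform(low[0], high[0])
print(buggy.sample((4, 2)))   # both columns lie in [-5, 5]

# proposed fix: per-dimension bounds
fixed = torch.distributions.uniform.Uniform(low, high)
print(fixed.sample((4,)))     # column 0 in [-5, 5], column 1 in [-3, 3]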

What skrl version are you using?

1.0.0

What ML framework/library version are you using?

pytorch

Additional system information

No response

Getting blank stderr while training with OIGE

Description

I get duplicate progress bars in stderr. This code used to work perfectly fine in the previous version of OIGE. I upgraded to 2023.1.0 yesterday and have been seeing this behavior since then.

[screenshot: duplicated progress bars in stderr]
The training seems to work correctly though. No error and the policy is trained as expected.

I do get this before the training starts though. Not sure if this is related to the problem.
[screenshot: output printed before training starts]

What skrl version are you using?

1.0.0

What ML framework/library version are you using?

PyTorch Version: 2.0.1, OIGE 2023.1.0a0

Additional system information

Linux (Ubuntu 22.04)

How to implement the curriculum learning using the existing data

Hi,

I would like to implement so-called curriculum learning using skrl, where I initialize the training with pre-recorded data and gradually decrease the usage of this pre-recorded data.
The part that I do not understand is how the code is structured. Taking "FrankaCabinet" as an example:


agent = PPO(models=models_ppo,
            memory=memory, 
            cfg=cfg_ppo, 
            observation_space=env.observation_space, 
            action_space=env.action_space,
            device=device)

# Configure and instantiate the RL trainer
cfg_trainer = {"timesteps": 24000, "headless": True}
trainer = SequentialTrainer(cfg=cfg_trainer, env=env, agents=agent)

# start training
trainer.train()

The above code is used to initialize the agent and start the training. Assuming I have the pre-recorded joint trajectory of the Franka arm as a NumPy array, I would like to overwrite the action (the output of the agent) with this array to guide the robot arm towards the desired behavior. However, done this way, the whole training would be messed up, as the overwritten actions do not come from the policy. So, by simply overwriting the action values, the pre-recorded NumPy array cannot be used appropriately.

Do you have advice/tips for this case?
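To make the question more concrete, what I have in mind is something like the following (purely illustrative, not skrl API): blend the pre-recorded action with the policy action using a coefficient that decays over training.

import numpy as np

def blended_action(policy_action, recorded_action, timestep, total_timesteps, end_fraction=0.5):
    """Linearly decay the influence of the pre-recorded data over the first part of training."""
    alpha = max(0.0, 1.0 - timestep / (end_fraction * total_timesteps))
    return alpha * recorded_action + (1.0 - alpha) * policy_action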

Rename isaacgym env loaders

The function load_isaacgym_env_preview3 works with Isaac Gym Preview 3 and 4 (well, mostly; see #20).
It will probably also (mostly) work with upcoming versions, given all the work NVIDIA put into separating the envs from the isaacgym package.
Renaming load_isaacgym_env_preview3 to something like load_isaacgym_env_preview3_4 or load_isaacgym_env_preview3_and_up would quickly become unwieldy.

Thus, I'd propose a new naming scheme which is clearer for users of Preview 4 (and possibly other upcoming previews).

My proposal is renaming load_isaacgym_env_preview3 to load_isaacgym_env_preview and load_isaacgym_env_preview2 to load_isaacgym_env_preview_legacy.

Obviously, combining the code of both functions into one which can handle all current and future preview releases would be the easiest solution, for end-users at least.
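As a rough sketch of that combined option (illustrative only; it simply dispatches to the existing loaders):

def load_isaacgym_env(task_name: str = "", preview: int = 4, **kwargs):
    """Single entry point that dispatches to the existing preview-specific loaders."""
    if preview <= 2:
        return load_isaacgym_env_preview2(task_name=task_name, **kwargs)
    return load_isaacgym_env_preview3(task_name=task_name, **kwargs)  # preview 3 and later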

I'd be happy to implement those changes but wanted to discuss it first.

Cheers 🎉

PS: All this might change once Isaac Gym is out of preview, or a future preview breaks everything again.

Support gym's new step API

gym version 0.25.0 updates the step API.
step is now supposed to return terminated: bool and truncated: bool instead of done: bool.

A quick and dirty fix would be the following.
However, the release notes of gym (and a promised blog post to be released) mention that done is not equal to termination.
As I'm not yet sure how much of an impact that would have on skrl, I'm opening this issue.

PS: @Toni-SM, I tried to open a new card in the project, but I don't have the rights to do so.
I'd be honored if you would consider adding me to the project! :)

diff --git a/skrl/envs/torch/wrappers.py b/skrl/envs/torch/wrappers.py
index 62110e0..ffd6489 100644
--- a/skrl/envs/torch/wrappers.py
+++ b/skrl/envs/torch/wrappers.py
@@ -271,6 +271,11 @@ class GymWrapper(Wrapper):
         except Exception as e:
             print("[WARNING] Failed to check for a vectorized environment: {}".format(e))

+        if hasattr(self._env, "new_step_api"):
+            self._new_step_api = self._env.new_step_api
+        else:
+            self._new_step_api = False
+
     @property
     def state_space(self) -> gym.Space:
         """State space
@@ -359,13 +364,16 @@ class GymWrapper(Wrapper):
         :return: The state, the reward, the done flag, and the info
         :rtype: tuple of torch.Tensor and any other info
         """
-        observation, reward, done, info = self._env.step(self._tensor_to_action(actions))
+        if self._new_step_api:
+            observation, reward, done, _, info = self._env.step(self._tensor_to_action(actions))
+        else:
+            observation, reward, done, info = self._env.step(self._tensor_to_action(actions))
         # convert response to torch
         return self._observation_to_tensor(observation), \
                torch.tensor(reward, device=self.device, dtype=torch.float32).view(self.num_envs, -1), \
                torch.tensor(done, device=self.device, dtype=torch.bool).view(self.num_envs, -1), \
                info

     def reset(self) -> torch.Tensor:
         """Reset the environment

[Question] TD3_DEFAULT_CONFIG

Hi @Toni-SM

While using the TD3 agent on the Pendulum-v1 env with the default config, I noticed that "smooth_regularization_noise" is None in the default config.

However, that config produces the error below.

skrl/agents/torch/td3/td3.py in _update(self, timestep, timesteps)
    393     # target policy smoothing
    394     next_actions, _, _ = self.target_policy.act({"states": sampled_next_states, **rnn_policy}, role="target_policy")
--> 395     noises = torch.clamp(self._smooth_regularization_noise.sample(next_actions.shape),
    396                          min=-self._smooth_regularization_clip,
    397                          max=self._smooth_regularization_clip)

AttributeError: 'NoneType' object has no attribute 'sample'

I added the line cfg_td3["smooth_regularization_noise"] = GaussianNoise(mean=0, std=1) to avoid that error, and it works.
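For reference, the workaround as a snippet (the GaussianNoise import is from skrl's noise resources; std=1 is simply the value I tried, not necessarily a recommended default):

from skrl.agents.torch.td3 import TD3_DEFAULT_CONFIG
from skrl.resources.noises.torch import GaussianNoise

cfg_td3 = TD3_DEFAULT_CONFIG.copy()
cfg_td3["smooth_regularization_noise"] = GaussianNoise(mean=0, std=1)  # avoids the NoneType error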

Could you check this error and kindly let me know what would be a suitable default value for smooth_regularization_noise in TD3?

dm_control wrapper

Description

Hi there,

I have been using skrl with OIGE, but when I try the "Getting Started" code for dm_control :

# import the environment wrapper and the deepmind suite
from skrl.envs.wrappers.torch import wrap_env
from dm_control import suite

# load the environment
env = suite.load(domain_name="cartpole", task_name="swingup")

# wrap the environment
env = wrap_env(env)  # or 'env = wrap_env(env, wrapper="dm")'

I get the following error:

[skrl:INFO] Environment class: dm_env._environment.Environment
INFO:skrl:Environment class: dm_env._environment.Environment
[skrl:INFO] Environment wrapper: DeepMind
INFO:skrl:Environment wrapper: DeepMind
Traceback (most recent call last):
  File "/home/elle/code/envs/sk_test/lib/python3.10/site-packages/skrl/envs/wrappers/torch/base.py", line 24, in __init__
    self._action_space = self._env.single_action_space
AttributeError: 'Environment' object has no attribute 'single_action_space'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/elle/code/devel/skrl_play/test123.py", line 9, in <module>
    env = wrap_env(env)  # or 'env = wrap_env(env, wrapper="dm")'
  File "/home/elle/code/envs/sk_test/lib/python3.10/site-packages/skrl/envs/wrappers/torch/__init__.py", line 103, in wrap_env
    return DeepMindWrapper(env)
  File "/home/elle/code/envs/sk_test/lib/python3.10/site-packages/skrl/envs/wrappers/torch/deepmind_envs.py", line 19, in __init__
    super().__init__(env)
  File "/home/elle/code/envs/sk_test/lib/python3.10/site-packages/skrl/envs/wrappers/torch/base.py", line 27, in __init__
    self._action_space = self._env.action_space
AttributeError: 'Environment' object has no attribute 'action_space'. Did you mean: 'action_spec'?

This is with Python 3.10.14, dm_control==1.0.20 and skrl==1.1.0. It seems that, through the inheritance of DeepMindWrapper from the base Wrapper class, self._env.action_space is being called before self._spec_to_space can take effect.

Also just as a heads up, when I try with skrl==1.2.0, I now get this:

Traceback (most recent call last):
  File "/home/elle/code/devel/skrl_play/test123.py", line 9, in <module>
    env = wrap_env(env)  # or 'env = wrap_env(env, wrapper="dm")'
  File "/home/elle/code/envs/sk_test/lib/python3.10/site-packages/skrl/envs/wrappers/torch/__init__.py", line 154, in wrap_env
    raise ValueError(f"Unknown wrapper type: {wrapper}")
ValueError: Unknown wrapper type: ['dm_env._environment.Environment']

Thank you!

What skrl version are you using?

1.1.0

What ML framework/library version are you using?

torch==2.3.1

Additional system information

Python 3.10.14

A wrong report in base.py

There seems to be a bug when I try to run the example program:

Traceback (most recent call last):
  File "/home/cwj/my_project/RoboticLab/my_imply/skrl_test/test.py", line 97, in <module>
    device=device)
  File "/home/cwj/my_project/RoboticLab/skrl/skrl/agents/torch/ppo/ppo.py", line 128, in __init__
    self.memory.create_tensor(name="states", size=self.observation_space, dtype=torch.float32)
  File "/home/cwj/my_project/RoboticLab/skrl/skrl/memories/torch/base.py", line 105, in create_tensor
    tensor.fill_(torch.nan)
AttributeError: module 'torch' has no attribute 'nan'

In PyTorch 1.9 or lower there seems to be no torch.nan; replacing it with math.nan solves the problem.

In skrl/memories/torch/base.py, line 105:

tensor.fill_(math.nan)
