liuzuxin / osrl
Elegant implementations of offline safe RL algorithms in PyTorch
Home Page: https://offline-saferl.org
License: Apache License 2.0
Hi,
I was wondering if this framework can be used for a Multi-Agent RL problem.
If not, can you list the limitations that would stop this framework from being used in a Multi-Agent RL problem?
If yes, can you explain the additional steps required to use this framework in a Multi-Agent RL problem?
Thanks
Hello, I trained the model and want to see the rendered results in the eval phase. I made the following changes in BCQLTrainer.py:
def rollout(self):
    """
    Evaluates the performance of the model on a single episode.
    """
    obs, info = self.env.reset()
    episode_ret, episode_cost, episode_len = 0.0, 0.0, 0
    for _ in range(self.model.episode_len):
        act, _ = self.model.act(obs)
        obs_next, reward, terminated, truncated, info = self.env.step(act)
        cost = info["cost"] * self.cost_scale
        obs = obs_next
        episode_ret += reward
        episode_len += 1
        episode_cost += cost
        if terminated or truncated:
            break
        self.env.render()  # render each step during evaluation
    return episode_ret, episode_len, episode_cost
In eval_bcql.py:
env = wrap_env(
    env=gym.make(cfg["task"], render_mode="human"),
    reward_scale=cfg["reward_scale"],
)
But I got the following error:
C:\Users\s3424\anaconda3\envs\RLenv\python.exe D:/Code/Python/OSRL/examples/eval/eval_bcql.py
OApackage is not installed, can not use CDT.
load config from D:\Code\Python\OSRL\examples\train\logs\OfflineCarCircle1Gymnasium-v0-cost-10\BCQL_episode_len300-790e\BCQL_episode_len300-790e\config.yaml
load model from D:\Code\Python\OSRL\examples\train\logs\OfflineCarCircle1Gymnasium-v0-cost-10\BCQL_episode_len300-790e\BCQL_episode_len300-790e\checkpoint/model.pt
Traceback (most recent call last):
File "D:\Code\Python\OSRL\examples\eval\eval_bcql.py", line 82, in <module>
eval()
File "C:\Users\s3424\anaconda3\envs\RLenv\lib\site-packages\pyrallis\argparsing.py", line 158, in wrapper_inner
response = fn(cfg, *args, **kwargs)
File "D:\Code\Python\OSRL\examples\eval\eval_bcql.py", line 74, in eval
ret, cost, length = trainer.evaluate(args.eval_episodes)
File "D:\Code\Python\OSRL\osrl\algorithms\bcql.py", line 327, in evaluate
epi_ret, epi_len, epi_cost = self.rollout()
File "C:\Users\s3424\anaconda3\envs\RLenv\lib\site-packages\torch\autograd\grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "D:\Code\Python\OSRL\osrl\algorithms\bcql.py", line 353, in rollout
self.env.render()
File "C:\Users\s3424\anaconda3\envs\RLenv\lib\site-packages\gymnasium\core.py", line 418, in render
return self.env.render()
File "C:\Users\s3424\anaconda3\envs\RLenv\lib\site-packages\gymnasium\core.py", line 418, in render
return self.env.render()
File "C:\Users\s3424\anaconda3\envs\RLenv\lib\site-packages\gymnasium\core.py", line 418, in render
return self.env.render()
[Previous line repeated 1 more time]
File "C:\Users\s3424\anaconda3\envs\RLenv\lib\site-packages\gymnasium\wrappers\order_enforcing.py", line 70, in render
return self.env.render(*args, **kwargs)
File "C:\Users\s3424\anaconda3\envs\RLenv\lib\site-packages\gymnasium\wrappers\env_checker.py", line 63, in render
return env_render_passive_checker(self.env, *args, **kwargs)
File "C:\Users\s3424\anaconda3\envs\RLenv\lib\site-packages\gymnasium\utils\passive_env_checker.py", line 391, in env_render_passive_checker
result = env.render()
File "C:\Users\s3424\anaconda3\envs\RLenv\lib\site-packages\gymnasium\core.py", line 418, in render
return self.env.render()
File "C:\Users\s3424\anaconda3\envs\RLenv\lib\site-packages\gymnasium\core.py", line 418, in render
return self.env.render()
File "C:\Users\s3424\anaconda3\envs\RLenv\lib\site-packages\gymnasium\wrappers\order_enforcing.py", line 70, in render
return self.env.render(*args, **kwargs)
File "C:\Users\s3424\anaconda3\envs\RLenv\lib\site-packages\gymnasium\wrappers\env_checker.py", line 63, in render
return env_render_passive_checker(self.env, *args, **kwargs)
File "C:\Users\s3424\anaconda3\envs\RLenv\lib\site-packages\gymnasium\utils\passive_env_checker.py", line 391, in env_render_passive_checker
result = env.render()
File "C:\Users\s3424\anaconda3\envs\RLenv\lib\site-packages\safety_gymnasium\builder.py", line 312, in render
assert self.render_parameters.mode, 'Please specify the render mode when you make env.'
AssertionError: Please specify the render mode when you make env.
How should I solve this problem?
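A hedged debugging sketch, based only on the traceback above (an assumption, not a confirmed fix): the assertion in safety_gymnasium/builder.py fires when render_parameters.mode is unset, which suggests the render_mode passed to gym.make never reached the base environment through the wrapper stack. Checking what the base env actually received may narrow this down:

import gymnasium as gym

# Hypothetical check, not part of OSRL: inspect the innermost env to see
# whether render_mode from gym.make(...) survived the wrapper stack.
env = gym.make(cfg["task"], render_mode="human")
base = env.unwrapped
# safety_gymnasium's Builder stores the mode in render_parameters (see the
# assertion in the traceback); an unset mode here reproduces the error.
print(getattr(base, "render_parameters", None))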
Hi,
I am trying to run the evaluation using the commands provided in the README. It seems there is no config file in the checkpoints folder, just model.pt and model_best.pt. Here is the list of all files in the directory:
When I try to run eval_cdt.py, I get the following error saying that the config file location is not a directory. What am I doing wrong here?
The path to my model and best_models are "/home/xubuntu/Desktop/homelocal/3rd Year/DT_RL/OSRL/logs/OfflineCarPush1Gymnasium-v0-cost-5/CDT_cost5-eea0/CDT_cost5-eea0/checkpoint/mode.pt "
I even tried passing the path argument as mentioned in the command in your README, but I still get the same issue. Please see the screenshot below.
In this screenshot, I entered the model path inside the eval_cdt.py file, hence the path argument is not used here.
Hi! Great work; safety is surely a very important topic in offline RL. However, we are a little puzzled by the complete lack of citations for the CORL library, considering that:
We would like to remind you that CORL is licensed under the Apache License. If you borrow code like this, you should give credit to the source. Also, such misconduct could actually be a significant violation of the NeurIPS Datasets and Benchmarks Track rules.
Thus, we would highly appreciate if you credited CORL both in the code and publication. Thanks in advance!
I have successfully written a custom environment in Gymnasium and used it in CDT. Here's the environment I created:
However, I ran into two problems:
2. There are some parameters in cdt_configs.py, and I don't know what will happen if I change them.
For example, should I change num_heads, target_returns, cost_limit, deg, max_rew_decrease, max_reward, or reward_scale? How should I change them, and what do they do? Is there any documentation for these parameters?
Can you help me? I'm really confused about these parameters!
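For what it's worth, the subclassing pattern used in cdt_configs.py (and shown in full in a later issue in this thread) is one way to experiment with these parameters: override only the fields in question and keep the rest at their defaults. A minimal sketch; the class name, env id, and import path are hypothetical, and the field comments reflect my reading rather than official documentation:

from dataclasses import dataclass
from typing import Tuple

from examples.configs.cdt_configs import CDTTrainConfig  # import path is an assumption

@dataclass
class MyCustomEnvConfig(CDTTrainConfig):  # hypothetical config class
    task: str = "MyCustomEnv-v0"  # hypothetical env id
    num_heads: int = 8  # attention heads in the transformer blocks
    deg: int = 4  # degree of the polynomial fit to the Pareto frontier
    cost_limit: int = 20  # your cost threshold; match your target cost return
    # (reward, cost) pairs the policy is conditioned on at evaluation time
    target_returns: Tuple[Tuple[float, ...], ...] = ((400.0, 20),)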
Hi,
I would like this model to take visual observations as inputs (for example, from the Atari dataset) rather than state trajectories. Is there a way I could do that?
Thanks
Hi @liuzuxin, great work! I encountered the following bugs and problems when using the OSRL library:
python ./examples/train/train_cdt.py --task OfflineCarPush1Gymnasium-v0 --cost_limit 5 --device "cuda:3"
but I got:
/root/anaconda3/envs/osrl/lib/python3.10/site-packages/numpy/lib/polynomial.py:667: RuntimeWarning: invalid value encountered in divide
lhs /= scale
** On entry to DLASCL parameter number 4 had an illegal value
Traceback (most recent call last):
File "/root/zyn/ydj_data_collection/OSRL/./examples/train/train_cdt.py", line 226, in <module>
train()
File "/root/anaconda3/envs/osrl/lib/python3.10/site-packages/pyrallis/argparsing.py", line 158, in wrapper_inner
response = fn(cfg, *args, **kwargs)
File "/root/zyn/ydj_data_collection/OSRL/./examples/train/train_cdt.py", line 127, in train
dataset = SequenceDataset(
File "/root/zyn/ydj_data_collection/OSRL/osrl/common/dataset.py", line 726, in __init__
self.idx, self.aug_data, self.pareto_frontier, self.indices = augmentation(
File "/root/zyn/ydj_data_collection/OSRL/osrl/common/dataset.py", line 369, in augmentation
pareto_frontier = np.poly1d(np.polyfit(cost_ret_pareto, rew_ret_pareto, deg=deg))
File "<__array_function__ internals>", line 200, in polyfit
File "/root/anaconda3/envs/osrl/lib/python3.10/site-packages/numpy/lib/polynomial.py", line 668, in polyfit
c, resids, rank, s = lstsq(lhs, rhs, rcond)
File "<__array_function__ internals>", line 200, in lstsq
File "/root/anaconda3/envs/osrl/lib/python3.10/site-packages/numpy/linalg/linalg.py", line 2285, in lstsq
x, resids, rank, s = gufunc(a, b, rcond, signature=signature, extobj=extobj)
File "/root/anaconda3/envs/osrl/lib/python3.10/site-packages/numpy/linalg/linalg.py", line 101, in _raise_linalgerror_lstsq
raise LinAlgError("SVD did not converge in Linear Least Squares")
numpy.linalg.LinAlgError: SVD did not converge in Linear Least Squares
wandb: Waiting for W&B process to finish... (failed 1).
How should I solve this problem?
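A minimal guard sketch, under the assumption (not confirmed) that the SVD failure stems from NaN/Inf or degenerate values reaching np.polyfit; the RuntimeWarning about an invalid divide just above supports this reading. The variable names mirror the call in osrl/common/dataset.py shown in the traceback:

import numpy as np

# Drop non-finite returns before fitting the Pareto frontier; np.polyfit
# raises "SVD did not converge" when the inputs contain NaN/Inf.
cost_ret_pareto = np.asarray(cost_ret_pareto, dtype=np.float64)
rew_ret_pareto = np.asarray(rew_ret_pareto, dtype=np.float64)
mask = np.isfinite(cost_ret_pareto) & np.isfinite(rew_ret_pareto)
pareto_frontier = np.poly1d(
    np.polyfit(cost_ret_pareto[mask], rew_ret_pareto[mask], deg=deg))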
Hi,
I am trying to run train_cdt.py as per the instructions listed on the GitHub README page. However, when I try to install OApackage, I get the following error.
❯ pip install OApackage==2.7.6
Defaulting to user installation because normal site-packages is not writeable
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Collecting OApackage==2.7.6
Downloading OApackage-2.7.6.tar.gz (1.2 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.2/1.2 MB 11.6 MB/s eta 0:00:00
Preparing metadata (setup.py) ... error
error: subprocess-exited-with-error
× python setup.py egg_info did not run successfully.
│ exit code: 1
╰─> [12 lines of output]
/tmp/pip-install-1t4f2txe/oapackage_afdf46ec7c0d4ba1a2622e95ae5261fd/setup.py:128: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
if LooseVersion(swig_version) >= LooseVersion(swig_minimum_version):
Traceback (most recent call last):
File "<string>", line 2, in <module>
File "<pip-setuptools-caller>", line 34, in <module>
File "/tmp/pip-install-1t4f2txe/oapackage_afdf46ec7c0d4ba1a2622e95ae5261fd/setup.py", line 297, in <module>
raise Exception('could not find a recent version if SWIG')
Exception: could not find a recent version if SWIG
swig_version 3.0.12, swig_executable /usr/bin/swig3.0
Readthedocs environment: False
checkZlib: compile and link
find_packages: ['oapackage', 'tests']
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed
× Encountered error while generating package metadata.
╰─> See above for output.
note: This is an issue with the package mentioned above, not pip.
hint: See above for details.
I tried installing the dependencies within the conda environment as well as outside the environment, but the error persists. Any help in resolving the error will be appreciated.
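For what it's worth, the output above shows OApackage's setup.py rejecting the installed SWIG 3.0.12 ("could not find a recent version if SWIG"), so one plausible workaround, offered as an assumption rather than a confirmed fix, is to install SWIG 4.x first (for example, conda install -c conda-forge swig, or via your system package manager) and then rerun pip install OApackage==2.7.6.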
I ran the command python examples/train/train_bcql.py --task OfflineCarCircle-v0 as illustrated in the example. However, I found that the output of the training run is empty. I looked at the wandb web page, but it didn't generate any charts, just some empty folders. And logs/OfflineCarCircle-v0-cost-10/BCQL-3d74/BCQL-3d74/progress.txt is also empty. There must be something wrong, but I don't know why. I'm looking forward to your help.
Thank you for the wonderful work. I have read the paper about the Constrained Decision Transformer. Could you explain why you apply cost_transform when calculating the cost-to-go (50 - x in CDT's forward, and 70 - x in the dataset's sample_prob)? Also, if this is mentioned in the paper, I would appreciate it if you could tell me where it is written. Thank you.
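For readers skimming this issue, here are the two transforms being asked about, written out as the question describes them; the constants 50 and 70 come from the question itself, the second name is a placeholder, and the closing comment is my reading rather than the authors' wording:

# As referenced in the question: the transform applied to the cost-to-go in
# CDT's forward pass, and the one used in the dataset's sample_prob.
cost_transform = lambda x: 50.0 - x
sample_prob_transform = lambda x: 70.0 - x  # placeholder name
# Both flip the ordering, so low-cost trajectories map to larger values.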
hi, I got an error after installing all the packages and running your code.
Traceback (most recent call last):
File "/home/ubuntu/OSRL/examples/train/train_bc.py", line 15, in <module>
from fsrl.utils import WandbLogger
File "/home/ubuntu/anaconda3/envs/OSRL/lib/python3.9/site-packages/fsrl/utils/__init__.py", line 3, in <module>
from fsrl.utils.logger import BaseLogger, DummyLogger, TensorboardLogger, WandbLogger
File "/home/ubuntu/anaconda3/envs/OSRL/lib/python3.9/site-packages/fsrl/utils/logger/__init__.py", line 4, in <module>
from fsrl.utils.logger.tb_logger import TensorboardLogger
File "/home/ubuntu/anaconda3/envs/OSRL/lib/python3.9/site-packages/fsrl/utils/logger/tb_logger.py", line 5, in <module>
from torch.utils.tensorboard import SummaryWriter
File "/home/ubuntu/anaconda3/envs/OSRL/lib/python3.9/site-packages/torch/utils/__init__.py", line 4, in <module>
from .throughput_benchmark import ThroughputBenchmark
File "/home/ubuntu/anaconda3/envs/OSRL/lib/python3.9/site-packages/torch/utils/throughput_benchmark.py", line 2, in <module>
import torch._C
ModuleNotFoundError: No module named 'torch._C'
In my env, as the number of training rounds increases, the values of cost and reward change irregularly. The training results are shown below; they don't look normal.
However, when I choose another env (like OfflinePointCircle2Gymnasium-v0), the cost gradually declines and the return rises as the number of training rounds increases, which looks normal.
Here are the parameters in my cdt_config.py:
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class CDTTrainConfig:
    # wandb params
    project: str = "OSRL-baselines"
    group: str = None
    name: Optional[str] = None
    prefix: Optional[str] = "CDT"
    suffix: Optional[str] = ""
    logdir: Optional[str] = "logs"
    verbose: bool = True
    # dataset params
    outliers_percent: float = None
    noise_scale: float = None
    inpaint_ranges: Tuple[Tuple[float, float], ...] = None
    epsilon: float = None
    density: float = 1.0
    # model params
    embedding_dim: int = 128
    num_layers: int = 3
    num_heads: int = 8
    action_head_layers: int = 1
    seq_len: int = 10
    episode_len: int = 300
    attention_dropout: float = 0.1
    residual_dropout: float = 0.1
    embedding_dropout: float = 0.1
    time_emb: bool = True
    # training params
    # task: str = "OfflinePointCircle2Gymnasium-v0"
    task: str = "Autobidding-v0"
    dataset: str = None
    learning_rate: float = 1e-4
    betas: Tuple[float, float] = (0.9, 0.999)
    weight_decay: float = 1e-4
    clip_grad: Optional[float] = 0.25
    batch_size: int = 8
    update_steps: int = 5000
    lr_warmup_steps: int = 200
    reward_scale: float = 0.1
    cost_scale: float = 1
    num_workers: int = 0
    # evaluation params
    target_returns: Tuple[Tuple[float, ...],
                          ...] = ((450.0, 10), (500.0, 20), (550.0, 50))  # reward, cost
    # The cost limit corresponds to the cost threshold for your problem;
    # it should be the same as your target cost return for CDT.
    cost_limit: int = 100
    eval_episodes: int = 10
    eval_every: int = 100
    # general params
    seed: int = 0
    device: str = "cuda:0"
    threads: int = 6
    # augmentation param
    deg: int = 4
    pf_sample: bool = False
    beta: float = 1.0
    augment_percent: float = 0.2
    # maximum absolute value of reward for the augmented trajs
    max_reward: float = 1.0
    # minimum reward above the PF curve
    min_reward: float = 0.2
    # the max decrease of ret between the associated traj
    # w.r.t. the nearest pf traj
    max_rew_decrease: float = 0.5
    # model mode params
    use_rew: bool = True
    use_cost: bool = True
    cost_transform: bool = True
    cost_prefix: bool = False
    add_cost_feat: bool = False
    mul_cost_feat: bool = False
    cat_cost_feat: bool = False
    loss_cost_weight: float = 0.02
    loss_state_weight: float = 0
    cost_reverse: bool = False
    # pf only mode param
    pf_only: bool = False
    rmin: float = 300
    cost_bins: int = 60
    npb: int = 5
    cost_sample: bool = True
    linear: bool = True  # linear or inverse
    start_sampling: bool = False
    prob: float = 0.2
    stochastic: bool = True
    init_temperature: float = 0.1
    no_entropy: bool = False
    # random augmentation
    random_aug: float = 0
    aug_rmin: float = 400
    aug_rmax: float = 500
    aug_cmin: float = -2
    aug_cmax: float = 25
    cgap: float = 5
    rstd: float = 1
    cstd: float = 0.2

@dataclass
class CDTCarCircleConfig(CDTTrainConfig):
    pass

@dataclass
class AutobiddingConfig(CDTTrainConfig):
    # model params
    seq_len: int = 10
    episode_len: int = 1000
    # training params
    task: str = "Autobidding-v0"
    target_returns: Tuple[Tuple[float, ...],
                          ...] = ((15.0, 20), (15.0, 40), (15.0, 80))  # reward, cost
    # augmentation param
    deg: int = 1
    max_reward: float = 2
    min_reward: float = 1
    max_rew_decrease: float = 0.3
    device: str = "cuda:0"
The format of my data is shown below:
Can you help me? What do you think is the problem?
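One side note on the config above, grounded only in its own comment that the cost limit "should be the same as your target cost return for CDT" (an observation, not a confirmed diagnosis): AutobiddingConfig inherits cost_limit = 100, while its target cost returns are 20, 40, and 80. Taking that comment literally, an override like the following, with the class name being hypothetical, might be worth trying:

@dataclass
class AutobiddingConfigAligned(AutobiddingConfig):  # hypothetical variant
    cost_limit: int = 20  # align with the (15.0, 20) target return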
Although you mentioned that your code is inspired by CORL, I find differences other than CTG as well. Can you help me understand your code better by answering the following:
Thanks
Thank you for your outstanding work. I have read 'Constrained Decision Transformer for Offline Safe Reinforcement Learning'. I have two questions I'd like to kindly ask:
Thank you.
(I also posed a question in issue #7, but I separated the issues for clarity and to make it easier for future readers to understand.)
Hi,
I see that one of your baseline methods is a simple DT-Cost, but I am unable to find the files for DT-Cost in the examples folder. Can you kindly share those files for training?
Thanks