mahaitongdae / feasible-actor-critic


Code for the paper "Feasible Actor-Critic: Constrained Reinforcement Learning for Ensuring Statewise Safety".

License: MIT License

Python 100.00%

feasible-actor-critic's Introduction

👋 Hi, I’m Haitong Ma @mahaitongdae. DAE stands for the former name of my undergraduate department, the Department of Automotive Engineering, which is now the School of Vehicle and Mobility at Tsinghua University. After graduating from Tsinghua University, I am fortunate to be pursuing my PhD at Harvard SEAS, working with Prof. Na Li.
👀 I’m interested in the intersection of control, learning, and optimization, and in the broad range of areas where these techniques can be applied.
📄 My previous research mostly focused on developing provable learning-based safety guarantees for dynamical systems (via CBFs or reachability analysis) using reinforcement learning.
🌱 I’m currently learning about the Crazyflie, an open-source quadrotor developed by Bitcraze.

📫 Feel free to reach me through issues and email!


feasible-actor-critic's Issues

Penalty term in the actor loss

Hi,
in Feasible-Actor-Critic/learners/sac.py (line 400 in 54c20df):

 penalty_terms = self.tf.reduce_mean(self.tf.multiply(self.tf.stop_gradient(lams), QC)) 

you calculate the cost penalty for the actor loss as $\lambda(s_t) \cdot Q_c(s_t, a_t)$,
whereas Eq. 4.2 in your paper defines the cost penalty as $\lambda(s_t) \cdot (Q_c(s_t, a_t) - d)$.
Is there a reason you omitted the cost limit $d$ in the actor loss? Do you want to train a policy that incurs as little cost as possible?
I noticed that the $\lambda$ MLP correctly has the cost limits in its loss.
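
A note that may help frame the question: because the code applies `stop_gradient` to `lams`, and $d$ is a constant, the term $\lambda(s_t) \cdot d$ does not depend on the policy parameters. Under that reading, the two penalties differ in value but should give the same actor gradient. The sketch below checks this numerically on a toy problem. The critic `q_cost`, the linear `policy`, and all numbers are hypothetical and are not taken from the repository.

```python
# Toy check: with lambda held constant (as stop_gradient does), subtracting the
# cost limit d shifts the penalty value but leaves the actor gradient unchanged.
# q_cost and policy are hypothetical stand-ins, not the repository's networks.

def q_cost(state, action):
    # Hypothetical cost critic, quadratic in the action.
    return (action - 0.5 * state) ** 2 + 1.0

def policy(theta, state):
    # Hypothetical deterministic linear policy a = theta * s.
    return theta * state

def penalty(theta, states, lams, cost_limit):
    # Mean of lambda(s) * (Q_c(s, pi(s)) - d); lams are plain numbers here,
    # mimicking stop_gradient(lams) in the actor loss.
    terms = [lam * (q_cost(s, policy(theta, s)) - cost_limit)
             for s, lam in zip(states, lams)]
    return sum(terms) / len(terms)

def grad_theta(theta, states, lams, cost_limit, eps=1e-6):
    # Central finite difference with respect to the policy parameter.
    up = penalty(theta + eps, states, lams, cost_limit)
    down = penalty(theta - eps, states, lams, cost_limit)
    return (up - down) / (2 * eps)

states = [0.2, -1.0, 1.5]
lams = [0.3, 2.0, 0.7]
theta = 0.8

grad_without_limit = grad_theta(theta, states, lams, cost_limit=0.0)
grad_with_limit = grad_theta(theta, states, lams, cost_limit=10.0)

# The values differ by mean(lambda) * d, which is constant in theta.
value_gap = penalty(theta, states, lams, 0.0) - penalty(theta, states, lams, 10.0)

print(grad_without_limit, grad_with_limit, value_gap)
```

This only covers the actor update. The $\lambda$ update does depend on $d$, which is consistent with the cost limit appearing in the $\lambda$ loss. Whether the omission in the actor loss was intentional is for the author to confirm.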

How do I create fewer actors or increase the resources available to this Ray cluster?

I am new to Ray and ran into the following problem when launching training:
/home/ltt/anaconda3/envs/FAC/bin/python /home/ltt/Downloads/Feasible-Actor-Critic/train_script4fsac.py test_dir test_iter_list
2021-12-28 06:22:08.727682: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/home/ltt/.mujoco/mujoco200/bin
2021-12-28 06:22:08.727697: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2021-12-28 06:22:09.591541: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-12-28 06:22:09.591685: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/home/ltt/.mujoco/mujoco200/bin
2021-12-28 06:22:09.591694: W tensorflow/stream_executor/cuda/cuda_driver.cc:326] failed call to cuInit: UNKNOWN ERROR (303)
2021-12-28 06:22:09.591714: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (ubuntu): /proc/driver/nvidia/version does not exist
/home/ltt/anaconda3/envs/FAC/lib/python3.6/site-packages/gym/logger.py:30: UserWarning: WARN: Box bound precision lowered by casting to float32
warnings.warn(colorize('%s: %s'%('WARN', msg % args), 'yellow'))
INFO:main:begin training agents with parameter Namespace(act_dim=2, action_range=1.0, alg_name='FAC', alpha='auto', alpha_lr_schedule=[8e-05, 1000000, 3e-06], batch_size=1024, buffer_log_interval=40000, buffer_type='cost', constrained=True, cost_bias=0.0, cost_gamma=0.99, cost_lim=10.0, cost_value_lr_schedule=[8e-05, 4000000, 1e-06], delayed_update=4, demo=False, deterministic_policy=False, double_Q=True, double_QC=False, dual_ascent_interval=12, env_id='Safexp-PointButton1-v0', eval_interval=10000, eval_log_interval=1, eval_render=False, evaluator_type='EvaluatorWithCost', explore_sigma=None, fixed_steps=1000, gamma=0.99, gradient_clip_norm=10.0, grads_max_reuse=2, grads_queue_size=25, lam_gradient_clip_norm=3.0, lam_lr_schedule=[5e-05, 333333, 3e-06], log_dir='./results/FAC/PointButton/PointButton1-2021-12-28-06-22-09/logs', log_interval=100, max_buffer_size=500000, max_iter=4000000, max_sampled_steps=0, max_weight_sync_delay=300, mlp_lam=True, mode='training', model_dir='./results/FAC/PointButton/PointButton1-2021-12-28-06-22-09/models', model_load_dir=None, model_load_ite=None, mu_bias=0.0, num_agent=1, num_batch_reuse=1, num_buffers=4, num_eval_agent=1, num_eval_episode=5, num_future_data=0, num_learners=4, num_workers=4, obs_dim=76, obs_ptype='scale', obs_scale=[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0], off_policy=True, optimizer_type='OffPolicyAsyncWithCost', policy_hidden_activation='elu', policy_lr_schedule=[3e-05, 1000000, 1e-06], policy_model_cls='MLP', policy_num_hidden_layers=2, policy_num_hidden_units=256, policy_only=False, policy_out_activation='linear', policy_type='PolicyWithMu', ppc_load_dir=None, random_seed=0, 
replay_alpha=0.6, replay_batch_size=256, replay_beta=0.4, replay_starts=3000, result_dir='./results/FAC/PointButton/PointButton1-2021-12-28-06-22-09', rew_ptype='scale', rew_scale=1.0, rew_shift=0.0, save_interval=200000, target=True, target_entropy=-2, tau=0.005, test_dir='test_dir', test_iter_list='test_iter_list', value_hidden_activation='elu', value_lr_schedule=[8e-05, 4000000, 1e-06], value_model_cls='MLP', value_num_hidden_layers=2, value_num_hidden_units=256, worker_log_interval=5, worker_type='OffPolicyWorkerWithCost')
2021-12-28 06:22:11,003 INFO services.py:1174 -- View the Ray dashboard at http://127.0.0.1:8265
2021-12-28 06:22:14.252747: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX512F
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-12-28 06:22:14.252936: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
INFO:worker:Worker initialized
INFO:optimizer:start filling the replay
(pid=6552) /home/ltt/anaconda3/envs/FAC/lib/python3.6/site-packages/gym/logger.py:30: UserWarning: WARN: Box bound precision lowered by casting to float32
(pid=6552) warnings.warn(colorize('%s: %s'%('WARN', msg % args), 'yellow'))
(pid=6553) /home/ltt/anaconda3/envs/FAC/lib/python3.6/site-packages/gym/logger.py:30: UserWarning: WARN: Box bound precision lowered by casting to float32
(pid=6553) warnings.warn(colorize('%s: %s'%('WARN', msg % args), 'yellow'))
(pid=6553) INFO:worker:Worker initialized
(pid=6553) INFO:worker:Worker_info: {'worker_id': 1, 'num_sample': 0, 'num_costs': 0, 'cost_rate': 0}
2021-12-28 06:22:32,308 WARNING worker.py:1108 -- The actor or task with ID ffffffffffffffff69a6825d641b461327313d1c01000000 cannot be scheduled right now. It requires {CPU: 1.000000} for placement, but this node only has remaining {0.000000/2.000000 CPU, 0.927734 GiB/0.927734 GiB memory, 1.000000/1.000000 node:192.168.21.140, 0.292969 GiB/0.292969 GiB object_store_memory}
. In total there are 0 pending tasks and 11 pending actors on this node. This is likely due to all cluster resources being claimed by actors. To resolve the issue, consider creating fewer actors or increase the resources available to this Ray cluster. You can ignore this message if this Ray cluster is expected to auto-scale.
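
Reading the log, the warning says the node exposes 2 CPUs and the blocked actor requests 1 CPU, while the logged Namespace asks for `num_workers=4`, `num_learners=4`, and `num_buffers=4`. The sketch below is a rough budget under two assumptions that the log does not confirm: every actor requests 1 CPU, and the evaluator (`num_eval_agent=1`) is also a Ray actor. The helper `actor_cpu_budget` is hypothetical and not part of the repository.

```python
# Rough CPU budget for the run in the log above.
# Assumptions (not confirmed by the repository): each Ray actor requests 1 CPU,
# and the single evaluator is also a Ray actor.

def actor_cpu_budget(num_workers, num_learners, num_buffers, num_evaluators,
                     node_cpus, cpus_per_actor=1):
    total_actors = num_workers + num_learners + num_buffers + num_evaluators
    schedulable = min(total_actors, node_cpus // cpus_per_actor)
    return {
        "total_actors": total_actors,
        "schedulable": schedulable,
        "pending": total_actors - schedulable,
    }

# Values from the logged Namespace; node_cpus=2 comes from the Ray warning.
logged = actor_cpu_budget(num_workers=4, num_learners=4, num_buffers=4,
                          num_evaluators=1, node_cpus=2)

# A hypothetical smaller configuration on a machine with 4 CPUs.
reduced = actor_cpu_budget(num_workers=1, num_learners=1, num_buffers=1,
                           num_evaluators=1, node_cpus=4)

print(logged)
print(reduced)
```

Under these assumptions the logged configuration leaves 11 actors pending, which matches the count in the warning. The two remedies the warning suggests then map to either lowering `num_workers`, `num_learners`, and `num_buffers`, or giving the machine (or VM) more CPU cores. Ray also accepts `ray.init(num_cpus=...)` to override the logical CPU count, although oversubscribing 2 physical cores is likely to be slow.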
