
feasible-actor-critic's Introduction

  • 👋 Hi, I'm Haitong Ma @mahaitongdae. DAE stands for the former name of my undergraduate department, the Department of Automotive Engineering, which is now the School of Vehicle and Mobility at Tsinghua University. After graduating from Tsinghua University, I am fortunate to be pursuing my PhD at Harvard SEAS, working with Prof. Na Li.
  • 👀 I'm interested in the intersection of control, learning, and optimization, and in the broad range of areas where these techniques can be applied.
  • 📄 My previous research mostly focused on developing provable learning-based safety guarantees for dynamical systems (via CBFs or reachability analysis) using reinforcement learning.
  • 🌱 I'm currently learning about the Crazyflie, an open-source quadrotor developed by Bitcraze.
  • 📫 Feel free to reach me through issues or email!


feasible-actor-critic's People

Contributors

mahaitongdae


feasible-actor-critic's Issues

Penalty term in the actor loss

Hi,
in the line

    penalty_terms = self.tf.reduce_mean(self.tf.multiply(self.tf.stop_gradient(lams), QC))

you calculate the cost penalty for the actor loss as $\lambda(s_t) \cdot Q_c(s_t, a_t)$,
whereas Eq. 4.2 of your paper defines the cost penalty as $\lambda(s_t) \cdot (Q_c(s_t, a_t) - d)$.
Is there a reason the cost limit $d$ is omitted from the actor loss? Is the intent to train a policy that incurs as little cost as possible?
I noticed that the loss of the $\lambda$ MLP does include the cost limit.
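
One observation that may be relevant here (this is not a confirmed answer from the author): because `lams` is wrapped in `stop_gradient`, the term $\lambda(s_t) \cdot d$ does not depend on the policy parameters, so including or omitting $d$ should leave the policy gradient unchanged. The sketch below checks this numerically on a toy problem. The critic `qc`, the linear policy `a = theta * s`, and all numbers are hypothetical and are not taken from the repository.

```python
def qc(s, a):
    # Hypothetical cost critic, used only for illustration.
    return (a - 1.0) ** 2


def penalty(theta, states, lams, d):
    # Mean of lam(s) * (Qc(s, pi(s)) - d) for a toy policy a = theta * s.
    # lams are plain floats that do not depend on theta, which mimics
    # the effect of stop_gradient on the multiplier.
    total = 0.0
    for s, lam in zip(states, lams):
        total += lam * (qc(s, theta * s) - d)
    return total / len(states)


def grad(f, theta, eps=1e-5):
    # Central finite-difference approximation of df/dtheta.
    return (f(theta + eps) - f(theta - eps)) / (2 * eps)


states = [0.5, 1.0, 2.0]
lams = [0.1, 0.7, 1.3]
theta = 0.3

g_without = grad(lambda t: penalty(t, states, lams, d=0.0), theta)
g_with = grad(lambda t: penalty(t, states, lams, d=10.0), theta)

# The two gradients agree up to floating-point error.
print(abs(g_without - g_with) < 1e-6)
```

Under these assumptions the two losses differ only by a constant offset, which would change the logged loss value but not the direction of the policy update.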

How to create fewer actors or increase the resources available to this Ray cluster?

I am new to Ray and ran into the following problem when running the training script:
/home/ltt/anaconda3/envs/FAC/bin/python /home/ltt/Downloads/Feasible-Actor-Critic/train_script4fsac.py test_dir test_iter_list
2021-12-28 06:22:08.727682: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/home/ltt/.mujoco/mujoco200/bin
2021-12-28 06:22:08.727697: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2021-12-28 06:22:09.591541: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-12-28 06:22:09.591685: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/home/ltt/.mujoco/mujoco200/bin
2021-12-28 06:22:09.591694: W tensorflow/stream_executor/cuda/cuda_driver.cc:326] failed call to cuInit: UNKNOWN ERROR (303)
2021-12-28 06:22:09.591714: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (ubuntu): /proc/driver/nvidia/version does not exist
/home/ltt/anaconda3/envs/FAC/lib/python3.6/site-packages/gym/logger.py:30: UserWarning: WARN: Box bound precision lowered by casting to float32
warnings.warn(colorize('%s: %s'%('WARN', msg % args), 'yellow'))
INFO:main:begin training agents with parameter Namespace(act_dim=2, action_range=1.0, alg_name='FAC', alpha='auto', alpha_lr_schedule=[8e-05, 1000000, 3e-06], batch_size=1024, buffer_log_interval=40000, buffer_type='cost', constrained=True, cost_bias=0.0, cost_gamma=0.99, cost_lim=10.0, cost_value_lr_schedule=[8e-05, 4000000, 1e-06], delayed_update=4, demo=False, deterministic_policy=False, double_Q=True, double_QC=False, dual_ascent_interval=12, env_id='Safexp-PointButton1-v0', eval_interval=10000, eval_log_interval=1, eval_render=False, evaluator_type='EvaluatorWithCost', explore_sigma=None, fixed_steps=1000, gamma=0.99, gradient_clip_norm=10.0, grads_max_reuse=2, grads_queue_size=25, lam_gradient_clip_norm=3.0, lam_lr_schedule=[5e-05, 333333, 3e-06], log_dir='./results/FAC/PointButton/PointButton1-2021-12-28-06-22-09/logs', log_interval=100, max_buffer_size=500000, max_iter=4000000, max_sampled_steps=0, max_weight_sync_delay=300, mlp_lam=True, mode='training', model_dir='./results/FAC/PointButton/PointButton1-2021-12-28-06-22-09/models', model_load_dir=None, model_load_ite=None, mu_bias=0.0, num_agent=1, num_batch_reuse=1, num_buffers=4, num_eval_agent=1, num_eval_episode=5, num_future_data=0, num_learners=4, num_workers=4, obs_dim=76, obs_ptype='scale', obs_scale=[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0], off_policy=True, optimizer_type='OffPolicyAsyncWithCost', policy_hidden_activation='elu', policy_lr_schedule=[3e-05, 1000000, 1e-06], policy_model_cls='MLP', policy_num_hidden_layers=2, policy_num_hidden_units=256, policy_only=False, policy_out_activation='linear', policy_type='PolicyWithMu', ppc_load_dir=None, random_seed=0, 
replay_alpha=0.6, replay_batch_size=256, replay_beta=0.4, replay_starts=3000, result_dir='./results/FAC/PointButton/PointButton1-2021-12-28-06-22-09', rew_ptype='scale', rew_scale=1.0, rew_shift=0.0, save_interval=200000, target=True, target_entropy=-2, tau=0.005, test_dir='test_dir', test_iter_list='test_iter_list', value_hidden_activation='elu', value_lr_schedule=[8e-05, 4000000, 1e-06], value_model_cls='MLP', value_num_hidden_layers=2, value_num_hidden_units=256, worker_log_interval=5, worker_type='OffPolicyWorkerWithCost')
2021-12-28 06:22:11,003 INFO services.py:1174 -- View the Ray dashboard at http://127.0.0.1:8265
2021-12-28 06:22:14.252747: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX512F
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-12-28 06:22:14.252936: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
INFO:worker:Worker initialized
INFO:optimizer:start filling the replay
(pid=6552) /home/ltt/anaconda3/envs/FAC/lib/python3.6/site-packages/gym/logger.py:30: UserWarning: WARN: Box bound precision lowered by casting to float32
(pid=6552) warnings.warn(colorize('%s: %s'%('WARN', msg % args), 'yellow'))
(pid=6553) /home/ltt/anaconda3/envs/FAC/lib/python3.6/site-packages/gym/logger.py:30: UserWarning: WARN: Box bound precision lowered by casting to float32
(pid=6553) warnings.warn(colorize('%s: %s'%('WARN', msg % args), 'yellow'))
(pid=6553) INFO:worker:Worker initialized
(pid=6553) INFO:worker:Worker_info: {'worker_id': 1, 'num_sample': 0, 'num_costs': 0, 'cost_rate': 0}
2021-12-28 06:22:32,308 WARNING worker.py:1108 -- The actor or task with ID ffffffffffffffff69a6825d641b461327313d1c01000000 cannot be scheduled right now. It requires {CPU: 1.000000} for placement, but this node only has remaining {0.000000/2.000000 CPU, 0.927734 GiB/0.927734 GiB memory, 1.000000/1.000000 node:192.168.21.140, 0.292969 GiB/0.292969 GiB object_store_memory}. In total there are 0 pending tasks and 11 pending actors on this node. This is likely due to all cluster resources being claimed by actors. To resolve the issue, consider creating fewer actors or increase the resources available to this Ray cluster. You can ignore this message if this Ray cluster is expected to auto-scale.
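
A rough way to read the warning above: the node reports 2 CPUs, each actor requests 1 CPU, and the logged parameters ask for `num_workers=4`, `num_buffers=4`, `num_learners=4`, and `num_eval_agent=1`. The sketch below is a back-of-the-envelope check, assuming (this is not confirmed by the repository) that each of those components is started as a separate Ray actor requesting one CPU. The helper `pending_actors` is hypothetical.

```python
def pending_actors(num_workers, num_buffers, num_learners, num_eval_agent,
                   total_cpus, cpus_per_actor=1):
    # Assumption: every worker, buffer, learner, and evaluator is one
    # Ray actor that requests `cpus_per_actor` CPUs.
    requested = num_workers + num_buffers + num_learners + num_eval_agent
    schedulable = total_cpus // cpus_per_actor
    return max(0, requested - schedulable)


# Values taken from the Namespace and the warning in the log above.
print(pending_actors(4, 4, 4, 1, total_cpus=2))
```

Under that assumption, 13 actors are requested and only 2 can be placed, which is consistent with the 11 pending actors in the warning. Two possible directions follow: lower `num_workers`, `num_buffers`, and `num_learners` (if the training script exposes them as arguments), or give the machine or VM more CPU cores. Ray's CPU resources are logical, so passing a larger `num_cpus` to `ray.init` is another option, at the cost of oversubscribing the physical cores; where `ray.init` is called in this repository is not shown here.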
