danaugrs / huskarl
Deep Reinforcement Learning Framework + Algorithms
License: MIT License
Hi, great work!
Do you plan to add prioritized replay soon? For example, I found a great implementation: https://github.com/alexbooth/DDQN-PER/blob/master/replay_memory.py
What is the current SOTA in curiosity-driven exploration? Do you plan to implement it, or a simplified version of it?
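For readers unfamiliar with the prioritized replay request above, here is a minimal sketch of proportional prioritization, where transition i is sampled with probability p_i^alpha / sum_k p_k^alpha. The class name, method names, and defaults are hypothetical and are not part of huskarl or of the linked implementation; a practical version would typically use a sum-tree instead of plain lists.

```python
import random


class ProportionalReplay:
    """Toy prioritized buffer (hypothetical, for illustration only)."""

    def __init__(self, capacity, alpha=0.6, seed=None):
        self.capacity = capacity
        self.alpha = alpha            # 0 = uniform sampling, 1 = fully proportional
        self.items = []
        self.priorities = []
        self.rng = random.Random(seed)

    def put(self, transition):
        # New transitions receive the current maximum priority,
        # which makes them likely to be sampled soon.
        priority = max(self.priorities, default=1.0)
        if len(self.items) >= self.capacity:
            self.items.pop(0)
            self.priorities.pop(0)
        self.items.append(transition)
        self.priorities.append(priority)

    def sample(self, batch_size):
        # Sample indices with probability proportional to priority ** alpha.
        weights = [p ** self.alpha for p in self.priorities]
        indices = self.rng.choices(range(len(self.items)), weights=weights, k=batch_size)
        return indices, [self.items[i] for i in indices]

    def update(self, indices, td_errors, eps=1e-6):
        # After a training step, priorities become |TD error| + eps.
        for i, err in zip(indices, td_errors):
            self.priorities[i] = abs(err) + eps
```

After each training step the caller would pass the sampled indices and their TD errors to update(), so that surprising transitions are replayed more often.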
I got this error when running ddpg_pendulum.py:
OMP: Error #15: Initializing libomp.dylib, but found libiomp5.dylib already initialized.
OMP: Hint This means that multiple copies of the OpenMP runtime have been linked into the program.
That is dangerous, since it can degrade performance or cause incorrect results.
The best thing to do is to ensure that only a single OpenMP runtime is linked into the process, e.g. by avoiding static linking of the OpenMP runtime in any library.
As an unsafe, unsupported, undocumented workaround you can set the environment variable KMP_DUPLICATE_LIB_OK=TRUE to allow the program to continue to execute, but that may cause crashes or silently produce incorrect results.
For more information, please see http://openmp.llvm.org/
Abort trap: 6
Should I just bypass it by using the unsupported workaround?
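For reference, the workaround named in the hint can be applied from Python as sketched below. The variable name comes from the error message itself; the placement before the heavy imports is an assumption based on the runtime being loaded at import time. As the message says, this is unsafe and unsupported, so removing the duplicate OpenMP runtime from the environment is the preferable fix where possible.

```python
import os

# Unsafe workaround quoted in the OMP hint: tolerate duplicate OpenMP runtimes.
# Assumption: it has to be set before importing the libraries that load OpenMP.
os.environ["KMP_DUPLICATE_LIB_OK"] = "TRUE"

# Imports that pull in OpenMP (for example TensorFlow) would follow here.
```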
When will PPO be released?
Hi, I may have missed it, but is there a function to restore saved weights, either to continue training or to run a test from the saved weights, so that we don't have to run training every time? I'm thinking along the lines of the 'load_weights' function from TensorFlow. Thanks!
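One possible approach, sketched under assumptions: if the agent wraps Keras models, their own load_weights(filepath) method can be called directly. The helper below is hypothetical and not a huskarl function; it assumes the agent exposes `actor` and `critic` attributes and that weights were saved with "_actor" and "_critic" suffixes, as in the DDPG save_weights method quoted elsewhere on this page.

```python
def load_agent_weights(agent, filename):
    """Hypothetical helper: restore weights saved as filename + "_actor" / "_critic"."""
    # Keras models provide load_weights(filepath); the attribute names are assumptions.
    agent.actor.load_weights(filename + "_actor")
    agent.critic.load_weights(filename + "_critic")
    return agent
```

For a single-model agent the same idea would reduce to one load_weights call on that model.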
Add the recent Soft Actor-Critic algorithm to the list of supported agents, as described in https://arxiv.org/abs/1801.01290.
Are LSTM networks supported by the A2C agent? If so, can you provide a small sample?
Hi! Thanks for your great work. I modified DDPG.py to implement discrete control in the "CartPole-v0" environment, but the model learned nothing. I believe the reason is that there is no exploration noise in this code. The main change is to apply tf.argmax(actor_action) while avoiding the "No gradient" issue. My code is as follows:
from tensorflow.keras.optimizers import Adam
import tensorflow as tf
import numpy as np
from huskarl.policy import PassThrough
from huskarl.core import Agent
from huskarl.memory import ExperienceReplay
import tensorflow.keras.backend as K

class DDPG(Agent):
    """Deep Deterministic Policy Gradient
    "Continuous control with deep reinforcement learning" (Lillicrap et al., 2015)
    """
    def __init__(self, actor=None, critic=None, optimizer_critic=None, optimizer_actor=None,
                 policy=None, test_policy=None, memsize=100_000, target_update=1e-3,
                 gamma=0.9, batch_size=32, nsteps=1, discrete=False):
        """
        TODO: Describe parameters
        """
        self.actor = actor
        self.critic = critic
        self.optimizer_critic = Adam(lr=5e-3) if optimizer_critic is None else optimizer_critic
        self.optimizer_actor = Adam(lr=5e-3) if optimizer_actor is None else optimizer_actor
        self.policy = PassThrough() if policy is None else policy
        self.test_policy = PassThrough() if test_policy is None else test_policy
        self.memsize = memsize
        self.memory = ExperienceReplay(memsize, nsteps)
        self.target_update = target_update
        self.gamma = gamma
        self.batch_size = batch_size
        self.nsteps = nsteps
        self.training = True
        self.discrete = discrete
        # Clone models to use for delayed Q targets
        self.target_actor = tf.keras.models.clone_model(self.actor)
        self.target_critic = tf.keras.models.clone_model(self.critic)
        self.critic.compile(optimizer=self.optimizer_critic, loss='mse')
        # To train the actor we want to maximize the critic's output (action value) given the predicted action as input
        # Namely we want to change the actor's weights such that it picks the action that has the highest possible value
        state_input = self.critic.input[1]
        if self.discrete:
            # K.argmax cannot be used because of the 'No Gradient' issue
            critic_output = self.critic([K.dot(K.softmax(self.actor(state_input) * 1e5), K.variable([[0], [1]])), state_input])
            # Change K.variable([[0], [1]]) to top discrete action number
        else:
            critic_output = self.critic([self.actor(state_input), state_input])
        my_loss = -tf.keras.backend.mean(critic_output)
        with my_loss.graph.as_default(): # THIS IS A WORKAROUND, SEE https://github.com/tensorflow/tensorflow/issues/26098
            actor_updates = self.optimizer_actor.get_updates(params=self.actor.trainable_weights, loss=my_loss)
        self.actor_train_on_batch = tf.keras.backend.function(inputs=[state_input], outputs=[self.actor(state_input)], updates=actor_updates)

    def save_weights(self, filename, overwrite=False):
        """Saves the model parameters to the specified file(s)."""
        self.actor.save_weights(filename+"_actor", overwrite=overwrite)
        self.critic.save_weights(filename+"_critic", overwrite=overwrite)

    def act(self, state, instance=0):
        """Returns the action to be taken given a state."""
        action = self.actor.predict(np.array([state]))[0]
        if self.discrete:
            action = np.argmax(action)
        else:
            action = action
        if self.training:
            return self.policy[instance].act(action) if isinstance(self.policy, list) else self.policy.act(action)
        else:
            return self.test_policy[instance].act(action) if isinstance(self.test_policy, list) else self.test_policy.act(action)

    def push(self, transition, instance=0):
        """Stores the transition in memory."""
        self.memory.put(transition, instance)

    def train(self, step):
        """Trains the agent for one step."""
        if len(self.memory) == 0:
            return
        # Update target network
        if self.target_update >= 1 and step % self.target_update == 0:
            # Perform a hard update
            self.target_actor.set_weights(self.actor.get_weights())
            self.target_critic.set_weights(self.critic.get_weights())
        elif self.target_update < 1:
            # Perform a soft update
            a_w = np.array(self.actor.get_weights())
            ta_w = np.array(self.target_actor.get_weights())
            self.target_actor.set_weights(self.target_update*a_w + (1-self.target_update)*ta_w)
            c_w = np.array(self.critic.get_weights())
            tc_w = np.array(self.target_critic.get_weights())
            self.target_critic.set_weights(self.target_update*c_w + (1-self.target_update)*tc_w)
        # Train even when memory has fewer than the specified batch_size
        batch_size = min(len(self.memory), self.batch_size)
        # Sample from memory (experience replay)
        state_batch, action_batch, reward_batches, end_state_batch, not_done_mask = self.memory.get(batch_size)
        # Compute the value of the last next states
        target_qvals = np.zeros(batch_size)
        non_final_last_next_states = [es for es in end_state_batch if es is not None]
        if len(non_final_last_next_states) > 0:
            non_final_mask = list(map(lambda s: s is not None, end_state_batch))
            target_actions = self.target_actor.predict_on_batch(np.array(non_final_last_next_states))
            if self.discrete:
                target_actions = np.argmax(target_actions, 1).astype('float32')
            else:
                target_actions = target_actions
            target_qvals[non_final_mask] = self.target_critic.predict_on_batch([target_actions, np.array(non_final_last_next_states)]).squeeze()
        # Compute n-step discounted return
        # If episode ended within any sampled nstep trace - zero out remaining rewards
        for n in reversed(range(self.nsteps)):
            rewards = np.array([b[n] for b in reward_batches])
            target_qvals *= np.array([t[n] for t in not_done_mask])
            target_qvals = rewards + (self.gamma * target_qvals)
        self.critic.train_on_batch([np.array(action_batch), np.array(state_batch)], target_qvals)
        self.actor_train_on_batch([np.array(state_batch)])
Any ideas on how to improve this? Thanks!
What about the REINFORCE algorithm?
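Since the post suspects missing exploration noise, one common option for discrete actions is epsilon-greedy selection over the actor's outputs. The sketch below is a generic illustration; the function name and parameters are hypothetical and do not describe huskarl's policy API.

```python
import random


def epsilon_greedy(action_scores, epsilon, rng=random):
    """With probability epsilon pick a random action index, otherwise the argmax."""
    if rng.random() < epsilon:
        # Explore: uniform random action index.
        return rng.randrange(len(action_scores))
    # Exploit: index of the highest score.
    return max(range(len(action_scores)), key=lambda i: action_scores[i])
```

During training, epsilon would usually start high and be decayed over time; at test time it would be set to 0 so that the greedy action is always taken.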
I probably simply do not see the function, but there does not seem to be a way to close the visualized plots once they are open. I attempted using:
plt.close()
plt.close('all')
but neither worked. Also simply attempting to close the window by clicking had no effect. The only way I could close the windows was by closing the entire workspace/killing the process.
Let me know what the proper way is!
Are you taking pull requests that extend some of the classes to support other environments? I really like this framework and have been getting it to work with habitat-api https://github.com/facebookresearch/habitat-api (I tried another one first, but it was too difficult). I don't think sim.train works with it as is, so I've written my own train() loop. It seems to work, but habitat-api appears to have a number of nuances; for example, it seems you can't have multiple instances, possibly because of an OpenGL limitation. I'm still not sure, since understanding both frameworks and habitat-api/habitat-sim is a lot of work. What I have right now is something like:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Conv2D, MaxPooling2D, Dropout, Flatten
import huskarl as hk
import habitat

class SimpleRLEnv(habitat.RLEnv):
    def get_reward_range(self):
        return [-1, 1]

    def get_reward(self, observations):
        return 0

    def get_done(self, observations):
        return self.habitat_env.episode_over

    def get_info(self, observations):
        return self.habitat_env.get_metrics()

config = habitat.get_config(config_paths="configs/tasks/pointnav.yaml")
create_env = lambda: SimpleRLEnv(config=config).unwrapped
dummy_env = create_env()
obs = dummy_env.observation_space.sample()
input_shape = obs["rgb"].shape
action_space_n = dummy_env.action_space.n
dummy_env.close()

model = Sequential([
    Conv2D(16, kernel_size=(3, 3), activation='relu', input_shape=input_shape),
    Conv2D(16, (3, 3), activation='relu'),
    MaxPooling2D(pool_size=(2, 2)),
    Dropout(0.25),
    Flatten(),
    Dense(16, activation='relu')
])
agent = hk.agent.DQN(model, actions=action_space_n, nsteps=2)

# These are what would need to work with habitat-api, I believe
# sim = hk.Simulation(create_env, agent)
# sim.train(max_steps=30)
# sim.test(max_steps=10)

instances = 1
max_steps = 50
episode_reward_sequences = []
episode_step_sequences = []
episode_rewards = 0
envs = create_env()
states = envs.reset()
for step in range(max_steps):
    # Most of this is copied from simulation._sp_train()
    action = agent.act(states["rgb"])
    next_state, reward, done, other_ = envs.step(action)
    agent.push(hk.memory.Transition(states["rgb"], action, reward, None if done else next_state["rgb"]))
    episode_rewards += reward
    if done:
        episode_reward_sequences.append(episode_rewards)
        episode_step_sequences.append(step)
        episode_rewards = 0
        states = envs.reset()
    else:
        states = next_state
    if step % 5 == 0: print(f"step is: {step} and pointgoal is: {states['pointgoal']}")
    agent.train(step)
I also think I can add PPO to the agents, but it isn't fully working yet.
Could this be extended to support a distributed algorithm like Impala?
Dqn_prioritized does not multiply by the importance-sampling weights during training. Is this a problem?
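For context, prioritized experience replay normally corrects the bias introduced by non-uniform sampling with weights w_i = (N * P(i))^(-beta), scaled so that the largest weight is 1. The sketch below computes them; the function name and the default beta are illustrative assumptions, and the normalization uses the batch maximum, which is a common simplification.

```python
def importance_weights(sample_probs, buffer_size, beta=0.4):
    """Return w_i = (N * P(i)) ** -beta, divided by the largest weight in the batch."""
    raw = [(buffer_size * p) ** -beta for p in sample_probs]
    largest = max(raw)
    return [w / largest for w in raw]
```

Each sampled transition's loss term would then be multiplied by its weight, for example through the sample_weight argument of a Keras train_on_batch call. Without the correction, frequently sampled transitions contribute more to the gradient than they would under uniform sampling.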
The import
from tensorflow.keras.optimizers import Adam
causes the error below. Should from tensorflow.python.keras.optimizers be used instead?
ModuleNotFoundError Traceback (most recent call last)
in
4 import tensorflow.python.keras.optimizers
5 #from tensorflow.keras.optimizers import Adam
----> 6 import huskarl as hk
7 import gym
8
~\AppData\Roaming\Python\Python37\site-packages\huskarl\__init__.py in
1 from huskarl.core import HkException
2 from huskarl.simulation import Simulation
----> 3 import huskarl.agent
~\AppData\Roaming\Python\Python37\site-packages\huskarl\agent\__init__.py in
----> 1 from huskarl.agent.dqn import DQN
2 from huskarl.agent.a2c import A2C
3 from huskarl.agent.ddpg import DDPG
~\AppData\Roaming\Python\Python37\site-packages\huskarl\agent\dqn.py in
1 from tensorflow.keras.layers import Dense, Lambda
2 from tensorflow.keras.models import Model
----> 3 from tensorflow.keras.optimizers import Adam
4 import tensorflow as tf
5 import numpy as np
ModuleNotFoundError: No module named 'tensorflow.keras.optimizers'
Line 27 in 8708d96
def unpack(traces):
    """Returns states, actions, rewards, end_states, and a mask for episode boundaries given traces."""
    states = [t[0].state for t in traces]
    actions = [t[0].action for t in traces]
    rewards = [[e.reward for e in t] for t in traces]
    end_states = [t[-1].nextState for t in traces]
    not_done_mask = [[1 if n.nextState is not None else 0 for n in t] for t in traces]
    return states, actions, rewards, end_states, not_done_mask
As I understand it, the unpack function should unpack the traces into a separate list for each element.
But according to the code, states, for example, takes only the first state of each trace and ignores all the others.
I don't understand why.
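One possible reading, offered as a sketch rather than a statement of the author's intent: each trace looks like an n-step sequence, so an n-step update needs only the starting state and action, every reward along the way, and the final next state for bootstrapping. The example below uses a local stand-in Transition namedtuple (not huskarl's class) and a hypothetical nstep_target function that mirrors the n-step loop in the DDPG train() code quoted earlier on this page.

```python
from collections import namedtuple

# Local stand-in with the field names used by unpack(); not huskarl's own class.
Transition = namedtuple("Transition", ["state", "action", "reward", "nextState"])


def nstep_target(trace, bootstrap_value, gamma):
    """Discounted n-step return for one trace, working backwards from the end."""
    target = bootstrap_value
    for t in reversed(trace):
        # A missing next state marks the end of an episode: drop everything after it.
        not_done = 1 if t.nextState is not None else 0
        target = t.reward + gamma * not_done * target
    return target


trace = [Transition("s0", "a0", 1.0, "s1"), Transition("s1", "a1", 2.0, "s2")]
start_state, start_action = trace[0].state, trace[0].action  # what the update is for
end_state = trace[-1].nextState                              # where bootstrapping happens
target = nstep_target(trace, bootstrap_value=10.0, gamma=0.5)
```

Under this reading the intermediate states are not needed: the target pairs the first state and action with the discounted sum of all rewards plus the value estimated at the last next state.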
I've recently installed huskarl and I've come across this error when running the cartpole demo:
$ python cartpole.py
2020-02-07 17:39:02.076013: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_101.dll
C:\Users\Project Apollo\AppData\Local\Programs\Python\Python37\lib\site-packages\gym\logger.py:30: UserWarning: WARN: Box bound precision lowered by casting to float32
warnings.warn(colorize('%s: %s'%('WARN', msg % args), 'yellow'))
2020-02-07 17:39:04.211558: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library nvcuda.dll
2020-02-07 17:39:04.232403: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: GeForce GTX 1080 computeCapability: 6.1
coreClock: 1.8475GHz coreCount: 20 deviceMemorySize: 8.00GiB deviceMemoryBandwidth: 298.32GiB/s
2020-02-07 17:39:04.244615: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_101.dll
2020-02-07 17:39:04.251647: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_10.dll
2020-02-07 17:39:04.263631: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cufft64_10.dll
2020-02-07 17:39:04.267875: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library curand64_10.dll
2020-02-07 17:39:04.274896: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusolver64_10.dll
2020-02-07 17:39:04.280780: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusparse64_10.dll
2020-02-07 17:39:04.291507: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2020-02-07 17:39:04.296764: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1697] Adding visible gpu devices: 0
2020-02-07 17:39:04.299782: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
2020-02-07 17:39:04.304124: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: GeForce GTX 1080 computeCapability: 6.1
coreClock: 1.8475GHz coreCount: 20 deviceMemorySize: 8.00GiB deviceMemoryBandwidth: 298.32GiB/s
2020-02-07 17:39:04.318416: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_101.dll
2020-02-07 17:39:04.321747: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_10.dll
2020-02-07 17:39:04.328921: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cufft64_10.dll
2020-02-07 17:39:04.332537: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library curand64_10.dll
2020-02-07 17:39:04.335739: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusolver64_10.dll
2020-02-07 17:39:04.338978: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusparse64_10.dll
2020-02-07 17:39:04.348008: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2020-02-07 17:39:04.352672: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1697] Adding visible gpu devices: 0
2020-02-07 17:39:04.832276: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1096] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-02-07 17:39:04.836232: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102] 0
2020-02-07 17:39:04.838399: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] 0: N
2020-02-07 17:39:04.848517: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1241] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6349 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080, pci bus id: 0000:01:00.0, compute capability: 6.1)
2020-02-07 17:39:05.228721: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_10.dll
Traceback (most recent call last):
File "cartpole.py", line 47, in <module>
sim.train(max_steps=5000, instances=instances, plot=plot_rewards)
File "C:\Users\Project Apollo\AppData\Local\Programs\Python\Python37\lib\site-packages\huskarl\simulation.py", line 32, in train
self._sp_train(max_steps, instances, visualize, plot)
File "C:\Users\Project Apollo\AppData\Local\Programs\Python\Python37\lib\site-packages\huskarl\simulation.py", line 66, in _sp_train
self.agent.train(step)
File "C:\Users\Project Apollo\AppData\Local\Programs\Python\Python37\lib\site-packages\huskarl\agent\a2c.py", line 95, in train
target_qvals[non_final_mask] = self.model.predict_on_batch(np.array(non_final_last_next_states))[:,-1].squeeze()
AttributeError: 'tensorflow.python.framework.ops.EagerTensor' object has no attribute 'squeeze'
Any idea why?
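The traceback suggests that, with the installed TensorFlow version, predict_on_batch returned an EagerTensor rather than a NumPy array, and eager tensors do not have a squeeze method. A possible workaround, sketched under that assumption, is to convert the result first; to_numpy is a hypothetical helper, while .numpy() is the standard conversion method on eager tensors.

```python
def to_numpy(value):
    """Return value.numpy() for eager tensors; pass other objects through unchanged."""
    return value.numpy() if hasattr(value, "numpy") else value


# Hypothetical use at the failing line in a2c.py:
# preds = to_numpy(self.model.predict_on_batch(np.array(non_final_last_next_states)))
# target_qvals[non_final_mask] = preds[:, -1].squeeze()
```

Installing a TensorFlow version that matches the one huskarl was developed against may also avoid the problem, though the required version is not stated here.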