danaugrs / huskarl

Deep Reinforcement Learning Framework + Algorithms

Home Page: https://medium.com/@tensorflow/introducing-huskarl-the-modular-deep-reinforcement-learning-framework-e47d4b228dd3

License: MIT License

Python 100.00%

algorithms artificial-intelligence deep-learning python reinforcement-learning tensorflow

huskarl's Issues

Prioritized Experience Replay

Hi, great work!

Do you plan to add prioritized replay soon? For example, I found a great implementation: https://github.com/alexbooth/DDQN-PER/blob/master/replay_memory.py
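
For reference, here is a minimal sketch of proportional prioritized replay (Schaul et al., 2015), independent of huskarl's memory API. The class name `ProportionalReplay` and its methods are hypothetical, and sampling is O(n) for clarity; a practical implementation would typically use a sum tree.

```python
import random

class ProportionalReplay:
    """Hypothetical O(n) prioritized replay buffer (not huskarl's API)."""

    def __init__(self, capacity, alpha=0.6, eps=1e-3):
        self.capacity = capacity
        self.alpha = alpha      # how strongly priorities skew sampling (0 = uniform)
        self.eps = eps          # keeps every priority strictly positive
        self.items = []
        self.priorities = []
        self.next_slot = 0      # ring-buffer write position

    def put(self, transition):
        # New transitions get the current max priority so they are likely to be sampled soon
        priority = max(self.priorities, default=1.0)
        if len(self.items) < self.capacity:
            self.items.append(transition)
            self.priorities.append(priority)
        else:
            self.items[self.next_slot] = transition
            self.priorities[self.next_slot] = priority
        self.next_slot = (self.next_slot + 1) % self.capacity

    def sample(self, batch_size):
        # P(i) is proportional to priority_i ** alpha
        weights = [p ** self.alpha for p in self.priorities]
        indices = random.choices(range(len(self.items)), weights=weights, k=batch_size)
        return indices, [self.items[i] for i in indices]

    def update(self, indices, td_errors):
        # Priority is the absolute TD error plus a small constant
        for i, err in zip(indices, td_errors):
            self.priorities[i] = abs(err) + self.eps
```

After each training step, the agent would call `update` with the sampled indices and their new TD errors, so that surprising transitions are replayed more often.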

Curiosity-Driven Exploration

What is the current SOTA in curiosity-driven exploration? Do you plan to implement it, or a simplified version of it?

OMP: Error #15: Initializing libomp.dylib, but found libiomp5.dylib already initialized.

I got this error when running ddpg_pendulum.py:

OMP: Error #15: Initializing libomp.dylib, but found libiomp5.dylib already initialized.
OMP: Hint This means that multiple copies of the OpenMP runtime have been linked into the program. 
That is dangerous, since it can degrade performance or cause incorrect results. 
The best thing to do is to ensure that only a single OpenMP runtime is linked into the process, e.g. by avoiding static linking of the OpenMP runtime in any library. 
As an unsafe, unsupported, undocumented workaround you can set the environment variable KMP_DUPLICATE_LIB_OK=TRUE to allow the program to continue to execute, but that may cause crashes or silently produce incorrect results. 
For more information, please see http://openmp.llvm.org/
Abort trap: 6

Should I just bypass it by using the unsupported workaround?
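
If you do choose the workaround named in the message, the variable has to be set before any library that loads OpenMP is imported. A minimal sketch, with the caveat from the error text itself that this is unsafe and may hide a real conflict between two OpenMP runtimes:

```python
import os

# Unsafe workaround quoted in the OMP hint: allow duplicate OpenMP runtimes.
# Must run before importing tensorflow, numpy, or anything else that links OpenMP.
os.environ["KMP_DUPLICATE_LIB_OK"] = "TRUE"

# import huskarl as hk  # heavy imports go after the variable is set
```

The safer route suggested by the message is to make sure only one OpenMP runtime is installed in the environment.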

Will Proximal Policy Optimization (PPO) be released soon?

When will PPO be released?

load_weights to continue training

Hi, I'm not sure if I've missed it, but is there a function to restore saved weights, either to continue training or just to run a test from the saved weights, so that we don't have to train every time? I'm thinking along the lines of the 'load_weights' function from TensorFlow. Thanks!
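
In the meantime, a minimal sketch of what such a function could look like for the DDPG agent. It assumes the agent exposes `actor`, `critic`, `target_actor` and `target_critic` as Keras models and that weights were written by `save_weights(filename)` with the `_actor` and `_critic` suffixes, as in the DDPG code quoted in another issue on this page. The helper name `load_ddpg_weights` is hypothetical and not part of huskarl.

```python
def load_ddpg_weights(agent, filename):
    """Hypothetical helper: restores weights written by DDPG.save_weights(filename)."""
    agent.actor.load_weights(filename + "_actor")
    agent.critic.load_weights(filename + "_critic")
    # Keep the target networks in sync with the restored online networks
    agent.target_actor.set_weights(agent.actor.get_weights())
    agent.target_critic.set_weights(agent.critic.get_weights())
```

Note that this restores only the network weights; the replay memory and optimizer state would start fresh.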

Soft Actor Critic (SAC) algorithm

Add the recent Soft Actor Critic algorithm to the list of supported agents. As described in https://arxiv.org/abs/1801.01290.

Are LSTM networks supported?

Are LSTM networks supported by A2C agent? If so, can you provide a small sample?

DDPG on CartPole-v0

Hi! Thanks for your great work. I modified DDPG.py to implement discrete control in the "CartPole-v0" environment, but the model learned nothing. I believe the reason is that there is no exploration noise in this code. The main change is to take the argmax of the actor output (tf.argmax(actor_action)) while avoiding the "No gradient" issue. My code is as follows:

from tensorflow.keras.optimizers import Adam
import tensorflow as tf
import numpy as np

from huskarl.policy import PassThrough
from huskarl.core import Agent
from huskarl.memory import ExperienceReplay
import tensorflow.keras.backend as K

class DDPG(Agent):
	"""Deep Deterministic Policy Gradient

	"Continuous control with deep reinforcement learning" (Lillicrap et al., 2015)
	"""
	def __init__(self, actor=None, critic=None, optimizer_critic=None, optimizer_actor=None,
				 policy=None, test_policy=None, memsize=100_000, target_update=1e-3,
				 gamma=0.9, batch_size=32, nsteps=1, discrete=False):
		"""
		TODO: Describe parameters
		"""
		self.actor = actor
		self.critic = critic

		self.optimizer_critic = Adam(lr=5e-3) if optimizer_critic is None else optimizer_critic
		self.optimizer_actor = Adam(lr=5e-3) if optimizer_actor is None else optimizer_actor

		self.policy = PassThrough() if policy is None else policy
		self.test_policy = PassThrough() if test_policy is None else test_policy

		self.memsize = memsize
		self.memory = ExperienceReplay(memsize, nsteps)

		self.target_update = target_update
		self.gamma = gamma
		self.batch_size = batch_size
		self.nsteps = nsteps
		self.training = True
		self.discrete = discrete

		# Clone models to use for delayed Q targets
		self.target_actor = tf.keras.models.clone_model(self.actor)
		self.target_critic = tf.keras.models.clone_model(self.critic)

		self.critic.compile(optimizer=self.optimizer_critic, loss='mse')

		# To train the actor we want to maximize the critic's output (action value) given the predicted action as input
		# Namely we want to change the actor's weights such that it picks the action that has the highest possible value
		state_input = self.critic.input[1]
		if self.discrete:
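			# K.argmax cannot be used here because it has no gradient ("No gradient" issue),
			# so a sharpened softmax dotted with the action indices approximates it
			critic_output = self.critic([K.dot(K.softmax(self.actor(state_input) * 1e5), K.variable([[0], [1]])), state_input])
			# Adjust K.variable([[0], [1]]) to list the indices of all discrete actions (2 here)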
		else:
			critic_output = self.critic([self.actor(state_input), state_input])
		my_loss = -tf.keras.backend.mean(critic_output)
		with my_loss.graph.as_default(): # THIS IS A WORKAROUND, SEE https://github.com/tensorflow/tensorflow/issues/26098
			actor_updates = self.optimizer_actor.get_updates(params=self.actor.trainable_weights, loss=my_loss)
		self.actor_train_on_batch = tf.keras.backend.function(inputs=[state_input], outputs=[self.actor(state_input)], updates=actor_updates)

	def save_weights(self, filename, overwrite=False):
		"""Saves the model parameters to the specified file(s)."""
		self.actor.save_weights(filename+"_actor", overwrite=overwrite)
		self.critic.save_weights(filename+"_critic", overwrite=overwrite)

	def act(self, state, instance=0):
		"""Returns the action to be taken given a state."""
		action = self.actor.predict(np.array([state]))[0]
		if self.discrete:
			action = np.argmax(action)
		else:
			action = action
		if self.training:
			return self.policy[instance].act(action) if isinstance(self.policy, list) else self.policy.act(action)
		else:
			return self.test_policy[instance].act(action) if isinstance(self.test_policy, list) else self.test_policy.act(action)

	def push(self, transition, instance=0):
		"""Stores the transition in memory."""
		self.memory.put(transition, instance)

	def train(self, step):
		"""Trains the agent for one step."""
		if len(self.memory) == 0:
			return

		# Update target network
		if self.target_update >= 1 and step % self.target_update == 0:
			# Perform a hard update
			self.target_actor.set_weights(self.actor.get_weights())
			self.target_critic.set_weights(self.critic.get_weights())
		elif self.target_update < 1:
			# Perform a soft update, layer by layer (weight arrays differ in shape,
			# so they cannot be stacked into a single np.array on recent NumPy versions)
			tau = self.target_update
			self.target_actor.set_weights([tau*w + (1-tau)*tw for w, tw in zip(self.actor.get_weights(), self.target_actor.get_weights())])
			self.target_critic.set_weights([tau*w + (1-tau)*tw for w, tw in zip(self.critic.get_weights(), self.target_critic.get_weights())])

		# Train even when memory has fewer than the specified batch_size
		batch_size = min(len(self.memory), self.batch_size)

		# Sample from memory (experience replay)
		state_batch, action_batch, reward_batches, end_state_batch, not_done_mask = self.memory.get(batch_size)

		# Compute the value of the last next states
		target_qvals = np.zeros(batch_size)
		non_final_last_next_states = [es for es in end_state_batch if es is not None]


		if len(non_final_last_next_states) > 0:
			non_final_mask = list(map(lambda s: s is not None, end_state_batch))
			target_actions = self.target_actor.predict_on_batch(np.array(non_final_last_next_states))
			if self.discrete:
				target_actions = np.argmax(target_actions, 1).astype('float32')
			else:
				target_actions = target_actions

			target_qvals[non_final_mask] = self.target_critic.predict_on_batch([target_actions, np.array(non_final_last_next_states)]).squeeze()

		# Compute n-step discounted return
		# If episode ended within any sampled nstep trace - zero out remaining rewards
		for n in reversed(range(self.nsteps)):
			rewards = np.array([b[n] for b in reward_batches])
			target_qvals *= np.array([t[n] for t in not_done_mask])
			target_qvals = rewards + (self.gamma * target_qvals)

		self.critic.train_on_batch([np.array(action_batch), np.array(state_batch)], target_qvals)
		self.actor_train_on_batch([np.array(state_batch)])

Any ideas on how to improve this? Thanks!
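
Since the posted `act` method passes the argmax index to `self.policy.act(action)`, one way to test the exploration hypothesis is a policy that occasionally swaps in a random action. The sketch below is a hypothetical class, not one of huskarl's policies, and it assumes the policy only needs an `act(action)` method, as in the code above.

```python
import random

class EpsRandomDiscrete:
    """Hypothetical policy: with probability eps, replace the greedy action with a random one."""

    def __init__(self, n_actions, eps=0.1, rng=None):
        self.n_actions = n_actions
        self.eps = eps
        self.rng = rng if rng is not None else random.Random()

    def act(self, action):
        # `action` is the integer index already chosen by np.argmax in DDPG.act
        if self.rng.random() < self.eps:
            return self.rng.randrange(self.n_actions)
        return int(action)
```

It would be passed as `policy=EpsRandomDiscrete(2)` together with `discrete=True`, leaving `test_policy` as the default so that evaluation stays greedy.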

REINFORCE

What about REINFORCE algorithm?

Visualization windows remain open

I probably simply do not see the function, but there does not seem to be a way to close the visualized plots once they are open. I attempted using:
plt.close()
plt.close('all')
but neither worked. Also simply attempting to close the window by clicking had no effect. The only way I could close the windows was by closing the entire workspace/killing the process.

Let me know what the proper way is!

interfacing with other RL environments

Are you taking pull requests for extending some of the classes to work with other environments? I really like this framework and have been getting it to work with habitat-api https://github.com/facebookresearch/habitat-api (I had tried another one, but it was too difficult). I don't think sim.train works as is, so I've written my own train() loop. It seems to work, but habitat-api has a number of nuances (for example, it seems you can't have multiple instances, possibly because of an OpenGL issue; I'm still not sure, as understanding both this framework and habitat-api/habitat-sim takes a lot of work). What I have right now is something like:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Conv2D, MaxPooling2D, Dropout, Flatten

import huskarl as hk
import habitat

class SimpleRLEnv(habitat.RLEnv):
    def get_reward_range(self):
        return [-1, 1]

    def get_reward(self, observations):
        return 0

    def get_done(self, observations):
        return self.habitat_env.episode_over

    def get_info(self, observations):
        return self.habitat_env.get_metrics()

config = habitat.get_config(config_paths="configs/tasks/pointnav.yaml")
create_env = lambda: SimpleRLEnv(config=config).unwrapped
dummy_env = create_env()

obs = dummy_env.observation_space.sample()
input_shape = obs["rgb"].shape
action_space_n = dummy_env.action_space.n
dummy_env.close()

model = Sequential([
    Conv2D(16, kernel_size=(3, 3), activation='relu', input_shape=input_shape),
    Conv2D(16, (3, 3), activation='relu'),
    MaxPooling2D(pool_size=(2, 2)),
    Dropout(0.25),
    Flatten(),
    Dense(16, activation='relu')
])


agent = hk.agent.DQN(model, actions=action_space_n, nsteps=2)

# These are the calls that would need to work with habitat-api, I believe
# sim = hk.Simulation(create_env, agent)
# sim.train(max_steps=30)
# sim.test(max_steps=10)

instances = 1
max_steps = 50

episode_reward_sequences = []
episode_step_sequences = []
episode_rewards = 0

envs = create_env()
states = envs.reset()


for step in range(max_steps):
    # Most of this is copied from simulation._sp_train()
    action = agent.act(states["rgb"])
    next_state, reward, done, other_ = envs.step(action)
    agent.push(hk.memory.Transition(states["rgb"], action, reward, None if done else next_state["rgb"]))
    episode_rewards += reward

    if done:
        episode_reward_sequences.append(episode_rewards)
        episode_step_sequences.append(step)
        episode_rewards = 0
        states = envs.reset()
    else:
        states = next_state
    if step % 5 == 0: print(f"step is: {step} and pointgoal is: {states['pointgoal']}")
    agent.train(step)

Also, I think I can add PPO to the agents, but it isn't fully working yet.

Impala

Could this be extended to support a distributed algorithm like Impala?

Dqn_prioritized is not multiplied by importance sampling weight in training

Dqn_prioritized does not multiply the loss by the importance-sampling weight during training. Is this a problem?
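
For context, prioritized replay samples transitions non-uniformly, and the original paper (Schaul et al., 2015) corrects the resulting bias by scaling each sample's loss with an importance-sampling weight w_i = (N * P(i))^(-beta), divided by the largest weight. A minimal sketch of that correction follows. The function names are hypothetical, the normalization is done over the batch, and nothing here is taken from huskarl's code.

```python
def importance_weights(probabilities, buffer_size, beta=0.4):
    """w_i = (N * P(i)) ** -beta, scaled so the largest weight is 1."""
    raw = [(buffer_size * p) ** -beta for p in probabilities]
    largest = max(raw)
    return [w / largest for w in raw]

def weighted_mse(predictions, targets, weights):
    """Mean squared error with a per-sample weight on each squared TD error."""
    total = sum(w * (q - y) ** 2 for q, y, w in zip(predictions, targets, weights))
    return total / len(predictions)
```

With a Keras model, the same effect can usually be obtained by passing the weights as `sample_weight` to `train_on_batch`. Skipping the correction does not stop training from working, but it leaves the updates biased toward high-priority transitions.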

No module named tensorflow.keras.optimizers

The import
from tensorflow.keras.optimizers import Adam
causes the error below. Should huskarl use from tensorflow.python.keras.optimizers instead?

`ModuleNotFoundError Traceback (most recent call last)
in
4 import tensorflow.python.keras.optimizers
5 #from tensorflow.keras.optimizers import Adam
----> 6 import huskarl as hk
7 import gym
8

~\AppData\Roaming\Python\Python37\site-packages\huskarl\__init__.py in
1 from huskarl.core import HkException
2 from huskarl.simulation import Simulation
----> 3 import huskarl.agent

~\AppData\Roaming\Python\Python37\site-packages\huskarl\agent\__init__.py in
----> 1 from huskarl.agent.dqn import DQN
2 from huskarl.agent.a2c import A2C
3 from huskarl.agent.ddpg import DDPG

~\AppData\Roaming\Python\Python37\site-packages\huskarl\agent\dqn.py in
1 from tensorflow.keras.layers import Dense, Lambda
2 from tensorflow.keras.models import Model
----> 3 from tensorflow.keras.optimizers import Adam
4 import tensorflow as tf
5 import numpy as np

ModuleNotFoundError: No module named 'tensorflow.keras.optimizers'
`

weird unpack

huskarl/huskarl/memory.py

Line 27 in 8708d96

def unpack(traces):
    """Returns states, actions, rewards, end_states, and a mask for episode boundaries given traces."""
    states = [t[0].state for t in traces]
    actions = [t[0].action for t in traces]
    rewards = [[e.reward for e in t] for t in traces]
    end_states = [t[-1].nextState for t in traces]
    not_done_mask = [[1 if n.nextState is not None else 0 for n in t] for t in traces]
    return states, actions, rewards, end_states, not_done_mask

As I understand it, the unpack function should unpack the traces into separate lists, one per element.
But judging from the code, states, for example, only takes the first state of every trace and ignores all the others.
I don't understand why.
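
One plausible reading, judging from how the DDPG code quoted in another issue on this page consumes these values, is that each trace is an n-step sequence used to update only its first state-action pair. In that case the update needs the first state and action, every reward along the trace, and the last next state to bootstrap from. The sketch below illustrates that n-step target for a single trace; `nstep_target` is a hypothetical helper, and the `Transition` field names follow the attributes used in `unpack`.

```python
from collections import namedtuple

Transition = namedtuple("Transition", ["state", "action", "reward", "nextState"])

def nstep_target(trace, gamma, bootstrap_value):
    """Discounted return for one trace, bootstrapped from the value of its last next state."""
    target = bootstrap_value
    for t in reversed(trace):
        # A missing next state marks the end of the episode: drop whatever follows it
        not_done = 1 if t.nextState is not None else 0
        target = t.reward + gamma * target * not_done
    return target

# One 3-step trace: s0 -a0-> s1 -a1-> s2 -a2-> s3
trace = [
    Transition("s0", "a0", 1.0, "s1"),
    Transition("s1", "a1", 1.0, "s2"),
    Transition("s2", "a2", 1.0, "s3"),
]
# The update is for Q(s0, a0): only the first state/action and the last next state are needed
target = nstep_target(trace, gamma=0.5, bootstrap_value=4.0)
```

The intermediate states s1 and s2 are not lost: they are the first states of the traces that start one and two steps later.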

AttributeError: 'tensorflow.python.framework.ops.EagerTensor' object has no attribute 'squeeze'

I've recently installed huskarl and I've come across this error when running the cartpole demo:

$ python cartpole.py
2020-02-07 17:39:02.076013: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_101.dll
C:\Users\Project Apollo\AppData\Local\Programs\Python\Python37\lib\site-packages\gym\logger.py:30: UserWarning: WARN: Box bound precision lowered 
by casting to float32
  warnings.warn(colorize('%s: %s'%('WARN', msg % args), 'yellow'))
2020-02-07 17:39:04.211558: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library nvcuda.dll
2020-02-07 17:39:04.232403: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 0 with properties: 
pciBusID: 0000:01:00.0 name: GeForce GTX 1080 computeCapability: 6.1
coreClock: 1.8475GHz coreCount: 20 deviceMemorySize: 8.00GiB deviceMemoryBandwidth: 298.32GiB/s
2020-02-07 17:39:04.244615: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_101.dll
2020-02-07 17:39:04.251647: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_10.dll
2020-02-07 17:39:04.263631: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cufft64_10.dll
2020-02-07 17:39:04.267875: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library curand64_10.dll
2020-02-07 17:39:04.274896: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusolver64_10.dll      
2020-02-07 17:39:04.280780: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusparse64_10.dll
2020-02-07 17:39:04.291507: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2020-02-07 17:39:04.296764: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1697] Adding visible gpu devices: 0
2020-02-07 17:39:04.299782: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
2020-02-07 17:39:04.304124: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: GeForce GTX 1080 computeCapability: 6.1
coreClock: 1.8475GHz coreCount: 20 deviceMemorySize: 8.00GiB deviceMemoryBandwidth: 298.32GiB/s
2020-02-07 17:39:04.318416: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_101.dll       
2020-02-07 17:39:04.321747: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_10.dll        
2020-02-07 17:39:04.328921: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cufft64_10.dll
2020-02-07 17:39:04.332537: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library curand64_10.dll        
2020-02-07 17:39:04.335739: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusolver64_10.dll      
2020-02-07 17:39:04.338978: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusparse64_10.dll      
2020-02-07 17:39:04.348008: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2020-02-07 17:39:04.352672: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1697] Adding visible gpu devices: 0
2020-02-07 17:39:04.832276: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1096] Device interconnect StreamExecutor with strength 1 edge matrix:   
2020-02-07 17:39:04.836232: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102]      0
2020-02-07 17:39:04.838399: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] 0:   N
2020-02-07 17:39:04.848517: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1241] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6349 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080, pci bus id: 0000:01:00.0, compute capability: 6.1)
2020-02-07 17:39:05.228721: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_10.dll        
Traceback (most recent call last):
  File "cartpole.py", line 47, in <module>
    sim.train(max_steps=5000, instances=instances, plot=plot_rewards)
  File "C:\Users\Project Apollo\AppData\Local\Programs\Python\Python37\lib\site-packages\huskarl\simulation.py", line 32, in train
    self._sp_train(max_steps, instances, visualize, plot)
  File "C:\Users\Project Apollo\AppData\Local\Programs\Python\Python37\lib\site-packages\huskarl\simulation.py", line 66, in _sp_train
    self.agent.train(step)
  File "C:\Users\Project Apollo\AppData\Local\Programs\Python\Python37\lib\site-packages\huskarl\agent\a2c.py", line 95, in train
    target_qvals[non_final_mask] = self.model.predict_on_batch(np.array(non_final_last_next_states))[:,-1].squeeze()
AttributeError: 'tensorflow.python.framework.ops.EagerTensor' object has no attribute 'squeeze'

Any idea why?
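
A likely cause, going by the traceback, is that under eager execution `predict_on_batch` returned an `EagerTensor` rather than a NumPy array here, and tensors have no `squeeze` method. A minimal sketch of a workaround under that assumption: convert to NumPy before squeezing. `to_numpy` is a hypothetical helper, and `FakeEagerTensor` is only a stand-in so that the example runs without TensorFlow.

```python
import numpy as np

def to_numpy(x):
    """Returns x as a NumPy array, whether it is an eager tensor or already an array."""
    return x.numpy() if hasattr(x, "numpy") else np.asarray(x)

class FakeEagerTensor:
    """Stand-in for an EagerTensor: has .numpy() and slicing, but no .squeeze()."""
    def __init__(self, values):
        self._values = np.asarray(values)
    def __getitem__(self, key):
        return FakeEagerTensor(self._values[key])
    def numpy(self):
        return self._values

prediction = FakeEagerTensor([[0.1, 0.9, 2.0], [0.4, 0.6, 3.0]])
values = to_numpy(prediction[:, -1]).squeeze()
```

Applied to the failing line in a2c.py, this would mean wrapping the sliced `predict_on_batch(...)` result in `to_numpy(...)` before calling `.squeeze()`.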
