nn_dynamics's People

Contributors

anagabandi

nn_dynamics's Issues

Which line of code uses dt_steps for swimmer forward training?

Hi Anusha,

Thanks for sharing the code. I have another question, about dt_steps. You mentioned in your paper that, to better learn the dynamics, you have to use longer timesteps to collect data.

It seems dt_steps is only used when you are trying to render the environment, with a sleep time of dt_steps*dt_from_xml. But during training data collection, you just perform a regular self.env.step(), which does not include dt_steps in it. Do you have a modified version of the environments, or do you just use the regular rllab MuJoCo envs?

Thanks,
Harry
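For reference, one common way to realize a longer effective timestep with a gym-style step API is to repeat the chosen action for several simulator steps (frame skipping). The sketch below is purely illustrative of the question being asked, not the repository's actual implementation; step_with_dt_steps is a hypothetical helper.

# Hypothetical sketch: fold dt_steps into data collection via frame skipping.
# Assumes a gym-style env.step(action) returning (obs, reward, done, info)
# and dt_steps >= 1. Not taken from the nn_dynamics code.
def step_with_dt_steps(env, action, dt_steps):
    total_reward = 0.0
    for _ in range(dt_steps):
        obs, reward, done, info = env.step(action)
        total_reward += reward
        if done:
            break
    return obs, total_reward, done, info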

License

Hello!

I was wondering which license this work is published under. I have tried looking around in the repository here on GitHub but was unable to find any information regarding this. Thanks in advance!

Best Regards,

'NormalizedEnv' object has no attribute 'model'

When I run main.py, something goes wrong:

(rllab3) jatq@ubuntu:~/Downloads/nn_dynamics-master$
python main.py --seed=0 --run_num=0 --yaml_file='cheetah_forward'

#####################################
Initializing environment
#####################################

Traceback (most recent call last):
  File "main.py", line 831, in <module>
    main()
  File "main.py", line 197, in main
    env, dt_from_xml= create_env(which_agent)
  File "/home/jatq/Downloads/nn_dynamics-master/helper_funcs.py", line 66, in create_env
    dt_from_xml = env.model.opt.timestep
AttributeError: 'NormalizedEnv' object has no attribute 'model'

Then I found the function create_env:

def create_env(which_agent):

    # setup environment
    if(which_agent==0):
        env = normalize(PointEnv())
    elif(which_agent==1):
        env = normalize(AntEnv())
    elif(which_agent==2):
        env = normalize(SwimmerEnv()) # dt 0.001 and frameskip=150
    elif(which_agent==3):
        env = ReacherEnv()
    elif(which_agent==4):
        env = normalize(HalfCheetahEnv())
    elif(which_agent==5):
        env = RoachEnv() # this is a personal vrep env
    elif(which_agent==6):
        env = normalize(HopperEnv())
    elif(which_agent==7):
        env = normalize(Walker2DEnv())

    # get dt value from env
    if(which_agent==5):
        dt_from_xml = env.VREP_DT
    else:
        dt_from_xml = env.model.opt.timestep
    print("\n\n the dt is: ", dt_from_xml, "\n\n")

    # set vars
    tf.set_random_seed(2)
    gym.logger.setLevel(gym.logging.WARNING)
    dimO = env.observation_space.shape
    dimA = env.action_space.shape
    print('--------------------------------- \nState space dimension: ', dimO)
    print('Action space dimension: ', dimA, "\n -----------------------------------")

    return env, dt_from_xml

and then looked into normalize:

class NormalizedEnv(ProxyEnv, Serializable):
    def __init__(
            self,
            env,
            scale_reward=1.,
            normalize_obs=False,
            normalize_reward=False,
            obs_alpha=0.001,
            reward_alpha=0.001,
    ):
        Serializable.quick_init(self, locals())
        ProxyEnv.__init__(self, env)
        self._scale_reward = scale_reward
        self._normalize_obs = normalize_obs
        self._normalize_reward = normalize_reward
        self._obs_alpha = obs_alpha
        self._obs_mean = np.zeros(env.observation_space.flat_dim)
        self._obs_var = np.ones(env.observation_space.flat_dim)
        self._reward_alpha = reward_alpha
        self._reward_mean = 0.
        self._reward_var = 1.

    def _update_obs_estimate(self, obs):
        flat_obs = self.wrapped_env.observation_space.flatten(obs)
        self._obs_mean = (1 - self._obs_alpha) * self._obs_mean + self._obs_alpha * flat_obs
        self._obs_var = (1 - self._obs_alpha) * self._obs_var + self._obs_alpha * np.square(flat_obs - self._obs_mean)

    def _update_reward_estimate(self, reward):
        self._reward_mean = (1 - self._reward_alpha) * self._reward_mean + self._reward_alpha * reward
        self._reward_var = (1 - self._reward_alpha) * self._reward_var + self._reward_alpha * np.square(reward - self._reward_mean)

    def _apply_normalize_obs(self, obs):
        self._update_obs_estimate(obs)
        return (obs - self._obs_mean) / (np.sqrt(self._obs_var) + 1e-8)

    def _apply_normalize_reward(self, reward):
        self._update_reward_estimate(reward)
        return reward / (np.sqrt(self._reward_var) + 1e-8)

    def reset(self):
        ret = self._wrapped_env.reset()
        if self._normalize_obs:
            return self._apply_normalize_obs(ret)
        else:
            return ret

    def __getstate__(self):
        d = Serializable.__getstate__(self)
        d["_obs_mean"] = self._obs_mean
        d["_obs_var"] = self._obs_var
        return d

    def __setstate__(self, d):
        Serializable.__setstate__(self, d)
        self._obs_mean = d["_obs_mean"]
        self._obs_var = d["_obs_var"]

    @property
    @overrides
    def action_space(self):
        if isinstance(self._wrapped_env.action_space, Box):
            ub = np.ones(self._wrapped_env.action_space.shape)
            return spaces.Box(-1 * ub, ub)
        return self._wrapped_env.action_space

    @overrides
    def step(self, action):
        if isinstance(self._wrapped_env.action_space, Box):
            # rescale the action
            lb, ub = self._wrapped_env.action_space.bounds
            scaled_action = lb + (action + 1.) * 0.5 * (ub - lb)
            scaled_action = np.clip(scaled_action, lb, ub)
        else:
            scaled_action = action
        wrapped_step = self._wrapped_env.step(scaled_action)
        next_obs, reward, done, info = wrapped_step
        if self._normalize_obs:
            next_obs = self._apply_normalize_obs(next_obs)
        if self._normalize_reward:
            reward = self._apply_normalize_reward(reward)
        return Step(next_obs, reward * self._scale_reward, done, **info)

    def __str__(self):
        return "Normalized: %s" % self._wrapped_env

    # def log_diagnostics(self, paths):
    #     print "Obs mean:", self._obs_mean
    #     print "Obs std:", np.sqrt(self._obs_var)
    #     print "Reward mean:", self._reward_mean
    #     print "Reward std:", np.sqrt(self._reward_var)


normalize = NormalizedEnv

I found that there is no attribute named model. Did I do something wrong?
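One likely explanation, stated as an assumption: normalize wraps the environment in NormalizedEnv, which is a ProxyEnv and does not forward arbitrary attributes such as model, so the MuJoCo model has to be reached through the wrapper. Below is a minimal workaround sketch, assuming the rllab ProxyEnv exposes the inner environment via wrapped_env (as the pasted code's use of self.wrapped_env suggests); it is not the repository's official fix.

# Hypothetical workaround sketch: reach through the rllab wrapper to the MuJoCo model.
from rllab.envs.normalized_env import normalize
from rllab.envs.mujoco.half_cheetah_env import HalfCheetahEnv

env = normalize(HalfCheetahEnv())
inner_env = env.wrapped_env                 # unwrap the ProxyEnv to the raw MuJoCo env
dt_from_xml = inner_env.model.opt.timestep  # same lookup helper_funcs.py attempts on the wrapper
print("dt from xml:", dt_from_xml)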

Plugging in Roboschool envs

Hi Anusha,

I wanted to plug in Roboschool environments and try out your code. Could you please suggest where I could make changes?
I am currently trying to modify helper_funcs.py to make way for Roboschool envs.
Please suggest where else changes should be made. I am trying to run the cheetah_forward config that you have.

Regards,

Rohan
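For what it's worth, here is a minimal sketch of what an extra branch for a Roboschool environment might look like in create_env. This is an assumption, not guidance from the author: 'RoboschoolHalfCheetah-v1' is Roboschool's gym-registered ID, and the dt value is a placeholder, since Roboschool (Bullet-based) has no MuJoCo XML to read a timestep from.

# Hypothetical sketch of a Roboschool branch for create_env; not from the repo.
import gym
import roboschool  # importing roboschool registers its environments with gym

def create_roboschool_cheetah():
    env = gym.make('RoboschoolHalfCheetah-v1')
    dt_from_xml = 0.01  # placeholder: replace with the simulator's actual control timestep
    return env, dt_from_xml

Other parts of the pipeline that assume rllab-style wrappers (e.g. NormalizedEnv and its Step return type) would presumably also need adjusting.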

Why different noise in training and evaluating?

Hi Anusha,
I have another question for you. Why do you use noise with a different scale and distribution for the swimmer? My observation is that you used Uniform(-0.1, 0.1) as the evaluation noise and Normal(0, 0.01) as the training noise.
I checked the rllab version: it uses Normal(0, 0.01) for qpos and Normal(0, 0.1) for qvel.
The gym version uses Uniform(-0.1, 0.1) for both qpos and qvel.

So can I assume the parameter 'evaluating' means 'using the gym version'? In other words, in your main.py, you use the rllab version to collect data with collectsamples.py and then use the gym version to perform the MPC rollout?

I am a little bit confused. Hope that you can help :)
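To make the comparison above concrete, here is a small illustrative sketch of the two reset-noise conventions being contrasted. The qpos/qvel sizes and the reading of 0.01/0.1 as standard deviations are assumptions, not taken from the repository.

# Illustrative only: the two swimmer reset-noise conventions discussed above.
import numpy as np

nq, nv = 5, 5  # hypothetical qpos/qvel sizes for the swimmer

# gym-style reset noise: Uniform(-0.1, 0.1) on both qpos and qvel
qpos_gym = np.random.uniform(low=-0.1, high=0.1, size=nq)
qvel_gym = np.random.uniform(low=-0.1, high=0.1, size=nv)

# rllab-style reset noise as described above: Gaussian, with 0.01 and 0.1
# read here as standard deviations
qpos_rllab = np.random.normal(loc=0.0, scale=0.01, size=nq)
qvel_rllab = np.random.normal(loc=0.0, scale=0.1, size=nv)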

About AntEnv modifications

I have one more question...
I recently read your AntEnv modifications and am not sure I understand the intuition. You eliminated the contact frictional cost and slightly modified the condition for whether an agent is done (the allowed height range changed from 0.2-1.0 to 0.3-1.0).
Is this modification important for the experiment to work, or is it just a trivial change? It would be really nice if you could list the modifications you made to rllab and explain the intuition behind them.
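For reference, a sketch of the kind of termination check being discussed, using the height range quoted above; this is illustrative, not copied from the repository, and the state layout is an assumption.

# Illustrative termination check based on the 0.3-1.0 height range quoted above.
import numpy as np

def ant_is_done(state, z_min=0.3, z_max=1.0):
    # state[2] is taken here to be the torso height (an assumption about the layout)
    healthy = np.isfinite(state).all() and (z_min <= state[2] <= z_max)
    return not healthy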

normalized env normalize default value

Hi Anusha,
I think you accidentally set 'need_heading_diff' to true in NormalizedEnv. So when you run collect_samples with the swimmer enabled, you always get different headings regardless of whether 'follow_trajectories' is true or false.
