nn_dynamics's People

Contributors

anagabandi

nn_dynamics's Issues

Which line of code uses dt_steps for swimmer forward training?

Hi Anusha,

Thanks for sharing the code. I have another question, about dt_steps. You mentioned in your paper that, to better learn the dynamics, you have to use longer timesteps to collect data.

It seems dt_steps is only used when you are trying to render the environment, with a sleep time of dt_steps*dt_from_xml. But during training data collection, you just perform a regular self.env.step(), which does not include dt_steps in it. Do you have a modified version of the environments, or do you just use the regular rllab MuJoCo envs?

Thanks,
Harry
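For reference, one common way to realize a longer effective timestep with a gym-style step API is to repeat the chosen action for several simulator steps (frame skipping). The sketch below is purely illustrative of the question being asked, not the repository's actual implementation; step_with_dt_steps is a hypothetical helper.

# Hypothetical sketch: fold dt_steps into data collection via frame skipping.
# Assumes a gym-style env.step(action) returning (obs, reward, done, info)
# and dt_steps >= 1. Not taken from the nn_dynamics code.
def step_with_dt_steps(env, action, dt_steps):
    total_reward = 0.0
    for _ in range(dt_steps):
        obs, reward, done, info = env.step(action)
        total_reward += reward
        if done:
            break
    return obs, total_reward, done, info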

License

Hello!

I was wondering which license this work is published under. I have tried looking around in the repository here on GitHub but was unable to find any information regarding this. Thanks in advance!

Best Regards,

'NormalizedEnv' object has no attribute 'model'

When I run main.py, something goes wrong:

(rllab3) jatq@ubuntu:~/Downloads/nn_dynamics-master$
python main.py --seed=0 --run_num=0 --yaml_file='cheetah_forward'

#####################################
Initializing environment
#####################################

Traceback (most recent call last):
  File "main.py", line 831, in <module>
    main()
  File "main.py", line 197, in main
    env, dt_from_xml= create_env(which_agent)
  File "/home/jatq/Downloads/nn_dynamics-master/helper_funcs.py", line 66, in create_env
    dt_from_xml = env.model.opt.timestep
AttributeError: 'NormalizedEnv' object has no attribute 'model'

Then I found the function create_env:

def create_env(which_agent):

    # setup environment
    if(which_agent==0):
        env = normalize(PointEnv())
    elif(which_agent==1):
        env = normalize(AntEnv())
    elif(which_agent==2):
        env = normalize(SwimmerEnv()) # dt 0.001 and frameskip=150
    elif(which_agent==3):
        env = ReacherEnv()
    elif(which_agent==4):
        env = normalize(HalfCheetahEnv())
    elif(which_agent==5):
        env = RoachEnv() # this is a personal vrep env
    elif(which_agent==6):
        env = normalize(HopperEnv())
    elif(which_agent==7):
        env = normalize(Walker2DEnv())

    # get dt value from env
    if(which_agent==5):
        dt_from_xml = env.VREP_DT
    else:
        dt_from_xml = env.model.opt.timestep
    print("\n\n the dt is: ", dt_from_xml, "\n\n")

    # set vars
    tf.set_random_seed(2)
    gym.logger.setLevel(gym.logging.WARNING)
    dimO = env.observation_space.shape
    dimA = env.action_space.shape
    print('--------------------------------- \nState space dimension: ', dimO)
    print('Action space dimension: ', dimA, "\n -----------------------------------")

    return env, dt_from_xml

and then looked into normalize:

class NormalizedEnv(ProxyEnv, Serializable):
    def __init__(
            self,
            env,
            scale_reward=1.,
            normalize_obs=False,
            normalize_reward=False,
            obs_alpha=0.001,
            reward_alpha=0.001,
    ):
        Serializable.quick_init(self, locals())
        ProxyEnv.__init__(self, env)
        self._scale_reward = scale_reward
        self._normalize_obs = normalize_obs
        self._normalize_reward = normalize_reward
        self._obs_alpha = obs_alpha
        self._obs_mean = np.zeros(env.observation_space.flat_dim)
        self._obs_var = np.ones(env.observation_space.flat_dim)
        self._reward_alpha = reward_alpha
        self._reward_mean = 0.
        self._reward_var = 1.

    def _update_obs_estimate(self, obs):
        flat_obs = self.wrapped_env.observation_space.flatten(obs)
        self._obs_mean = (1 - self._obs_alpha) * self._obs_mean + self._obs_alpha * flat_obs
        self._obs_var = (1 - self._obs_alpha) * self._obs_var + self._obs_alpha * np.square(flat_obs - self._obs_mean)

    def _update_reward_estimate(self, reward):
        self._reward_mean = (1 - self._reward_alpha) * self._reward_mean + self._reward_alpha * reward
        self._reward_var = (1 - self._reward_alpha) * self._reward_var + self._reward_alpha * np.square(reward - self._reward_mean)

    def _apply_normalize_obs(self, obs):
        self._update_obs_estimate(obs)
        return (obs - self._obs_mean) / (np.sqrt(self._obs_var) + 1e-8)

    def _apply_normalize_reward(self, reward):
        self._update_reward_estimate(reward)
        return reward / (np.sqrt(self._reward_var) + 1e-8)

    def reset(self):
        ret = self._wrapped_env.reset()
        if self._normalize_obs:
            return self._apply_normalize_obs(ret)
        else:
            return ret

    def __getstate__(self):
        d = Serializable.__getstate__(self)
        d["_obs_mean"] = self._obs_mean
        d["_obs_var"] = self._obs_var
        return d

    def __setstate__(self, d):
        Serializable.__setstate__(self, d)
        self._obs_mean = d["_obs_mean"]
        self._obs_var = d["_obs_var"]

    @property
    @overrides
    def action_space(self):
        if isinstance(self._wrapped_env.action_space, Box):
            ub = np.ones(self._wrapped_env.action_space.shape)
            return spaces.Box(-1 * ub, ub)
        return self._wrapped_env.action_space

    @overrides
    def step(self, action):
        if isinstance(self._wrapped_env.action_space, Box):
            # rescale the action
            lb, ub = self._wrapped_env.action_space.bounds
            scaled_action = lb + (action + 1.) * 0.5 * (ub - lb)
            scaled_action = np.clip(scaled_action, lb, ub)
        else:
            scaled_action = action
        wrapped_step = self._wrapped_env.step(scaled_action)
        next_obs, reward, done, info = wrapped_step
        if self._normalize_obs:
            next_obs = self._apply_normalize_obs(next_obs)
        if self._normalize_reward:
            reward = self._apply_normalize_reward(reward)
        return Step(next_obs, reward * self._scale_reward, done, **info)

    def __str__(self):
        return "Normalized: %s" % self._wrapped_env

    # def log_diagnostics(self, paths):
    #     print "Obs mean:", self._obs_mean
    #     print "Obs std:", np.sqrt(self._obs_var)
    #     print "Reward mean:", self._reward_mean
    #     print "Reward std:", np.sqrt(self._reward_var)


normalize = NormalizedEnv

I found that there is no attribute named model. Did I do something wrong?
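One likely explanation, stated as an assumption: normalize wraps the environment in NormalizedEnv, which is a ProxyEnv and does not forward arbitrary attributes such as model, so the MuJoCo model has to be reached through the wrapper. Below is a minimal workaround sketch, assuming the rllab ProxyEnv exposes the inner environment via wrapped_env (as the pasted code's use of self.wrapped_env suggests); it is not the repository's official fix.

# Hypothetical workaround sketch: reach through the rllab wrapper to the MuJoCo model.
from rllab.envs.normalized_env import normalize
from rllab.envs.mujoco.half_cheetah_env import HalfCheetahEnv

env = normalize(HalfCheetahEnv())
inner_env = env.wrapped_env                 # unwrap the ProxyEnv to the raw MuJoCo env
dt_from_xml = inner_env.model.opt.timestep  # same lookup helper_funcs.py attempts on the wrapper
print("dt from xml:", dt_from_xml)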

Plugging in Roboschool envs

Hi Anusha,

I wanted to plug in Roboschool environments and try out your code. Could you please suggest where I could make changes?
I am currently trying to modify helper_funcs.py to make way for Roboschool envs.
Please suggest where else changes should be made. I am trying to run the cheetah_forward config that you have.

Regards,

Rohan
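For what it's worth, here is a minimal sketch of what an extra branch for a Roboschool environment might look like in create_env. This is an assumption, not guidance from the author: 'RoboschoolHalfCheetah-v1' is Roboschool's gym-registered ID, and the dt value is a placeholder, since Roboschool (Bullet-based) has no MuJoCo XML to read a timestep from.

# Hypothetical sketch of a Roboschool branch for create_env; not from the repo.
import gym
import roboschool  # importing roboschool registers its environments with gym

def create_roboschool_cheetah():
    env = gym.make('RoboschoolHalfCheetah-v1')
    dt_from_xml = 0.01  # placeholder: replace with the simulator's actual control timestep
    return env, dt_from_xml

Other parts of the pipeline that assume rllab-style wrappers (e.g. NormalizedEnv and its Step return type) would presumably also need adjusting.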

Why different noise in training and evaluating?

Hi Anusha,
I have another question for you. Why do you use noise with a different scale and distribution for the swimmer? My observation is that you used Uniform(-0.1, 0.1) as the evaluation noise and Normal(0, 0.01) as the training noise.
I checked the rllab version: it uses Normal(0, 0.01) for qpos and Normal(0, 0.1) for qvel.
The gym version uses Uniform(-0.1, 0.1) for both qpos and qvel.

So can I assume the parameter 'evaluating' means 'using the gym version'? In other words, in your main.py, you use the rllab version to collect data with collectsamples.py and then use the gym version to perform the MPC rollout?

I am a little bit confused. Hope that you can help :)
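To make the comparison above concrete, here is a small illustrative sketch of the two reset-noise conventions being contrasted. The qpos/qvel sizes and the reading of 0.01/0.1 as standard deviations are assumptions, not taken from the repository.

# Illustrative only: the two swimmer reset-noise conventions discussed above.
import numpy as np

nq, nv = 5, 5  # hypothetical qpos/qvel sizes for the swimmer

# gym-style reset noise: Uniform(-0.1, 0.1) on both qpos and qvel
qpos_gym = np.random.uniform(low=-0.1, high=0.1, size=nq)
qvel_gym = np.random.uniform(low=-0.1, high=0.1, size=nv)

# rllab-style reset noise as described above: Gaussian, with 0.01 and 0.1
# read here as standard deviations
qpos_rllab = np.random.normal(loc=0.0, scale=0.01, size=nq)
qvel_rllab = np.random.normal(loc=0.0, scale=0.1, size=nv)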

About AntEnv modifications

I have one more question...
I recently read your AntEnv modifications and am not sure I understand the intuition. You eliminated the contact frictional cost and slightly modified the condition for whether an agent is done (the allowed height range changed from 0.2-1.0 to 0.3-1.0).
Is this modification important for the experiment to work, or is it just a trivial change? It would be really nice if you could list the modifications you made to rllab and explain the intuition behind them.
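For reference, a sketch of the kind of termination check being discussed, using the height range quoted above; this is illustrative, not copied from the repository, and the state layout is an assumption.

# Illustrative termination check based on the 0.3-1.0 height range quoted above.
import numpy as np

def ant_is_done(state, z_min=0.3, z_max=1.0):
    # state[2] is taken here to be the torso height (an assumption about the layout)
    healthy = np.isfinite(state).all() and (z_min <= state[2] <= z_max)
    return not healthy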

normalized env normalize default value

Hi Anusha,
I think you accidentally set 'need_heading_diff' to true in NormalizedEnv. So when you run collect_samples with the swimmer enabled, you always get different headings regardless of whether 'follow_trajectories' is true or false.
