I'm attempting to train a new policy in the "CollisionAvoidance-v0" env using the A2C

env.step(action) list size about gym-collision-avoidance HOT 5 CLOSED

mit-acl commented on August 15, 2024

env.step(action) list size

from gym-collision-avoidance.

Comments (5)

mfe7 commented on August 15, 2024 1

Hi @krishna-bala, this all looks nice and makes sense. I suggest changing the action passed to the env in trainA2C.py to a dictionary of actions, where each key is the id of the agent that should take that random action, e.g., action = {0: [spd, heading]}.
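As a rough sketch of that suggestion, the snippet below builds a dict that maps agent ids to [spd, heading] pairs. The helper names (random_action, build_actions) and the numeric bounds are assumptions made for illustration only; in practice the values would come from the environment's action_space.

```python
import random

# Hypothetical bounds, chosen only for illustration; the real limits come
# from the environment's action_space.
MAX_SPEED = 1.0
MAX_HEADING = 0.5


def random_action(rng):
    """Return one [spd, heading] pair for a single agent."""
    spd = rng.uniform(0.0, MAX_SPEED)
    heading = rng.uniform(-MAX_HEADING, MAX_HEADING)
    return [spd, heading]


def build_actions(agent_ids, rng):
    """Map each externally controlled agent id to its own action."""
    return {agent_id: random_action(rng) for agent_id in agent_ids}


rng = random.Random(0)
actions = build_actions([0], rng)  # one entry, keyed by agent id 0
```

The dict form lets the caller say which agent an action belongs to, instead of relying on position in a list.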

mfe7 commented on August 15, 2024 1

I think that's essentially what we did here:

gym-collision-avoidance/gym_collision_avoidance/experiments/src/env_utils.py

Lines 45 to 52 in dceff10

def run_episode(env, one_env):
    total_reward = 0
    step = 0
    done = False
    while not done:
        obs, rew, done, info = env.step([None])
        total_reward += rew[0]
        step += 1

krishna-bala commented on August 15, 2024

Hi @mfe7 -- I just noticed that mistake as well after comparing code to example.py.

I modified my code to instantiate actions as a dict, which solves the error from self._take_action(actions,dt) in collision_avoidance_env.py. Now, an action list of [delta heading angle, speed] should get passed to agent.policy.external_action_to_action().

However, the VecEnv wrapper around the CollisionAvoidanceEnv (the env object returned by create_env()) calls the step() method in vec_env.py from the baselines library.

This vec_env.py step() method calls step_async() and step_wait() from dummy_vec_env.py (baselines library).

In the method step_wait(), the action dict gets converted to a list before calling env.step() (the method defined in collision_avoidance_env.py). See code below.

def step_wait(self):
        print('step_wait() method')
        for e in range(self.num_envs):
            action = self.actions[e]
            # if isinstance(self.envs[e].action_space, spaces.Discrete):
            #    action = int(action)
            print('action from step_wait() method: {}'.format(action))
            obs, self.buf_rews[e], self.buf_dones[e], self.buf_infos[e] = self.envs[e].step(action)
            if self.buf_dones[e]:
                obs = self.envs[e].reset()
            self._save_obs(e, obs)
        return (self._obs_from_buf(), np.copy(self.buf_rews), np.copy(self.buf_dones),
                self.buf_infos.copy())

The one_env (unwrapped) environment doesn't have this issue, since its action is not converted from a dict to a list before step() is called. Should I nest a dict inside the actions dict when using the VecEnv wrapper, e.g., action = {0: {0: [spd, heading]}}?

Here is the random agent that works properly on the one_env environment.

import os
os.environ['GYM_CONFIG_CLASS'] = 'Train'

import gym
from gym_collision_avoidance.envs import Config
import gym_collision_avoidance.envs.test_cases as tc
from gym_collision_avoidance.experiments.src.env_utils import create_env
from stable_baselines.common.policies import MlpPolicy
#from stable_baselines.common import make_vec_env
from stable_baselines import A2C
from stable_baselines.common.env_checker import check_env


# env: a VecEnv wrapper around the CollisionAvoidanceEnv
# one_env: an actual CollisionAvoidanceEnv class (the unwrapped version of the first env in the VecEnv)

env, one_env = create_env()

# check_env(env, warn=True)
# model = A2C(MlpPolicy, env, verbose=1)
# model.learn(total_timesteps=1000)

# The reset method is called at the beginning of an episode
obs = one_env.reset()

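# Each loop iteration is one environment step, not a full episode
num_steps = 1000

for i in range(num_steps):
    # Key 0 is the id of the agent that takes the sampled action
    actions = {}
    actions[0] = one_env.action_space.sample()

    obs, reward, done, info = one_env.step(actions)
    if done:
        obs = one_env.reset()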

mfe7 commented on August 15, 2024

Hmm, could you send a list of dicts (e.g., actions = [{0: [spd, heading]}]) to env.step? The VecEnv wrapper you posted would then grab the first element of that list and send it to environment 0.
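To see why a list of dicts helps, here is a minimal sketch that uses stand-ins rather than the real classes. TinyVecEnv and RecordingEnv are hypothetical names invented for this example; the only behavior borrowed from the step_wait() quoted earlier is the per-environment lookup action = self.actions[e]. It assumes the wrapper stores whatever was passed to step() unchanged, which may not hold for every version of baselines.

```python
class RecordingEnv:
    """Hypothetical inner env that only remembers the action it received."""

    def __init__(self):
        self.last_action = None

    def step(self, action):
        self.last_action = action
        # Dummy (obs, reward, done, info) tuple, just to mimic the gym shape.
        return None, 0.0, False, {}


class TinyVecEnv:
    """Hypothetical wrapper that mimics only the indexing in step_wait()."""

    def __init__(self, envs):
        self.envs = envs

    def step(self, actions):
        results = []
        for e in range(len(self.envs)):
            # Same lookup as step_wait(): element (or key) e of the input.
            action = actions[e]
            results.append(self.envs[e].step(action))
        return results


inner = RecordingEnv()
vec = TinyVecEnv([inner])

# A bare dict is indexed by key 0, so the inner env sees only the list.
vec.step({0: [1.0, 0.2]})
received_from_dict = inner.last_action

# A list of dicts is indexed by position 0, so the inner env sees the dict.
vec.step([{0: [1.0, 0.2]}])
received_from_list = inner.last_action
```

Under this assumption, a bare dict keyed by agent id 0 is unwrapped by the lookup, while the list form keeps one entry per environment and leaves each dict intact.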

krishna-bala commented on August 15, 2024

Yes that works! Thank you.
