
visualnav-transformer's Introduction

General Navigation Models: GNM, ViNT and NoMaD

Contributors: Dhruv Shah, Ajay Sridhar, Nitish Dashora, Catherine Glossop, Kyle Stachowicz, Arjun Bhorkar, Kevin Black, Noriaki Hirose, Sergey Levine

Berkeley AI Research

Project Page | Citing | Pre-Trained Models


General Navigation Models are general-purpose, goal-conditioned visual navigation policies trained on diverse, cross-embodiment training data, and can control many different robots zero-shot. They can also be efficiently fine-tuned, or adapted, to new robots and downstream tasks. Our growing family of models is described in the following research papers:

  1. GNM: A General Navigation Model to Drive Any Robot (October 2022, presented at ICRA 2023)
  2. ViNT: A Foundation Model for Visual Navigation (June 2023, presented at CoRL 2023)
  3. NoMaD: Goal Masked Diffusion Policies for Navigation and Exploration (October 2023)

Overview

This repository contains code for training our family of models on your own data, pre-trained model checkpoints, and example code to deploy the models on a TurtleBot2/LoCoBot robot. The repository follows the organization of the GNM codebase.

  • ./train/train.py: training script to train or fine-tune the ViNT model on your custom data.
  • ./train/vint_train/models/: contains model files for GNM, ViNT, and some baselines.
  • ./train/process_*.py: scripts to process rosbags or other formats of robot trajectories into training data.
  • ./deployment/src/record_bag.sh: script to collect a demo trajectory as a ROS bag in the target environment on the robot. This trajectory is subsampled to generate a topological graph of the environment.
  • ./deployment/src/create_topomap.sh: script to convert a ROS bag of a demo trajectory into a topological graph that the robot can use to navigate.
  • ./deployment/src/navigate.sh: script that deploys a trained GNM/ViNT/NoMaD model on the robot to navigate to a desired goal in the generated topological graph. Please see relevant sections below for configuration settings.
  • ./deployment/src/explore.sh: script that deploys a trained NoMaD model on the robot to randomly explore its environment. Please see relevant sections below for configuration settings.

Train

This subfolder contains code for processing datasets and training models from your own data.

Pre-requisites

The codebase assumes access to a workstation running Ubuntu (tested on 18.04 and 20.04), Python 3.7+, and a GPU with CUDA 10+. It also assumes access to conda, but you can modify it to work with other virtual environment packages, or a native setup.

Setup

Run the commands below inside the vint_release/ (topmost) directory:

  1. Set up the conda environment:
    conda env create -f train/train_environment.yml
  2. Source the conda environment:
    conda activate vint_train
    
  3. Install the vint_train packages:
    pip install -e train/
  4. Install the diffusion_policy package from this repo:
    git clone git@github.com:real-stanford/diffusion_policy.git
    pip install -e diffusion_policy/

Data-Wrangling

In the papers, we train on a combination of publicly available and unreleased datasets. Below is a list of publicly available datasets used for training; please contact the respective authors for access to the unreleased data.

We recommend downloading these (and any other datasets you may want to train on) and running the processing steps below.

Data Processing

We provide some sample scripts to process these datasets, either directly from a rosbag or from a custom format like HDF5s:

  1. Run process_bags.py with the relevant args, or process_recon.py for processing RECON HDF5s. You can also manually add your own dataset by following the structure below (if you are adding a custom dataset, please check out the Custom Datasets section).
  2. Run data_split.py on your dataset folder with the relevant args.

After step 1 of data processing, the processed dataset should have the following structure:

├── <dataset_name>
│   ├── <name_of_traj1>
│   │   ├── 0.jpg
│   │   ├── 1.jpg
│   │   ├── ...
│   │   ├── T_1.jpg
│   │   └── traj_data.pkl
│   ├── <name_of_traj2>
│   │   ├── 0.jpg
│   │   ├── 1.jpg
│   │   ├── ...
│   │   ├── T_2.jpg
│   │   └── traj_data.pkl
│   ...
└── └── <name_of_trajN>
        ├── 0.jpg
        ├── 1.jpg
        ├── ...
        ├── T_N.jpg
        └── traj_data.pkl

Each *.jpg file contains a forward-facing RGB observation from the robot, and the files are temporally ordered by name. The traj_data.pkl file contains the odometry data for the trajectory. It’s a pickled dictionary with the following keys (a minimal loading sketch follows the list below):

  • "position": An np.ndarray [T, 2] of the xy-coordinates of the robot at each image observation.
  • "yaw": An np.ndarray [T,] of the yaws of the robot at each image observation.

After step 2 of data processing, the processed data-split should have the following structure inside vint_release/train/vint_train/data/data_splits/:

├── <dataset_name>
│   ├── train
│   │   └── traj_names.txt
└── └── test
        └── traj_names.txt
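
Each traj_names.txt is expected to simply list the names of the trajectory folders assigned to that split, one per line, for example (hypothetical names):

    <name_of_traj1>
    <name_of_traj4>
    <name_of_traj7>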

Training your General Navigation Models

Run this inside the vint_release/train directory:

python train.py -c <path_of_train_config_file>

The premade config yaml files are in the train/config directory.
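
For example, to train ViNT with the premade config (run from inside the vint_release/train directory):

python train.py -c config/vint.yaml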

Custom Config Files

You can use one of the premade yaml files as a starting point and change the values as you need. config/vint.yaml is a good choice since it has commented arguments. config/defaults.yaml contains the default config values (don't train directly with this config file, since it does not specify any datasets for training).

Custom Datasets

Make sure your dataset and data-split directory follow the structures provided in the Data Processing section. Locate train/vint_train/data/data_config.yaml and append the following:

<dataset_name>:
    metric_waypoints_distance: <average_distance_in_meters_between_waypoints_in_the_dataset>
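
For example, a hypothetical entry for a custom dataset might look like this (the name and value are placeholders):

my_custom_dataset:
    metric_waypoints_distance: 0.25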

Locate your training config file and add the following text under the datasets argument (feel free to change the values of end_slack, goals_per_obs, and negative_mining):

<dataset_name>:
    data_folder: <path_to_the_dataset>
    train: data/data_splits/<dataset_name>/train/ 
    test: data/data_splits/<dataset_name>/test/ 
    end_slack: 0 # how many timesteps to cut off from the end of each trajectory  (in case many trajectories end in collisions)
    goals_per_obs: 1 # how many goals are sampled per observation
    negative_mining: True # negative mining from the ViNG paper (Shah et al.)

Training your model from a checkpoint

Instead of training from scratch, you can also load an existing checkpoint from the published results. Add load_run: <project_name>/<log_run_name> to your .yaml config file in vint_release/train/config/. The *.pth file you are loading must be saved in this file structure and renamed to "latest": vint_release/train/logs/<project_name>/<log_run_name>/latest.pth. This makes it easy to train from the checkpoint of a previous run, since logs are saved this way by default. Note: if you are loading a checkpoint from a previous run, check the name of the run in vint_release/train/logs/<project_name>/, since the code appends a date string to each run_name specified in the config yaml file to avoid duplicate run names.

If you want to use our checkpoints, you can download the *.pth files from this link.
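
For instance, to resume from a downloaded checkpoint, you might place it as follows (placeholder names as in the structure above) and point your training config at it:

    mkdir -p train/logs/<project_name>/<log_run_name>
    cp <downloaded_checkpoint>.pth train/logs/<project_name>/<log_run_name>/latest.pth

Then add this line to your config yaml:

    load_run: <project_name>/<log_run_name>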

Deployment

This subfolder contains code to load a pre-trained ViNT and deploy it on the open-source LoCoBot indoor robot platform with an NVIDIA Jetson Orin Nano. It can easily be adapted to run on other robots, and researchers have been able to independently deploy it on the following robots – Clearpath Jackal, DJI Tello, Unitree A1, TurtleBot2, Vizbot – and in simulated environments like CARLA.

LoCoBot Setup

This software was tested on a LoCoBot running Ubuntu 20.04.

Software Installation (in this order)

  1. ROS: ros-noetic
  2. ROS packages:
    sudo apt-get install ros-noetic-usb-cam ros-noetic-joy
  3. kobuki
  4. Conda
    • Install anaconda/miniconda/etc. for managing environments
    • Make conda env with environment.yml (run this inside the vint_release/ directory)
      conda env create -f deployment/deployment_environment.yaml
    • Source env
      conda activate vint_deployment
    • (Recommended) add to ~/.bashrc:
      echo "conda activate vint_deployment" >> ~/.bashrc
  5. Install the vint_train packages (run this inside the vint_release/ directory):
    pip install -e train/
  6. Install the diffusion_policy package from this repo:
    git clone git@github.com:real-stanford/diffusion_policy.git
    pip install -e diffusion_policy/
  7. (Recommended) Install tmux if not present. Many of the bash scripts rely on tmux to launch multiple screens with different commands. This will be useful for debugging because you can see the output of each screen.

Hardware Requirements

  • LoCoBot: http://locobot.org (just the navigation stack)
  • A wide-angle RGB camera: Example. The vint_locobot.launch file uses camera parameters that work with cameras like the ELP fisheye wide-angle camera; feel free to modify them for your own camera. Adjust the camera parameters in vint_release/deployment/config/camera.yaml to match your camera accordingly (these are used for visualization).
  • Joystick/keyboard teleop that works with Linux. Add the index mapping for the deadman_switch on the joystick to vint_release/deployment/config/joystick.yaml (see the sketch after this list). You can find the mapping from buttons to indices for common joysticks in the wiki.
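
As a minimal sketch of a joystick.yaml entry (the key name comes from the description above; the button index is hypothetical and depends on your joystick):

    deadman_switch: 4  # index of the button that must be held for teleop commands to be sent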

Loading the model weights

Save the model weights (*.pth file) in the vint_release/deployment/model_weights folder. Our model weights are available at this link.

Collecting a Topological Map

Make sure to run these scripts inside the vint_release/deployment/src/ directory.

This section discusses a simple way to create a topological map of the target environment for deployment. For simplicity, we will use the robot in “path-following” mode, i.e. given a single trajectory in an environment, the task is to follow the same trajectory to the goal. The environment may have new/dynamic obstacles, lighting variations etc.

Record the rosbag:

./record_bag.sh <bag_name>

Run this command to teleoperate the robot with the joystick and camera. This command opens up three windows:

  1. roslaunch vint_locobot.launch: This launch file opens the usb_cam node for the camera, the joy node for the joystick, and nodes for the robot’s mobile base.
  2. python joy_teleop.py: This python script starts a node that reads inputs from the joy topic and outputs them on topics that teleoperate the robot’s base.
  3. rosbag record /usb_cam/image_raw -o <bag_name>: This command isn’t run immediately (you have to press Enter). It will be run in the vint_release/deployment/topomaps/bags directory, where we recommend you store your rosbags.

Once you are ready to record the bag, run the rosbag record script and teleoperate the robot on the map you want the robot to follow. When you are finished with recording the path, kill the rosbag record command, and then kill the tmux session.

Make the topological map:

./create_topomap.sh <topomap_name> <bag_filename>

This command opens up 3 windows:

  1. roscore
  2. python create_topomap.py --dt 1 --dir <topomap_dir>: This command creates a directory in vint_release/deployment/topomaps/images and saves an image as a node in the map every second the bag is played.
  3. rosbag play -r 1.5 <bag_filename>: This command plays the rosbag at 1.5x speed, so the Python script is actually recording nodes 1.5 seconds apart. The <bag_filename> should be the entire bag name with the .bag extension. You can change this playback rate in the create_topomap.sh file. The command does not run until you hit Enter, which you should only do once the Python script gives its waiting message. Once you play the bag, move to the screen where the Python script is running so you can kill it when the rosbag stops playing.

When the bag stops playing, kill the tmux session.

Running the model

Navigation

Make sure to run this script inside the vint_release/deployment/src/ directory.

./navigate.sh "--model <model_name> --dir <topomap_dir>"

To deploy one of the models from the published results, we are releasing model checkpoints that you can download from this link.

The <model_name> is the name of the model in the vint_release/deployment/config/models.yaml file. In this file, you specify these parameters for each model (defaults are used if not specified):

  • config_path (str): path of the *.yaml file in vint_release/train/config/ used to train the model
  • ckpt_path (str): path of the *.pth file in vint_release/deployment/model_weights/

Make sure these configurations match what you used to train the model. The configurations for the models whose weights we provide are included in this yaml file for your reference; a sample entry is sketched below.
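
As an illustration, a hypothetical models.yaml entry for a model named my_vint might look like this (both paths are placeholders and must point at your actual files):

    my_vint:
        config_path: ../../train/config/vint.yaml
        ckpt_path: ../model_weights/my_vint.pth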

The <topomap_dir> is the name of the directory in vint_release/deployment/topomaps/images that has the images corresponding to the nodes in the topological map. The images are ordered by name from 0 to N.

This command opens up 4 windows:

  1. roslaunch vint_locobot.launch: This launch file opens the usb_cam node for the camera, the joy node for the joystick, and several nodes for the robot’s mobile base.
  2. python navigate.py --model <model_name> --dir <topomap_dir>: This python script starts a node that reads in image observations from the /usb_cam/image_raw topic, inputs the observations and the map into the model, and publishes actions to the /waypoint topic.
  3. python joy_teleop.py: This python script starts a node that reads inputs from the joy topic and outputs them on topics that teleoperate the robot’s base.
  4. python pd_controller.py: This python script starts a node that reads messages from the /waypoint topic (waypoints from the model) and outputs velocities to navigate the robot’s base.

When the robot finishes navigating, kill the pd_controller.py script, and then kill the tmux session. If you want to take control of the robot while it is navigating, the joy_teleop.py script allows you to do so with the joystick.
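
For intuition about the last step, here is a minimal, hypothetical sketch of how a relative waypoint (dx, dy) in the robot frame could be turned into linear and angular velocities with a simple proportional controller and clipped to the robot's limits. This is only an illustration of the general idea, not the actual pd_controller.py implementation:

    import math

    # limits of the kind configured in deployment/config/robot.yaml
    MAX_V = 0.2  # maximum linear velocity (m/s)
    MAX_W = 0.4  # maximum angular velocity (rad/s)

    def waypoint_to_velocity(dx: float, dy: float, k_v: float = 1.0, k_w: float = 1.0):
        """Convert a waypoint (dx, dy) in the robot frame into (v, w) commands."""
        v = k_v * dx                    # drive forward toward the waypoint
        w = k_w * math.atan2(dy, dx)    # turn toward the waypoint's bearing
        v = max(-MAX_V, min(MAX_V, v))  # clip to the robot's velocity limits
        w = max(-MAX_W, min(MAX_W, w))
        return v, w

    print(waypoint_to_velocity(0.5, 0.1))  # waypoint half a meter ahead, slightly to the left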

Exploration

Make sure to run this script inside the vint_release/deployment/src/ directory.

./explore.sh "--model <model_name>"

To deploy one of the models from the published results, we are releasing model checkpoints that you can download from this link.

The <model_name> is the name of the model in the vint_release/deployment/config/models.yaml file (note that only NoMaD works for exploration). In this file, you specify these parameters for each model (defaults are used if not specified):

  • config_path (str): path of the *.yaml file in vint_release/train/config/ used to train the model
  • ckpt_path (str): path of the *.pth file in vint_release/deployment/model_weights/

Make sure these configurations match what you used to train the model. The configurations for the models whose weights we provide are included in this yaml file for your reference.


This command opens up 4 windows:

  1. roslaunch vint_locobot.launch: This launch file opens the usb_cam node for the camera, the joy node for the joystick, and several nodes for the robot’s mobile base.
  2. python explore.py --model <model_name>: This python script starts a node that reads in image observations from the /usb_cam/image_raw topic, inputs the observations and the map into the model, and publishes exploration actions to the /waypoint topic.
  3. python joy_teleop.py: This python script starts a node that reads inputs from the joy topic and outputs them on topics that teleoperate the robot’s base.
  4. python pd_controller.py: This python script starts a node that reads messages from the /waypoint topic (waypoints from the model) and outputs velocities to navigate the robot’s base.

When the robot is done exploring, kill the pd_controller.py script, and then kill the tmux session. If you want to take control of the robot while it is exploring, the joy_teleop.py script allows you to do so with the joystick.

Adapting this code to different robots

We hope that this codebase is general enough to allow you to deploy it to your favorite ROS-based robots. You can change the robot configuration parameters in vint_release/deployment/config/robot.yaml, such as the maximum angular and linear velocities of the robot and the topics used to teleoperate and control the robot (a rough sketch follows below). Please feel free to create a Github Issue or reach out to the authors at [email protected].
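
As a rough sketch of the kind of parameters robot.yaml exposes (the velocity limits and frame rate below are the defaults mentioned elsewhere on this page; the topic names are hypothetical and depend on your robot):

    max_v: 0.2                     # maximum linear velocity (m/s)
    max_w: 0.4                     # maximum angular velocity (rad/s)
    frame_rate: 4                  # control loop rate (Hz)
    vel_teleop_topic: /teleop_vel  # hypothetical: topic used for joystick teleoperation
    vel_navi_topic: /cmd_vel       # hypothetical: topic the controller publishes velocities on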

Citing

@inproceedings{shah2022gnm,
  author    = {Dhruv Shah and Ajay Sridhar and Arjun Bhorkar and Noriaki Hirose and Sergey Levine},
  title     = {{GNM: A General Navigation Model to Drive Any Robot}},
  booktitle = {International Conference on Robotics and Automation (ICRA)},
  year      = {2023},
  url       = {https://arxiv.org/abs/2210.03370}
}

@inproceedings{shah2023vint,
  title     = {Vi{NT}: A Foundation Model for Visual Navigation},
  author    = {Dhruv Shah and Ajay Sridhar and Nitish Dashora and Kyle Stachowicz and Kevin Black and Noriaki Hirose and Sergey Levine},
  booktitle = {7th Annual Conference on Robot Learning},
  year      = {2023},
  url       = {https://arxiv.org/abs/2306.14846}
}

@article{sridhar2023nomad,
  author  = {Ajay Sridhar and Dhruv Shah and Catherine Glossop and Sergey Levine},
  title   = {{NoMaD: Goal Masked Diffusion Policies for Navigation and Exploration}},
  journal = {arXiv pre-print},
  year    = {2023},
  url     = {https://arxiv.org/abs/2310.xxxx}
}


visualnav-transformer's Issues

About model deployment on my own car

Hi, thank you for providing such excellent work. I have deployed the NoMaD model on my own car, but the model's performance is not very good; it is prone to collisions. Could you please tell me what might be the reason for this?

Collision Issues During Simulation

Dear Author,

I am currently facing an issue and would like to seek your advice. I am running the NoMaD explore simulation on CARLA, but after sending the linear and angular velocities outputted by pd_controller.py to the car in CARLA, collisions often occur. My waypoint visualization seems to be normal. I have a few questions:

  1. In deployment/config/robot.yaml, there are settings such as max_v: 0.2, max_w: 0.4, and frame_rate: 4. Should I set the maximum speed and maximum angular velocity of the car in CARLA to 0.2 and 0.4 as well (I tried this and found that the car's speed was too fast)? Or should these values be manually adjusted based on the actual control situation of the car?

  2. Each path generated by NoMaD has 8 waypoints. Should we track the nearest waypoint? Or should we track the 2nd or 3rd waypoint, depending on the situation? For instance, when there is an obstacle on the right side, my visualization shows that the entire path clearly shifts to the left. However, the car still slowly moves to the right, possibly because the first waypoint is too close.
    Thank you for your help.

ViNT Dataset Split

Hi,

I am trying to obtain the same dataset that was used to train ViNT. I notice in data_split.py there is no fixed seed. Any further info about the splits used to train the models?

Go1 deployment details

Thank you for the great work!

Would it be possible to share scripts relevant to deploying the code on the Go1 robot (e.g., robot.yaml, topomap create code)?

Thank you.

What do the four values of chosen_waypoint represent?

Very nice work!
When deploying, if chosen_waypoint = [0.15498, -0.00129, 0.9395, -0.3425] and the robot needs linear and angular velocities to be driven, how can I calculate the linear and angular velocities from these four values, and what do they mean?
Thank you!

Batch Size Mismatch in explore.py

Hello,

I've been working with your project and examining the handling of model inputs in explore.py and navigate.py, especially in the context of simulations using CARLA, where I receive RGBA data. I preprocess this data to convert it from RGBA to RGB to align with the expected input format of your scripts.

In navigate.py, I noticed a specific approach to adjusting the batch size for the observed images (obs_img) and goal images (goal_img) before feeding them into the model:

obsgoal_cond = model('vision_encoder', obs_img=obs_images.repeat(len(goal_image), 1, 1, 1), goal_img=goal_image, input_goal_mask=mask.repeat(len(goal_image)))

Here, both obs_images and mask are adjusted in batch size to match that of goal_image by using .repeat(len(goal_image), 1, 1, 1), ensuring dimensional consistency for safe model forward propagation.

However, in explore.py, a similar strategy doesn't seem to be applied when dealing with comparable inputs:

obs_cond = model('vision_encoder', obs_img=obs_images, goal_img=fake_goal, input_goal_mask=mask)

In my experiments, since the batch size for fake_goal is defaulted to 1, and obs_images might have a different batch size, it leads to a batch size mismatch error when attempting to combine them for model input.
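
For reference, here is a minimal sketch (hypothetical tensor shapes, not code from the repo) of the repeat-based alignment I would expect, mirroring the navigate.py pattern quoted above:

    import torch

    obs_images = torch.randn(4, 12, 96, 96)  # hypothetical batch of stacked context frames
    fake_goal = torch.randn(1, 3, 96, 96)    # single dummy goal image
    mask = torch.ones(1, dtype=torch.long)   # goal mask for a single sample

    # repeat the goal image and mask so their batch dimension matches obs_images
    fake_goal = fake_goal.repeat(len(obs_images), 1, 1, 1)
    mask = mask.repeat(len(obs_images))

    # obs_cond = model('vision_encoder', obs_img=obs_images, goal_img=fake_goal, input_goal_mask=mask)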

Could you shed some light on the rationale behind the handling in explore.py? Is there a specific reason for not adjusting the batch size of fake_goal to match obs_images as done in navigate.py? And would you recommend a particular approach to address the batch size mismatch issue I encountered in explore.py within the CARLA simulation context?

Thank you very much for your time and assistance. I look forward to your response.


Question about yaw learning

Good afternoon!

I was curious: why did you choose to include the yaw change ψ in the intermediate action space in ViNT? Does it bring extra quality compared to just predicting normalized xy actions? It's interesting that you do not predict angles in NoMaD. What was the motivation for that?

Thank you for the great papers anyway ^_^

Collision avoidance issue with ViNT model

Thank you for making the ViNT model available to the public.

While attempting to run the ViNT model, I encountered an issue that I need your help with. I successfully created topological map images and then launched "navigation.py" to drive my robot, following the tutorial provided in this link

I generated the topological map images based on a very simple trajectory, as demonstrated in this folder. My objective is to guide the robot to follow this trajectory.

However, my robot is making inaccurate turning decisions, leading to collisions with walls. You can view the experiment's results in this video.
There were human interventions at 43s and 1min 6s to prevent collisions.

I'm wondering if you could offer any advice on what might be missing. I believe I've made all necessary modifications to the names of ROS topics and parameters. ViNT ran zero-shot w/o any extra training.

For your reference, here are the specifications of my robot and sensors:

Robot: A differential drive robot (details available at this link)
Camera: ELP USB Fisheye Camera 180 Degree 1080P Lightburn Camera
OS/ROS: Ubuntu 20.04 / Noetic
GPU (on the PC): 3070ti

Error in train.py

Thank you for providing the code.
I am trying to use train.py to train my model, but I encountered the following issue while running train.py:

Traceback (most recent call last):
  File "/home/iiau-vln/ws_zqs/nomad/visualnav-transformer/train/train.py", line 402, in <module>
    main(config)
  File "/home/iiau-vln/ws_zqs/nomad/visualnav-transformer/train/train.py", line 326, in main
    train_eval_loop_nomad(
  File "/home/iiau-vln/ws_zqs/nomad/visualnav-transformer/train/vint_train/training/train_eval_loop.py", line 196, in train_eval_loop_nomad
    ema_model = EMAModel(model=model, power=0.75)
TypeError: __init__() missing 1 required positional argument: 'parameters'

Traceback (most recent call last):
  File "/home/iiau-vln/ws_zqs/nomad/visualnav-transformer/train/train.py", line 402, in <module>
    main(config)
  File "/home/iiau-vln/ws_zqs/nomad/visualnav-transformer/train/train.py", line 326, in main
    train_eval_loop_nomad(
  File "/home/iiau-vln/ws_zqs/nomad/visualnav-transformer/train/vint_train/training/train_eval_loop.py", line 203, in train_eval_loop_nomad
    train_nomad(
  File "/home/iiau-vln/ws_zqs/nomad/visualnav-transformer/train/vint_train/training/train_utils.py", line 661, in train_nomad
    loss.backward()
  File "/home/iiau-vln/miniconda3/envs/nomad/lib/python3.8/site-packages/torch/_tensor.py", line 522, in backward
    torch.autograd.backward(
  File "/home/iiau-vln/miniconda3/envs/nomad/lib/python3.8/site-packages/torch/autograd/__init__.py", line 266, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

What is the purpose of negative actions?

In the ViNT dataloader there are negative goals that may be loaded. Is the purpose of this to give the distance prediction head some examples of very far distances and essentially prevent some sort of mode collapse when training the distance prediction head? Then the action predictions are masked out, since you don't have any supervision on those actions and you don't want to hurt the training of the action prediction?

Is my understanding correct, and did I miss an ablation over this augmentation in the papers? I do see an option that looks like it's there to turn this on and off, but tracing the code I can't see how it actually accomplishes this.

Error in train.py

When I run train.py it fails to load images and I also get TypeErrors (see the attached screenshots). Any thoughts? Did I miss anything?

long-distance navigation

Is it possible for the published code to achieve long-distance navigation? Is this the final version of the code? Are you considering releasing the relevant code of the heuristic or diffusion model in the future?

Model Fine-Tuning Related Issues

Hello, I am very grateful for the excellent work you have done on the visualnav-transformer and for making it available for everyone to learn and use. I am pleased to have successfully deployed ViNT and NoMaD. Now I am hoping to fine-tune them for our own robot and sensor equipment, but I have encountered the following two issues. First, I attempted to fine-tune using the values in the vint.yaml file, but the results were not ideal, possibly due to improper parameter adjustment; I wonder if you could provide a fine-tuning configuration file, as mentioned in the ViNT paper, for my reference. Second, the NoMaD weights you provided do not allow for loading and obtaining training parameters the way the ViNT weights do, which may be due to a different method of saving the weights. I am curious if you could provide a set of weights that allows for the retrieval of training parameters. Thank you very much for your work.

Code and data release

Hello, thanks for your great work! When will the code and data be open-sourced?

Seeking Guidance for Simulating Models on Carla

Dear authors and respected experts,

I am currently looking to simulate models on Carla and I'm seeking guidance. Here's my current environment setup:

  • Ubuntu 20.04
  • Carla 0.9.13
  • ROS Noetic

I believe I have successfully set up the Carla-Ros-Bridge and Kobuki. However, I'm unsure about how to integrate them effectively with Carla. Do I need to manually create cameras and robots (e.g., a TurtleBot2) in Carla, and create nodes to receive image data from the virtual cameras in the CARLA environment?

As I am relatively new to this field, I apologize in advance if my questions come across as rudimentary or offensive. Your assistance would be greatly appreciated.

Thank you.

Waypoint Visualization and FOV Adjustment Effects using NoMaD Model in CARLA

Hello,

I am currently working with the NoMaD model in the CARLA simulation environment and have encountered a couple of questions and issues that I hope you can help clarify.

Waypoint Visualization:
I have generated some visualization graphs for waypoints while using the NoMaD model (figures attached). However, I am uncertain about the accuracy of these visualizations. Could you provide some guidance to confirm whether they are correct? (It feels like the starting point is very far from the car.)
Additionally, I have noticed that in some cases the waypoints seem to scatter randomly (see the attached comparison image). Is this a common issue? Will it affect navigation performance?

Field of View (FOV) Adjustment:
In previous discussions, there was a mention that increasing the FOV might enhance navigation performance. I attempted to adjust the FOV from 90 to 120 to see its impact. After making this adjustment, the visualizations of the waypoints appeared somewhat unusual. Could this change in appearance be expected with a larger FOV? Additionally, could you confirm whether a larger FOV is indeed recommended for better navigation performance?
(FOV 120 waypoint visualizations were attached.)

Any insights or suggestions you could provide would be greatly appreciated.

Thank you for your support.

Training on Multi-GPU

Hi, I'm trying to train on custom CARLA data (image size: [256, 256]). I've modified the gpu_ids in the config to train on 4 GPUs, but I ended up with this error.

Traceback (most recent call last):                                                                                    
  File "train.py", line 403, in <module>
    main(config)
  File "train.py", line 327, in main
    train_eval_loop_nomad(
  File "/home2/user/visualnav-transformer/train/vint_train/training/train_eval_loop.py", line 203, in train_eval_loop_nomad
    train_nomad(
  File "/home2/user/visualnav-transformer/train/vint_train/training/train_utils.py", line 701, in train_nomad
    visualize_diffusion_action_distribution(
  File "/home2/user/visualnav-transformer/train/vint_train/training/train_utils.py", line 1090, in visualize_diffusion_action_distribution
    model_output_dict = model_output(
  File "/home2/user/visualnav-transformer/train/vint_train/training/train_utils.py", line 974, in model_output
    obs_cond = model("vision_encoder", obs_img=batch_obs_images, goal_img=batch_goal_images, input_goal_mask=goal_mask)
  File "/home2/user/miniconda3/envs/nomad_train/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home2/user/miniconda3/envs/nomad_train/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home2/user/miniconda3/envs/nomad_train/lib/python3.8/site-packages/torch/nn/parallel/data_parallel.py", line 185, in forward
    outputs = self.parallel_apply(replicas, inputs, module_kwargs)
  File "/home2/user/miniconda3/envs/nomad_train/lib/python3.8/site-packages/torch/nn/parallel/data_parallel.py", line 200, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File "/home2/user/miniconda3/envs/nomad_train/lib/python3.8/site-packages/torch/nn/parallel/parallel_apply.py", line 108, in parallel_apply
    output.reraise()
  File "/home2/user/miniconda3/envs/nomad_train/lib/python3.8/site-packages/torch/_utils.py", line 705, in reraise
    raise exception
RuntimeError: Caught RuntimeError in replica 0 on device 0.

File "/home2/user/visualnav-transformer/train/vint_train/models/nomad/nomad.py", line 24, in forward
    output = self.vision_encoder(kwargs["obs_img"], kwargs["goal_img"], input_goal_mask=kwargs["input_goal_mask"])
  File "/home2/user/miniconda3/envs/nomad_train/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home2/user/miniconda3/envs/nomad_train/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home2/user/visualnav-transformer/train/vint_train/models/nomad/nomad_vint.py", line 85, in forward
    obsgoal_encoding = self.goal_encoder.extract_features(obsgoal_img) # get encoding of this img
  File "/home2/user/miniconda3/envs/nomad_train/lib/python3.8/site-packages/efficientnet_pytorch/model.py", line 296, in extract_features
    x = block(x, drop_connect_rate=drop_connect_rate)
  File "/home2/user/miniconda3/envs/nomad_train/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home2/user/miniconda3/envs/nomad_train/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home2/user/miniconda3/envs/nomad_train/lib/python3.8/site-packages/efficientnet_pytorch/model.py", line 122, in forward
    x = self._project_conv(x)
  File "/home2/user/miniconda3/envs/nomad_train/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home2/user/miniconda3/envs/nomad_train/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home2/user/miniconda3/envs/nomad_train/lib/python3.8/site-packages/efficientnet_pytorch/utils.py", line 275, in forward
    x = F.conv2d(x, self.weight, self.bias, self.stride, self.padding, self.dilation, self.groups)
RuntimeError: CUDA error: misaligned address
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

Inquiry on Environment and Simulator Usage

Dear author or experienced users, may I ask what environment and simulator you are using (for example, versions of Carla, Ubuntu, etc.)?
Additionally, is there any ViNT-related discussion group where we can discuss together?

Where do you deploy the model?

Great work!
Do you deploy the model on an embedded board such as the Xavier NX, or on GPUs like the 3090?
Thanks

code release

Really excellent work! When will the code be released?

Deployment on Go1

Hi, great work!

Is it possible for you to share how to deploy this on a Unitree GO1?

Clarification on Topological Graph based representation of Image map.

Hey, I am confused about how the topological graph works. I see that it takes in the camera feed and converts it into a series of sequentially named images. Doesn't it need to be in a tree structure? I remember the paper shows a tree shape. Also, with the current implementation, wouldn't it be difficult to navigate from one room to another? Will it be possible to test it on Gazebo? How accurately would it work zero-shot? Do I need to train for long enough, or is it okay if I use the current weights?

About late fusion option

With the late fusion option, the goal encoder is learned with only the goal image as input. Typically when I think of late fusion, I think of two encoder networks and then an MLP to fuse their outputs together. I can't see this being done explicitly, so is it assumed that the transformer layers will learn this late fusion? If this is the case, then at the start of training the only thing that differentiates an observation feature from a goal feature is the positional encoding, which may make learning difficult. This may explain why you observed that late fusion didn't work very well in the ViNT paper.

Does this seem accurate or have I missed something? Apologies if this isn't the right place to ask these kinds of questions, but it may be helpful to others.

What is the appropriate resolution or camera configuration?

I am trying to deploy different models (GNM, ViNT, NoMaD) on a Jackal with this camera. The camera configuration file uses a 160 x 120 image, which looks like the attached example.

The default camera configuration of the usb_cam node uses 640 x 480 images, which from the same viewpoint look like the attached example.

While creating the topological map and getting the images from the camera on-policy while testing these models, which images are appropriate to use: the 640 x 480 images or the 160 x 120 images? The 160 x 120 images look a lot different from what is shown as the robot's-view image (bottom right) in this video.

There are some problems when we train the model on SCAND and RECON

When we train and test the model on RECON, it operates stably. However, when we conduct joint training on RECON and SCAND, we find that the model becomes very unstable during testing, often experiencing stuttering and continuous straight-line movement. Is this due to the data distribution characteristics of SCAND, or is it a training issue?

Questions about the training -- TartanDrive missing?

Thank you for providing the code. I have successfully trained the model from scratch using the public dataset you shared. The training process seemed to be successful; I've monitored the loss decrease and the visualization results in wandb appear to be satisfactory.

However, when I run the explore.py code using the model checkpoint file from my training, the model outputs are incorrect. To verify the functionality of explore.py, I also executed the code using the model checkpoint file you provided, which worked perfectly fine.

So, I am wondering if I need to modify the explore.py code to better utilize the model that I trained from scratch?
