
cross_view_transformers's Introduction

Cross View Transformers


This repository contains the source code and data for our paper:

Cross-view Transformers for real-time Map-view Semantic Segmentation
Brady Zhou, Philipp Krähenbühl
CVPR 2022

Demos


Map-view Segmentation: The model uses multi-view images to produce a map-view segmentation at 45 FPS

Map Making: With vehicle pose, we can construct a map by fusing model predictions over time (a minimal fusion sketch follows the demo descriptions below)

Cross-view Attention: For a given map-view location, we show which image patches are being attended to
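
Below is a minimal sketch of the map-making idea above: per-frame map-view predictions are placed into a global grid using the ego pose. This is not the repository's implementation; the grid size, resolution, and BEV extent are assumptions made purely for illustration.

# Sketch only: fuse ego-centric BEV predictions into a world-frame map.
# Assumed inputs: bev is an (H, W) probability map centered on the vehicle,
# pose is the 4x4 ego-to-world transform for that frame.
import numpy as np

def fuse_into_global_map(global_map, origin, resolution, bev, pose, extent=100.0):
    h, w = bev.shape

    # ego-frame (x, y) coordinates of every BEV cell, vehicle at the center
    xs = (np.arange(w) + 0.5) / w * extent - extent / 2
    ys = (np.arange(h) + 0.5) / h * extent - extent / 2
    gx, gy = np.meshgrid(xs, ys)
    pts_ego = np.stack([gx, gy, np.zeros_like(gx), np.ones_like(gx)], axis=-1).reshape(-1, 4)

    # transform cell centers into the world frame with the ego pose
    pts_world = pts_ego @ pose.T

    # scatter predictions into the global grid, keeping the max over time
    cols = ((pts_world[:, 0] - origin[0]) / resolution).astype(int)
    rows = ((pts_world[:, 1] - origin[1]) / resolution).astype(int)
    ok = (rows >= 0) & (rows < global_map.shape[0]) & (cols >= 0) & (cols < global_map.shape[1])
    np.maximum.at(global_map, (rows[ok], cols[ok]), bev.reshape(-1)[ok])
    return global_map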

Installation

# Clone repo
git clone https://github.com/bradyz/cross_view_transformers.git

cd cross_view_transformers

# Setup conda environment
conda create -y --name cvt python=3.8

conda activate cvt
conda install -y pytorch torchvision cudatoolkit=11.3 -c pytorch

# Install dependencies
pip install -r requirements.txt
pip install -e .
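
As a quick, optional sanity check (not part of the original instructions), the following snippet confirms that the environment sees a CUDA device before moving on to the data setup:

# check_env.py — optional sanity check after installation
import torch
import torchvision

print("torch:", torch.__version__)
print("torchvision:", torchvision.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("device:", torch.cuda.get_device_name(0))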

Data


Documentation:


Download the original datasets and our generated map-view labels

Dataset                                         Labels
nuScenes keyframes + map expansion (60 GB)      cvt_labels_nuscenes.tar.gz (361 MB)
Argoverse 1.1 3D tracking                       coming soon™

The structure of the extracted data should look like the following

/datasets/
├─ nuscenes/
│  ├─ v1.0-trainval/
│  ├─ v1.0-mini/
│  ├─ samples/
│  ├─ sweeps/
│  └─ maps/
│     ├─ basemap/
│     └─ expansion/
└─ cvt_labels_nuscenes/
   ├─ scene-0001/
   ├─ scene-0001.json
   ├─ ...
   ├─ scene-1000/
   └─ scene-1000.json

When everything is set up correctly, check out the dataset with

python3 scripts/view_data.py \
  data=nuscenes \
  data.dataset_dir=/media/datasets/nuscenes \
  data.labels_dir=/media/datasets/cvt_labels_nuscenes \
  data.version=v1.0-mini \
  visualization=nuscenes_viz \
  +split=val

Training

             

An average job of 50k training iterations takes ~8 hours.
Our models were trained on 4-GPU jobs, but they can also be trained on a single GPU.

To train a model,

python3 scripts/train.py \
  +experiment=cvt_nuscenes_vehicle \
  data.dataset_dir=/media/datasets/nuscenes \
  data.labels_dir=/media/datasets/cvt_labels_nuscenes

For more information, see

  • config/config.yaml - base config
  • config/model/cvt.yaml - model architecture
  • config/experiment/cvt_nuscenes_vehicle.yaml - additional overrides

Additional Information

Awesome Related Repos

License

This project is released under the MIT license

Citation

If you find this project useful for your research, please use the following BibTeX entry.

@inproceedings{zhou2022cross,
    title={Cross-view Transformers for real-time Map-view Semantic Segmentation},
    author={Zhou, Brady and Kr{\"a}henb{\"u}hl, Philipp},
    booktitle={CVPR},
    year={2022}
}

cross_view_transformers's People

Contributors

bradyz


cross_view_transformers's Issues

About setting1

Hi, thanks for your work. I would like to ask how to validate under Setting 1. What should I modify in the config?

Question regarding the training configs

Hi @bradyz, thanks for your beautiful work and for releasing the codebase!

I see that in config/experiment you only provide the training config for the vehicle category. I was wondering what the right way is to turn it into a config for the road category. Is there anything else to change (e.g. the bev or center keys) other than changing the override /data from vehicle to road?

Thanks a lot!

Model parameter count

Hi Brady, thanks for the great work!

I have a question regarding the model parameters. In your paper you report 5M parameters in Table 1 on nuScenes. However, when I run your code with the default yaml config, the log says 1.1M parameters. Did I miss anything?


image embedding calculation

Hey Brady,

In your encoder.py line 255, you used:

img_embed = d_embed - c_embed    

To my understanding, here you want to subtract the camera translation information from the image coordinate embedding. However, I think the translation information is already included in the image coordinate embedding:

  pixel_flat = rearrange(pixel, '... h w -> ... (h w)')                   # 1 1 3 (h w)
  cam = I_inv @ pixel_flat                                                # b n 3 (h w)
  cam = F.pad(cam, (0, 0, 0, 1, 0, 0, 0, 0), value=1)                     # b n 4 (h w)
  d = E_inv @ cam                                                         # b n 4 (h w)
  d_flat = rearrange(d, 'b n d (h w) -> (b n) d h w', h=h, w=w)           # (b n) 4 h w
  d_embed = self.img_embed(d_flat)   

where E_inv contains the translation already. So will the subtraction of the c_embed be redundant?
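
For reference, here is a small numeric sketch (not code from the repository) of the point being made: since the last column of E_inv is exactly the camera position c used for c_embed, the world-frame point d already contains that translation, and d - c in coordinate space is the purely rotated direction. Note that in encoder.py the subtraction happens after two separate learned embeddings (img_embed and cam_embed), not in coordinate space.

# Illustration only: E_inv @ [cam; 1] = R_part @ cam + c, with c = E_inv[:, -1].
import torch

E_inv = torch.eye(4)
E_inv[:3, :3] = torch.tensor([[0., -1., 0.],
                              [1.,  0., 0.],
                              [0.,  0., 1.]])        # some rotation
E_inv[:3, 3] = torch.tensor([1.0, 2.0, 0.5])         # camera position c

cam = torch.tensor([0.3, -0.2, 1.0, 1.0])            # homogeneous point along a pixel ray
d = E_inv @ cam                                      # world-frame point, translation included
c = E_inv[:, -1]                                     # translation column (homogeneous)

print(torch.allclose(d - c, E_inv[:, :3] @ cam[:3])) # True: d - c is the rotated direction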

loss function mutation when training

Hello, I'm very grateful for your work, but when I was training the model there was always a problem: the loss mutated and the validation images became irregular or all black, with the IoU metrics dropping to almost 0. I trained the model on a single RTX 2080 Ti with a batch size of 4, while the other parameters remained at their defaults. Later I reduced max_lr to 1e-3 but it still did not work. Here's the wandb report:
https://wandb.ai/jemini/cross_view_transformers_test/runs/0404_144821?workspace=user-jemini
Then I found a pretrained checkpoint and trained with it again, but hit the same problem.
https://wandb.ai/jemini/cross_view_transformers_test/runs/0408_131432?workspace=user-jemini
Hope for your help!

Question about camera extrinsics

Hi, I found in my experiments that the camera extrinsics generated by CVT are different from those generated by Fiery. I don't know the reason, please help me, thank you!

A question about Setting 1

Thanks for your kindness in sharing your code with us!

I have a question about Setting 1. According to Table 1 in your paper, PON falls under Setting 1, which refers to 100m x 50m at 25 cm resolution. The resolution of 25 cm per pixel is mentioned in Section 4.2 of PON's paper, but I think the 100m x 50m wording is not correct. According to PON's code, the pixel resolution of its BEV image is 196x200, so I think Setting 1 should refer to 49m x 50m at 25 cm resolution. Did I get something wrong?

How to draw Figure3 in the paper?

We tried to make a small modification to the model structure from the paper, and we want to compare it with the results in Figure 3 of the paper. Could you please provide the code you used to make Figure 3?

(Figure 3. A comparison of model performance vs distance to the camera. Each entry shows the average intersection over union accuracy for annotations that are at least distance d away.)
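
A minimal sketch (not the authors' plotting code) of how such a curve could be computed from BEV predictions. The 200x200 grid at 0.5 m per pixel, centered on the ego vehicle, is an assumption for illustration:

# Sketch: IoU restricted to BEV cells at least distance d away from the ego vehicle.
import numpy as np

def iou_beyond_distance(pred, label, d, resolution=0.5):
    """pred, label: (H, W) boolean BEV maps centered on the ego vehicle."""
    h, w = label.shape
    ys, xs = np.mgrid[0:h, 0:w]
    dist = np.hypot((ys - h / 2) * resolution, (xs - w / 2) * resolution)
    mask = dist >= d

    inter = np.logical_and(pred, label)[mask].sum()
    union = np.logical_or(pred, label)[mask].sum()
    return inter / union if union > 0 else float('nan')

# Averaging iou_beyond_distance over the validation set for several values of d
# would give one curve in the style of Figure 3.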

Error help

I trained the model on Windows, but there are problems with NCCL:
Traceback (most recent call last):
File "scripts/train.py", line 74, in main
trainer.fit(model_module, datamodule=data_module, ckpt_path=ckpt_path)
File "D:\anaconda3\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 770, in fit
self._call_and_handle_interrupt(
File "D:\anaconda3\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 721, in _call_and_handle_interrupt
return self.strategy.launcher.launch(trainer_fn, *args, trainer=self, **kwargs)
File "D:\anaconda3\lib\site-packages\pytorch_lightning\strategies\launchers\subprocess_script.py", line 93, in launch
return function(*args, **kwargs)
File "D:\anaconda3\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 811, in _fit_impl
results = self._run(model, ckpt_path=self.ckpt_path)
File "D:\anaconda3\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 1171, in _run
self.strategy.setup_environment()
File "D:\anaconda3\lib\site-packages\pytorch_lightning\strategies\ddp.py", line 152, in setup_environment
self.setup_distributed()
File "D:\anaconda3\lib\site-packages\pytorch_lightning\strategies\ddp.py", line 205, in setup_distributed
init_dist_connection(self.cluster_environment, self._process_group_backend)
File "D:\anaconda3\lib\site-packages\pytorch_lightning\utilities\distributed.py", line 355, in init_dist_connection
torch.distributed.init_process_group(torch_distributed_backend, rank=global_rank, world_size=world_size, **kwargs)
File "D:\anaconda3\lib\site-packages\torch\distributed\distributed_c10d.py", line 537, in init_process_group
default_pg = _new_process_group_helper(
File "D:\anaconda3\lib\site-packages\torch\distributed\distributed_c10d.py", line 639, in _new_process_group_helper
raise RuntimeError("Distributed package doesn't have NCCL "
RuntimeError: Distributed package doesn't have NCCL built in

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.

I know that NCCL is not supported on Windows. So what should I modify so that I can train normally?
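
One possible workaround (an assumption on our part, not a fix confirmed by the authors) is to use the Gloo backend, which PyTorch's distributed package does support on Windows. With the PyTorch Lightning version shown in the traceback, the DDP strategy accepts a process_group_backend argument; a minimal sketch:

# Sketch: construct the Trainer with a Gloo-backed DDP strategy on Windows.
# How this fits into scripts/train.py depends on how the Trainer is built there,
# so treat this as a starting point rather than a drop-in fix.
import pytorch_lightning as pl
from pytorch_lightning.strategies import DDPStrategy

trainer = pl.Trainer(
    accelerator="gpu",
    devices=1,
    strategy=DDPStrategy(process_group_backend="gloo"),  # NCCL is unavailable on Windows
)

On a single GPU it may also be possible to avoid DDP entirely, depending on how the project's config sets up the Trainer.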

Resume Training

Excuse me, my training was interrupted. I changed the resume path to point to the .ckpt file from the interrupted run, but training still cannot continue. How should I solve this?

About test dataset

Hi, I'm training this model on the nuScenes dataset.

When I train the cross-view transformer network using this code, the model's best IoU for road reaches 71%,

but in the "Cross-view Transformers for real-time Map-view Semantic Segmentation" paper the best road IoU is 74.3%.

Is the reason my results differ that the dataset used for scoring in the paper is not the validation dataset? Or is it simply that the hyperparameters needed to get the highest score are different?

When running train.py: RuntimeError: unmatched '}' in format string

This problem occurs in torch\distributed\rendezvous.py.

But in that code I can't find any '}' that would explain the problem.

I tried printing (hostname, port, world_size, start_daemon, timeout), but they don't look problematic.

I need some help with this, thanks.

The labels of dataset Argoverse.

Thank you for sharing your great work with the community.
May I ask when the Argoverse dataset labels or their generation code will be shared?

How to train this for segmenting more than 2 classes?

Hi,

I have two specific questions:

  1. For label generation, I only got labels with two colors (white and black). I want to generate labels for more than 2 classes, like the results shown in Figure 5 of the paper. What parameters should I change to achieve this?

  2. For training a model that produces Figure 5 of the paper, what parameters should I change?

Thank you so much!

Training this for other objects, e.g. Pedestrians

Hello,

First of all, this is great work! We really enjoyed reading this paper, and thanks for sharing the code. We have been trying this approach out for other objects, e.g. pedestrians (in the current paper you did this for vehicles). We have been running into issues with training here; specifically, the model is not able to learn at all. Is this something you have observed? Have you tried this with objects other than vehicles?

Thanks,
Rishabh

How to visualize

Thanks for the great work! I am wondering how to visualize the output.

Error running training

Hi, when I try to run the following command to train, an error is thrown.

python scripts/train.py   data=nuscenes +experiment=cvt_nuscenes_vehicle   data.dataset_dir=data/nuscenes   data.labels_dir=data/cvt_labels_nuscenes   visualization=nuscenes_viz

Error:

/home/runshengxu/anaconda3/envs/cvt/lib/python3.8/site-packages/pytorch_lightning/trainer/connectors/data_connector.py:486: PossibleUserWarning: Your `val_dataloader`'s sampler has shuffling enabled, it is strongly recommended that you turn shuffling off for val/test/predict dataloaders.
  rank_zero_warn(
Epoch 0:   0%|                                                                                                                                                                                         | 0/8538 [00:00<?, ?it/s]Error executing job with overrides: ['data=nuscenes', '+experiment=cvt_nuscenes_vehicle', 'data.dataset_dir=/home/runshengxu/project/data/nuscenes', 'data.labels_dir=/home/runshengxu/project/data/cvt_labels_nuscenes', 'visualization=nuscenes_viz']
Traceback (most recent call last):
  File "scripts/train.py", line 71, in main
    trainer.fit(model_module, datamodule=data_module, ckpt_path=ckpt_path)
  File "/home/runshengxu/anaconda3/envs/cvt/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 771, in fit
    self._call_and_handle_interrupt(
  File "/home/runshengxu/anaconda3/envs/cvt/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 722, in _call_and_handle_interrupt
    return self.strategy.launcher.launch(trainer_fn, *args, trainer=self, **kwargs)
  File "/home/runshengxu/anaconda3/envs/cvt/lib/python3.8/site-packages/pytorch_lightning/strategies/launchers/subprocess_script.py", line 93, in launch
    return function(*args, **kwargs)
  File "/home/runshengxu/anaconda3/envs/cvt/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 812, in _fit_impl
    results = self._run(model, ckpt_path=self.ckpt_path)
  File "/home/runshengxu/anaconda3/envs/cvt/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1237, in _run
    results = self._run_stage()
  File "/home/runshengxu/anaconda3/envs/cvt/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1324, in _run_stage
    return self._run_train()
  File "/home/runshengxu/anaconda3/envs/cvt/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1354, in _run_train
    self.fit_loop.run()
  File "/home/runshengxu/anaconda3/envs/cvt/lib/python3.8/site-packages/pytorch_lightning/loops/base.py", line 204, in run
    self.advance(*args, **kwargs)
  File "/home/runshengxu/anaconda3/envs/cvt/lib/python3.8/site-packages/pytorch_lightning/loops/fit_loop.py", line 269, in advance
    self._outputs = self.epoch_loop.run(self._data_fetcher)
  File "/home/runshengxu/anaconda3/envs/cvt/lib/python3.8/site-packages/pytorch_lightning/loops/base.py", line 204, in run
    self.advance(*args, **kwargs)
  File "/home/runshengxu/anaconda3/envs/cvt/lib/python3.8/site-packages/pytorch_lightning/loops/epoch/training_epoch_loop.py", line 208, in advance
    batch_output = self.batch_loop.run(batch, batch_idx)
  File "/home/runshengxu/anaconda3/envs/cvt/lib/python3.8/site-packages/pytorch_lightning/loops/base.py", line 204, in run
    self.advance(*args, **kwargs)
  File "/home/runshengxu/anaconda3/envs/cvt/lib/python3.8/site-packages/pytorch_lightning/loops/batch/training_batch_loop.py", line 88, in advance
    outputs = self.optimizer_loop.run(split_batch, optimizers, batch_idx)
  File "/home/runshengxu/anaconda3/envs/cvt/lib/python3.8/site-packages/pytorch_lightning/loops/base.py", line 204, in run
    self.advance(*args, **kwargs)
  File "/home/runshengxu/anaconda3/envs/cvt/lib/python3.8/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 203, in advance
    result = self._run_optimization(
  File "/home/runshengxu/anaconda3/envs/cvt/lib/python3.8/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 256, in _run_optimization
    self._optimizer_step(optimizer, opt_idx, batch_idx, closure)
  File "/home/runshengxu/anaconda3/envs/cvt/lib/python3.8/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 369, in _optimizer_step
    self.trainer._call_lightning_module_hook(
  File "/home/runshengxu/anaconda3/envs/cvt/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1596, in _call_lightning_module_hook
    output = fn(*args, **kwargs)
  File "/home/runshengxu/anaconda3/envs/cvt/lib/python3.8/site-packages/pytorch_lightning/core/lightning.py", line 1625, in optimizer_step
    optimizer.step(closure=optimizer_closure)
  File "/home/runshengxu/anaconda3/envs/cvt/lib/python3.8/site-packages/pytorch_lightning/core/optimizer.py", line 168, in step
    step_output = self._strategy.optimizer_step(self._optimizer, self._optimizer_idx, closure, **kwargs)
  File "/home/runshengxu/anaconda3/envs/cvt/lib/python3.8/site-packages/pytorch_lightning/strategies/ddp.py", line 278, in optimizer_step
    optimizer_output = super().optimizer_step(optimizer, opt_idx, closure, model, **kwargs)
  File "/home/runshengxu/anaconda3/envs/cvt/lib/python3.8/site-packages/pytorch_lightning/strategies/strategy.py", line 193, in optimizer_step
    return self.precision_plugin.optimizer_step(model, optimizer, opt_idx, closure, **kwargs)
  File "/home/runshengxu/anaconda3/envs/cvt/lib/python3.8/site-packages/pytorch_lightning/plugins/precision/precision_plugin.py", line 155, in optimizer_step
    return optimizer.step(closure=closure, **kwargs)
  File "/home/runshengxu/anaconda3/envs/cvt/lib/python3.8/site-packages/torch/optim/lr_scheduler.py", line 65, in wrapper
    return wrapped(*args, **kwargs)
  File "/home/runshengxu/anaconda3/envs/cvt/lib/python3.8/site-packages/torch/optim/optimizer.py", line 88, in wrapper
    return func(*args, **kwargs)
  File "/home/runshengxu/anaconda3/envs/cvt/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/home/runshengxu/anaconda3/envs/cvt/lib/python3.8/site-packages/torch/optim/adamw.py", line 100, in step
    loss = closure()
  File "/home/runshengxu/anaconda3/envs/cvt/lib/python3.8/site-packages/pytorch_lightning/plugins/precision/precision_plugin.py", line 140, in _wrap_closure
    closure_result = closure()
  File "/home/runshengxu/anaconda3/envs/cvt/lib/python3.8/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 148, in __call__
    self._result = self.closure(*args, **kwargs)
  File "/home/runshengxu/anaconda3/envs/cvt/lib/python3.8/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 134, in closure
    step_output = self._step_fn()
  File "/home/runshengxu/anaconda3/envs/cvt/lib/python3.8/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 427, in _training_step
    training_step_output = self.trainer._call_strategy_hook("training_step", *step_kwargs.values())
  File "/home/runshengxu/anaconda3/envs/cvt/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1766, in _call_strategy_hook
    output = fn(*args, **kwargs)
  File "/home/runshengxu/anaconda3/envs/cvt/lib/python3.8/site-packages/pytorch_lightning/strategies/ddp.py", line 344, in training_step
    return self.model(*args, **kwargs)
  File "/home/runshengxu/anaconda3/envs/cvt/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/runshengxu/anaconda3/envs/cvt/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 963, in forward
    output = self.module(*inputs[0], **kwargs[0])
  File "/home/runshengxu/anaconda3/envs/cvt/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/runshengxu/anaconda3/envs/cvt/lib/python3.8/site-packages/pytorch_lightning/overrides/base.py", line 82, in forward
    output = self.module.training_step(*inputs, **kwargs)
  File "/home/runshengxu/project/cross_view_transformers/cross_view_transformer/model/model_module.py", line 41, in training_step
    return self.shared_step(batch, 'train', True,
  File "/home/runshengxu/project/cross_view_transformers/cross_view_transformer/model/model_module.py", line 25, in shared_step
    loss, loss_details = self.loss_func(pred, batch)
  File "/home/runshengxu/anaconda3/envs/cvt/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/runshengxu/project/cross_view_transformers/cross_view_transformer/losses.py", line 113, in forward
    outputs = {k: v(pred, batch) for k, v in self.items()}
  File "/home/runshengxu/project/cross_view_transformers/cross_view_transformer/losses.py", line 113, in <dictcomp>
    outputs = {k: v(pred, batch) for k, v in self.items()}
  File "/home/runshengxu/anaconda3/envs/cvt/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/runshengxu/project/cross_view_transformers/cross_view_transformer/losses.py", line 50, in forward
    loss = super().forward(pred, label)
  File "/home/runshengxu/project/cross_view_transformers/cross_view_transformer/losses.py", line 24, in forward
    return sigmoid_focal_loss(pred, label, self.alpha, self.gamma, self.reduction)
  File "/home/runshengxu/anaconda3/envs/cvt/lib/python3.8/site-packages/fvcore/nn/focal_loss.py", line 34, in sigmoid_focal_loss
    ce_loss = F.binary_cross_entropy_with_logits(inputs, targets, reduction="none")
  File "/home/runshengxu/anaconda3/envs/cvt/lib/python3.8/site-packages/torch/nn/functional.py", line 3130, in binary_cross_entropy_with_logits
    raise ValueError("Target size ({}) must be the same as input size ({})".format(target.size(), input.size()))
ValueError: Target size (torch.Size([4, 12, 200, 200])) must be the same as input size (torch.Size([4, 1, 200, 200]))

Did I input the wrong command? I didn't change the config.yaml and I only have 1 gpu.

visualisation

Hello,
first of all, thank you for your great work.
However, I have some questions as follows. Could you please kindly help?
1. When putting a batch into the network, we get a prediction result that includes two tensors, 'bev' and 'center'. What is the 'center'?
2. I used the checkpoint from "https://www.cs.utexas.edu/~bzhou/cvt/cvt_nuscenes_vehicles_50k.ckpt", but the predictions are not as good as the GIF in the repo; a vehicle in front shows up as a false positive for a long time. Is this checkpoint the one used in the paper or the demo?
My results are attached (out_test.zip).

3. In the demo there is not only object detection but also road segmentation. How can I get a result like this?

A huge thank you in advance.

Evaluation code

Hi,
Thanks for your great work and for sharing the code! I didn't find the evaluation code; will you release it soon? Thanks!

Question about camera extrinsics in batch

Dear Professor,

Thank you very much for taking the time to read this in your busy schedule.
I have a question: in your "Cross-view Transformers for real-time Map-view Semantic Segmentation" paper, when encoding image information, the rotation matrix and translation of the camera extrinsics are relative to the ego-vehicle coordinate system, but the code also makes use of the ego-vehicle-to-world transform. So I am not sure what the extrinsics that go into the batch actually represent.
I hope you can answer this question in your free time, thank you very much!
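
For context, a small sketch (our assumption about a typical nuScenes-style pipeline, not code from this repository) of the two transforms that can be composed into an extrinsic matrix: the camera-to-ego calibration and the ego-to-world pose. Which of these conventions ends up in batch['extrinsics'] is exactly the question above.

# Sketch: the two extrinsic conventions the question distinguishes, in plain numpy.
# cam_to_ego would come from a calibrated-sensor record and ego_to_world from an
# ego-pose record; the numbers below are made up for illustration.
import numpy as np

def make_transform(rotation, translation):
    """Build a 4x4 homogeneous transform from a 3x3 rotation and a 3-vector."""
    T = np.eye(4)
    T[:3, :3] = rotation
    T[:3, 3] = translation
    return T

cam_to_ego = make_transform(np.eye(3), np.array([1.5, 0.0, 1.6]))       # camera mounted on the car
ego_to_world = make_transform(np.eye(3), np.array([100.0, 50.0, 0.0]))  # vehicle pose in the map

ego_extrinsic = np.linalg.inv(cam_to_ego)                   # ego-vehicle frame -> camera
world_extrinsic = np.linalg.inv(ego_to_world @ cam_to_ego)  # world frame -> camera

print(ego_extrinsic[:3, 3])    # differs from world_extrinsic[:3, 3] by the ego pose
print(world_extrinsic[:3, 3])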

No File named generate_labels.py

Hello! Your work is very impressive, but there is no file named generate_labels.py, which is mentioned in docs/label_generation.md for generating the cvt labels.

Discrepancy in the validation IOU values for Driveable Area segmentation

I have implemented the author's code, but I noticed a discrepancy in the validation IOU values for road segmentation (In Paper: Driveable Area = 74.3, My implementation = 68.7). However, the vehicle category scores are consistent with the scores mentioned in the paper. I would like to ask whether I should consider changing the training labels or how I should proceed with the training. Thank you!

Reproducing results of paper

Hello, many thanks for sharing the code of this awesome work !

I am trying to reproduce your results, but the config file cvt_nuscenes_vehicle.yaml differs from what is described in the paper and the training/evaluation setup of Lift Splat Shoot.

In particular:

  1. The use of the Center Loss instead of the Focal loss
  2. You use a learning rate of 4E-3 instead of 1E-2
  3. You use the visibility token from the nuScenes annotations to filter out objects that have a visibility level strictly lower than 2
  4. You use label_indices: [[4, 5, 6, 7, 8, 10, 11]] (7 classes) whereas the list of classes DYNAMIC contains 8 classes

Do you know how these factors influence your results?

Can you share the exact config you used to get the results in Table 1 of your paper?

Val IoU not good after training

Hey there, thank you for your great contribution! Unfortunately I am running into some issues when I train the model using the nuScenes vehicle experiment config.

The iou metrics are as follows:

Train/Val   [email protected]   [email protected]
Train       0.4305            0.3868
Val         0.09296           0.03948

So, you can see that there are some issues during validation?

Interestingly, this can also be confirmed visually: during training the model's outputs get better and better.
During validation, however, the predictions get better, then there is a period where the model only predicts a black screen, and then they start getting better again.


Following are some screenshots from the W&B log: the val metrics plot and the val outputs at steps 247, 329, and 663 (the step-329 output is all black).

Would really appreciate any feedback you might have regarding this issue. Thanks.

How to segment road_segment and dynamic objects together?

Hi, I have run your code, and I see there are only two example configs for training; the label_indices are [[0, 1]]
and [[4, 5, 6, 7, 8, 10, 11]]. It seems that labels [[2, 3]] aren't used. If I want to get road_segment results and separate dynamic-object results at the same time, how should I train?

A tensor size mismatch bug when traing

Hi, thanks for your outstanding work!
When I was training this model with the default configuration, a size mismatch error came up.
The error is mainly related to the class "SigmoidFocalLoss" and happens right before epoch 0 starts.
Is there a bug, or did I do something wrong?

Error when training

Hi, Thanks for your great work!
But when I try to train a model, it shows the following errors:
Error executing job with overrides: ['+experiment=cvt_nuscenes_road']
Traceback (most recent call last):
File "scripts/train.py", line 77, in
main()
File "/home/liumh/anaconda3/envs/cvt/lib/python3.8/site-packages/hydra/main.py", line 48, in decorated_main
_run_hydra(
File "/home/liumh/anaconda3/envs/cvt/lib/python3.8/site-packages/hydra/_internal/utils.py", line 377, in _run_hydra
run_and_report(
File "/home/liumh/anaconda3/envs/cvt/lib/python3.8/site-packages/hydra/_internal/utils.py", line 214, in run_and_report
raise ex
File "/home/liumh/anaconda3/envs/cvt/lib/python3.8/site-packages/hydra/_internal/utils.py", line 211, in run_and_report
return func()
File "/home/liumh/anaconda3/envs/cvt/lib/python3.8/site-packages/hydra/_internal/utils.py", line 378, in
lambda: hydra.run(
File "/home/liumh/anaconda3/envs/cvt/lib/python3.8/site-packages/hydra/_internal/hydra.py", line 111, in run
_ = ret.return_value
File "/home/liumh/anaconda3/envs/cvt/lib/python3.8/site-packages/hydra/core/utils.py", line 233, in return_value
raise self._return_value
File "/home/liumh/anaconda3/envs/cvt/lib/python3.8/site-packages/hydra/core/utils.py", line 160, in run_job
ret.return_value = task_function(task_cfg)
File "scripts/train.py", line 73, in main
trainer.fit(model_module, datamodule=data_module, ckpt_path=ckpt_path)
File "/home/liumh/anaconda3/envs/cvt/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 768, in fit
self._call_and_handle_interrupt(
File "/home/liumh/anaconda3/envs/cvt/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 719, in _call_and_handle_interrupt
return self.strategy.launcher.launch(trainer_fn, *args, trainer=self, **kwargs)
File "/home/liumh/anaconda3/envs/cvt/lib/python3.8/site-packages/pytorch_lightning/strategies/launchers/subprocess_script.py", line 93, in launch
return function(*args, **kwargs)
File "/home/liumh/anaconda3/envs/cvt/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 809, in _fit_impl
results = self._run(model, ckpt_path=self.ckpt_path)
File "/home/liumh/anaconda3/envs/cvt/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1219, in _run
self._call_callback_hooks("on_fit_start")
File "/home/liumh/anaconda3/envs/cvt/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1634, in _call_callback_hooks
fn(self, self.lightning_module, *args, **kwargs)
File "/home/liumh/anaconda3/envs/cvt/lib/python3.8/site-packages/pytorch_lightning/utilities/rank_zero.py", line 32, in wrapped_fn
return fn(*args, **kwargs)
File "/home/liumh/cross_view_transformers/cross_view_transformer/callbacks/gitdiff_callback.py", line 34, in on_fit_start
diff = git.Repo(PROJECT_ROOT).git.diff()
File "/home/liumh/anaconda3/envs/cvt/lib/python3.8/site-packages/git/repo/base.py", line 224, in init
self.working_dir: Optional[PathLike] = self._working_tree_dir or self.common_dir
File "/home/liumh/anaconda3/envs/cvt/lib/python3.8/site-packages/git/repo/base.py", line 307, in common_dir
raise InvalidGitRepositoryError()
git.exc.InvalidGitRepositoryError

What can I do to solve this? Thank you!

Geometric reasoning in cross-view attention

Thanks for your great work.
May I ask which script corresponds to the "Geometric reasoning in cross-view attention" part of the paper?
And is there a demo script reproducing the attention-value results shown in the paper (Figure 6)?

Questions concerning GPU memory

Hello,

Thanks for your inspiring work, and I am trying to reproduce the reported results with the provided code.

I ran the code as you suggested:
python3 scripts/train.py \
  +experiment=cvt_nuscenes_vehicle \
  data.dataset_dir=/media/datasets/nuscenes \
  data.labels_dir=/media/datasets/cvt_labels_nuscenes

I made a slight change to the batch size, i.e., training on two A100s (CUDA 11) with 8 samples per card, but the results show a large gap from the results in the paper.

Are there any suggestions for debugging this?

Pre-trained model

Hi bradyz, many thanks for sharing this amazing work; the idea is elegant. Currently I am trying to use the code for a 3D object detection task, but it takes a long time to train the model. Would you mind providing a well-pretrained model to speed up the training process?

Question about the implementation of 'camera-aware positional encoding' part

Thank you for sharing your great work with the community :)

According to your published paper, camera location embeddings (tau_{k}) are subtracted from map-view positional encodings (c^{n}) to make map-view queries (c^{n} - tau_{k}).

However, I found from your code that camera location embeddings (tau_{k}) are also subtracted from camera positional embeddings (delta_{k,i}), which is different from equation 3. Please see the last two lines of the following code.

# -------------------------
# translation embedding, tau_{k}
# -------------------------
c = E_inv[..., -1:]                                                     # b n 4 1
c_flat = rearrange(c, 'b n ... -> (b n) ...')[..., None]                # (b n) 4 1 1
c_embed = self.cam_embed(c_flat)                                        # (b n) d 1 1

# -------------------------
# R_{k}^{-1} X K_{k}^{-1} X x_{i}^{(I)}
# -------------------------
pixel_flat = rearrange(pixel, '... h w -> ... (h w)')                   # 1 1 3 (h w)
cam = I_inv @ pixel_flat                                                # b n 3 (h w)
cam = F.pad(cam, (0, 0, 0, 1, 0, 0, 0, 0), value=1)                     # b n 4 (h w)
d = E_inv @ cam                                                         # b n 4 (h w)
d_flat = rearrange(d, 'b n d (h w) -> (b n) d h w', h=h, w=w)           # (b n) 4 h w
d_embed = self.img_embed(d_flat)   

# -------------------------
# Normalization for attention
# -------------------------
# TODO : why subtract c_embed?
img_embed = d_embed - c_embed                                           # (b n) d h w
img_embed = img_embed / (img_embed.norm(dim=1, keepdim=True) + 1e-7)    # (b n) d h w

Am I missing something?

The difference between the paper's figures and the visualized predicted results

Hi, I ran the demo in cross_view_transformers/scripts/example.ipynb, but I found a difference between the paper's figures and the output results.

In my opinion, to reproduce the paper's figure the model would need three labels rather than being a simple binary segmentation model. I look forward to your answer, thank you.

Question about cross attention module

Thanks for sharing the great work!
Have you considered deformable attention? I believe in the paper you compare queries at each map location to keys at each pixel across all six perspective views, right?

Warning: Grad strides do not match bucket view strides & Error: internal database error

When I trained on the whole nuScenes dataset with the source code, the run shut down at epoch 6.

rank_zero_warn(
Epoch 6: 44%|█████████████████████████████████████████████████████▎ | 1866/4270 [00:00<?, ?it/s]

Messages from the terminal:
[W reducer.cpp:347] Warning: Grad strides do not match bucket view strides. This may indicate grad was not created according to the gradient layout contract, or that the param's strides changed since DDP was constructed. This is not an error, but may impair performance.
grad.sizes() = [2, 64, 1, 1], strides() = [64, 1, 64, 64]
bucket_view.sizes() = [2, 64, 1, 1], strides() = [64, 1, 1, 1] (function operator())
[W reducer.cpp:347] Warning: Grad strides do not match bucket view strides. This may indicate grad was not created according to the gradient layout contract, or that the param's strides changed since DDP was constructed. This is not an error, but may impair performance.
grad.sizes() = [2, 64, 1, 1], strides() = [64, 1, 64, 64]
bucket_view.sizes() = [2, 64, 1, 1], strides() = [64, 1, 1, 1] (function operator())

wandb debug logs message:
2022-07-19 21:15:58,526 ERROR MainThread:2484870 [internal_api.py:execute():143] 500 response executing GraphQL.
2022-07-19 21:15:58,527 ERROR MainThread:2484870 [internal_api.py:execute():144] {"error":"internal database error"}

So how should I modify it so that I can train normally?

camera internal and external parameters

Hello, is model training heavily dependent on the camera intrinsic and extrinsic parameters? Will the results be much worse if mine are not accurate?

  I_inv = batch['intrinsics'].inverse()           # b n 3 3
  E_inv = batch['extrinsics'].inverse()           # b n 4 4

Thanks a lot!

Scripts to generate the labels

Hi Brady,

Thanks for your great work and clean coding style! I am wondering whether you still have the scripts used to generate cvt_labels_nuscenes. If so, would you be willing to share them? I look forward to your reply.
