
eda's Introduction

EDA: Explicit Text-Decoupling and Dense Alignment for 3D Visual Grounding (CVPR2023)

By Yanmin Wu, Xinhua Cheng, Renrui Zhang, Zesen Cheng, Jian Zhang*
This repo is the official implementation of "EDA: Explicit Text-Decoupling and Dense Alignment for 3D Visual Grounding". CVPR2023 | arXiv | Code

Figure 1

0. Installation

  • (1) Install the environment from the environment.yml file:
    conda env create -f environment.yml --name EDA
    
    • Or install it manually:
      conda create -n EDA python=3.7
      conda activate EDA
      conda install pytorch==1.9.0 torchvision==0.10.0 cudatoolkit=11.1 -c pytorch -c nvidia
      pip install numpy ipython psutil traitlets transformers termcolor ipdb scipy tensorboardX h5py wandb plyfile tabulate
      
  • (2) Install spaCy for text parsing:
    pip install spacy
    # spaCy 3.3.0 (the en_core_web_sm 3.3.0 model below is built for this version)
    pip install https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.3.0/en_core_web_sm-3.3.0.tar.gz
    
  • (3) Compile PointNet++:
    cd ~/EDA
    sh init.sh
    

1. [TODO] Quick visualization demo

  • Visualization
  • Text-decoupling demo

2. Data preparation

The final required files are as follows:

├── [DATA_ROOT]
│   ├── [1] train_v3scans.pkl            # Packaged ScanNet training set
│   ├── [2] val_v3scans.pkl              # Packaged ScanNet validation set
│   ├── [3] ScanRefer/                   # ScanRefer utterance data
│   │   ├── ScanRefer_filtered_train.json
│   │   ├── ScanRefer_filtered_val.json
│   │   └── ...
│   ├── [4] ReferIt3D/                   # NR3D/SR3D utterance data
│   │   ├── nr3d.csv
│   │   ├── sr3d.csv
│   │   └── ...
│   ├── [5] group_free_pred_bboxes/      # Detected boxes (optional)
│   ├── [6] gf_detector_l6o256.pth       # PointNet++ checkpoint (optional)
│   ├── [7] roberta-base/                # RoBERTa pretrained language model
│   └── [8] checkpoints/                 # EDA pretrained models
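You can check the layout above before training with a small script. This is a minimal sketch and is not part of the EDA repository. The paths are copied from the tree, and `check_data_root` is a hypothetical helper. You may want to drop entries for datasets you do not use, for example the ScanRefer files when training only on SR3D.

```python
from pathlib import Path

# Entries from the [DATA_ROOT] tree; [5] and [6] are marked optional there.
REQUIRED = [
    "train_v3scans.pkl",                        # [1]
    "val_v3scans.pkl",                          # [2]
    "ScanRefer/ScanRefer_filtered_train.json",  # [3]
    "ScanRefer/ScanRefer_filtered_val.json",    # [3]
    "ReferIt3D/nr3d.csv",                       # [4]
    "ReferIt3D/sr3d.csv",                       # [4]
    "roberta-base",                             # [7]
    "checkpoints",                              # [8]
]
OPTIONAL = ["group_free_pred_bboxes", "gf_detector_l6o256.pth"]  # [5], [6]


def check_data_root(data_root):
    """Return (missing required entries, missing optional entries) under data_root."""
    root = Path(data_root)
    missing = [p for p in REQUIRED if not (root / p).exists()]
    missing_optional = [p for p in OPTIONAL if not (root / p).exists()]
    return missing, missing_optional
```

For example, `check_data_root("/path/to/DATA_ROOT")` returns two lists. An empty first list means every required entry is in place.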
  • [1] [2] Prepare ScanNet point cloud data
    • 1) Download the ScanNet v2 data. Follow the ScanNet instructions to apply for dataset access; you will then receive the official download script download-scannet.py. Use the following commands to download the necessary files:
      python2 download-scannet.py -o [SCANNET_PATH] --type _vh_clean_2.ply
      python2 download-scannet.py -o [SCANNET_PATH] --type _vh_clean_2.labels.ply
      python2 download-scannet.py -o [SCANNET_PATH] --type .aggregation.json
      python2 download-scannet.py -o [SCANNET_PATH] --type _vh_clean_2.0.010000.segs.json
      python2 download-scannet.py -o [SCANNET_PATH] --type .txt
      
      Here [SCANNET_PATH] is the output folder. The ScanNet dataset structure should look like this:
      ├── [SCANNET_PATH]
      │   ├── scans
      │   │   ├── scene0000_00
      │   │   │   ├── scene0000_00.txt
      │   │   │   ├── scene0000_00.aggregation.json
      │   │   │   ├── scene0000_00_vh_clean_2.ply
      │   │   │   ├── scene0000_00_vh_clean_2.labels.ply
      │   │   │   ├── scene0000_00_vh_clean_2.0.010000.segs.json
      │   │   ├── scene.......
      
    • 2) Package the above files into two .pkl files (train_v3scans.pkl and val_v3scans.pkl):
      python Pack_scan_files.py --scannet_data [SCANNET_PATH] --data_root [DATA_ROOT]
      
  • [3] ScanRefer: Download the ScanRefer annotations following the instructions HERE and unzip them inside [DATA_ROOT].
  • [4] ReferIt3D: Download the ReferIt3D annotations following the instructions HERE and unzip them inside [DATA_ROOT].
  • [5] group_free_pred_bboxes: Download the object detector's outputs and unzip them inside [DATA_ROOT] (not used by the single-stage method).
  • [6] gf_detector_l6o256.pth: Download the PointNet++ checkpoint into [DATA_ROOT].
  • [7] roberta-base: Download the RoBERTa PyTorch model:
    cd [DATA_ROOT]
    git clone https://huggingface.co/roberta-base
    cd roberta-base
    rm -rf pytorch_model.bin
    wget https://huggingface.co/roberta-base/resolve/main/pytorch_model.bin
    
  • [8] checkpoints: Our pre-trained models (see next step).
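The `rm`/`wget` steps above replace `pytorch_model.bin`. This is presumably because a plain `git clone` without Git LFS leaves only a small text pointer file in place of the weights. The sketch below checks which one you have. `check_roberta_dir` is a hypothetical helper. It relies only on the fact that LFS pointer files begin with the line `version https://git-lfs.github.com/spec/v1`.

```python
from pathlib import Path

# First bytes of every Git LFS pointer file.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path):
    """Return True if `path` is a Git LFS pointer stub rather than the real file."""
    with open(path, "rb") as f:
        head = f.read(len(LFS_POINTER_PREFIX))
    return head == LFS_POINTER_PREFIX


def check_roberta_dir(model_dir):
    """Report whether model_dir/pytorch_model.bin is missing, an LFS stub, or real weights."""
    weights = Path(model_dir) / "pytorch_model.bin"
    if not weights.exists():
        return "missing"
    if is_lfs_pointer(weights):
        return "lfs-pointer"
    return f"ok ({weights.stat().st_size / 1e6:.0f} MB)"
```

Running `check_roberta_dir("[DATA_ROOT]/roberta-base")` (with your actual path) should report `ok` after the `wget` step.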

3. Models

Dataset                  | Acc@0.25 | Acc@0.5 | Model     | Log (train)             | Log (test)
-------------------------|----------|---------|-----------|-------------------------|-----------
ScanRefer                | 54.59    | 42.26   | OneDrive* | 54_59.txt¹ / 54_44.txt² | log.txt
ScanRefer (Single-Stage) | 53.83    | 41.70   | OneDrive  | 53_83.txt¹ / 53_47.txt² | log.txt
SR3D                     | 68.1     | -       | OneDrive  | 68_1.txt¹ / 67_6.txt²   | log.txt
NR3D                     | 52.1     | -       | OneDrive  | 52_1.txt¹ / 54_7.txt²   | log.txt

*: This model is also used to evaluate the new task of grounding without object names, where it reaches 26.5% Acc@0.25 and 21.6% Acc@0.5.
1: Log of the performance reported in the paper.
2: Log from retraining the model with this open-source repository.
Note: For the overall performance, please refer to issue 3.
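Acc@0.25 and Acc@0.5 follow the usual ScanRefer convention. An utterance counts as correctly grounded when the top-1 predicted box reaches an IoU of at least 0.25 (or 0.5) with the ground-truth box. The sketch below illustrates that definition for axis-aligned boxes in (center, size) format. It is an illustration built on that assumption, not EDA's evaluation code, whose box format and bookkeeping may differ.

```python
import numpy as np


def aabb_iou(box_a, box_b):
    """IoU of two axis-aligned 3D boxes, each given as (cx, cy, cz, dx, dy, dz)."""
    a = np.asarray(box_a, dtype=float)
    b = np.asarray(box_b, dtype=float)
    a_min, a_max = a[:3] - a[3:] / 2, a[:3] + a[3:] / 2
    b_min, b_max = b[:3] - b[3:] / 2, b[:3] + b[3:] / 2
    # Per-axis overlap length, clipped at zero when the boxes are disjoint.
    overlap = np.clip(np.minimum(a_max, b_max) - np.maximum(a_min, b_min), 0.0, None)
    inter = overlap.prod()
    union = a[3:].prod() + b[3:].prod() - inter
    return float(inter / union) if union > 0 else 0.0


def acc_at_iou(pred_boxes, gt_boxes, threshold):
    """Fraction of samples whose top-1 predicted box reaches IoU >= threshold with its GT box."""
    hits = [aabb_iou(p, g) >= threshold for p, g in zip(pred_boxes, gt_boxes)]
    return sum(hits) / len(hits)


gt = [(0, 0, 0, 2, 2, 2), (0, 0, 0, 2, 2, 2)]
pred = [(0, 0, 0, 2, 2, 2), (1, 0, 0, 2, 2, 2)]  # exact match, and a box shifted by 1 along x
print(aabb_iou(pred[1], gt[1]))    # 4 / 12, i.e. about 0.333
print(acc_at_iou(pred, gt, 0.25))  # 1.0
print(acc_at_iou(pred, gt, 0.5))   # 0.5
```

The shifted box is the interesting case. With an IoU of about 0.33, it counts as a hit at 0.25 but a miss at 0.5, which is why Acc@0.5 is always at most Acc@0.25.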

4. Training

  • First set the paths for --data_root, --log_dir and --pp_checkpoint in the train_*.sh scripts. By default, we train on four or two 24 GB RTX 3090 GPUs with a batch size of 12.
  • For ScanRefer training
    sh scripts/train_scanrefer.sh
    
  • For ScanRefer (single stage) training
    sh scripts/train_scanrefer_single.sh
    
  • For SR3D training
    sh scripts/train_sr3d.sh
    
  • For NR3D training
    sh scripts/train_nr3d.sh
    

5. Evaluation

  • First set the paths for --data_root, --log_dir and --checkpoint_path in the test_*.sh scripts.
  • For ScanRefer evaluation
    sh scripts/test_scanrefer.sh
    
    • New task, grounding without object names: first download our new annotations, set their path via --wo_obj_name in the script, and then run:
      sh scripts/test_scanrefer_wo_obj_name.sh
      
  • For ScanRefer (single stage) evaluation
    sh scripts/test_scanrefer_single.sh
    
  • For SR3D evaluation
    sh scripts/test_sr3d.sh
    
  • For NR3D evaluation
    sh scripts/test_nr3d.sh
    

6. Acknowledgements

We are grateful for the BUTD-DETR, GroupFree, ScanRefer, and SceneGraphParser projects.

7. Citation

If you find our work useful in your research, please consider citing:

@inproceedings{wu2022eda,
  title={EDA: Explicit Text-Decoupling and Dense Alignment for 3D Visual Grounding},
  author={Wu, Yanmin and Cheng, Xinhua and Zhang, Renrui and Cheng, Zesen and Zhang, Jian},
  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2023}
}

8. Contact

If you have any questions about this project, please feel free to contact Yanmin Wu: wuyanminmax[AT]gmail.com


eda's Issues

Should we add `self.text_encoder.eval()`?

To freeze the text encoder, you set param.requires_grad to False:

self.tokenizer = RobertaTokenizerFast.from_pretrained(t_type, local_files_only=True)
self.text_encoder = RobertaModel.from_pretrained(t_type, local_files_only=True)
for param in self.text_encoder.parameters():
    param.requires_grad = False

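However, setting requires_grad = False only stops the weights from being updated; it does not turn off dropout. To make the Dropout layers in RobertaModel behave as in evaluation (i.e., not randomly drop activations), we need to add self.text_encoder.eval() at the end of the code above: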

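from transformers import RobertaModel, RobertaTokenizerFast

self.tokenizer = RobertaTokenizerFast.from_pretrained(t_type, local_files_only=True)
self.text_encoder = RobertaModel.from_pretrained(t_type, local_files_only=True)
# Freeze the weights: no gradients are computed for the text encoder
for param in self.text_encoder.parameters():
    param.requires_grad = False
# Also switch to eval mode so the Dropout layers stop dropping activations
self.text_encoder.eval()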
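There is one caveat worth checking. `nn.Module.train()` recursively switches every submodule back to training mode. A single `.eval()` call in `__init__` is therefore undone as soon as the training loop calls `model.train()`. The sketch below shows one common workaround: override `train()` so the frozen encoder always stays in eval mode. The small `Linear` + `Dropout` stack is a hypothetical stand-in for `RobertaModel`. Whether EDA's training loop calls `model.train()` every epoch has not been checked here.

```python
import torch
from torch import nn


def make_frozen_encoder():
    """Hypothetical stand-in for RobertaModel: Linear + Dropout with frozen weights, in eval mode."""
    encoder = nn.Sequential(nn.Linear(8, 8), nn.Dropout(p=0.5))
    for param in encoder.parameters():
        param.requires_grad = False
    encoder.eval()
    return encoder


class NaiveModel(nn.Module):
    """Calls .eval() once at construction time, as proposed in the issue."""

    def __init__(self):
        super().__init__()
        self.text_encoder = make_frozen_encoder()
        self.head = nn.Linear(8, 2)


class FrozenEncoderModel(NaiveModel):
    """Keeps the frozen encoder in eval mode whenever train() or eval() is called."""

    def train(self, mode=True):
        super().train(mode)       # switches every submodule, including text_encoder
        self.text_encoder.eval()  # put the frozen encoder back into eval mode
        return self


naive, fixed = NaiveModel(), FrozenEncoderModel()
naive.train()
fixed.train()
print(naive.text_encoder.training)  # True: dropout is active again
print(fixed.text_encoder.training)  # False: dropout stays disabled

x = torch.ones(1, 8)
with torch.no_grad():
    print(torch.equal(fixed.text_encoder(x), fixed.text_encoder(x)))  # True: deterministic outputs
```

`nn.Module.eval()` is implemented as `self.train(False)`, so the override also covers evaluation. The rest of the model still switches modes normally.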

Question about "point_instance_label"

Hi, thanks for your great work.

I have a question about "point_instance_label" from reading the code.

In src/joint_det_dataset.py, function _get_target_boxes:

point_instance_label = -np.ones(len(scan.pc))
for t, tid in enumerate(tids):
    point_instance_label[scan.three_d_objects[tid]['points']] = t

Is it correct to set each point's label to the sequence index t?

In my opinion, the code should be modified as follows:

point_instance_label = -np.ones(len(scan.pc))
for t, tid in enumerate(tids):
    point_instance_label[scan.three_d_objects[tid]['points']] = tid 

In this way, all points in the same scene can be correctly classified into the corresponding object IDs.

If point_instance_label is set to t, then on the ScanRefer dataset with joint_det set to false, every labeled point can only receive label 0 (presumably because tids then holds only the target object), which leads to logic errors.
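The difference between the two conventions shows up on a toy scene. In this sketch the object ids and point indices are hypothetical, not ScanNet data. With a single target in `tids`, labeling by `t` always yields 0, while labeling by `tid` yields the object id. Which convention is correct depends on how `point_instance_label` is used downstream. If it indexes the per-sample target boxes built in the same `tids` order, `t` would be the consistent choice, so check the downstream code before changing it.

```python
import numpy as np

# Hypothetical toy scene: 10 points and three objects (ids 3, 5, 7) with their point indices.
NUM_POINTS = 10
OBJECT_POINTS = {3: [0, 1, 2], 5: [4, 5], 7: [8, 9]}


def instance_labels(tids, use_object_id):
    """Label points by position t in tids (use_object_id=False) or by object id tid; -1 = unlabeled."""
    labels = -np.ones(NUM_POINTS)
    for t, tid in enumerate(tids):
        labels[OBJECT_POINTS[tid]] = tid if use_object_id else t
    return labels


# Single target object, as in the ScanRefer / joint_det=False case described above.
print(instance_labels([5], use_object_id=False))  # points 4 and 5 get label 0
print(instance_labels([5], use_object_id=True))   # points 4 and 5 get label 5
# Several objects: with t, labels follow the order of tids (7 -> 0, 3 -> 1).
print(instance_labels([7, 3], use_object_id=False))
```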

Question about performance on ScanRefer

When I run train_scanrefer.sh, the results in the output log are extremely bad, and I don't know why:

last_ position alignment Acc0.25: Top-1: 0.00032, Top-5: 0.00294, Top-10: 0.00936
last_ position alignment Acc0.50: Top-1: 0.00000, Top-5: 0.00000, Top-10: 0.00000
last_ semantic alignment Acc0.25: Top-1: 0.00179, Top-5: 0.00947, Top-10: 0.01536
last_ semantic alignment Acc0.50: Top-1: 0.00000, Top-5: 0.00032, Top-10: 0.00032
proposal_ position alignment Acc0.25: Top-1: 0.00000, Top-5: 0.00000, Top-10: 0.00000
proposal_ position alignment Acc0.50: Top-1: 0.00000, Top-5: 0.00000, Top-10: 0.00000
proposal_ semantic alignment Acc0.25: Top-1: 0.00000, Top-5: 0.00000, Top-10: 0.00000
proposal_ semantic alignment Acc0.50: Top-1: 0.00000, Top-5: 0.00000, Top-10: 0.00000
0head_ position alignment Acc0.25: Top-1: 0.00000, Top-5: 0.00000, Top-10: 0.00000
0head_ position alignment Acc0.50: Top-1: 0.00000, Top-5: 0.00000, Top-10: 0.00000
0head_ semantic alignment Acc0.25: Top-1: 0.00000, Top-5: 0.00000, Top-10: 0.00000
0head_ semantic alignment Acc0.50: Top-1: 0.00000, Top-5: 0.00000, Top-10: 0.00000
1head_ position alignment Acc0.25: Top-1: 0.00000, Top-5: 0.00000, Top-10: 0.00000
1head_ position alignment Acc0.50: Top-1: 0.00000, Top-5: 0.00000, Top-10: 0.00000
1head_ semantic alignment Acc0.25: Top-1: 0.00000, Top-5: 0.00000, Top-10: 0.00000
1head_ semantic alignment Acc0.50: Top-1: 0.00000, Top-5: 0.00000, Top-10: 0.00000
2head_ position alignment Acc0.25: Top-1: 0.00000, Top-5: 0.00000, Top-10: 0.00000
2head_ position alignment Acc0.50: Top-1: 0.00000, Top-5: 0.00000, Top-10: 0.00000
2head_ semantic alignment Acc0.25: Top-1: 0.00000, Top-5: 0.00000, Top-10: 0.00000
2head_ semantic alignment Acc0.50: Top-1: 0.00000, Top-5: 0.00000, Top-10: 0.00000
3head_ position alignment Acc0.25: Top-1: 0.00000, Top-5: 0.00000, Top-10: 0.00000
3head_ position alignment Acc0.50: Top-1: 0.00000, Top-5: 0.00000, Top-10: 0.00000
3head_ semantic alignment Acc0.25: Top-1: 0.00000, Top-5: 0.00000, Top-10: 0.00000
3head_ semantic alignment Acc0.50: Top-1: 0.00000, Top-5: 0.00000, Top-10: 0.00000
4head_ position alignment Acc0.25: Top-1: 0.00000, Top-5: 0.00000, Top-10: 0.00000
4head_ position alignment Acc0.50: Top-1: 0.00000, Top-5: 0.00000, Top-10: 0.00000
4head_ semantic alignment Acc0.25: Top-1: 0.00000, Top-5: 0.00000, Top-10: 0.00000
4head_ semantic alignment Acc0.50: Top-1: 0.00000, Top-5: 0.00000, Top-10: 0.00000

Analysis
Acc@0.25
easy 0.0
hard 0.002426491578646874
vd 0.0014448257178977786
vid 0.002266431629312516
unique 0.0
multi 0.0021016194832488566
Acc@0.5
easy50 0.0
hard50 0.0
vd50 0.0
vid50 0.0
unique50 0.0
multi50 0.0

Ask for visualization code

Could you share the visualization code for plotting bounding boxes on the RGB-D data and saving them as 2D RGB images, as shown in your figures?
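The repository does not include this code (the demo is marked TODO above). Still, projecting a 3D box into an RGB frame is a standard pinhole-camera computation. The sketch below assumes ScanNet-style inputs:

- a 4x4 camera-to-world pose for the frame;
- the intrinsic matrix of the image you draw on (3x3 or 4x4; only the top-left entries are read);
- an axis-aligned box given as center and size in the same world frame as the pose.

If the boxes live in ScanNet's axis-aligned frame (the `axisAlignment` matrix in each scene's .txt file), first map them back with the inverse of that matrix. All function names are hypothetical.

```python
import numpy as np


def box_corners(center, size):
    """Return the 8 corners (8, 3) of an axis-aligned box given center (x, y, z) and size (dx, dy, dz)."""
    center = np.asarray(center, dtype=float)
    half = np.asarray(size, dtype=float) / 2.0
    # Corner index = 4*bx + 2*by + bz, where each bit selects the -half (0) or +half (1) side.
    signs = np.array([[sx, sy, sz] for sx in (-1, 1) for sy in (-1, 1) for sz in (-1, 1)], dtype=float)
    return center + signs * half


# The 12 box edges join corners whose indices differ in exactly one bit.
BOX_EDGES = [(i, j) for i in range(8) for j in range(i + 1, 8) if bin(i ^ j).count("1") == 1]


def project_to_image(points_world, cam_to_world, intrinsics):
    """Pinhole projection of world-frame points; returns (N, 2) pixel coords and an in-front-of-camera mask."""
    points_world = np.asarray(points_world, dtype=float)
    intrinsics = np.asarray(intrinsics, dtype=float)
    world_to_cam = np.linalg.inv(np.asarray(cam_to_world, dtype=float))
    homogeneous = np.hstack([points_world, np.ones((len(points_world), 1))])
    cam = (world_to_cam @ homogeneous.T).T[:, :3]
    in_front = cam[:, 2] > 1e-6
    z = np.where(in_front, cam[:, 2], 1.0)  # placeholder depth for points behind the camera
    fx, fy = intrinsics[0, 0], intrinsics[1, 1]
    cx, cy = intrinsics[0, 2], intrinsics[1, 2]
    u = fx * cam[:, 0] / z + cx
    v = fy * cam[:, 1] / z + cy
    return np.stack([u, v], axis=1), in_front


# Toy check: camera at the origin looking along +z, unit cube 2 m in front of it.
K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
pixels, visible = project_to_image(box_corners((0, 0, 2), (1, 1, 1)), np.eye(4), K)
```

To draw the box, connect `pixels[i]` and `pixels[j]` for each `(i, j)` in `BOX_EDGES`, for example with OpenCV's `cv2.line`. Skip edges where either endpoint has `visible` set to False.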

The log does not contain evaluation results. Can the number of epochs be reduced? Which metric represents the overall performance?

  1. The log file only contains loss information, while the evaluation results are only printed to the terminal, which is very inconvenient for a 400-epoch run. I have to test each epoch again.
  2. Are 400 epochs necessary? In my experiments, the validation loss began to show overfitting around epoch 50. Can the number of epochs be set to 75 or another relatively small value? Will that affect the results? Have you run any such experiments?
  3. How long does a complete 400-epoch run take?
  4. Your public training log only reaches epoch 72. In issue 3, I see that you load the epoch-60 checkpoint and that the overall result comes from last_ semantic alignment Acc0.25: Top-1: 0.54586.
    But in the training log, the results at epoch 60 are:
last_ Box given span (soft-token) Acc0.50: Top-1: 0.000, Top-5: 0.000, Top-10: 0.000
last_ Box given span (contrastive) Acc0.25: Top-1: 0.546, Top-5: 0.680, Top-10: 0.736
last_ Box given span (contrastive) Acc0.50: Top-1: 0.423, Top-5: 0.573, Top-10: 0.627
proposal_ Box given span (soft-token) Acc0.25: Top-1: 0.000, Top-5: 0.000, Top-10: 0.000
proposal_ Box given span (soft-token) Acc0.50: Top-1: 0.000, Top-5: 0.000, Top-10: 0.000
proposal_ Box given span (contrastive) Acc0.25: Top-1: 0.000, Top-5: 0.000, Top-10: 0.000
proposal_ Box given span (contrastive) Acc0.50: Top-1: 0.000, Top-5: 0.000, Top-10: 0.000
0head_ Box given span (soft-token) Acc0.25: Top-1: 0.000, Top-5: 0.000, Top-10: 0.000
0head_ Box given span (soft-token) Acc0.50: Top-1: 0.000, Top-5: 0.000, Top-10: 0.000
0head_ Box given span (contrastive) Acc0.25: Top-1: 0.000, Top-5: 0.000, Top-10: 0.000
0head_ Box given span (contrastive) Acc0.50: Top-1: 0.000, Top-5: 0.000, Top-10: 0.000
1head_ Box given span (soft-token) Acc0.25: Top-1: 0.000, Top-5: 0.000, Top-10: 0.000
1head_ Box given span (soft-token) Acc0.50: Top-1: 0.000, Top-5: 0.000, Top-10: 0.000
1head_ Box given span (contrastive) Acc0.25: Top-1: 0.000, Top-5: 0.000, Top-10: 0.000
1head_ Box given span (contrastive) Acc0.50: Top-1: 0.000, Top-5: 0.000, Top-10: 0.000
2head_ Box given span (soft-token) Acc0.25: Top-1: 0.000, Top-5: 0.000, Top-10: 0.000
2head_ Box given span (soft-token) Acc0.50: Top-1: 0.000, Top-5: 0.000, Top-10: 0.000
2head_ Box given span (contrastive) Acc0.25: Top-1: 0.000, Top-5: 0.000, Top-10: 0.000
2head_ Box given span (contrastive) Acc0.50: Top-1: 0.000, Top-5: 0.000, Top-10: 0.000
3head_ Box given span (soft-token) Acc0.25: Top-1: 0.000, Top-5: 0.000, Top-10: 0.000
3head_ Box given span (soft-token) Acc0.50: Top-1: 0.000, Top-5: 0.000, Top-10: 0.000
3head_ Box given span (contrastive) Acc0.25: Top-1: 0.000, Top-5: 0.000, Top-10: 0.000
3head_ Box given span (contrastive) Acc0.50: Top-1: 0.000, Top-5: 0.000, Top-10: 0.000
4head_ Box given span (soft-token) Acc0.25: Top-1: 0.000, Top-5: 0.000, Top-10: 0.000
4head_ Box given span (soft-token) Acc0.50: Top-1: 0.000, Top-5: 0.000, Top-10: 0.000
4head_ Box given span (contrastive) Acc0.25: Top-1: 0.000, Top-5: 0.000, Top-10: 0.000
4head_ Box given span (contrastive) Acc0.50: Top-1: 0.000, Top-5: 0.000, Top-10: 0.000
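Point 1 asks how to keep evaluation results alongside the losses. One option is to send everything through Python's standard `logging` module, with one handler for the console and one for a file. This is a generic sketch, and `setup_logger` and the file name are hypothetical. The code that prints EDA's evaluation summary would still need to be routed through this logger; how the repository's own logging is set up has not been checked here.

```python
import logging
import sys


def setup_logger(log_file, name="eda_eval"):
    """Return a logger that writes every message to both the terminal and log_file."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    for old in list(logger.handlers):  # avoid duplicate handlers if called twice
        logger.removeHandler(old)
        old.close()
    logger.propagate = False  # do not also pass messages to the root logger
    formatter = logging.Formatter("%(asctime)s %(message)s")
    for handler in (logging.StreamHandler(sys.stdout), logging.FileHandler(log_file)):
        handler.setFormatter(formatter)
        logger.addHandler(handler)
    return logger


logger = setup_logger("eval_log.txt")
logger.info("epoch %d last_ semantic alignment Acc0.25: Top-1: %.5f", 60, 0.54586)
```

`FileHandler` appends by default, so results from repeated evaluations accumulate in one file.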

Why is one segment and one box?

Problem with evaluation

When evaluating the model, how can I get the overall accuracy? I can't find it in the output log.
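An earlier issue here reports that the overall ScanRefer number was taken from the `last_ semantic alignment Acc0.25: Top-1` line. The note under the model table also points to issue 3 for this question. A small parser can pull that value out of a saved terminal log. This sketch assumes the line format shown in the pasted logs: `<head> <metric name> Acc<threshold>: Top-1: <value>, ...`. Older logs use names such as `Box given span (contrastive)` instead, so pass the name that matches your log.

```python
import re

# Matches lines such as "last_ semantic alignment Acc0.25: Top-1: 0.54586, Top-5: ..."
LINE_RE = re.compile(
    r"^(?P<head>\S+)\s+(?P<name>.+?)\s+Acc(?P<thr>\d+\.\d+):\s+Top-1:\s+(?P<top1>\d+(?:\.\d+)?)"
)


def top1_scores(log_text, head="last_", name="semantic alignment", threshold="0.25"):
    """Return every Top-1 value for one head/metric/threshold, in log order (one per evaluation)."""
    scores = []
    for line in log_text.splitlines():
        m = LINE_RE.match(line.strip())
        if m and (m["head"], m["name"], m["thr"]) == (head, name, threshold):
            scores.append(float(m["top1"]))
    return scores
```

For example, `max(top1_scores(open("train.log").read()))` returns the best value in a saved log (the file name is hypothetical).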

group_free_pred_bboxes

How did you obtain these bboxes? Are the box sizes normalized? I cannot visualize them on the ScanNetv2 dataset.

Question about visualization

Thanks for your excellent work. I have run your code, but since I only recently entered this research field, I have trouble visualizing your results, such as the detection boxes in Figure 1. Would you please release the visualization code? Thank you very much!

ImportError: Could not import _ext module.

I created the environment as the README describes and trained on SR3D; here are the errors:

/home/sd/anaconda3/envs/EDA/lib/python3.7/site-packages/torch/distributed/launch.py:164: DeprecationWarning: The 'warn' method is deprecated, use 'warning' instead
"The module torch.distributed.launch is deprecated "
The module torch.distributed.launch is deprecated and going to be removed in future.Migrate to torch.distributed.run


Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.


WARNING:torch.distributed.run:--use_env is deprecated and will be removed in future releases.
Please read local_rank from os.environ('LOCAL_RANK') instead.
INFO:torch.distributed.launcher.api:Starting elastic_operator with launch configs:
entrypoint : train_dist_mod.py
min_nodes : 1
max_nodes : 1
nproc_per_node : 2
run_id : none
rdzv_backend : static
rdzv_endpoint : 127.0.0.1:3333
rdzv_configs : {'rank': 0, 'timeout': 900}
max_restarts : 3
monitor_interval : 5
log_dir : None
metrics_cfg : {}

INFO:torch.distributed.elastic.agent.server.local_elastic_agent:log directory set to: /tmp/torchelastic_2x3bsv88/none_l9qwazu3
INFO:torch.distributed.elastic.agent.server.api:[default] starting workers for entrypoint: python
INFO:torch.distributed.elastic.agent.server.api:[default] Rendezvous'ing worker group
/home/sd/anaconda3/envs/EDA/lib/python3.7/site-packages/torch/distributed/elastic/utils/store.py:53: FutureWarning: This is an experimental API and will be changed in future.
"This is an experimental API and will be changed in future.", FutureWarning
INFO:torch.distributed.elastic.agent.server.api:[default] Rendezvous complete for workers. Result:
restart_count=0
master_addr=127.0.0.1
master_port=3333
group_rank=0
group_world_size=1
local_ranks=[0, 1]
role_ranks=[0, 1]
global_ranks=[0, 1]
role_world_sizes=[2, 2]
global_world_sizes=[2, 2]

INFO:torch.distributed.elastic.agent.server.api:[default] Starting worker group
INFO:torch.distributed.elastic.multiprocessing:Setting worker0 reply file to: /tmp/torchelastic_2x3bsv88/none_l9qwazu3/attempt_0/0/error.json
INFO:torch.distributed.elastic.multiprocessing:Setting worker1 reply file to: /tmp/torchelastic_2x3bsv88/none_l9qwazu3/attempt_0/1/error.json
Traceback (most recent call last):
File "/home/sd/Harddisk/sba/BS/EDA-master/pointnet2/pointnet2_utils.py", line 26, in
import pointnet2._ext as _ext
ImportError: /home/sd/.local/lib/python3.7/site-packages/pointnet2-0.0.0-py3.7-linux-x86_64.egg/pointnet2/_ext.cpython-37m-x86_64-linux-gnu.so: undefined symbol: _ZNSt15__exception_ptr13exception_ptr9_M_addrefEv

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "train_dist_mod.py", line 19, in
from main_utils import parse_option, BaseTrainTester
File "/home/sd/Harddisk/sba/BS/EDA-master/main_utils.py", line 27, in
from models import HungarianMatcher, SetCriterion, compute_hungarian_loss
File "/home/sd/Harddisk/sba/BS/EDA-master/models/init.py", line 7, in
from .bdetr import BeaUTyDETR
File "/home/sd/Harddisk/sba/BS/EDA-master/models/bdetr.py", line 18, in
from .backbone_module import Pointnet2Backbone
File "/home/sd/Harddisk/sba/BS/EDA-master/models/backbone_module.py", line 23, in
from pointnet2_modules import PointnetSAModuleVotes, PointnetFPModule
File "/home/sd/Harddisk/sba/BS/EDA-master/pointnet2/pointnet2_modules.py", line 21, in
import pointnet2_utils
File "/home/sd/Harddisk/sba/BS/EDA-master/pointnet2/pointnet2_utils.py", line 30, in
"Could not import _ext module.\n"
ImportError: Could not import _ext module.
Please see the setup instructions in the README: https://github.com/erikwijmans/Pointnet2_PyTorch/blob/master/README.rst
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 1972860) of binary: /home/sd/anaconda3/envs/EDA/bin/python
ERROR:torch.distributed.elastic.agent.server.local_elastic_agent:[default] Worker group failed
INFO:torch.distributed.elastic.agent.server.api:[default] Worker group FAILED. 3/3 attempts left; will restart worker group
INFO:torch.distributed.elastic.agent.server.api:[default] Stopping worker group
INFO:torch.distributed.elastic.agent.server.api:Local worker group finished (FAILED). Waiting 300 seconds for other agents to finish
/home/sd/anaconda3/envs/EDA/lib/python3.7/site-packages/torch/distributed/elastic/utils/store.py:71: FutureWarning: This is an experimental API and will be changed in future.
"This is an experimental API and will be changed in future.", FutureWarning
INFO:torch.distributed.elastic.agent.server.api:Done waiting for other agents. Elapsed: 0.0007307529449462891 seconds
{"name": "torchelastic.worker.status.FAILED", "source": "WORKER", "timestamp": 0, "metadata": {"run_id": "none", "global_rank": 0, "group_rank": 0, "worker_id": "1972968", "role": "default", "hostname": "Continent", "state": "FAILED", "total_run_time": 20, "rdzv_backend": "static", "raw_error": "{\"message\": \"\"}", "metadata": "{\"group_world_size\": 1, \"entry_point\": \"python\", \"local_rank\": [0], \"role_rank\": [0], \"role_world_size\": [2]}", "agent_restarts": 3}}
{"name": "torchelastic.worker.status.FAILED", "source": "WORKER", "timestamp": 0, "metadata": {"run_id": "none", "global_rank": 1, "group_rank": 0, "worker_id": "1972969", "role": "default", "hostname": "Continent", "state": "FAILED", "total_run_time": 20, "rdzv_backend": "static", "raw_error": "{\"message\": \"\"}", "metadata": "{\"group_world_size\": 1, \"entry_point\": \"python\", \"local_rank\": [1], \"role_rank\": [1], \"role_world_size\": [2]}", "agent_restarts": 3}}
{"name": "torchelastic.worker.status.SUCCEEDED", "source": "AGENT", "timestamp": 0, "metadata": {"run_id": "none", "global_rank": null, "group_rank": 0, "worker_id": null, "role": "default", "hostname": "Continent", "state": "SUCCEEDED", "total_run_time": 20, "rdzv_backend": "static", "raw_error": null, "metadata": "{\"group_world_size\": 1, \"entry_point\": \"python\"}", "agent_restarts": 3}}
/home/sd/anaconda3/envs/EDA/lib/python3.7/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py:354: UserWarning:


           CHILD PROCESS FAILED WITH NO ERROR_FILE                

CHILD PROCESS FAILED WITH NO ERROR_FILE
Child process 1972968 (local_rank 0) FAILED (exitcode 1)
Error msg: Process failed with exitcode 1
Without writing an error file to <N/A>.
While this DOES NOT affect the correctness of your application,
no trace information about the error will be available for inspection.
Consider decorating your top level entrypoint function with
torch.distributed.elastic.multiprocessing.errors.record. Example:

from torch.distributed.elastic.multiprocessing.errors import record

@record
def trainer_main(args):
    # do train


warnings.warn(_no_error_file_warning_msg(rank, failure))
Traceback (most recent call last):
  File "/home/sd/anaconda3/envs/EDA/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/sd/anaconda3/envs/EDA/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/sd/anaconda3/envs/EDA/lib/python3.7/site-packages/torch/distributed/launch.py", line 173, in <module>
    main()
  File "/home/sd/anaconda3/envs/EDA/lib/python3.7/site-packages/torch/distributed/launch.py", line 169, in main
    run(args)
  File "/home/sd/anaconda3/envs/EDA/lib/python3.7/site-packages/torch/distributed/run.py", line 624, in run
    )(*cmd_args)
  File "/home/sd/anaconda3/envs/EDA/lib/python3.7/site-packages/torch/distributed/launcher/api.py", line 116, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/home/sd/anaconda3/envs/EDA/lib/python3.7/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 348, in wrapper
    return f(*args, **kwargs)
  File "/home/sd/anaconda3/envs/EDA/lib/python3.7/site-packages/torch/distributed/launcher/api.py", line 247, in launch_agent
    failures=result.failures,
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:


    train_dist_mod.py FAILED       

=======================================
Root Cause:
[0]:
time: 2023-12-23_12:33:23
rank: 0 (local_rank: 0)
exitcode: 1 (pid: 1972968)
error_file: <N/A>
msg: "Process failed with exitcode 1"

Other Failures:
[1]:
time: 2023-12-23_12:33:23
rank: 1 (local_rank: 1)
exitcode: 1 (pid: 1972969)
error_file: <N/A>
msg: "Process failed with exitcode 1"


Can anybody help me?
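One reading of the traceback above, offered as a hypothesis rather than a confirmed diagnosis: the compiled `pointnet2/_ext` library is loaded from the user site-packages (`/home/sd/.local/lib/python3.7/site-packages/...egg`) rather than from the `EDA` conda environment whose Python runs the job, and the missing symbol is a mangled libstdc++ name (`std::__exception_ptr::exception_ptr::_M_addref()`). Both are common signs that the extension was built with a different compiler or C++ runtime than the one loaded at run time. The stdlib-only helper below (the name `diagnose_ext_import_error` is hypothetical) pulls both facts out of such an error message.

```python
import re
import site

# Matches "<absolute path>.so: undefined symbol: <mangled name>".
_UNDEF_RE = re.compile(r"(?P<path>/\S+\.so): undefined symbol: (?P<symbol>\S+)")


def diagnose_ext_import_error(message, user_site=None):
    """Extract hints from an 'undefined symbol' ImportError message.

    Heuristics only, not a definitive diagnosis:
      * a mangled name starting with _ZNSt or _ZSt lives in namespace std,
        which usually points at a libstdc++ / compiler mismatch;
      * a .so under the user site-packages may shadow a build made inside
        the active conda environment.
    """
    match = _UNDEF_RE.search(message)
    if match is None:
        return {"matched": False}
    if user_site is None:
        user_site = site.getusersitepackages()
    path = match.group("path")
    symbol = match.group("symbol")
    return {
        "matched": True,
        "path": path,
        "symbol": symbol,
        "std_library_symbol": symbol.startswith(("_ZNSt", "_ZSt")),
        "from_user_site": path.startswith(user_site),
    }


if __name__ == "__main__":
    try:
        import pointnet2._ext  # noqa: F401  (the compiled extension itself)
        print("pointnet2._ext imported fine")
    except ImportError as err:
        print(diagnose_ext_import_error(str(err)))
```

If both flags come back true, a reasonable first step (again an assumption, not a verified fix) is to remove the stale egg from `~/.local` and rebuild inside the activated `EDA` environment, for example by rerunning `sh init.sh` from the installation section.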

Problem with text labels

As mentioned in the paper, each text is decoupled into five parts with corresponding labels. How can I get these labels in the code? I have tried 'positive_map', 'modify_positive_map', 'pron_positive_map', 'other_entity_map', and 'rel_positive_map', but they don't seem to correspond to the texts. I would really appreciate a reply to this question.
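A guess at why the maps look unrelated to the text: in DETR-style grounding code, a positive map is usually a `[num_targets, num_tokens]` array indexed by tokenizer positions (RoBERTa sub-word tokens here), not by words, so reading it requires the tokenizer's character offsets. Whether EDA's `positive_map`, `modify_positive_map`, etc. follow this layout is an assumption. The sketch below (hypothetical function `tokens_marked_by`) decodes one row given per-token `(start, end)` offsets, such as those a Hugging Face fast tokenizer returns when called with `return_offsets_mapping=True`.

```python
def tokens_marked_by(positive_row, token_spans, text, threshold=0.0):
    """Return the text fragments whose tokens have weight above `threshold`.

    positive_row: per-token weights for one target (one row of a positive map)
    token_spans:  (start, end) character offsets, one per token position
    text:         the utterance the offsets refer to
    """
    fragments = []
    for weight, (start, end) in zip(positive_row, token_spans):
        # Special tokens such as <s> and </s> usually get an empty (0, 0) span.
        if weight > threshold and end > start:
            fragments.append(text[start:end])
    return fragments


text = "the chair next to the table"
# Hypothetical offsets: <s>, six word tokens, </s>.
spans = [(0, 0), (0, 3), (4, 9), (10, 14), (15, 17), (18, 21), (22, 27), (0, 0)]
row = [0, 0, 1.0, 0, 0, 0, 0, 0]  # hypothetical map row pointing at "chair"
print(tokens_marked_by(row, spans, text))  # ['chair']
```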

Problem with the learning rate

Thanks for your excellent work! While running experiments, I found that the model is particularly sensitive to the learning rate. When training on one 3090 GPU without adjusting the lr, the results are as follows.
The long line below is the baseline; the other, shorter lines are the results of training on a single GPU without adjusting the lr.
[Figures: training curves of the baseline and the single-GPU runs (images not preserved)]
Even when training on two 3090 GPUs, scaling the learning rate (e.g., multiplying it by 2 or by 1.4) shows the same phenomenon.
Do you have any advice on this problem? Thank you very much!
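For reference, the two usual heuristics for adapting the learning rate to a new effective batch size (per-GPU batch × number of GPUs) are linear scaling and square-root scaling; a factor of 1.4 is close to √2, i.e. square-root scaling for a doubled batch. The sketch below computes both. The batch sizes in the example are placeholders, not EDA's actual settings, and neither rule is guaranteed to reproduce the reference curves; warmup or schedule length may also need adjusting.

```python
import math


def scale_lr(base_lr, base_batch, new_batch, rule="linear"):
    """Rescale a learning rate for a new effective batch size.

    rule="linear": lr proportional to the batch size
    rule="sqrt":   lr proportional to the square root of the batch size
    """
    if base_batch <= 0 or new_batch <= 0:
        raise ValueError("batch sizes must be positive")
    ratio = new_batch / base_batch
    if rule == "linear":
        return base_lr * ratio
    if rule == "sqrt":
        return base_lr * math.sqrt(ratio)
    raise ValueError(f"unknown rule: {rule!r}")


# Placeholder numbers: a reference run with effective batch 24 moved to 12.
print(scale_lr(1e-3, 24, 12))          # 0.0005
print(scale_lr(1e-3, 24, 12, "sqrt"))  # about 0.000707
```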

How can --use_contrastive_align be disabled?

Hello,
I am trying to check performance without the use of semantic alignment loss, but it seems as though it is necessary to calculate proj_tokens for grounding evaluation. How can I change the setting to replicate the ablation experiment in your paper regarding the loss functions?
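One general way to run such an ablation, sketched under assumptions: keep computing the projected tokens in the forward pass (so grounding evaluation still works) and drop only their loss term from the training objective. The loss names (`box`, `grounding`, `contrastive`) and the helper `total_loss` are hypothetical and do not come from the EDA code; the flag is modeled here as an argparse `store_true` switch.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser()
    # store_true: the option is False unless the flag is passed explicitly.
    parser.add_argument("--use_contrastive_align", action="store_true")
    return parser


def total_loss(losses, use_contrastive_align, contrastive_weight=1.0):
    """Sum the training loss terms, gating only the contrastive one.

    The model's forward pass is assumed to produce the projected tokens
    regardless of the flag, so evaluation code that reads them still works;
    the flag only decides whether their loss reaches the optimizer.
    """
    total = losses["box"] + losses["grounding"]
    if use_contrastive_align:
        total = total + contrastive_weight * losses["contrastive"]
    return total


args = build_parser().parse_args([])  # flag omitted -> ablation run
example = {"box": 1.0, "grounding": 2.0, "contrastive": 5.0}
print(total_loss(example, args.use_contrastive_align))  # 3.0
```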

Overfitting when training with 2 GPUs

Hi,
I tried to train EDA on ScanRefer with 2 A100 GPUs following all your settings, but the performance is about 2% lower than yours. I also observed overfitting, as the figure shows: after about 21 epochs, the training losses keep decreasing, but the validation losses are rising.
[Image: training and validation loss curves (not preserved)]
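Independent of the cause of the 2% gap, a common guard against this pattern is to keep the checkpoint with the best validation score and stop once it has not improved for a set number of evaluations. A minimal sketch follows; the class name and the `patience` value are illustrative, and `mode="max"` would be used for an accuracy such as Acc@0.25 instead of a loss.

```python
class BestCheckpointTracker:
    """Track the best validation value and signal when to stop."""

    def __init__(self, patience=5, mode="min"):
        if mode not in ("min", "max"):
            raise ValueError("mode must be 'min' or 'max'")
        self.patience = patience
        self.mode = mode
        self.best = None
        self.best_epoch = None
        self.bad_evals = 0
        self.improved = False  # True right after a new best (save a checkpoint then)

    def update(self, epoch, value):
        """Record one validation result; return True if training should stop."""
        if self.best is None:
            self.improved = True
        elif self.mode == "min":
            self.improved = value < self.best
        else:
            self.improved = value > self.best
        if self.improved:
            self.best, self.best_epoch, self.bad_evals = value, epoch, 0
        else:
            self.bad_evals += 1
        return self.bad_evals >= self.patience


tracker = BestCheckpointTracker(patience=3, mode="min")
for epoch, val_loss in enumerate([1.0, 0.8, 0.7, 0.75, 0.9, 0.95]):
    if tracker.update(epoch, val_loss):
        break
print(tracker.best_epoch, tracker.best)  # 2 0.7
```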

I am confused about the log.txt file

  1. I downloaded your publicly available log file log_67_6.txt for Sr3D and noticed something strange: the max_epoch parameter recorded in the log is 300, but the log shows that training lasted only 48 epochs. Can you explain why? Did you stop the program manually?

  2. When I ran the command sh scripts/train_sr3d.sh, the log.txt showed that the max_epoch was 400, and the training seemed to really continue towards 400 epochs. Therefore, I had to manually stop the program at the 75th epoch.

  3. In your published log file, log_67_6.txt, detailed validation results are output every 3 epochs, but this does not happen when I run sh scripts/train_sr3d.sh. I have included a link to my log file for your reference.

Problem with text decoupling

Text decoupling takes a lot of time every time training starts, which makes debugging the code inconvenient. How can we improve this? Thanks for your great work.
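A common workaround, sketched under assumptions: run the text parser once per unique utterance and cache its output on disk, so later runs load results instead of re-parsing. The class `ParseCache` and the `parse_fn` argument are hypothetical stand-ins for whatever function performs the decoupling in EDA, and the parse result is assumed to be JSON-serializable (dicts, lists, strings, numbers; tuples come back as lists after a reload).

```python
import json
import os


class ParseCache:
    """Disk-backed cache mapping utterance text to a JSON-serializable parse."""

    def __init__(self, path):
        self.path = path
        self.data = {}
        self.dirty = False
        if os.path.exists(path):
            with open(path, "r", encoding="utf-8") as f:
                self.data = json.load(f)

    def get(self, utterance, parse_fn):
        if utterance not in self.data:
            self.data[utterance] = parse_fn(utterance)  # slow path, once per text
            self.dirty = True
        return self.data[utterance]

    def save(self):
        if not self.dirty:
            return
        tmp_path = self.path + ".tmp"
        with open(tmp_path, "w", encoding="utf-8") as f:
            json.dump(self.data, f)
        os.replace(tmp_path, self.path)  # swap in the new file in one step
        self.dirty = False


# Usage (parse_fn would be the real, slow decoupling function):
# cache = ParseCache("text_parse_cache.json")
# parsed = cache.get(utterance, parse_fn)
# cache.save()
```

Filling the cache once in the main process, before `DataLoader` workers are created, avoids each worker parsing and writing independently.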

Data processing for VG-w/o-ON

Thanks for your great work.
I see that EDA proposes a new task, VG-w/o-ON. However, I do not know how to locate the object name in the sentence. Could you provide the code that processes the data for VG-w/o-ON?
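A minimal sketch of one way to build such data, assuming each ScanRefer entry's `object_name` field names the target class and that mentions are replaced by a generic placeholder word (`"object"` here is an assumption, not necessarily the authors' choice). It matches whole words plus a simple plural, case-insensitively; synonyms and irregular plurals are not handled, and the function name is hypothetical.

```python
import re


def mask_object_name(description, object_name, placeholder="object"):
    """Replace whole-word mentions of the target's class name (and a plain plural)."""
    name = object_name.replace("_", " ").strip()
    if not name:
        return description
    # \b keeps "chair" from matching inside "armchair"; "s?" covers "chairs".
    pattern = re.compile(r"\b" + re.escape(name) + r"s?\b", re.IGNORECASE)
    # A function replacement avoids backslash handling in the placeholder.
    return pattern.sub(lambda _match: placeholder, description)


print(mask_object_name("The chair is next to the table. Two chairs face it.", "chair"))
# The object is next to the table. Two object face it.
```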

Question about Table 1 and the joint_det setting

Dear authors,

Thanks for your great work. I have a question about the --joint_det setting. It is set in all the scripts, which means you always train the model with additional utterance data constructed from ScanNet object labels.

Is this a widely used protocol in 3D visual grounding? And do the results in Tab. 1 come from the joint_det setting?

Best,

The pretrained PointNet++ backbone

Hi, is the file "gf_detector_l6o256.pth" the backbone weights of Group-Free, trained on ScanNet with a vocabulary of 485 object categories?
