yanmin-wu / eda
[CVPR 2023] EDA: Explicit Text-Decoupling and Dense Alignment for 3D Visual Grounding
License: Other
Dear authors,
Thanks for your great work. I have a question about the --joint_det setting. It is set in all the scripts, which means the model is always trained with additional utterance data constructed from ScanNet object labels.
Is this a widely used protocol in 3D visual grounding? And do the results in Tab. 1 come from the joint_det setting?
Best,
Thanks for your excellent work. I have run your code, but since I only recently entered this research field, I am having trouble visualizing the results, such as the detection boxes in Figure 1. Would you please release the visualization code? Thank you very much!
Thanks for your great work!
I am really interested in this paper. When will the code be released?
To freeze the text encoder, you set param.requires_grad to False:
    self.tokenizer = RobertaTokenizerFast.from_pretrained(t_type, local_files_only=True)
    self.text_encoder = RobertaModel.from_pretrained(t_type, local_files_only=True)
    for param in self.text_encoder.parameters():
        param.requires_grad = False
However, to make sure the Dropout layers in RobertaModel behave as in evaluation (no neurons dropped at random), we also need to add self.text_encoder.eval() at the end of the code above:
    self.tokenizer = RobertaTokenizerFast.from_pretrained(t_type, local_files_only=True)
    self.text_encoder = RobertaModel.from_pretrained(t_type, local_files_only=True)
    for param in self.text_encoder.parameters():
        param.requires_grad = False
    self.text_encoder.eval()
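One caveat, sketched below under assumptions: in PyTorch, calling model.train() on a parent module switches every child back to training mode, so an eval() call made once in __init__ can be undone if the training loop later calls model.train(), as most loops do. The wrapper class FrozenEncoder and the toy encoder are hypothetical stand-ins for illustration, not code from the EDA repository.

```python
import torch
import torch.nn as nn


class FrozenEncoder(nn.Module):
    """Hypothetical wrapper that keeps a sub-module frozen and in eval mode."""

    def __init__(self, encoder: nn.Module):
        super().__init__()
        self.encoder = encoder
        # Stop gradient updates for every encoder parameter.
        for param in self.encoder.parameters():
            param.requires_grad = False
        # Disable dropout (and other train-only behavior) right away.
        self.encoder.eval()

    def train(self, mode: bool = True):
        # The default train() would flip the encoder back to training mode,
        # so force it back to eval mode after the usual bookkeeping.
        super().train(mode)
        self.encoder.eval()
        return self

    def forward(self, x):
        with torch.no_grad():
            return self.encoder(x)


# Toy stand-in for the text encoder: a linear layer followed by dropout.
toy_encoder = nn.Sequential(nn.Linear(8, 8), nn.Dropout(p=0.5))
wrapped = FrozenEncoder(toy_encoder)
wrapped.train()  # what a typical training loop does each epoch

x = torch.ones(4, 8)
out_a = wrapped(x)
out_b = wrapped(x)
same_output = torch.equal(out_a, out_b)  # dropout stays off, so outputs match
```

Overriding train() is only one option; calling self.text_encoder.eval() after every model.train() call in the training loop would have the same effect.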
Hi, thanks for your great work.
I have a question about point_instance_label in the code.
In src/joint_det_dataset.py, function _get_target_boxes:
    point_instance_label = -np.ones(len(scan.pc))
    for t, tid in enumerate(tids):
        point_instance_label[scan.three_d_objects[tid]['points']] = t
Is it a problem that the label of each point is set to the sequence index t?
In my opinion, the code should be modified as follows:
    point_instance_label = -np.ones(len(scan.pc))
    for t, tid in enumerate(tids):
        point_instance_label[scan.three_d_objects[tid]['points']] = tid
This way, every point in a scene is assigned the ID of the object it belongs to.
If point_instance_label is set to t, the label of a point can only be 0 on the ScanRefer dataset when joint_det is false, which leads to logic errors.
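To make the difference between the two variants concrete, here is a small sketch on made-up data. The object IDs, point indices, and the helper label_points are hypothetical, and plain lists stand in for the NumPy arrays. Which variant is appropriate depends on how the downstream code reads the label: as an index into the list of target boxes, or as a scene-level object ID.

```python
def label_points(num_points, object_points, target_ids, use_object_id):
    """Label each point with either the target's position t or its object id tid."""
    labels = [-1] * num_points  # -1 marks points that belong to no target
    for t, tid in enumerate(target_ids):
        for point_index in object_points[tid]:
            labels[point_index] = tid if use_object_id else t
    return labels


# Hypothetical scene: object id -> indices of the points that belong to it.
object_points = {0: [0, 1], 3: [2, 3], 7: [4, 5]}
target_ids = [7, 3]  # two targets, listed in this order

by_index = label_points(6, object_points, target_ids, use_object_id=False)
by_id = label_points(6, object_points, target_ids, use_object_id=True)

print(by_index)  # [-1, -1, 1, 1, 0, 0]
print(by_id)     # [-1, -1, 3, 3, 7, 7]
```

With a single target, the index variant labels all of its points 0, which matches the observation above for ScanRefer without joint_det.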
Can I ask you for the visualization code that plots the bounding boxes on the RGB-D data and saves the result as a 2D RGB image, as shown in your figures?
When the ScanNet dataset is read, scans are loaded, and each scan has fields named "choices" and "new_pts". What do these two fields represent?
I created the environment as the README says and trained on SR3D. Here are the errors:
/home/sd/anaconda3/envs/EDA/lib/python3.7/site-packages/torch/distributed/launch.py:164: DeprecationWarning: The 'warn' method is deprecated, use 'warning' instead
"The module torch.distributed.launch is deprecated "
The module torch.distributed.launch is deprecated and going to be removed in future.Migrate to torch.distributed.run
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
WARNING:torch.distributed.run:--use_env is deprecated and will be removed in future releases.
Please read local_rank from os.environ('LOCAL_RANK') instead.
INFO:torch.distributed.launcher.api:Starting elastic_operator with launch configs:
entrypoint : train_dist_mod.py
min_nodes : 1
max_nodes : 1
nproc_per_node : 2
run_id : none
rdzv_backend : static
rdzv_endpoint : 127.0.0.1:3333
rdzv_configs : {'rank': 0, 'timeout': 900}
max_restarts : 3
monitor_interval : 5
log_dir : None
metrics_cfg : {}
INFO:torch.distributed.elastic.agent.server.local_elastic_agent:log directory set to: /tmp/torchelastic_2x3bsv88/none_l9qwazu3
INFO:torch.distributed.elastic.agent.server.api:[default] starting workers for entrypoint: python
INFO:torch.distributed.elastic.agent.server.api:[default] Rendezvous'ing worker group
/home/sd/anaconda3/envs/EDA/lib/python3.7/site-packages/torch/distributed/elastic/utils/store.py:53: FutureWarning: This is an experimental API and will be changed in future.
"This is an experimental API and will be changed in future.", FutureWarning
INFO:torch.distributed.elastic.agent.server.api:[default] Rendezvous complete for workers. Result:
restart_count=0
master_addr=127.0.0.1
master_port=3333
group_rank=0
group_world_size=1
local_ranks=[0, 1]
role_ranks=[0, 1]
global_ranks=[0, 1]
role_world_sizes=[2, 2]
global_world_sizes=[2, 2]
INFO:torch.distributed.elastic.agent.server.api:[default] Starting worker group
INFO:torch.distributed.elastic.multiprocessing:Setting worker0 reply file to: /tmp/torchelastic_2x3bsv88/none_l9qwazu3/attempt_0/0/error.json
INFO:torch.distributed.elastic.multiprocessing:Setting worker1 reply file to: /tmp/torchelastic_2x3bsv88/none_l9qwazu3/attempt_0/1/error.json
Traceback (most recent call last):
  File "/home/sd/Harddisk/sba/BS/EDA-master/pointnet2/pointnet2_utils.py", line 26, in <module>
    import pointnet2._ext as _ext
ImportError: /home/sd/.local/lib/python3.7/site-packages/pointnet2-0.0.0-py3.7-linux-x86_64.egg/pointnet2/_ext.cpython-37m-x86_64-linux-gnu.so: undefined symbol: _ZNSt15__exception_ptr13exception_ptr9_M_addrefEv
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "train_dist_mod.py", line 19, in <module>
    from main_utils import parse_option, BaseTrainTester
  File "/home/sd/Harddisk/sba/BS/EDA-master/main_utils.py", line 27, in <module>
    from models import HungarianMatcher, SetCriterion, compute_hungarian_loss
  File "/home/sd/Harddisk/sba/BS/EDA-master/models/__init__.py", line 7, in <module>
    from .bdetr import BeaUTyDETR
  File "/home/sd/Harddisk/sba/BS/EDA-master/models/bdetr.py", line 18, in <module>
    from .backbone_module import Pointnet2Backbone
  File "/home/sd/Harddisk/sba/BS/EDA-master/models/backbone_module.py", line 23, in <module>
    from pointnet2_modules import PointnetSAModuleVotes, PointnetFPModule
  File "/home/sd/Harddisk/sba/BS/EDA-master/pointnet2/pointnet2_modules.py", line 21, in <module>
    import pointnet2_utils
  File "/home/sd/Harddisk/sba/BS/EDA-master/pointnet2/pointnet2_utils.py", line 30, in <module>
    "Could not import _ext module.\n"
ImportError: Could not import _ext module.
Please see the setup instructions in the README: https://github.com/erikwijmans/Pointnet2_PyTorch/blob/master/README.rst
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 1972860) of binary: /home/sd/anaconda3/envs/EDA/bin/python
ERROR:torch.distributed.elastic.agent.server.local_elastic_agent:[default] Worker group failed
INFO:torch.distributed.elastic.agent.server.api:[default] Worker group FAILED. 3/3 attempts left; will restart worker group
INFO:torch.distributed.elastic.agent.server.api:[default] Stopping worker group
INFO:torch.distributed.elastic.agent.server.api:[default] Rendezvous'ing worker group
INFO:torch.distributed.elastic.agent.server.api:[default] Rendezvous complete for workers. Result:
restart_count=1
master_addr=127.0.0.1
master_port=3333
group_rank=0
group_world_size=1
local_ranks=[0, 1]
role_ranks=[0, 1]
global_ranks=[0, 1]
role_world_sizes=[2, 2]
global_world_sizes=[2, 2]
INFO:torch.distributed.elastic.agent.server.api:[default] Starting worker group
INFO:torch.distributed.elastic.multiprocessing:Setting worker0 reply file to: /tmp/torchelastic_2x3bsv88/none_l9qwazu3/attempt_1/0/error.json
INFO:torch.distributed.elastic.multiprocessing:Setting worker1 reply file to: /tmp/torchelastic_2x3bsv88/none_l9qwazu3/attempt_1/1/error.json
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 1972896) of binary: /home/sd/anaconda3/envs/EDA/bin/python
ERROR:torch.distributed.elastic.agent.server.local_elastic_agent:[default] Worker group failed
INFO:torch.distributed.elastic.agent.server.api:[default] Worker group FAILED. 2/3 attempts left; will restart worker group
INFO:torch.distributed.elastic.agent.server.api:[default] Stopping worker group
INFO:torch.distributed.elastic.agent.server.api:[default] Rendezvous'ing worker group
INFO:torch.distributed.elastic.agent.server.api:[default] Rendezvous complete for workers. Result:
restart_count=2
master_addr=127.0.0.1
master_port=3333
group_rank=0
group_world_size=1
local_ranks=[0, 1]
role_ranks=[0, 1]
global_ranks=[0, 1]
role_world_sizes=[2, 2]
global_world_sizes=[2, 2]
INFO:torch.distributed.elastic.agent.server.api:[default] Starting worker group
INFO:torch.distributed.elastic.multiprocessing:Setting worker0 reply file to: /tmp/torchelastic_2x3bsv88/none_l9qwazu3/attempt_2/0/error.json
INFO:torch.distributed.elastic.multiprocessing:Setting worker1 reply file to: /tmp/torchelastic_2x3bsv88/none_l9qwazu3/attempt_2/1/error.json
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 1972938) of binary: /home/sd/anaconda3/envs/EDA/bin/python
ERROR:torch.distributed.elastic.agent.server.local_elastic_agent:[default] Worker group failed
INFO:torch.distributed.elastic.agent.server.api:[default] Worker group FAILED. 1/3 attempts left; will restart worker group
INFO:torch.distributed.elastic.agent.server.api:[default] Stopping worker group
INFO:torch.distributed.elastic.agent.server.api:[default] Rendezvous'ing worker group
INFO:torch.distributed.elastic.agent.server.api:[default] Rendezvous complete for workers. Result:
restart_count=3
master_addr=127.0.0.1
master_port=3333
group_rank=0
group_world_size=1
local_ranks=[0, 1]
role_ranks=[0, 1]
global_ranks=[0, 1]
role_world_sizes=[2, 2]
global_world_sizes=[2, 2]
INFO:torch.distributed.elastic.agent.server.api:[default] Starting worker group
INFO:torch.distributed.elastic.multiprocessing:Setting worker0 reply file to: /tmp/torchelastic_2x3bsv88/none_l9qwazu3/attempt_3/0/error.json
INFO:torch.distributed.elastic.multiprocessing:Setting worker1 reply file to: /tmp/torchelastic_2x3bsv88/none_l9qwazu3/attempt_3/1/error.json
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 1972968) of binary: /home/sd/anaconda3/envs/EDA/bin/python
ERROR:torch.distributed.elastic.agent.server.local_elastic_agent:[default] Worker group failed
INFO:torch.distributed.elastic.agent.server.api:Local worker group finished (FAILED). Waiting 300 seconds for other agents to finish
/home/sd/anaconda3/envs/EDA/lib/python3.7/site-packages/torch/distributed/elastic/utils/store.py:71: FutureWarning: This is an experimental API and will be changed in future.
"This is an experimental API and will be changed in future.", FutureWarning
INFO:torch.distributed.elastic.agent.server.api:Done waiting for other agents. Elapsed: 0.0007307529449462891 seconds
{"name": "torchelastic.worker.status.FAILED", "source": "WORKER", "timestamp": 0, "metadata": {"run_id": "none", "global_rank": 0, "group_rank": 0, "worker_id": "1972968", "role": "default", "hostname": "Continent", "state": "FAILED", "total_run_time": 20, "rdzv_backend": "static", "raw_error": "{"message": ""}", "metadata": "{"group_world_size": 1, "entry_point": "python", "local_rank": [0], "role_rank": [0], "role_world_size": [2]}", "agent_restarts": 3}}
{"name": "torchelastic.worker.status.FAILED", "source": "WORKER", "timestamp": 0, "metadata": {"run_id": "none", "global_rank": 1, "group_rank": 0, "worker_id": "1972969", "role": "default", "hostname": "Continent", "state": "FAILED", "total_run_time": 20, "rdzv_backend": "static", "raw_error": "{"message": ""}", "metadata": "{"group_world_size": 1, "entry_point": "python", "local_rank": [1], "role_rank": [1], "role_world_size": [2]}", "agent_restarts": 3}}
{"name": "torchelastic.worker.status.SUCCEEDED", "source": "AGENT", "timestamp": 0, "metadata": {"run_id": "none", "global_rank": null, "group_rank": 0, "worker_id": null, "role": "default", "hostname": "Continent", "state": "SUCCEEDED", "total_run_time": 20, "rdzv_backend": "static", "raw_error": null, "metadata": "{"group_world_size": 1, "entry_point": "python"}", "agent_restarts": 3}}
/home/sd/anaconda3/envs/EDA/lib/python3.7/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py:354: UserWarning:
CHILD PROCESS FAILED WITH NO ERROR_FILE
CHILD PROCESS FAILED WITH NO ERROR_FILE
Child process 1972968 (local_rank 0) FAILED (exitcode 1)
Error msg: Process failed with exitcode 1
Without writing an error file to <N/A>.
While this DOES NOT affect the correctness of your application,
no trace information about the error will be available for inspection.
Consider decorating your top level entrypoint function with
torch.distributed.elastic.multiprocessing.errors.record. Example:
    from torch.distributed.elastic.multiprocessing.errors import record
    @record
    def trainer_main(args):
        # do train
warnings.warn(_no_error_file_warning_msg(rank, failure))
Traceback (most recent call last):
  File "/home/sd/anaconda3/envs/EDA/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/sd/anaconda3/envs/EDA/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/sd/anaconda3/envs/EDA/lib/python3.7/site-packages/torch/distributed/launch.py", line 173, in <module>
    main()
  File "/home/sd/anaconda3/envs/EDA/lib/python3.7/site-packages/torch/distributed/launch.py", line 169, in main
    run(args)
  File "/home/sd/anaconda3/envs/EDA/lib/python3.7/site-packages/torch/distributed/run.py", line 624, in run
    )(*cmd_args)
  File "/home/sd/anaconda3/envs/EDA/lib/python3.7/site-packages/torch/distributed/launcher/api.py", line 116, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/home/sd/anaconda3/envs/EDA/lib/python3.7/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 348, in wrapper
    return f(*args, **kwargs)
  File "/home/sd/anaconda3/envs/EDA/lib/python3.7/site-packages/torch/distributed/launcher/api.py", line 247, in launch_agent
    failures=result.failures,
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
train_dist_mod.py FAILED
Other Failures:
[1]:
time: 2023-12-23_12:33:23
rank: 1 (local_rank: 1)
exitcode: 1 (pid: 1972969)
error_file: <N/A>
msg: "Process failed with exitcode 1"
Can anybody help me?
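Two things in the log can be read off mechanically, and the sketch below does so with the standard library only. The undefined symbol is a mangled C++ name, and the failing _ext library is loaded from ~/.local, not from the EDA conda environment. The helper names demangle_nested and loaded_from_env are hypothetical, and the demangler handles only simple nested names like this one (the c++filt tool from binutils does the same job in general). As an assumption rather than a confirmed diagnosis: an extension living outside the environment may have been built against a different compiler or PyTorch version, so rebuilding pointnet2 inside the EDA environment is a reasonable first thing to try.

```python
import os


def demangle_nested(symbol):
    """Decode a simple Itanium-ABI nested name (length-prefixed parts, 'St' = std)."""
    if not symbol.startswith("_ZN"):
        raise ValueError("only nested names starting with _ZN are supported")
    pos = 3
    parts = []
    if symbol.startswith("St", pos):  # standard abbreviation for the std namespace
        parts.append("std")
        pos += 2
    while pos < len(symbol) and symbol[pos] != "E":
        digits = ""
        while pos < len(symbol) and symbol[pos].isdigit():
            digits += symbol[pos]
            pos += 1
        if not digits:
            raise ValueError("unsupported mangling construct at position %d" % pos)
        length = int(digits)
        parts.append(symbol[pos:pos + length])
        pos += length
    arguments = symbol[pos + 1:]
    suffix = "()" if arguments == "v" else "(...)"  # 'v' means no arguments
    return "::".join(parts) + suffix


def loaded_from_env(so_path, env_prefix):
    """Return True if the shared library lives inside the given environment prefix."""
    so_path = os.path.abspath(so_path)
    env_prefix = os.path.abspath(env_prefix)
    return os.path.commonpath([so_path, env_prefix]) == env_prefix


symbol = "_ZNSt15__exception_ptr13exception_ptr9_M_addrefEv"
decoded = demangle_nested(symbol)
print(decoded)  # std::__exception_ptr::exception_ptr::_M_addref()

# Paths copied from the log above.
so_path = ("/home/sd/.local/lib/python3.7/site-packages/"
           "pointnet2-0.0.0-py3.7-linux-x86_64.egg/pointnet2/"
           "_ext.cpython-37m-x86_64-linux-gnu.so")
inside_env = loaded_from_env(so_path, "/home/sd/anaconda3/envs/EDA")
print(inside_env)  # False
```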
Text decoupling takes a lot of time at the start of every training run, which makes the code hard to debug. How can this be improved? Thanks for your great work.
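One common way to avoid repeating an expensive preprocessing step is to cache its result on disk, keyed by the input text. The sketch below is a generic illustration and not the repository's code: cached_decouple, toy_parse, and the cache layout are all hypothetical, and toy_parse merely stands in for the real text-decoupling routine.

```python
import hashlib
import json
import os
import tempfile


def cached_decouple(utterance, parse_fn, cache_dir):
    """Return parse_fn(utterance), reusing a JSON file on disk when one exists."""
    key = hashlib.sha1(utterance.encode("utf-8")).hexdigest()
    path = os.path.join(cache_dir, key + ".json")
    if os.path.exists(path):
        with open(path, encoding="utf-8") as f:
            return json.load(f)
    result = parse_fn(utterance)  # the slow step, run once per unique utterance
    os.makedirs(cache_dir, exist_ok=True)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(result, f)
    return result


calls = []


def toy_parse(utterance):
    """Hypothetical stand-in for the real (slow) text-decoupling step."""
    calls.append(utterance)
    words = utterance.split()
    return {"main_object": words[-1], "num_words": len(words)}


cache_dir = tempfile.mkdtemp()
first = cached_decouple("the brown chair", toy_parse, cache_dir)
second = cached_decouple("the brown chair", toy_parse, cache_dir)  # served from disk
```

This assumes the parsed result is JSON-serializable and that parsing is deterministic for a given utterance. For debugging, running on a small subset of the annotations would also shorten the start-up time.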
Thanks for your great work.
I see that EDA proposes a new task, VG-w/o-ON. However, I do not know how to obtain the object name in a sentence. Could you provide the code that processes the data for VG-w/o-ON?
I downloaded your publicly available log file for Sr3D, log_67_6.txt, and noticed something strange: the max_epoch parameter recorded in the log file is 300, but according to the log the training lasted only 48 epochs. Can you explain why? Did you stop the program manually?
When I ran the command sh scripts/train_sr3d.sh, log.txt showed that max_epoch was 400, and the training really did seem to continue toward 400 epochs. I therefore had to stop the program manually at the 75th epoch.
In your published log file, log_67_6.txt, detailed validation results are output every 3 epochs, but this does not happen when I run sh scripts/train_sr3d.sh. I have included a link to my log file for your reference.
As mentioned in the paper, every text is decoupled into 5 parts with corresponding labels. How can I get these labels in the code? I have tried 'positive_map', 'modify_positive_map', 'pron_positive_map', 'other_entity_map', and 'rel_positive_map', but they don't seem to correspond to the texts. I would really appreciate a reply to this question.
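One possible reason the maps look unrelated to the text, stated here as an assumption and not confirmed from the code: if they follow the MDETR-style convention, each map is a vector over tokenizer positions, not over words or characters, so its nonzero entries must be mapped back through the tokenizer's character offsets. The sketch below shows that mapping with a whitespace tokenizer as a stand-in. The names toy_offsets and char_span_to_tokens are hypothetical, and a real subword tokenizer would split words differently and add special tokens that shift the indices.

```python
def toy_offsets(text):
    """Whitespace tokenizer returning (start, end) character offsets per token."""
    offsets, pos = [], 0
    for word in text.split():
        start = text.index(word, pos)
        end = start + len(word)
        offsets.append((start, end))
        pos = end
    return offsets


def char_span_to_tokens(offsets, span):
    """Indices of the tokens whose character range overlaps the given span."""
    start, end = span
    return [i for i, (s, e) in enumerate(offsets) if s < end and e > start]


text = "the brown chair next to the table"
offsets = toy_offsets(text)

# Character span of the phrase of interest, e.g. the main object.
span_start = text.index("chair")
span = (span_start, span_start + len("chair"))

token_ids = char_span_to_tokens(offsets, span)
recovered = [text[offsets[i][0]:offsets[i][1]] for i in token_ids]

print(token_ids)  # [2]
print(recovered)  # ['chair']
```

Read in the other direction, the nonzero positions of one map row select token offsets, and slicing the original string with those offsets recovers the words that the row refers to.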
Thanks for your work. May I obtain the visualization code?
When evaluating the model, how can I get the overall accuracy? I can't find it in the output log.
How do you get the bboxes? Are the bbox sizes normalized? I cannot visualize them on the ScanNetv2 dataset.
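Until official visualization code is available, a box can be inspected in any mesh viewer by writing it out as a wireframe. The sketch below assumes an axis-aligned box given as a center and a full size in the same metric units as the point cloud. Whether the model's boxes are stored that way, or need to be un-normalized first, is exactly the open question above. The helpers box_corners and box_to_obj are hypothetical.

```python
from itertools import product


def box_corners(center, size):
    """Eight corners of an axis-aligned box given its center and full size."""
    cx, cy, cz = center
    hx, hy, hz = (s / 2.0 for s in size)
    return [(cx + sx * hx, cy + sy * hy, cz + sz * hz)
            for sx, sy, sz in product((-1, 1), repeat=3)]


def box_to_obj(center, size):
    """Wavefront OBJ text with 8 vertices ('v') and 12 edges ('l') for the box."""
    corners = box_corners(center, size)
    lines = ["v %f %f %f" % corner for corner in corners]
    for i in range(8):
        for bit in (1, 2, 4):
            j = i | bit
            if j != i:  # corners i and j differ along exactly one axis
                lines.append("l %d %d" % (i + 1, j + 1))  # OBJ indices start at 1
    return "\n".join(lines)


corners = box_corners((1.0, 2.0, 3.0), (2.0, 4.0, 6.0))
obj_text = box_to_obj((1.0, 2.0, 3.0), (2.0, 4.0, 6.0))
```

Saving obj_text to a .obj file and loading it next to the scene mesh shows quickly whether the box coordinates are in the scene's coordinate frame.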
Hi, is the file "gf_detector_l6o256.pth" the backbone weights of Group-Free, trained on ScanNet with a vocabulary of 485 object categories?
Thanks for your excellent work! In my experiments, I have found that the model is particularly sensitive to the learning rate. When training on a single 3090 GPU without adjusting the lr, the results are as follows:
The long line below is the baseline; the short lines are the results of training on a single GPU without adjusting the lr.
Even when training on two 3090 GPUs, adjusting the learning rate, for example multiplying it by 2 or by 1.4, shows the same phenomenon.
Do you have any advice on this problem? Thank you very much!
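When the number of GPUs changes, the effective batch size changes with it, and a common heuristic is to scale the learning rate by the ratio of batch sizes, either linearly or by its square root. The sketch below only illustrates that arithmetic. It is not the authors' recommendation, and the base learning rate and batch sizes are made-up numbers, not the repository's settings.

```python
def scaled_lr(base_lr, base_batch, new_batch, rule="linear"):
    """Rescale a learning rate when the total batch size changes."""
    ratio = new_batch / base_batch
    if rule == "linear":
        return base_lr * ratio
    if rule == "sqrt":
        return base_lr * ratio ** 0.5
    raise ValueError("rule must be 'linear' or 'sqrt'")


# Hypothetical reference setup: lr 1e-4 at a total batch size of 24.
# Halving the number of GPUs halves the total batch size to 12.
linear_lr = scaled_lr(1e-4, base_batch=24, new_batch=12, rule="linear")
sqrt_lr = scaled_lr(1e-4, base_batch=24, new_batch=12, rule="sqrt")
```

Another option that leaves the learning rate untouched is gradient accumulation, which restores the original effective batch size on fewer GPUs.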
Thanks for your great work.
I notice that only the Unique and Multiple performances are reported by your code. Could you share the validation code for the Overall performance?
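If Unique and Multiple split the evaluation samples into two disjoint subsets that together cover the whole set, the overall accuracy is the sample-weighted mean of the two subset accuracies. The sketch below shows only that arithmetic. The accuracies and sample counts are made up, and the real counts would have to be taken from the evaluation output.

```python
def overall_accuracy(acc_unique, n_unique, acc_multiple, n_multiple):
    """Sample-weighted mean of two subset accuracies."""
    total = n_unique + n_multiple
    if total == 0:
        raise ValueError("at least one sample is required")
    return (acc_unique * n_unique + acc_multiple * n_multiple) / total


# Made-up numbers: 100 unique samples at 80% and 300 multiple samples at 40%.
overall = overall_accuracy(0.80, 100, 0.40, 300)
print(round(overall, 4))  # 0.5
```

A plain average of the two accuracies would be wrong whenever the subsets differ in size, which is why the counts are needed.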