warnings.warn("None of the inputs have requires_grad=True. Gradients will be None")
/opt/conda/envs/fbocc/lib/python3.8/site-packages/torch/utils/checkpoint.py:25: UserWarning: None of the inputs have requires_grad=True. Gradients will be None
warnings.warn("None of the inputs have requires_grad=True. Gradients will be None")
/opt/conda/envs/fbocc/lib/python3.8/site-packages/torch/utils/checkpoint.py:25: UserWarning: None of the inputs have requires_grad=True. Gradients will be None
warnings.warn("None of the inputs have requires_grad=True. Gradients will be None")
[ ] 4/6019, 0.4 task/s, elapsed: 9s, ETA: 14005s/opt/conda/envs/fbocc/lib/python3.8/site-packages/torch/utils/checkpoint.py:25: UserWarning: None of the inputs have requires_grad=True. Gradients will be None
warnings.warn("None of the inputs have requires_grad=True. Gradients will be None")
[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 6084/6019, 21.4 task/s, elapsed: 284s, ETA: -2sWARNING:torch.distributed.elastic.multiprocessing.api:Sending process 795432 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 795433 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 795434 closing signal SIGTERM
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: -9) local_rank: 3 (pid: 795435) of binary: /opt/conda/envs/fbocc/bin/python
Traceback (most recent call last):
File "/opt/conda/envs/fbocc/lib/python3.8/runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/opt/conda/envs/fbocc/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/root/.vscode-server/extensions/ms-python.python-2023.14.0/pythonFiles/lib/python/debugpy/adapter/../../debugpy/launcher/../../debugpy/__main__.py", line 39, in <module>
cli.main()
File "/root/.vscode-server/extensions/ms-python.python-2023.14.0/pythonFiles/lib/python/debugpy/adapter/../../debugpy/launcher/../../debugpy/../debugpy/server/cli.py", line 430, in main
run()
File "/root/.vscode-server/extensions/ms-python.python-2023.14.0/pythonFiles/lib/python/debugpy/adapter/../../debugpy/launcher/../../debugpy/../debugpy/server/cli.py", line 317, in run_module
run_module_as_main(options.target, alter_argv=True)
File "/root/.vscode-server/extensions/ms-python.python-2023.14.0/pythonFiles/lib/python/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 238, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/root/.vscode-server/extensions/ms-python.python-2023.14.0/pythonFiles/lib/python/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 124, in _run_code
exec(code, run_globals)
File "/opt/conda/envs/fbocc/lib/python3.8/site-packages/torch/distributed/launch.py", line 193, in <module>
main()
File "/opt/conda/envs/fbocc/lib/python3.8/site-packages/torch/distributed/launch.py", line 189, in main
launch(args)
File "/opt/conda/envs/fbocc/lib/python3.8/site-packages/torch/distributed/launch.py", line 174, in launch
run(args)
File "/opt/conda/envs/fbocc/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run
elastic_launch(
File "/opt/conda/envs/fbocc/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
File "/opt/conda/envs/fbocc/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
tools_mm/test.py FAILED
------------------------------------------------------------
Failures:
<NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
time : 2023-08-30_01:11:31
host : fd-taxi-f-houjiawei-1691719963348-985c0990-3609275187
rank : 3 (local_rank: 3)
exitcode : -9 (pid: 795435)
error_file: <N/A>
traceback : Signal 9 (SIGKILL) received by PID 795435
============================================================
The evaluation run 6084 samples rather than 6019 samples and closed unexpectly.