Hi, I have downloaded the MP3D dataset and am trying to run the train.sh script on a single GPU with a small batch size of 2 for debugging:
python train.py --batch-size 2 --folder 'mp3d' --num_workers 0 \
--resume --accumulation 'alphacomposite' \
--model_type 'zbuffer_pts' --refine_model_type 'resnet_256W8UpDown64' \
--norm_G 'sync:spectral_batch' --gpu_ids 0 --render_ids 1 \
--suffix '' --normalize_image --lr 0.0001
I have also changed the MP3D directory inside options.py and some hard-coded strings in train_options.py. However, I am hitting a broken pipe error that appears to be caused by the glX context not supporting multiple GPUs:
phong@phong-Server:~/data/Work/Paper3/Code/synsin$ ./train.sh
Timestamp 2020-03-01
./checkpoint/phong/logging/viewsynthesis3d/mp3d//2020-03-01/mp3d/models/lr0.00010_bs2_modelzbuffer_pts_splxyblending/noise_bnsync:spectral_batch_refresnet_256W8UpDown64_dunet_camxysFalse|False/_init_databoth_seed0/_multiFalse_losses1.0|l110.0|content_izFalse_alphaFalse__vol_ganpix2pixHD/
Model ./checkpoint/phong/logging/viewsynthesis3d/mp3d//2020-03-01/mp3d/models/lr0.00010_bs2_modelzbuffer_pts_splxyblending/noise_bnsync:spectral_batch_refresnet_256W8UpDown64_dunet_camxysFalse|False/_init_databoth_seed0/_multiFalse_losses1.0|l110.0|content_izFalse_alphaFalse__vol_ganpix2pixHD//model_epoch.pth
Loading dataset mp3d ...
Loading model %s ...
RESNET encoder
RESNET decoder
['1.0_l1', '10.0_content']
<zip object at 0x7fea940d0bc8>
./checkpoint/phong/logging/viewsynthesis3d/mp3d//%s//2020-03-01/mp3d/runs/lr0.00010_bs2_modelzbuffer_pts_splxyblending/noise_bnsync:spectral_batch_refresnet_256W8UpDown64_dunet_camxysFalse|False/_init_databoth_seed0/_multiFalse_losses1.0|l110.0|content_izFalse_alphaFalse__vol_ganpix2pixHD/
[0]
Starting run...
WARNING: Model path does not exist??
./checkpoint/phong/logging/viewsynthesis3d/mp3d//2020-03-01/mp3d/models/lr0.00010_bs2_modelzbuffer_pts_splxyblending/noise_bnsync:spectral_batch_refresnet_256W8UpDown64_dunet_camxysFalse|False/_init_databoth_seed0/_multiFalse_losses1.0|l110.0|content_izFalse_alphaFalse__vol_ganpix2pixHD//model_epoch.pth
Loading train dataset ....
Loaded train dataset ...
Starting epoch 0
At train
Restarting image_generator.... with seed 0 in train mode? True
gpu_id 1
data/scene_episodes/mp3d_train
One ep per scene
61
2020-03-27 10:15:28,995 initializing sim Sim-v0
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0327 10:15:28.999747 3051 Simulator.cpp:62] Loading navmesh from /media/phong/Data2TB/dataset/Matterport/Habitat/v1/tasks/mp3d_habitat//mp3d/gZ6f7yhEvPG/gZ6f7yhEvPG.navmesh
2020-03-27 10:15:29,003 initializing sim Sim-v0
2020-03-27 10:15:29,011 initializing sim Sim-v0
2020-03-27 10:15:29,011 initializing sim Sim-v0
I0327 10:15:29.017060 3051 Simulator.cpp:64] Loaded.
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0327 10:15:29.017076 3049 Simulator.cpp:62] Loading navmesh from /media/phong/Data2TB/dataset/Matterport/Habitat/v1/tasks/mp3d_habitat//mp3d/JeFG25nYj2p/JeFG25nYj2p.navmesh
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0327 10:15:29.017110 3048 Simulator.cpp:62] Loading navmesh from /media/phong/Data2TB/dataset/Matterport/Habitat/v1/tasks/mp3d_habitat//mp3d/1pXnuDYAj8r/1pXnuDYAj8r.navmesh
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0327 10:15:29.017143 3050 Simulator.cpp:62] Loading navmesh from /media/phong/Data2TB/dataset/Matterport/Habitat/v1/tasks/mp3d_habitat//mp3d/S9hNv5qa7GM/S9hNv5qa7GM.navmesh
2020-03-27 10:15:29,025 initializing sim Sim-v0
I0327 10:15:29.029383 3049 Simulator.cpp:64] Loaded.
I0327 10:15:29.043566 3051 SceneGraph.h:92] Created DrawableGroup:
F0327 10:15:29.043596 3051 WindowlessContext.cpp:232] Check failed: device == 0 (1 vs. 0) glX context does not support multiple GPUs. Please compile with BUILD_GUI_VIEWERS=0 for multi-gpu support via EGL
I0327 10:15:29.043599 3049 SceneGraph.h:92] Created DrawableGroup:
F0327 10:15:29.043618 3049 WindowlessContext.cpp:232] Check failed: device == 0 (1 vs. 0) glX context does not support multiple GPUs. Please compile with BUILD_GUI_VIEWERS=0 for multi-gpu support via EGL
*** Check failure stack trace: ***
*** Check failure stack trace: ***
I0327 10:15:29.043759 3048 Simulator.cpp:64] Loaded.
I0327 10:15:29.043802 3050 Simulator.cpp:64] Loaded.
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0327 10:15:29.043810 3052 Simulator.cpp:62] Loading navmesh from /media/phong/Data2TB/dataset/Matterport/Habitat/v1/tasks/mp3d_habitat//mp3d/qoiz87JEwZ2/qoiz87JEwZ2.navmesh
I0327 10:15:29.053377 3048 SceneGraph.h:92] Created DrawableGroup:
F0327 10:15:29.053406 3048 WindowlessContext.cpp:232] Check failed: device == 0 (1 vs. 0) glX context does not support multiple GPUs. Please compile with BUILD_GUI_VIEWERS=0 for multi-gpu support via EGL
*** Check failure stack trace: ***
I0327 10:15:29.053519 3050 SceneGraph.h:92] Created DrawableGroup:
F0327 10:15:29.053547 3050 WindowlessContext.cpp:232] Check failed: device == 0 (1 vs. 0) glX context does not support multiple GPUs. Please compile with BUILD_GUI_VIEWERS=0 for multi-gpu support via EGL
I0327 10:15:29.053588 3052 Simulator.cpp:64] Loaded.
*** Check failure stack trace: ***
I0327 10:15:29.054833 3052 SceneGraph.h:92] Created DrawableGroup:
F0327 10:15:29.054863 3052 WindowlessContext.cpp:232] Check failed: device == 0 (1 vs. 0) glX context does not support multiple GPUs. Please compile with BUILD_GUI_VIEWERS=0 for multi-gpu support via EGL
*** Check failure stack trace: ***
Traceback (most recent call last):
File "train.py", line 370, in <module>
run(model, Dataset, log_path, plotter, CHECKPOINT_tempfile)
File "train.py", line 265, in run
epoch, train_data_loader, model, log_path, plotter, opts
File "train.py", line 93, in train
iter_data_loader, isval=False, num_steps=opts.num_accumulations
File "/home/phong/data/Work/Paper3/Code/synsin/models/base_model.py", line 108, in __call__
t_losses, output_images = self.model(next(dataloader))
File "/home/phong/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 345, in __next__
data = self._next_data()
File "/home/phong/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 385, in _next_data
data = self._dataset_fetcher.fetch(index) # may raise StopIteration
File "/home/phong/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/phong/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/phong/data/Work/Paper3/Code/synsin/data/habitat_data.py", line 119, in __getitem__
self.__restart__()
File "/home/phong/data/Work/Paper3/Code/synsin/data/habitat_data.py", line 42, in __restart__
seed=self.worker_id + self.seed,
File "/home/phong/data/Work/Paper3/Code/synsin/data/create_rgb_dataset.py", line 189, in __init__
multiprocessing_start_method="forkserver",
File "/home/phong/data/Work/Paper3/Libraries/habitat-api/habitat/core/vector_env.py", line 135, in __init__
read_fn() for read_fn in self._connection_read_fns
File "/home/phong/data/Work/Paper3/Libraries/habitat-api/habitat/core/vector_env.py", line 135, in <listcomp>
read_fn() for read_fn in self._connection_read_fns
File "/home/phong/anaconda3/envs/pytorch/lib/python3.6/multiprocessing/connection.py", line 250, in recv
buf = self._recv_bytes()
File "/home/phong/anaconda3/envs/pytorch/lib/python3.6/multiprocessing/connection.py", line 407, in _recv_bytes
buf = self._recv(4)
File "/home/phong/anaconda3/envs/pytorch/lib/python3.6/multiprocessing/connection.py", line 379, in _recv
chunk = read(handle, remaining)
ConnectionResetError: [Errno 104] Connection reset by peer
Exception ignored in: <bound method VectorEnv.__del__ of <habitat.core.vector_env.VectorEnv object at 0x7fea87773ef0>>
Traceback (most recent call last):
File "/home/phong/data/Work/Paper3/Libraries/habitat-api/habitat/core/vector_env.py", line 468, in __del__
self.close()
File "/home/phong/data/Work/Paper3/Libraries/habitat-api/habitat/core/vector_env.py", line 350, in close
write_fn((CLOSE_COMMAND, None))
File "/home/phong/anaconda3/envs/pytorch/lib/python3.6/multiprocessing/connection.py", line 206, in send
self._send_bytes(_ForkingPickler.dumps(obj))
File "/home/phong/anaconda3/envs/pytorch/lib/python3.6/multiprocessing/connection.py", line 404, in _send_bytes
self._send(header + buf)
File "/home/phong/anaconda3/envs/pytorch/lib/python3.6/multiprocessing/connection.py", line 368, in _send
n = write(self._handle, buf)
BrokenPipeError: [Errno 32] Broken pipe
I suspect this error comes from habitat-sim rather than from the synsin repo itself.
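For reference, the fatal log line itself suggests a fix ("Please compile with BUILD_GUI_VIEWERS=0 for multi-gpu support via EGL"). A rebuild along the following lines might avoid the glX check; the path and the exact flag names are assumptions on my side, so please check them against your habitat-sim version's build instructions:

```shell
# Sketch of a headless rebuild of habitat-sim, following the hint in the
# fatal log line. With GUI viewers disabled, windowless GL contexts are
# created via EGL instead of glX, which supports non-zero GPU devices.
cd /path/to/habitat-sim   # adjust to your checkout

# Option 1: build script with the headless flag
./build.sh --headless

# Option 2: install into the active conda env with the same flag
python setup.py install --headless
```

This may also interact with the `--gpu_ids 0 --render_ids 1` pair in the command above, since the check fails exactly when the render device is not device 0.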