opendrivelab / lanesegnet

[ICLR 2024] Map Learning with Lane Segment for Autonomous Driving

License: Apache License 2.0

Python 99.74% Shell 0.26%

autonomous-driving lane-segment laneline-detection online-mapping topology-reasoning

lanesegnet's Issues

Evaluate Result

Hi, thanks for your work!
In your paper, TOP_lsls is about 8.1, but in your GitHub repo the result is 25+. Why? Is the code different, or has the evaluation code changed?
I used A100s to reproduce it with 4 (samples per GPU) × 2 (cards), and the result is:
Epoch(val) [24][2403] mAP: 0.3225, AP_ls: 0.32722604274749756, AP_ped: 0.31785154342651367, TOP_lsls: 0.0660
What could be wrong?

Point cloud range change, only using the front camera

Hi,
Thanks for your great work!
If I want to detect lanes only in front of the camera, what else would I have to change?
So far I have changed point_cloud_range = [-0.8, -12.8, -2.3, 25.6, 12.8, 1.7], set num_cams = 1, and made the *.pkl files save only the ring_front_center sensor.

Best wishes!
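Since point_cloud_range sets the physical area that the BEV grid covers, one thing worth double-checking after narrowing it is the resulting cell size. The following is a minimal sketch. It assumes pc_range is ordered [x_min, y_min, z_min, x_max, y_max, z_max] and that bev_w spans x while bev_h spans y, which is the BEVFormer-style convention; verify this against your config. The helper name and the 50 × 50 grid are hypothetical.

```python
from typing import Sequence, Tuple


def bev_extent_and_resolution(
    pc_range: Sequence[float], bev_h: int, bev_w: int
) -> Tuple[Tuple[float, float, float], Tuple[float, float]]:
    """Return the (x, y, z) extent in metres and the metres per BEV cell (x, y).

    Assumes pc_range = [x_min, y_min, z_min, x_max, y_max, z_max],
    with bev_w cells along x and bev_h cells along y (hypothetical convention).
    """
    x_min, y_min, z_min, x_max, y_max, z_max = pc_range
    if not (x_max > x_min and y_max > y_min and z_max > z_min):
        raise ValueError("each max bound must be larger than the matching min bound")
    extent = (x_max - x_min, y_max - y_min, z_max - z_min)
    resolution = (extent[0] / bev_w, extent[1] / bev_h)
    return extent, resolution


# Front-camera-only range from the question; the 50 x 50 grid is hypothetical.
extent, resolution = bev_extent_and_resolution(
    [-0.8, -12.8, -2.3, 25.6, 12.8, 1.7], bev_h=50, bev_w=50
)
print(extent, resolution)
```

With this range and a 50 × 50 grid, the cells come out at about 0.528 m along x and 0.512 m along y, so they are no longer square. Whether that matters depends on the rest of the config.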

About evaluation results

Hi Tianyu,

I really appreciate your great work.
I used the provided latest.pth and dist_test.sh for evaluation and obtained these results (mAP: 19.03, AP_ls: 31.96, AP_ped: 6.10, TOP_lsls: 25.38). Why is there such a large discrepancy between my AP_ped and the one you published? Is there something I might have overlooked?

Looking forward to your response!

Number of ped_cross points?

In the dataset:

    def ped2lane_segment(self, points):
        assert points.shape[0] == 5
        dir_vector = points[1] - points[0]
        dir = np.rad2deg(np.arctan2(dir_vector[1], dir_vector[0]))

        if dir < -45 or dir > 135:
            left_boundary = points[[2, 3]]
            right_boundary = points[[1, 0]]
        else:
            left_boundary = points[[0, 1]]
            right_boundary = points[[3, 2]]
        
        centerline = LineString((left_boundary + right_boundary) / 2)
        left_boundary = LineString(left_boundary)
        right_boundary = LineString(right_boundary)

        return centerline, left_boundary, right_boundary

You assert points.shape[0] == 5, but in the map bucket info some pedestrian crossings do not have 5 points, so the assertion fails.
How can this be solved?
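One possible workaround is to normalize every crossing to the 5-point form before calling ped2lane_segment. This is a sketch under assumptions, not code taken from the repository. It assumes the 5-point form is 4 corners followed by the repeated first corner. It closes 4-point rings directly and replaces any other polygon with a PCA-aligned bounding box of its x/y points, using the mean height for z. That box is an approximation, not the minimum-area rectangle. The helper name is hypothetical, and simply dropping such crossings from the annotations would be an equally valid choice.

```python
import numpy as np


def normalize_ped_polygon(points: np.ndarray) -> np.ndarray:
    """Return a ped-crossing polygon as 5 points: 4 corners + first corner again.

    Hypothetical helper. 5-point inputs are returned unchanged, 4-point inputs
    are closed, and anything else is replaced by a PCA-aligned bounding box of
    the x/y coordinates, with extra dimensions (e.g. z) set to their mean.
    """
    points = np.asarray(points, dtype=float)
    if points.shape[0] == 5:
        return points
    if points.shape[0] == 4:
        return np.vstack([points, points[:1]])
    if points.shape[0] < 3:
        raise ValueError("need at least 3 points to fit a box")

    xy = points[:, :2]
    center = xy.mean(axis=0)
    # Principal axes of the x/y points; eigh returns orthonormal eigenvectors.
    _, axes = np.linalg.eigh(np.cov((xy - center).T))
    local = (xy - center) @ axes              # coordinates in the box frame
    lo, hi = local.min(axis=0), local.max(axis=0)
    # Corners in perimeter order, so consecutive points share an edge.
    corners = np.array([[lo[0], lo[1]], [hi[0], lo[1]],
                        [hi[0], hi[1]], [lo[0], hi[1]]])
    corners = corners @ axes.T + center       # back to the original frame

    if points.shape[1] > 2:
        extra = np.tile(points[:, 2:].mean(axis=0), (4, 1))
        corners = np.hstack([corners, extra])
    return np.vstack([corners, corners[:1]])  # close the ring
```

Passing normalize_ped_polygon(points) to ped2lane_segment instead of points would keep the existing assertion satisfied. Whether a box is an acceptable approximation for irregular crossings is a judgment call.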

CHILD PROCESS FAILED WITH NO ERROR_FILE

When I trained the model, it crashed with "CHILD PROCESS FAILED WITH NO ERROR_FILE". How can I fix it?

EOFError: Ran out of input

Hello, I am having trouble running the model training. I have been able to figure out every issue so far on my own, but this is the first one I truly don't understand.

Traceback (most recent call last):

  File "tools/train.py", line 320, in <module>
    main()
  File "tools/train.py", line 309, in main
    train_model(
  File "C:\[user]\AppData\Local\anaconda3\envs\lanesegnet\lib\site-packages\mmdet3d\apis\train.py", line 344, in train_model
    train_detector(
  File "C:\[user]\AppData\Local\anaconda3\envs\lanesegnet\lib\site-packages\mmdet3d\apis\train.py", line 319, in train_detector
    runner.run(data_loaders, cfg.workflow)
  File "C:\[user]\AppData\Local\anaconda3\envs\lanesegnet\lib\site-packages\mmcv\runner\epoch_based_runner.py", line 130, in run
    epoch_runner(data_loaders[i], **kwargs)
  File "C:\[user]\AppData\Local\anaconda3\envs\lanesegnet\lib\site-packages\mmcv\runner\epoch_based_runner.py", line 47, in train
    for i, data_batch in enumerate(self.data_loader):
  File "C:\[user]\AppData\Local\anaconda3\envs\lanesegnet\lib\site-packages\torch\utils\data\dataloader.py", line 359, in __iter__
    return self._get_iterator()
  File "C:\[user]\AppData\Local\anaconda3\envs\lanesegnet\lib\site-packages\torch\utils\data\dataloader.py", line 305, in _get_iterator
    return _MultiProcessingDataLoaderIter(self)
  File "C:\[user]\AppData\Local\anaconda3\envs\lanesegnet\lib\site-packages\torch\utils\data\dataloader.py", line 918, in __init__
    w.start()
  File "C:\[user]\AppData\Local\anaconda3\envs\lanesegnet\lib\multiprocessing\process.py", line 121, in start
    self._popen = self._Popen(self)
  File "C:\[user]\AppData\Local\anaconda3\envs\lanesegnet\lib\multiprocessing\context.py", line 224, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "C:\[user]\AppData\Local\anaconda3\envs\lanesegnet\lib\multiprocessing\context.py", line 327, in _Popen
    return Popen(process_obj)
  File "C:\[user]\AppData\Local\anaconda3\envs\lanesegnet\lib\multiprocessing\popen_spawn_win32.py", line 93, in __init__
    reduction.dump(process_obj, to_child)
  File "C:\[user]\AppData\Local\anaconda3\envs\lanesegnet\lib\multiprocessing\reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
MemoryError
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\[user]\AppData\Local\anaconda3\envs\lanesegnet\lib\multiprocessing\spawn.py", line 116, in spawn_main
    exitcode = _main(fd, parent_sentinel)
  File "C:\[user]\AppData\Local\anaconda3\envs\lanesegnet\lib\multiprocessing\spawn.py", line 126, in _main
    self = reduction.pickle.load(from_parent)
EOFError: Ran out of input

From what I could find, the error could be coming from a file that is being overwritten, but I could not find any evidence of that happening in the various library files. Any assistance would be greatly appreciated. Thank you very much!
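For context on this traceback: the MemoryError is raised inside ForkingPickler.dump while the parent process starts a DataLoader worker through popen_spawn_win32. The worker then apparently fails with EOFError because it never receives the complete object. Windows uses the spawn start method, so the dataset is pickled and sent to every worker. The sketch below estimates how large that payload is. The helper and ToyDataset are hypothetical. Reducing workers_per_gpu in the config's data dict, or setting it to 0 so loading stays in the main process, is a commonly tried workaround rather than a confirmed fix.

```python
import pickle


def pickled_size_mb(obj) -> float:
    """Size in MiB of obj when pickled with the default protocol.

    With the spawn start method (the default on Windows), the DataLoader's
    dataset is pickled for each worker, so this approximates that payload.
    """
    return len(pickle.dumps(obj)) / 2**20


class ToyDataset:
    """Hypothetical stand-in for a dataset that keeps all annotations in memory."""

    def __init__(self, num_samples: int):
        self.data_infos = [
            {"token": i, "points": [float(i)] * 20} for i in range(num_samples)
        ]


if __name__ == "__main__":
    for n in (1_000, 100_000):
        print(n, f"{pickled_size_mb(ToyDataset(n)):.1f} MiB")
```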

checkpoint results

Hello,
I'm using the official checkpoint to run the test and found that the AP_ped score is much lower than the one reported in the log. My reproduced result and the logged result are, respectively:
{'mAP': 0.19033822417259216, 'AP_ls': 0.31963453, 'AP_ped': 0.061041933, 'TOP_lsls': 0.2537}
{'mAP': 0.3345, 'AP_ls': 0.31965008, 'AP_ped': 0.349356800, 'TOP_lsls': 0.2538}
The detailed output of my test is:

++ date +%y%m%d.%H%M%S
+ timestamp=240124.211800
+ WORK_DIR=work_dirs/lanesegnet
+ CONFIG=projects/configs/lanesegnet_r50_8x1_24e_olv2_subset_A.py
+ CHECKPOINT=work_dirs/lanesegnet/lanesegnet_r50_8x1_24e_olv2_subset_A.pth
+ GPUS=2
+ PORT=28510
+ python -m torch.distributed.run --nproc_per_node=2 --master_port=28510 tools/test.py projects/configs/lanesegnet_r50_8x1_24e_olv2_subset_A.py work_dirs/lanesegnet/lanesegnet_r50_8x1_24e_olv2_subset_A.pth --launcher pytorch --out-dir work_dirs/lanesegnet/test --eval openlane_v2
+ tee work_dirs/lanesegnet/test.240124.211800.log
WARNING:__main__:*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
load checkpoint from local path: work_dirs/lanesegnet/lanesegnet_r50_8x1_24e_olv2_subset_A.pth
load checkpoint from local path: work_dirs/lanesegnet/lanesegnet_r50_8x1_24e_olv2_subset_A.pth
[>>>>>>>>>>>>>>>>>>>>>>>>>>] 4806/4806, 15.3 task/s, elapsed: 313s, ETA:     0s
2024-01-24 21:26:00,097 - mmdet - INFO - Starting format results...


2024-01-24 21:34:24,142 - mmdet - INFO - Starting openlanev2 evaluate...
calculating distances:: 100%|███████████████| 4806/4806 [19:57<00:00,  4.01it/s]
{'mAP': 0.19033822417259216, 'AP_ls': 0.31963453, 'AP_ped': 0.061041933, 'TOP_lsls': 0.2537472264730754}

Can you help me find out whether I configured something wrong?
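Here is a quick consistency check on the numbers in this issue. In these logs, mAP matches the unweighted mean of AP_ls and AP_ped. That is an assumption inferred from the values, not from the evaluation code. AP_ls and TOP_lsls are almost identical in both runs, so the whole mAP gap comes from AP_ped.

```python
def mean_ap(ap_ls: float, ap_ped: float) -> float:
    """Unweighted mean of the two per-class APs (assumed definition of mAP)."""
    return (ap_ls + ap_ped) / 2.0


reproduced = {"AP_ls": 0.31963453, "AP_ped": 0.061041933}
logged = {"AP_ls": 0.31965008, "AP_ped": 0.349356800}

for name, res in (("reproduced", reproduced), ("logged", logged)):
    print(name, f"{mean_ap(res['AP_ls'], res['AP_ped']):.4f}")
# reproduced 0.1903
# logged 0.3345
```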

About visualization

When loading the config, I got a "No module named 'projects'" error.
I would need to do some debugging to run single_gpu_test, but the MMDetection3D visualization API show_results does not seem to support this visualization,
so in the end I used '--evaluate', but I found that the show function of OpenLaneV2_subset_A_LaneSegNet_Dataset isn't implemented yet.

So how can I get the visualization results? Is the code ready for it? If not, when will you implement it?

Thanks

Temporal fusion

Hi,
It's me again. I've noticed that parts of the code contain functions for training with previous frames. Have you run any experiments with them, and could you share some numbers?
Best regards,
Pham

Batch_size

Hi,
Great work! I'd like to ask whether there is a performance drop when training with a batch size > 1.
Best regards,
Pham

segmentation fault

I followed the instructions to set up a conda env and install all packages, but when I ran test.sh I got a segmentation fault. After some debugging, it seems to be caused by the following line: from openlanev2.lanesegment.evaluation.distance import (pairwise)

Has anyone ever successfully run this repo?
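To see which import actually crashes, Python's standard faulthandler module can print the Python-level traceback when the process receives SIGSEGV. The following is a minimal sketch; the helper name is hypothetical. Running python -X faulthandler -c "from openlanev2.lanesegment.evaluation.distance import pairwise" from the shell gives the same effect without any code.

```python
import faulthandler
import importlib
import sys
from types import ModuleType


def import_with_fault_trace(module_name: str) -> ModuleType:
    """Import module_name with faulthandler enabled.

    If a C extension segfaults during the import, faulthandler writes the
    Python traceback of all threads to the original stderr before the process
    dies, which points at the import that triggered the crash.
    """
    faulthandler.enable(file=sys.__stderr__, all_threads=True)
    return importlib.import_module(module_name)


if __name__ == "__main__" and len(sys.argv) > 1:
    # e.g. python this_script.py openlanev2.lanesegment.evaluation.distance
    import_with_fault_trace(sys.argv[1])
```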

Import Error "import_modules_from_strings(**cfg_dict['custom_imports'])"

Hello, when I try to test the model provided in the repo, I get an import error. I have checked my "~/.bashrc" file as another issue suggested (exporting PYTHONPATH in the bashrc file), but the code still reports the error.
I tried to locate the error myself and found that "Segmentation fault (core dumped)" happens in "import projects.lanesegnet". Could you give me some suggestions?

Failed to download your trained .pth file; could you share it again? Thanks

potential bug in loading pipeline

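There is an unnecessary comma in
self.with_area = with_area,

which turns self.with_area into a one-element tuple such as (False,). A non-empty tuple is always truthy, so the flag effectively behaves as True in boolean checks, even when with_area=False.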
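The effect of the trailing comma is easy to reproduce in isolation. The following is a minimal sketch with a hypothetical class; only the assignment pattern comes from the report.

```python
class LoadFlags:
    """Hypothetical minimal class reproducing the assignment pattern."""

    def __init__(self, with_area: bool = False):
        # Buggy: the trailing comma builds a one-element tuple, e.g. (False,).
        self.with_area_buggy = with_area,
        # Fixed: a plain assignment keeps the boolean.
        self.with_area = with_area


flags = LoadFlags(with_area=False)
print(flags.with_area_buggy, bool(flags.with_area_buggy))  # (False,) True
print(flags.with_area, bool(flags.with_area))  # False False
```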

Training issue

Hi, when I run the training script, I get this error:

Traceback (most recent call last):
  File "/home/gpu/anaconda3/envs/lanesegnet/lib/python3.8/site-packages/mmcv/utils/misc.py", line 73, in import_modules_from_strings
    imported_tmp = import_module(imp)
  File "/home/gpu/anaconda3/envs/lanesegnet/lib/python3.8/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "", line 1014, in _gcd_import
  File "", line 991, in _find_and_load
  File "", line 961, in _find_and_load_unlocked
  File "", line 219, in _call_with_frames_removed
  File "", line 1014, in _gcd_import
  File "", line 991, in _find_and_load
  File "", line 973, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'projects'
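A likely cause of this ModuleNotFoundError: when a script is run as python tools/train.py, Python puts the script's own directory (tools/) at sys.path[0], not the repository root. The top-level projects package is therefore only importable if the root is added explicitly, for example with export PYTHONPATH=$(pwd):$PYTHONPATH run from the repository root. The in-process equivalent is shown below as a minimal sketch; the helper name is hypothetical.

```python
import sys
from pathlib import Path


def ensure_repo_root_on_path(repo_root: str = ".") -> str:
    """Put the repository root at the front of sys.path (once) and return it.

    A script run as `python tools/train.py` gets tools/ as sys.path[0], so a
    top-level package such as `projects` in the repo root is not found unless
    the root is added (via PYTHONPATH or this helper).
    """
    root = str(Path(repo_root).resolve())
    if root not in sys.path:
        sys.path.insert(0, root)
    return root
```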

The det_t is only 36.09.

I trained the model with the map_bucket config, and the traffic element detection result (det_t) is only 36.09. I noticed that the bbox_head implementation is the same as in TopoNet, but TopoNet's det_t is 48.1. Did I do something wrong, or is there an explanation for this?

Performance drop if set --autoscale-lr

Hi, I ran two reproduction experiments on the same openlane_v2 dataset. Both were trained on 8 NVIDIA Tesla V100 GPUs with a total batch size of 8 for 24 epochs. The only difference was whether --autoscale_lr was set. When --autoscale_lr was enabled, performance dropped significantly.

As far as I can tell from the code, this parameter does not affect the learning rate, so I'm confused.

Reproduced result with --autoscale_lr:
{'mAP': 0.13650985062122345, 'AP_ls': 0.16275863, 'AP_ped': 0.11026106, 'TOP_lsls': 0.10861310987314955}
Reproduced result without --autoscale_lr:
{'mAP': 0.3187885582447052, 'AP_ls': 0.3224528, 'AP_ped': 0.3151243, 'TOP_lsls': 0.25731517483348604}
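For reference, the linear scaling rule that autoscale-LR options typically implement multiplies the base learning rate by the ratio of the actual total batch size to a reference batch size. In the minimal sketch below, the base LR of 2e-4 is hypothetical. The reference batch of 8 assumes that "8x1" in the config name means 8 GPUs × 1 sample. With 8 × 1 the rule leaves the LR unchanged. A drop like the one reported would therefore only happen if the script applied the rule with a different GPU count (for example 1, before the distributed world size is known), which would cut the LR by 8×. That is a hypothesis worth checking in tools/train.py, not something confirmed here.

```python
def linear_scaled_lr(base_lr: float, base_batch_size: int,
                     num_gpus: int, samples_per_gpu: int) -> float:
    """Linear scaling rule: scale the LR by (actual total batch / reference batch)."""
    total_batch_size = num_gpus * samples_per_gpu
    return base_lr * total_batch_size / base_batch_size


# Hypothetical base LR; reference batch 8 assumes "8x1" = 8 GPUs x 1 sample.
BASE_LR = 2e-4
print(linear_scaled_lr(BASE_LR, 8, num_gpus=8, samples_per_gpu=1))  # 0.0002
print(linear_scaled_lr(BASE_LR, 8, num_gpus=1, samples_per_gpu=1))  # 2.5e-05
```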

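Bro, what's the difference between this and MapTR? [No intention to start a fight; I've closed the issue. I mainly didn't quite get how it differs from MapTR.]

Hey bro, just curious: what's the difference between this and MapTR? I couldn't really tell.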

Migrate data to map element detection

Hi, thanks for the great work.

On p. 13 of the paper, it states "Furthermore, we migrate our data to two subtasks: map detection and centerline perception."

As far as I know, OpenLaneV2 provides lanelines that are tightly coupled with lane segments, and the segments are frequently split wherever the laneline on either side changes type. Directly extracting each laneline from the lane segments may therefore fragment the laneline representation, which could confuse the model.

How exactly did you solve this problem? I am struggling to merge these lanelines from the lane segments.

Is there any code for migrating the data to map element detection?
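The repository may not include such code. As a starting point, here is a minimal, hypothetical sketch of one common approach: greedily chaining fragments whose end point coincides with another fragment's start point. It assumes 2D points, fragments that all follow the driving direction, and a fixed tolerance. It does not resolve branch points (it takes the first match). Whether fragments of different laneline types should be merged is a policy choice left to the caller.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def _close(a: Point, b: Point, tol: float) -> bool:
    """True if two points coincide within tol on both axes."""
    return abs(a[0] - b[0]) <= tol and abs(a[1] - b[1]) <= tol


def merge_polylines(fragments: Sequence[Sequence[Point]],
                    tol: float = 1e-3) -> List[List[Point]]:
    """Greedily chain fragments whose endpoints touch (end -> start)."""
    remaining = [list(f) for f in fragments]
    merged: List[List[Point]] = []
    while remaining:
        chain = remaining.pop(0)
        grown = True
        while grown:
            grown = False
            for i, other in enumerate(remaining):
                if _close(chain[-1], other[0], tol):    # other continues chain
                    chain = chain + other[1:]
                elif _close(other[-1], chain[0], tol):  # other precedes chain
                    chain = other[:-1] + chain
                else:
                    continue
                remaining.pop(i)  # safe: we leave the for loop immediately
                grown = True
                break
        merged.append(chain)
    return merged
```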

Question about evaluation code

Hi,

Is there any difference between the evaluation code provided by this repository and the one actually used in the paper?
I ask because I found a large gap between the TOP_lsls reported in the paper and the one produced by this repository.

Best,

Training time

Hello, I'd like to ask roughly how long the whole training process takes.

dist_test.sh error

./tools/dist_test.sh 1
++ date +%y%m%d.%H%M%S
+ timestamp=240322.110416
+ WORK_DIR=model
+ CONFIG=projects/configs/lanesegnet_r50_8x1_24e_olv2_subset_A.py
+ CHECKPOINT=model/lanesegnet_r50_8x1_24e_olv2_subset_A.pth
+ GPUS=1
+ PORT=28510
+ python3.8 -m torch.distributed.run --nproc_per_node=1 --master_port=28510 tools/test.py projects/configs/lanesegnet_r50_8x1_24e_olv2_subset_A.py model/lanesegnet_r50_8x1_24e_olv2_subset_A.pth --launcher pytorch --out-dir model/test --eval openlane_v2
+ tee model/test.240322.110416.log
^C
root@10-0-5-174:/jfs/liutao/work_docker/LaneSegNet# ./tools/dist_test.sh 1
++ date +%y%m%d.%H%M%S
+ timestamp=240322.110820
+ WORK_DIR=model
+ CONFIG=projects/configs/lanesegnet_r50_8x1_24e_olv2_subset_A.py
+ CHECKPOINT=model/lanesegnet_r50_8x1_24e_olv2_subset_A.pth
+ GPUS=1
+ PORT=28510
+ python3.8 -m torch.distributed.run --nproc_per_node=1 --master_port=28510 tools/test.py projects/configs/lanesegnet_r50_8x1_24e_olv2_subset_A.py model/lanesegnet_r50_8x1_24e_olv2_subset_A.pth --launcher pytorch --out-dir model/test --eval openlane_v2
+ tee model/test.240322.110820.log
load checkpoint from local path: model/lanesegnet_r50_8x1_24e_olv2_subset_A.pth
completed: 0, elapsed: 0s
Traceback (most recent call last):
  File "tools/test.py", line 263, in <module>
    main()
  File "tools/test.py", line 233, in main
    outputs = multi_gpu_test(model, data_loader,
  File "/usr/local/lib/python3.8/dist-packages/mmdet/apis/test.py", line 107, in multi_gpu_test
    for i, data in enumerate(data_loader):
  File "/usr/local/lib/python3.8/dist-packages/torch/utils/data/dataloader.py", line 359, in __iter__
    return self._get_iterator()
  File "/usr/local/lib/python3.8/dist-packages/torch/utils/data/dataloader.py", line 305, in _get_iterator
    return _MultiProcessingDataLoaderIter(self)
  File "/usr/local/lib/python3.8/dist-packages/torch/utils/data/dataloader.py", line 944, in __init__
    self._reset(loader, first_iter=True)
  File "/usr/local/lib/python3.8/dist-packages/torch/utils/data/dataloader.py", line 975, in _reset
    self._try_put_index()
  File "/usr/local/lib/python3.8/dist-packages/torch/utils/data/dataloader.py", line 1209, in _try_put_index
    index = self._next_index()
  File "/usr/local/lib/python3.8/dist-packages/torch/utils/data/dataloader.py", line 512, in _next_index
    return next(self._sampler_iter) # may raise StopIteration
  File "/usr/local/lib/python3.8/dist-packages/torch/utils/data/sampler.py", line 226, in __iter__
    for idx in self.sampler:
  File "/usr/local/lib/python3.8/dist-packages/mmdet/datasets/samplers/distributed_sampler.py", line 47, in __iter__
    math.ceil(self.total_size / len(indices)))[:self.total_size]
ZeroDivisionError: division by zero
Error in atexit._run_exitfuncs:
Traceback (most recent call last):
  File "/usr/lib/python3.8/multiprocessing/popen_fork.py", line 27, in poll
    pid, sts = os.waitpid(self.pid, flag)
  File "/usr/local/lib/python3.8/dist-packages/torch/utils/data/_utils/signal_handling.py", line 66, in handler
    _error_if_any_worker_fails()
RuntimeError: DataLoader worker (pid 32766) is killed by signal: Terminated.
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 32531) of binary: /usr/bin/python3.8
/usr/local/lib/python3.8/dist-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py:367: UserWarning:

           CHILD PROCESS FAILED WITH NO ERROR_FILE

CHILD PROCESS FAILED WITH NO ERROR_FILE
Child process 32531 (local_rank 0) FAILED (exitcode 1)
Error msg: Process failed with exitcode 1
Without writing an error file to <N/A>.
While this DOES NOT affect the correctness of your application,
no trace information about the error will be available for inspection.
Consider decorating your top level entrypoint function with
torch.distributed.elastic.multiprocessing.errors.record. Example:

  from torch.distributed.elastic.multiprocessing.errors import record

  @record
  def trainer_main(args):
     # do train

  warnings.warn(_no_error_file_warning_msg(rank, failure))
Traceback (most recent call last):
  File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.8/dist-packages/torch/distributed/run.py", line 702, in <module>
    main()
  File "/usr/local/lib/python3.8/dist-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 361, in wrapper
    return f(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/torch/distributed/run.py", line 698, in main
    run(args)
  File "/usr/local/lib/python3.8/dist-packages/torch/distributed/run.py", line 689, in run
    elastic_launch(
  File "/usr/local/lib/python3.8/dist-packages/torch/distributed/launcher/api.py", line 116, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/usr/local/lib/python3.8/dist-packages/torch/distributed/launcher/api.py", line 244, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:

      tools/test.py FAILED

=======================================
Root Cause:
[0]:
  time: 2024-03-22_11:08:41
  rank: 0 (local_rank: 0)
  exitcode: 1 (pid: 32531)
  error_file: <N/A>
  msg: "Process failed with exitcode 1"

Other Failures:
  <NO_OTHER_FAILURES>
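In this log, the crash comes from mmdet's DistributedSampler. That sampler repeats the index list with math.ceil(self.total_size / len(indices)) to reach total_size. A ZeroDivisionError there means len(indices) == 0, i.e. the test dataset contained no samples. One possible reason is that the data files were not found where the config expects them. The sketch below reproduces that padding step with a guard that gives a clearer message; the function name is hypothetical.

```python
import math
from typing import List


def pad_indices(indices: List[int], total_size: int) -> List[int]:
    """Repeat indices up to total_size entries, as in the sampler line above.

    An empty index list means the dataset loaded zero samples, which is what
    turns the sampler's division into a ZeroDivisionError.
    """
    if not indices:
        raise ValueError(
            "dataset is empty: check the data root and annotation file paths in the config"
        )
    return (indices * math.ceil(total_size / len(indices)))[:total_size]
```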

Any tutorial for converting Argoverse V2 data into lane segment annotations?

Hi,

First, thanks a lot for your great work. I assume you converted the Argoverse V2 HD map data into lane segment annotations automatically. Could you provide the conversion function? Thanks!

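Robustness of topology connections at large intersections with incomplete context

Hello authors, this is really excellent work!
I'd like to ask: when predicting the state information of Lane Segments and the logical connections between them (Ls-Ls), is the completeness of the local context a very important factor?
At a large intersection, or when there are many vehicles at the intersection and distant areas are heavily occluded, the entry lanes and exit lanes cannot both be fully observed in the surround-view input at the current timestamp. In that case, are the logical topological connections between the entry and exit LaneSegments, and the predicted virtual lines, still robust?
Sorry for asking so many questions. Since I saw that the core team is Chinese, I asked in Chinese. Looking forward to your reply.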

ms_deform_attn_impl_forward implementation for cuda not found

I was able to install the env successfully. However, when I run the dist_test.sh, I got the following error:

RuntimeError: ms_deform_attn_impl_forward: implementation for device cuda:1 not found.

I did some searching, and it looks like a CUDA issue. However, I was able to run the following Python script successfully:

import mmcv
import mmcv.ops

How can I fix this "implementation for device cuda:1 not found" issue?
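Note that import mmcv.ops succeeding does not by itself prove that mmcv's CUDA kernels were compiled, since a CPU-only mmcv-full build also imports cleanly. Below is a hedged diagnostic sketch; the function name is hypothetical. It reports whether torch sees CUDA and which CUDA version mmcv's ops were compiled with, via mmcv.ops.get_compiling_cuda_version. That function is present in mmcv-full 1.x; treat its availability as an assumption for other versions. Both imports are optional, so the function also runs where torch or mmcv is missing.

```python
from typing import Dict, Optional, Union

Value = Optional[Union[bool, str]]


def cuda_build_report() -> Dict[str, Value]:
    """Collect CUDA availability info from torch and mmcv (None = not importable)."""
    report: Dict[str, Value] = {
        "torch_cuda_available": None,
        "torch_cuda_version": None,
        "mmcv_compiling_cuda": None,
    }
    try:
        import torch

        report["torch_cuda_available"] = torch.cuda.is_available()
        report["torch_cuda_version"] = torch.version.cuda  # None for CPU-only torch
    except ImportError:
        pass
    try:
        from mmcv.ops import get_compiling_cuda_version

        # CUDA version mmcv's ops were compiled with; a build without CUDA is
        # expected to report something like "not available" instead.
        report["mmcv_compiling_cuda"] = get_compiling_cuda_version()
    except ImportError:
        pass
    return report


if __name__ == "__main__":
    for key, value in cuda_build_report().items():
        print(f"{key}: {value}")
```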

--show option error

Hello,

Thank you for the great work.
I'm really interested in this project.

First, I was able to run the evaluation with the trained checkpoint on my 2-GPU (RTX 3070) machine and got the result below.

(lanesegnet) root@my-pc:/opt/LaneSegNet# ./tools/dist_test.sh 2

# omitting log ...

load checkpoint from local path: work_dirs/lanesegnet/lanesegnet_r50_8x1_24e_olv2_subset_A.pth
load checkpoint from local path: work_dirs/lanesegnet/lanesegnet_r50_8x1_24e_olv2_subset_A.pth
[>>>>>>>>>>>>>>>>>>>>>>>>>>>] 4806/4806, 8.6 task/s, elapsed: 556s, ETA:     0s
2024-02-21 14:04:12,028 - mmdet - INFO - Starting format results...
2024-02-21 14:08:19,540 - mmdet - INFO - Starting openlanev2 evaluate...
calculating distances:: 100%|███████████████| 4806/4806 [09:46<00:00,  8.20it/s]
{'mAP': 0.3345044255256653, 'AP_ls': 0.31961587, 'AP_ped': 0.349393, 'TOP_lsls': 0.2537521315543861}

Then I tried running the evaluation with the --show option, and the error below occurred.

Could you give me any tips for solving this error?

Sincerely,

(lanesegnet) root@my-pc:/opt/LaneSegNet# ./tools/dist_test.sh 2 --show
++ date +%y%m%d.%H%M%S
+ timestamp=240221.151626
+ WORK_DIR=work_dirs/lanesegnet
+ CONFIG=projects/configs/lanesegnet_r50_8x1_24e_olv2_subset_A.py
+ CHECKPOINT=work_dirs/lanesegnet/lanesegnet_r50_8x1_24e_olv2_subset_A.pth
+ GPUS=2
+ PORT=28510
+ python -m torch.distributed.run --nproc_per_node=2 --master_port=28510 tools/test.py projects/configs/lanesegnet_r50_8x1_24e_olv2_subset_A.py work_dirs/lanesegnet/lanesegnet_r50_8x1_24e_olv2_subset_A.pth --launcher pytorch --out-dir work_dirs/lanesegnet/test --eval openlane_v2 --show
+ tee work_dirs/lanesegnet/test.240221.151626.log
WARNING:__main__:*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. 
*****************************************
load checkpoint from local path: work_dirs/lanesegnet/lanesegnet_r50_8x1_24e_olv2_subset_A.pth
load checkpoint from local path: work_dirs/lanesegnet/lanesegnet_r50_8x1_24e_olv2_subset_A.pth
[>>>>>>>>>>>>>>>>>>>>>>>>>>>] 4806/4806, 8.6 task/s, elapsed: 559s, ETA:     0s
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: -11) local_rank: 1 (pid: 2738) of binary: /opt/miniconda3/envs/lanesegnet/bin/python
/opt/miniconda3/envs/lanesegnet/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py:367: UserWarning: 

**********************************************************************
               CHILD PROCESS FAILED WITH NO ERROR_FILE                
**********************************************************************
CHILD PROCESS FAILED WITH NO ERROR_FILE
Child process 2738 (local_rank 1) FAILED (exitcode -11)
Error msg: Signal 11 (SIGSEGV) received by PID 2738
Without writing an error file to <N/A>.
While this DOES NOT affect the correctness of your application,
no trace information about the error will be available for inspection.
Consider decorating your top level entrypoint function with
torch.distributed.elastic.multiprocessing.errors.record. Example:

  from torch.distributed.elastic.multiprocessing.errors import record

  @record
  def trainer_main(args):
     # do train
**********************************************************************
  warnings.warn(_no_error_file_warning_msg(rank, failure))
Traceback (most recent call last):
  File "/opt/miniconda3/envs/lanesegnet/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/opt/miniconda3/envs/lanesegnet/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/opt/miniconda3/envs/lanesegnet/lib/python3.8/site-packages/torch/distributed/run.py", line 702, in <module>
    main()
  File "/opt/miniconda3/envs/lanesegnet/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 361, in wrapper
    return f(*args, **kwargs)
  File "/opt/miniconda3/envs/lanesegnet/lib/python3.8/site-packages/torch/distributed/run.py", line 698, in main
    run(args)
  File "/opt/miniconda3/envs/lanesegnet/lib/python3.8/site-packages/torch/distributed/run.py", line 689, in run
    elastic_launch(
  File "/opt/miniconda3/envs/lanesegnet/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 116, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/opt/miniconda3/envs/lanesegnet/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 244, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError: 
*************************************************
               tools/test.py FAILED              
=================================================
Root Cause:
[0]:
  time: 2024-02-21_15:26:02
  rank: 1 (local_rank: 1)
  exitcode: -11 (pid: 2738)
  error_file: <N/A>
  msg: "Signal 11 (SIGSEGV) received by PID 2738"
=================================================
Other Failures:
  <NO_OTHER_FAILURES>
*************************************************

/opt/miniconda3/envs/lanesegnet/lib/python3.8/multiprocessing/resource_tracker.py:216: UserWarning: resource_tracker: There appear to be 32 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '

(lanesegnet) root@my-pc:/opt/LaneSegNet# ./tools/dist_test.sh 2 --show
++ date +%y%m%d.%H%M%S
+ timestamp=240221.154229
+ WORK_DIR=work_dirs/lanesegnet
+ CONFIG=projects/configs/lanesegnet_r50_8x1_24e_olv2_subset_A.py
+ CHECKPOINT=work_dirs/lanesegnet/lanesegnet_r50_8x1_24e_olv2_subset_A.pth
+ GPUS=2
+ PORT=28510
+ python -m torch.distributed.run --nproc_per_node=2 --master_port=28510 tools/test.py projects/configs/lanesegnet_r50_8x1_24e_olv2_subset_A.py work_dirs/lanesegnet/lanesegnet_r50_8x1_24e_olv2_subset_A.pth --launcher pytorch --out-dir work_dirs/lanesegnet/test --eval openlane_v2 --show
+ tee work_dirs/lanesegnet/test.240221.154229.log
WARNING:__main__:*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. 
*****************************************
load checkpoint from local path: work_dirs/lanesegnet/lanesegnet_r50_8x1_24e_olv2_subset_A.pth
load checkpoint from local path: work_dirs/lanesegnet/lanesegnet_r50_8x1_24e_olv2_subset_A.pth
[>>>>>>>>>>>>>>>>>>>>>>>>>>>] 4806/4806, 8.6 task/s, elapsed: 556s, ETA:     0s
[E ProcessGroupNCCL.cpp:566] [Rank 0] Watchdog caught collective operation timeout: WorkNCCL(OpType=ALLREDUCE, Timeout(ms)=1800000) ran for 1806124 milliseconds before timing out.
[E ProcessGroupNCCL.cpp:325] Some NCCL operations have failed or timed out. Due to the asynchronous nature of CUDA kernels, subsequent GPU operations might run on corrupted/incomplete data. To avoid this inconsistency, we are taking the entire process down.
terminate called after throwing an instance of 'std::runtime_error'
  what():  [Rank 0] Watchdog caught collective operation timeout: WorkNCCL(OpType=ALLREDUCE, Timeout(ms)=1800000) ran for 1806124 milliseconds before timing out.
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: -6) local_rank: 0 (pid: 1408) of binary: /opt/miniconda3/envs/lanesegnet/bin/python
/opt/miniconda3/envs/lanesegnet/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py:367: UserWarning: 

**********************************************************************
               CHILD PROCESS FAILED WITH NO ERROR_FILE                
**********************************************************************
CHILD PROCESS FAILED WITH NO ERROR_FILE
Child process 1408 (local_rank 0) FAILED (exitcode -6)
Error msg: Signal 6 (SIGABRT) received by PID 1408
Without writing an error file to <N/A>.
While this DOES NOT affect the correctness of your application,
no trace information about the error will be available for inspection.
Consider decorating your top level entrypoint function with
torch.distributed.elastic.multiprocessing.errors.record. Example:

  from torch.distributed.elastic.multiprocessing.errors import record

  @record
  def trainer_main(args):
     # do train
**********************************************************************
  warnings.warn(_no_error_file_warning_msg(rank, failure))
Traceback (most recent call last):
  File "/opt/miniconda3/envs/lanesegnet/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/opt/miniconda3/envs/lanesegnet/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/opt/miniconda3/envs/lanesegnet/lib/python3.8/site-packages/torch/distributed/run.py", line 702, in <module>
    main()
  File "/opt/miniconda3/envs/lanesegnet/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 361, in wrapper
    return f(*args, **kwargs)
  File "/opt/miniconda3/envs/lanesegnet/lib/python3.8/site-packages/torch/distributed/run.py", line 698, in main
    run(args)
  File "/opt/miniconda3/envs/lanesegnet/lib/python3.8/site-packages/torch/distributed/run.py", line 689, in run
    elastic_launch(
  File "/opt/miniconda3/envs/lanesegnet/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 116, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/opt/miniconda3/envs/lanesegnet/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 244, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError: 
************************************************
              tools/test.py FAILED              
================================================
Root Cause:
[0]:
  time: 2024-02-21_16:22:06
  rank: 0 (local_rank: 0)
  exitcode: -6 (pid: 1408)
  error_file: <N/A>
  msg: "Signal 6 (SIGABRT) received by PID 1408"
================================================
Other Failures:
  <NO_OTHER_FAILURES>
************************************************

AssertionError: points.shape[0] == 5

in ped2lane_segment
assert points.shape[0] == 5
AssertionError

Performance of LaneSegNet for Map Element Bucket

Thanks for the great work!
I've trained the new LaneSegNet (for the Map Element Bucket), but when checking whether the reproduction was successful, I couldn't find its performance results in the repository. Could you please publish them?

Wrong predictions at corners, and how to visualize BEV features?

Hi,
Recently I tried to train LaneSegNet on my own dataset, but the predictions are often wrong when the car turns a corner. Have you seen wrong predictions when the rotation changes a lot?
In addition, I want to draw the BEV features, but the results don't seem correct. My steps are:
1. Calculate the norm of the output of BEVFormerEncoder.forward() (the encoder output size is [1, num_query, 256]; the norm result size is [num_query]).
2. Create an image of size [bev_w, bev_h].
3. Compute each pixel position from ref_2d and pc_range, and use the norm of the BEV feature as its color.
Is there a problem with this? How should I draw the BEV features correctly?
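One simpler way to build the image is sketched below, with hedging. It assumes the encoder's queries are a row-major flattening of a bev_h × bev_w grid. That is how BEVFormer-style encoders usually build ref_2d, but verify it in your code. Under that assumption, the per-query norms can be reshaped directly to (bev_h, bev_w) instead of mapping each query through ref_2d and pc_range. Note that the shape is (bev_h, bev_w), not (bev_w, bev_h); swapping them transposes the picture. Row 0 would then correspond to the smallest y, so you may need np.flipud before displaying with the origin at the top. The function name is hypothetical. A torch tensor can be passed in after bev_embed.detach().cpu().numpy().

```python
import numpy as np


def bev_feature_norm_image(bev_embed: np.ndarray, bev_h: int, bev_w: int) -> np.ndarray:
    """Turn encoder output [1, num_query, C] (or [num_query, C]) into a [bev_h, bev_w] image.

    Assumes query index = row * bev_w + col (row-major over the BEV grid).
    Values are min-max normalized to [0, 1] for display.
    """
    feats = np.asarray(bev_embed, dtype=float)
    if feats.ndim == 3:
        feats = feats[0]  # take the first sample of the batch
    if feats.shape[0] != bev_h * bev_w:
        raise ValueError("num_query must equal bev_h * bev_w")
    img = np.linalg.norm(feats, axis=-1).reshape(bev_h, bev_w)
    lo, hi = img.min(), img.max()
    return (img - lo) / (hi - lo) if hi > lo else np.zeros_like(img)
```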

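Missing files in the downloaded OpenLane_v2 dataset

Many thanks to the authors for the excellent work!
I want to train on the dataset you provided to reproduce the paper's results. Following the instructions, I downloaded the relevant dataset from the OpenLane-V2 repo, but during training I get a missing-file error: FileNotFoundError: [Errno 2] No such file or directory: 'data/OpenLane-V2/train/00447/image/ring_front_center/315971962049927215.jpg'

After checking, I found that the data/OpenLane-V2/train/00447/ folder exists, but the image folder is missing.
I also noticed: "The Map Element Bucket has been updated as of October 2023. Please ensure you download the most recent data!"

I suspect I didn't download the latest version of the data correctly, so I'd like to ask how I should obtain the correct data.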
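Before re-downloading, it may help to see how many segments are affected. Below is a small sketch. It assumes the directory layout implied by the error message (<split>/<segment_id>/image/ring_front_center/<timestamp>.jpg); the function name is hypothetical.

```python
from pathlib import Path
from typing import List


def segments_missing_images(split_dir: str, camera: str = "ring_front_center") -> List[str]:
    """List segment folders under split_dir with no .jpg files in image/<camera>/.

    Assumed layout: <split_dir>/<segment_id>/image/<camera>/<timestamp>.jpg
    """
    missing = []
    for segment in sorted(p for p in Path(split_dir).iterdir() if p.is_dir()):
        cam_dir = segment / "image" / camera
        if not cam_dir.is_dir() or not any(cam_dir.glob("*.jpg")):
            missing.append(segment.name)
    return missing
```

For example, segments_missing_images("data/OpenLane-V2/train") would list 00447 if its image folder is absent.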

Question about the GT centerline

Hi, I noticed that the GT centerline is generated from left_lane and right_lane in "LaneSegmentParameterized3D". What is the difference between the original GT centerline and the regenerated one?

centerline = (left_line + right_line) / 2.0
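If the original and the regenerated centerlines are sampled at corresponding points, the difference can be measured directly. Below is a minimal sketch of the formula above plus a pointwise deviation check. It assumes the left and right boundaries have the same number of points and the same direction, which the formula requires. The function names are hypothetical.

```python
import numpy as np


def midpoint_centerline(left: np.ndarray, right: np.ndarray) -> np.ndarray:
    """Pointwise midpoint of two boundaries: centerline = (left + right) / 2.

    Only meaningful if both boundaries have the same number of points and are
    ordered in the same driving direction (e.g. after resampling).
    """
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)
    if left.shape != right.shape:
        raise ValueError("resample both boundaries to the same number of points first")
    return (left + right) / 2.0


def max_pointwise_deviation(a: np.ndarray, b: np.ndarray) -> float:
    """Largest Euclidean distance between corresponding points of two polylines."""
    return float(np.linalg.norm(np.asarray(a, dtype=float) - np.asarray(b, dtype=float), axis=-1).max())
```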
