Training LGPMA on my own data raises ValueError: cannot convert float NaN to integer about davar-lab-ocr HOT 10 CLOSED

hikopensource commented on July 27, 2024

Training LGPMA on my own data raises ValueError: cannot convert float NaN to integer

from davar-lab-ocr.

Comments (10)

qiaoliang6 commented on July 27, 2024 3

@tucachmo2202 Not yet. The training process works smoothly with the PubTabNet dataset, but raises an error with my own data.

How did you create your dataset? I customized this git repository to follow the PubTabNet format. When training with my dataset, it sometimes raises
row = [cellnp[i, 0], cellnp[i, 2]] IndexError: index 132 is out of bounds for axis 0 with size 124
and sometimes raises
ValueError: cannot convert float NaN to integer

We find that many people run into different problems when adapting the model to their own datasets. We will provide more examples of the regular datalist format.

from davar-lab-ocr.

le8888e commented on July 27, 2024 1

@tucachmo2202 I created my dataset with the same git repository and converted it to Davar format by running DAVAR-Lab-OCR/blob/demo/table_recognition/lgpma/tools/convert_html_ann.py.
I ran into the same error as you, but I haven't followed this project for a long time. Sorry.

from davar-lab-ocr.

qiaoliang6 commented on July 27, 2024 1

@qiaoliang6, I hope you can provide them soon. Thank you very much!

Could you please send us some of the generated images and their corresponding datalists (you may send them by email or share them via an online drive)? This would help us find the problem quickly.

from davar-lab-ocr.

qiaoliang6 commented on July 27, 2024 1

@qiaoliang6, I hope you can provide them soon. Thank you very much!

@tucachmo2202 Thank you for providing samples. In these samples, we find that the original annotation in the HTML has a mismatch problem, i.e., the number of bboxes does not match the number of "<td></td>" in the HTML. So in the latest update c85ca3f, we modified the conversion script to filter out the illegal samples. See demo/table_recognition/datalist/ReadMe.md for more details.
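For anyone who wants to check their own annotations before converting, the mismatch described here can be sketched roughly as below. This is not the filter from c85ca3f; it is a minimal illustration that assumes a PubTabNet-style record with `html.structure.tokens` and `html.cells`, and the helper names `count_td_and_cells` and `filter_legal` are hypothetical. It compares the number of `</td>` tokens with the number of annotated cells; the real script may compare against bboxes in a different way.

```python
def count_td_and_cells(sample):
    """Return (number of </td> tokens, number of annotated cells).

    Assumes a PubTabNet-style record (an assumption, not taken from the
    repository): sample["html"]["structure"]["tokens"] is the list of HTML
    structure tokens and sample["html"]["cells"] is the list of cells.
    """
    tokens = sample["html"]["structure"]["tokens"]
    cells = sample["html"]["cells"]
    # Every cell, spanning or not, closes with exactly one "</td>" token.
    num_td = sum(1 for token in tokens if token == "</td>")
    return num_td, len(cells)


def filter_legal(samples):
    """Keep only samples whose cell count matches their </td> count."""
    legal = []
    for sample in samples:
        num_td, num_cells = count_td_and_cells(sample)
        if num_td == num_cells:
            legal.append(sample)
    return legal


# Two toy records: the first is consistent, the second is missing one cell.
structure = ["<tr>", "<td>", "</td>", "<td>", "</td>", "</tr>"]
good = {"html": {"structure": {"tokens": structure},
                 "cells": [{"tokens": ["a"], "bbox": [0, 0, 10, 10]},
                           {"tokens": ["b"], "bbox": [10, 0, 20, 10]}]}}
bad = {"html": {"structure": {"tokens": structure},
                "cells": [{"tokens": ["a"], "bbox": [0, 0, 10, 10]}]}}

kept = filter_legal([good, bad])
print(len(kept), "of 2 samples kept")
```

Running a check like this over a custom dataset should at least show how many records are affected before training starts.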

from davar-lab-ocr.

qiaoliang6 commented on July 27, 2024

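Hello, I trained LGPMA on my own table dataset (in PubTabNet format) and got this error:
File "DAVAR-Lab-OCR/davarocr/davarocr/davar_table/core/mask/lp_mask_target.py", line 55, in get_lpmask_single
middle_x, middle_y = round(np.where(box_text == 1)[1].mean()), round(np.where(box_text == 1)[0].mean())
ValueError: cannot convert float NaN to integer

I prepared the training set as follows: I used DAVAR-Lab-OCR/demo/table_recognition/lgpma/tools/convert_html_ann.py to convert my dataset to Davar format. I made a few changes to the html_to_davar function in convert_html_ann.py: "labels" uses 0 and 1 in place of "t-head" and "t-body", and content_ann returns only "bboxes", "cells", and "labels". The converted data has the same format as your published PubTabNet_train_datalist_all.json, but training raised the error above. Could you tell me which step might have gone wrong? Thanks.

Also: training runs normally when I use your published PubTabNet_train_datalist_all.json as the training set.

Could it be that the gt_bboxes in this function include cells that contain no text? In the lpma branch, the model is trained only on boxes that contain text.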
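To make the suspected cause concrete: if `box_text` has no pixel equal to 1, `np.where` returns empty index arrays, their `.mean()` is NaN, and `round()` on NaN raises exactly this ValueError. Below is a minimal sketch of that failure and one possible guard. `mask_center` is a hypothetical helper written for illustration, not code from lp_mask_target.py, and returning `None` for an empty mask is an assumption about how a caller might want to skip such boxes.

```python
import warnings

import numpy as np


def mask_center(box_text):
    """Return (middle_x, middle_y) of the pixels equal to 1, or None if there are none."""
    ys, xs = np.where(box_text == 1)
    if xs.size == 0:
        # Empty mask: the mean would be NaN and round() would raise.
        return None
    return round(xs.mean()), round(ys.mean())


empty = np.zeros((5, 6), dtype=np.uint8)   # a cell with no text pixels
filled = empty.copy()
filled[1:4, 2:5] = 1                       # a cell with a 3x3 text region

# Unguarded version, as in the traceback: fails on the empty mask.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", RuntimeWarning)  # silence "Mean of empty slice"
    try:
        round(np.where(empty == 1)[1].mean())
    except ValueError as exc:
        print("unguarded:", exc)

print("guarded, empty mask:", mask_center(empty))
print("guarded, filled mask:", mask_center(filled))
```

A guard like this only hides the symptom. If text-free boxes are reaching this function, the cleaner fix is probably to drop them when building the datalist.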

from davar-lab-ocr.

tucachmo2202 commented on July 27, 2024

hi @le8888e ,
I am stuck on this error too. Have you solved it yet?

from davar-lab-ocr.

le8888e commented on July 27, 2024

@tucachmo2202 Not yet. The training process works smoothly with the PubTabNet dataset, but raises an error with my own data.

from davar-lab-ocr.

tucachmo2202 commented on July 27, 2024

@tucachmo2202 Not yet. The training process works smoothly with the PubTabNet dataset, but raises an error with my own data.

How did you create your dataset? I customized this git repository to follow the PubTabNet format. When training with my dataset, it sometimes raises
row = [cellnp[i, 0], cellnp[i, 2]] IndexError: index 132 is out of bounds for axis 0 with size 124
and sometimes raises
ValueError: cannot convert float NaN to integer

from davar-lab-ocr.

tucachmo2202 commented on July 27, 2024

@qiaoliang6,
I hope you can provide them soon. Thank you very much!

from davar-lab-ocr.

tucachmo2202 commented on July 27, 2024

@qiaoliang6, I hope you can provide them soon. Thank you very much!

@tucachmo2202 Thank you for providing samples. In these samples, we find that the original annotation in the HTML has a mismatch problem, i.e., the number of bboxes does not match the number of "<td></td>" in the HTML. So in the latest update c85ca3f, we modified the conversion script to filter out the illegal samples. See demo/table_recognition/datalist/ReadMe.md for more details.

Thank you very much for your help!

from davar-lab-ocr.
