2017-10-22 23:07:17.471187: W tensorflow/core/framework/op_kernel.cc:1158] Invalid arg

Error when training on Synth 90k (attention-ocr, closed)

emedvedev commented on August 17, 2024

Error when training on Synth 90k

from attention-ocr.

Comments (18)

emedvedev commented on August 17, 2024

Thanks for the report! Could you please provide the image in the dataset that errors out? Does this error appear from the beginning, or at some point in the middle of the training process?

thisismohitgupta commented on August 17, 2024

It happens in the middle, roughly around 1300-1500 steps with a batch size of 512. I tried hard but could not identify the problematic image. Please help.

emedvedev commented on August 17, 2024

It's pretty much impossible to help unless I know what the image is, unfortunately. You can try inserting some debugging line that would output the list of images in the batch, and then try to narrow it down to a particular one, or just add a catch that would ignore a failed batch and continue training. Might be that your dataset is corrupted.
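
One way to narrow a failing batch down to a single image is repeated halving. The sketch below is only an illustration: `find_failing_item` and the `fails` callback are hypothetical names, not part of attention-ocr, and it assumes that a sub-batch fails whenever it contains a bad image.

```python
from typing import Callable, List, Sequence


def find_failing_item(batch: Sequence[str],
                      fails: Callable[[Sequence[str]], bool]) -> str:
    """Return one item that makes `fails` report an error.

    Assumption: any sub-batch containing a bad item fails as a whole.
    """
    items: List[str] = list(batch)
    if not fails(items):
        raise ValueError("the batch does not fail as a whole")
    while len(items) > 1:
        half = len(items) // 2
        left = items[:half]
        # Keep whichever half still reproduces the failure.
        items = left if fails(left) else items[half:]
    return items[0]
```

Here `fails` would wrap whatever step raises the error (for example, running one training step on the given images and catching the exception). If a batch holds several bad images, this finds only one of them per run.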

tumusudheer commented on August 17, 2024

Hi @emedvedev,

I've hit similar errors while training on my own data. I tracked down which images trigger them, and ImageMagick's convert command converts those images to grayscale without any problem.

I think the issue is with the line image = tf.image.decode_png(img, channels=1), which converts the image bytes to grayscale while decoding them.

How about changing it to:

rgb_image = tf.image.decode_png(img, channels=3)
image = tf.image.rgb_to_grayscale(rgb_image)

emedvedev commented on August 17, 2024

@tumusudheer thanks for investigating the issue! Can you confirm that the proposed change works with a "broken" image?

@thisismohitgupta you can try applying the proposed patch and re-training your model. Please tell me if it helps!

thisismohitgupta commented on August 17, 2024

@emedvedev it didn't work for me.
@tumusudheer how did you debug this across 9M images? Any pointers?

thisismohitgupta commented on August 17, 2024

Adding the following lines solves the problem:

try:
    image = Image.open(IO(img)).convert('RGB')
except Exception:
    # Skip images that cannot be opened or converted.
    continue
if self.max_width and (image.size[0] <= self.max_width):
...

emedvedev commented on August 17, 2024

Well, then we're just silently skipping broken images, which isn't very good. Is there any way to make image reading/conversion more bulletproof so that we wouldn't skip anything?

tumusudheer commented on August 17, 2024

Hi @emedvedev,

I trained with my proposed change yesterday and just verified the results. They are good.

This change worked for me:

rgb_image = tf.image.decode_png(img, channels=3)
image = tf.image.rgb_to_grayscale(rgb_image)

When skipping images, we can write a log file that lists every broken image. After preparing the training data, people can check what is wrong with the images in the log file and try to fix them.
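
The logging idea could look roughly like the sketch below. It is not the project's code: `split_readable`, the `decode` callback, and the default log file name are made up for illustration, and any exception from `decode` is treated as "broken".

```python
from typing import Callable, Iterable, List


def split_readable(paths: Iterable[str],
                   decode: Callable[[str], object],
                   log_path: str = "broken_images.log") -> List[str]:
    """Return the paths that decode cleanly and log the rest to `log_path`."""
    good: List[str] = []
    with open(log_path, "w", encoding="utf-8") as log:
        for path in paths:
            try:
                decode(path)
            except Exception as exc:
                # One line per broken image: path, then the error that was raised.
                log.write(f"{path}\t{type(exc).__name__}: {exc}\n")
                continue
            good.append(path)
    return good
```

With Pillow installed, `decode` could be something like `lambda p: Image.open(p).convert("RGB")`, so the log would list the same images that the try/except patch above skips silently.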

emedvedev commented on August 17, 2024

Sweet! Would you mind opening a PR with the change then? If you want, you can also implement image skipping there, would be great.

tumusudheer commented on August 17, 2024

Hi @emedvedev,
Sure, I'll send a PR with my changes.

lmolhw5252 commented on August 17, 2024

Hi, I ran into a problem when trying to train.
Caused by op 'IteratorGetNext', defined at:
File "/home/user/anaconda3/bin/aocr", line 11, in <module>
sys.exit(main())
File "/home/user/PycharmProjects/attention-ocr-master/aocr/main.py", line 308, in main
num_epoch=parameters.num_epoch
File "/home/user/PycharmProjects/attention-ocr-master/aocr/model/model.py", line 347, in train
for batch in s_gen.gen(self.batch_size):
File "/home/user/PycharmProjects/attention-ocr-master/aocr/util/data_gen.py", line 54, in gen
images, labels, comments = iterator.get_next()
File "/home/user/anaconda3/lib/python3.5/site-packages/tensorflow/contrib/data/python/ops/dataset_ops.py", line 304, in get_next
name=name))
File "/home/user/anaconda3/lib/python3.5/site-packages/tensorflow/python/ops/gen_dataset_ops.py", line 379, in iterator_get_next
output_shapes=output_shapes, name=name)
File "/home/user/anaconda3/lib/python3.5/site-packages/tensorflow/python/framework/op_def_library.py", line 767, in apply_op
op_def=op_def)
File "/home/user/anaconda3/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 2630, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/home/user/anaconda3/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 1204, in __init__
self._traceback = self._graph._extract_stack() # pylint: disable=protected-access

NotFoundError (see above for traceback): ./home/user/Dataset/CAPTCHAs/training.tfrecords
[[Node: IteratorGetNext = IteratorGetNextoutput_shapes=[[?], [?], [?]], output_types=[DT_STRING, DT_STRING, DT_STRING], _device="/job:localhost/replica:0/task:0/cpu:0"]]

Have you seen this problem before? I don't know how to figure it out.

emedvedev commented on August 17, 2024

NotFoundError (see above for traceback): ./home/user/Dataset/CAPTCHAs/training.tfrecords

Your dataset path is incorrect. I'd say it's that dot in the beginning. :)
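
To see why the leading dot matters, here is a small sketch using the standard library's `posixpath`. The working directory `/home/user/project` is a made-up value for illustration only.

```python
import posixpath

reported = "./home/user/Dataset/CAPTCHAs/training.tfrecords"
cwd = "/home/user/project"  # hypothetical working directory

# A path starting with "./" is relative, so it is resolved against cwd.
is_absolute = posixpath.isabs(reported)
resolved = posixpath.normpath(posixpath.join(cwd, reported))

print(is_absolute)  # False
print(resolved)     # /home/user/project/home/user/Dataset/CAPTCHAs/training.tfrecords
```

Dropping the dot turns the same string into an absolute path, which is probably what was intended.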

MBleeker commented on August 17, 2024

Hi Guys,

I encountered the same problem, but for some reason neither solution works for me. So I set the batch size to 1 and checked which image caused the problem. It is:

mnt/ramdisk/max/90kDICT32px/2194/2/334_EFFLORESCENT_24742.jpg'

But there could be more.

Cheers,
Maurits

emedvedev commented on August 17, 2024

@MBleeker Hi there! Just checking: have you set the max-prediction parameter while training and testing? From an earlier issue:

The max-prediction parameter is set to 8 by default, so it'll error out on labels longer than 8 characters. Just set it to whatever makes sense for you in the CLI when you run the training subcommand.

If it's set correctly, then could you provide the full log of your run?
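
A quick way to check a dataset against that limit is to scan the labels before training. This is a minimal sketch: `labels_over_limit` is a hypothetical helper, and it assumes the annotations are already available as (path, label) pairs. The default of 8 mirrors the max-prediction default quoted above.

```python
from typing import Iterable, List, Tuple


def labels_over_limit(annotations: Iterable[Tuple[str, str]],
                      max_prediction: int = 8) -> List[Tuple[str, str]]:
    """Return the (path, label) pairs whose label is longer than the limit."""
    return [(path, label) for path, label in annotations
            if len(label) > max_prediction]
```

For example, if the label of the image reported above is the word in its file name, EFFLORESCENT, it has 12 characters and would be flagged with the default limit.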

MBleeker commented on August 17, 2024

Hi @emedvedev,

I found the problem already. I had never used setup.py before and did not know about the .egg files, so the updates I made were not being used when running the code. It is working now. There are several corrupted images.
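
For anyone hitting the same thing, a small check like the following shows which copy of a package Python would actually import. `module_origin` is a hypothetical helper built on the standard library's `importlib.util.find_spec`.

```python
import importlib.util
from typing import Optional


def module_origin(name: str) -> Optional[str]:
    """Return the file a top-level module would be loaded from, or None if not found."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec is not None else None
```

Calling `module_origin("aocr")` and seeing a path inside site-packages (for instance inside an .egg) rather than the working tree would suggest that local edits are not the code being run.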

About the bias terms we discussed in #70: I added them, and the results do not seem significantly better, but not worse either (the only problem is that you cannot use previously trained models anymore, because the variables are not stored in the checkpoint).

Did you try this code with a different set of hyperparameters than the defaults? Any different results?

Cheers,
Maurits

emedvedev commented on August 17, 2024

About the bias terms we discussed in #70: I added them, and the results do not seem significantly better, but not worse either (the only problem is that you cannot use previously trained models anymore, because the variables are not stored in the checkpoint).

If there's no visible benefit, I'd rather maintain backward compatibility, but if you find out that bias terms do have significant benefit with some datasets, please submit a PR, I'd really appreciate it!

Did you try this code with a different set of hyperparameters than the defaults? Any different results?

I tried to tweak it a little, but it mostly just depends on the dataset. I find the defaults to be sensible, but maybe someone else will have something to add here, too. :)

emedvedev commented on August 17, 2024

I'll close the issue since the original problem has been fixed, so if anyone else has issues with Synth90k, just open a new one. :)
