
Comments (18)

emedvedev commented on August 17, 2024

Thanks for the report! Could you please provide the image in the dataset that errors out? Does this error appear from the beginning, or at some point in the middle of the training process?

from attention-ocr.

thisismohitgupta commented on August 17, 2024

It happens in the middle, at roughly 1300-1500 steps with a batch size of 512. I tried hard but could not identify the problematic image. Please help.

emedvedev commented on August 17, 2024

It's pretty much impossible to help unless I know what the image is, unfortunately. You can try inserting some debugging line that would output the list of images in the batch, and then try to narrow it down to a particular one, or just add a catch that would ignore a failed batch and continue training. Might be that your dataset is corrupted.
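
A minimal sketch of both suggestions, assuming the batches come from a Python iterable and each batch carries the names of the images it contains. The names run_batches and train_step and the (names, data) batch layout are hypothetical and are not part of aocr; in the real training loop the error would surface wherever the batch is fetched or fed to the session.

```python
import logging

logger = logging.getLogger("batch_debug")


def run_batches(batches, train_step):
    """Run train_step on every batch, logging and skipping batches that fail.

    `batches` yields (names, data) pairs and `train_step` is any callable
    that may raise on bad input. Both are placeholders for illustration.
    Returns the image names of every batch that failed.
    """
    failed = []
    for step, (names, data) in enumerate(batches):
        try:
            train_step(data)
        except Exception:
            # Record which images were in the failing batch so the culprit
            # can be narrowed down later, then continue with the next batch.
            logger.exception("step %d failed; images in batch: %s", step, names)
            failed.append(list(names))
    return failed
```

Re-running only the failed batches with a smaller batch size is one way to narrow the list down to a single file.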

tumusudheer commented on August 17, 2024

Hi @emedvedev,

I've faced similar errors while training on my data. I tracked down which images trigger the errors, and ImageMagick's convert command converts those images to grayscale without any problem.

I think the issue is with the line image = tf.image.decode_png(img, channels=1), which converts the image bytes to grayscale.

How about changing it to:

rgb_image = tf.image.decode_png(img, channels=3)
image = tf.image.rgb_to_grayscale(rgb_image)

emedvedev commented on August 17, 2024

@tumusudheer thanks for investigating the issue! Can you confirm that the proposed change works with a "broken" image?

@thisismohitgupta you can try applying the proposed patch and re-training your model. Please tell me if it helps!

thisismohitgupta commented on August 17, 2024

@emedvedev it didn't work for me.
@tumusudheer how did you debug this across 9M images? Any pointers?

thisismohitgupta commented on August 17, 2024

Adding the following lines solves the problem:

try:
    image = Image.open(IO(img)).convert('RGB')
except Exception:
    # Skip images that cannot be opened or converted.
    continue
if self.max_width and (image.size[0] <= self.max_width):
...

emedvedev commented on August 17, 2024

Well, then we're just silently skipping broken images, which isn't very good. Is there any way to make image reading/conversion more bulletproof so that we wouldn't skip anything?

tumusudheer commented on August 17, 2024

Hi @emedvedev,

I trained with my proposed change yesterday, and I just verified the results. They are good.

This change worked for me:

rgb_image = tf.image.decode_png(img, channels=3)
image = tf.image.rgb_to_grayscale(rgb_image)

While skipping images, we can write a log file that lists every broken image. After preparing the training data, people can check what is wrong with the images in the log file and try to fix them.
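
The logging idea could look roughly like the sketch below. It is only an illustration: find_broken_images, the decode callable, and the log file name are hypothetical, and the decoder is passed in so that the same check works with whichever library reads the images (with Pillow installed, lambda b: Image.open(io.BytesIO(b)).load() would be one option).

```python
def find_broken_images(paths, decode, log_path="broken_images.log"):
    """Try to decode every file in `paths`; log and return the ones that fail.

    `decode` is any callable that takes the raw file bytes and raises an
    exception when the image cannot be read. Each failure is written to
    `log_path` as "<path><TAB><error>" so it can be inspected afterwards.
    """
    broken = []
    with open(log_path, "w") as log:
        for path in paths:
            try:
                with open(path, "rb") as f:
                    decode(f.read())
            except Exception as exc:
                # Keep going: the point is a complete list, not a crash.
                log.write("{}\t{!r}\n".format(path, exc))
                broken.append(path)
    return broken
```

Running this once over the dataset before generating the training records keeps the skipped files visible instead of silently dropping them.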

emedvedev commented on August 17, 2024

Sweet! Would you mind opening a PR with the change then? If you want, you can also implement image skipping there, would be great.

tumusudheer commented on August 17, 2024

Hi @emedvedev,
Sure, I'll send a PR with my changes.

lmolhw5252 commented on August 17, 2024

Hi, I ran into a problem when trying to train:
Caused by op 'IteratorGetNext', defined at:
  File "/home/user/anaconda3/bin/aocr", line 11, in <module>
    sys.exit(main())
  File "/home/user/PycharmProjects/attention-ocr-master/aocr/main.py", line 308, in main
    num_epoch=parameters.num_epoch
  File "/home/user/PycharmProjects/attention-ocr-master/aocr/model/model.py", line 347, in train
    for batch in s_gen.gen(self.batch_size):
  File "/home/user/PycharmProjects/attention-ocr-master/aocr/util/data_gen.py", line 54, in gen
    images, labels, comments = iterator.get_next()
  File "/home/user/anaconda3/lib/python3.5/site-packages/tensorflow/contrib/data/python/ops/dataset_ops.py", line 304, in get_next
    name=name))
  File "/home/user/anaconda3/lib/python3.5/site-packages/tensorflow/python/ops/gen_dataset_ops.py", line 379, in iterator_get_next
    output_shapes=output_shapes, name=name)
  File "/home/user/anaconda3/lib/python3.5/site-packages/tensorflow/python/framework/op_def_library.py", line 767, in apply_op
    op_def=op_def)
  File "/home/user/anaconda3/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 2630, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/home/user/anaconda3/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 1204, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

NotFoundError (see above for traceback): ./home/user/Dataset/CAPTCHAs/training.tfrecords
[[Node: IteratorGetNext = IteratorGetNextoutput_shapes=[[?], [?], [?]], output_types=[DT_STRING, DT_STRING, DT_STRING], _device="/job:localhost/replica:0/task:0/cpu:0"]]

Have you seen this problem before? I don't know how to fix it.

emedvedev commented on August 17, 2024

NotFoundError (see above for traceback): ./home/user/Dataset/CAPTCHAs/training.tfrecords

Your dataset path is incorrect. I'd say it's that dot at the beginning. :)
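
A leading "./" makes the path relative to the current working directory rather than to the filesystem root. A small sketch for checking how a path will be interpreted before training starts; describe_path is a hypothetical helper and the example path is made up, and the expected values assume a POSIX system:

```python
import os


def describe_path(path):
    """Return how the OS will interpret `path` and whether it exists."""
    return {
        "given": path,
        "is_absolute": os.path.isabs(path),
        # abspath resolves relative paths against the current directory.
        "resolved": os.path.abspath(path),
        "exists": os.path.exists(path),
    }


info = describe_path("./home/user/training.tfrecords")
print(info["is_absolute"])  # False: the leading dot makes it relative
print(info["resolved"])     # current directory + home/user/training.tfrecords
```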

MBleeker commented on August 17, 2024

Hi Guys,

I encountered the same problem, but for some reason neither solution works for me. I therefore set the batch size to 1 and checked which image caused the problem. It is:

mnt/ramdisk/max/90kDICT32px/2194/2/334_EFFLORESCENT_24742.jpg

But there could be more ...

Cheers,
Maurits

emedvedev commented on August 17, 2024

@MBleeker Hi there! Just checking: have you set the max-prediction parameter while training and testing? From an earlier issue:

The max-prediction parameter is set to 8 by default, so it'll error out on labels longer than 8 characters. Just set it to whatever makes sense for you in the CLI when you run the training subcommand.

If it's set correctly, then could you provide the full log of your run?
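
One way to check this ahead of training is to scan the annotations for labels longer than the limit. This is a sketch only: labels_over_limit is a hypothetical helper, and it assumes each annotation line has the form "<image path> <label>" separated by a single space, which may not match every dataset.

```python
def labels_over_limit(annotation_lines, max_prediction=8):
    """Return (path, label) pairs whose label is longer than max_prediction.

    Assumes one "<image path> <label>" pair per line (an assumption about
    the annotation format). Blank lines are ignored.
    """
    too_long = []
    for line in annotation_lines:
        line = line.strip()
        if not line:
            continue
        # Split on the first space only, so the label keeps any later spaces.
        path, _, label = line.partition(" ")
        if len(label) > max_prediction:
            too_long.append((path, label))
    return too_long
```

The longest label found this way gives a lower bound for a sensible max-prediction value.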

MBleeker commented on August 17, 2024

Hi @emedvedev,

I found the problem already. I had never used setup.py before, and I did not know about the .egg files. The updates I made were therefore not used when running the code. It is working now. There are several corrupted images.
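
For anyone hitting the same thing, a quick check of which copy of a package Python actually imports can reveal a stale installed version. A minimal sketch using only the standard library; module_origin is a hypothetical helper name:

```python
import importlib.util


def module_origin(name):
    """Return the file a top-level module would be imported from, or None."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


# Shown with a standard-library package; substitute the package under test.
print(module_origin("json"))
```

If the printed path points into site-packages or an .egg rather than the working copy, edits to the checkout are not being picked up. Reinstalling the package, or installing it in development mode with pip install -e ., is one way to address that.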

About the bias terms we discussed in #70: I added them. The results do not seem significantly better, but not worse either (the only problem is that you cannot use previously trained models anymore, because the variables are not stored in the checkpoint).

Did you try this code with a different set of hyperparameters than the defaults? Any different results?

Cheers,
Maurits

emedvedev commented on August 17, 2024

About the bias terms we discussed in #70: I added them. The results do not seem significantly better, but not worse either (the only problem is that you cannot use previously trained models anymore, because the variables are not stored in the checkpoint).

If there's no visible benefit, I'd rather maintain backward compatibility, but if you find out that bias terms do have significant benefit with some datasets, please submit a PR, I'd really appreciate it!

Did you try this code with a different set of hyperparameters than the defaults? Any different results?

I tried to tweak them a little, but it mostly depends on the dataset. I find the defaults to be sensible, but maybe someone else will have something to add here, too. :)

emedvedev commented on August 17, 2024

I'll close the issue since the original problem has been fixed, so if anyone else has issues with Synth90k, just open a new one. :)
