cnn_lstm_ctc_ocr's People

Contributors

gaffordb, lamsalab, lgoldberg9, luftj, murphymatt, sahilbandar, weinman

cnn_lstm_ctc_ocr's Issues

About the checkpoint

Hello, sir. I am a student who has just started learning TensorFlow. I ran into a few problems when running this program. Could you upload the checkpoint? Thank you!

The train error

When I train on my own data, I get an error. Can anyone suggest a fix?
2017-12-10 13:28:41.796273: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Found device 0 with properties:
name: TITAN X (Pascal) major: 6 minor: 1 memoryClockRate(GHz): 1.531
pciBusID: 0000:00:06.0
totalMemory: 11.90GiB freeMemory: 11.76GiB
2017-12-10 13:28:41.796331: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: TITAN X (Pascal), pci bus id: 0000:00:06.0, compute capability: 6.1)
INFO:tensorflow:Starting standard services.
INFO:tensorflow:Saving checkpoint to path ../data/model/model.ckpt
INFO:tensorflow:Starting queue runners.
INFO:tensorflow:global_step/sec: 0
2017-12-10 13:28:47.390354: W tensorflow/core/framework/op_kernel.cc:1192] Invalid argument: Tried to explicitly squeeze dimension 1 but dimension was not 1: 2
    [[Node: convnet/features = Squeeze[T=DT_FLOAT, squeeze_dims=[1], _device="/job:localhost/replica:0/task:0/device:GPU:0"](convnet/pool8/MaxPool)]]
2017-12-10 13:28:47.390538: W tensorflow/core/framework/op_kernel.cc:1192] Invalid argument: Tried to explicitly squeeze dimension 1 but dimension was not 1: 2
    [[Node: convnet/features = Squeeze[T=DT_FLOAT, squeeze_dims=[1], _device="/job:localhost/replica:0/task:0/device:GPU:0"](convnet/pool8/MaxPool)]]
2017-12-10 13:28:47.390862: W tensorflow/core/framework/op_kernel.cc:1192] Invalid argument: Tried to explicitly squeeze dimension 1 but dimension was not 1: 2
    [[Node: convnet/features = Squeeze[T=DT_FLOAT, squeeze_dims=[1], _device="/job:localhost/replica:0/task:0/device:GPU:0"](convnet/pool8/MaxPool)]]
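
One possible reading of this error, offered as a sketch and not as a confirmed diagnosis: the layer table in the "model question" issue on this page shows the feature-map height shrinking through 30, 15, 7, 3, 1 for a 32-pixel-high input, and a squeeze on dimension 1 only works when that final height is 1. The snippet below replays that arithmetic in plain Python. The helper names are hypothetical, and the kernel sizes and strides are taken from that table, not from the repository source.

```python
def valid_out(size, kernel, stride):
    """Output size of a 'valid' (unpadded) conv or pool along one axis."""
    return (size - kernel) // stride + 1


def feature_height(image_height):
    """Final feature-map height, following the layer table on this page."""
    h = valid_out(image_height, 3, 1)  # conv1: kernel 3, 'valid' padding
    h = valid_out(h, 2, 2)             # pool2: kernel 2, vertical stride 2
    h = valid_out(h, 2, 2)             # pool4: kernel 2, vertical stride 2
    h = valid_out(h, 2, 2)             # pool6: kernel 2, vertical stride 2
    h = valid_out(h, 3, 3)             # pool8: kernel 3, vertical stride 3
    return h


for height in (32, 64):
    print(height, "->", feature_height(height))
```

Under these assumptions a 32-pixel-high input reduces to height 1, while a 64-pixel-high input leaves height 2, which is the kind of value the squeeze op rejects. Checking that the training images are resized to height 32 may be a reasonable first step.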

Often recognizes 'u' wrongly

Hello,

I trained your model with mjsynth dataset and default parameter settings over 1000000 steps.
I found that the model often wrongly recognizes character 'u'.
It seems as if there is no 'u' class.
Do you have any thoughts about what the cause might be?

error with the mjsynth-tfrecord.py file

I downloaded the mjsynth dataset separately and stored the images in the image subpath under the data directory. Basically, I did everything manually up until the "make mjsynth-tfrecord" step.
When I ran the command, it reported a syntax error at the print statement in this part of the mjsynth-tfrecord.py file:

    print str(i),'of',str(num_shards),'[',str(start),':',str(end),']',out_filename
    gen_shard(sess, input_base_dir, image_filenames[start:end], out_filename)
# Clean up writing last shard
start = num_shards*images_per_shard
out_filename = output_filebase+'-'+(shard_format % num_shards)+'.tfrecord'
print str(i),'of',str(num_shards),'[',str(start),':]',out_filename
gen_shard(sess, input_base_dir, image_filenames[start:], out_filename) 

Since I am using Python 3.6, I thought the problem was the missing parentheses around the print arguments, so I changed it to this:

    print (str(i),'of',str(num_shards),'[',str(start),':',str(end),']',out_filename)
    gen_shard(sess, input_base_dir, image_filenames[start:end], out_filename)
# Clean up writing last shard
start = num_shards*images_per_shard
out_filename = output_filebase+'-'+(shard_format % num_shards)+'.tfrecord'
print (str(i),'of',str(num_shards),'[',str(start),':]',out_filename)
gen_shard(sess, input_base_dir, image_filenames[start:], out_filename)

The program then started running, but for a lot of files I now see the error printed by this line:

    except:
        # Some files have bogus payloads, catch and note the error, moving on
        print('ERROR',filename)

Can anyone tell me why this is happening? Thank you in advance for the help.
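
A bare except hides the reason each file fails, so the message alone cannot answer the question. A minimal sketch of one way to surface the cause, assuming a stand-in loader: load_image below is hypothetical and only simulates a failure; it is not the repository's function.

```python
def load_image(filename):
    """Hypothetical stand-in for the real image reader; fails on empty names."""
    if not filename:
        raise ValueError("empty payload")
    return b"fake-bytes"


def try_load(filename):
    """Return (data, error_message); error_message is None on success."""
    try:
        return load_image(filename), None
    except Exception as exc:  # unlike a bare except, this lets Ctrl-C through
        # Report the exception type and message instead of only 'ERROR'
        return None, "{}: {}".format(type(exc).__name__, exc)


data, err = try_load("")
print("ERROR", repr(""), err)
```

Printing the exception type this way should show whether the skipped files really have bad payloads or whether something else, such as a Python 2 to 3 incompatibility, is raising the error.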

Is there a pretrained model file?

@weinman
1. Is there a pretrained model file that I can use to try the code out?
2. When I ran the make command, it created the tfrecord files, but as the terminal output below shows, it skipped some files. Maybe they are held out for testing?
Here is the terminal output:
177 of 1000 [ 1278825 : 1286050 ] ../data/train/words-177.tfrecord
('SKIPPING', '1993/4/472_nj_51777.jpg')
('SKIPPING', '1991/5/238_d_18979.jpg')
('SKIPPING', '1991/5/204_V_83811.jpg')
('SKIPPING', '1991/4/228_j_41074.jpg')
178 of 1000 [ 1286050 : 1293275 ] ../data/train/words-178.tfrecord
('SKIPPING', '1990/1/447_4_95.jpg')
('SKIPPING', '1989/4/34_NI_51538.jpg')
('SKIPPING', '1988/3/445_CORRECTNESS_17153.jpg')
179 of 1000 [ 1293275 : 1300500 ] ../data/train/words-179.tfrecord
('SKIPPING', '1987/7/56_n_50734.jpg')
('SKIPPING', '1987/6/477_SJ_71221.jpg')
('SKIPPING', '1987/6/102_RADIOTELEPHONE_62145.jpg')
('SKIPPING', '1986/1/175_INDIVIDUALISTICALLY_39086.jpg')
180 of 1000 [ 1300500 : 1307725 ] ../data/train/words-180.tfrecord
('SKIPPING', '1985/1/50_Debaucheries_19549.jpg')
181 of 1000 [ 1307725 : 1314950 ] ../data/train/words-1

ctc_loss_calculator.cc Not a valid path

While training OCR I got the following error a couple of times only:
Error: W tensorflow/core/util/ctc/ctc_loss_calculator.cc:144] No valid path found
I don't understand what it means. Can anyone help?
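
As general CTC background, and not a confirmed diagnosis of this run: CTC needs at least one output frame per label, plus one extra blank frame between each pair of identical adjacent labels. If the sequence length passed to the loss is shorter than that, no alignment exists and a warning like this one is a typical symptom. A small sketch of the check, with function names made up for illustration:

```python
def min_ctc_frames(label):
    """Minimum number of frames CTC needs in order to emit `label`."""
    # Each adjacent repeated pair needs a blank frame between the two copies
    repeats = sum(1 for a, b in zip(label, label[1:]) if a == b)
    return len(label) + repeats


def has_valid_path(sequence_length, label):
    """True if a CTC alignment of `label` fits in `sequence_length` frames."""
    return sequence_length >= min_ctc_frames(label)


print(min_ctc_frames("good"))      # 4 labels plus 1 repeated pair
print(has_valid_path(4, "good"))
print(has_valid_path(5, "good"))
```

If the warning appears only a couple of times, it may come from a few narrow images with long labels; filtering those examples out when building the dataset is one option to consider.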

Memory soon used up when running train step

After running 'make mjsynth-download' and 'make mjsynth-tfrecord', I moved on to the third step and ran 'make train' to train the model, but the machine's memory (32 GB) was used up within 2 seconds, and the host hung and restarted. What could be causing this?

Train on more characters?

I want to recognize more than just English alphabet and numbers (e.g. special Unicode characters). Is this possible and how can I do this?

Suppose I have my own dataset, do I have to write my own data loader and provide

out_charset="ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"

like in your src/mjsynth.py?
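
A minimal sketch of what extending the character set could look like at the label-encoding level. The out_charset string is the one quoted above; extra_chars, encode, and decode are hypothetical names, and whether other parts of the repository (for example the number of output classes in the model) must change as well is an assumption that needs checking against the source.

```python
out_charset = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"

# Hypothetical additions: e-acute, n-tilde, euro sign (written as escapes)
extra_chars = u"\u00e9\u00f1\u20ac"
charset = out_charset + extra_chars

# Map every character to an integer label and back
char_to_index = {c: i for i, c in enumerate(charset)}


def encode(text):
    """Convert a string to a list of integer labels."""
    return [char_to_index[c] for c in text]


def decode(labels):
    """Convert a list of integer labels back to a string."""
    return u"".join(charset[i] for i in labels)


# CTC conventionally reserves one extra class for the blank symbol
num_classes = len(charset) + 1

print(encode(u"Ab1"), num_classes)
```

Any character that appears in the training labels but not in the charset would raise a KeyError here, which is a convenient way to find missing symbols before training.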

feature extract using CNN with unconstrained length image

Hi weinman, I have read the paper and the code and tried to understand them, but a few questions still confuse me. Please help.

  1. The model reads its input through bucket_by_sequence_length with the parameter dynamic_pad set to True. Every batch has a fixed shape, but different batches may have different shapes, so how does the CNN in the model handle this?
  2. How should an inference service be written when the input images have different widths?
  3. Is there any theory behind the sequence length calculation at the end of the convnet layers?
    Thanks.

Input shapes: [72,357,1], [4] and with input tensors computed

@weinman Hello,
1. I have seen #35, but it does not say which file is meant. I checked validate.py, mjsynth.py, and model_fn.py but could not find this line
image = tf.concat([first_row, image], 0)
or this
image = tf.placeholder(tf.uint8, shape=[32, None, 1])

tensorflow/python/framework/ops.py", line 1631, in _create_c_op
    raise ValueError(str(e))
ValueError: Dimension size must be evenly divisible by 32 but is 25704 for 'Reshape' (op: 'Reshape') with input shapes: [72,357,1], [4] and with input tensors computed as partial shapes: input[1] = [1,32,?,1].
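
The numbers in this message can be checked by hand: the input tensor has 72 * 357 * 1 = 25704 elements, and the reshape target [1, 32, ?, 1] requires that count to be a multiple of 32. A quick check in plain Python:

```python
shape = (72, 357, 1)      # input shape reported in the error message
elements = shape[0] * shape[1] * shape[2]
target_height = 32        # fixed dimension in the reshape target [1, 32, ?, 1]

print(elements)                          # total number of elements
print(divmod(elements, target_height))   # (quotient, remainder)
```

The remainder is nonzero, so the reshape cannot succeed. This would be consistent with an input image that is 72 pixels high where the graph expects height 32, so resizing the image to height 32 before feeding it is one thing to try.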

2. Does the default training data contain any numbers, such as 123? I checked and could not find any. If I want to train on my custom data, how is the labeling done? I usually use bounding boxes, but this is different. Can you tell me how the labeling is done and how to train on custom data? Please forgive my English.

Does not converge

[image]

I'm confused.
Why doesn't the loss go down? Can anyone tell me? I am a beginner.

Feature extraction using CNN

Hi,
I'm using this code to extract CNN features. So, I would like to ask about the variable containing features and how to convert it to a vector and save it to the disk. I'm using the file src/test.py

Thank you in advance for your help

How can I make own words-000.tfrecord ?

When debugging, I found that image, width, and the other fields are all tensors:

image = tf.image.decode_jpeg( features['image/encoded'], channels=1 ) #gray
width = tf.cast( features['image/width'], tf.int32) # for ctc_loss
label = tf.serialize_sparse( features['image/labels'] ) # for batching
length = features['text/length']
text = features['text/string']
filename = features['image/filename']

Could you please tell me how to create my own words-000.tfrecord ?

Thank you!

image pixel value

Excuse me, in the function _preprocess_image(image) (mjsynth.py), why are the pixel values rescaled to floats in [-0.5, 0.5] and not [0, 1]? Can you tell me why? Thanks.
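
For reference, a plain-Python sketch of the two scalings being compared. The formulas are an assumption about what "rescale to [-0.5, 0.5]" means (divide by 255, then subtract 0.5), not a copy of the repository's _preprocess_image. Centering the range on zero gives the first layer inputs with roughly zero mean, which is a common convention in network training.

```python
def to_unit_range(pixel):
    """Map an 8-bit pixel value (0..255) to [0, 1]."""
    return pixel / 255.0


def to_centered_range(pixel):
    """Map an 8-bit pixel value (0..255) to [-0.5, 0.5]."""
    return pixel / 255.0 - 0.5


for value in (0, 255):
    print(value, to_unit_range(value), to_centered_range(value))
```

The two ranges differ only by a constant shift, so either can work in principle; the centered version simply starts training from inputs that are balanced around zero.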

Feature Extraction using CNN and Window width

Hi, I would like to use this code to extract features with the CNN. Can I use a sliding window wider than 1 pixel?
My goal is to extract a set of features based on CNN and to train a BLSTM-CTC recognizer.

Training error under tf1.0

Hi,

I cloned the project and tried to train it under TensorFlow 1.0, but got the following error. Could you please give me some advice? Thank you very much! BTW, I'm using Ubuntu 16.04 on an IBM ppc64 machine.

xiaoren@S822lc1:~/homework/cnn_lstm_ctc_ocr/src$ python train.py
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcublas.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcudnn.so.5 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcufft.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcurand.so.8.0 locally
Traceback (most recent call last):
  File "train.py", line 207, in <module>
    tf.app.run()
  File "/opt/DL/tensorflow/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 44, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "train.py", line 173, in main
    image,width,label = _get_input()
  File "train.py", line 83, in _get_input
    length_threshold=FLAGS.length_threshold )
  File "/home/xiaoren/homework/cnn_lstm_ctc_ocr/src/mjsynth.py", line 69, in bucketed_input_pipeline
    dynamic_pad=True)
  File "/opt/DL/tensorflow/lib/python2.7/site-packages/tensorflow/contrib/training/python/training/bucket_ops.py", line 389, in bucket_by_sequence_length
    shared_name=shared_name)
  File "/opt/DL/tensorflow/lib/python2.7/site-packages/tensorflow/contrib/training/python/training/bucket_ops.py", line 231, in bucket
    control_flow_ops.no_op)
  File "/opt/DL/tensorflow/lib/python2.7/site-packages/tensorflow/python/ops/control_flow_ops.py", line 1726, in cond
    raise TypeError("pred must not be a Python bool")
TypeError: pred must not be a Python bool

Training error

When I trained on your sample data with tensorflow-gpu 1.12, I got this error (I also cloned the tf-1.12 branch, but it gave the same error).

INFO:tensorflow:Using config: {'_eval_distribute': None, '_num_worker_replicas': 1, '_session_config': allow_soft_placement: true
, '_save_checkpoints_steps': None, '_service': None, '_task_id': 0, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_global_id_in_cluster': 0, '_protocol': None, '_master': '', '_tf_random_seed': None, '_save_checkpoints_secs': 120, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7fd68bd37390>, '_experimental_distribute': None, '_keep_checkpoint_max': 5, '_is_chief': True, '_task_type': 'worker', '_device_fn': None, '_train_distribute': None, '_save_summary_steps': 100, '_model_dir': '../data/model', '_evaluation_master': '', '_num_ps_replicas': 0}
Traceback (most recent call last):
File "train.py", line 182, in <module>
tf.app.run()
File "/home/lionel/virtualenv/commonenv/lib/python3.5/site-packages/tensorflow/python/platform/app.py", line 125, in run
_sys.exit(main(argv))
File "train.py", line 179, in main
classifier.train( input_fn=_get_input, max_steps=FLAGS.max_num_steps )
File "/home/lionel/virtualenv/commonenv/lib/python3.5/site-packages/tensorflow/python/estimator/estimator.py", line 354, in train
loss = self._train_model(input_fn, hooks, saving_listeners)
File "/home/lionel/virtualenv/commonenv/lib/python3.5/site-packages/tensorflow/python/estimator/estimator.py", line 1207, in _train_model
return self._train_model_default(input_fn, hooks, saving_listeners)
File "/home/lionel/virtualenv/commonenv/lib/python3.5/site-packages/tensorflow/python/estimator/estimator.py", line 1234, in _train_model_default
input_fn, model_fn_lib.ModeKeys.TRAIN))
File "/home/lionel/virtualenv/commonenv/lib/python3.5/site-packages/tensorflow/python/estimator/estimator.py", line 1075, in _get_features_and_labels_from_input_fn
self._call_input_fn(input_fn, mode))
File "/home/lionel/virtualenv/commonenv/lib/python3.5/site-packages/tensorflow/python/estimator/estimator.py", line 1162, in _call_input_fn
return input_fn(**kwargs)
File "train.py", line 130, in _get_input
dataset = pipeline.get_data( FLAGS.static_data, **data_args)
File "/home/lionel/Desktop/ML/mlcode/OCR/CRNN/cnn_lstm_ctc_ocr-master/src/pipeline.py", line 79, in get_data
dataset = dpipe.get_dataset( dpipe_args )
File "/home/lionel/Desktop/ML/mlcode/OCR/CRNN/cnn_lstm_ctc_ocr-master/src/mjsynth.py", line 60, in get_dataset
buffer_size=buffer_sz )
File "/home/lionel/virtualenv/commonenv/lib/python3.5/site-packages/tensorflow/python/data/ops/readers.py", line 218, in __init__
prefetch_input_elements=None)
File "/home/lionel/virtualenv/commonenv/lib/python3.5/site-packages/tensorflow/python/data/ops/readers.py", line 134, in __init__
cycle_length, block_length)
File "/home/lionel/virtualenv/commonenv/lib/python3.5/site-packages/tensorflow/python/data/ops/dataset_ops.py", line 2714, in __init__
super(InterleaveDataset, self).__init__(input_dataset, map_func)
File "/home/lionel/virtualenv/commonenv/lib/python3.5/site-packages/tensorflow/python/data/ops/dataset_ops.py", line 2677, in __init__
experimental_nested_dataset_support=True)
File "/home/lionel/virtualenv/commonenv/lib/python3.5/site-packages/tensorflow/python/data/ops/dataset_ops.py", line 1860, in __init__
self._function.add_to_graph(ops.get_default_graph())
File "/home/lionel/virtualenv/commonenv/lib/python3.5/site-packages/tensorflow/python/framework/function.py", line 479, in add_to_graph
self._create_definition_if_needed()
File "/home/lionel/virtualenv/commonenv/lib/python3.5/site-packages/tensorflow/python/framework/function.py", line 335, in _create_definition_if_needed
self._create_definition_if_needed_impl()
File "/home/lionel/virtualenv/commonenv/lib/python3.5/site-packages/tensorflow/python/framework/function.py", line 344, in _create_definition_if_needed_impl
self._capture_by_value, self._caller_device)
File "/home/lionel/virtualenv/commonenv/lib/python3.5/site-packages/tensorflow/python/framework/function.py", line 864, in func_graph_from_py_func
outputs = func(*func_graph.inputs)
File "/home/lionel/virtualenv/commonenv/lib/python3.5/site-packages/tensorflow/python/data/ops/dataset_ops.py", line 1794, in tf_data_structured_function_wrapper
ret = func(*nested_args)
File "/home/lionel/virtualenv/commonenv/lib/python3.5/site-packages/tensorflow/python/data/ops/readers.py", line 210, in read_one_file
return _TFRecordDataset(filename, compression_type, buffer_size)
File "/home/lionel/virtualenv/commonenv/lib/python3.5/site-packages/tensorflow/python/data/ops/readers.py", line 105, in __init__
argument_default=_DEFAULT_READER_BUFFER_SIZE_BYTES)
File "/home/lionel/virtualenv/commonenv/lib/python3.5/site-packages/tensorflow/python/data/util/convert.py", line 32, in optional_param_to_tensor
argument_value, dtype=argument_dtype, name=argument_name)
File "/home/lionel/virtualenv/commonenv/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 1050, in convert_to_tensor
as_ref=False)
File "/home/lionel/virtualenv/commonenv/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 1146, in internal_convert_to_tensor
ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
File "/home/lionel/virtualenv/commonenv/lib/python3.5/site-packages/tensorflow/python/framework/constant_op.py", line 229, in _constant_tensor_conversion_function
return constant(v, dtype=dtype, name=name)
File "/home/lionel/virtualenv/commonenv/lib/python3.5/site-packages/tensorflow/python/framework/constant_op.py", line 208, in constant
value, dtype=dtype, shape=shape, verify_shape=verify_shape))
File "/home/lionel/virtualenv/commonenv/lib/python3.5/site-packages/tensorflow/python/framework/tensor_util.py", line 442, in make_tensor_proto
_AssertCompatible(values, dtype)
File "/home/lionel/virtualenv/commonenv/lib/python3.5/site-packages/tensorflow/python/framework/tensor_util.py", line 353, in _AssertCompatible
(dtype.name, repr(mismatch), type(mismatch).__name__))
TypeError: Expected int64, got 256.0 of type 'float' instead.
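
The last line says an int64 was expected but the float 256.0 was passed, and the traceback points at buffer_size=buffer_sz in mjsynth.py. One plausible cause, given the python3.5 paths, is that the buffer size is computed with the / operator, which always returns a float in Python 3. This is a guess, since the computation itself is not shown in the traceback, and the names and numbers below are hypothetical.

```python
total_bytes = 1024   # hypothetical value
num_readers = 4      # hypothetical value

# In Python 3, '/' is true division and yields a float even for exact results
buffer_sz_float = total_bytes / num_readers

# '//' is floor division and keeps the result an int
buffer_sz_int = total_bytes // num_readers

print(type(buffer_sz_float).__name__, buffer_sz_float)
print(type(buffer_sz_int).__name__, buffer_sz_int)
```

If that is what happens here, switching to // or wrapping the value in int() before passing it as buffer_size would likely avoid the TypeError.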

Loading the model only once.

I want to load the model only once and then pass N images for recognition, but the model is loaded again every time I pass an image. I tried loading the model in another function and reusing the same session variable for later recognition, but that raises an error: RuntimeError: Attempted to use a closed Session.

How to deal with single character input

Hi,

When I created tfrecords for my custom dataset, a lot of images got filtered out. Because each input image contains only one character, the processed image width is less than min_width (https://github.com/weinman/cnn_lstm_ctc_ocr/blob/master/src/mjsynth-tfrecord.py#L143).

I am wondering what the correct way to deal with single-character inputs is. Do I need to set min_width to a smaller value (I already tried 3, and many images were still filtered out), or should I pad the input image with zeros?

Thanks,
Xin

Training Error

While running train.py, I am facing the following issue. Can anyone suggest a fix?

[image: screenshot from 2017-11-16 17-24-37]

Empty output in validate.py

For prediction of some real images, I ran your script:

cd src ; python validate.py < ~/paths_to_images.txt

and received empty output (it didn't print anything to the screen).
I went to check your code and found this line:

[output] = sess.run(prediction,{ image: image_data, width: image_data.shape[1]} )

and tried to print(output):

SparseTensorValue(indices=array([], shape=(0, 2), dtype=int64), values=array([], dtype=int64), dense_shape=array([1, 0]))

As expected, its values field is an empty array [].

What went wrong?

validate.py speed problem

With validate.py, recognizing a single picture takes 30 seconds.
The picture is around 32*280, there are 5000+ chars, and the model size is 200 MB.
How can I speed this up? 30 seconds is too long.

Dynamic training data shape error.

ValueError: generator yielded an element of shape (37, 109, 1) where an element of shape (32, ?, 1) was expected.

pipeline.py calls the preprocessing step,
dataset = dataset.map( dpipe.preprocess_fn, num_parallel_calls=num_threads )
which seems OK, and maptextsynth.py uses the new normalize_image method:

def _preprocess_image( image ):
    """Rescale image"""
    image = pipeline.normalize_image(image)
    return image
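
The message indicates that the generator produced an image 37 pixels high while the pipeline expects height 32 with a free width. A minimal sketch of the target-size arithmetic for an aspect-preserving resize, which would be applied inside the generator before yielding. resized_shape is a hypothetical helper, and whether the repository's pipeline is meant to perform this step itself is not confirmed by the issue.

```python
def resized_shape(height, width, target_height=32):
    """Shape after scaling an image to `target_height`, keeping aspect ratio."""
    scale = target_height / float(height)
    # Round to the nearest pixel and never let the width collapse to zero
    new_width = max(1, int(round(width * scale)))
    return (target_height, new_width, 1)


print(resized_shape(37, 109))
```

The resulting height and width can then be handed to whatever image-resize routine the generator already uses, so that every yielded element matches the expected (32, ?, 1) shape.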

Passing the session

Hey, thank you for your nice code!
I want to load the model just once and reuse the session for several predictions,
because loading the model takes a long time.
I wonder how to do this. Please help! Thanks.

model question

Layer  Op    KrnSz  Stride(v,h)  OutDim  H   W   PadOpt
1      Conv  3      1            64      30  30  valid
2      Conv  3      1            64      30  30  same
       Pool  2      2            64      15  15
3      Conv  3      1            128     15  15  same
4      Conv  3      1            128     15  15  same
       Pool  2      2,1          128     7   14
5      Conv  3      1            256     7   14  same
6      Conv  3      1            256     7   14  same
       Pool  2      2,1          256     3   13
7      Conv  3      1            512     3   13  same
8      Conv  3      1            512     3   13  same
       Pool  3      3,1          512     1   13
9      LSTM                      512
10     LSTM                      512

If I want to train on 3000+ characters, how should I modify the model?
Should I make the CNN deeper, change the max-pooling layers, or something else?

Irrelevancy

Training fixed models is vastly wasteful of yours and especially your students time. This focus on narrow AI is only side stepping from our goal of developing human-level AI. The only reasons narrow AI from a person or group with fairly or greater deep understanding of AI can be justifiable by two reasons. One is for a product that is needed fairly quickly before human-level AI ultimately arrives. A product that the world would greatly suffer without before the eventual ultimatum when strong AI arrives. Among special group of others, I list Tesla's AI vision wing in this category in its goal which is part of a far greater picture "necessary" if explanation needed. The other reason is if this narrow is clearing up new sectors like potent technological models different from CNNs or RNNs or even pushing these models into new territories. This project among others is the social equivalent of globalization. Yes, it might help a bit but what it does more is waste talent on things that will eventually be replaced. It is working on a far superior sail when knowing world is on the verge of steam engines which will revolutionize the field. Great courage is required to make great strides. That courage is diving into something that you don't know what actually you are even searching for. That courage is knowing very well a life's work might net nothing. That courage is being selfless taking an impossible chance at breakthrough over some publications in your name. Have your students into new territories that even you don't even know comfortably. Be their leader in routes you feel no one is exploring, perhaps even against your own beliefs. Grow like the AI you are building after reading this if you read this entirely. Be the building block.

  • Someone you know very well.

Using Multiple GPU as a train_device

I need a little help with training the model on multiple GPUs. With the available --train_device option I can only specify one device. How can I specify both GPUs as the training device?

State-of-the-art

Hi! Thanks for your work!
Do you know which algorithm is now state of the art in OCR, given that CRNN was invented two years ago? Are there new models that are perhaps based on CRNN (convolutional and recurrent)?

Saving Checkpoint

Hi @weinman sir,
I've run the training for this model. In the source, I noticed that the model does not save checkpoints based on the lowest loss, and I cannot tell what the accuracy of the model is so far. On what basis are the checkpoints saved?
Thank You...

Can we use new training data?

Hi,
I was just wondering whether we can use our own dataset (the images and labels may be different).
It looks like your code already comes with a pre-trained model. Do we need to delete that pre-trained model, since it may not be helpful for the new training set?

Thanks!

Which fork of TensorFlow are you using?

I tried to find the TensorFlow fork that contains "tf.nn.ctc_beam_search_decoder_trie", but I cannot find it. Could you tell me which one you are using?

Can't recognize consecutive identical characters

Hi, I read your excellent paper and used your code to run some experiments. But I found that it cannot recognize consecutive characters when they are the same. For example, "good" is recognized as "god".
Could you please help me with this problem?
Thanks
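
For background, CTC decoding merges repeated symbols and then drops blanks, so a doubled letter survives only if the network emits a blank between the two copies. The sketch below is a generic greedy-collapse illustration, not this repository's decoder, and uses '-' as a stand-in for the blank symbol.

```python
BLANK = "-"  # stand-in for the CTC blank symbol


def ctc_collapse(frames):
    """Merge consecutive repeats, then remove blanks (greedy CTC collapse)."""
    out = []
    prev = None
    for symbol in frames:
        # Keep a symbol only when it differs from the previous frame
        if symbol != prev and symbol != BLANK:
            out.append(symbol)
        prev = symbol
    return "".join(out)


print(ctc_collapse("g-oo-d"))    # no blank between the two o frames
print(ctc_collapse("g-o-o-d"))   # blank separates the two o frames
```

If predictions consistently lose one of the doubled letters, it may be that the output sequence is too short to fit the extra blank, or that the label preparation or decoding step merges repeats. These are possibilities to check, not conclusions.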

Model

Could you please share your pre-trained model for testing?

Can't recognise same consecutive characters

Hi, this paper is very useful. But I am facing an issue where repeated consecutive characters are not recognized: for example, "password" is recognized as "pasword". Can you help me get rid of this problem?

python validate.py problem

Hi there!
When I run python validate.py and type 1.jpg,
I get the following error:

[image: 2018-08-02 15-42-18]

How can I solve this?
Thanks

Imbalanced classes

Hi!
I am new to TensorFlow and am trying to figure out how to work with your model. I need to train it on my own data, but my dataset is very imbalanced (for example, the class ‘q’ occurs about 100 times in the dataset, while the class ‘a’ may occur more than 10 thousand times). What should I do? How can I use class weights in your code?
I think it might look like this. In the function ‘ctc_loss_layer’ in ‘model.py’ we have rnn_logits, the output of the RNN. What if I multiply it by class weights before passing it to the CTC loss? Then the CTC loss would give greater weight to rare classes, and that would affect backpropagation. Am I right? Could you please help me?

TypeError: __init__() got an unexpected keyword argument 'session_config'

When I run validate.py, I encounter an error:
D:\Tensorflow\cnn_lstm_ctc_ocr-master\src>python validate.py d:/Tensorflow/cnn_lstm_ctc_ocr_master/src/11.jpg
d:\Anaconda3\lib\site-packages\h5py\__init__.py:34: FutureWarning: Conversion of the second argument of issubdtype from float to np.floating is deprecated. In future, it will be treated as np.float64 == np.dtype(float).type.
  from ._conv import register_converters as _register_converters
Traceback (most recent call last):
  File "validate.py", line 109, in <module>
    tf.app.run()
  File "d:\Anaconda3\lib\site-packages\tensorflow\python\platform\app.py", line 48, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "validate.py", line 89, in main
    classifier = tf.estimator.Estimator( config=_get_config(),
  File "validate.py", line 82, in _get_config
    custom_config = tf.estimator.RunConfig( session_config=device_config )
TypeError: __init__() got an unexpected keyword argument 'session_config'
My TensorFlow version is 1.2.1, on Windows.

Multiple Words?

This isn't an issue but a question. I've read the CRNN paper and have played around with PyTorch implementation of it: https://github.com/meijieru/crnn.pytorch

I've noticed that CRNN is able to detect single words but not multiple words. Example:
[image: box_21]

My question is, would this project help with detection of multiple words?

Computing real sequence length

Hi!
I have a simple doubt about the calculation of the sequence length after the conv and pool layers. In the following code, why did you calculate the seq len just until the fourth pooling op (after_pool4)?

conv1 = conv_layer(inputs, layer_params[0], training ) # 30,30
conv2 = conv_layer( conv1, layer_params[1], training ) # 30,30
pool2 = pool_layer( conv2, 2, 'valid', 'pool2')        # 15,15
conv3 = conv_layer( pool2, layer_params[2], training ) # 15,15
conv4 = conv_layer( conv3, layer_params[3], training ) # 15,15
pool4 = pool_layer( conv4, 1, 'valid', 'pool4' )       # 7,14
conv5 = conv_layer( pool4, layer_params[4], training ) # 7,14
conv6 = conv_layer( conv5, layer_params[5], training ) # 7,14
pool6 = pool_layer( conv6, 1, 'valid', 'pool6')        # 3,13
conv7 = conv_layer( pool6, layer_params[6], training ) # 3,13
conv8 = conv_layer( conv7, layer_params[7], training ) # 3,13
pool8 = tf.layers.max_pooling2d( conv8, [3,1], [3,1], 
                           padding='valid', name='pool8') # 1,13

features = tf.squeeze(pool8, axis=1, name='features') # squeeze row dim

kernel_sizes = [ params[1] for params in layer_params]

#Calculate resulting sequence length from original image widths
conv1_trim = tf.constant( 2 * (kernel_sizes[0] // 2),
                          dtype=tf.int32,
                          name='conv1_trim')
one = tf.constant(1, dtype=tf.int32, name='one')
two = tf.constant(2, dtype=tf.int32, name='two')
after_conv1 = tf.subtract( widths, conv1_trim)
after_pool2 = tf.floor_div( after_conv1, two )
after_pool4 = tf.subtract(after_pool2, one)
sequence_length = tf.reshape(after_pool4,[-1], name='seq_len') # Vectorize

sequence length problem

Hi, shouldn't the sequence length calculation in calc_seq_len() (mjsynth-tfrecord.py) match the one in convnet_layers() (model.py)? Maybe like this:

import model
kernel_sizes = [ params[1] for params in model.layer_params]
def calc_seq_len(image_width):
    conv1_trim =  2 * (kernel_sizes[0] // 2)
    after_conv1 = image_width - conv1_trim
    after_pool2 = after_conv1 // 2
    after_pool4 = after_pool2 - 1
    after_pool6 = after_pool4 - 1
    after_pool8 = after_pool6
    sequence_length = after_pool8
    return sequence_length
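
As a worked check of the proposal above, here is a self-contained version with the kernel size fixed at 3. That value is an assumption taken from the layer table on this page and stands in for the import of model. For an input width of 32 it reproduces the widths listed in that table: 30, 15, 14, 13.

```python
KERNEL_SIZE = 3  # assumed from the layer table; stands in for model.layer_params


def width_trace(image_width):
    """Widths after conv1, pool2, pool4, pool6 and pool8, in that order."""
    conv1_trim = 2 * (KERNEL_SIZE // 2)    # 'valid' conv trims kernel-1 columns
    after_conv1 = image_width - conv1_trim
    after_pool2 = after_conv1 // 2         # horizontal stride 2
    after_pool4 = after_pool2 - 1          # kernel 2, horizontal stride 1
    after_pool6 = after_pool4 - 1          # kernel 2, horizontal stride 1
    after_pool8 = after_pool6              # width kernel 1, no change
    return [after_conv1, after_pool2, after_pool4, after_pool6, after_pool8]


def calc_seq_len(image_width):
    """Sequence length seen by the recurrent layers for a given image width."""
    return width_trace(image_width)[-1]


print(width_trace(32), calc_seq_len(32))
```

Stopping at after_pool4, as the model code quoted in the previous issue does, would give 14 for the same input, one more than the table's final width of 13.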
