weinman / cnn_lstm_ctc_ocr

Tensorflow-based CNN+LSTM trained with CTC-loss for OCR

License: GNU General Public License v3.0

Makefile 0.87% Python 99.13%

ocr tensorflow text-recognition ctc lstm convolutional-neural-networks

cnn_lstm_ctc_ocr's Introduction

Overview

This collection demonstrates how to construct and train a deep, bidirectional stacked LSTM using CNN features as input with CTC loss to perform robust word recognition.

The model is a straightforward adaptation of Shi et al.'s CRNN architecture (arXiv:1507.05717). The provided code downloads and trains using Jaderberg et al.'s synthetic data (IJCV 2016), MJSynth.

Notably, the model achieves a lower test word error rate (1.82%) than CRNN when trained and tested on case-insensitive, closed vocabulary MJSynth data.

Written for Python 2.7. Requires TensorFlow >=1.10 (deprecation warnings exist for TF>1.10, but the code still works).

The model and subsequent experiments are more fully described in Weinman et al. (ICDAR 2019).

Structure

The model as built is a hybrid of Shi et al.'s CRNN architecture (arXiv:1507.05717) and the VGG deep convnet, which reduces the number of parameters by stacking pairs of small 3x3 kernels. In addition, pooling in the horizontal direction is limited to preserve resolution for character recognition: there must be at least one horizontal element per character.

Assuming one starts with a 32x32 image, the dimensions at each level of filtering are as follows:

Layer	Op	KrnSz	Stride(v,h)	OutDim	H	W	PadOpt
1	Conv	3	1	64	30	30	valid
2	Conv	3	1	64	30	30	same
	Pool	2	2	64	15	15
3	Conv	3	1	128	15	15	same
4	Conv	3	1	128	15	15	same
	Pool	2	2,1	128	7	14
5	Conv	3	1	256	7	14	same
6	Conv	3	1	256	7	14	same
	Pool	2	2,1	256	3	13
7	Conv	3	1	512	3	13	same
8	Conv	3	1	512	3	13	same
	Pool	3	3,1	512	1	13
9	LSTM			512
10	LSTM			512
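
The H and W columns of the table can be reproduced with a few lines of arithmetic. The following is a minimal sketch, not the repository's code: the helper names are hypothetical, and the final pool is assumed to use a 3x1 window (the KrnSz column lists only 3).

```python
def conv_out(size, kernel, padding):
    """Output length of a stride-1 convolution along one axis."""
    return size - kernel + 1 if padding == "valid" else size

def pool_out(size, kernel, stride):
    """Output length of an unpadded ('valid') max pool along one axis."""
    return (size - kernel) // stride + 1

def feature_map_shape(height, width):
    """Trace (H, W) through the eight conv layers and four pools in the table."""
    h, w = conv_out(height, 3, "valid"), conv_out(width, 3, "valid")  # layer 1
    # Layers 2-8 use 'same' padding, so only the pools change the shape.
    h, w = pool_out(h, 2, 2), pool_out(w, 2, 2)  # pool after layer 2
    h, w = pool_out(h, 2, 2), pool_out(w, 2, 1)  # pool after layer 4
    h, w = pool_out(h, 2, 2), pool_out(w, 2, 1)  # pool after layer 6
    h, w = pool_out(h, 3, 3), pool_out(w, 1, 1)  # pool after layer 8 (assumed 3x1)
    return h, w

print(feature_map_shape(32, 32))  # (1, 13), matching the last Pool row
```

Wider inputs change only W: under the same assumptions, a 32x100 image yields a 1x47 feature map.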

To accelerate training, a batch normalization layer is included before each pooling layer and ReLU non-linearities are used throughout. Other model details should be easily identifiable in the code.

The default training mechanism uses the ADAM optimizer with learning rate decay.

Differences from CRNN

Deeper early convolutions

The original CRNN uses a single 3x3 convolution in the first two conv/pool stages, while this network uses a paired sequence of 3x3 kernels. This change increases the theoretical receptive field of early stages of the network.

As a tradeoff, we omit the computationally expensive 2x2x512 final convolutional layer of CRNN. In its place, this network vertically max pools over the remaining three rows of features to collapse to a single 512-dimensional feature vector at each horizontal location.

The combination of these changes preserves the theoretical receptive field size of the final CNN layer, but reduces the number of convolution parameters to be learned by 15%.
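
The 15% figure can be sanity-checked by counting kernel weights. This is a rough sketch under stated assumptions: the CRNN layer list is taken from the layer table in Shi et al.'s paper as recalled here, biases and batch normalization parameters are ignored, and the variable names are illustrative only.

```python
def conv_weights(layers):
    """Total kernel weights (biases ignored) for (k_h, k_w, in_ch, out_ch) layers."""
    return sum(kh * kw * cin * cout for kh, kw, cin, cout in layers)

# Assumed CRNN conv stack: seven layers, the last one a 2x2 convolution.
crnn = [(3, 3, 1, 64), (3, 3, 64, 128), (3, 3, 128, 256), (3, 3, 256, 256),
        (3, 3, 256, 512), (3, 3, 512, 512), (2, 2, 512, 512)]
# Conv stack from the layer table in this README (layers 1-8).
ours = [(3, 3, 1, 64), (3, 3, 64, 64), (3, 3, 64, 128), (3, 3, 128, 128),
        (3, 3, 128, 256), (3, 3, 256, 256), (3, 3, 256, 512), (3, 3, 512, 512)]

# float() keeps the ratio fractional on Python 2 as well as Python 3.
reduction = 1.0 - float(conv_weights(ours)) / conv_weights(crnn)
print(conv_weights(crnn), conv_weights(ours), round(100 * reduction, 1))
```

Under these assumptions the two stacks have 5,546,560 and 4,682,304 kernel weights, a reduction of roughly 15.6%, which is consistent with the figure quoted above.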

Padding

Another important difference is the lack of zero-padding in the first convolutional layer, which can cause spurious strong filter responses around the border. By trimming the first convolution to valid regions, this model erodes the outermost pixel of values from the response filter maps (reducing height from 32 to 30 and reducing the width by two pixels).

This approach seems preferable to requiring the network to learn to ignore strong Conv1 responses near the image edge (presumably by weakening the power of filters in subsequent convolutional layers).

Batch normalization

We include batch normalization after each pair of convolutions (i.e., after layers 2, 4, 6, and 8 as numbered above). The CRNN does not include batch normalization after its first two convolutional stages. Our model therefore requires greater computation with an eye toward decreasing the number of training iterations required to reach convergence.

Subsampling/stride

The first two pooling stages of CRNN downsample the feature maps with a stride of two in both spatial dimensions. This model instead preserves sequence length by downsampling horizontally only in the first pooling stage; later pooling stages use a horizontal stride of one.

Because the output feature map must have at least one timeslice per character predicted, overzealous downsampling can make it impossible to represent/predict sequences of very compact or narrow characters. Reducing the horizontal downsampling allows this model to recognize words in narrow fonts.
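
The constraint can be checked numerically. The sketch below assumes the feature-map width implied by the layer table (a valid first convolution, one 2x horizontal downsample, and two stride-1 pools that each trim one column) and the usual CTC requirement of a blank between identical neighboring characters. The function names are hypothetical.

```python
def timesteps(image_width):
    """Feature-map width implied by the layer table for a given image width."""
    return (image_width - 2) // 2 - 2

def min_timesteps(label):
    """CTC needs one step per character plus a blank between equal neighbors."""
    repeats = sum(1 for a, b in zip(label, label[1:]) if a == b)
    return len(label) + repeats

def fits(image_width, label):
    """True if the feature map is long enough to emit the label under CTC."""
    return timesteps(image_width) >= min_timesteps(label)

print(timesteps(32))         # 13
print(fits(32, "password"))  # True: 13 steps for 8 characters plus 1 blank
print(fits(20, "password"))  # False: only 7 steps are available
```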

This increase in horizontal resolution does mean the LSTMs must capture more information. Hence this model uses 512 hidden units, rather than the 256 used by the CRNN. We found this larger number to be necessary for good performance.

Training

To completely train the model, you will need to download the mjsynth dataset and pack it into sharded TensorFlow records. Then you can start the training process, a tensorboard monitor, and an ongoing evaluation thread. The individual commands are packaged in the accompanying Makefile.

make mjsynth-download
make mjsynth-tfrecord
make train &
make monitor &
make test

To monitor training, point your web browser to the URL (e.g., http://127.0.1.1:8008) given by the Tensorboard output.

Note that it may take 4-12 hours to download the complete mjsynth data set. A very small set (0.1%) of packaged example data is included; to run the small demo, skip the first two lines involving mjsynth.

With a GeForce GTX 1080, the demo takes about 20 minutes for the validation character error to reach 45% (using the default parameters); at one hour (roughly 7000 iterations), the validation error is just over 20%.

With the full training data, by one million iterations the model typically converges to around 5% training character error and 27.5% word error.

Checkpoints

Pre-trained model checkpoints at DOI:11084/23328 are used to produce results in the following paper:

Weinman, J. et al. (2019) Deep Neural Networks for Text Detection and Recognition in Historical Maps. In Proc. ICDAR.

Testing

The evaluate script (src/evaluate.py) streams statistics for one batch of validation (or evaluation) data. It prints the iteration, evaluation batch loss, label error (percentage of characters predicted incorrectly), and the sequence error (percentage of words—entire sequences—predicted incorrectly).

The test script (src/test.py) tallies statistics, finally normalizing for all data. It prints the loss, label error, total number of labels, sequence error, total number of sequences, and the label error rate and sequence error rate.

Validation

To see the output of a small set of instances, the validation script (src/validate.py) allows you to load a model and read image paths one at a time via the process's standard input, printing the decoded output for each. For example

cd src ; python validate.py < ~/paths_to_images.txt

Alternatively, you can run the program interactively by typing image paths in the terminal (one per line, type Control-D when you want the model to run the input entered so far).

Configuration

There are many command-line options to configure training parameters. Run train.py or test.py with the --help flag to see them or inspect the scripts. Model parameters are not command-line configurable and need to be edited in the code (see src/model.py).

Dynamic training data

Dynamic data can be used for training or testing by setting the --nostatic_data flag.

You can use the --ipc_synth boolean flag [default=True] to determine whether to use single-threaded or a buffered, multiprocess synthesis.

The --synth_config_file flag must be given with --nostatic_data.

The MapTextSynthesizer library supports training with dynamically synthesized data. The relevant code can be found within MapTextSynthesizer/tensorflow/generator

Using a lexicon

By default, recognition occurs in "open vocabulary" mode. That is, the system observes no constraints on producing the resulting output strings. However, it also has a "closed vocabulary" mode that can efficiently limit output to a given word list as well as a "mixed vocabulary" mode that can produce either a vocabulary word from a given word list (lexicon) or a non-vocabulary word, depending on the value of a prior bias for lexicon words.

Using the closed or mixed vocabulary modes requires additional software. This repository is connected with a fork of Harald Scheidl's CTCWordBeamSearch, obtainable as follows:

git clone https://github.com/weinman/CTCWordBeamSearch
cd CTCWordBeamSearch
git checkout var_seq_len

Then follow the build instructions, which may be as simple as running

cd cpp/proj
./buildTF.sh

To use, make sure CTCWordBeamSearch/cpp/proj (the directory containing TFWordBeamSearch.so) is in the LD_LIBRARY_PATH when running test.py or validate.py (in this repository).

API Notes

This version uses the TensorFlow (v1.14) Dataset for fast I/O. Training, testing, validation, and prediction use a custom Estimator.

Citing this work

Please cite the following paper if you use this code in your own research work:

@inproceedings{ weinman19deep,
    author = {Jerod Weinman and Ziwen Chen and Ben Gafford and Nathan Gifford and Abyaya Lamsal and Liam Niehus-Staab},
    title = {Deep Neural Networks for Text Detection and Recognition in Historical Maps},
    booktitle = {Proc. IAPR International Conference on Document Analysis and Recognition},
    month = {Sep.},
    year = {2019},
    location = {Sydney, Australia},
    doi = {10.1109/ICDAR.2019.00149}
}

Acknowledgment

This work was supported in part by the National Science Foundation under Grant Number 1526350.


cnn_lstm_ctc_ocr's Issues

Dynamic training data shape error.

ValueError: generator yielded an element of shape (37, 109, 1) where an element of shape (32, ?, 1) was expected.

pipeline.py calls the data preprocessing step,
dataset = dataset.map( dpipe.preprocess_fn, num_parallel_calls=num_threads ), which seems OK, and
maptextsynth.py uses the new normalize_image method:

def _preprocess_image( image ):
    """Rescale image"""
    image = pipeline.normalize_image(image)
    return image
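
The error message says the generator produced a 37-row image where 32 rows were expected. One way to reason about a fix is to compute the size an image would have after scaling to 32 rows with its aspect ratio kept. This is only a sketch: the report does not say whether the synthesizer or the pipeline is meant to enforce the height, and `target_size` is a hypothetical helper.

```python
TARGET_HEIGHT = 32  # height the pipeline expects, per the error message

def target_size(height, width):
    """Size after scaling to TARGET_HEIGHT rows while keeping the aspect ratio."""
    scale = float(TARGET_HEIGHT) / height
    return TARGET_HEIGHT, max(1, int(round(width * scale)))

print(target_size(37, 109))  # (32, 94)
```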

Training error

When I trained your sample data with tensorflow-gpu 1.12, I got this error (I've cloned tf-1.12 branch, but it had same error).

INFO:tensorflow:Using config: {'_eval_distribute': None, '_num_worker_replicas': 1, '_session_config': allow_soft_placement: true
, '_save_checkpoints_steps': None, '_service': None, '_task_id': 0, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_global_id_in_cluster': 0, '_protocol': None, '_master': '', '_tf_random_seed': None, '_save_checkpoints_secs': 120, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7fd68bd37390>, '_experimental_distribute': None, '_keep_checkpoint_max': 5, '_is_chief': True, '_task_type': 'worker', '_device_fn': None, '_train_distribute': None, '_save_summary_steps': 100, '_model_dir': '../data/model', '_evaluation_master': '', '_num_ps_replicas': 0}
Traceback (most recent call last):
File "train.py", line 182, in <module>
tf.app.run()
File "/home/lionel/virtualenv/commonenv/lib/python3.5/site-packages/tensorflow/python/platform/app.py", line 125, in run
_sys.exit(main(argv))
File "train.py", line 179, in main
classifier.train( input_fn=_get_input, max_steps=FLAGS.max_num_steps )
File "/home/lionel/virtualenv/commonenv/lib/python3.5/site-packages/tensorflow/python/estimator/estimator.py", line 354, in train
loss = self._train_model(input_fn, hooks, saving_listeners)
File "/home/lionel/virtualenv/commonenv/lib/python3.5/site-packages/tensorflow/python/estimator/estimator.py", line 1207, in _train_model
return self._train_model_default(input_fn, hooks, saving_listeners)
File "/home/lionel/virtualenv/commonenv/lib/python3.5/site-packages/tensorflow/python/estimator/estimator.py", line 1234, in _train_model_default
input_fn, model_fn_lib.ModeKeys.TRAIN))
File "/home/lionel/virtualenv/commonenv/lib/python3.5/site-packages/tensorflow/python/estimator/estimator.py", line 1075, in _get_features_and_labels_from_input_fn
self._call_input_fn(input_fn, mode))
File "/home/lionel/virtualenv/commonenv/lib/python3.5/site-packages/tensorflow/python/estimator/estimator.py", line 1162, in _call_input_fn
return input_fn(**kwargs)
File "train.py", line 130, in _get_input
dataset = pipeline.get_data( FLAGS.static_data, **data_args)
File "/home/lionel/Desktop/ML/mlcode/OCR/CRNN/cnn_lstm_ctc_ocr-master/src/pipeline.py", line 79, in get_data
dataset = dpipe.get_dataset( dpipe_args )
File "/home/lionel/Desktop/ML/mlcode/OCR/CRNN/cnn_lstm_ctc_ocr-master/src/mjsynth.py", line 60, in get_dataset
buffer_size=buffer_sz )
File "/home/lionel/virtualenv/commonenv/lib/python3.5/site-packages/tensorflow/python/data/ops/readers.py", line 218, in __init__
prefetch_input_elements=None)
File "/home/lionel/virtualenv/commonenv/lib/python3.5/site-packages/tensorflow/python/data/ops/readers.py", line 134, in __init__
cycle_length, block_length)
File "/home/lionel/virtualenv/commonenv/lib/python3.5/site-packages/tensorflow/python/data/ops/dataset_ops.py", line 2714, in __init__
super(InterleaveDataset, self).__init__(input_dataset, map_func)
File "/home/lionel/virtualenv/commonenv/lib/python3.5/site-packages/tensorflow/python/data/ops/dataset_ops.py", line 2677, in __init__
experimental_nested_dataset_support=True)
File "/home/lionel/virtualenv/commonenv/lib/python3.5/site-packages/tensorflow/python/data/ops/dataset_ops.py", line 1860, in __init__
self._function.add_to_graph(ops.get_default_graph())
File "/home/lionel/virtualenv/commonenv/lib/python3.5/site-packages/tensorflow/python/framework/function.py", line 479, in add_to_graph
self._create_definition_if_needed()
File "/home/lionel/virtualenv/commonenv/lib/python3.5/site-packages/tensorflow/python/framework/function.py", line 335, in _create_definition_if_needed
self._create_definition_if_needed_impl()
File "/home/lionel/virtualenv/commonenv/lib/python3.5/site-packages/tensorflow/python/framework/function.py", line 344, in _create_definition_if_needed_impl
self._capture_by_value, self._caller_device)
File "/home/lionel/virtualenv/commonenv/lib/python3.5/site-packages/tensorflow/python/framework/function.py", line 864, in func_graph_from_py_func
outputs = func(*func_graph.inputs)
File "/home/lionel/virtualenv/commonenv/lib/python3.5/site-packages/tensorflow/python/data/ops/dataset_ops.py", line 1794, in tf_data_structured_function_wrapper
ret = func(*nested_args)
File "/home/lionel/virtualenv/commonenv/lib/python3.5/site-packages/tensorflow/python/data/ops/readers.py", line 210, in read_one_file
return _TFRecordDataset(filename, compression_type, buffer_size)
File "/home/lionel/virtualenv/commonenv/lib/python3.5/site-packages/tensorflow/python/data/ops/readers.py", line 105, in __init__
argument_default=_DEFAULT_READER_BUFFER_SIZE_BYTES)
File "/home/lionel/virtualenv/commonenv/lib/python3.5/site-packages/tensorflow/python/data/util/convert.py", line 32, in optional_param_to_tensor
argument_value, dtype=argument_dtype, name=argument_name)
File "/home/lionel/virtualenv/commonenv/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 1050, in convert_to_tensor
as_ref=False)
File "/home/lionel/virtualenv/commonenv/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 1146, in internal_convert_to_tensor
ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
File "/home/lionel/virtualenv/commonenv/lib/python3.5/site-packages/tensorflow/python/framework/constant_op.py", line 229, in _constant_tensor_conversion_function
return constant(v, dtype=dtype, name=name)
File "/home/lionel/virtualenv/commonenv/lib/python3.5/site-packages/tensorflow/python/framework/constant_op.py", line 208, in constant
value, dtype=dtype, shape=shape, verify_shape=verify_shape))
File "/home/lionel/virtualenv/commonenv/lib/python3.5/site-packages/tensorflow/python/framework/tensor_util.py", line 442, in make_tensor_proto
_AssertCompatible(values, dtype)
File "/home/lionel/virtualenv/commonenv/lib/python3.5/site-packages/tensorflow/python/framework/tensor_util.py", line 353, in _AssertCompatible
(dtype.name, repr(mismatch), type(mismatch).__name__))
TypeError: Expected int64, got 256.0 of type 'float' instead.
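
The README states the code was written for Python 2.7, while the paths in this traceback show Python 3.5. A float such as 256.0 arriving where an int64 buffer size is expected is consistent with a division that yielded an int under Python 2 but yields a float under Python 3. The sketch below illustrates that difference only; the function names and numbers are hypothetical and are not taken from mjsynth.py.

```python
def buffer_size_true_division(total, parts):
    # On Python 3, "/" is true division and returns a float even for ints.
    return total / parts

def buffer_size_floor_division(total, parts):
    # Floor division keeps the result an int on both Python 2 and Python 3.
    return total // parts

print(type(buffer_size_true_division(1024, 4)).__name__)   # float on Python 3
print(type(buffer_size_floor_division(1024, 4)).__name__)  # int
```

If the buffer size is computed with `/` somewhere upstream, switching to `//` or wrapping the result in `int()` would be the usual remedy.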

What is the output_node_name in the model?

I want to save the model as a PB file so that I can call it from VC++, but I can't find the output_node_name.
How can I find it?

Loading the model only once.

I want to load the model only once and then pass N images for recognition, but whenever I pass images, the model is loaded again and again. I tried loading the model in another function and reusing the same session variable for later recognition, but it gives this error: raise RuntimeError('Attempted to use a closed Session.') RuntimeError: Attempted to use a closed Session.

Is there a script to test the model with one image?

Now that I have trained a model on the dataset, how can I test it with a single image and get the recognized output?

Can't recognise same consecutive characters

Hi, this paper is very useful. However, I am facing an issue: the model fails to recognize repeated consecutive characters. For example, "password" is recognized as "pasword". Can you help me get rid of this?
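
For context, CTC decoding merges runs of the same symbol and then removes blanks, so a doubled letter survives only when the network emits a blank between the two copies. The sketch below shows that collapse rule on hand-written paths. It is a generic illustration of greedy CTC decoding, not the repository's decoder, and the "-" blank symbol is a stand-in.

```python
BLANK = "-"  # stand-in symbol for the CTC blank class

def ctc_collapse(path):
    """Greedy CTC decoding rule: merge repeated symbols, then drop blanks."""
    out = []
    prev = None
    for symbol in path:
        if symbol != prev and symbol != BLANK:
            out.append(symbol)
        prev = symbol
    return "".join(out)

print(ctc_collapse("paasss-word"))  # pasword
print(ctc_collapse("paas-ssword"))  # password
```

A missing blank between the two "s" frames is therefore one possible cause of the reported output; too few timesteps for the word width is another.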

the version of tensorflow?

Can you tell me which version of TensorFlow you used?

image pixel value

Excuse me, in the function _preprocess_image(image) (mjsynth.py), why are the pixel values rescaled to floats in [-0.5, 0.5] rather than [0, 1]? Can you tell me why? Thanks.
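
For reference, the two ranges differ only by a constant shift. The sketch below assumes 8-bit input and a divide-by-255 rescale, which may not match the repository's exact arithmetic. Zero-centered inputs are a common convention in network training, though the post does not state the author's reason.

```python
def normalize_pixels(row):
    """Map 8-bit intensities 0..255 to floats in [-0.5, 0.5]."""
    return [value / 255.0 - 0.5 for value in row]

print(normalize_pixels([0, 128, 255]))
```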

are there any pretrain model file

@weinman
1. Is there a pre-trained model file so I can just try it out?
2. When I ran the make command, it created the tfrecord files, but it skipped some files (perhaps they are reserved for testing?).
Here is the terminal output:
177 of 1000 [ 1278825 : 1286050 ] ../data/train/words-177.tfrecord
('SKIPPING', '1993/4/472_nj_51777.jpg')
('SKIPPING', '1991/5/238_d_18979.jpg')
('SKIPPING', '1991/5/204_V_83811.jpg')
('SKIPPING', '1991/4/228_j_41074.jpg')
178 of 1000 [ 1286050 : 1293275 ] ../data/train/words-178.tfrecord
('SKIPPING', '1990/1/447_4_95.jpg')
('SKIPPING', '1989/4/34_NI_51538.jpg')
('SKIPPING', '1988/3/445_CORRECTNESS_17153.jpg')
179 of 1000 [ 1293275 : 1300500 ] ../data/train/words-179.tfrecord
('SKIPPING', '1987/7/56_n_50734.jpg')
('SKIPPING', '1987/6/477_SJ_71221.jpg')
('SKIPPING', '1987/6/102_RADIOTELEPHONE_62145.jpg')
('SKIPPING', '1986/1/175_INDIVIDUALISTICALLY_39086.jpg')
180 of 1000 [ 1300500 : 1307725 ] ../data/train/words-180.tfrecord
('SKIPPING', '1985/1/50_Debaucheries_19549.jpg')
181 of 1000 [ 1307725 : 1314950 ] ../data/train/words-1

Train on more characters?

I want to recognize more than just English alphabet and numbers (e.g. special Unicode characters). Is this possible and how can I do this?

Suppose I have my own dataset, do I have to write my own data loader and provide

out_charset="ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"

like in your src/mjsynth.py?
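
Extending the character set generally means growing the lookup from characters to class indices and the number of output classes with it. The following is a minimal sketch under assumptions: `build_codec` and `encode` are hypothetical helpers, the three extra characters are arbitrary examples, and the changes the repository's model would need beyond the charset string are not covered.

```python
out_charset = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"

def build_codec(charset):
    """Return (char -> class index, number of classes including the CTC blank)."""
    char_to_index = {ch: i for i, ch in enumerate(charset)}
    # TF1's tf.nn.ctc_loss reserves the largest class index for the blank.
    num_classes = len(charset) + 1
    return char_to_index, num_classes

def encode(label, char_to_index):
    """Convert a label string to class indices, rejecting unknown characters."""
    missing = sorted(set(label) - set(char_to_index))
    if missing:
        raise ValueError("characters not in charset: %r" % missing)
    return [char_to_index[ch] for ch in label]

# Hypothetical extension with three accented letters (e-acute, n-tilde, u-umlaut).
extended = out_charset + u"\u00e9\u00f1\u00fc"
codec, num_classes = build_codec(extended)
print(num_classes, encode(u"Se\u00f1or", codec))
```

Labels containing characters outside the charset raise an error here rather than being silently dropped, which makes gaps in the charset easy to spot when packing a custom dataset.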

Does not converge

I'm confused.
Why doesn't the loss go down? Can anyone tell me? I am a beginner.

sequence length problem

Hi, shouldn't the sequence length calculation in calc_seq_len() (mjsynth-tfrecord.py) match convnet_layers() (model.py)? Maybe like this:

import model
kernel_sizes = [ params[1] for params in model.layer_params]
def calc_seq_len(image_width):
    conv1_trim =  2 * (kernel_sizes[0] // 2)
    after_conv1 = image_width - conv1_trim
    after_pool2 = after_conv1 // 2
    after_pool4 = after_pool2 - 1
    after_pool6 = after_pool4 - 1
    after_pool8 = after_pool6
    sequence_length = after_pool8
    return sequence_length

Saving Checkpoint

Hi @weinman sir,
I've run the training for this model. In the source, I noticed that the model does not save checkpoints based on the lowest loss, and I cannot tell what the model's accuracy is so far. On what basis are the checkpoints saved?
Thank you.

feature extract using CNN with unconstrained length image

Hi weinman, I have read the paper and the code and tried to understand them, but a few questions confuse me. Please help.

The model's input comes from the function bucket_by_sequence_length with the parameter dynamic_pad set to True. Every batch has a fixed shape, but different batches may have different shapes, so how does the CNN in the model work?
How should an inference service be written when the input images have different widths?
Is there any theory behind the sequence length calculation at the end of the convnet layers?
Thanks.

error with the mjsynth-tfrecord.py file

I downloaded the mjsynth dataset separately and stored the images in the image subpath under the data directory. Basically, I did everything manually up until the "make mjsynth-tfrecord.py" command.
When I ran the command, it reported a syntax error at the print statement in these lines of the mjsynth-tfrecord.py file:

    print str(i),'of',str(num_shards),'[',str(start),':',str(end),']',out_filename
    gen_shard(sess, input_base_dir, image_filenames[start:end], out_filename)
# Clean up writing last shard
start = num_shards*images_per_shard
out_filename = output_filebase+'-'+(shard_format % num_shards)+'.tfrecord'
print str(i),'of',str(num_shards),'[',str(start),':]',out_filename
gen_shard(sess, input_base_dir, image_filenames[start:], out_filename)

Since I am using Python 3.6, I thought the problem was the missing parentheses in the print statements, so I changed them to this:

    print (str(i),'of',str(num_shards),'[',str(start),':',str(end),']',out_filename)
    gen_shard(sess, input_base_dir, image_filenames[start:end], out_filename)
# Clean up writing last shard
start = num_shards*images_per_shard
out_filename = output_filebase+'-'+(shard_format % num_shards)+'.tfrecord'
print (str(i),'of',str(num_shards),'[',str(start),':]',out_filename)
gen_shard(sess, input_base_dir, image_filenames[start:], out_filename)

The program then started running, but I'm seeing a lot of files reported as errors by this block:

    except:
        # Some files have bogus payloads, catch and note the error, moving on
        print('ERROR',filename)

Can anyone tell me why this is happening? Thank you in advance for the help.
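
The bare except in the quoted block hides the reason each file fails. Printing the exception type and message would show whether the files are really corrupt or whether something else is going wrong, for instance because the script is running on a different Python version than it was written for. The sketch below is generic: `load_payload` and `broken_reader` are hypothetical stand-ins, not functions from mjsynth-tfrecord.py.

```python
import traceback

def load_payload(filename, reader):
    """Call reader(filename); on failure report the reason instead of hiding it."""
    try:
        return reader(filename)
    except Exception as err:  # narrower than a bare "except:"
        print('ERROR', filename, type(err).__name__, err)
        traceback.print_exc()
        return None

def broken_reader(filename):
    # Hypothetical reader that fails the way a corrupt image might.
    raise IOError("cannot decode " + filename)

print(load_payload("example.jpg", broken_reader))
```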

Training Error

While running train.py, I am facing the following issue. Can anyone please advise?

Multiple Words?

This isn't an issue but a question. I've read the CRNN paper and have played around with PyTorch implementation of it: https://github.com/meijieru/crnn.pytorch

I've noticed that CRNN is able to detect single words but not multiple words. Example:

My question is, would this project help with detection of multiple words?

The train error

When I train on my own data, there is an error. Can anyone please advise?
2017-12-10 13:28:41.796273: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Found device 0 with properties:
name: TITAN X (Pascal) major: 6 minor: 1 memoryClockRate(GHz): 1.531
pciBusID: 0000:00:06.0
totalMemory: 11.90GiB freeMemory: 11.76GiB
2017-12-10 13:28:41.796331: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: TITAN X (Pascal), pci bus id: 0000:00:06.0, compute capability: 6.1)
INFO:tensorflow:Starting standard services.
INFO:tensorflow:Saving checkpoint to path ../data/model/model.ckpt
INFO:tensorflow:Starting queue runners.
INFO:tensorflow:global_step/sec: 0
2017-12-10 13:28:47.390354: W tensorflow/core/framework/op_kernel.cc:1192] Invalid argument: Tried to explicitly squeeze dimension 1 but dimension was not 1: 2
[[Node: convnet/features = Squeeze[T=DT_FLOAT, squeeze_dims=[1], _device="/job:localhost/replica:0/task:0/device:GPU:0"](convnet/pool8/MaxPool)]]
2017-12-10 13:28:47.390538: W tensorflow/core/framework/op_kernel.cc:1192] Invalid argument: Tried to explicitly squeeze dimension 1 but dimension was not 1: 2
[[Node: convnet/features = Squeeze[T=DT_FLOAT, squeeze_dims=[1], _device="/job:localhost/replica:0/task:0/device:GPU:0"](convnet/pool8/MaxPool)]]
2017-12-10 13:28:47.390862: W tensorflow/core/framework/op_kernel.cc:1192] Invalid argument: Tried to explicitly squeeze dimension 1 but dimension was not 1: 2
[[Node: convnet/features = Squeeze[T=DT_FLOAT, squeeze_dims=[1], _device="/job:localhost/replica:0/task:0/device:GPU:0"](convnet/pool8/MaxPool)]]
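
The squeeze fails because the feature map still has two rows after pool8. One plausible cause, not confirmed in the post, is that the custom images are taller than the 32 rows the README's layer table assumes. The sketch below traces only the height through the layers in that table; `final_height` is a hypothetical helper.

```python
def final_height(image_height):
    """Rows left after conv1 ('valid', 3x3) and the four vertical pools."""
    h = image_height - 2  # conv1 trims one row at the top and one at the bottom
    for kernel, stride in [(2, 2), (2, 2), (2, 2), (3, 3)]:
        h = (h - kernel) // stride + 1
    return h

print(final_height(32))  # 1
print(final_height(64))  # 2
```

Under these assumptions a 64-row image leaves two rows, which would produce exactly the reported message.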

model question

Layer	Op	KrnSz	Stride(v,h)	OutDim	H	W	PadOpt
1	Conv	3	1	64	30	30	valid
2	Conv	3	1	64	30	30	same
	Pool	2	2	64	15	15
3	Conv	3	1	128	15	15	same
4	Conv	3	1	128	15	15	same
	Pool	2	2,1	128	7	14
5	Conv	3	1	256	7	14	same
6	Conv	3	1	256	7	14	same
	Pool	2	2,1	256	3	13
7	Conv	3	1	512	3	13	same
8	Conv	3	1	512	3	13	same
	Pool	3	3,1	512	1	13
9	LSTM			512
10	LSTM			512

If I want to train on 3000+ characters, how should I modify the model?
Should the CNN be deeper, should the max-pooling layers change, or something else?

Hi, can I use this code for English word recognition in images with different fonts and font sizes? Can you please help me? Thanks.

often recognize 'u' wrongly

Hello,

I trained your model with mjsynth dataset and default parameter settings over 1000000 steps.
I found that the model often wrongly recognizes character 'u'.
It seems as if there is no 'u' class.
Do you have any thoughts about what the cause might be?

Empty output in validate.py

For prediction of some real images, I ran your script:

cd src ; python validate.py < ~/paths_to_images.txt

and received empty output (it didn't print anything to the screen).
I went to check your code and found this line:

[output] = sess.run(prediction,{ image: image_data, width: image_data.shape[1]} )

and tried to print(output):

SparseTensorValue(indices=array([], shape=(0, 2), dtype=int64), values=array([], dtype=int64), dense_shape=array([1, 0]))

As expected its values is an empty array [].

What went wrong?

Could you give a link to a trained model to test the neural network?

Computing real sequence length

Hi!
I have a simple doubt about the calculation of the sequence length after the conv and pool layers. In the following code, why did you calculate the seq len just until the fourth pooling op (after_pool4)?

conv1 = conv_layer(inputs, layer_params[0], training ) # 30,30
conv2 = conv_layer( conv1, layer_params[1], training ) # 30,30
pool2 = pool_layer( conv2, 2, 'valid', 'pool2')        # 15,15
conv3 = conv_layer( pool2, layer_params[2], training ) # 15,15
conv4 = conv_layer( conv3, layer_params[3], training ) # 15,15
pool4 = pool_layer( conv4, 1, 'valid', 'pool4' )       # 7,14
conv5 = conv_layer( pool4, layer_params[4], training ) # 7,14
conv6 = conv_layer( conv5, layer_params[5], training ) # 7,14
pool6 = pool_layer( conv6, 1, 'valid', 'pool6')        # 3,13
conv7 = conv_layer( pool6, layer_params[6], training ) # 3,13
conv8 = conv_layer( conv7, layer_params[7], training ) # 3,13
pool8 = tf.layers.max_pooling2d( conv8, [3,1], [3,1], 
                           padding='valid', name='pool8') # 1,13

features = tf.squeeze(pool8, axis=1, name='features') # squeeze row dim

kernel_sizes = [ params[1] for params in layer_params]

#Calculate resulting sequence length from original image widths
conv1_trim = tf.constant( 2 * (kernel_sizes[0] // 2),
                          dtype=tf.int32,
                          name='conv1_trim')
one = tf.constant(1, dtype=tf.int32, name='one')
two = tf.constant(2, dtype=tf.int32, name='two')
after_conv1 = tf.subtract( widths, conv1_trim)
after_pool2 = tf.floor_div( after_conv1, two )
after_pool4 = tf.subtract(after_pool2, one)
sequence_length = tf.reshape(after_pool4,[-1], name='seq_len') # Vectorize
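
To make the question concrete, the two calculations can be compared with plain integers. The first function mirrors the quoted sequence-length arithmetic; the second follows the shape comments beside the layers, which drop one more column at pool6. This is only a restatement of the quoted snippet for a 3x3 first kernel and does not settle which value the model should use.

```python
def seq_len_from_code(width):
    """Mirror of the quoted TensorFlow arithmetic, using plain integers."""
    after_conv1 = width - 2 * (3 // 2)  # conv1_trim with a 3x3 kernel
    after_pool2 = after_conv1 // 2
    after_pool4 = after_pool2 - 1
    return after_pool4

def seq_len_from_comments(width):
    """Width implied by the shape comments, which also trim a column at pool6."""
    return seq_len_from_code(width) - 1

print(seq_len_from_code(32), seq_len_from_comments(32))  # 14 13
```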

TypeError: __init__() got an unexpected keyword argument 'session_config'

When I run validate.py, I encounter an error:
D:\Tensorflow\cnn_lstm_ctc_ocr-master\src>python validate.py d:/Tensorflow/cnn_lstm_ctc_ocr_master/src/11.jpg
d:\Anaconda3\lib\site-packages\h5py\__init__.py:34: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
Traceback (most recent call last):
  File "validate.py", line 109, in <module>
    tf.app.run()
  File "d:\Anaconda3\lib\site-packages\tensorflow\python\platform\app.py", line 48, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "validate.py", line 89, in main
    classifier = tf.estimator.Estimator( config=_get_config(),
  File "validate.py", line 82, in _get_config
    custom_config = tf.estimator.RunConfig( session_config=device_config )
TypeError: __init__() got an unexpected keyword argument 'session_config'
My TensorFlow version is 1.2.1, on Windows.

Training error under tf1.0

Hi,

I cloned the project and try to train it under tensorflow 1.0, but got the following error information. Could you please give me some advice? Thank you very much! BTW, I'm using ubuntu 16.04 and IBM ppc64 machine.

xiaoren@S822lc1:~/homework/cnn_lstm_ctc_ocr/src$ python train.py
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcublas.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcudnn.so.5 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcufft.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcurand.so.8.0 locally
Traceback (most recent call last):
  File "train.py", line 207, in <module>
    tf.app.run()
  File "/opt/DL/tensorflow/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 44, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "train.py", line 173, in main
    image,width,label = _get_input()
  File "train.py", line 83, in _get_input
    length_threshold=FLAGS.length_threshold )
  File "/home/xiaoren/homework/cnn_lstm_ctc_ocr/src/mjsynth.py", line 69, in bucketed_input_pipeline
    dynamic_pad=True)
  File "/opt/DL/tensorflow/lib/python2.7/site-packages/tensorflow/contrib/training/python/training/bucket_ops.py", line 389, in bucket_by_sequence_length
    shared_name=shared_name)
  File "/opt/DL/tensorflow/lib/python2.7/site-packages/tensorflow/contrib/training/python/training/bucket_ops.py", line 231, in bucket
    control_flow_ops.no_op)
  File "/opt/DL/tensorflow/lib/python2.7/site-packages/tensorflow/python/ops/control_flow_ops.py", line 1726, in cond
    raise TypeError("pred must not be a Python bool")
TypeError: pred must not be a Python bool

Feature extraction using CNN

Hi,
I'm using this code to extract CNN features. So, I would like to ask about the variable containing features and how to convert it to a vector and save it to the disk. I'm using the file src/test.py

Thank you in advance for your help

Squeeze function error

There is no access to the mjsynth dataset. How can I train the model with the IAM dataset?

Irrelevancy

Training fixed models is vastly wasteful of yours and especially your students time. This focus on narrow AI is only side stepping from our goal of developing human-level AI. The only reasons narrow AI from a person or group with fairly or greater deep understanding of AI can be justifiable by two reasons. One is for a product that is needed fairly quickly before human-level AI ultimately arrives. A product that the world would greatly suffer without before the eventual ultimatum when strong AI arrives. Among special group of others, I list Tesla's AI vision wing in this category in its goal which is part of a far greater picture "necessary" if explanation needed. The other reason is if this narrow is clearing up new sectors like potent technological models different from CNNs or RNNs or even pushing these models into new territories. This project among others is the social equivalent of globalization. Yes, it might help a bit but what it does more is waste talent on things that will eventually be replaced. It is working on a far superior sail when knowing world is on the verge of steam engines which will revolutionize the field. Great courage is required to make great strides. That courage is diving into something that you don't know what actually you are even searching for. That courage is knowing very well a life's work might net nothing. That courage is being selfless taking an impossible chance at breakthrough over some publications in your name. Have your students into new territories that even you don't even know comfortably. Be their leader in routes you feel no one is exploring, perhaps even against your own beliefs. Grow like the AI you are building after reading this if you read this entirely. Be the building block.

Someone you know very well.

State-of-the-art

Hi! Thanks for your work!
Do you know which algorithm is now state-of-the-art in OCR, since CRNN was invented two years ago? Are there any new models that are (maybe) based on CRNN (convolutional and recurrent)?

Using Multiple GPU as a train_device

I need a little help training the model on multiple GPUs. With the available --train_device option I am able to specify only one device. How can I specify both GPUs as the train device?

Which TensorFlow fork are you using?

I tried to find the TensorFlow fork that contains "tf.nn.ctc_beam_search_decoder_trie", but I cannot find it. Could you tell me which one you are using?

lexicon.txt's function?

@weinman Why do we use lexicon.txt? What is its function? What is it for?

validate.py speed problem

Processing one picture at a time with validate.py takes 30 seconds.
The picture is around 32*280, with 5000+ chars and a 200 MB model size.
How can I speed this up? 30 seconds is too long.

About the checkpoint

Hello, sir. I am a student just starting to learn TensorFlow. I ran into a few problems when running this program. Can you upload the checkpoint? Thank you!

Can we use new training data?

Hi,
I was just wondering whether we can use our own dataset (the images and labels may be different).
But it looks like your code already has a pre-trained model. Do we need to delete that pre-trained model, since it may not be helpful for the new training set?

Thanks!

How to deal with single character input

Hi,

When I created tfrecords for my custom dataset, a lot of images got filtered out, because each input image contains only one character, so the processed image width < min_width (https://github.com/weinman/cnn_lstm_ctc_ocr/blob/master/src/mjsynth-tfrecord.py#L143).

I am wondering what is the correct way to deal with single char inputs. Do I need to set min_width to be a smaller value (already tried 3, still filtered out many images), or should I pad the input image with zeros?

Thanks,
Xin
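The width question above can be explored with a small sketch. It assumes the layer table in the README describes the horizontal reductions exactly (one 'valid' 3x3 convolution, one stride-2 pool, then two width-2 pools with horizontal stride 1); the function names are hypothetical and not part of the repository. Whether zero-padding or a smaller min_width trains better is not settled here.

```python
def sequence_length(width):
    """Horizontal feature count after the CNN, following the README layer table."""
    after_conv1 = width - 2         # layer 1: 3x3 'valid' convolution
    after_pool1 = after_conv1 // 2  # first pool: stride 2 horizontally
    return after_pool1 - 2          # next two pools: window 2, stride 1 (each removes 1)


def min_image_width(num_chars):
    """Smallest width that leaves at least one horizontal element per character."""
    width = 1
    while sequence_length(width) < num_chars:
        width += 1
    return width


def pad_to_width(image, target_width, fill=0):
    """Pad each row of a 2-D list image symmetrically with `fill` (hypothetical helper)."""
    extra = max(0, target_width - len(image[0]))
    left = extra // 2
    right = extra - left
    return [[fill] * left + list(row) + [fill] * right for row in image]


narrow = [[255, 255, 255], [255, 0, 255]]       # a 2x3 toy "image"
padded = pad_to_width(narrow, min_image_width(1))
print(len(padded[0]), sequence_length(len(padded[0])))
```

Under these assumptions a 32-pixel-wide image yields 13 horizontal elements, matching the table, and a single character needs a width of at least 8 pixels.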

python validate.py problem

Hi there!
When I run python validate.py and enter 1.jpg,
I get the following error:

How can I solve this?
Thanks

Feature Extraction using CNN and Window width

Hi, I would like to use this code to extract features using the CNN. Can I use a sliding window width of more than 1 pixel?
My goal is to extract a set of features based on CNN and to train a BLSTM-CTC recognizer.

How can I make my own words-000.tfrecord?

When debugging, I find that image, width, and so on are tensors:

image = tf.image.decode_jpeg( features['image/encoded'], channels=1 ) #gray
width = tf.cast( features['image/width'], tf.int32) # for ctc_loss
label = tf.serialize_sparse( features['image/labels'] ) # for batching
length = features['text/length']
text = features['text/string']
filename = features['image/filename']

Could you please tell me how to create my own words-000.tfrecord?

Thank you!
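As a rough illustration of what one record would need to hold, the sketch below builds a plain dictionary keyed by the feature names read in the snippet above. It is only a sketch: CHARSET and build_fields are hypothetical, the assumption that labels are integer indices into a character set is not confirmed by this thread, and the step of wrapping the values in a tf.train.Example and writing them with a TFRecord writer is left out.

```python
# Hypothetical character set; the repository's actual output charset may differ.
CHARSET = "abcdefghijklmnopqrstuvwxyz0123456789"


def build_fields(filename, text, width, jpeg_bytes):
    """Collect the per-image values under the feature keys parsed above."""
    # Assumption: each label is the index of the character in CHARSET.
    # str.index raises ValueError for characters outside the set.
    labels = [CHARSET.index(ch) for ch in text.lower()]
    return {
        'image/encoded': jpeg_bytes,   # raw JPEG file contents
        'image/width': width,          # pixel width, used for the CTC sequence length
        'image/labels': labels,        # integer class labels
        'text/length': len(text),
        'text/string': text,
        'image/filename': filename,
    }


fields = build_fields("1.jpg", "Cat", 100, b"\xff\xd8")
print(fields['image/labels'])
```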

ctc_beam_search_decoder_trie Tensorflow Operator not found

I was unable to find a Tensorflow operator called ctc_beam_search_decoder_trie. Was this a custom Tensorflow operator you created?

Thanks!

ctc_loss_calculator.cc Not a valid path

While training OCR I got the following error a couple of times only:
Error: W tensorflow/core/util/ctc/ctc_loss_calculator.cc:144] No valid path found
I don't understand what it means. Can anyone help?
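A frequent cause of this warning is a training sample whose feature sequence is too short for its label: CTC needs at least one frame per character plus one extra blank frame between each pair of identical adjacent characters. The check below is a minimal sketch of that rule; the function names are hypothetical, and it does not rule out other causes.

```python
def min_ctc_frames(label):
    """Fewest frames CTC needs to emit `label` (a string or list of class ids)."""
    # Identical neighbours must be separated by a blank frame.
    repeats = sum(1 for a, b in zip(label, label[1:]) if a == b)
    return len(label) + repeats


def has_valid_path(num_frames, label):
    """True when a sequence of `num_frames` frames can align to `label`."""
    return num_frames >= min_ctc_frames(label)


print(min_ctc_frames("good"), has_valid_path(4, "good"), has_valid_path(13, "good"))
```

Filtering or padding samples that fail such a check before training is one way to see whether the warning goes away.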

Input shapes: [72,357,1], [4] and with input tensors computed

@weinman Hello,
1. I have seen #35, but the file name is not given there. I checked validate.py, mjsynth.py, and model_fn.py but was not able to find this line:
image = tf.concat([first_row, image], 0)
or this
image = tf.placeholder(tf.uint8, shape=[32, None, 1])

tensorflow/python/framework/ops.py", line 1631, in _create_c_op
    raise ValueError(str(e))
ValueError: Dimension size must be evenly divisible by 32 but is 25704 for 'Reshape' (op: 'Reshape') with input shapes: [72,357,1], [4] and with input tensors computed as partial shapes: input[1] = [1,32,?,1].

2. Are there any numbers, such as 123, present in the default training data? I have checked and they do not seem to be available. If I want to train on my custom data, how is the labeling done? I usually use bounding boxes, but this is different. Can you tell me how the labeling is done and how to train on custom data? Forgive me, my English is not good.
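On the Reshape error in point 1: the message shows a [72,357,1] image being reshaped to [1,32,?,1], and 72 * 357 = 25704 is not a multiple of 32. A possible remedy, assuming the model expects a fixed height of 32 as the partial shape suggests, is to resize the image to that height before feeding it. The sketch below only computes the target shape; resized_shape is a hypothetical helper, and the actual resizing would be done with an image library.

```python
def resized_shape(height, width, target_height=32):
    """Target (height, width) that keeps the aspect ratio at a fixed height."""
    scale = target_height / float(height)
    new_width = max(1, int(round(width * scale)))
    return target_height, new_width


original = (72, 357)
target = resized_shape(*original)
print(original[0] * original[1] % 32, target)
```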

Model

Please, can you give me your pre-trained model for testing?

Memory soon used up when running train step

After running 'make mjsynth-download' and 'make mjsynth-tfrecord', I went to the third step to train the model by running 'make train', but the machine's memory (32G) was used up within 2 seconds and the host hung and restarted. What is the possible cause of this issue?

Can't recognize consecutive same characters

Hi, I read your excellent paper and used your code to do some experiments. But I found it cannot recognize consecutive characters when they are the same. For example, "good" will be recognized as "god".
Could you please help me with this problem?
Thanks
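For context on the "good" versus "god" example, standard CTC decoding merges repeated frame labels and then drops blanks, so a doubled letter survives only if a blank frame separates the two copies. The sketch below shows that collapse rule on hand-written frame strings; the "-" blank symbol and the function name are illustrative assumptions, and this is not a diagnosis of the reported behaviour.

```python
BLANK = "-"  # illustrative blank symbol


def ctc_collapse(frames, blank=BLANK):
    """Merge repeated frame labels, then remove blanks (greedy CTC collapse)."""
    out = []
    prev = None
    for sym in frames:
        if sym != prev and sym != blank:
            out.append(sym)
        prev = sym
    return "".join(out)


print(ctc_collapse("ggoood"))   # no blank between the o's
print(ctc_collapse("ggo-ood"))  # a blank separates the two o's
```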

Passing the session

Hey, thank you for your nice code!
I want to load the model just once and pass the session to several predictions,
because loading the model takes time.
I just wonder how to do it. Please help! Thanks.

Imbalanced classes

Hi!
I am new to TensorFlow and am trying to figure out how to work with your model. I need to feed it my own data, but my dataset is very imbalanced (for example, the class ‘q’ occurs about 100 times in the dataset, while the class ‘a’ may occur more than 10 thousand times). What should I do? How can I use class weights in your code?
I think it might look like this: in the function ‘ctc_loss_layer’ in ‘model.py’ we have rnn_logits, the output of the RNN. What if I multiply it by class weights before passing it to the CTC loss? Then the CTC loss would give greater weight to rare classes, and that would affect backpropagation. Am I right? Could you please help me?
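As a starting point for the weighting idea, the sketch below derives inverse-frequency class weights from a list of label strings. The function name and the normalization (weights average to 1 when each class is weighted by its count) are illustrative choices; whether such weights should be applied to rnn_logits, to the loss, or handled by resampling is left open here.

```python
from collections import Counter


def inverse_frequency_weights(texts):
    """Weight each character class by total / (num_classes * class_count)."""
    counts = Counter(ch for text in texts for ch in text)
    total = sum(counts.values())
    num_classes = len(counts)
    return {ch: total / float(num_classes * n) for ch, n in counts.items()}


weights = inverse_frequency_weights(["aaa", "q"])
print(sorted(weights.items()))
```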

convergence problem

The test output is [iterations, test loss, label error, sequence error].
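To make the last two fields concrete, here is a small sketch under the assumption that label error is the mean edit distance normalized by the true label length and that sequence error is the fraction of predictions that are not an exact match. These definitions and the function names are assumptions for illustration, not taken from the repository.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution or match
        prev = cur
    return prev[-1]


def error_rates(predictions, truths):
    """Return (label_error, sequence_error) under the assumed definitions."""
    pairs = list(zip(predictions, truths))
    label_error = sum(edit_distance(p, t) / float(max(1, len(t)))
                      for p, t in pairs) / len(pairs)
    sequence_error = sum(1 for p, t in pairs if p != t) / float(len(pairs))
    return label_error, sequence_error


label_err, seq_err = error_rates(["god", "cat"], ["good", "cat"])
print(label_err, seq_err)
```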

The testing output is out of order in README