weinman / cnn_lstm_ctc_ocr
Tensorflow-based CNN+LSTM trained with CTC-loss for OCR
License: GNU General Public License v3.0
Hello, sir. I am a student who has just started learning TensorFlow. I ran into a few problems when running this program. Could you upload the checkpoint? Thank you!
When I train on my own data, I get an error. Can anyone suggest a fix?
2017-12-10 13:28:41.796273: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Found device 0 with properties:
name: TITAN X (Pascal) major: 6 minor: 1 memoryClockRate(GHz): 1.531
pciBusID: 0000:00:06.0
totalMemory: 11.90GiB freeMemory: 11.76GiB
2017-12-10 13:28:41.796331: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: TITAN X (Pascal), pci bus id: 0000:00:06.0, compute capability: 6.1)
INFO:tensorflow:Starting standard services.
INFO:tensorflow:Saving checkpoint to path ../data/model/model.ckpt
INFO:tensorflow:Starting queue runners.
INFO:tensorflow:global_step/sec: 0
2017-12-10 13:28:47.390354: W tensorflow/core/framework/op_kernel.cc:1192] Invalid argument: Tried to explicitly squeeze dimension 1 but dimension was not 1: 2
[[Node: convnet/features = Squeeze[T=DT_FLOAT, squeeze_dims=[1], _device="/job:localhost/replica:0/task:0/device:GPU:0"](convnet/pool8/MaxPool)]]
2017-12-10 13:28:47.390538: W tensorflow/core/framework/op_kernel.cc:1192] Invalid argument: Tried to explicitly squeeze dimension 1 but dimension was not 1: 2
[[Node: convnet/features = Squeeze[T=DT_FLOAT, squeeze_dims=[1], _device="/job:localhost/replica:0/task:0/device:GPU:0"](convnet/pool8/MaxPool)]]
2017-12-10 13:28:47.390862: W tensorflow/core/framework/op_kernel.cc:1192] Invalid argument: Tried to explicitly squeeze dimension 1 but dimension was not 1: 2
[[Node: convnet/features = Squeeze[T=DT_FLOAT, squeeze_dims=[1], _device="/job:localhost/replica:0/task:0/device:GPU:0"](convnet/pool8/MaxPool)]]
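The `Squeeze` failure above says the feature map still has height 2 when it reaches `convnet/features`. A minimal sketch of the height arithmetic follows, assuming the layer stack listed in the table further down this page (a 3x3 'valid' convolution, then 'valid' pools with vertical kernel/stride of 2, 2, 2 and 3). `pool_out` and `feature_height` are hypothetical helpers, not part of the repository.

```python
def pool_out(size, kernel, stride):
    """Output length of a 'valid' pooling window along one axis."""
    return (size - kernel) // stride + 1


def feature_height(image_height):
    """Height of the feature map after the assumed conv/pool stack."""
    h = image_height - 2      # conv1: 3x3 kernel with 'valid' padding trims 2 rows
    h = pool_out(h, 2, 2)     # pool2
    h = pool_out(h, 2, 2)     # pool4
    h = pool_out(h, 2, 2)     # pool6
    h = pool_out(h, 3, 3)     # pool8
    return h


for height in (32, 64):
    print(height, '->', feature_height(height))
```

Under these assumptions a 32-pixel-high input reduces to height 1, while a 64-pixel-high input leaves height 2, which would match the error message. Resizing training images to height 32 before writing them out is one way to avoid it.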
Hello,
I trained your model with mjsynth dataset and default parameter settings over 1000000 steps.
I found that the model often misrecognizes the character 'u'.
It seems as if there is no 'u' class.
Do you have any thoughts about what the cause might be?
I downloaded the mjsynth dataset separately and stored the images in the image subpath under the data directory. Basically, I did everything manually up until the "make mjsynth-tfrecord.py" command.
When I ran the command, it reported a syntax error on the print line in this excerpt from mjsynth-tfrecord.py:
print str(i),'of',str(num_shards),'[',str(start),':',str(end),']',out_filename
gen_shard(sess, input_base_dir, image_filenames[start:end], out_filename)
# Clean up writing last shard
start = num_shards*images_per_shard
out_filename = output_filebase+'-'+(shard_format % num_shards)+'.tfrecord'
print str(i),'of',str(num_shards),'[',str(start),':]',out_filename
gen_shard(sess, input_base_dir, image_filenames[start:], out_filename)
Since I am using Python 3.6, I thought the problem was the missing parentheses in the print line, so I changed it to this:
print (str(i),'of',str(num_shards),'[',str(start),':',str(end),']',out_filename)
gen_shard(sess, input_base_dir, image_filenames[start:end], out_filename)
# Clean up writing last shard
start = num_shards*images_per_shard
out_filename = output_filebase+'-'+(shard_format % num_shards)+'.tfrecord'
print (str(i),'of',str(num_shards),'[',str(start),':]',out_filename)
gen_shard(sess, input_base_dir, image_filenames[start:], out_filename)
The program then started running, but I'm seeing a lot of files reported as errors by this line:
except:
# Some files have bogus payloads, catch and note the error, moving on
print('ERROR',filename)
Can anyone tell me why this is happening? Thank you in advance for the help.
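A bare `except:` hides the reason each file fails, so the first diagnostic step is usually to print the exception itself. A minimal sketch of that idea follows; `try_load` and `broken_loader` are hypothetical names used only for illustration and do not come from mjsynth-tfrecord.py.

```python
def try_load(filename, loader):
    """Call loader(filename); return (result, None) or (None, error text)."""
    try:
        return loader(filename), None
    except Exception as exc:  # narrower than a bare except, and keeps the cause
        return None, '{}: {}'.format(type(exc).__name__, exc)


def broken_loader(filename):
    # Hypothetical stand-in for the real image-reading code
    raise ValueError('bogus payload')


result, error = try_load('example.jpg', broken_loader)
print('ERROR', 'example.jpg', error)
```

If the printed cause turns out to be something other than a corrupt image (for example a bytes/str mismatch after moving from Python 2 to Python 3), the problem is in the script rather than in the data. That possibility is an assumption, since the issue does not show the underlying exception.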
@weinman
1. Is there a pre-trained model file so I can just try it out?
2. When I ran the make command, it created the tfrecord files, but, as the terminal output below shows, it skipped some files. Maybe they are held out for testing?
Here is the terminal output:
177 of 1000 [ 1278825 : 1286050 ] ../data/train/words-177.tfrecord
('SKIPPING', '1993/4/472_nj_51777.jpg')
('SKIPPING', '1991/5/238_d_18979.jpg')
('SKIPPING', '1991/5/204_V_83811.jpg')
('SKIPPING', '1991/4/228_j_41074.jpg')
178 of 1000 [ 1286050 : 1293275 ] ../data/train/words-178.tfrecord
('SKIPPING', '1990/1/447_4_95.jpg')
('SKIPPING', '1989/4/34_NI_51538.jpg')
('SKIPPING', '1988/3/445_CORRECTNESS_17153.jpg')
179 of 1000 [ 1293275 : 1300500 ] ../data/train/words-179.tfrecord
('SKIPPING', '1987/7/56_n_50734.jpg')
('SKIPPING', '1987/6/477_SJ_71221.jpg')
('SKIPPING', '1987/6/102_RADIOTELEPHONE_62145.jpg')
('SKIPPING', '1986/1/175_INDIVIDUALISTICALLY_39086.jpg')
180 of 1000 [ 1300500 : 1307725 ] ../data/train/words-180.tfrecord
('SKIPPING', '1985/1/50_Debaucheries_19549.jpg')
181 of 1000 [ 1307725 : 1314950 ] ../data/train/words-1
While training OCR I got the following error a couple of times only:
Error: W tensorflow/core/util/ctc/ctc_loss_calculator.cc:144] No valid path found
I don't understand what it means. Can anyone help?
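One common reason for this warning (an assumption here, since the issue gives no data) is that a label is too long for the number of time steps the network produces for its image. CTC needs at least one time step per character, plus one blank between each pair of identical neighbours. A minimal sketch of that check follows; both helper names are hypothetical.

```python
def ctc_min_frames(label):
    """Fewest time steps CTC needs to emit `label`."""
    # Each adjacent pair of identical characters needs a blank between them
    repeats = sum(1 for a, b in zip(label, label[1:]) if a == b)
    return len(label) + repeats


def has_valid_path(label, sequence_length):
    """True if a CTC alignment of `label` fits in `sequence_length` steps."""
    return sequence_length >= ctc_min_frames(label)


print(ctc_min_frames('good'))      # 4 characters + 1 blank between the two o's
print(has_valid_path('good', 4))   # too short for the repeated letter
```

Running such a check over the training set before writing records would show whether a few long labels on narrow images explain the occasional warning.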
After running 'make mjsynth-download' and 'make mjsynth-tfrecord', I went on to the third step and ran 'make train' to train the model, but the machine's memory (32 GB) was used up within 2 seconds, and the host hung and restarted. What could be causing this?
I want to recognize more than just English alphabet and numbers (e.g. special Unicode characters). Is this possible and how can I do this?
Suppose I have my own dataset, do I have to write my own data loader and provide
out_charset="ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"
like in your src/mjsynth.py?
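As a rough sketch of what extending the character set involves: each character maps to an integer class index, and CTC needs one extra class for the blank. The extra characters and the helper names `build_lookup` and `text_to_labels` below are illustrative assumptions, not the repository's own functions.

```python
# The charset quoted in the issue, extended with a few example Unicode characters
out_charset = ("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"
               "\u00e4\u00f6\u00fc\u00df\u20ac")


def build_lookup(charset):
    """Map each character to its class index."""
    return {ch: index for index, ch in enumerate(charset)}


def text_to_labels(text, lookup):
    """Convert a string to class indices; raises KeyError for unknown characters."""
    return [lookup[ch] for ch in text]


lookup = build_lookup(out_charset)
num_classes = len(out_charset) + 1   # one extra class for the CTC blank
print(text_to_labels('A\u00e4', lookup), num_classes)
```

The same charset would have to be used both when writing the training labels and when decoding predictions, otherwise the indices no longer line up.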
Hi weinman, I have read the paper and the code and tried to understand them, but a few questions confuse me. Please help.
With bucket_by_sequence_length and the parameter dynamic_pad set to True, every batch has a fixed shape, but different batches may have different shapes. So how does the CNN in the model work?
@weinman Hello,
1. I have seen #35, but it does not say which file is meant. I checked validate.py, mjsynth.py, and model_fn.py but was not able to find this line
image = tf.concat([first_row, image], 0)
or this
image = tf.placeholder(tf.uint8, shape=[32, None, 1])
tensorflow/python/framework/ops.py", line 1631, in _create_c_op
raise ValueError(str(e))
ValueError: Dimension size must be evenly divisible by 32 but is 25704 for 'Reshape' (op: 'Reshape') with input shapes: [72,357,1], [4] and with input tensors computed as partial shapes: input[1] = [1,32,?,1].
2. Does the default training data contain any numbers, such as 123? I checked but could not find any. If I want to train on my custom data, how is the labeling done? I usually use bounding boxes, but this is different. Can you tell me how the labeling is done and how to train on custom data? Forgive me, my English is not good.
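The `Reshape` error in item 1 reports a 72x357 image where the graph expects a height of 32 (72 * 357 = 25704, which is not a multiple of 32). A minimal sketch of computing an aspect-preserving target size follows; `scaled_size` is a hypothetical helper, and the actual resize would be done with whatever image library is already in use.

```python
def scaled_size(height, width, target_height=32):
    """Return (new_height, new_width) that keeps the aspect ratio."""
    new_width = max(1, int(round(width * target_height / float(height))))
    return target_height, new_width


print(scaled_size(72, 357))
```

Resizing each image to the returned size before feeding it to the graph should satisfy the fixed-height placeholder, assuming height is the only mismatch.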
I want to save the model as a PB file so that I can call it from VC++, but I can't find the output_node_name.
How can I find it?
I was unable to find a Tensorflow operator called ctc_beam_search_decoder_trie. Was this a custom Tensorflow operator you created?
Thanks!
Hi,
I'm using this code to extract CNN features. So, I would like to ask about the variable containing features and how to convert it to a vector and save it to the disk. I'm using the file src/test.py
Thank you in advance for your help
Could you please tell me how to create my own words-000.tfrecord ?
Thank you!
There is no access to the mjsynth dataset. How can I train the model with the IAM dataset?
Now that I have trained a model on the dataset, how can I test it on a single image and output the result?
Excuse me, in the function _preprocess_image(image) (mjsynth.py), why are the pixel values rescaled to floats in [-0.5, 0.5] rather than [0, 1]? Can you tell me why? Thanks.
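For reference, the two scalings differ only by a constant shift. The sketch below is plain Python rather than the repository's TensorFlow code, and the function names are hypothetical. Zero-centered inputs are a common convention in neural network training; whether that is the author's reason is not stated on this page.

```python
def to_unit_range(pixel):
    """Map an 8-bit pixel value to [0, 1]."""
    return pixel / 255.0


def to_centered_range(pixel):
    """Map an 8-bit pixel value to [-0.5, 0.5]."""
    return pixel / 255.0 - 0.5


for value in (0, 255):
    print(value, to_unit_range(value), to_centered_range(value))
```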
Hi, I would like to use this code to extract features with the CNN. Can I use a sliding window wider than 1 pixel?
My goal is to extract a set of CNN-based features and use them to train a BLSTM-CTC recognizer.
Hi,
I cloned the project and tried to train it under TensorFlow 1.0, but got the following error. Could you please give me some advice? Thank you very much! BTW, I'm using Ubuntu 16.04 on an IBM ppc64 machine.
xiaoren@S822lc1:~/homework/cnn_lstm_ctc_ocr/src$ python train.py
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcublas.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcudnn.so.5 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcufft.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcurand.so.8.0 locally
Traceback (most recent call last):
File "train.py", line 207, in <module>
tf.app.run()
File "/opt/DL/tensorflow/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 44, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "train.py", line 173, in main
image,width,label = _get_input()
File "train.py", line 83, in _get_input
length_threshold=FLAGS.length_threshold )
File "/home/xiaoren/homework/cnn_lstm_ctc_ocr/src/mjsynth.py", line 69, in bucketed_input_pipeline
dynamic_pad=True)
File "/opt/DL/tensorflow/lib/python2.7/site-packages/tensorflow/contrib/training/python/training/bucket_ops.py", line 389, in bucket_by_sequence_length
shared_name=shared_name)
File "/opt/DL/tensorflow/lib/python2.7/site-packages/tensorflow/contrib/training/python/training/bucket_ops.py", line 231, in bucket
control_flow_ops.no_op)
File "/opt/DL/tensorflow/lib/python2.7/site-packages/tensorflow/python/ops/control_flow_ops.py", line 1726, in cond
raise TypeError("pred must not be a Python bool")
TypeError: pred must not be a Python bool
When I trained on your sample data with tensorflow-gpu 1.12, I got this error (I also cloned the tf-1.12 branch, but it gave the same error).
INFO:tensorflow:Using config: {'_eval_distribute': None, '_num_worker_replicas': 1, '_session_config': allow_soft_placement: true
, '_save_checkpoints_steps': None, '_service': None, '_task_id': 0, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_global_id_in_cluster': 0, '_protocol': None, '_master': '', '_tf_random_seed': None, '_save_checkpoints_secs': 120, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7fd68bd37390>, '_experimental_distribute': None, '_keep_checkpoint_max': 5, '_is_chief': True, '_task_type': 'worker', '_device_fn': None, '_train_distribute': None, '_save_summary_steps': 100, '_model_dir': '../data/model', '_evaluation_master': '', '_num_ps_replicas': 0}
Traceback (most recent call last):
File "train.py", line 182, in <module>
tf.app.run()
File "/home/lionel/virtualenv/commonenv/lib/python3.5/site-packages/tensorflow/python/platform/app.py", line 125, in run
_sys.exit(main(argv))
File "train.py", line 179, in main
classifier.train( input_fn=_get_input, max_steps=FLAGS.max_num_steps )
File "/home/lionel/virtualenv/commonenv/lib/python3.5/site-packages/tensorflow/python/estimator/estimator.py", line 354, in train
loss = self._train_model(input_fn, hooks, saving_listeners)
File "/home/lionel/virtualenv/commonenv/lib/python3.5/site-packages/tensorflow/python/estimator/estimator.py", line 1207, in _train_model
return self._train_model_default(input_fn, hooks, saving_listeners)
File "/home/lionel/virtualenv/commonenv/lib/python3.5/site-packages/tensorflow/python/estimator/estimator.py", line 1234, in _train_model_default
input_fn, model_fn_lib.ModeKeys.TRAIN))
File "/home/lionel/virtualenv/commonenv/lib/python3.5/site-packages/tensorflow/python/estimator/estimator.py", line 1075, in _get_features_and_labels_from_input_fn
self._call_input_fn(input_fn, mode))
File "/home/lionel/virtualenv/commonenv/lib/python3.5/site-packages/tensorflow/python/estimator/estimator.py", line 1162, in _call_input_fn
return input_fn(**kwargs)
File "train.py", line 130, in _get_input
dataset = pipeline.get_data( FLAGS.static_data, **data_args)
File "/home/lionel/Desktop/ML/mlcode/OCR/CRNN/cnn_lstm_ctc_ocr-master/src/pipeline.py", line 79, in get_data
dataset = dpipe.get_dataset( dpipe_args )
File "/home/lionel/Desktop/ML/mlcode/OCR/CRNN/cnn_lstm_ctc_ocr-master/src/mjsynth.py", line 60, in get_dataset
buffer_size=buffer_sz )
File "/home/lionel/virtualenv/commonenv/lib/python3.5/site-packages/tensorflow/python/data/ops/readers.py", line 218, in __init__
prefetch_input_elements=None)
File "/home/lionel/virtualenv/commonenv/lib/python3.5/site-packages/tensorflow/python/data/ops/readers.py", line 134, in __init__
cycle_length, block_length)
File "/home/lionel/virtualenv/commonenv/lib/python3.5/site-packages/tensorflow/python/data/ops/dataset_ops.py", line 2714, in __init__
super(InterleaveDataset, self).__init__(input_dataset, map_func)
File "/home/lionel/virtualenv/commonenv/lib/python3.5/site-packages/tensorflow/python/data/ops/dataset_ops.py", line 2677, in __init__
experimental_nested_dataset_support=True)
File "/home/lionel/virtualenv/commonenv/lib/python3.5/site-packages/tensorflow/python/data/ops/dataset_ops.py", line 1860, in __init__
self._function.add_to_graph(ops.get_default_graph())
File "/home/lionel/virtualenv/commonenv/lib/python3.5/site-packages/tensorflow/python/framework/function.py", line 479, in add_to_graph
self._create_definition_if_needed()
File "/home/lionel/virtualenv/commonenv/lib/python3.5/site-packages/tensorflow/python/framework/function.py", line 335, in _create_definition_if_needed
self._create_definition_if_needed_impl()
File "/home/lionel/virtualenv/commonenv/lib/python3.5/site-packages/tensorflow/python/framework/function.py", line 344, in _create_definition_if_needed_impl
self._capture_by_value, self._caller_device)
File "/home/lionel/virtualenv/commonenv/lib/python3.5/site-packages/tensorflow/python/framework/function.py", line 864, in func_graph_from_py_func
outputs = func(*func_graph.inputs)
File "/home/lionel/virtualenv/commonenv/lib/python3.5/site-packages/tensorflow/python/data/ops/dataset_ops.py", line 1794, in tf_data_structured_function_wrapper
ret = func(*nested_args)
File "/home/lionel/virtualenv/commonenv/lib/python3.5/site-packages/tensorflow/python/data/ops/readers.py", line 210, in read_one_file
return _TFRecordDataset(filename, compression_type, buffer_size)
File "/home/lionel/virtualenv/commonenv/lib/python3.5/site-packages/tensorflow/python/data/ops/readers.py", line 105, in __init__
argument_default=_DEFAULT_READER_BUFFER_SIZE_BYTES)
File "/home/lionel/virtualenv/commonenv/lib/python3.5/site-packages/tensorflow/python/data/util/convert.py", line 32, in optional_param_to_tensor
argument_value, dtype=argument_dtype, name=argument_name)
File "/home/lionel/virtualenv/commonenv/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 1050, in convert_to_tensor
as_ref=False)
File "/home/lionel/virtualenv/commonenv/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 1146, in internal_convert_to_tensor
ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
File "/home/lionel/virtualenv/commonenv/lib/python3.5/site-packages/tensorflow/python/framework/constant_op.py", line 229, in _constant_tensor_conversion_function
return constant(v, dtype=dtype, name=name)
File "/home/lionel/virtualenv/commonenv/lib/python3.5/site-packages/tensorflow/python/framework/constant_op.py", line 208, in constant
value, dtype=dtype, shape=shape, verify_shape=verify_shape))
File "/home/lionel/virtualenv/commonenv/lib/python3.5/site-packages/tensorflow/python/framework/tensor_util.py", line 442, in make_tensor_proto
_AssertCompatible(values, dtype)
File "/home/lionel/virtualenv/commonenv/lib/python3.5/site-packages/tensorflow/python/framework/tensor_util.py", line 353, in _AssertCompatible
(dtype.name, repr(mismatch), type(mismatch).__name__))
TypeError: Expected int64, got 256.0 of type 'float' instead.
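The last line of the traceback says a float (256.0) reached an argument that must be an int64. Given the Python 3.5 paths, one plausible cause is a `/` division when the buffer size is computed, because `/` always yields a float in Python 3. This is an assumption, since the line that computes `buffer_sz` is not shown. The values below are hypothetical and only illustrate the difference.

```python
num_threads = 4        # hypothetical value
total_size = 1024      # hypothetical value

float_size = total_size / num_threads    # true division: float in Python 3
int_size = total_size // num_threads     # floor division: stays an int

print(repr(float_size), repr(int_size))
```

If that is the cause, switching to `//` (or wrapping the result in `int(...)`) where the buffer size is computed should let the value convert to int64.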
I want to load the model only once and then pass N images through it for recognition, but the model is loaded again every time I pass an image. I tried loading the model in a separate function and reusing the same session variable for later recognition, but that gives an error:
raise RuntimeError('Attempted to use a closed Session.')
RuntimeError: Attempted to use a closed Session.
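A "closed Session" error typically appears when a session created inside a `with` block is used after that block has exited. The sketch below shows the pattern with a stand-in class so that it runs without TensorFlow. `FakeSession` and `Recognizer` are hypothetical names; with real TensorFlow 1.x code the session object would be a `tf.Session` instead.

```python
class FakeSession(object):
    """Hypothetical stand-in for a TF session: usable until closed."""

    def __init__(self):
        self.closed = False

    def __enter__(self):
        return self

    def __exit__(self, *exc_info):
        self.close()

    def close(self):
        self.closed = True

    def run(self, image):
        if self.closed:
            raise RuntimeError('Attempted to use a closed Session.')
        return 'prediction for ' + image


def load_inside_with():
    # Leaving the with-block closes the session before the caller can use it
    with FakeSession() as sess:
        return sess


class Recognizer(object):
    """Open one session at construction and reuse it for every image."""

    def __init__(self):
        self.sess = FakeSession()

    def recognize(self, image):
        return self.sess.run(image)

    def close(self):
        self.sess.close()


recognizer = Recognizer()
results = [recognizer.recognize(name) for name in ('a.jpg', 'b.jpg')]
recognizer.close()
print(results)
```

The design choice is to tie the session's lifetime to a long-lived object and close it explicitly once, rather than letting a `with` block close it after the first call.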
Hi,
When I created tfrecords for my custom dataset, a lot of images got filtered out. Because each input image contains only one character, the processed image width is less than min_width (https://github.com/weinman/cnn_lstm_ctc_ocr/blob/master/src/mjsynth-tfrecord.py#L143).
I am wondering what the correct way to deal with single-character inputs is. Do I need to set min_width to a smaller value (I already tried 3, and many images were still filtered out), or should I pad the input image with zeros?
Thanks,
Xin
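A rough way to size the padding is to invert the sequence-length arithmetic proposed in a later issue on this page, seq_len = (width - 2) // 2 - 2 for a 3x3 first convolution, so that the sequence is at least as long as the label. Whether the filter in mjsynth-tfrecord.py applies exactly this rule is an assumption, and both helper names are hypothetical.

```python
def min_width_for_label(label_length):
    """Smallest image width whose sequence length covers the label.

    Assumes seq_len = (width - 2) // 2 - 2, so width >= 2 * label_length + 6.
    """
    return 2 * label_length + 6


def columns_to_pad(image_width, label_length):
    """Number of extra columns needed on a too-narrow image."""
    return max(0, min_width_for_label(label_length) - image_width)


print(min_width_for_label(1), columns_to_pad(5, 1))
```

Under this assumption a single-character image needs at least 8 columns, which would explain why lowering min_width to 3 alone still filtered out many images.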
@weinman Why do we use lexicon.txt? What is its function?
For prediction of some real images, I ran your script:
cd src ; python validate.py < ~/paths_to_images.txt
and received empty output (it didn't print anything into screen).
I went to check your code and found this line:
[output] = sess.run(prediction,{ image: image_data, width: image_data.shape[1]} )
and tried print(output):
SparseTensorValue(indices=array([], shape=(0, 2), dtype=int64), values=array([], dtype=int64), dense_shape=array([1, 0]))
As expected, its values field is an empty array [].
What went wrong?
Recognizing one picture takes 30 seconds with validate.py.
The picture is around 32*280, there are 5000+ characters, and the model size is 200 MB.
How can I speed this up? 30 seconds is too long.
ValueError: generator yielded an element of shape (37, 109, 1) where an element of shape (32, ?, 1) was expected.
pipeline.py calls the preprocessing step:
dataset = dataset.map( dpipe.preprocess_fn, num_parallel_calls=num_threads )
That seems OK, and maptextsynth.py uses the new normalize_image method:
def _preprocess_image( image ):
    """Rescale image"""
    image = pipeline.normalize_image(image)
    return image
Hey, thank you for your nice code!
I want to load the model just once and reuse the session for several predictions, because loading the model takes time.
I am wondering how to do it. Please help! Thanks.
Layer  Op    KrnSz  Stride(v,h)  OutDim  H   W   PadOpt
1      Conv  3      1            64      30  30  valid
2      Conv  3      1            64      30  30  same
       Pool  2      2            64      15  15
3      Conv  3      1            128     15  15  same
4      Conv  3      1            128     15  15  same
       Pool  2      2,1          128     7   14
5      Conv  3      1            256     7   14  same
6      Conv  3      1            256     7   14  same
       Pool  2      2,1          256     3   13
7      Conv  3      1            512     3   13  same
8      Conv  3      1            512     3   13  same
       Pool  3      3,1          512     1   13
9      LSTM                      512
10     LSTM                      512
If I want to train on 3000+ characters, how should I modify the model?
Should I make the CNN deeper, change the max-pooling layers, or something else?
Training fixed models is vastly wasteful of your time and especially your students' time. This focus on narrow AI only sidesteps our goal of developing human-level AI. Narrow AI work by a person or group with a fairly deep or deeper understanding of AI can be justified for only two reasons. One is a product that is needed fairly quickly, before human-level AI ultimately arrives: a product the world would greatly suffer without until strong AI arrives. Among a special group of others, I put Tesla's AI vision wing in this category, since its goal is part of a far greater picture, "necessary" if explanation is needed. The other reason is if the narrow work opens up new areas, such as potent technological models different from CNNs or RNNs, or even pushes these models into new territories.
This project, among others, is the social equivalent of globalization. Yes, it might help a bit, but what it does more is waste talent on things that will eventually be replaced. It is working on a far superior sail while knowing the world is on the verge of steam engines, which will revolutionize the field.
Great courage is required to make great strides. That courage is diving into something when you don't even know what you are searching for. That courage is knowing very well that a life's work might net nothing. That courage is being selfless, taking an impossible chance at a breakthrough over some publications in your name. Lead your students into new territories that even you don't know comfortably. Be their leader on routes you feel no one is exploring, perhaps even against your own beliefs. Grow like the AI you are building after reading this, if you read this entirely. Be the building block.
I need a little help with training the model on multiple GPUs. The --train_device option is available, but I am only able to specify one device with it. How can I specify both GPUs as the training device?
Hi! Thanks for your work!
Do you know which algorithm is now state of the art in OCR, given that CRNN was invented two years ago? Are there new models based on CRNN (convolutional and recurrent)?
Hi @weinman sir,
I've run the training for this model. In the source, I noticed that the model does not save checkpoints based on lower loss, and I cannot tell what the model's accuracy is so far. On what basis are checkpoints saved?
Thank You...
Hi,
I was just wondering whether we can use our own dataset (the images and labels may be different)?
But it looks like your code already has a pre-trained model. Do we need to delete it, since it may not be helpful for the new training set?
Thanks!
I tried to find the TensorFlow fork that contains "tf.nn.ctc_beam_search_decoder_trie", but I cannot find it. Could you tell me which one you are using?
Hi, I read your excellent paper and used your code for some experiments. But I found that it cannot recognize consecutive characters when they are the same. For example, "good" is recognized as "god".
Could you please help me with this problem?
Thanks
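For background, CTC decoding collapses repeated symbols and then drops blanks, so a doubled letter survives only if a blank separates the two copies. A minimal sketch of that collapse rule follows; the `-` blank symbol and the function name are illustrative choices.

```python
BLANK = '-'  # hypothetical blank symbol, used here only for display


def ctc_collapse(frames, blank=BLANK):
    """Merge consecutive repeats, then remove blanks (greedy CTC decoding rule)."""
    out = []
    previous = None
    for symbol in frames:
        if symbol != previous and symbol != blank:
            out.append(symbol)
        previous = symbol
    return ''.join(out)


print(ctc_collapse('ggoood'))   # no blank between the o's
print(ctc_collapse('go-od'))    # blank between the o's
```

Separately, the TensorFlow 1.x function `tf.nn.ctc_beam_search_decoder` has a `merge_repeated` argument that defaults to True and merges repeated labels in the decoded output. Whether that setting is responsible for the behavior reported here is an assumption worth checking.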
Please, can you give me your pre-trained model for testing?
Can you tell me the version of TensorFlow?
The testing output is out of order in the README.
Hi, this paper is very useful. But I am facing an issue: consecutive identical characters are not recognized, e.g. "password" is recognized as "pasword". Can you help me get rid of this?
Hi!
I am new to TensorFlow and am trying to figure out how to work with your model. I need to train it on my own data, but my dataset is very imbalanced (for example, the class ‘q’ occurs about 100 times in the dataset, while the class ‘a’ may occur more than 10 thousand times). What should I do? How can I use class weights in your code?
I think it might look like this. In the function ‘ctc_loss_layer’ in ‘model.py’ we have rnn_logits, which is the output of the RNN. What if I multiply it by class weights before passing it to the CTC loss? Then the CTC loss would give greater weight to rare classes, and that would affect backpropagation. Am I right? Could you please help me?
When I run validate.py, I encounter an error:
D:\Tensorflow\cnn_lstm_ctc_ocr-master\src>python validate.py d:/Tensorflow/cnn_lstm_ctc_ocr_master/src/11.jpg
d:\Anaconda3\lib\site-packages\h5py\__init__.py:34: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
from ._conv import register_converters as _register_converters
Traceback (most recent call last):
File "validate.py", line 109, in <module>
tf.app.run()
File "d:\Anaconda3\lib\site-packages\tensorflow\python\platform\app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "validate.py", line 89, in main
classifier = tf.estimator.Estimator( config=_get_config(),
File "validate.py", line 82, in _get_config
custom_config = tf.estimator.RunConfig( session_config=device_config )
TypeError: __init__() got an unexpected keyword argument 'session_config'
My TensorFlow version is 1.2.1, on Windows.
This isn't an issue but a question. I've read the CRNN paper and have played around with PyTorch implementation of it: https://github.com/meijieru/crnn.pytorch
I've noticed that CRNN is able to detect single words but not multiple words. Example:
My question is, would this project help with detection of multiple words?
Hi!
I have a simple doubt about the calculation of the sequence length after the conv and pool layers. In the following code, why did you calculate the seq len just until the fourth pooling op (after_pool4)?
conv1 = conv_layer(inputs, layer_params[0], training ) # 30,30
conv2 = conv_layer( conv1, layer_params[1], training ) # 30,30
pool2 = pool_layer( conv2, 2, 'valid', 'pool2') # 15,15
conv3 = conv_layer( pool2, layer_params[2], training ) # 15,15
conv4 = conv_layer( conv3, layer_params[3], training ) # 15,15
pool4 = pool_layer( conv4, 1, 'valid', 'pool4' ) # 7,14
conv5 = conv_layer( pool4, layer_params[4], training ) # 7,14
conv6 = conv_layer( conv5, layer_params[5], training ) # 7,14
pool6 = pool_layer( conv6, 1, 'valid', 'pool6') # 3,13
conv7 = conv_layer( pool6, layer_params[6], training ) # 3,13
conv8 = conv_layer( conv7, layer_params[7], training ) # 3,13
pool8 = tf.layers.max_pooling2d( conv8, [3,1], [3,1],
padding='valid', name='pool8') # 1,13
features = tf.squeeze(pool8, axis=1, name='features') # squeeze row dim
kernel_sizes = [ params[1] for params in layer_params]
#Calculate resulting sequence length from original image widths
conv1_trim = tf.constant( 2 * (kernel_sizes[0] // 2),
dtype=tf.int32,
name='conv1_trim')
one = tf.constant(1, dtype=tf.int32, name='one')
two = tf.constant(2, dtype=tf.int32, name='two')
after_conv1 = tf.subtract( widths, conv1_trim)
after_pool2 = tf.floor_div( after_conv1, two )
after_pool4 = tf.subtract(after_pool2, one)
sequence_length = tf.reshape(after_pool4,[-1], name='seq_len') # Vectorize
Hi, shouldn't the sequence length calculation in calc_seq_len() (mjsynth-tfrecord.py) be the same as in convnet_layers() (model.py)? Maybe like this:
import model

kernel_sizes = [ params[1] for params in model.layer_params]

def calc_seq_len(image_width):
    conv1_trim = 2 * (kernel_sizes[0] // 2)
    after_conv1 = image_width - conv1_trim
    after_pool2 = after_conv1 // 2
    after_pool4 = after_pool2 - 1
    after_pool6 = after_pool4 - 1
    after_pool8 = after_pool6
    sequence_length = after_pool8
    return sequence_length
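As a quick check of the two calculations discussed above, the sketch below evaluates both for a 32-pixel-wide input, assuming a 3x3 first convolution. The function names are hypothetical, and the code only reproduces the arithmetic quoted on this page.

```python
def seq_len_through_pool4(width, kernel_size=3):
    """Sequence length as computed in the quoted model code (stops at pool4)."""
    after_conv1 = width - 2 * (kernel_size // 2)
    after_pool2 = after_conv1 // 2
    return after_pool2 - 1


def seq_len_through_pool8(width, kernel_size=3):
    """Sequence length with pool6 also accounted for, as proposed above."""
    after_pool6 = seq_len_through_pool4(width, kernel_size) - 1
    return after_pool6   # pool8 has horizontal stride 1 and width-1 kernel


print(seq_len_through_pool4(32), seq_len_through_pool8(32))
```

For a 32-pixel-wide input the first function gives 14 and the second gives 13. The second value agrees with the `# 1,13` comment on `pool8` in the quoted layer listing, which supports the proposal that the two calculations differ by one step.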