mvoelk / ssd_detectors

This project forked from rykov8/ssd_keras


SSD-based object and text detection with Keras, SSD, DSOD, TextBoxes, SegLink, TextBoxes++, CRNN

License: MIT License

Languages: Jupyter Notebook 98.44%, Python 1.56%, Shell 0.01%
Topics: keras, ssd, dsod, textboxes, seglink, textboxespp, crnn, densnet-seglink, densnet-textboxespp, virtual-batch-size, gradient-accumulation, focal-loss, distance-iou-loss, shrikage-loss

ssd_detectors's Issues

DSOD Low mAP

Not necessarily an issue, but the mAP I got from DSOD512 trained on VOC 07+12 and tested on 07 was quite low, approximately 0.13.

The only thing I really changed was using Adam instead of AdamAccumulate, because AdamAccumulate throws an error on TF 2.0. I also used softmax.

Also, no metrics other than the loss itself show during training.

# Imports as I understand the repo layout (the ssd_* module names are my
# assumption, inferred from the sl_* modules used elsewhere in this thread):
import os
import time
import numpy as np
import tensorflow as tf
import keras
from keras.utils import multi_gpu_model

from data_voc import GTUtility
from ssd_model import DSOD512
from ssd_utils import PriorUtil
from ssd_data import InputGenerator
from ssd_training import SSDLoss, SSDFocalLoss
from utils.training import Logger


def trainMultiGPU():
    # set up data sets
    gt_util_voc = GTUtility("data/VOC2012train/")
    gt_util_voc7 = GTUtility("data/VOC2007train/")
    gt_util_voc_val = GTUtility("data/VOC2012val/", validation=True)
    gt_util_voc7_val = GTUtility("data/VOC2007val/", validation=True)

    gt_util_train = GTUtility.merge(gt_util_voc, gt_util_voc7)
    gt_util_val = GTUtility.merge(gt_util_voc_val, gt_util_voc7_val)

    experiment = 'dsod300_voc12_7'
    batch_size = 16

    # class_weights = prior_util.compute_class_weights(gt_util_train)
    class_weights = np.array(
        [0.00007169, 1.20864663, 1.23607288, 0.81087541, 1.32018959, 1.65339534, 1.47852761, 0.45099343, 0.84154551,
         0.33765636, 1.41315118, 1.32907548, 0.63492811, 1.15680594, 1.18978997, 0.07548318, 0.91531396, 1.21262288,
         1.15910985, 1.49269817, 1.08304682])

    # DSOD paper
    # batch size 128
    # 320k iterations
    # initial learning rate 0.1

    epochs = 1000
    initial_epoch = 0

    with tf.device("/cpu:0"):
        # set up DSOD 512
        model = DSOD512(num_classes=gt_util_train.num_classes, softmax=True)

    prior_util = PriorUtil(model)
    gen_train = InputGenerator(gt_util_train, prior_util, batch_size, model.image_size, augmentation=True)
    gen_val = InputGenerator(gt_util_val, prior_util, batch_size, model.image_size, augmentation=True)

    # weight decay
    regularizer = keras.regularizers.l2(5e-4)  # None if disabled
    for l in model.layers:
        if l.__class__.__name__.startswith('Conv'):
            l.kernel_regularizer = regularizer

    checkdir = './checkpoints/' + time.strftime('%Y%m%d%H%M') + '_' + experiment
    if not os.path.exists(checkdir):
        os.makedirs(checkdir)

    optim = keras.optimizers.Adam(lr=1e-3)

    # loss = SSDLoss(alpha=1.0, neg_pos_ratio=3.0)
    loss = SSDFocalLoss(lambda_conf=1.0, class_weights=class_weights)

    model = multi_gpu_model(model, gpus=2)
    model.compile(optimizer=optim, loss=loss.compute, metrics=loss.metrics)

    # add some callbacks
    reduce_lr = keras.callbacks.ReduceLROnPlateau(monitor='val_loss', factor=0.1, patience=3, verbose=1)
    early_stopping = keras.callbacks.EarlyStopping(monitor='val_loss', min_delta=0, patience=10, verbose=1)

    history = model.fit(
        gen_train.generate(),
        steps_per_epoch=gen_train.num_batches,
        epochs=epochs,
        verbose=1,
        callbacks=[
            keras.callbacks.ModelCheckpoint(checkdir + '/weights.{epoch:03d}.h5', verbose=1, save_weights_only=True,
                                            save_best_only=True, period=3),
            Logger(checkdir),
            reduce_lr,
            early_stopping
        ],
        validation_data=gen_val.generate(),
        validation_steps=gen_val.num_batches,
        class_weight=None,
        workers=1,
        use_multiprocessing=False,
        initial_epoch=initial_epoch)

DSODSL Output tensor format

Hi - thank you for the well-written implementation of SSD text detection using Keras. I have been using the code from SL_end2end_predict.ipynb to get back detected characters along with their bounding boxes.

The detection model has an output of shape (batches, 5461, 31). How can I retrieve the coordinates of a predicted box from it?

Thank you.
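A hedged sketch of how the repo's own notebooks turn that tensor back into boxes (the thresholds are copied from SL_end2end_predict.ipynb; that the decoded result carries the box coordinates directly is my assumption):

preds = model.predict(images)  # output of shape (batch, 5461, 31)
prior_util = PriorUtil(model)  # built from the same model instance
segment_threshold = 0.6; link_threshold = 0.25
res = prior_util.decode(preds[0], segment_threshold, link_threshold)  # first image
prior_util.plot_results(res)   # draws the decoded boxes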

TBPP output tensor

Hello! First of all, thank you very much for your implementation of TBPP in Keras. It is well written and simple to use!

I'm having trouble understanding just one thing: the output of an inference in TBPP (at least for me) is a tensor of shape (batches, 76454, 18). I understand the eighteen values being the predicted positions, boxes, and classes, but where does the 76454 come from?

I'm trying to adapt your model to a TFRecord dataset input, because I have a big problem with my GPU sitting idle while my CPU is stressed doing the data loading.

Thank you again for all the help your implementation of these models provides to my master's research!
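A hedged aside, not the maintainer's answer: in SSD-style models that second dimension is the total number of prior (default) boxes over all source feature maps, so 76454 should decompose as a sum of map_height x map_width x boxes_per_cell terms. A toy illustration with hypothetical numbers:

# Hypothetical feature-map sizes and per-cell box counts, for illustration only.
feature_maps = [(64, 64), (32, 32), (16, 16), (8, 8), (4, 4), (2, 2), (1, 1)]
boxes_per_cell = [4, 6, 6, 6, 6, 4, 4]
num_priors = sum(h * w * n for (h, w), n in zip(feature_maps, boxes_per_cell))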

encode/decode error for tbpp

I am trying to display the output from the tbpp model using real-world images. I have a set of results from the model, and am using the following code:

from sl_utils import PriorUtil
prior_util = PriorUtil(tbpp_model)

segment_threshold = 0.6; link_threshold = 0.25
res = prior_util.decode(results[0], segment_threshold, link_threshold)
prior_util.plot_results(res)

I get the following error:

ValueError: operands could not be broadcast together with shapes (76454,6) (76454,8)

The shape of 'results' is (1, 76454, 19)
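A hedged guess at the cause, not a confirmed fix: PriorUtil was imported from sl_utils, which decodes SegLink-format output, while the (1, 76454, 19) tensor is TextBoxes++ format, hence the (76454,6) vs (76454,8) broadcast failure. The repo also contains tbpp_utils.py (it appears in a warning in a later issue); if it exports a matching PriorUtil, something like this should line up with the TBPP output:

# Assumption: tbpp_utils provides a TBPP-specific PriorUtil with decode()/plot_results().
from tbpp_utils import PriorUtil
prior_util = PriorUtil(tbpp_model)
res = prior_util.decode(results[0])
prior_util.plot_results(res)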

TBPP model arbitrary input shape

Hi, I'm a university student in Korea.

First of all, thank you for your help with my previous issue (visualizing).

This time, I wanted the TBPP model's input shape to be arbitrary, so I set the TBPP model's input shape to image.shape.

But when I adjust PriorUtil, I get an assertion error because of the assert statement at line 193 of ssd_utils.py.

To avoid this error, I just removed the assert, and I got the results below:
[screenshots of detection results]

Although I removed the assert, I have seen no problems so far.

Could you let me know why you put the assertion at line 193 of ssd_utils.py, and whether there is any problem with removing it?

Request to add .pkl files to repo

Hi - as of now, the .pkl files have to be generated from the datasets before the end2end files can be run. Requesting the owner to please upload these to the repo, so the code can be run with the weights files alone, without having to download huge datasets.

May I ask for checkpoints?

Hello!
Thank you so much for your codes and help for everyone

I tried to run your *_train codes, but training is so slow that I don't think I can ever complete it.
Your *_evaluate codes use trained weights (.h5), which I don't have, so I have trouble running those codes as well.

Would you mind providing checkpoints...?
It'd be great help if you could give me a hand
Thank you!

SL_end2end_predict.ipynb: Model dimensions don't match that in weights file.

In notebook SL_end2end_predict.ipynb:

Model = SL512
weights_path = './models/201809231008_sl512_synthtext/weights.002.h5'
segment_threshold = 0.6; link_threshold = 0.25
plot_name = 'sl512_crnn_sythtext'

Model = DSODSL512
weights_path = './models/201806021007_dsodsl512_synthtext/weights.012.h5'
segment_threshold = 0.55; link_threshold = 0.45
plot_name = 'dsodsl512_crnn_sythtext'

sl_graph = tf.Graph()
with sl_graph.as_default():
    sl_session = tf.compat.v1.Session()
    with sl_session.as_default():
        model = Model()
        prior_util = PriorUtil(model)
        load_weights(model, weights_path)
        image_size = model.image_size

Gives output:

WARNING:tensorflow:From /usr/local/lib/python3.7/site-packages/tensorflow_core/python/ops/resource_variable_ops.py:1635: calling BaseResourceVariable.__init__ (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version.
Instructions for updating: If using Keras pass *_constraint arguments to layers.
something went wrong conv2d_1 model [[3, 3, 64, 64], [64]] file [(3, 3, 3, 64), (64,)]
Layer weight shape (3, 3, 64, 64) not compatible with provided weight shape (3, 3, 3, 64)
something went wrong conv2d_2 model [[3, 3, 64, 128], [128]] file [(3, 3, 64, 64), (64,)]
Layer weight shape (3, 3, 64, 128) not compatible with provided weight shape (3, 3, 64, 64)
something went wrong batch_normalization_2 model [[128], [128], [128], [128]] file [(64,), (64,), (64,), (64,)]
Layer weight shape (128,) not compatible with provided weight shape (64,)
something went wrong conv2d_3 model [[1, 1, 128, 192], [192]] file [(3, 3, 64, 128), (128,)]
Layer weight shape (1, 1, 128, 192) not compatible with provided weight shape (3, 3, 64, 128)
something went wrong batch_normalization_4 model [[192], [192], [192], [192]] file [(128,), (128,), (128,), (128,)]
Layer weight shape (192,) not compatible with provided weight shape (128,)
... (the same pattern of "something went wrong" shape mismatches continues through all remaining conv2d_* and batch_normalization_* layers)

Environment: tensorflow 2.2.0

Please let me know why this is happening and what I can do to solve it.

Vertical degree issue

Hi @mvoelk, good job reimplementing a bunch of SSD variants with advanced and SotA techniques. May I ask: you mention at https://github.com/mvoelk/ssd_detectors/blob/master/sl_utils.py#L290 that there is a vertical-degree issue where values become nan/inf. Does this have something to do with the way you wrap the dataset from .pkl, or is it an issue with the decode function itself? Do you have some direction on how to address this issue?

Thank you

About Textboxes++

Could you share the Keras code for TextBoxes++? I can't train and test the TextBoxes++ model. Thanks

Light architectures for object detection

Hi, Markus,

Cool repo for detection models.
I'm interested in lighter architectures for object detection and tried to change the VGG16 backbone (SSD512_body and SSD300_body) to MobileNet or MobileNetV2.
But I can't get adequate quality from them on the VOC dataset.
Metrics at epoch 30:
loss 4.34212 conf_loss 0.02317 loc_loss 2.02536
precision 0.54101 recall 0.17049 fmeasure 0.25575 accuracy 0.86921
val_loss 4.28855 val_conf_loss 0.02258 val_loc_loss 2.03045
val_precision 0.49608 val_recall 0.16673 val_fmeasure 0.24578 val_accuracy 0.87617

Probably it's due to the new feature maps, an incompatibility between the backbone and the extra blocks, or something in the training pipeline.
Did you try the same? If yes, could you give me a contact to discuss how to use these architectures in your repo?

Best Wishes,
Valery
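For reference, a minimal sketch of the backbone swap under discussion, assuming tf.keras and the layer names of keras.applications' MobileNetV2; the tap points and extra blocks are my assumptions, not this repo's configuration:

import tensorflow as tf
from tensorflow.keras.applications import MobileNetV2

def mobilenet_v2_body(input_shape=(512, 512, 3)):
    # Multi-scale source layers for SSD-style heads (sketch only).
    base = MobileNetV2(input_shape=input_shape, include_top=False, weights='imagenet')
    # Tap points at strides 8/16/32; names are from keras.applications.
    names = ['block_6_expand_relu', 'block_13_expand_relu', 'out_relu']
    source_layers = [base.get_layer(n).output for n in names]
    # Extra strided convs for the coarser scales, mirroring what SSD512_body
    # does with its conv6..conv9 blocks.
    x = source_layers[-1]
    for filters in (256, 256):
        x = tf.keras.layers.Conv2D(filters, 3, strides=2, padding='same', activation='relu')(x)
        source_layers.append(x)
    return tf.keras.Model(base.input, source_layers)

Whether these feature maps match the extra blocks and prior-box settings in this repo is exactly the open compatibility question raised above.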

[Question] performance about tbpp with focal loss but without DenseNet

Hi,

I am trying to reproduce the tbpp512fl_synthtext experiment. After the 9th epoch completes, I have: val_precision: 0.8418 - val_recall: 0.6407 - val_accuracy: 0.5731 - val_fmeasure: 0.7189.

That is far from the "TextBoxes++ with DenseNet and Focal Loss" performance reported in the README (0.901 precision, 0.931 recall, and 0.916 fmeasure). It also looks like I will never reach 90+% performance.

I understand that I have not used DenseNet yet, and you also do not report "TextBoxes++ with Focal Loss" (without DenseNet). Since it will take me days to reproduce the "TextBoxes++ with DenseNet and Focal Loss" experiment, I want to understand whether the result I obtained above for the tbpp512fl_synthtext experiment is reasonable.

It is really surprising to me that using or not using DenseNet would cause such a large difference. So please confirm, if you get a chance. Many thanks indeed.

Best Regards
Jie

TBPP++ How to visualize prior boxes

Hi, I'm a university student in South Korea.

I'm at a beginner level in deep learning.

Is there any way to visualize the prior boxes on a SynthText image?

Also, I want to know how to use the plot_boxes function in PriorMap and the plot functions in PriorUtil.
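A hedged sketch of what I would try, assuming PriorUtil keeps per-map PriorMap objects in a list attribute and that plot_boxes takes box indices; every name beyond PriorUtil(model) is a guess, not the repo's verified API:

import matplotlib.pyplot as plt

prior_util = PriorUtil(model)      # as in the training notebooks
plt.figure(figsize=(8, 8))
plt.imshow(image)                  # a SynthText image resized to the model input
m = prior_util.prior_maps[2]       # assumption: list of PriorMap objects
m.plot_boxes(range(0, 100, 10))    # assumption: plots the priors with these indices
plt.show()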

How to convert to tflite?

Hello, I'm trying to convert the model in TBPP_end2end_predict_GPUonly.ipynb to the tflite format for a mobile application.
This is my environment:

Python           3.6.9
Notebook         5.3.1
NumPy            1.18.5
Pandas           1.0.5
Matplotlib       3.2.2
OpenCV           4.1.2
TensorFlow       2.3.0
Keras            2.4.0
tqdm             4.41.1
imageio          2.4.1

Here's my code, run after concatenating the end2end model:

import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()

open("convert.tflite", "wb").write(tflite_model)

And I got this error:

Exception                                 Traceback (most recent call last)
/usr/local/lib/python3.6/dist-packages/tensorflow/lite/python/convert.py in toco_convert_protos(model_flags_str, toco_flags_str, input_data_str, debug_info_str, enable_mlir_converter)
    198                                                  debug_info_str,
--> 199                                                  enable_mlir_converter)
    200       return model_str

5 frames
/usr/local/lib/python3.6/dist-packages/tensorflow/lite/python/wrap_toco.py in wrapped_toco_convert(model_flags_str, toco_flags_str, input_data_str, debug_info_str, enable_mlir_converter)
     37       debug_info_str,
---> 38       enable_mlir_converter)
     39 

Exception: /usr/local/lib/python3.6/dist-packages/tensorflow/python/util/deprecation.py:507:0: error: requires element_shape to be 1D tensor during TF Lite transformation pass
/usr/local/lib/python3.6/dist-packages/tensorflow/python/util/deprecation.py:574:0: note: called from
/content/drive/.shortcut-targets-by-id/1U-eNJ9b4Rq8kRh8RQ-t88VQTH1HUMEmA/ssd_detectors/tbpp_layers.py:209:0: note: called from
/usr/local/lib/python3.6/dist-packages/tensorflow/python/autograph/impl/api.py:302:0: note: called from
/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/base_layer.py:985:0: note: called from
/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/functional.py:508:0: note: called from
/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/functional.py:386:0: note: called from
/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/base_layer.py:985:0: note: called from
/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/saving/saving_utils.py:134:0: note: called from
/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/def_function.py:600:0: note: called from
/usr/local/lib/python3.6/dist-packages/tensorflow/python/util/deprecation.py:507:0: note: see current operation: %1382 = "tf.TensorListReserve"(%249, %1381) {device = ""} : (tensor<i32>, tensor<i32>) -> tensor<!tf.variant<tensor<*xf32>>>
/usr/local/lib/python3.6/dist-packages/tensorflow/python/util/deprecation.py:507:0: error: failed to legalize operation 'tf.TensorListReserve' that was explicitly marked illegal
/usr/local/lib/python3.6/dist-packages/tensorflow/python/util/deprecation.py:574:0: note: called from
/content/drive/.shortcut-targets-by-id/1U-eNJ9b4Rq8kRh8RQ-t88VQTH1HUMEmA/ssd_detectors/tbpp_layers.py:209:0: note: called from
/usr/local/lib/python3.6/dist-packages/tensorflow/python/autograph/impl/api.py:302:0: note: called from
/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/base_layer.py:985:0: note: called from
/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/functional.py:508:0: note: called from
/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/functional.py:386:0: note: called from
/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/base_layer.py:985:0: note: called from
/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/saving/saving_utils.py:134:0: note: called from
/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/def_function.py:600:0: note: called from
/usr/local/lib/python3.6/dist-packages/tensorflow/python/util/deprecation.py:507:0: note: see current operation: %1382 = "tf.TensorListReserve"(%249, %1381) {device = ""} : (tensor<i32>, tensor<i32>) -> tensor<!tf.variant<tensor<*xf32>>>


During handling of the above exception, another exception occurred:

ConverterError                            Traceback (most recent call last)
<ipython-input-9-ba886ac57a78> in <module>()
      4 # converter.experimental_new_converter = True
      5 # converter.target_spec.supported_ops =[tf.lite.OpsSet.TFLITE_BUILTINS, tf.lite.OpsSet.SELECT_TF_OPS]
----> 6 tflite_model = converter.convert()
      7 
      8 open("convert.tflite", "wb").write(tflite_model)

/usr/local/lib/python3.6/dist-packages/tensorflow/lite/python/lite.py in convert(self)
    829 
    830     return super(TFLiteKerasModelConverterV2,
--> 831                  self).convert(graph_def, input_tensors, output_tensors)
    832 
    833 

/usr/local/lib/python3.6/dist-packages/tensorflow/lite/python/lite.py in convert(self, graph_def, input_tensors, output_tensors)
    631         input_tensors=input_tensors,
    632         output_tensors=output_tensors,
--> 633         **converter_kwargs)
    634 
    635     calibrate_and_quantize, flags = quant_mode.quantizer_flags(

/usr/local/lib/python3.6/dist-packages/tensorflow/lite/python/convert.py in toco_convert_impl(input_data, input_tensors, output_tensors, enable_mlir_converter, *args, **kwargs)
    572       input_data.SerializeToString(),
    573       debug_info_str=debug_info_str,
--> 574       enable_mlir_converter=enable_mlir_converter)
    575   return data
    576 

/usr/local/lib/python3.6/dist-packages/tensorflow/lite/python/convert.py in toco_convert_protos(model_flags_str, toco_flags_str, input_data_str, debug_info_str, enable_mlir_converter)
    200       return model_str
    201     except Exception as e:
--> 202       raise ConverterError(str(e))
    203 
    204   if distutils.spawn.find_executable(_toco_from_proto_bin) is None:

ConverterError: /usr/local/lib/python3.6/dist-packages/tensorflow/python/util/deprecation.py:507:0: error: requires element_shape to be 1D tensor during TF Lite transformation pass
/usr/local/lib/python3.6/dist-packages/tensorflow/python/util/deprecation.py:574:0: note: called from
/content/drive/.shortcut-targets-by-id/1U-eNJ9b4Rq8kRh8RQ-t88VQTH1HUMEmA/ssd_detectors/tbpp_layers.py:209:0: note: called from
/usr/local/lib/python3.6/dist-packages/tensorflow/python/autograph/impl/api.py:302:0: note: called from
/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/base_layer.py:985:0: note: called from
/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/functional.py:508:0: note: called from
/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/functional.py:386:0: note: called from
/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/base_layer.py:985:0: note: called from
/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/saving/saving_utils.py:134:0: note: called from
/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/def_function.py:600:0: note: called from
/usr/local/lib/python3.6/dist-packages/tensorflow/python/util/deprecation.py:507:0: note: see current operation: %1382 = "tf.TensorListReserve"(%249, %1381) {device = ""} : (tensor<i32>, tensor<i32>) -> tensor<!tf.variant<tensor<*xf32>>>
/usr/local/lib/python3.6/dist-packages/tensorflow/python/util/deprecation.py:507:0: error: failed to legalize operation 'tf.TensorListReserve' that was explicitly marked illegal
/usr/local/lib/python3.6/dist-packages/tensorflow/python/util/deprecation.py:574:0: note: called from
/content/drive/.shortcut-targets-by-id/1U-eNJ9b4Rq8kRh8RQ-t88VQTH1HUMEmA/ssd_detectors/tbpp_layers.py:209:0: note: called from
/usr/local/lib/python3.6/dist-packages/tensorflow/python/autograph/impl/api.py:302:0: note: called from
/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/base_layer.py:985:0: note: called from
/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/functional.py:508:0: note: called from
/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/functional.py:386:0: note: called from
/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/base_layer.py:985:0: note: called from
/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/saving/saving_utils.py:134:0: note: called from
/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/def_function.py:600:0: note: called from
/usr/local/lib/python3.6/dist-packages/tensorflow/python/util/deprecation.py:507:0: note: see current operation: %1382 = "tf.TensorListReserve"(%249, %1381) {device = ""} : (tensor<i32>, tensor<i32>) -> tensor<!tf.variant<tensor<*xf32>>>

Hope you can help me solve this problem, thanks a lot!
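Not a verified fix, but the commented-out lines in the failing cell already point in the usual direction for "tf.TensorListReserve ... explicitly marked illegal" errors: allow Select TF ops so that ops the TFLite converter cannot legalize fall back to TensorFlow kernels.

import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.target_spec.supported_ops = [
    tf.lite.OpsSet.TFLITE_BUILTINS,  # native TFLite kernels where possible
    tf.lite.OpsSet.SELECT_TF_OPS,    # fall back to TF kernels for the rest
]
tflite_model = converter.convert()

open("convert.tflite", "wb").write(tflite_model)

The resulting model then needs the Select TF Ops (Flex) delegate bundled into the mobile app, which increases the binary size.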

Is there a train code to train the entire end to end pipeline??

Hi,
I was going through your code and wanted to ask: do we have to train SegLink and CRNN separately on our dataset, or is there code that trains both simultaneously?

I also wanted to ask: for ICDAR2015FST, do I have to specify the bounding boxes as

xmin, ymin, xmax, ymax

and for ICDAR2015IST is the format

x1, y1, x2, y2, x3, y3, x4, y4 (the 4 points of the box in the anti-clockwise direction)?

Textboxes++ training

I'm trying to train my own TextBoxes++ using the IPython notebook you provided, and I noticed an issue:

  • You use sigmoid (model = TBPP512_dense(softmax=False)) for the background and foreground class confidences instead of softmax. However, the original SSD training used softmax, which makes sense since it's not a multi-label problem. Is there any reason for using sigmoid?

TBPP use_multiprocessing=True problem

Hi, I'm a university student in Korea.

Because I wanted to make the training process faster, I changed the fit_generator parameters to

workers=12,
use_multiprocessing=True,

Then I got the problem below.
[screenshot of the warning]
As you can see, the last batch of the first epoch does not work and gives a warning message. Is there any solution for this?

train only with VOC

Hey mvoelk,

your work is great!
I am trying to train DSOD512 with VOC2012.

from data_voc import GTUtility
gt_util_voc = GTUtility('data/VOC2012/')
gt_util_train, gt_util_val = GTUtility.split(gt_util_voc)
...
with open(checkdir+'/source.py','wb') as f:
    source = ''.join(['# In[%i]\n%s\n\n' % (i, In[i]) for i in range(len(In))])
    f.write(source.encode())

and I get an error, because 'In' is unknown.
I don't think the problem is the splitting; I got the same issue before I changed the code. Is In the train list?

Could you help me please?
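For context: In is IPython's input-cell history list and only exists inside IPython/Jupyter, which is why the block fails in a plain script. A guarded version (the try/except is my suggestion):

try:
    In  # defined only when running under IPython/Jupyter
    with open(checkdir + '/source.py', 'wb') as f:
        source = ''.join('# In[%i]\n%s\n\n' % (i, In[i]) for i in range(len(In)))
        f.write(source.encode())
except NameError:
    pass  # running as a plain .py script; there is no cell history to dump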

No such layer: conv4_3_1

ssd300_coco_weights_fixed, ssd300_voc_weights_fixed, ssd512_coco_weights_fixed, and ssd512_voc_weights_fixed.hdf5 have two weight entries labeled conv4_3_1:b and conv4_3_1:W, but no layer named conv4_3_1.
This triggers an error. Does someone know how to fix this issue?

Testing on my own data.

Hi,
How can I use the TBPP (DenseNet) model with my own data? Specifically, I see inside tbpp_evaluate.ipynb that I need to do a sample_random_batch() from gt_util_val, which is created from split() on the pickle file gt_util_synthtext_seglink.pkl. I understand that I need to create a pickle file with my test data in order to fit into this pipeline. However, in your code I see that the pkl is generated from a .mat file (I understand that is the Matlab format).

So here is where I am stuck: do I need to create my files in .mat format so I can sneak them into the pipeline for evaluation/testing, or can I create a pkl file directly, bypassing the .mat creation?
I believe it would be quite useful if there were a small writeup about the format the input data needs to be in. I've looked hard at Ankush Gupta's SynthText repo (https://github.com/ankush-me/SynthText) and am still wrapping my head around the format used. I'm reading Ankush's paper as you view this issue, too.

Please clarify,
Thanks,
Krishna
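A hedged sketch of the direct-pickle route: the .pkl files appear to be nothing more than pickled GTUtility objects, so one can be written from any GTUtility subclass without touching .mat files (the data_synthtext module name is my assumption, inferred from the data_voc and data_svt modules used elsewhere in this thread):

import pickle

from data_synthtext import GTUtility  # assumed module name

gt_util = GTUtility('data/SynthText/')
with open('gt_util_synthtext_seglink.pkl', 'wb') as f:
    pickle.dump(gt_util, f)

# Load it back the way the evaluate notebooks do:
with open('gt_util_synthtext_seglink.pkl', 'rb') as f:
    gt_util = pickle.load(f)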

Location of padding operation

Thanks for sharing your code. I have a comment regarding the location of the padding operation in ssd_data.py.

img, y = pad_image(img, aspect_ratio, y)

I think it should be done before resizing the input image, not after, since we want to preserve the aspect ratio of the original image. As written, the code has no effect on preserving the aspect ratio.
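A two-line sketch of the suggested order, using the repo's pad_image call quoted above (target_w/target_h are illustrative names):

import cv2

img, y = pad_image(img, aspect_ratio, y)     # pad to the target aspect ratio first
img = cv2.resize(img, (target_w, target_h))  # then resize without distortion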

Test TextBoxes++

Hi, I'm kind of a newbie to git and CV.
How can I test TextBoxes++ with my own images? I can see SSD_predict.ipynb and SL_predict.ipynb, but no TBPP_predict.ipynb in your git. Can you help me?
Thanks in advance

Target length zero error

Hi, I was trying to use your CRNN code and found the error below. I have to admit that I am new to RNNs, but I think all my inputs were labeled correctly with your code.

I have been stuck on this issue for a while; it is blocking me from producing my first CRNN model :(

Traceback (most recent call last):
File "crnn_train.py", line 97, in
initial_epoch=0)
File "D:\experimental\random_test\testvenv\lib\site-packages\keras\legacy\interfaces.py", line 91, in wrapper
return func(*args, **kwargs)
File "D:\experimental\random_test\testvenv\lib\site-packages\keras\engine\training.py", line 1658, in fit_generator
initial_epoch=initial_epoch)
File "D:\experimental\random_test\testvenv\lib\site-packages\keras\engine\training_generator.py", line 215, in fit_generator
class_weight=class_weight)
File "D:\experimental\random_test\testvenv\lib\site-packages\keras\engine\training.py", line 1449, in train_on_batch
outputs = self.train_function(ins)
File "D:\experimental\random_test\testvenv\lib\site-packages\keras\backend\tensorflow_backend.py", line 2979, in call
return self._call(inputs)
File "D:\experimental\random_test\testvenv\lib\site-packages\keras\backend\tensorflow_backend.py", line 2937, in _call
fetched = self._callable_fn(*array_vals)
File "D:\experimental\random_test\testvenv\lib\site-packages\tensorflow\python\client\session.py", line 1458, in call
run_metadata_ptr)
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
(0) Invalid argument: Labels length is zero in batch 2
[[{{node ctc/CTCLoss}}]]
(1) Invalid argument: Labels length is zero in batch 2
[[{{node ctc/CTCLoss}}]]
[[training/SGD/gradients/ctc/CTCLoss_grad/mul/_315]]
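For what it's worth, "Labels length is zero in batch" from ctc/CTCLoss means some sample reached the loss with an empty target transcription. A generic, hedged sketch of the kind of filtering that avoids it (the (image, text) sample structure is illustrative, not the repo's):

# CTC requires every target sequence to be non-empty.
samples = [(image, text) for image, text in samples if len(text) > 0]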

SL_end2end_predict.ipynb fails on converting to .py with necessary modifications.

Hi, I have ported the SL_end2end_predict.ipynb to a .py file that loads only user images and gets predictions from them.

I am getting an output tensor of shape (1, 5461, 18) for a single image. These are the summarized values:

[[ 0.9778447 0.02215527 0.05647869 ... 0.39081636 0.2786811 0.7213189 ] [ 0.9876583 0.01234164 0.17384888 ... 0.3960406 0.4638915 0.53610843] [ 0.9857997 0.01420039 0.1859522 ... 0.35515028 0.5405665 0.4594335 ] ... [ 0.99148715 0.00851282 0.8232149 ... 0.02318294 0.9777579 0.02224212] [ 0.99127495 0.00872501 -0.4267429 ... 0.0168435 0.98193043 0.01806954] [ 0.98890346 0.01109659 -0.55322266 ... 0.02168869 0.97793293 0.02206702]]

[[0.9778447 0.02215527] [0.9876583 0.01234164] [0.9857997 0.01420039] ... [0.99148715 0.00851282] [0.99127495 0.00872501] [0.98890346 0.01109659]]

[[ 0.05647869 0.25386062 -0.7133617 0.8498816 0.20088424] [ 0.17384888 0.65766203 -0.7351028 0.9332367 0.14610018] [ 0.1859522 1.1206706 -0.6635226 0.9163737 0.17239925] ... [ 0.8232149 -1.1342822 -3.839357 -1.5365617 0.0083948 ] [-0.4267429 -0.5418013 -3.4509156 -1.672849 -0.05473372] [-0.55322266 -0.27144217 -0.29643777 -0.03384027 0.6844612 ]]

[[0.9789397 0.02106032 0.9673921 ... 0.03583498 0.9666423 0.03335765] [0.98277885 0.01722118 0.9819172 ... 0.02573826 0.97589934 0.02410063] [0.97954214 0.0204578 0.97924906 ... 0.02678417 0.975162 0.02483795] ... [0.97945476 0.02054522 0.9785146 ... 0.01981203 0.97710925 0.02289074] [0.9828745 0.01712552 0.9792094 ... 0.01742494 0.97953975 0.02046019] [0.9779886 0.02201145 0.9780286 ... 0.02184003 0.9782518 0.0217482 ]]

[[0.33596185 0.6640381 0.30375123 ... 0.39081636 0.2786811 0.7213189 ] [0.6212505 0.37874946 0.12344692 ... 0.3960406 0.4638915 0.53610843] [0.53112626 0.46887374 0.17611167 ... 0.35515028 0.5405665 0.4594335 ] ... [0.9766356 0.02336443 0.9767829 ... 0.02318294 0.9777579 0.02224212] [0.98007303 0.019927 0.97990566 ... 0.0168435 0.98193043 0.01806954] [0.9779488 0.02205116 0.9778461 ... 0.02168869 0.97793293 0.02206702]]

The issue is that in sl_utils.py:304, confs = segment_labels[:,1] extracts

[0.02215527 0.01234164 0.01420039 ... 0.00851282 0.00872501 0.01109659]

which do not look like confidence values. Is my model output incorrect because of the input image?

My input is:

import glob
import cv2
import numpy as np

images, images_orig = [], []
h, w = image_size
for img_path in glob.glob('./examples_images/*'):
    img = cv2.imread(img_path)
    images_orig.append(np.copy(img))
    resized_img = cv2.resize(img, (w, h), interpolation=cv2.INTER_LINEAR)
    resized_img = resized_img[:, :, (2, 1, 0)] / 255  # BGR to RGB
    images.append(resized_img)

images = np.asarray(images)

preds = det_model.predict(images, batch_size=1, verbose=1)
Attached my python file.
sl_crnn.py.txt

Thank you once again for a very helpful repo. Would appreciate your kind help on this.

sample_random_batch(), where is it?

Hello,
What is the sample_random_batch() function used inside TBPP_train.ipynb? Is it part of the dict returned by pickle.load()? I'm sorry if I sound ignorant; I can't seem to find it anywhere. Please advise.

Thanks,
Kumar

the metrics are too small

When I train the TBPP model with my own data, the metrics are very small, as follows:

loss: 17.0426 - conf_loss: 0.0283 - loc_loss: 16.7599 - precision: 4.2830e-04 - recall: 0.0217 - accuracy: 3.8194e-04 - fmeasure: 7.5787e-04 - num_pos: 5828.8542 - num_neg: 3663963.1458

How can I fix this?
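A hedged debugging idea, not a diagnosis: a loc_loss around 17 with near-zero precision often means the ground-truth boxes are not in normalized [0, 1] image coordinates. A quick sanity check (that gt_util.data rows start with the four box coordinates is my assumption):

import numpy as np

boxes = np.concatenate([sample[:, :4] for sample in gt_util.data])
print(boxes.min(), boxes.max())  # expect values within [0, 1]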

[Question] Meaning of log

Hello Sir @mvoelk,
Thank you for your great work.

What is the meaning of this log?

{"neg_seg_conf_loss": 0.3863193988800049, "link_fmeasure": 0.0, "num_neg_seg": 5007.0, "seg_recall": 0.0, "pos_seg_conf_loss": 1.3579927682876587, "link_recall": 0.0, "link_conf_loss": 0.7612245082855225, "seg_fmeasure": 0.0, "pos_link_conf_loss": 1.7857060432434082, "epoch": 0, "lr": 0.0010000000474974513, "seg_accuracy": 0.304583340883255, "seg_precision": 0.0, "seg_conf_loss": 0.629237711429596, "num_pos_seg": 1669.0, "link_precision": 0.0, "link_accuracy": 0.0, "seg_loc_loss": 3.570629596710205, "loss": 4.961091995239258, "iteration": 33, "batch": 33, "time": 213.97210311889648, "neg_link_conf_loss": 0.41973066329956055}

{"neg_seg_conf_loss": 0.24363349378108978, "link_fmeasure": 0.5489721298217773, "num_neg_seg": 5835.0, "seg_recall": 0.35218510031700134, "pos_seg_conf_loss": 1.1648257970809937, "link_recall": 0.38869863748550415, "link_conf_loss": 0.5120393633842468, "seg_fmeasure": 0.48598790168762207, "pos_link_conf_loss": 1.263663411140442, "epoch": 3, "lr": 0.0010000000474974513, "seg_accuracy": 0.39625000953674316, "seg_precision": 0.7837528586387634, "seg_conf_loss": 0.47393155097961426, "num_pos_seg": 1945.0, "link_precision": 0.9341563582420349, "link_accuracy": 0.37833333015441895, "seg_loc_loss": 2.074434757232666, "loss": 3.060405731201172, "iteration": 99318, "batch": 2709, "time": 275985.530739069, "neg_link_conf_loss": 0.26149797439575195}

How can I tell whether my model has good accuracy or has converged?

Thank you

Trying TBPP300

Hi, I'm a junior engineer in Korea.

To lighten the model, I wanted to use a TBPP300 model, so I used the SSD300 body for TBPP300.

However, I got an assertion error when computing the prior boxes.

[screenshot of the assertion error]

Could you please let me know why this happens and how to avoid the problem?

[Question] Can you please upload the pickled dataset that you used for training SegLink?

Hello,

Can you please upload the pickled dataset that you used for training SegLink?

It would be great if we could just run the code first and then try to understand the pipeline. I am asking because I am finding it difficult to understand how the data is prepared for training SegLink. I have trained object detectors before, but I think I am missing a step when it comes to training text detectors. So checking your data and playing with it would definitely help me understand the pipeline better.

Thanks in advance.

How to train another dataset (like SVT) with the SynthText SL model

I loaded the dataset using the code below:

from data_svt import GTUtility
gt_util = GTUtility('./data/svt/')
gt_util_train, gt_util_val = gt_util.split(0.7)

then I ran this code:

# SegLink + DenseNet
model = DSODSL512()
#model = DSODSL512(activation='leaky_relu')
weights_path = None
batch_size = 6
experiment = 'dsodsl512_synthtext'

if weights_path is not None:
    if weights_path.find('ssd512') > -1:
        layer_list = [
            'conv1_1', 'conv1_2',
            'conv2_1', 'conv2_2',
            'conv3_1', 'conv3_2', 'conv3_3',
            'conv4_1', 'conv4_2', 'conv4_3',
            'conv5_1', 'conv5_2', 'conv5_3',
            'fc6', 'fc7',
            'conv6_1', 'conv6_2',
            'conv7_1', 'conv7_2',
            'conv8_1', 'conv8_2',
            'conv9_1', 'conv9_2',
        ]
        freeze = [
            'conv1_1', 'conv1_2',
            'conv2_1', 'conv2_2',
            'conv3_1', 'conv3_2', 'conv3_3',
            #'conv4_1', 'conv4_2', 'conv4_3',
            #'conv5_1', 'conv5_2', 'conv5_3',
        ]
        
        load_weights(model, weights_path, layer_list)
        for layer in model.layers:
            layer.trainable = not layer.name in freeze
    else:
        load_weights(model, weights_path)

prior_util = PriorUtil(model)

and finally

epochs = 100
initial_epoch = 0

gen_train = InputGenerator(gt_util_train, prior_util, batch_size, model.image_size, augmentation=False)
gen_val = InputGenerator(gt_util_val, prior_util, batch_size, model.image_size, augmentation=False)

checkdir = './checkpoints/' + time.strftime('%Y%m%d%H%M') + '_' + experiment
if not os.path.exists(checkdir):
    os.makedirs(checkdir)

with open(checkdir+'/source.py','wb') as f:
    source = ''.join(['# In[%i]\n%s\n\n' % (i, In[i]) for i in range(len(In))])
    f.write(source.encode())

#optim = keras.optimizers.SGD(lr=1e-3, momentum=0.9, decay=0, nesterov=True)
optim = keras.optimizers.Adam(lr=1e-3, beta_1=0.9, beta_2=0.999, epsilon=0.001, decay=0.0)

# weight decay
regularizer = keras.regularizers.l2(5e-4) # None if disabled
#regularizer = None
for l in model.layers:
    if l.__class__.__name__.startswith('Conv'):
        l.kernel_regularizer = regularizer

loss = SegLinkLoss(lambda_offsets=1.0, lambda_links=1.0, neg_pos_ratio=3.0)
#loss = SegLinkFocalLoss()
#loss = SegLinkFocalLoss(lambda_segments=1.0, lambda_offsets=1.0, lambda_links=1.0)
#loss = SegLinkFocalLoss(gamma_segments=3, gamma_links=3)

model.compile(optimizer=optim, loss=loss.compute, metrics=loss.metrics)

history = model.fit_generator(
        gen_train.generate(), 
        steps_per_epoch=gen_train.num_batches, 
        epochs=epochs, 
        verbose=1, 
        callbacks=[
            keras.callbacks.ModelCheckpoint(checkdir+'/weights.{epoch:03d}.h5', verbose=1, save_weights_only=True),
            Logger(checkdir),
            #LearningRateDecay()
        ], 
        validation_data=gen_val.generate(), 
        validation_steps=gen_val.num_batches,
        class_weight=None,
        max_queue_size=1, 
        workers=1, 
        #use_multiprocessing=False, 
        initial_epoch=initial_epoch, 
        #pickle_safe=False, # will use threading instead of multiprocessing, which is lighter on memory use but slower
        )

but I get this error:

/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/gradients_impl.py:110: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
"Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
Epoch 1/100

ValueError Traceback (most recent call last)
in ()
44 class_weight=None,
45 max_queue_size=1,
---> 46 workers=1,
47 #use_multiprocessing=False,
48 #initial_epoch=initial_epoch,

/usr/local/lib/python3.6/dist-packages/keras/legacy/interfaces.py in wrapper(*args, **kwargs)
89 warnings.warn('Update your ' + object_name + ' call to the ' +
90 'Keras 2 API: ' + signature, stacklevel=2)
---> 91 return func(*args, **kwargs)
92 wrapper._original_function = func
93 return wrapper

/usr/local/lib/python3.6/dist-packages/keras/engine/training.py in fit_generator(self, generator, steps_per_epoch, epochs, verbose, callbacks, validation_data, validation_steps, class_weight, max_queue_size, workers, use_multiprocessing, shuffle, initial_epoch)
1416 use_multiprocessing=use_multiprocessing,
1417 shuffle=shuffle,
-> 1418 initial_epoch=initial_epoch)
1419
1420 @interfaces.legacy_generator_methods_support

/usr/local/lib/python3.6/dist-packages/keras/engine/training_generator.py in fit_generator(model, generator, steps_per_epoch, epochs, verbose, callbacks, validation_data, validation_steps, class_weight, max_queue_size, workers, use_multiprocessing, shuffle, initial_epoch)
179 batch_index = 0
180 while steps_done < steps_per_epoch:
--> 181 generator_output = next(output_generator)
182
183 if not hasattr(generator_output, 'len'):

/usr/local/lib/python3.6/dist-packages/keras/utils/data_utils.py in get(self)
707 "use_multiprocessing=False, workers > 1."
708 "For more information see issue #1638.")
--> 709 six.reraise(*sys.exc_info())

/usr/local/lib/python3.6/dist-packages/six.py in reraise(tp, value, tb)
691 if value.traceback is not tb:
692 raise value.with_traceback(tb)
--> 693 raise value
694 finally:
695 value = None

/usr/local/lib/python3.6/dist-packages/keras/utils/data_utils.py in get(self)
683 try:
684 while self.is_running():
--> 685 inputs = self.queue.get(block=True).get()
686 self.queue.task_done()
687 if inputs is not None:

/usr/lib/python3.6/multiprocessing/pool.py in get(self, timeout)
668 return self._value
669 else:
--> 670 raise self._value
671
672 def _set(self, i, obj):

/usr/lib/python3.6/multiprocessing/pool.py in worker(inqueue, outqueue, initializer, initargs, maxtasks, wrap_exception)
117 job, i, func, args, kwds = task
118 try:
--> 119 result = (True, func(*args, **kwds))
120 except Exception as e:
121 if wrap_exception and func is not _helper_reraises_exception:

/usr/local/lib/python3.6/dist-packages/keras/utils/data_utils.py in next_sample(uid)
624 The next value of generator uid.
625 """
--> 626 return six.next(_SHARED_SEQUENCES[uid])
627
628

/content/drive/My Drive/ssd_detectors_master/ssd_data.py in generate(self, debug, encode, seed)
565 if len(targets) == batch_size:
566 if encode:
--> 567 targets = [self.prior_util.encode(y) for y in targets]
568 targets = np.array(targets, dtype=np.float32)
569 tmp_inputs = np.array(inputs, dtype=np.float32)

/content/drive/My Drive/ssd_detectors_master/ssd_data.py in (.0)
565 if len(targets) == batch_size:
566 if encode:
--> 567 targets = [self.prior_util.encode(y) for y in targets]
568 targets = np.array(targets, dtype=np.float32)
569 tmp_inputs = np.array(inputs, dtype=np.float32)

/content/drive/My Drive/ssd_detectors_master/sl_utils.py in encode(self, gt_data, debug)
138 polygons = []
139 for word in gt_data:
--> 140 xy = np.reshape(word[:8], (-1, 2))
141 xy = np.copy(xy) * (self.image_w, self.image_h)
142 polygons.append(xy)

/usr/local/lib/python3.6/dist-packages/numpy/core/fromnumeric.py in reshape(a, newshape, order)
290 [5, 6]])
291 """
--> 292 return _wrapfunc(a, 'reshape', newshape, order=order)
293
294

/usr/local/lib/python3.6/dist-packages/numpy/core/fromnumeric.py in _wrapfunc(obj, method, *args, **kwds)
54 def _wrapfunc(obj, method, *args, **kwds):
55 try:
---> 56 return getattr(obj, method)(*args, **kwds)
57
58 # An AttributeError occurs if the object does not have

ValueError: cannot reshape array of size 5 into shape (2)

Also, how do I modify this model for a custom text or object detection dataset? I used labelImg to create my custom dataset, so how can I use it with your model?

Thank you
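The traceback itself narrows this down: sl_utils.py encodes each ground-truth row as an 8-value polygon (np.reshape(word[:8], (-1, 2))), while a VOC/labelImg-style row carries only 5 values (xmin, ymin, xmax, ymax, label), hence "cannot reshape array of size 5 into shape (2)". A hedged sketch of the conversion SegLink-style encoders expect:

import numpy as np

def box_row_to_polygon_row(row):
    # (xmin, ymin, xmax, ymax, label) -> (x1, y1, x2, y2, x3, y3, x4, y4, label)
    xmin, ymin, xmax, ymax, label = row
    return np.array([xmin, ymin, xmax, ymin, xmax, ymax, xmin, ymax, label])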

How do I train a very lightweight object detector?

  1. Without bias, what do you think are the major differences between your repository and the other repository (https://github.com/pierluigiferrari/ssd_keras)?

For me, I want to train a very lightweight/fast object detection model for recognizing a single solid object, e.g. a PlayStation joystick. I tried transfer learning with the TensorFlow Object Detection API using SSDLiteMobileNetV2, but it's not fast enough, because it was made big so that it can predict multiple classes. But I want to predict only one class, a rigid object that won't deform or change shape at all.

That's why I'm thinking of defining a slightly smaller MobileNetV2 and training SSD from scratch (as I think it's not possible to reuse the weights from the bigger model), so that I can achieve faster inference on a mobile phone. And maybe later I will convert the model to TF Lite.
For example, I want my model to run fast like this paper: https://arxiv.org/abs/1907.05047

  2. Which repo should I use for an easy and efficient implementation?
    @mvoelk

Training SegLink model on my data

I was trying to train the DSODSL512 model using my own data, which is in the ICDAR-FST2015 data format.
When I tried to train the other models (TB, DSOD) using the same GTUtility and InputGenerator, it worked perfectly, but when I tried doing the same with the SegLink models, this error was raised:

ValueError: cannot reshape array of size 5 into shape (2)

The full trace of the error is given below. Please look into it.

P.S. I had changed the GTUtility for the ICDAR2015 to reflect my folder structure.

ValueError                                Traceback (most recent call last)
<ipython-input-6-a02fda50a65c> in <module>
     38         workers=1,
     39         #use_multiprocessing=False,
---> 40         initial_epoch=initial_epoch,
     41         #pickle_safe=False, # will use threading instead of multiprocessing, which is lighter on memory use but slower
     42         )

e:\installed_programs\anaconda3\envs\keras\lib\site-packages\keras\legacy\interfaces.py in wrapper(*args, **kwargs)
     89                 warnings.warn('Update your `' + object_name + '` call to the ' +
     90                               'Keras 2 API: ' + signature, stacklevel=2)
---> 91             return func(*args, **kwargs)
     92         wrapper._original_function = func
     93         return wrapper

e:\installed_programs\anaconda3\envs\keras\lib\site-packages\keras\engine\training.py in fit_generator(self, generator, steps_per_epoch, epochs, verbose, callbacks, validation_data, validation_steps, class_weight, max_queue_size, workers, use_multiprocessing, shuffle, initial_epoch)
   1416             use_multiprocessing=use_multiprocessing,
   1417             shuffle=shuffle,
-> 1418             initial_epoch=initial_epoch)
   1419 
   1420     @interfaces.legacy_generator_methods_support

e:\installed_programs\anaconda3\envs\keras\lib\site-packages\keras\engine\training_generator.py in fit_generator(model, generator, steps_per_epoch, epochs, verbose, callbacks, validation_data, validation_steps, class_weight, max_queue_size, workers, use_multiprocessing, shuffle, initial_epoch)
    179             batch_index = 0
    180             while steps_done < steps_per_epoch:
--> 181                 generator_output = next(output_generator)
    182 
    183                 if not hasattr(generator_output, '__len__'):

e:\installed_programs\anaconda3\envs\keras\lib\site-packages\keras\utils\data_utils.py in get(self)
    707                     "`use_multiprocessing=False, workers > 1`."
    708                     "For more information see issue #1638.")
--> 709             six.reraise(*sys.exc_info())

e:\installed_programs\anaconda3\envs\keras\lib\site-packages\six.py in reraise(tp, value, tb)
    691             if value.__traceback__ is not tb:
    692                 raise value.with_traceback(tb)
--> 693             raise value
    694         finally:
    695             value = None

e:\installed_programs\anaconda3\envs\keras\lib\site-packages\keras\utils\data_utils.py in get(self)
    683         try:
    684             while self.is_running():
--> 685                 inputs = self.queue.get(block=True).get()
    686                 self.queue.task_done()
    687                 if inputs is not None:

e:\installed_programs\anaconda3\envs\keras\lib\multiprocessing\pool.py in get(self, timeout)
    642             return self._value
    643         else:
--> 644             raise self._value
    645 
    646     def _set(self, i, obj):

e:\installed_programs\anaconda3\envs\keras\lib\multiprocessing\pool.py in worker(inqueue, outqueue, initializer, initargs, maxtasks, wrap_exception)
    117         job, i, func, args, kwds = task
    118         try:
--> 119             result = (True, func(*args, **kwds))
    120         except Exception as e:
    121             if wrap_exception and func is not _helper_reraises_exception:

e:\installed_programs\anaconda3\envs\keras\lib\site-packages\keras\utils\data_utils.py in next_sample(uid)
    624         The next value of generator `uid`.
    625     """
--> 626     return six.next(_SHARED_SEQUENCES[uid])
    627 
    628 

F:\SSD Detectors\ssd_detectors\ssd_data.py in generate(self, debug, encode, seed)
    563                 if len(targets) == batch_size:
    564                     if encode:
--> 565                         targets = [self.prior_util.encode(y) for y in targets]
    566                         targets = np.array(targets, dtype=np.float32)
    567                     tmp_inputs = np.array(inputs, dtype=np.float32)

F:\SSD Detectors\ssd_detectors\ssd_data.py in <listcomp>(.0)
    563                 if len(targets) == batch_size:
    564                     if encode:
--> 565                         targets = [self.prior_util.encode(y) for y in targets]
    566                         targets = np.array(targets, dtype=np.float32)
    567                     tmp_inputs = np.array(inputs, dtype=np.float32)

F:\SSD Detectors\ssd_detectors\sl_utils.py in encode(self, gt_data, debug)
    139         polygons = []
    140         for word in gt_data:
--> 141             xy = np.reshape(word[:8], (-1, 2))
    142             xy = np.copy(xy) * (self.image_w, self.image_h)
    143             polygons.append(xy)

e:\installed_programs\anaconda3\envs\keras\lib\site-packages\numpy\core\fromnumeric.py in reshape(a, newshape, order)
    290            [5, 6]])
    291     """
--> 292     return _wrapfunc(a, 'reshape', newshape, order=order)
    293 
    294 

e:\installed_programs\anaconda3\envs\keras\lib\site-packages\numpy\core\fromnumeric.py in _wrapfunc(obj, method, *args, **kwds)
     54 def _wrapfunc(obj, method, *args, **kwds):
     55     try:
---> 56         return getattr(obj, method)(*args, **kwds)
     57 
     58     # An AttributeError occurs if the object does not have

ValueError: cannot reshape array of size 5 into shape (2)

The script used is:

#!/usr/bin/env python
# coding: utf-8

# In[1]:


import numpy as np
import matplotlib.pyplot as plt
import keras
import os
import time
import pickle

from sl_model import SL512, DSODSL512
from ssd_data import InputGenerator
from sl_utils import PriorUtil
from sl_training import SegLinkLoss, SegLinkFocalLoss

from utils.training import Logger, LearningRateDecay
from utils.model import load_weights, calc_memory_usage


# ### Data



# In[2]:


from data_icdar2015fst import GTUtility
gt_util_train = GTUtility('data/ICDAR_Camera_photos')
gt_util_val = GTUtility('data/ICDAR_Camera_photos', test=True)


# ### Model


# In[3]:


# SegLink + DenseNet
model = DSODSL512()
#model = DSODSL512(activation='leaky_relu')
weights_path = None
batch_size = 6
experiment = 'dsodsl512_synthtext'


# In[4]:


if weights_path is not None:
    if weights_path.find('ssd512') > -1:
        layer_list = [
            'conv1_1', 'conv1_2',
            'conv2_1', 'conv2_2',
            'conv3_1', 'conv3_2', 'conv3_3',
            'conv4_1', 'conv4_2', 'conv4_3',
            'conv5_1', 'conv5_2', 'conv5_3',
            'fc6', 'fc7',
            'conv6_1', 'conv6_2',
            'conv7_1', 'conv7_2',
            'conv8_1', 'conv8_2',
            'conv9_1', 'conv9_2',
        ]
        freeze = [
            'conv1_1', 'conv1_2',
            'conv2_1', 'conv2_2',
            'conv3_1', 'conv3_2', 'conv3_3',
            #'conv4_1', 'conv4_2', 'conv4_3',
            #'conv5_1', 'conv5_2', 'conv5_3',
        ]
        
        load_weights(model, weights_path, layer_list)
        for layer in model.layers:
            layer.trainable = not layer.name in freeze
    else:
        load_weights(model, weights_path)

prior_util = PriorUtil(model)


# ### Training

# In[5]:


epochs = 10
initial_epoch = 0

gen_train = InputGenerator(gt_util_train, prior_util, batch_size, model.image_size, augmentation=True)
gen_val = InputGenerator(gt_util_val, prior_util, batch_size, model.image_size, augmentation=True)


# In[6]:


checkdir = './checkpoints/' + time.strftime('%Y%m%d%H%M') + '_' + experiment
if not os.path.exists(checkdir):
    os.makedirs(checkdir)

# NOTE: `In` is IPython's cell history and is undefined when this runs as a
# plain .py script (see the "train only with VOC" issue above).
with open(checkdir+'/source.py','wb') as f:
    source = ''.join(['# In[%i]\n%s\n\n' % (i, In[i]) for i in range(len(In))])
    f.write(source.encode())

#optim = keras.optimizers.SGD(lr=1e-3, momentum=0.9, decay=0, nesterov=True)
optim = keras.optimizers.Adam(lr=1e-3, beta_1=0.9, beta_2=0.999, epsilon=0.001, decay=0.0)

# weight decay
regularizer = keras.regularizers.l2(5e-4) # None if disabled
#regularizer = None
for l in model.layers:
    if l.__class__.__name__.startswith('Conv'):
        l.kernel_regularizer = regularizer

#loss = SegLinkLoss(lambda_offsets=1.0, lambda_links=1.0, neg_pos_ratio=3.0)
loss = SegLinkFocalLoss(lambda_segments=100.0, lambda_offsets=1.0, lambda_links=100.0, gamma_segments=2, gamma_links=2)

model.compile(optimizer=optim, loss=loss.compute, metrics=loss.metrics)

history = model.fit_generator(
        gen_train.generate(), 
        steps_per_epoch=gen_train.num_batches, 
        epochs=epochs, 
        verbose=1, 
        callbacks=[
            keras.callbacks.ModelCheckpoint(checkdir+'/weights.{epoch:03d}.h5', verbose=1, save_weights_only=True),
            Logger(checkdir),
            #LearningRateDecay()
        ], 
        validation_data=gen_val.generate(), 
        validation_steps=gen_val.num_batches,
        class_weight=None,
        max_queue_size=1, 
        workers=1, 
        #use_multiprocessing=False, 
        initial_epoch=initial_epoch, 
        #pickle_safe=False, # will use threading instead of multiprocessing, which is lighter on memory use but slower
        )




Word cropping causing truncated reads

Hi @mvoelk !
First of all, thanks a lot for your work.

I found a problem using the code from SL_end2end_predict.ipynb. In my specific use case I want to read some long words (actually sequences of numbers). The detector has no problems and extracts the bounding box correctly (verified by plotting the contents of boxes). The issue is that this long word gets truncated by the crop_words function, so the output of the CRNN model is wrong.

Cropping a long word doesn't seem like a good way to handle this situation. How do you think I can fix it?

Thanks.

While training, the precision, recall and fmeasure values are 0 and all the losses are NaN

When I execute the following code:

gen_train = InputGenerator(gt_util_train, prior_util, batch_size, model.image_size, augmentation=False)
gen_val = InputGenerator(gt_util_val, prior_util, batch_size, model.image_size, augmentation=False)
tmp_inputs, tmp_targets = next(gen_train.generate())

I get the following RuntimeWarning

ssd_detectors/tbpp_utils.py:83: RuntimeWarning: divide by zero encountered in log
  offsets_rboxs[prior_mask,4] = np.log(gt_rboxes[:,4] / priors_wh[:,1]) / variances_wh[:,1]

I get the same warning when I try to train the TBPP512 or TBPP512_dense model. Also, while training, my precision and recall metrics are 0 and conf_loss and loc_loss are NaN. Is this due to the above warning? If not, how can I debug it? Here are the metrics for the 1st epoch:

Epoch 1/100
5/5 [==============================] - 55s 11s/step - loss: nan - conf_loss: nan - loc_loss: nan - precision: 0.0000e+00 - recall: 0.0000e+00 - accuracy: 0.8000 - fmeasure: 0.0000e+00 - num_pos: 43.2000 - num_neg: 611588.8000 - val_loss: nan - val_conf_loss: nan - val_loc_loss: nan - val_precision: 0.0000e+00 - val_recall: 0.0000e+00 - val_accuracy: 1.0000 - val_fmeasure: 0.0000e+00 - val_num_pos: 80.5000 - val_num_neg: 611551.5000

I have used a custom dataset with only 1 class and modified it to the format required by GTUtility. I've also verified the values in gt_util_train and gt_util_val, and they seem to be correct.
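One way to debug this: the warning comes from an np.log() in the target encoder, so a ground-truth box with (near) zero height would produce -inf there and turn the losses into NaN. A quick scan along these lines (a minimal sketch, assuming the GTUtility instance keeps one box array per image in gt_util_train.data, with the polygon corners in the first eight columns) can confirm or rule that out:

import numpy as np

# look for ground-truth polygons with (near) zero width or height,
# which make np.log() emit -inf and propagate NaN into the losses
for idx, data in enumerate(gt_util_train.data):
    if len(data) == 0:
        continue
    polygons = np.reshape(data[:, :8], (-1, 4, 2))
    wh = polygons.max(axis=1) - polygons.min(axis=1)
    if np.any(wh < 1e-6):
        print('degenerate box in sample', idx)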

How to train CRNN using my own dataset?

I want to train the model on my own data for a specific use case. I have my dataset in ICDAR-FST2015 format, but the InputGenerator in crnn_data that is used for training the CRNN model seems to enter an infinite loop, and the training call (model.fit_generator()) shows no progress even after hours. Is this normal behavior? Should I convert my dataset to another format (like PASCAL VOC) and try again?
I have been stuck on this for days and any suggestions/help will be highly appreciated.
Thanks.
The training steps I followed:

import os
import time

from keras.optimizers import SGD
from keras.callbacks import ModelCheckpoint

from crnn_model import CRNN
from crnn_data import InputGenerator
from crnn_utils import alphabet87 as alphabet
from utils.training import ModelSnapshot, Logger

from data_icdar2015fst import GTUtility
gt_util_train = GTUtility('path')
gt_util_val = GTUtility('path', test=True)
input_width = 600
input_height = 800
batch_size = 8
input_shape = (input_width, input_height, 1)

# model, model_pred = CRNN(input_shape, len(alphabet), gru=False)
# experiment = 'crnn_lstm_synthtext'

model, model_pred = CRNN(input_shape, len(alphabet), gru=True)
experiment = 'crnn_gru_synthtext'

max_string_len = model_pred.output_shape[1]

gen_train = InputGenerator(gt_util_train, batch_size, alphabet, input_shape[:2], 
                           grayscale=True, max_string_len=max_string_len)
gen_val = InputGenerator(gt_util_val, batch_size, alphabet, input_shape[:2], 
                         grayscale=True, max_string_len=max_string_len)
checkdir = './checkpoints/' + time.strftime('%Y%m%d%H%M') + '_' + experiment
if not os.path.exists(checkdir):
    os.makedirs(checkdir)

# snapshot of the notebook source; In[] only exists inside IPython/Jupyter
with open(checkdir+'/source.py','wb') as f:
    source = ''.join(['# In[%i]\n%s\n\n' % (i, In[i]) for i in range(len(In))])
    f.write(source.encode())

optimizer = SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True, clipnorm=5)
#optimizer = Adam(lr=0.02, epsilon=0.001, clipnorm=1.)

# dummy loss, loss is computed in lambda layer
model.compile(loss={'ctc': lambda y_true, y_pred: y_pred}, optimizer=optimizer)

#model.summary()

model.fit_generator(generator=gen_train.generate(), # batch_size here?
                    steps_per_epoch=gt_util_train.num_objects // batch_size,
                    epochs=10,
                    validation_data=gen_val.generate(), # batch_size here?
                    validation_steps=gt_util_val.num_objects // batch_size,
                    verbose=2,
                    callbacks=[
                        ModelCheckpoint(checkdir+'/weights.{epoch:03d}.h5', verbose=1, save_weights_only=True),
                        ModelSnapshot(checkdir, 10000),
                        Logger(checkdir)
                    ],
                    initial_epoch=0)
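
One way to narrow this down (a minimal sketch reusing the generator built above): pull a single batch by hand before calling fit_generator. If the call never returns, the generator itself is stuck, for instance because it rejects every sample, rather than the training loop:

import time

t0 = time.time()
batch = next(gen_train.generate())
print('got one batch in %.1f s' % (time.time() - t0))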

[Question] How is gt_util_synthtext_seglink.pkl generated?

Hi,

I am trying to experiment with the tbpp training program. However, I am a little confused about the gt_util_synthtext_seglink.pkl file. How exactly should I generate it?

I have tried running data_synthtext.py, which generates gt_util_synthtext.pkl, and then simply renamed it to gt_util_synthtext_seglink.pkl. However, with this file the tbpp training program raises exceptions. I am attaching the detailed message below, but I think the problem happens at:

File "Z:\Users\jie\projects\ssd_detectors\tbpp_utils.py", line 21, in
gt_rboxes = np.array([polygon_to_rbox3(np.reshape(p, (-1,2))) for p in gt_data[:,:8]])

p is of size 5 and of course cannot be reshaped to (-1, 2).

I suspect that the pickle file is not in the expected format. Could anyone explain how exactly the pickle file should be generated? Many thanks indeed.

Best Regards
Jie

C:\Python36\python.exe Z:/Users/jie/projects/ssd_detectors/txbb_train.py
Using TensorFlow backend.
layer missing input_2
2018-08-25 09:26:20.701900: I T:\src\github\tensorflow\tensorflow\core\platform\cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
2018-08-25 09:26:21.011315: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1404] Found device 0 with properties:
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.683
pciBusID: 0000:01:00.0
totalMemory: 11.00GiB freeMemory: 9.10GiB
2018-08-25 09:26:21.011663: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1483] Adding visible gpu devices: 0
2018-08-25 09:26:21.703413: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:964] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-08-25 09:26:21.703617: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:970] 0
2018-08-25 09:26:21.703747: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:983] 0: N
2018-08-25 09:26:21.703968: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1096] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 8795 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:01:00.0, compute capability: 6.1)
2018-08-25 09:26:21.704888: E T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:228] Illegal GPUOptions.experimental.num_dev_to_dev_copy_streams=0 set to 1 instead.
layer missing zero_padding2d_2
something went wrong conv4_3_norm_mbox_conf
model [[3, 5, 512, 28], [28]]
file [(3, 3, 512, 84), (84,)]
Layer weight shape (3, 5, 512, 28) not compatible with provided weight shape (3, 3, 512, 84)
layer missing fc7_mbox_conf
layer missing conv6_2_mbox_conf
layer missing conv7_2_mbox_conf
layer missing conv8_2_mbox_conf
layer missing conv9_2_mbox_conf
layer missing conv10_2_mbox_conf
something went wrong conv4_3_norm_mbox_loc
model [[3, 5, 512, 56], [56]]
file [(3, 3, 512, 16), (16,)]
Layer weight shape (3, 5, 512, 56) not compatible with provided weight shape (3, 3, 512, 16)
layer missing fc7_mbox_loc
layer missing conv6_2_mbox_loc
layer missing conv7_2_mbox_loc
layer missing conv8_2_mbox_loc
layer missing conv9_2_mbox_loc
layer missing conv10_2_mbox_loc
layer missing fc7_mbox_conf_flat
layer missing conv6_2_mbox_conf_flat
layer missing conv7_2_mbox_conf_flat
layer missing conv8_2_mbox_conf_flat
layer missing conv9_2_mbox_conf_flat
layer missing conv10_2_mbox_conf_flat
layer missing fc7_mbox_loc_flat
layer missing conv6_2_mbox_loc_flat
layer missing conv7_2_mbox_loc_flat
layer missing conv8_2_mbox_loc_flat
layer missing conv9_2_mbox_loc_flat
layer missing conv10_2_mbox_loc_flat
layer missing conv4_3_norm_mbox_priorbox
layer missing fc7_mbox_priorbox
layer missing conv6_2_mbox_priorbox
layer missing conv7_2_mbox_priorbox
layer missing conv8_2_mbox_priorbox
layer missing conv9_2_mbox_priorbox
layer missing conv10_2_mbox_priorbox
layer missing mbox_priorbox
Epoch 1/100
Traceback (most recent call last):
File "Z:/Users/jie/projects/ssd_detectors/txbb_train.py", line 89, in
initial_epoch=initial_epoch,
File "C:\Python36\lib\site-packages\keras\legacy\interfaces.py", line 91, in wrapper
return func(*args, **kwargs)
File "C:\Python36\lib\site-packages\keras\engine\training.py", line 1426, in fit_generator
initial_epoch=initial_epoch)
File "C:\Python36\lib\site-packages\keras\engine\training_generator.py", line 155, in fit_generator
generator_output = next(output_generator)
File "C:\Python36\lib\site-packages\keras\utils\data_utils.py", line 793, in get
six.reraise(value.__class__, value, value.__traceback__)
File "C:\Python36\lib\site-packages\six.py", line 693, in reraise
raise value
File "C:\Python36\lib\site-packages\keras\utils\data_utils.py", line 658, in _data_generator_task
generator_output = next(self._generator)
File "Z:\Users\jie\projects\ssd_detectors\ssd_data.py", line 530, in generate
y = self.prior_util.encode(y)
File "Z:\Users\jie\projects\ssd_detectors\tbpp_utils.py", line 21, in encode
gt_rboxes = np.array([polygon_to_rbox3(np.reshape(p, (-1,2))) for p in gt_data[:,:8]])
File "Z:\Users\jie\projects\ssd_detectors\tbpp_utils.py", line 21, in
gt_rboxes = np.array([polygon_to_rbox3(np.reshape(p, (-1,2))) for p in gt_data[:,:8]])
File "C:\Python36\lib\site-packages\numpy\core\fromnumeric.py", line 279, in reshape
return _wrapfunc(a, 'reshape', newshape, order=order)
File "C:\Python36\lib\site-packages\numpy\core\fromnumeric.py", line 51, in _wrapfunc
return getattr(obj, method)(*args, **kwds)
ValueError: cannot reshape array of size 5 into shape (2)

Process finished with exit code 1
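
For reference, the traceback suggests that each ground-truth row in the renamed pickle holds five values (an axis-aligned box plus a label), while tbpp_utils expects eight polygon coordinates in front of the label, which is why np.reshape(p, (-1,2)) fails. A hedged conversion sketch (the xmin, ymin, xmax, ymax, label column layout is an assumption inferred from the error) would be:

import numpy as np

def boxes_to_polygons(gt_data):
    # expand axis-aligned boxes (xmin, ymin, xmax, ymax, label) into
    # 4-corner polygons (x1, y1, ..., x4, y4, label), corners ordered
    # clockwise from the top-left
    xmin, ymin, xmax, ymax = (gt_data[:, i] for i in range(4))
    polygons = np.stack([xmin, ymin, xmax, ymin, xmax, ymax, xmin, ymax], axis=1)
    return np.concatenate([polygons, gt_data[:, 4:]], axis=1)

If data_synthtext.py can export polygons directly, regenerating the pickle that way is probably the intended route to gt_util_synthtext_seglink.pkl.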

Having problems with OOM

Hello, Mr.Volk
Thank you very much for your nice codes!
I have one question for you

I'm new to deep learning, have only a basic understanding of Keras code, and am currently trying to run your DSOD_train.py.
The problem is that I keep getting OOM errors while executing the "Train" section of the code (error message below).

I tried using only one of my two GPUs, as well as the 'allow_growth' option in TensorFlow, and neither worked.
I believe I need to reduce the minibatch size (I guess your code uses batch size 128, am I right?), but I have no idea where to find the code to make this change. Just changing batch_size = 26 to some lower number didn't solve the problem, and searching your .py files left me with no clue.
I'd really appreciate your help with this problem.

By the way, I'm using Ubuntu 16.04 and the latest tensorflow-keras.

------------------------------------------------error message

ResourceExhaustedError Traceback (most recent call last)
in
49 workers=1,
50 #use_multiprocessing=False,
---> 51 initial_epoch=initial_epoch)

/usr/local/lib/python3.6/dist-packages/keras/legacy/interfaces.py in wrapper(*args, **kwargs)
89 warnings.warn('Update your ' + object_name + ' call to the ' +
90 'Keras 2 API: ' + signature, stacklevel=2)
---> 91 return func(*args, **kwargs)
92 wrapper._original_function = func
93 return wrapper

/usr/local/lib/python3.6/dist-packages/keras/engine/training.py in fit_generator(self, generator, steps_per_epoch, epochs, verbose, callbacks, validation_data, validation_steps, class_weight, max_queue_size, workers, use_multiprocessing, shuffle, initial_epoch)
1416 use_multiprocessing=use_multiprocessing,
1417 shuffle=shuffle,
-> 1418 initial_epoch=initial_epoch)
1419
1420 @interfaces.legacy_generator_methods_support

/usr/local/lib/python3.6/dist-packages/keras/engine/training_generator.py in fit_generator(model, generator, steps_per_epoch, epochs, verbose, callbacks, validation_data, validation_steps, class_weight, max_queue_size, workers, use_multiprocessing, shuffle, initial_epoch)
215 outs = model.train_on_batch(x, y,
216 sample_weight=sample_weight,
--> 217 class_weight=class_weight)
218
219 outs = to_list(outs)

/usr/local/lib/python3.6/dist-packages/keras/engine/training.py in train_on_batch(self, x, y, sample_weight, class_weight)
1215 ins = x + y + sample_weights
1216 self._make_train_function()
-> 1217 outputs = self.train_function(ins)
1218 return unpack_singleton(outputs)
1219

/usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py in __call__(self, inputs)
2713 return self._legacy_call(inputs)
2714
-> 2715 return self._call(inputs)
2716 else:
2717 if py_any(is_tensor(x) for x in inputs):

/usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py in _call(self, inputs)
2673 fetched = self._callable_fn(*array_vals, run_metadata=self.run_metadata)
2674 else:
-> 2675 fetched = self._callable_fn(*array_vals)
2676 return fetched[:len(self.outputs)]
2677

/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py in __call__(self, *args, **kwargs)
1456 ret = tf_session.TF_SessionRunCallable(self._session._session,
1457 self._handle, args,
-> 1458 run_metadata_ptr)
1459 if run_metadata:
1460 proto_data = tf_session.TF_GetBuffer(run_metadata_ptr)

ResourceExhaustedError: 2 root error(s) found.
(0) Resource exhausted: OOM when allocating tensor with shape[6,1376,32,32] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[{{node batch_normalization_302/FusedBatchNorm}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

 [[loss_5/mul/_21899]]

Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

(1) Resource exhausted: OOM when allocating tensor with shape[6,1376,32,32] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[{{node batch_normalization_302/FusedBatchNorm}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

0 successful operations.
0 derived errors ignored.
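
One observation that may help: the leading dimension of the OOM tensor shape [6,1376,32,32] appears to be the batch dimension, so a batch size of 6 was already in effect here and the model is simply too large for the available memory. The batch size enters training only through the batch_size variable handed to the input generators (a sketch following the training script earlier on this page):

batch_size = 2  # reduce further until the model fits into GPU memory

gen_train = InputGenerator(gt_util_train, prior_util, batch_size, model.image_size, augmentation=True)
gen_val = InputGenerator(gt_util_val, prior_util, batch_size, model.image_size, augmentation=True)

If even batch size 1 runs out of memory, a smaller input resolution or a smaller model variant is the remaining lever.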

Encode-Decode problem

It's such a great implementation. But when I tried to visualize the ground-truth images after training SegLink for 1 epoch, I saw that some ground-truth bounding boxes didn't fit the text, although I had tested them all before training.

Visualization method: starting from the normalized coordinates of the bounding boxes, I first encoded and then decoded them (using sl_utils). After that, I drew the images.

The problem doesn't appear in all images, only in some, so I wonder whether there is a bug in either the encode or the decode part. Could you spend some time having a look at it?

If you need to know anything in detail, just let me know.
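
The round trip can also be checked directly, without any training, on a single sample (a minimal sketch; gt stands for the normalized ground-truth array of one image, and it assumes decode accepts an encoded target just like a model output):

# encode the ground truth, decode it again and compare both
egt = prior_util.encode(gt)
dgt = prior_util.decode(egt)
print(gt[:2])
print(dgt[:2])

Samples where gt and dgt disagree would pinpoint the encode/decode step rather than the training itself.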

Using SL_train.py to train with my own dataset (VOC format)

Hi,
I am using SL_train.ipynb to train on my own VOC-format dataset on Windows 10. I used LabelImg to label the ground-truth annotations and data_voc.py to generate the pickle file.
I've only used 5 images (3 for training, 1 for validation, 1 for testing) and set the batch size to 1. But the training process keeps raising the following InvalidArgumentError after passing through the first image.
Can you help? Thanks.

1/4 [======>.......................] - ETA: 1:29 - loss: 20.9251 - seg_conf_loss: 3.7705 - seg_loc_loss: 10.3475 - link_conf_loss: 6.8071 - num_pos_seg: 28.0000 - num_neg_seg: 84.0000 - pos_seg_conf_loss: 3.3756 - neg_seg_conf_loss: 3.9021 - pos_link_conf_loss: 2.0223 - neg_link_conf_loss: 8.4020 - seg_precision: 0.0000e+00 - seg_recall: 0.0000e+00 - seg_accuracy: 0.0000e+00 - seg_fmeasure: 0.0000e+00 - link_precision: 0.0000e+00 - link_recall: 0.0000e+00 - link_accuracy: 0.0000e+00 - link_fmeasure: 0.0000e+00
---------------------------------------------------------------------------
InvalidArgumentError                      Traceback (most recent call last)
<ipython-input-14-29316809ab04> in <module>()
     46         workers=1,
     47         #use_multiprocessing=False,
---> 48         initial_epoch=initial_epoch,
     49         #pickle_safe=False, # will use threading instead of multiprocessing, which is lighter on memory use but slower
     50         )

d:\Anaconda3\envs\text_detection\lib\site-packages\keras\legacy\interfaces.py in wrapper(*args, **kwargs)
     89                 warnings.warn('Update your `' + object_name +
     90                               '` call to the Keras 2 API: ' + signature, stacklevel=2)
---> 91             return func(*args, **kwargs)
     92         wrapper._original_function = func
     93         return wrapper

d:\Anaconda3\envs\text_detection\lib\site-packages\keras\engine\training.py in fit_generator(self, generator, steps_per_epoch, epochs, verbose, callbacks, validation_data, validation_steps, class_weight, max_queue_size, workers, use_multiprocessing, shuffle, initial_epoch)
   1413             use_multiprocessing=use_multiprocessing,
   1414             shuffle=shuffle,
-> 1415             initial_epoch=initial_epoch)
   1416 
   1417     @interfaces.legacy_generator_methods_support

d:\Anaconda3\envs\text_detection\lib\site-packages\keras\engine\training_generator.py in fit_generator(model, generator, steps_per_epoch, epochs, verbose, callbacks, validation_data, validation_steps, class_weight, max_queue_size, workers, use_multiprocessing, shuffle, initial_epoch)
    211                 outs = model.train_on_batch(x, y,
    212                                             sample_weight=sample_weight,
--> 213                                             class_weight=class_weight)
    214 
    215                 outs = to_list(outs)

d:\Anaconda3\envs\text_detection\lib\site-packages\keras\engine\training.py in train_on_batch(self, x, y, sample_weight, class_weight)
   1213             ins = x + y + sample_weights
   1214         self._make_train_function()
-> 1215         outputs = self.train_function(ins)
   1216         return unpack_singleton(outputs)
   1217 

d:\Anaconda3\envs\text_detection\lib\site-packages\keras\backend\tensorflow_backend.py in __call__(self, inputs)
   2664                 return self._legacy_call(inputs)
   2665 
-> 2666             return self._call(inputs)
   2667         else:
   2668             if py_any(is_tensor(x) for x in inputs):

d:\Anaconda3\envs\text_detection\lib\site-packages\keras\backend\tensorflow_backend.py in _call(self, inputs)
   2634                                 symbol_vals,
   2635                                 session)
-> 2636         fetched = self._callable_fn(*array_vals)
   2637         return fetched[:len(self.outputs)]
   2638 

d:\Anaconda3\envs\text_detection\lib\site-packages\tensorflow\python\client\session.py in __call__(self, *args)
   1452         else:
   1453           return tf_session.TF_DeprecatedSessionRunCallable(
-> 1454               self._session._session, self._handle, args, status, None)
   1455 
   1456     def __del__(self):

d:\Anaconda3\envs\text_detection\lib\site-packages\tensorflow\python\framework\errors_impl.py in __exit__(self, type_arg, value_arg, traceback_arg)
    517             None, None,
    518             compat.as_text(c_api.TF_Message(self.status.status)),
--> 519             c_api.TF_GetCode(self.status.status))
    520     # Delete the underlying status object from memory otherwise it stays alive
    521     # as there is a reference to status from this from the traceback due to

InvalidArgumentError: Reshape cannot infer the missing input size for an empty tensor unless all specified input sizes are non-zero
	 [[Node: training_1/Adam/gradients/loss_1/predictions_loss/TopKV2_grad/Reshape = Reshape[T=DT_INT32, Tshape=DT_INT32, _class=["loc:@train...rseToDense"], _device="/job:localhost/replica:0/task:0/device:CPU:0"](loss_1/predictions_loss/TopKV2:1, training_1/Adam/gradients/loss_1/predictions_loss/TopKV2_grad/stack)]]

Understanding the flow

Hi mvoelk,
I recently came across your code and am unable to figure out where to start.

I cannot find a main function.

Can you please help?

About environment setup (TF version?)

Environment set 1 (TF2, following environment.ipynb):
OS debian stretch/sid
Python 3.7.4
NumPy 1.17.2
Pandas 1.0.4
Matplotlib 3.2.1
OpenCV 3.4.3
TensorFlow 2.0.0-beta1
Keras 2.2.4-tf
tqdm 4.46.1
imageio 2.6.1

Environment set 2:
OS debian stretch/sid
Python 3.7.5
NumPy 1.18.0
Pandas 0.25.3
Matplotlib 3.2.1
OpenCV 3.4.3
TensorFlow 1.15.0
Keras 2.2.4-tf
tqdm 4.41.1
imageio 2.8.0

When I use set 1, it fails in PriorUtil:

Traceback (most recent call last):
File "/mnt/downloads/github_src/ssd_detectors/SSD_predict.py", line 40, in
prior_util = PriorUtil(model)
File "/mnt/downloads/github_src/ssd_detectors/ssd_utils.py", line 353, in init
self.update_priors()
File "/mnt/downloads/github_src/ssd_detectors/ssd_utils.py", line 375, in update_priors
m.compute_priors()
File "/mnt/downloads/github_src/ssd_detectors/ssd_utils.py", line 193, in compute_priors
linx = np.array([(0.5 + i) for i in range(map_w)]) * step_x
TypeError: 'NoneType' object cannot be interpreted as an integer

Traceback (most recent call last):
File "/mnt/downloads/github_src/ssd_detectors/SL_end2end_predict.py", line 41, in
prior_util = PriorUtil(model)
File "/mnt/downloads/github_src/ssd_detectors/sl_utils.py", line 45, in init
if i > 0 and np.all(np.array(previous_map_size) != np.array(map_size)*2):
TypeError: unsupported operand type(s) for *: 'NoneType' and 'int'

When I use set 2, SSD_predict and SL_predict work well, but they also print:


layer missing zero_padding2d_5
file []


What is wrong?

When I use set 2 to run SL_end2end_predict, it prints:

layer missing reshape_1
file []
something went wrong bidirectional_1
model [[512, 1024], [256, 1024], [1024], [512, 1024], [256, 1024], [1024]]
file [(512, 768), (256, 768), (768,), (512, 768), (256, 768), (768,)]
Layer weight shape (512, 1024) not compatible with provided weight shape (512, 768)
layer missing bidirectional_2
file [(512, 768), (256, 768), (768,), (512, 768), (256, 768), (768,)]
layer missing label_input
file []
layer missing input_length
file []
layer missing label_length
file []
layer missing ctc
file []


Traceback (most recent call last):
File "/mnt/downloads/github_src/ssd_detectors/SL_end2end_predict.py", line 152, in
res_crnn = crnn_model.predict(words)
File "/home/hyj/anaconda3/envs/py37tf15/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/training.py", line 908, in predict
use_multiprocessing=use_multiprocessing)
File "/home/hyj/anaconda3/envs/py37tf15/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/training_arrays.py", line 723, in predict
callbacks=callbacks)
File "/home/hyj/anaconda3/envs/py37tf15/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/training_arrays.py", line 394, in model_iteration
batch_outs = f(ins_batch)
File "/home/hyj/anaconda3/envs/py37tf15/lib/python3.7/site-packages/tensorflow_core/python/keras/backend.py", line 3476, in call
run_metadata=self.run_metadata)
File "/home/hyj/anaconda3/envs/py37tf15/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1472, in call
run_metadata_ptr)
tensorflow.python.framework.errors_impl.InternalError: 2 root error(s) found.
(0) Internal: Blas GEMM launch failed : a.shape=(3, 512), b.shape=(512, 256), m=3, n=256, k=512
[[{{node bidirectional/forward_lstm_1/while/MatMul}}]]
[[softmax/truediv/_209]]
(1) Internal: Blas GEMM launch failed : a.shape=(3, 512), b.shape=(512, 256), m=3, n=256, k=512
[[{{node bidirectional/forward_lstm_1/while/MatMul}}]]
0 successful operations.
0 derived errors ignored.


Thank you!

How to use

Hello, sir, thanks for this excellent work. I am a newcomer to text detection and recognition. I followed your instructions from the "Usage" part, but I cannot run any of the scripts successfully; it seems a lot of datasets need to be downloaded and prepared first.
Can you provide some more specific explanations of how to use this code? A simple example that lets a user reproduce the process would also be very helpful. Thanks for your efforts.

[Question] how to set Focal loss Parameters

Hi @mvoelk

I am trying to use focal loss in the SSD model, following these lines from DSOD_train.ipynb:

  • loss = SSDFocalLoss(lambda_conf=10000, class_weights=class_weights)
  • class_weights = np.array([0.00007205, 1.3919328 , 1.43665262, 1.30902077, 1.36668928, 1.2391509 , 1.21337629, 0.41527107, 1.1458096 , 0.29150119, 1.25713104, 0.61941517, 1.03175604, 1.21542005, 1.01947561, 0.0542007 , 1.12664538, 1.14966073, 1.12464889, 1.49998021, 1.09218961])

I want to understand how the values of "lambda_conf" and "class_weights" were calculated. Could you explain when you get a chance? Many thanks indeed.
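
For what it's worth, weights of this shape look like inverse class frequencies over the training set, normalized so that the foreground classes end up around 1 while the dominant background class (index 0) gets a tiny weight. A generic sketch of such a computation (all_gt_class_ids and num_classes are hypothetical names for a flat array of all ground-truth class ids and the class count):

import numpy as np

# inverse-frequency class weights, normalized to mean 1 over the
# foreground classes; counts[0] is the (dominant) background class
counts = np.bincount(all_gt_class_ids, minlength=num_classes).astype(float)
weights = 1.0 / np.maximum(counts, 1.0)
weights = weights / weights[1:].mean()
print(weights)

lambda_conf, by contrast, presumably just scales the confidence term of the loss relative to the localization term, i.e. it is a tuning knob rather than a value computed from the data.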

Best Regards
Vaibhav

CRNN output format

Hi, thank you once again. As mentioned in #49, I am trying to get back the predicted words (lists of characters) plus the coordinates of their bounding boxes, using the code from SL_end2end_predict.ipynb on my custom images.

Could you please explain how to retrieve this from the prediction returned by the rec_model?

I initialise my CRNN with

rec_model = CRNN((input_width, input_height, 1), len(alphabet), gru=True, prediction_only=True)

Thank you.
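
For context, a prediction-only CRNN returns a per-time-step softmax over the alphabet, so recovering the text is a standard greedy CTC decode (a minimal sketch; treating the last alphabet index as the CTC blank is an assumption):

import numpy as np

res = rec_model.predict(words)  # shape: (num_words, time_steps, len(alphabet))

for probs in res:
    best = np.argmax(probs, axis=-1)  # greedy best path over time steps
    # collapse repeated labels, then drop the CTC blank
    collapsed = [k for i, k in enumerate(best) if i == 0 or k != best[i - 1]]
    print(''.join(alphabet[k] for k in collapsed if k != len(alphabet) - 1))

The bounding-box coordinates do not come from the CRNN at all; they are the detector's boxes that were used to crop the word images in the first place.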
