
keras-yolo3's Introduction

YOLO3 (Detection, Training, and Evaluation)

Dataset and Model

  • Kangaroo Detection (1 class, https://github.com/experiencor/kangaroo): mAP 95%, demo https://youtu.be/URO3UDHvoLY, config: see the zoo, model https://bit.ly/39rLNoE
  • License Plate Detection (European, in Romania, 1 class, https://github.com/RobertLucian/license-plate-dataset): mAP 90%, demo https://youtu.be/HrqzIXFVCRo, config: see the zoo, model https://bit.ly/2tIpvPl
  • Raccoon Detection (1 class, https://github.com/experiencor/raccoon_dataset): mAP 98%, demo https://youtu.be/lxLyLIL7OsU, config: see the zoo, model https://bit.ly/39rLNoE
  • Red Blood Cell Detection (3 classes, https://github.com/experiencor/BCCD_Dataset): mAP 84%, demo https://imgur.com/a/uJl2lRI, config: see the zoo, model https://bit.ly/39rLNoE
  • VOC (20 classes, http://host.robots.ox.ac.uk/pascal/VOC/voc2012/): mAP 72%, demo https://youtu.be/0RmOI6hcfBI, config: see the zoo, model https://bit.ly/39rLNoE

Todo list:

  • Yolo3 detection
  • Yolo3 training (warmup and multi-scale)
  • mAP Evaluation
  • Multi-GPU training
  • Evaluation on VOC
  • Evaluation on COCO
  • MobileNet, DenseNet, ResNet, and VGG backends

Installing

To install the dependencies, run

pip install -r requirements.txt

For GPU support, make sure you have the CUDA drivers installed beforehand.

It has been tested to work with Python 2.7.13 and 3.5.3.

Detection

Grab the pretrained YOLOv3 weights from https://pjreddie.com/media/files/yolov3.weights, then run

python yolo3_one_file_to_detect_them_all.py -w yolov3.weights -i dog.jpg

Training

1. Data preparation

Download the Raccoon dataset from https://github.com/experiencor/raccoon_dataset.

Organize the dataset into 4 folders:

  • train_image_folder <= the folder that contains the train images.

  • train_annot_folder <= the folder that contains the train annotations in VOC format.

  • valid_image_folder <= the folder that contains the validation images.

  • valid_annot_folder <= the folder that contains the validation annotations in VOC format.

There is a one-to-one correspondence by file name between images and annotations. If the validation set is empty, the training set will be automatically split into training and validation sets with a ratio of 0.8.
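A minimal sketch of what such an 80/20 split amounts to (an illustration, not the repo's actual code):

import random

# Stand-in for the parsed image/annotation pairs; in the repo these come
# from the VOC-format annotation files.
instances = ['img_%03d.jpg' % i for i in range(100)]

random.shuffle(instances)
split_index = int(0.8 * len(instances))
train_set, valid_set = instances[:split_index], instances[split_index:]
print(len(train_set), len(valid_set))  # 80 20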

Also, if you've got the dataset split into 2 folders, one for images and the other for annotations, and you need to set a custom size for the validation set, use the create_validation_set.sh script to do that. The script expects the following parameters in the following order:

./create_validation_set.sh $param1 $param2 $param3 $param4
# 1st param - folder where the images are found
# 2nd param - folder where the annotations are found
# 3rd param - number of random choices (aka the size of the validation set in absolute value)
# 4th param - folder where the validation images/annots end up (this directory must already contain images/ and annots/ subfolders)
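For example, to carve a 100-image validation set out of a raccoon dataset (paths here are hypothetical):

./create_validation_set.sh /data/raccoon/images /data/raccoon/anns 100 /data/raccoon/valid
# /data/raccoon/valid must already contain empty images/ and annots/ subfolders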

2. Edit the configuration file

The configuration file is a JSON file that looks like this:

{
    "model" : {
        "min_input_size":       352,
        "max_input_size":       448,
        "anchors":              [10,13,  16,30,  33,23,  30,61,  62,45,  59,119,  116,90,  156,198,  373,326],
        "labels":               ["raccoon"]
    },

    "train": {
        "train_image_folder":   "/home/andy/data/raccoon_dataset/images/",
        "train_annot_folder":   "/home/andy/data/raccoon_dataset/anns/",      
          
        "train_times":          10,             # the number of time to cycle through the training set, useful for small datasets
        "pretrained_weights":   "",             # specify the path of the pretrained weights, but it's fine to start from scratch
        "batch_size":           16,             # the number of images to read in each batch
        "learning_rate":        1e-4,           # the base learning rate of the default Adam rate scheduler
        "nb_epoch":             50,             # number of epoches
        "warmup_epochs":        3,              # the number of initial epochs during which the sizes of the 5 boxes in each cell is forced to match the sizes of the 5 anchors, this trick seems to improve precision emperically
        "ignore_thresh":        0.5,
        "gpus":                 "0,1",

        "saved_weights_name":   "raccoon.h5",
        "debug":                true            # turn on/off the line that prints current confidence, position, size, class losses and recall
    },

    "valid": {
        "valid_image_folder":   "",
        "valid_annot_folder":   "",

        "valid_times":          1
    }
}

The labels setting lists the labels to be trained on. Only images containing the listed labels are fed to the network; the rest are simply ignored. This way, a dog detector can easily be trained using the VOC or COCO dataset by setting labels to ['dog'].
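For instance, a dog detector's model section would keep everything else the same and change only the labels line (a minimal illustration):

"model" : {
    "min_input_size":       352,
    "max_input_size":       448,
    "anchors":              [10,13,  16,30,  33,23,  30,61,  62,45,  59,119,  116,90,  156,198,  373,326],
    "labels":               ["dog"]
}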

Download pretrained weights for backend at:

https://bit.ly/39rLNoE

These weights must be put in the root folder of the repository. They are the pretrained weights for the backend only and will be loaded during model creation. The code does not work without them.

3. Generate anchors for your dataset (optional)

python gen_anchors.py -c config.json

Copy the generated anchors printed on the terminal to the anchors setting in config.json.
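For example, if the script prints anchors like the following (hypothetical values, not real output), the anchors line in config.json becomes:

"anchors": [17,18, 28,24, 36,34, 52,46, 60,97, 104,78, 120,140, 189,204, 365,320]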

4. Start the training process

python train.py -c config.json

By the end of this process, the code will write the weights of the best model to the file best_weights.h5 (or whatever name is specified in the "saved_weights_name" setting in config.json). The training process stops when the loss on the validation set has not improved for 3 consecutive epochs.
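This early-stopping behaviour matches the standard Keras callbacks; a minimal sketch of an equivalent setup (the exact arguments used by train.py may differ):

from keras.callbacks import EarlyStopping, ModelCheckpoint

callbacks = [
    # stop once val_loss has not improved for 3 consecutive epochs
    EarlyStopping(monitor='val_loss', patience=3, verbose=1),
    # keep only the weights of the best model seen so far
    ModelCheckpoint('best_weights.h5', monitor='val_loss', save_best_only=True),
]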

5. Perform detection using trained weights on image, set of images, video, or webcam

python predict.py -c config.json -i /path/to/image/or/video

It carries out detection on the image and writes the image with the detected bounding boxes to the same folder.

If you wish to change the object threshold or IOU threshold, you can do so by altering the obj_thresh and nms_thresh variables. By default, they are set to 0.5 and 0.45 respectively.
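The relevant lines in predict.py presumably look something like this (the shape of the get_yolo_boxes call is an assumption based on the function name mentioned in the issues below, not a quote from the file):

obj_thresh, nms_thresh = 0.5, 0.45  # lower obj_thresh to keep more detections; nms_thresh controls how overlapping boxes are merged
boxes = get_yolo_boxes(infer_model, [image], net_h, net_w, anchors, obj_thresh, nms_thresh)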

Evaluation

python evaluate.py -c config.json

This computes the mAP of the model defined in saved_weights_name on the validation set defined by valid_image_folder and valid_annot_folder.

keras-yolo3's People

Contributors

clement10601, experiencor, robertlucian


keras-yolo3's Issues

something wrong with the coordinates of bbox

I converted yolov3.weights to yolov3.h5 and then tested the image (dog.jpg). The results look like this:

[image: dog_detected]

The darknet test looks like this:

[image: predictions]

The bbox coordinates are somewhat wrong; the 'y' coordinate seems larger than in the darknet test.

I compared the code with the darknet code and it all looks correct. Has anyone met the same problem? How can I solve it?

Can it use the model trained from darknet?

Hi,
Thank you for sharing.
I am wondering if I can use a model trained with darknet?
And is the available pre-trained model trained on the raccoon dataset?

Thank you

how to make my dataset

Can I use labelImg to make my own dataset? Is the xml document made by labelImg the same as the xml you offered?

Confused about some code

There is some code that I can't figure out by myself; please help!

  1. In generator.py, should true_box_index be reset to zero for every train_instance?

    true_box_index = 0

  2. In yolo.py, pred_box_conf - 0 and true_box_wh + tf.zeros_like(true_box_wh) * (1-object_mask) make no difference. I'm really confused about warmup training; what is it trying to do?
    Please give more explanations.

    conf_delta = pred_box_conf - 0

true_box_wh + tf.zeros_like(true_box_wh) * (1-object_mask),

  3. In YoloLayer there is a parameter named max_grid; when creating the model you pass max_input_size to it, which is 480. But inside create_yolov3_model this value is multiplied by 2 or 4 in the different yolo layers. Is that right?
    [2*num for num in max_grid],

    [4*num for num in max_grid],

    Thanks a lot !!!

Questions about the predict file on video

Thank you for posting your code. Recently I trained my own dataset; the results were great when predicting on images. But when I tested a video recorded with my cellphone, I got nothing! The video format I tested was MP4. Is the video resolution too high? Are there other requirements for the video besides the format?

Issues with training.

Hey man,
This is the error I am getting right now.
Traceback (most recent call last):
  File "gen_anchors.py", line 132, in <module>
    _main_(args)
  File "gen_anchors.py", line 111, in _main_
    centroids = run_kmeans(annotation_dims, num_anchors)
  File "gen_anchors.py", line 57, in run_kmeans
    indices = [random.randrange(ann_dims.shape[0]) for i in range(anchor_num)]
  File "gen_anchors.py", line 57, in <listcomp>
    indices = [random.randrange(ann_dims.shape[0]) for i in range(anchor_num)]
  File "/home/abraham/anaconda3/envs/yolo/lib/python3.5/random.py", line 195, in randrange
    raise ValueError("empty range for randrange()")
ValueError: empty range for randrange()
It's due to the pkl file for the dataset. How do I create the pkl file for it?
What is the content of the pkl file?

Btw, it's a nice repo ;) you made yolo v3 on such short notice

Unable to train

Traceback (most recent call last):
  File "train.py", line 284, in <module>
    _main_(args)
  File "train.py", line 231, in _main_
    scales = config['train']['scales'],
  File "train.py", line 144, in create_model
    template_model.load_weights(saved_weights_name)
  File "/usr/local/lib/python2.7/dist-packages/keras/engine/topology.py", line 2647, in load_weights
    with h5py.File(filepath, mode='r') as f:
  File "/usr/local/lib/python2.7/dist-packages/h5py/_hl/files.py", line 269, in __init__
    fid = make_fid(name, mode, userblock_size, fapl, swmr=swmr)
  File "/usr/local/lib/python2.7/dist-packages/h5py/_hl/files.py", line 99, in make_fid
    fid = h5f.open(name, flags, fapl=fapl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5f.pyx", line 78, in h5py.h5f.open
IOError: Unable to open file (file signature not found)

I got this error when I tried to train the raccoon model.
I put both the raccoon.h5 and backend.h5 files in the folder.
Thank you!

problem about config path in windows

Can I use your code to train on Windows?
I tried once, but I found the config path doesn't suit the code, like the path below:

[image]

Can you tell me how to change the path? Thanks.

As a package?

Could you make this a package so I can install it with pip? It would be useful on Kaggle (which doesn't allow loose files but does allow pip installs from GitHub).

Keras/tensorflow version requirements?

I'm currently using Tensorflow 1.3 and Keras 2.1.5, and I'm experiencing issues related to LeakyRelu. Could you please confirm which versions are required for this code to run?

Thanks in advance.

OOM when training raccoon dataset

Using train.py with only the dataset path changed, I get OOM after it prints:

Epoch 1/103
('resizing: ', 320, 320)

I am using tf-gpu 1.4.1, keras-gpu 2.1.5, python 2.7.13 on a single GTX 1080; yolo3_one_file_to_detect_them_all works fine.

How did you get the backend.h5?

1. Was your backend.h5 pre-trained on ImageNet?
2. Can you give me the details of the training process for backend.h5?
Thank you so much!

config files in the zoo

Thank you for creating this repo! I have a few questions on training and config files:

config_kangaroo.json and config_raccoon.json have different parameters, and config_raccoon.json seems to match the code better. Which one is the set of parameters to use? Does "scales" mean training on different scales of input images?
Which parameters would reproduce the result in the README?

For example, only "scales" is used in train.py:

train_model, infer_model = create_model(
        nb_class            = len(labels),
        anchors             = config['model']['anchors'], 
        max_box_per_image   = max_box_per_image, 
        max_grid            = [config['model']['max_input_size'], config['model']['max_input_size']], 
        batch_size          = config['train']['batch_size'], 
        warmup_batches      = warmup_batches,
        ignore_thresh       = config['train']['ignore_thresh'],
        multi_gpu           = multi_gpu,
        saved_weights_name  = config['train']['saved_weights_name'],
        lr                  = config['train']['learning_rate'],
        scales              = config['train']['scales'],
    )

config_raccoon.json

"scales": [1,5,10],

config_kangaroo.json

"grid_scales":          [1,1,1],
 "obj_scale":            5,
 "noobj_scale":          1,
"xywh_scale":           1,
 "class_scale":          1,

After running training, I got:

Epoch 00035: loss did not improve from 10.03454
Epoch 00035: reducing learning rate to 1.00000001169e-08.
- 36s - loss: 10.3701 - yolo_layer_1_loss: 1.2003 - yolo_layer_2_loss: 3.5482 - yolo_layer_3_loss: 5.6216
Epoch 00035: early stopping
Premature end of JPEG file
kangaroo: 0.7736
mAP: 0.7736

Getting Unknown Layer : YoloLayer on predict.py

When I try to do prediction I am getting this error:

Traceback (most recent call last):
  File "predict.py", line 151, in <module>
    _main_(args)
  File "predict.py", line 34, in _main_
    infer_model = load_model(config['train']['saved_weights_name'])
  File "C:\Users\ankit\AppData\Roaming\Python\Python36\site-packages\keras\models.py", line 243, in load_model
    model = model_from_config(model_config, custom_objects=custom_objects)
  File "C:\Users\ankit\AppData\Roaming\Python\Python36\site-packages\keras\models.py", line 317, in model_from_config
    return layer_module.deserialize(config, custom_objects=custom_objects)
  File "C:\Users\ankit\AppData\Roaming\Python\Python36\site-packages\keras\layers\__init__.py", line 55, in deserialize
    printable_module_name='layer')
  File "C:\Users\ankit\AppData\Roaming\Python\Python36\site-packages\keras\utils\generic_utils.py", line 144, in deserialize_keras_object
    list(custom_objects.items())))
  File "C:\Users\ankit\AppData\Roaming\Python\Python36\site-packages\keras\engine\topology.py", line 2514, in from_config
    process_layer(layer_data)
  File "C:\Users\ankit\AppData\Roaming\Python\Python36\site-packages\keras\engine\topology.py", line 2500, in process_layer
    custom_objects=custom_objects)
  File "C:\Users\ankit\AppData\Roaming\Python\Python36\site-packages\keras\layers\__init__.py", line 55, in deserialize
    printable_module_name='layer')
  File "C:\Users\ankit\AppData\Roaming\Python\Python36\site-packages\keras\utils\generic_utils.py", line 144, in deserialize_keras_object
    list(custom_objects.items())))
  File "C:\Users\ankit\AppData\Roaming\Python\Python36\site-packages\keras\engine\topology.py", line 2514, in from_config
    process_layer(layer_data)
  File "C:\Users\ankit\AppData\Roaming\Python\Python36\site-packages\keras\engine\topology.py", line 2500, in process_layer
    custom_objects=custom_objects)
  File "C:\Users\ankit\AppData\Roaming\Python\Python36\site-packages\keras\layers\__init__.py", line 55, in deserialize
    printable_module_name='layer')
  File "C:\Users\ankit\AppData\Roaming\Python\Python36\site-packages\keras\utils\generic_utils.py", line 138, in deserialize_keras_object
    ': ' + class_name)
ValueError: Unknown layer: YoloLayer

I tried adding YoloLayer to custom_objects as well, but then I get:
TypeError: __init__() missing 5 required positional arguments: 'anchors', 'max_grid', 'batch_size', 'warmup_batches', and 'ignore_thresh'

Any clues?
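One hedged workaround (an editor's assumption, not a documented fix): skip load_model() entirely, rebuild the network with create_model() from train.py using the same arguments the training script uses (its signature is quoted in the "config files in the zoo" issue above), and rely on the weights being loaded from saved_weights_name:

import json
from train import create_model  # assumes the repo root is on the path

with open('config.json') as f:
    config = json.load(f)

train_model, infer_model = create_model(
    nb_class           = len(config['model']['labels']),
    anchors            = config['model']['anchors'],
    max_box_per_image  = 42,  # placeholder: use the value computed during training
    max_grid           = [config['model']['max_input_size'], config['model']['max_input_size']],
    batch_size         = 1,
    warmup_batches     = 0,
    ignore_thresh      = config['train']['ignore_thresh'],
    multi_gpu          = 1,
    saved_weights_name = config['train']['saved_weights_name'],
    lr                 = config['train']['learning_rate'],
    scales             = config['train']['scales'],
)
# Judging from the "Unable to train" traceback above, create_model() already
# calls load_weights(saved_weights_name) when that file exists, so infer_model
# should be ready for prediction here without deserializing YoloLayer.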

training scales

With grid scales == [1, 1, 1], I notice that the losses obviously scale with the grid size. I have a few questions related to this.

  1. Do you want the losses to be roughly equal per output head, and if so, is it fine to adjust grid_scales accordingly (i.e. [1, .5, .25])?

  2. What do you find is a good loss for a model now? It used to be that I knew I had a good yolov2 model when the loss was < 0.1. Now I am getting losses around ~20. Possibly there are other issues at play and I have made a mistake somewhere; I am not using your code as is, but instead incorporating the new loss into my own code.

Got the error when training the kangaroo dataset

ValueError: Dimension 0 in both shapes must be equal, but are 1 and 255. Shapes are [1,1,1024,18] and [255,1024,1,1]. for 'Assign_360' (op: 'Assign') with input shapes: [1,1,1024,18], [255,1024,1,1].

I wonder how I can solve it. Thank you.

question about evaluate

[screenshot: 2018-04-23 11-05-05]

What is the function of "cache_name"? And what do I need to add: a dir path or a label name?

question about backend

Thank you very much for your contribution!
I downloaded your code from github and tested it on my computer.
When I run "python yolo3_one_file_to_detect_them_all.py -w yolo3.weights -i dog.jpg", this problem occurred:
Traceback (most recent call last):
  File "train.py", line 101, in <module>
    _main_(args)
  File "train.py", line 70, in _main_
    anchors = config['model']['anchors'])
    x = LeakyReLU(alpha=0.1)(x)
  File "/usr/local/lib/python2.7/dist-packages/keras/engine/base_layer.py", line 454, in __call__
    output = self.call(inputs, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/keras/layers/advanced_activations.py", line 46, in call
    return K.relu(inputs, alpha=self.alpha)
  File "/usr/local/lib/python2.7/dist-packages/keras/backend/tensorflow_backend.py", line 2933, in relu
    x = tf.nn.leaky_relu(x, alpha)
AttributeError: 'module' object has no attribute 'leaky_relu'

I don't know why. I also ran yolo2 on my computer and could train on my own dataset, but after this error occurred, yolo2 couldn't run either, with the same problem. Do you have any idea?

ZeroDivisionError: float division by zero

I'm getting ZeroDivisionError: float division by zero in the bbox_iou function while running python yolo3_one_file_to_detect_them_all.py -w yolo3.weights -i dog.jpg. I am not sure if I'm missing anything.

backend

Could you explain how you got the pretrained weights for the backend (backend.h5)?

Error: Unable to open file (unable to open file: name = 'model.h5'

I ran the command

python yolo3_one_file_to_detect_them_all.py -w yolo3.weights -i dog.jpg
and got this error:

OSError: Unable to open file (unable to open file: name = 'model.h5', errno = 2, error message = 'No such file or directory', flags = 0, o_flags = 0)

I don't see model.h5 in the solution. Can you update it?

One more thing: at lines 400 to 403, the code is

# load the weights trained on COCO into the model

#weight_reader = WeightReader(weights_path)
yolov3.load_weights("model.h5")
yolov3.save("backend.h5")

That means we do not use the input weights for predicting; instead we use the file "model.h5".

And at line 403, yolov3.save("backend.h5"): why must we save the model to a file?

Thank you very much.

Little bug in predict.py

Hi there! Thanks a lot; I successfully trained on my own data using your code.
Yet I found a bug in predict.py:
Line 109: image_paths = [inp_file for inp_file in image_paths if (inp_file[-4:] == '.jpg' or inp_file == '.png')]
The second 'inp_file' should be 'inp_file[-4:]'.
And because it didn't include '.JPEG' files, it confused me for a while when I tested on my data, which are all .JPEG images (no outputs), so I changed this line to:
image_paths = [inp_file for inp_file in image_paths if (inp_file[-4:] in ['.jpg', '.png', 'JPEG'])]

Still, I got a warning saying: 'No training configuration found in save file'.
I don't know why, but it does not matter.

Predict: Division by Zero when no object was found

Python 3.6, Win10, GTX 1060

I've trained a model to detect a single class (similar to the examples).
When using the predict script on an image where it doesn't detect any object of said class, the subsequent bounding-box handling code runs into a division by zero in bbox_iou() because union becomes zero.
I'm not sure how to fix the underlying issue, but I was able to work around it.

I've worked around this by naively replacing the return statement with:
return float(intersect) / union if union != 0 else 0.

This gets me further, but then I also had to add this to draw_boxes():

if abs(box.xmin) > 1000 or abs(box.xmax) > 1000 or abs(box.ymin) > 1000 or abs(box.ymax) > 1000:
    continue

because the [xy]min/max values are basically invalid (insanely small/large).

I guess the boxes should have been discarded somewhere before all this so these code paths are never reached, but I'm not familiar enough with the code to spot it.

Using weights trained here with Darknet?

Hi, I successfully trained this keras yolo3 network on my own custom dataset and it is giving me very good results; however, the network takes up to 2 seconds just to generate predictions for each image.

(This is specifically the time for boxes = get_yolo_boxes(...) to run, not any of the extra stuff.)

Is it possible to speed up predictions in any way, or even to use these weights with the straight C version of Darknet?

Thanks a lot.

Little mistake in predict file

Hello,
I found a little mistake in the predict file, line 109:

image_paths = [inp_file for inp_file in image_paths if (inp_file[-4:] == '.jpg' or inp_file == '.png')]

inp_file for png should be inp_file[-4:]. People who have png images will never get predicted output.

The correct line should be:

image_paths = [inp_file for inp_file in image_paths if (inp_file[-4:] == '.jpg' or inp_file[-4:] == '.png')]

Best of luck

Getting error while training

2018-04-15 16:37:36.355877: W tensorflow/core/common_runtime/bfc_allocator.cc:279] ****************************************************************************************************
2018-04-15 16:37:36.362129: W tensorflow/core/framework/op_kernel.cc:1202] OP_REQUIRES failed at conv_ops.cc:677 : Resource exhausted: OOM when allocating tensor with shape[8,1024,13,13] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1361, in _do_call
    return fn(*args)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1340, in _run_fn
    target_list, status, run_metadata)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/errors_impl.py", line 516, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[8,13,13,1024] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
  [[Node: replica_0/model_1/conv_73/convolution = Conv2D[T=DT_FLOAT, data_format="NHWC", dilations=[1, 1, 1, 1], padding="SAME", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](replica_0/model_1/leaky_72/LeakyRelu/Maximum, conv_73/kernel/read/_2395)]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

  [[Node: training/Adam/gradients/replica_0/model_1/bnorm_47/FusedBatchNorm_grad/FusedBatchNormGrad/_5027 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_39158...chNormGrad", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "train.py", line 251, in <module>
    _main_(args)
  File "train.py", line 210, in _main_
    max_queue_size = 8
  File "/usr/local/lib/python3.6/dist-packages/keras/legacy/interfaces.py", line 91, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/keras/engine/training.py", line 2224, in fit_generator
    class_weight=class_weight)
  File "/usr/local/lib/python3.6/dist-packages/keras/engine/training.py", line 1883, in train_on_batch
    outputs = self.train_function(ins)
  File "/usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py", line 2478, in __call__
    **self.session_kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 905, in run
    run_metadata_ptr)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1137, in _run
    feed_dict_tensor, options, run_metadata)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1355, in _do_run
    options, run_metadata)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1374, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[8,13,13,1024] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
  [[Node: replica_0/model_1/conv_73/convolution = Conv2D[T=DT_FLOAT, data_format="NHWC", dilations=[1, 1, 1, 1], padding="SAME", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](replica_0/model_1/leaky_72/LeakyRelu/Maximum, conv_73/kernel/read/_2395)]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

  [[Node: training/Adam/gradients/replica_0/model_1/bnorm_47/FusedBatchNorm_grad/FusedBatchNormGrad/_5027 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_39158...chNormGrad", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

Caused by op 'replica_0/model_1/conv_73/convolution', defined at:
  File "train.py", line 251, in <module>
    _main_(args)
  File "train.py", line 190, in _main_
    saved_weights_name = config['train']['saved_weights_name']
  File "train.py", line 113, in create_model
    train_model = multi_gpu_model(template_model, gpus=multi_gpu)
  File "/content/drive/drive/keras-yolo3/utils/multi_gpu_model.py", line 48, in multi_gpu_model
    outputs = model(inputs)
  File "/usr/local/lib/python3.6/dist-packages/keras/engine/topology.py", line 619, in __call__
    output = self.call(inputs, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/keras/engine/topology.py", line 2085, in call
    output_tensors, _, _ = self.run_internal_graph(inputs, masks)
  File "/usr/local/lib/python3.6/dist-packages/keras/engine/topology.py", line 2236, in run_internal_graph
    output_tensors = _to_list(layer.call(computed_tensor, **kwargs))
  File "/usr/local/lib/python3.6/dist-packages/keras/layers/convolutional.py", line 168, in call
    dilation_rate=self.dilation_rate)
  File "/usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py", line 3335, in conv2d
    data_format=tf_data_format)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/nn_ops.py", line 781, in convolution
    return op(input, filter)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/nn_ops.py", line 869, in __call__
    return self.conv_op(inp, filter)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/nn_ops.py", line 521, in __call__
    return self.call(inp, filter)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/nn_ops.py", line 205, in __call__
    name=self.name)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/gen_nn_ops.py", line 631, in conv2d
    data_format=data_format, dilations=dilations, name=name)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py", line 3271, in create_op
    op_def=op_def)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py", line 1650, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[8,13,13,1024] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
  [[Node: replica_0/model_1/conv_73/convolution = Conv2D[T=DT_FLOAT, data_format="NHWC", dilations=[1, 1, 1, 1], padding="SAME", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](replica_0/model_1/leaky_72/LeakyRelu/Maximum, conv_73/kernel/read/_2395)]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

  [[Node: training/Adam/gradients/replica_0/model_1/bnorm_47/FusedBatchNorm_grad/FusedBatchNormGrad/_5027 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_39158...chNormGrad", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

error when running "python yolo3_one_file_to_detect_them_all.py -w yolov3.weights -i dog.jpg"

loading weights of convolution #1
loading weights of convolution #2
loading weights of convolution #3
loading weights of convolution #4
Traceback (most recent call last):
  File "yolo3_one_file_to_detect_them_all.py", line 440, in <module>
    _main_(args)
  File "yolo3_one_file_to_detect_them_all.py", line 411, in _main_
    weight_reader.load_weights(yolov3)
  File "yolo3_one_file_to_detect_them_all.py", line 67, in load_weights
    size = np.prod(norm_layer.get_weights()[0].shape)
AttributeError: 'NoneType' object has no attribute 'get_weights'

using python 3.6.1, keras 2.0.3, tf 1.4.1

os.environ["CUDA_DEVICE_ORDER"]="PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"]="2"

I have no clue how to debug this; thanks for any hint.

Training was interrupted

Today I found my computer had shut down and my training was interrupted. If I restart the training, will the project re-read the weight file I trained before, or will it just start another training from scratch?

I hope you can help me. Thank you!
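A hedged note, inferred from the tracebacks elsewhere on this page rather than from the author: create_model() in train.py calls template_model.load_weights(saved_weights_name) when that file exists, so restarting with the same config should resume from the saved weights. Alternatively, the pretrained_weights setting documented above can point at the interrupted run's file:

"train": {
    "pretrained_weights": "raccoon.h5",   # assumption: path to the weights saved before the shutdown
    "saved_weights_name": "raccoon.h5",
    ...
}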

recall and tensorboard issue

  1. With the latest training code, tensorboard doesn't show validation loss; only training loss_[1-3] are left?
  2. When I trained with my own dataset, which contains only one class (person), I got some wrong boxes that contained other things, like a dog or horse, labeled person. I think the model detected some objects but cannot figure out that they are not what I want. This gives a high recall but low precision. What can I do about this besides raising the threshold? Does this relate to the classification loss changing from softmax to cross-entropy?

error in loading pretrained weights

I trained a yolo3 model with backend.h5 for hours and got a new_weights.h5 (739MB). When I tried to continue training by loading new_weights.h5, I got the error below; could someone kindly explain it to me?

Loading pretrained weights.

Traceback (most recent call last):
  File "train.py", line 252, in <module>
    _main_(args)
  File "train.py", line 190, in _main_
    saved_weights_name = config['train']['saved_weights_name']
  File "train.py", line 108, in create_model
    template_model.load_weights(saved_weights_name)
  File "/home/hemp/anaconda2/lib/python2.7/site-packages/keras/engine/topology.py", line 2656, in load_weights
    f, self.layers, reshape=reshape)
  File "/home/hemp/anaconda2/lib/python2.7/site-packages/keras/engine/topology.py", line 3354, in load_weights_from_hdf5_group
    str(len(filtered_layers)) + ' layers.')
ValueError: You are trying to load a weight file containing 1 layers into a model with 147 layers.

missing utils.colors script

First of all, thank you for your post; I'm working on a project and it has helped me a lot. But I also met a problem:
while training, bbox.py needs to import get_color from utils.colors, but there is no such script in the utils folder.

About cache file and labels for training

Thanks for the great work.
I wonder about 2 things that I can't understand.

  1. How can I add 3 new labels in config.json? The example you show has only raccoons, but what I want to do is add 3 labels in one picture, so I guess:
    "labels": ["a","b","c"]
    Does it work?

  2. The pkl file is newly added and it wasn't in the readme. Can you tell me how to make the pkl files?

Multi-GPU Support

Hi Experiencor,

I was waiting for YOLOv3; thanks for posting.
As you know, TF Estimators are on the way. If possible, please add multi-GPU training; it takes too much time to train even on a single GPU.

Not able to train

I'm using Google Colab to run the code, but training and saving the weights take too much time.
Is there any way I can run this code on Google Colab?

some question about yolov3

For objectness scores it uses logistic regression; what is the advantage of this? What does yolo2 use for objectness scores? Why does it train multi-label classification, and what was it in v2, multi-class classification? What is the advantage of this?

Question about predict

[screenshot: 2018-04-23 10-43-09]

Firstly, I want to thank you for answering my questions. There is another question:
I used the weights that I trained last night to predict the raccoon, but on the test image there is only a bounding box without a label name. Is this right? And how can I get the label name?

[screenshot: 2018-04-23 10-50-51]

This is my config.json; I did add the label name.

Question about GPU

Hi, I just downloaded the whole code. I want to run the code on GPU, but I don't want to occupy the whole GPU. How do I change the GPU memory usage?
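A hedged sketch for the TF 1.x / Keras stack this repo targets (the 0.5 fraction is an arbitrary example): cap TensorFlow's GPU memory allocation before the model is built.

import tensorflow as tf
import keras.backend as K

tf_config = tf.ConfigProto()
tf_config.gpu_options.per_process_gpu_memory_fraction = 0.5  # use at most ~half the GPU memory
# tf_config.gpu_options.allow_growth = True                  # alternative: allocate on demand
K.set_session(tf.Session(config=tf_config))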

error occurs while detecting the dog pic; what's the version of python?

Hi, when I run the python yolo3_one_file_to_detect_them_all.py -w yolo3.weights -i dog.jpg command, I get the error message below; can you help me?

Traceback (most recent call last):
  File "yolo3_one_file_to_detect_them_all.py", line 431, in <module>
    _main_(args)
  File "yolo3_one_file_to_detect_them_all.py", line 407, in _main_
    new_image = preprocess_input(image, net_h, net_w)
  File "yolo3_one_file_to_detect_them_all.py", line 270, in preprocess_input
    resized = cv2.resize(image[:,:,::-1]/255., (new_w, new_h))
TypeError: integer argument expected, got float

Getting StopIteration: 'NoneType' object is not subscriptable

count 	[44][5]
loss: 	[44][57.3060036]
loss: 	[44][173.999207]
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/keras/utils/data_utils.py", line 578, in get
    inputs = self.queue.get(block=True).get()
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 644, in get
    raise self._value
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "/usr/local/lib/python3.6/dist-packages/keras/utils/data_utils.py", line 401, in get_index
    return _SHARED_SEQUENCES[uid][i]
  File "/content/drive/keras-yolo3/generator.py", line 73, in __getitem__
    img, all_objs = self._aug_image(train_instance, net_h, net_w)
  File "/content/drive/keras-yolo3/generator.py", line 160, in _aug_image
    image = cv2.imread(image_name)[:,:,::-1] # RGB image
TypeError: 'NoneType' object is not subscriptable

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "train.py", line 251, in <module>
    _main_(args)
  File "train.py", line 210, in _main_
    max_queue_size   = 4
  File "/usr/local/lib/python3.6/dist-packages/keras/legacy/interfaces.py", line 91, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/keras/engine/training.py", line 2192, in fit_generator
    generator_output = next(output_generator)
  File "/usr/local/lib/python3.6/dist-packages/keras/utils/data_utils.py", line 584, in get
    six.raise_from(StopIteration(e), e)
  File "<string>", line 3, in raise_from
StopIteration: 'NoneType' object is not subscriptable
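An editor's hedged pre-flight check (not part of the repo): cv2.imread() returns None for missing or unreadable files, which is exactly what generator.py trips over here, so scanning the image folder first can locate the bad file. The folder path below is a placeholder.

import os
import cv2

image_folder = '/home/andy/data/raccoon_dataset/images/'  # your train_image_folder
for name in sorted(os.listdir(image_folder)):
    path = os.path.join(image_folder, name)
    if cv2.imread(path) is None:
        print('unreadable image:', path)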

small bug

In get_yolo_boxes(...) of utils.py:

image_h and image_w might differ between images, which matters for correct_yolo_boxes.

This causes an error when using batches to speed up evaluation (but it's OK to evaluate pics one by one).

some problem about training my own data

[screenshot: 2018-04-27 21-47-04]

As you can see, there is an error: StopIteration: 'NoneType' object is not subscriptable.

[screenshot: 2018-04-27 22-00-46]

I want to know what the problem is and how to solve it. Thank you.
