Comments (46)
Yes.
Just to clarify:
Doing the following:
- copy the config of X.cfg to Y.cfg
- change the output of the last FC layer
- change the number of classes in the detection layer
- call ./flow --model Y.cfg --load X.weights

will result in: the first (N-1) layers of Y are loaded from X.weights, and the last layer of Y is initialized. You can check whether each layer was loaded or initialized in the table of layers the program prints.
from darkflow.
Correct, and notice the first word there: the first matching layers are reused. The first mismatch causes the rest of the net to be initialized.
Ok, working on that right now. Could you please point me to where the processing script decides to reinitialize a layer when changes are detected?
A bit complicated:
- A "weight walker" in ./utils/loader.py is used to read the source weights file.
- Then a "weight loader" in ./utils/loader.py cycles through each pair of layers between the source config and the destination config (which can be the same config) and yields the weights as long as the pair is identical (by comparing layer.signature). If the pair is not identical, None is yielded.
- Then comes the part tensorflow is in charge of: in ./net/ops/baseop.py, the layer is wrapped into tensorflow variables and placeholders. If the value collected for that layer in the previous step is None, the layer is initialized; otherwise the value is used as the initial value.
Hope this helps.
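The pairing logic described above can be sketched roughly as follows. This is a hypothetical, simplified stand-in, not darkflow's actual API: the names `pair_weights` and the signature strings are illustrative, and the real code in ./utils/loader.py and ./net/ops/baseop.py carries much more state.

```python
from collections import namedtuple

# Hypothetical, simplified stand-in for darkflow's layer objects.
Layer = namedtuple("Layer", ["signature", "weights"])

def pair_weights(source, dest):
    """Yield source weights for each identical layer pair, None otherwise.

    Once a mismatch occurs, every following layer also gets None,
    matching the "first mismatch initializes the rest" behavior.
    """
    mismatched = False
    for src, dst in zip(source, dest):
        if mismatched or src.signature != dst.signature:
            mismatched = True
            yield None           # baseop-level code random-initializes this layer
        else:
            yield src.weights    # reused as the layer's initial value

src = [Layer("conv3x3/16", [0.1]), Layer("conv3x3/32", [0.2]), Layer("fc/735", [0.3])]
dst = [Layer("conv3x3/16", None), Layer("conv3x3/32", None), Layer("fc/245", None)]
print(list(pair_weights(src, dst)))  # [[0.1], [0.2], None]
```

Here the two conv layers match and are reused, while the resized FC layer yields None and will be freshly initialized.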
To train I did:
./flow --train --model cfg/v1.1/tiny-yolov1-5c.cfg --load bin/tiny_yolo.weights --annotation <path to my annotations> --dataset <path to my images>
To run I did:
./flow --test <path to my test images> --model cfg/v1.1/tiny-yolov1-5c.cfg --load -1
Interestingly, when I pass -1 to --load to load the latest checkpoint for both the --train and --test options, I get the following output:
Source | Train? | Layer description | Output size
-------+--------+----------------------------------+---------------
| | input | (?, 448, 448, 3)
Init | Yep! | conv 3x3p1_1 +bnorm leaky | (?, 448, 448, 16)
Load | Yep! | maxp 2x2p0_2 | (?, 224, 224, 16)
Init | Yep! | conv 3x3p1_1 +bnorm leaky | (?, 224, 224, 32)
Load | Yep! | maxp 2x2p0_2 | (?, 112, 112, 32)
Init | Yep! | conv 3x3p1_1 +bnorm leaky | (?, 112, 112, 64)
Load | Yep! | maxp 2x2p0_2 | (?, 56, 56, 64)
Init | Yep! | conv 3x3p1_1 +bnorm leaky | (?, 56, 56, 128)
Load | Yep! | maxp 2x2p0_2 | (?, 28, 28, 128)
Init | Yep! | conv 3x3p1_1 +bnorm leaky | (?, 28, 28, 256)
Load | Yep! | maxp 2x2p0_2 | (?, 14, 14, 256)
Init | Yep! | conv 3x3p1_1 +bnorm leaky | (?, 14, 14, 512)
Load | Yep! | maxp 2x2p0_2 | (?, 7, 7, 512)
Init | Yep! | conv 3x3p1_1 +bnorm leaky | (?, 7, 7, 1024)
Init | Yep! | conv 3x3p1_1 +bnorm leaky | (?, 7, 7, 256)
Load | Yep! | flat | (?, 12544)
Init | Yep! | full 12544 x 735 linear | (?, 735)
-------+--------+----------------------------------+---------------
It seems to be failing to load any convolution layers, no wonder it spits out NaN :( It does not do this when I pass a weights file as my --load argument, which suggests there might be versioning issues with the ckpt format. I'm currently using TensorFlow 0.12.1, if it helps.
tiny-yolov1-5c.cfg was modified from tiny-yolov1.cfg, with changes to [connected] and [detection] posted above.
Getting some results that make sense now! YOLO is picking up cars in the dataset, although the bounding box is often drawn with an offset and with the wrong width/height.
- Please update to the new commit
- Make sure you are using Python 3 and TensorFlow 0.12
- Please make sure you can successfully overfit a small dataset (3~5 images) before going any further (for configs with batch norm, use a larger epoch number so that the moving averages converge)
That will rule out many possibilities. Debugging deep learning applications is not simple.
Good suggestion on self-driving dataset.
Indeed, labels.txt is in the root (./labels.txt), and the model's name should be different from the default ones. This seems like a bad design, so I am open to your suggestions.
Well, I wonder if it's possible to dynamically construct an FC layer according to the number of classes in labels.txt.
For example, if I want to use yolo_tiny but for a 5-class dataset rather than a 20-class dataset, we could resize the FC layers to generate the appropriate number of outputs.
In darkflow's current form, I would have to modify the yolo_tiny.cfg file and tell the training script to ignore the FC weights and reinitialize new ones?
That's a good suggestion too. The current design of darkflow does not allow doing so; one can modify the source code at ./cfg/process.py so that while parsing for the number of outputs in a .cfg, it counts the number of lines in ./labels.txt instead. But another number that also affects the last FC's output size is the number of boxes; for this you have to look further in the .cfg file, at the [detection] layer. I personally don't think it is necessary to build this complicated behavior, but you can always customize the source as you like (just note that process.py is a bit messy).
To completely initialize the new net, just leave out the --load option; to load the first identical layers of your new net from, say, yolo-tiny.weights, point --load to that file. A table will be printed out indicating which layers are loaded and which are initialized.
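As a sketch of the idea, one might add helpers like these to a patched ./cfg/process.py. The functions `fc_output_size` and `count_labels` are hypothetical, not part of darkflow; the formula assumes YOLOv1's detection head layout.

```python
# Hypothetical helpers: size YOLOv1's last FC layer from the class count
# instead of a hard-coded "output=" value (darkflow does not do this).

def fc_output_size(num_classes, side=7, num=2, coords=4):
    # Each of the side*side grid cells predicts `num` boxes
    # (coords + 1 confidence each) plus one score per class.
    return side * side * (num * (coords + 1) + num_classes)

def count_labels(path="labels.txt"):
    # One class name per non-empty line, as darkflow's labels.txt expects.
    with open(path) as f:
        return sum(1 for line in f if line.strip())

print(fc_output_size(5))   # 7*7*(2*5 + 5) = 735
print(fc_output_size(20))  # stock 20-class VOC setting: 1470
```

Note that `num` (the box count) still has to come from the [detection] section, which is why a labels-only solution is incomplete, as the comment above points out.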
Just so I understand you fully: when you say "identical layer", I just need to leave unmodified the layers I don't want to change, and darkflow will detect the changes in a new cfg file and initialize those variables properly?
@thtrieu just realized layers don't have IDs, and to introduce a change in the cfg file, you actually have to change the layer structure. Just wondering if I could avoid that?
The reason being, I want to swap out FC layers, train them, and fine-tune the entire network with a lower learning rate. I want to load the pretrained weights and still train them.
Is it possible to add extra parameters that specify train=true and reinitialize=random, or something of that sort, for each layer?
Surely you can do that, but it will require source code modification.
Reading through weight_loader in loader.py, I'm having a hard time locating the exact line where the signature is compared and rejected. Could you kindly clarify?
In the meantime, I'm planning not to touch the convolution layers at all and to swap out the FC and detection layers with the following:
[connected]
output= 735
activation=linear
[detection]
classes=5
coords=4
rescore=1
side=7
num=2
softmax=0
sqrt=1
jitter=.2
object_scale=1
noobject_scale=.5
class_scale=1
coord_scale=5
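For reference, the output=735 above is consistent with YOLOv1's detection head size, assuming the usual S*S*(B*5 + C) formula with the values from the [detection] section:

```python
# Sanity check of the head size implied by the [detection] settings above.
side, num, classes, coords = 7, 2, 5, 4
output = side * side * (num * (coords + 1) + classes)
print(output)  # 735, matching "output=735" in [connected]
```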
Will keep you updated on how it works
- The comparison is done at line 30 by the == operator, which is overloaded in the definition of class Layer.
- If I understand you correctly, all you have to do is change the definition of the last FC layer in the .cfg as above, and then call for a partial load; no source code needs to be modified.
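The overloading in question can be sketched like this. This is a simplified, hypothetical Layer class for illustration; darkflow's real class carries more state in its signature.

```python
# Simplified sketch of signature-based layer comparison via an
# overloaded == operator, as darkflow does on its Layer class.
class Layer:
    def __init__(self, kind, *shape):
        self.signature = (kind,) + shape  # what "identical" means here

    def __eq__(self, other):
        return self.signature == other.signature

print(Layer("conv", 3, 3, 16) == Layer("conv", 3, 3, 16))      # True: weights reused
print(Layer("full", 12544, 735) == Layer("full", 12544, 245))  # False: reinitialized
```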
And the number of classes in the detection layer, as you mentioned before, right?
Training now! To detail what I'm doing: I loaded the CSV annotation file from the Udacity dataset to produce dumps in the same format you expect in data.py, and in the Udacity dataset there are 5 different classes.
Will keep you posted, and any tips would be much appreciated!
The bullet points at the end of this post might be helpful https://thtrieu.github.io/notes/Fine-tuning-YOLO-4-classes#hand-picking-good-feature
Besides, I would love to reference your training results/demo on this repo's README. If that's okay, do notify me when you're ready.
Really good tips. I have the following sample counts at the moment:
car: 60788
biker: 1676
truck: 3503
trafficLight: 17253
pedestrian: 9866
Loss has converged to 3.0 now. Will run a regular test to see if it's reasonable.
@thtrieu the loss shows up as 2.4, but when I run testing from my checkpoint, the probabilities produced are NaN. Just wondering if you have any clue how that could be possible? I'm guessing NaN would have been produced during training as well?
Can you describe in detail what commands you ran to obtain these results? They all seem new to me.
It is totally okay to have those Inits. The table tells you which layers are loaded from a .weights file, not from a ckpt. As long as the table is followed by the messages Loading from ./ckpt/tiny-yolov1-5c-<number> and Finished in <>second, you're doing fine.
The strange thing to me is: how can you get any loss value when running a --test command? Normally a --test command simply prints that it is forwarding some input images and preprocessing them before termination.
Ah, good to know that it's loading the weights. I don't actually have a NaN loss value; what I'm referring to is the NaN matrix produced when I run a forward pass during the test procedure.
I printed out the result of line 94 in net/flow.py:
out = self.sess.run(self.out, feed_dict)
and out showed up as a NaN matrix, which makes it hard to believe it would have produced a valid loss during training.
NaN is not necessarily the probabilities in YOLO's formulation. It can be the coordinate offset, confidence, class, etc. You can always check what the output matrix is during training by putting self.out into fetches at line 49 of the same file. I suspect these are also NaN matrices and the loss value of 2.4 or 3.0 is the result of overflow/underflow.
If the matrices are indeed NaN during training, then there is a scaling problem due to overusing the old weights (N-1 layers are reused with totally different classes of object, and v1.1 uses batch norm with arbitrarily large scaling/offset parameters). To check this, try running the model without loading from any .weights file (full initialization) and see if the NaN problem persists.
Thanks for the tips.
I'm not sure what you mean by "putting self.out into fetches", but I did try running the model without loading any weights via:
./flow --test <path to my test images> --model cfg/v1.1/tiny-yolov1-5c.cfg
and I'm seeing the same NaN matrix coming out of out = self.sess.run(self.out, feed_dict).
By fetches, I mean the fetches in this Python code: fetched = self.sess.run(fetches, feed_dict) at line 50 of ./net/flow.py. You can use fetches to look at intermediate layers' values.
For example,
fetches = [self.train_op, loss_op, self.top.out, self.top.inp.out, self.top.inp.inp.out, self.top.inp.inp.inp.out]
will allow you to fetch the train op (to train the net), the loss op (to see the loss), and the last four layers' output matrices. You can certainly use a loop to create this list; the way I did it above is just illustrative.
If you can print the output of all intermediate layers, it will be easier to debug your program (to see at which layer the NaN problem starts). I believe this is a problem-specific issue, because YOLO models on the PASCAL VOC dataset all run fine.
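The loop version could look like the sketch below, assuming (as the chained expression above implies) that each wrapped layer exposes `.inp` pointing to the previous layer and `.out` holding its tensor. The `_Fake` class is just a stand-in for illustration, not a darkflow type.

```python
# Build a fetches list by walking the layer chain instead of chaining
# ".inp" by hand; assumes each layer has `.inp` and `.out` attributes.
def last_outputs(top, n=4):
    """Collect the output tensors of the last n layers, top-most first."""
    outs, layer = [], top
    while layer is not None and len(outs) < n:
        outs.append(layer.out)
        layer = getattr(layer, "inp", None)
    return outs

# Stand-in objects for illustration; in darkflow you would pass self.top
# and do: fetches = [self.train_op, loss_op] + last_outputs(self.top)
class _Fake:
    def __init__(self, out, inp=None):
        self.out, self.inp = out, inp

chain = _Fake("L4", _Fake("L3", _Fake("L2", _Fake("L1"))))
print(last_outputs(chain))  # ['L4', 'L3', 'L2', 'L1']
```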
Used your command to fetch the intermediate layer outputs, and I actually don't see NaN output at the last few layers during training, but I do see NaN output during testing, starting at self.top.inp.inp.inp.out
(Tensor("BiasAdd_7:0", shape=(?, 7, 7, 256), dtype=float32)).
I would expect that if the network were producing NaN results, it would have done so during training as well?
Found out something really peculiar. I downloaded tiny-yolo.weights from the link referenced on the YOLOv1 site, and found that the link actually points to tiny-yolov2 weights. This is evidenced by the successful load of the final convolution layer when I use the v2 tiny-yolo.cfg. The NaN starts right at that layer as well, so I'm going to try tracking down the correct tiny-yolov1 weights and train against them.
Yes, the official site of YOLO now provides YOLO9000 only. If you want older versions, tell me and I'll upload them.
If you could upload tiny-yolo-v1, that would be much appreciated.
Just so you know, when I try to load yolov1.weights, the walker asserts "Over-read". Not sure if you wish to maintain yolov1 loading anymore, but I thought I would bring it to your attention.
To be clear, there is v1.0 (without batch norm), v1.1 (with batch norm), and v2 (YOLO9000). Which one are you referring to?
It might be this
Just to update you on this: I'm training the weights you provided using v1.1/tiny-yolov1.cfg, with the 5-class modifications I made above. The loss is around 2.2, and the outputs are not really valid. Will try to keep it going for one more day before I give up :)
I had to disable the following assert at line 74 of loader.py to load tiny-yolov1.weights at all.
if walker.path is not None:
    #assert walker.offset == walker.size, \
    #'expect {} bytes, found {}'.format(
    #    walker.offset, walker.size)
    print('Successfully identified {} bytes'.format(
        walker.offset))
Training YOLO can be a daunting task, especially for those with limited computational resources. I encourage you to go a little further.
2.2 is a very familiar loss to me; it can indicate underfitting or too large a learning rate. I suggest going for a smaller learning rate to see if there is any progress. If not, then go for a deeper but much thinner net; see this post if you have not.
It's odd that the training loss for tiny-yolov1.weights is in the same 1.8-2.0 region, yet it actually makes sensible detections.
I do have a GTX 1070, so I'm doing a bit better than running purely on CPU. Will keep you posted tomorrow.
Make sure you are using Python 3, or convert the code appropriately, because there is a difference in integer/float division between Python 2 and Python 3 that can cause a consistent mislocation of bounding boxes.
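A minimal illustration of the division difference: normalized coordinates such as a box center divided by the grid size silently truncate to 0 under Python 2's integer division.

```python
# Python 3: / is always true division; Python 2 floored it for ints,
# which truncates normalized box coordinates like cx/grid to 0.
cx, grid = 3, 7
print(cx / grid)   # 0.42857... in Python 3 (Python 2 would print 0)
print(cx // grid)  # 0: explicit floor division in both versions
```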
Yeesh, I'm fairly certain I'm not using Python 3 at the moment. Will try that. In general, the bounding boxes seem to be very small, which may be caused by the small bounding box annotations in the Udacity dataset (sometimes they get below 5 pixels in width or height).
If that doesn't improve things, I'll move to Python 3.
It's not converging to the right solution :( The boxes show up at roughly the right place but the sizes are wrong.
I'll put the code up on my fork for anyone to investigate!
Overfitting did the trick!! Will post my results shortly. Thanks a lot for your help.
@thtrieu, here's my fork for training against the Udacity SDC dataset: https://github.com/y22ma/darkflow/tree/udacity
Udacity employs a different annotation format than PASCAL VOC, and I hacked the dataset.py script to load the Udacity annotations using my own function. How would you like this to be handled?
Could you please say more about the theory behind step 3?
What does overfitting on a small set (3~5 images) improve? Should that small training run start with the same parameters as the targeted training over the entire training set?
Hello there, I am really interested in using this library for training on my own datasets. I have some problems when trying to test a few images after training. Could you help me understand better how it works?
While testing I have the following output:
Parsing cfg/yolo-voc-1c.cfg
Loading None ...
Finished in 0.00013875961303710938s
Building net ...
Source | Train? | Layer description | Output size
-------+--------+----------------------------------+---------------
| | input | (?, 416, 416, 3)
Init | Yep! | conv 3x3p1_1 +bnorm leaky | (?, 416, 416, 32)
Load | Yep! | maxp 2x2p0_2 | (?, 208, 208, 32)
Init | Yep! | conv 3x3p1_1 +bnorm leaky | (?, 208, 208, 64)
Load | Yep! | maxp 2x2p0_2 | (?, 104, 104, 64)
Init | Yep! | conv 3x3p1_1 +bnorm leaky | (?, 104, 104, 128)
Init | Yep! | conv 1x1p0_1 +bnorm leaky | (?, 104, 104, 64)
Init | Yep! | conv 3x3p1_1 +bnorm leaky | (?, 104, 104, 128)
Load | Yep! | maxp 2x2p0_2 | (?, 52, 52, 128)
Init | Yep! | conv 3x3p1_1 +bnorm leaky | (?, 52, 52, 256)
Init | Yep! | conv 1x1p0_1 +bnorm leaky | (?, 52, 52, 128)
Init | Yep! | conv 3x3p1_1 +bnorm leaky | (?, 52, 52, 256)
Load | Yep! | maxp 2x2p0_2 | (?, 26, 26, 256)
Init | Yep! | conv 3x3p1_1 +bnorm leaky | (?, 26, 26, 512)
Init | Yep! | conv 1x1p0_1 +bnorm leaky | (?, 26, 26, 256)
Init | Yep! | conv 3x3p1_1 +bnorm leaky | (?, 26, 26, 512)
Init | Yep! | conv 1x1p0_1 +bnorm leaky | (?, 26, 26, 256)
Init | Yep! | conv 3x3p1_1 +bnorm leaky | (?, 26, 26, 512)
Load | Yep! | maxp 2x2p0_2 | (?, 13, 13, 512)
Init | Yep! | conv 3x3p1_1 +bnorm leaky | (?, 13, 13, 1024)
Init | Yep! | conv 1x1p0_1 +bnorm leaky | (?, 13, 13, 512)
Init | Yep! | conv 3x3p1_1 +bnorm leaky | (?, 13, 13, 1024)
Init | Yep! | conv 1x1p0_1 +bnorm leaky | (?, 13, 13, 512)
Init | Yep! | conv 3x3p1_1 +bnorm leaky | (?, 13, 13, 1024)
Init | Yep! | conv 3x3p1_1 +bnorm leaky | (?, 13, 13, 1024)
Init | Yep! | conv 3x3p1_1 +bnorm leaky | (?, 13, 13, 1024)
Load | Yep! | concat [16] | (?, 26, 26, 512)
Load | Yep! | local flatten 2x2 | (?, 13, 13, 2048)
Load | Yep! | concat [26, 24] | (?, 13, 13, 3072)
Init | Yep! | conv 3x3p1_1 +bnorm leaky | (?, 13, 13, 1024)
Init | Yep! | conv 1x1p0_1 linear | (?, 13, 13, 30)
-------+--------+----------------------------------+---------------
Running entirely on CPU
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE3 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
Loading from ./ckpt/yolo-voc-1c-4000
Finished in 6.8827056884765625s
Forwarding 3 inputs ...
Total time = 3.51149582862854s / 3 inps = 0.8543367688326967 ips
Post processing 3 inputs ...
Total time = 0.17760968208312988s / 3 inps = 16.890971059763825 ips
but on the testing images it detects nothing. Do you have any idea what's wrong?
In what format should the annotations be? Is XML required, or are other formats acceptable?
@eugtanchik Look into $DARKFLOW_ROOT/net/yolov2/test.py to print boxes.probs, and make sure your confidences are beyond the threshold.
Hi,
I have a CSV annotation file and I am using https://github.com/y22ma/darkflow/tree/udacity, but I get the error: Annotation directory not found ...
Please help me.
E:\Users\ZP\Desktop\Getdata>flow.py --model cfg/yolov2-tiny-voc.cfg --load bin/yolov2-tiny-voc.weights --savepb
Parsing ./cfg/yolov2-tiny-voc.cfg
Parsing cfg/yolov2-tiny-voc.cfg
Loading bin/yolov2-tiny-voc.weights ...
Successfully identified 63102560 bytes
Finished in 0.04497408866882324s
Traceback (most recent call last):
File "E:\Users\ZP\Desktop\Getdata\flow.py", line 6, in <module>
cliHandler(sys.argv)
File "D:\Program Files\Python36\lib\site-packages\darkflow\cli.py", line 26, in cliHandler
tfnet = TFNet(FLAGS)
File "D:\Program Files\Python36\lib\site-packages\darkflow\net\build.py", line 64, in __init__
self.framework = create_framework(*args)
File "D:\Program Files\Python36\lib\site-packages\darkflow\net\framework.py", line 59, in create_framework
return this(meta, FLAGS)
File "D:\Program Files\Python36\lib\site-packages\darkflow\net\framework.py", line 15, in __init__
self.constructor(meta, FLAGS)
File "D:\Program Files\Python36\lib\site-packages\darkflow\net\yolo\__init__.py", line 20, in constructor
misc.labels(meta, FLAGS) #We're not loading from a .pb so we do need to load the labels
File "D:\Program Files\Python36\lib\site-packages\darkflow\net\yolo\misc.py", line 36, in labels
with open(file, 'r') as f:
FileNotFoundError: [Errno 2] No such file or directory: 'labels.txt'