Comments (46)
Yes.
Just to clarify:
Doing the following:
- copy the config of X.cfg to Y.cfg
- change the output of the last FC layer
- change the number of classes in the detection layer
- call ./flow --model Y.cfg --load X.weights

will result in: the first (N-1) layers of Y are loaded from X.weights, and the last layer of Y is initialized. You can check whether each layer was loaded or initialized in the table of layers the program prints.
from darkflow.
Correct, and notice the first word there: the first matching layers are reused. The first mismatch causes the rest of the net to be initialized.
Ok, working on that right now. Could you please point me to where the processing script decides to reinitialize a layer when changes are detected?
A bit complicated:
- A "weight walker" in ./utils/loader.py is used to read the source weights file.
- Then a "weight loader" in ./utils/loader.py cycles through each pair of layers between the source config and the destination config (which can be the same config) and yields the weights as long as the pair is identical (by comparing layer.signature). If the pair is not identical, None is yielded.
- Then comes the part tensorflow is in charge of: in ./net/ops/baseop.py, the layer is wrapped into tensorflow variables and placeholders. If the value collected for that layer in the previous step is None, the layer is initialized; otherwise the value is used as the initial value.
Hope this helps.
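The pairing logic described above can be sketched roughly as follows. This is a hypothetical, simplified stand-in, not darkflow's actual API: the names `pair_weights` and the signature strings are illustrative, and the real code in ./utils/loader.py and ./net/ops/baseop.py carries much more state.

```python
from collections import namedtuple

# Hypothetical, simplified stand-in for darkflow's layer objects.
Layer = namedtuple("Layer", ["signature", "weights"])

def pair_weights(source, dest):
    """Yield source weights for each identical layer pair, None otherwise.

    Once a mismatch occurs, every following layer also gets None,
    matching the "first mismatch initializes the rest" behavior.
    """
    mismatched = False
    for src, dst in zip(source, dest):
        if mismatched or src.signature != dst.signature:
            mismatched = True
            yield None           # baseop-level code random-initializes this layer
        else:
            yield src.weights    # reused as the layer's initial value

src = [Layer("conv3x3/16", [0.1]), Layer("conv3x3/32", [0.2]), Layer("fc/735", [0.3])]
dst = [Layer("conv3x3/16", None), Layer("conv3x3/32", None), Layer("fc/245", None)]
print(list(pair_weights(src, dst)))  # [[0.1], [0.2], None]
```

Here the two conv layers match and are reused, while the resized FC layer yields None and will be freshly initialized.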
To train I did:
./flow --train --model cfg/v1.1/tiny-yolov1-5c.cfg --load bin/tiny_yolo.weights --annotation <path to my annotations> --dataset <path to my images>
To run I did:
./flow --test <path to my test images> --model cfg/v1.1/tiny-yolov1-5c.cfg --load -1
Interestingly, when I pass -1 to --load to load the latest checkpoint for both the --train and --test options, I get the following output:
Source | Train? | Layer description | Output size
-------+--------+----------------------------------+---------------
| | input | (?, 448, 448, 3)
Init | Yep! | conv 3x3p1_1 +bnorm leaky | (?, 448, 448, 16)
Load | Yep! | maxp 2x2p0_2 | (?, 224, 224, 16)
Init | Yep! | conv 3x3p1_1 +bnorm leaky | (?, 224, 224, 32)
Load | Yep! | maxp 2x2p0_2 | (?, 112, 112, 32)
Init | Yep! | conv 3x3p1_1 +bnorm leaky | (?, 112, 112, 64)
Load | Yep! | maxp 2x2p0_2 | (?, 56, 56, 64)
Init | Yep! | conv 3x3p1_1 +bnorm leaky | (?, 56, 56, 128)
Load | Yep! | maxp 2x2p0_2 | (?, 28, 28, 128)
Init | Yep! | conv 3x3p1_1 +bnorm leaky | (?, 28, 28, 256)
Load | Yep! | maxp 2x2p0_2 | (?, 14, 14, 256)
Init | Yep! | conv 3x3p1_1 +bnorm leaky | (?, 14, 14, 512)
Load | Yep! | maxp 2x2p0_2 | (?, 7, 7, 512)
Init | Yep! | conv 3x3p1_1 +bnorm leaky | (?, 7, 7, 1024)
Init | Yep! | conv 3x3p1_1 +bnorm leaky | (?, 7, 7, 256)
Load | Yep! | flat | (?, 12544)
Init | Yep! | full 12544 x 735 linear | (?, 735)
-------+--------+----------------------------------+---------------
It seems to be failing to load any convolution layers, no wonder it spits out NaN :( It does not do this when I pass a weights file as my --load argument, which suggests there might be versioning issues with the ckpt format. I'm currently using TensorFlow 0.12.1, if it helps.
tiny-yolov1-5c.cfg was modified from tiny-yolov1.cfg, with changes to [connected] and [detection] posted above.
Getting some results that make sense now! YOLO is picking up cars in the dataset, although the bounding box is often drawn with an offset and with the wrong width/height.
- Please update to the new commit
- Make sure you are using Python 3 and TensorFlow 0.12
- Please make sure you can successfully overfit a small dataset (3~5 images) before going any further (for configs with batch norm, use a larger epoch number so that the moving averages converge)
That will rule out many possibilities. Debugging deep learning applications is not simple.
Good suggestion on self-driving dataset.
Indeed, labels.txt is in the root (./labels.txt), and the model's name should be different from the default ones. This seems like a bad design, so I am open to your suggestions.
Well, I wonder if it's possible to dynamically construct an FC layer according to the number of classes in labels.txt.
For example, if I want to use yolo_tiny but for a 5-class dataset rather than a 20-class dataset, we could resize the FC layers to generate the appropriate number of outputs.
In darkflow's current form, I would have to modify the yolo_tiny.cfg file and tell the training script to ignore the FC weights and reinitialize new ones?
That's a good suggestion too. The current design of darkflow does not allow doing so; one can modify the source code at ./cfg/process.py so that while parsing for the number of outputs in a .cfg, it counts the number of lines in ./labels.txt instead. But another number that also affects the last FC's output size is the number of boxes; for this you have to look further in the .cfg file, at the [detection] layer. I personally don't think it is necessary to build this complicated behavior, but you can always customize the source as you like (just note that process.py is a bit messy).
To completely initialize the new net, just leave out the --load option; to load the first identical layers of your new net from, say, yolo-tiny.weights, point --load to that file. A table will be printed out indicating which layers are loaded and which are initialized.
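As a sketch of the idea, one might add helpers like these to a patched ./cfg/process.py. The functions `fc_output_size` and `count_labels` are hypothetical, not part of darkflow; the formula assumes YOLOv1's detection head layout.

```python
# Hypothetical helpers: size YOLOv1's last FC layer from the class count
# instead of a hard-coded "output=" value (darkflow does not do this).

def fc_output_size(num_classes, side=7, num=2, coords=4):
    # Each of the side*side grid cells predicts `num` boxes
    # (coords + 1 confidence each) plus one score per class.
    return side * side * (num * (coords + 1) + num_classes)

def count_labels(path="labels.txt"):
    # One class name per non-empty line, as darkflow's labels.txt expects.
    with open(path) as f:
        return sum(1 for line in f if line.strip())

print(fc_output_size(5))   # 7*7*(2*5 + 5) = 735
print(fc_output_size(20))  # stock 20-class VOC setting: 1470
```

Note that `num` (the box count) still has to come from the [detection] section, which is why a labels-only solution is incomplete, as the comment above points out.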
Just so I understand you fully: when you say "identical layer", I just need to leave unmodified the layers I don't want to change, and darkflow will detect the changes in a new cfg file and initialize those variables properly?
@thtrieu just realized layers don't have IDs, and to introduce a change in the cfg file, you actually have to change the layer structure. Just wondering if I could avoid that?
The reason being, I want to swap out FC layers, train them, and fine-tune the entire network with a lower learning rate. I want to load the pretrained weights and still train them.
Is it possible to add extra parameters that specify train=true and reinitialize=random, or something of that sort, for each layer?
Surely you can do that, but it will require source code modification.
Reading through weight_loader in loader.py, I'm having a hard time locating the exact line where the signature is compared and rejected. Could you kindly clarify?
In the meantime, I'm planning not to touch the convolution layers at all and to swap out the FC and detection layers with the following:
[connected]
output= 735
activation=linear
[detection]
classes=5
coords=4
rescore=1
side=7
num=2
softmax=0
sqrt=1
jitter=.2
object_scale=1
noobject_scale=.5
class_scale=1
coord_scale=5
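For reference, the output=735 above is consistent with YOLOv1's detection head size, assuming the usual S*S*(B*5 + C) formula with the values from the [detection] section:

```python
# Sanity check of the head size implied by the [detection] settings above.
side, num, classes, coords = 7, 2, 5, 4
output = side * side * (num * (coords + 1) + classes)
print(output)  # 735, matching "output=735" in [connected]
```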
Will keep you updated on how it works
- The comparison is done at line 30 by the == operator, which is overloaded in the definition of class Layer.
- If I understand you correctly, all you have to do is change the definition of the last FC layer in the .cfg as above, and then call for a partial load; no source code needs to be modified.
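The overloading in question can be sketched like this. This is a simplified, hypothetical Layer class for illustration; darkflow's real class carries more state in its signature.

```python
# Simplified sketch of signature-based layer comparison via an
# overloaded == operator, as darkflow does on its Layer class.
class Layer:
    def __init__(self, kind, *shape):
        self.signature = (kind,) + shape  # what "identical" means here

    def __eq__(self, other):
        return self.signature == other.signature

print(Layer("conv", 3, 3, 16) == Layer("conv", 3, 3, 16))      # True: weights reused
print(Layer("full", 12544, 735) == Layer("full", 12544, 245))  # False: reinitialized
```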
And the number of classes in the detection layer, as you mentioned before, right?
Training now! To detail what I'm doing: I loaded the CSV annotation file from the Udacity dataset to produce dumps in the same format you expect in data.py, and in the Udacity dataset there are 5 different classes.
Will keep you posted, and any tips would be much appreciated!
The bullet points at the end of this post might be helpful https://thtrieu.github.io/notes/Fine-tuning-YOLO-4-classes#hand-picking-good-feature
Besides, I would love to reference your training results/demo on this repo's README. If that's okay, do notify me when you're ready.
Really good tips. I have the following sample counts at the moment:
car: 60788
biker: 1676
truck: 3503
trafficLight: 17253
pedestrian: 9866
Loss has converged to 3.0 now. Will run a regular test to see if it's reasonable.
@thtrieu the loss shows up as 2.4, but when I run testing from my checkpoint, the probabilities produced are NaN. Just wondering if you have any clue how that could be possible? I'm guessing NaN would have been produced during training as well?
Can you describe in detail what commands you ran to obtain these results? They all seem new to me.
It is totally okay to have those Inits. The table tells you which layers are loaded from a .weights file, not from a ckpt. As long as the table is followed by the messages Loading from ./ckpt/tiny-yolov1-5c-<number> and Finished in <>second, you're doing fine.
The strange thing to me is: how can you get any loss value when running a --test command? Normally a --test command simply prints that it is forwarding some input images and preprocessing them before termination.
Ah, good to know that it's loading the weights. I don't actually have a NaN loss value; what I'm referring to is the NaN matrix produced when I run a forward pass during the test procedure.
I printed out the result of line 94 in net/flow.py:
out = self.sess.run(self.out, feed_dict)
and out showed up as a NaN matrix, which makes it hard to believe it would have produced a valid loss during training.
NaN is not necessarily the probabilities in YOLO's formulation. It can be the coordinate offset, confidence, class, etc. You can always check what the output matrix is during training by putting self.out into fetches at line 49 of the same file. I suspect these are also NaN matrices and the loss value of 2.4 or 3.0 is the result of overflow/underflow.
If the matrices are indeed NaN during training, then there is a scaling problem due to overusing the old weights (N-1 layers are reused with totally different classes of object, and v1.1 uses batch norm with arbitrarily large scaling/offset parameters). To check this, try running the model without loading from any .weights file (full initialization) and see if the NaN problem persists.
Thanks for the tips.
I'm not sure what you mean by "putting self.out into fetches", but I did try running the model without loading any weights via:
./flow --test <path to my test images> --model cfg/v1.1/tiny-yolov1-5c.cfg
and I'm seeing the same NaN matrix coming out of out = self.sess.run(self.out, feed_dict).
By fetches, I mean the fetches in this Python code: fetched = self.sess.run(fetches, feed_dict) at line 50 of ./net/flow.py. You can use fetches to look at intermediate layers' values.
For example,
fetches = [self.train_op, loss_op, self.top.out, self.top.inp.out, self.top.inp.inp.out, self.top.inp.inp.inp.out]
will allow you to fetch the train op (to train the net), the loss op (to see the loss), and the last four layers' output matrices. You can certainly use a loop to create this list; the way I did it above is just illustrative.
If you can print the output of all intermediate layers, it will be easier to debug your program (to see at which layer the NaN problem starts). I believe this is a problem-specific issue, because YOLO models on the PASCAL VOC dataset all run fine.
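The loop version could look like the sketch below, assuming (as the chained expression above implies) that each wrapped layer exposes `.inp` pointing to the previous layer and `.out` holding its tensor. The `_Fake` class is just a stand-in for illustration, not a darkflow type.

```python
# Build a fetches list by walking the layer chain instead of chaining
# ".inp" by hand; assumes each layer has `.inp` and `.out` attributes.
def last_outputs(top, n=4):
    """Collect the output tensors of the last n layers, top-most first."""
    outs, layer = [], top
    while layer is not None and len(outs) < n:
        outs.append(layer.out)
        layer = getattr(layer, "inp", None)
    return outs

# Stand-in objects for illustration; in darkflow you would pass self.top
# and do: fetches = [self.train_op, loss_op] + last_outputs(self.top)
class _Fake:
    def __init__(self, out, inp=None):
        self.out, self.inp = out, inp

chain = _Fake("L4", _Fake("L3", _Fake("L2", _Fake("L1"))))
print(last_outputs(chain))  # ['L4', 'L3', 'L2', 'L1']
```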
Used your command to fetch the intermediate layer outputs, and I actually don't see NaN output at the last few layers during training, but I do see NaN output during testing, starting at self.top.inp.inp.inp.out
(Tensor("BiasAdd_7:0", shape=(?, 7, 7, 256), dtype=float32)).
I would expect that if the network were producing NaN results, it would have done so during training as well?
Found out something really peculiar. I downloaded tiny-yolo.weights from the link referenced on the YOLOv1 site, and found that the link actually points to tiny-yolov2 weights. This is evidenced by the successful load of the final convolution layer when I use the v2 tiny-yolo.cfg. The NaN starts right at that layer as well, so I'm going to try tracking down the correct tiny-yolov1 weights and train against them.
Yes, the official site of YOLO now provides YOLO9000 only. If you want older versions, tell me and I'll upload them.
If you could upload tiny-yolo-v1, that would be much appreciated.
Just so you know, when I try to load yolov1.weights, the walker asserts "Over-read". Not sure if you wish to maintain yolov1 loading anymore, but I thought I would bring it to your attention.
To be clear, there is v1.0 (without batch norm), v1.1 (with batch norm), and v2 (YOLO9000). Which one are you referring to?
It might be this
Just to update you on this: I'm training the weights you provided using v1.1/tiny-yolov1.cfg, with the 5-class modifications I made above. The loss is around 2.2, and the outputs are not really valid. Will try to keep it going for one more day before I give up :)
I had to disable the following assert at line 74 of loader.py to load tiny-yolov1.weights at all.
if walker.path is not None:
    #assert walker.offset == walker.size, \
    #'expect {} bytes, found {}'.format(
    #    walker.offset, walker.size)
    print('Successfully identified {} bytes'.format(
        walker.offset))
Training YOLO can be a daunting task, especially for those with limited computational resources. I encourage you to go a little further.
2.2 is a very familiar loss to me; it can indicate underfitting or too large a learning rate. I suggest going for a smaller learning rate to see if there is any progress. If not, then go for a deeper but much thinner net; see this post if you have not.
It's odd that the training loss for tiny-yolov1.weights is in the same 1.8-2.0 region, yet it actually makes sensible detections.
I do have a GTX 1070, so I'm doing a bit better than running purely on CPU. Will keep you posted tomorrow.
Make sure you are using Python 3, or convert the code appropriately, because there is a difference in integer/float division between Python 2 and Python 3 that can cause a consistent mislocation of bounding boxes.
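A minimal illustration of the division difference: normalized coordinates such as a box center divided by the grid size silently truncate to 0 under Python 2's integer division.

```python
# Python 3: / is always true division; Python 2 floored it for ints,
# which truncates normalized box coordinates like cx/grid to 0.
cx, grid = 3, 7
print(cx / grid)   # 0.42857... in Python 3 (Python 2 would print 0)
print(cx // grid)  # 0: explicit floor division in both versions
```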
Yeesh, I'm fairly certain I'm not using Python 3 at the moment. Will try that. In general, the bounding boxes seem to be very small, which may be caused by the small bounding box annotations in the Udacity dataset (sometimes they get below 5 pixels in width or height).
If that doesn't improve things, I'll move to Python 3.
It's not converging to the right solution :( The boxes show up at roughly the right place but the sizes are wrong.
I'll put the code up on my fork for anyone to investigate!
Overfitting did the trick!! Will post my results shortly. Thanks a lot for your help.
@thtrieu, here's my fork for training against the Udacity SDC dataset: https://github.com/y22ma/darkflow/tree/udacity
Udacity employs a different annotation format than PASCAL VOC, and I hacked the dataset.py script to load the Udacity annotations using my own function. How would you like this to be handled?
Could you please say more about the theory behind step 3?
What does overfitting on a small set (3~5 images) improve? Should that small training run start with the same parameters as the targeted training over the entire training set?
Hello there, I am really interested in using this library for training on my own datasets. I have some problems when trying to test a few images after training. Could you help me understand better how it works?
While testing I have the following output:
Parsing cfg/yolo-voc-1c.cfg
Loading None ...
Finished in 0.00013875961303710938s
Building net ...
Source | Train? | Layer description | Output size
-------+--------+----------------------------------+---------------
| | input | (?, 416, 416, 3)
Init | Yep! | conv 3x3p1_1 +bnorm leaky | (?, 416, 416, 32)
Load | Yep! | maxp 2x2p0_2 | (?, 208, 208, 32)
Init | Yep! | conv 3x3p1_1 +bnorm leaky | (?, 208, 208, 64)
Load | Yep! | maxp 2x2p0_2 | (?, 104, 104, 64)
Init | Yep! | conv 3x3p1_1 +bnorm leaky | (?, 104, 104, 128)
Init | Yep! | conv 1x1p0_1 +bnorm leaky | (?, 104, 104, 64)
Init | Yep! | conv 3x3p1_1 +bnorm leaky | (?, 104, 104, 128)
Load | Yep! | maxp 2x2p0_2 | (?, 52, 52, 128)
Init | Yep! | conv 3x3p1_1 +bnorm leaky | (?, 52, 52, 256)
Init | Yep! | conv 1x1p0_1 +bnorm leaky | (?, 52, 52, 128)
Init | Yep! | conv 3x3p1_1 +bnorm leaky | (?, 52, 52, 256)
Load | Yep! | maxp 2x2p0_2 | (?, 26, 26, 256)
Init | Yep! | conv 3x3p1_1 +bnorm leaky | (?, 26, 26, 512)
Init | Yep! | conv 1x1p0_1 +bnorm leaky | (?, 26, 26, 256)
Init | Yep! | conv 3x3p1_1 +bnorm leaky | (?, 26, 26, 512)
Init | Yep! | conv 1x1p0_1 +bnorm leaky | (?, 26, 26, 256)
Init | Yep! | conv 3x3p1_1 +bnorm leaky | (?, 26, 26, 512)
Load | Yep! | maxp 2x2p0_2 | (?, 13, 13, 512)
Init | Yep! | conv 3x3p1_1 +bnorm leaky | (?, 13, 13, 1024)
Init | Yep! | conv 1x1p0_1 +bnorm leaky | (?, 13, 13, 512)
Init | Yep! | conv 3x3p1_1 +bnorm leaky | (?, 13, 13, 1024)
Init | Yep! | conv 1x1p0_1 +bnorm leaky | (?, 13, 13, 512)
Init | Yep! | conv 3x3p1_1 +bnorm leaky | (?, 13, 13, 1024)
Init | Yep! | conv 3x3p1_1 +bnorm leaky | (?, 13, 13, 1024)
Init | Yep! | conv 3x3p1_1 +bnorm leaky | (?, 13, 13, 1024)
Load | Yep! | concat [16] | (?, 26, 26, 512)
Load | Yep! | local flatten 2x2 | (?, 13, 13, 2048)
Load | Yep! | concat [26, 24] | (?, 13, 13, 3072)
Init | Yep! | conv 3x3p1_1 +bnorm leaky | (?, 13, 13, 1024)
Init | Yep! | conv 1x1p0_1 linear | (?, 13, 13, 30)
-------+--------+----------------------------------+---------------
Running entirely on CPU
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE3 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
Loading from ./ckpt/yolo-voc-1c-4000
Finished in 6.8827056884765625s
Forwarding 3 inputs ...
Total time = 3.51149582862854s / 3 inps = 0.8543367688326967 ips
Post processing 3 inputs ...
Total time = 0.17760968208312988s / 3 inps = 16.890971059763825 ips
but on the testing images it detects nothing. Do you have any idea what's wrong?
In what format should the annotations be? Is XML required, or are other formats acceptable?
@eugtanchik Look into $DARKFLOW_ROOT/net/yolov2/test.py to print boxes.probs, and make sure your confidences are beyond the threshold.
Hi,
I have a CSV annotation file and I am using https://github.com/y22ma/darkflow/tree/udacity, but I get the error: Annotation directory not found ...
Please help me.
E:\Users\ZP\Desktop\Getdata>flow.py --model cfg/yolov2-tiny-voc.cfg --load bin/yolov2-tiny-voc.weights --savepb
Parsing ./cfg/yolov2-tiny-voc.cfg
Parsing cfg/yolov2-tiny-voc.cfg
Loading bin/yolov2-tiny-voc.weights ...
Successfully identified 63102560 bytes
Finished in 0.04497408866882324s
Traceback (most recent call last):
File "E:\Users\ZP\Desktop\Getdata\flow.py", line 6, in <module>
cliHandler(sys.argv)
File "D:\Program Files\Python36\lib\site-packages\darkflow\cli.py", line 26, in cliHandler
tfnet = TFNet(FLAGS)
File "D:\Program Files\Python36\lib\site-packages\darkflow\net\build.py", line 64, in __init__
self.framework = create_framework(*args)
File "D:\Program Files\Python36\lib\site-packages\darkflow\net\framework.py", line 59, in create_framework
return this(meta, FLAGS)
File "D:\Program Files\Python36\lib\site-packages\darkflow\net\framework.py", line 15, in __init__
self.constructor(meta, FLAGS)
File "D:\Program Files\Python36\lib\site-packages\darkflow\net\yolo\__init__.py", line 20, in constructor
misc.labels(meta, FLAGS) #We're not loading from a .pb so we do need to load the labels
File "D:\Program Files\Python36\lib\site-packages\darkflow\net\yolo\misc.py", line 36, in labels
with open(file, 'r') as f:
FileNotFoundError: [Errno 2] No such file or directory: 'labels.txt'