@libzzluo of course. My update to @eriklindernoren's YOLOv3 is available here:
https://github.com/ultralytics/xview-yolov3
I'm not sure how usable it will be to you out of the box, however, as I've significantly modified it not just to fix the convergence issue but also to adapt it to the xView 2018 Object Detection Challenge (rather than COCO).
This repo converges on training with the xView dataset, and was able to produce a 0.16 mAP in the challenge (the highest mAP was 0.27). I attached a picture of the losses, precision, and recall during training here. I'm going to be branching this repo into a new COCO-specific repo in the coming days; I'll post that when complete.
from pytorch-yolov3.
Can you please give more details about what still needs to be done for training to work? I am currently working on your training code and would like to implement what is missing.
Hi @eriklindernoren, I'm trying to train YOLOv3 on a small dataset (1.3k images) that is significantly different from COCO. I intend to use the darknet53.conv.74 pretrained weights provided by the pjreddie repo.
Since I have a small dataset, I figured training from the darknet53.conv.74 weights is the best approach. Now, darknet53.conv.74 has weights up to line 549 in yolo-obj.cfg (I followed the instructions for training on custom images).
I did the following.
- Loaded the weights at random using model.apply(weights_init_normal).
- Loaded and overrode the weights up to conv_73 using model.load_weights(opt.weights_path), where weights_path pointed to darknet53.conv.74. This way I had pretrained weights up to conv_73 and randomly initialized weights for the rest of the layers (the 3 YOLO layers).
- Now I train all the layers, keeping the lr at the default given for training in your code.
My training stdout shows conf dropping to 0.01 by the end of 30 epochs, and detect.py doesn't detect any objects even with a low conf threshold.
Am I training it the correct way? Or do I need to keep lr = 0 for the pretrained weights up to conv_73 and train only the remaining layers? Also, do I need to change the weight-saving method to use state_dict(), as suggested in some of the issues (#45)?
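For what it's worth, the "lr = 0 for the pretrained layers" idea amounts to freezing those parameters in PyTorch. A minimal sketch with a toy two-part model (the split into a "backbone" and a "head" here is a stand-in of mine, not this repo's actual code):

```python
import torch
import torch.nn as nn

# Toy stand-in: index 0 plays the "pretrained backbone" (conv_0..conv_73),
# index 1 plays the randomly initialized head (the YOLO layers).
model = nn.Sequential(
    nn.Sequential(nn.Linear(8, 16), nn.ReLU()),  # pretend-pretrained part
    nn.Linear(16, 4),                            # randomly initialized part
)

# Freezing is equivalent to lr = 0 for those layers: no gradient updates.
for p in model[0].parameters():
    p.requires_grad = False

# Build the optimizer over the trainable parameters only.
optimizer = torch.optim.SGD(
    [p for p in model.parameters() if p.requires_grad], lr=1e-3
)

frozen = sum(p.numel() for p in model.parameters() if not p.requires_grad)
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
```

Since only the head's parameters are handed to the optimizer, the frozen part stays identical no matter how long you train.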
Hi,
How long did you train the model? There is still work to be done on the training support. Data augmentation, as well as weight and learning rate decay, still needs to be added, among other things.
The confidence mask needs to be fixed. I will probably get around to fixing that pretty soon. The issue I keep running into is that the recall and precision seem to increase as I train the model, but during inference (using model.eval()) the model outputs junk. Maybe this can be attributed to the way the network is trained w.r.t. the confidence loss at the moment (or to the fact that I simply don't train it long enough), but I'm not sure. I'll keep experimenting. Feel free to do the same if you want. :)
I see a similar dissociation between good training results and poor inference later on.
I don't see an obvious culprit yet. One small change I noticed is that the loss criterion for the xy coordinates should be MSE rather than BCE.
I think the selection of the best anchor in build_targets has a bug in it as well, as it relies on IoUs computed from a common top-left corner. I fixed this and vectorized these operations in my cloned repo, and see slightly different results. I could submit a pull request if you'd like to take a look.
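For reference, a vectorized shape-only (width/height) IoU for anchor selection can look like the sketch below. This shows the general pattern only; whether boxes should be compared from a shared corner is exactly the point under discussion, and this is not the exact code from either repo:

```python
import torch

def wh_iou(wh1, wh2):
    """IoU of boxes specified only by width/height, i.e. as if all boxes
    shared a common top-left corner. wh1: (n,2), wh2: (m,2) -> (n,m)."""
    wh1 = wh1[:, None]                       # (n,1,2)
    wh2 = wh2[None]                          # (1,m,2)
    inter = torch.min(wh1, wh2).prod(2)      # intersection area
    return inter / (wh1.prod(2) + wh2.prod(2) - inter)

# Pick the best-matching anchor for each target by shape alone.
anchors = torch.tensor([[10., 13.], [33., 23.], [116., 90.]])
targets_wh = torch.tensor([[12., 14.], [100., 80.]])
best_anchor = wh_iou(targets_wh, anchors).argmax(1)  # one anchor index per target
```

Broadcasting does the pairwise comparison in one shot, which avoids the per-target Python loop.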
The code is very clean and concise though, nice work.
Hi @glenn-jocher, would you mind sharing your code? I ran into the same problem but can't fix it. Thanks!
THANKS!!! @glenn-jocher That's amazing!!! I will try the code on my repo.😊😊
Ah @feidongxi @libzzluo, yes, you can try and fork this right now, but it might be easier if you wait a day or two, as I'm going to back out all the COCO -> xView changes I made into a new repo so that it's back to the COCO-facing implementation Erik originally created. In any case, I'll post that when it's done.
@eriklindernoren I could submit a pull request at that point if you'd like to try to merge these changes, but first of course I need to verify that COCO trains to a proper mAP, which will take more time, perhaps a week.
@glenn-jocher That sounds great. Haven't had much time to work on this lately. Appreciate it!
@eriklindernoren got it. What happened when you tried to train originally?
I've validated your 0.58 mAP in my forked repo after realigning it to COCO (using Redmon's weights), and I've tentatively trained a few epochs from scratch with good convergence. But I'm realizing it will be a significant challenge to replicate Redmon's mAP after 160 epochs, simply due to a few missing details in his paper, such as his polynomial learning rate scheduler, his multi-scale training, and his augmentation strategies, which are touched on but not explicitly described anywhere I know of.
Some or all of these may be in the darknet repo; do you have any info on them? If not, I can take my best stab at it and see where the mAP lands. Theoretically my update is capable of all these things, including full augmentation, so it's just a matter of figuring out what settings to use.
It's also going to take longer than I thought. My 1080 Ti appears good for about 16 epochs per day (120k images per epoch), so I imagine ~10 days to get to 160 epochs. Is this close to what you saw on your end?
@glenn-jocher
I checked the output log file produced by the original darknet framework: the network randomly changes its input size between 320 and 608 (in steps of 32) every 10 epochs. What's more, if I set random = 1 in the original framework's yolov3.cfg, the GPU (a 1080 with 8 GB) sometimes runs out of memory.
I will also try to analyze the details in the source code and read the paper. If I discover anything, I will add it here. But the C source code is sooo complicated...
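The resizing behavior described above can be sketched like this (the 10-epoch period follows the log; darknet's true re-roll cadence is an assumption here):

```python
import random

# Legal input sizes with random = 1: 320..608 inclusive, step 32
# (all multiples of 32, matching the network stride).
SIZES = list(range(320, 608 + 1, 32))

def maybe_resize(epoch, current_size, period=10):
    """Re-roll the network input size at the start of every `period`-th
    epoch, otherwise keep the current one."""
    if epoch % period == 0:
        return random.choice(SIZES)
    return current_size
```

The occasional out-of-memory errors make sense with this scheme: a batch size tuned for 416 px can exceed 8 GB when the input jumps to 608 px.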
@libzzluo hmm, ok, thanks for the info. So we have the multi-scale information now; that was one of my missing links. I'm surprised it's every 10 epochs, as that only leaves room for 16 different input sizes over 160 epochs. Is random = 1 set by default? Yes, the C is hard to decipher; I have not looked at it yet.
The full list of unknowns I had is:
- multi-scale training, i.e. img_size = random.choice(range(10, 20)) * 32
- polynomial learning rate scheduler (for use with SGD)
- image colorspace augmentation (currently I have +/- 50% on the SV channels of HSV. This seems a bit excessive to my eye...)
- image spatial augmentation (currently I have +/- 20% translation and zoom, random left-right flips, and 10 deg rotation. Rotation is my own addition, can be set to 0. Labels augmented along with image.)
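On the polynomial learning rate point in the list above: darknet's "poly" policy decays the rate as lr = lr0 * (1 - batch / max_batches)^power. A hedged PyTorch sketch using LambdaLR; power = 4 is darknet's default for the poly policy, and note the released yolov3.cfg actually ships with a step policy, so treat these numbers as assumptions:

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 2)                      # stand-in model
lr0, power, max_batches = 1e-3, 4, 500200    # darknet-style hyperparameters

optimizer = torch.optim.SGD(model.parameters(), lr=lr0, momentum=0.9)
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer,
    # lr multiplier: (1 - batch/max_batches)^power, reaching 0 at max_batches
    lr_lambda=lambda batch: (1.0 - batch / max_batches) ** power,
)

lrs = []
for _ in range(3):                           # three dummy training steps
    optimizer.step()
    scheduler.step()
    lrs.append(optimizer.param_groups[0]["lr"])
```

Stepping the scheduler per batch (not per epoch) matches darknet, which counts in batches.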
I created examples with and without augmentation to illustrate the results. Both spatial and colorspace augmentation are active here. I toned down the rotation to +/- 5 deg and added random shear of +/- 3 deg (both of these can be disabled).
Augmented: (example image)
Standard: (example image)
I've finished creating my yolov3 repository:
https://github.com/ultralytics/yolov3
I've started training COCO using this repo, including the augmentation shown above. I'm running about 15 epochs per day so it will take a while to get to 160.
One concern I have, besides my stated unknowns above, is that I see significant improvement in precision and recall using CELoss in place of BCELoss for the classification term. This is contrary to the original YOLOv3 loss function, which uses BCE for both classification and objectness, and MSE on the bounding boxes. This could be an indicator of underlying problems elsewhere. Unfortunately, I suppose I have to wait another week until I reach epoch 160 to see the final effect of this change. For now, current progress is shown here:
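For anyone comparing the two class-loss formulations above, the difference in plain torch.nn terms (the toy shapes and seed are mine):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
num_classes, n = 80, 4
logits = torch.randn(n, num_classes)             # raw class scores
labels = torch.randint(0, num_classes, (n,))     # integer class ids

# YOLOv3 paper: independent per-class binary classifiers (BCE on logits),
# so classes are not forced to be mutually exclusive.
one_hot = torch.zeros(n, num_classes).scatter_(1, labels[:, None], 1.0)
bce = nn.BCEWithLogitsLoss()(logits, one_hot)

# Alternative discussed above: softmax cross-entropy over mutually
# exclusive classes (what darknet used up through YOLOv2).
ce = nn.CrossEntropyLoss()(logits, labels)
```

BCE treats each of the 80 classes as its own yes/no problem; CE normalizes across classes, which changes the gradients even when the targets are identical one-hot vectors.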
@glenn-jocher That's great. Could you make a PR with your additions?
Yes, I believe there is an issue with Darknet.save_weights (see #89). I have changed to saving and loading state dicts in master, and now I get the model to converge, as well as preserved performance when saving and loading the model.
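The state-dict approach looks roughly like this in plain PyTorch (the Linear model and path below are placeholders for the real Darknet model and checkpoint):

```python
import os
import tempfile
import torch
import torch.nn as nn

model = nn.Linear(8, 2)                      # placeholder for Darknet(cfg)

# Save only the parameter tensors, not the pickled module object.
path = os.path.join(tempfile.mkdtemp(), "yolov3_ckpt.pth")
torch.save(model.state_dict(), path)

# Restore into a freshly constructed model of the same architecture.
restored = nn.Linear(8, 2)
restored.load_state_dict(torch.load(path))
```

Unlike the custom binary .weights writer, this round-trips every registered parameter and buffer (including BatchNorm running stats) by name, so nothing is silently dropped or misaligned.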