Comments (6)

YunYang1994 commented on May 21, 2024

As stated in the original paper:

Each bounding box consists of 5 predictions: x, y, w, h, and confidence. The (x, y) coordinates represent the center of the box relative to the bounds of the grid cell. The width and height are predicted relative to the whole image.
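A minimal sketch of the encoding described in that quote, assuming an S x S grid laid over an image measured in pixels. The function name `encode_box` and the 448-pixel, S = 7 numbers are illustrative assumptions, not code from this repository:

```python
def encode_box(cx, cy, bw, bh, img_w, img_h, S):
    """Encode a pixel-space box (centre cx, cy; size bw, bh) for an S x S grid.

    Returns (col, row, x, y, w, h) where x, y are offsets inside the
    responsible cell (0..1) and w, h are fractions of the whole image.
    """
    cell_w, cell_h = img_w / S, img_h / S
    # Floor picks the cell that contains the centre; clamp for cx == img_w.
    col = min(int(cx // cell_w), S - 1)
    row = min(int(cy // cell_h), S - 1)
    x = cx / cell_w - col      # centre relative to the cell bounds
    y = cy / cell_h - row
    w = bw / img_w             # size relative to the whole image
    h = bh / img_h
    return col, row, x, y, w, h


# Hypothetical example: 448 x 448 image, 7 x 7 grid, so each cell is 64 px.
print(encode_box(100, 200, 112, 224, 448, 448, 7))
```

With floor, the offsets x and y always land in [0, 1), so each box has exactly one responsible cell.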

Thanks for your distinctive idea, but I can't agree with you. As we know, the round operation combines the ceil and floor operations: it rounds up or down depending on the fractional part. If you feed the neural network such an ambiguous operation, how can you expect it to learn more easily? In that case, I think it will be more difficult to make the network converge.

from tensorflow-yolov3.

ardianumam commented on May 21, 2024

Oh, I see. But what I mean by consistency is which grid cell we assign the object label to. Common sense says: (i) assign it to the grid cell that has the biggest IoU with the GT box, right? See another illustration here. If we use floor, both case (1) and case (2) are assigned to grid (0,0) for the object area, even though the GT boxes in cases (1) and (2) are almost one grid cell apart. If we use round, cases (1) and (2) are assigned to grids (1,1) and (0,0), respectively, which matches the common sense I mentioned in (i). So we can think of consistency as assigning the label to the grid that gives the least distance between box_center and grid_center. In this case, round gives a smaller distance than floor. We can also think of it this way: both round and floor lose some information from the 0.x fraction we remove. Using round, the most we lose is 0.5, while using floor it can be 0.99....
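To make the comparison concrete, here is a small sketch of both assignment rules. The helper names, the stride of 32 and the two box centres are hypothetical values chosen to mimic cases (1) and (2); they are not taken from the repository or from the illustration:

```python
import math


def assign_cell(center, stride, mode="floor"):
    """Return the (col, row) grid index for a box centre given in pixels."""
    gx, gy = center[0] / stride, center[1] / stride
    if mode == "floor":
        return math.floor(gx), math.floor(gy)
    # Round half up; Python's built-in round() rounds halves to even.
    return math.floor(gx + 0.5), math.floor(gy + 0.5)


def discarded_offset(center, stride, mode="floor"):
    """Return the per-axis fraction that is dropped by the chosen rule."""
    col, row = assign_cell(center, stride, mode)
    return abs(center[0] / stride - col), abs(center[1] / stride - row)


# Assumed centres: case (1) near the far corner of cell (0,0), case (2) near its origin.
case_1, case_2 = (30.4, 30.4), (1.6, 1.6)
for mode in ("floor", "round"):
    print(mode, assign_cell(case_1, 32, mode), assign_cell(case_2, 32, mode))
```

Under these assumptions, floor puts both centres in (0,0) and drops about 0.95 of a cell for case (1), while round sends case (1) to (1,1) and drops only about 0.05.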

Anyway, have you ever run this code to train on the MS COCO dataset from scratch (without a pre-trained network)? I wonder how many days are needed. I'm currently still running this code to train on MS COCO from scratch. It is now at epoch 4 (3 days). It seems to be converging, but it looks like it needs a lot of epochs, i.e., a lot of days (the original paper states 160 epochs).

YunYang1994 commented on May 21, 2024

Oh, I got it. Your idea is very impressive! But YOLO is a regression problem, which means the regression target must be unambiguous. So we need one particular grid cell location to act as the regressor.

As for training on the MS COCO dataset from scratch, I have not done it yet. I would appreciate it very much if you could share your results with us.

ardianumam commented on May 21, 2024

Yes, sure, I can share the result later. It is still at epoch 6; the recall is still 0.0x, but the precision is already high, almost one.

Btw, do you know why running this training code constantly increases RAM usage? If the code keeps running, it eventually runs out of RAM and the process gets killed.

ardianumam commented on May 21, 2024

Good news!
I want to share the cause of the memory increase in the train.py code (you have already deleted it from this repository). The root cause is this part of the code:

_, _, _, summary = sess.run([tf.assign(rec_tensor, rec),
                            tf.assign(prec_tensor, prec),
                            tf.assign(mAP_tensor, mAP), write_op], feed_dict={is_training:True})

Calling tf.assign inside the training loop adds new ops to the graph on every iteration, so the graph keeps growing. So I replaced those three tf.assign calls with placeholders and pass rec, prec and mAP to them through feed_dict. Training is also faster afterward.
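A rough sketch of the placeholder approach, written against the `tf.compat.v1` API. The placeholder names, the summary tags and the dummy metric values are my assumptions for illustration; the model, the `is_training` placeholder and the file writer from train.py are left out, so this is not the exact code from the repository:

```python
import tensorflow as tf

tf1 = tf.compat.v1

graph = tf.Graph()
with graph.as_default():
    # Build the metric inputs and the summary op once, before the loop.
    rec_ph = tf1.placeholder(tf.float32, shape=[], name="rec_in")
    prec_ph = tf1.placeholder(tf.float32, shape=[], name="prec_in")
    mAP_ph = tf1.placeholder(tf.float32, shape=[], name="mAP_in")
    tf1.summary.scalar("recall", rec_ph)
    tf1.summary.scalar("precision", prec_ph)
    tf1.summary.scalar("mAP", mAP_ph)
    write_op = tf1.summary.merge_all()

# Optional guard: after this, any attempt to add an op raises an error.
graph.finalize()

with tf1.Session(graph=graph) as sess:
    ops_before = len(graph.get_operations())
    for step in range(3):
        # Dummy values standing in for the metrics computed each step.
        rec, prec, mAP = 0.1 * step, 0.9, 0.05 * step
        summary = sess.run(write_op, feed_dict={rec_ph: rec,
                                                prec_ph: prec,
                                                mAP_ph: mAP})
    ops_after = len(graph.get_operations())

print(ops_before == ops_after)  # the graph did not grow inside the loop
```

Calling `graph.finalize()` before the loop is a cheap way to catch this kind of leak early, because any op created inside the loop then fails immediately instead of silently growing the graph.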

dodogoffy commented on May 21, 2024

Can you share the code? Thanks a lot!
