Comments (3)
Hi, I think using the feature mean is not a bad idea. However, a more important thing to check is whether the video frames contain clear objects. VC R-CNN is built on the Faster R-CNN framework, which extracts object features based on detected or given bounding boxes. If a video frame is not very clear or has few objects, the features extracted by Faster R-CNN may be trivial. (Maybe you can add a bounding box the size of the whole image to make sure the whole-frame feature is extracted.)
Another thing is that the distribution of your frame images may be quite different from that of the training samples used for the pretrained VC R-CNN (e.g., MSCOCO).
Is the I3D model you used a pretrained model, meaning you only use it to extract features? Or do you need to train the I3D model during training?
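To make the whole-image box and the feature mean concrete, here is a minimal sketch. It assumes boxes are stored as `[x1, y1, x2, y2]` in inclusive pixel coordinates and that per-box features arrive as an `(N, D)` array; both are assumptions, so check them against your own bounding-box file. The helper names `add_whole_frame_box` and `mean_frame_feature` are hypothetical and not part of the VC R-CNN code.

```python
import numpy as np


def add_whole_frame_box(boxes, frame_width, frame_height):
    """Append one box covering the whole frame to an (N, 4) array of boxes.

    Assumption: boxes are [x1, y1, x2, y2] in inclusive pixel coordinates.
    """
    boxes = np.asarray(boxes, dtype=np.float32).reshape(-1, 4)
    whole = np.array(
        [[0.0, 0.0, frame_width - 1, frame_height - 1]], dtype=np.float32
    )
    return np.concatenate([boxes, whole], axis=0)


def mean_frame_feature(box_features):
    """Average (N, D) per-box features into one (D,) frame-level feature."""
    box_features = np.asarray(box_features, dtype=np.float32)
    return box_features.mean(axis=0)
```

Because the whole-frame box is appended even when the detector returns nothing, every frame ends up with at least one box, so the mean is always defined.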
from vc-r-cnn.
Hi, thanks a lot for your reply.
> Hi, I think using the feature mean is not a bad idea. However, a more important thing to check is whether the video frames contain clear objects. VC R-CNN is built on the Faster R-CNN framework, which extracts object features based on detected or given bounding boxes. If a video frame is not very clear or has few objects, the features extracted by Faster R-CNN may be trivial. (Maybe you can add a bounding box the size of the whole image to make sure the whole-frame feature is extracted.)
First, I am using YOLOv5 to extract bounding boxes from the video frames and then feeding the bounding-box coordinates to VC R-CNN. Do you think using YOLOv5 is a bad idea? I opted for YOLO because it is very fast, which is especially important for videos that contain a huge number of frames.
Also, by adding a bounding box equal to the whole image size, do you mean adding it to every frame regardless of how many objects were detected, or only to frames where no objects or very few objects were detected?
> Another thing is that the distribution of your frame images may be quite different from that of the training samples used for the pretrained VC R-CNN (e.g., MSCOCO).
Do you suggest retraining the whole VC R-CNN architecture in this case for my custom dataset?
> Is the I3D model you used a pretrained model, meaning you only use it to extract features? Or do you need to train the I3D model during training?
I am not training the I3D model. I am using a pretrained one just for feature extraction from videos.
Hi,
- YOLOv5 should be fine for extracting the bounding boxes.
- For the whole-image bounding box, I think either of the two options you mentioned is fine.
- Yes, if you have data (annotations) to fine-tune VC R-CNN on your own custom dataset, that is the best choice. If you don't have annotations, you can just use the pretrained VC R-CNN model.
If you have any other questions, feel free to ask me. Thanks.
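The two options for the whole-image box (add it to every frame, or only to frames with few detections) could be sketched roughly as below. The function name `boxes_for_frame`, the `min_boxes` threshold, and the `[x1, y1, x2, y2]` inclusive pixel format are all assumptions made for illustration; pick the threshold that suits your videos.

```python
def boxes_for_frame(detections, frame_width, frame_height,
                    min_boxes=3, always_add=False):
    """Return the detector boxes, optionally plus one whole-frame box.

    always_add=True  -> option 1: add the whole-frame box to every frame.
    always_add=False -> option 2: add it only when fewer than `min_boxes`
                        boxes were detected (including zero).
    Assumption: boxes are [x1, y1, x2, y2] in inclusive pixel coordinates.
    """
    whole = [0.0, 0.0, float(frame_width - 1), float(frame_height - 1)]
    boxes = [[float(v) for v in box] for box in detections]
    if always_add or len(boxes) < min_boxes:
        boxes.append(whole)
    return boxes
```

For example, `boxes_for_frame([], 640, 480)` returns only the whole-frame box, so a frame with no detections still yields one feature.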