
dicosa's People

Contributors

jpthu17


dicosa's Issues

Strange results occur when reproducing code on a GPU

I'm getting strange results when running the code on an RTX 3090 GPU. I first used the preprocessing script from CLIP4Clip to compress the videos to 3 fps:
https://github.com/ArrowLuo/CLIP4Clip/blob/master/preprocess/compress_video.py
and then froze the CLIP model with the following code:

for param in self.clip.parameters():
    param.requires_grad = False  # not updated by gradient

The training log on MSR-VTT is as follows:
[2024-05-12 08:25:31,329 tvr 320 INFO]: eta: 4:50:08 epoch: 2/5 iteration: 3800/7030 time: 1.3135 (5.3897) data: 0.4849 (4.5103) loss: 6.1797 (6.1809) E_loss: 6.1559 (6.1561) M_loss: 0.0250 (0.0248) lr: logit_scale: 100.00 max mem: 8443
[2024-05-12 08:28:30,665 tvr 320 INFO]: eta: 4:44:24 epoch: 2/5 iteration: 3850/7030 time: 1.3637 (5.3663) data: 0.4905 (4.4867) loss: 6.1970 (6.1808) E_loss: 6.1726 (6.1559) M_loss: 0.0248 (0.0248) lr: logit_scale: 100.00 max mem: 8443
[2024-05-12 08:31:26,774 tvr 320 INFO]: eta: 4:38:42 epoch: 2/5 iteration: 3900/7030 time: 1.2943 (5.3427) data: 0.4724 (4.4631) loss: 6.1943 (6.1810) E_loss: 6.1701 (6.1561) M_loss: 0.0245 (0.0248) lr: logit_scale: 100.00 max mem: 8443
[2024-05-12 08:31:26,780 tvr 485 INFO]: [start] extract train feature
[2024-05-12 08:35:03,700 tvr 505 INFO]: [finish] extract train feature
[2024-05-12 08:35:03,700 tvr 546 INFO]: [start] extract text+video feature
[2024-05-12 08:35:33,605 tvr 573 INFO]: [finish] extract text+video feature
[2024-05-12 08:35:33,605 tvr 577 INFO]: 1000 1000 1000 1000
[2024-05-12 08:35:33,605 tvr 581 INFO]: [start] calculate the similarity
[2024-05-12 08:35:33,605 tvr 387 INFO]: [finish] map to main gpu
[2024-05-12 08:35:33,609 tvr 401 INFO]: [finish] map to main gpu
[2024-05-12 08:36:08,858 tvr 584 INFO]: [end] calculate the similarity
[2024-05-12 08:36:08,858 tvr 587 INFO]: [start] compute_metrics
[2024-05-12 08:36:08,858 tvr 613 INFO]: sim matrix size: 1000, 1000
[2024-05-12 08:36:08,878 tvr 616 INFO]: Length-T: 1000, Length-V:1000
[2024-05-12 08:36:08,878 tvr 618 INFO]: [end] compute_metrics
[2024-05-12 08:36:08,878 tvr 621 INFO]: time profile: feat 29.9s match 35.25275s metrics 0.01992s
[2024-05-12 08:36:08,878 tvr 623 INFO]: Text-to-Video: R@1: 0.5 - R@5: 1.1 - R@10: 1.4 - R@50: 4.4 - Median R: 798.0 - Mean R: 683.1
[2024-05-12 08:36:08,878 tvr 625 INFO]: Video-to-Text: R@1: 0.6 - R@5: 1.1 - R@10: 1.7 - R@50: 4.6 - Median R: 810.5 - Mean R: 686.7
[2024-05-12 08:36:09,399 tvr 239 INFO]: Model saved to /root/autodl-tmp/outputs/pytorch_model.bin.step3900.2
[2024-05-12 08:36:10,072 tvr 239 INFO]: Model saved to /root/autodl-tmp/outputs/pytorch_model.bin.best.2
Can you give me some suggestions for dealing with this problem? Thanks!
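For reference, a quick way to sanity-check the freeze is to count trainable versus frozen parameters after setting requires_grad. This is a self-contained sketch with placeholder modules (the real model's CLIP backbone and heads differ):

```python
import torch.nn as nn

# Hypothetical stand-in for the retrieval model: a "CLIP" backbone to be
# frozen plus a small trainable head. Layer sizes are illustrative.
class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.clip = nn.Linear(512, 512)  # placeholder for the CLIP backbone
        self.head = nn.Linear(512, 256)  # placeholder for the trainable layers

model = Model()
for param in model.clip.parameters():
    param.requires_grad = False  # not updated by gradient

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
frozen = sum(p.numel() for p in model.parameters() if not p.requires_grad)
print(f"trainable: {trainable}, frozen: {frozen}")
```

If almost no parameters remain trainable, near-random recall like the log above would be expected, since little useful signal can be learned.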

Questions about the inference stage

Thank you for sharing such great work!

You concatenate the latent factors of the text and video subspaces and compute similarity through an MLP, which means that during testing this operation must also be performed on the query paired with every candidate. Compared with the cosine similarity used in many previous methods, this does not seem efficient. I would like to hear your opinion on this issue.
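To make the efficiency concern concrete, here is a minimal sketch (dimensions and the MLP are illustrative stand-ins, not the paper's actual architecture): with cosine similarity, candidate embeddings can be encoded once and scoring a query is a single matrix product, whereas a pairwise MLP over concatenated features must process every (query, candidate) pair.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

Nq, Nv, d = 8, 100, 32                 # queries, candidates, feature dim
text = torch.randn(Nq, d)
video = torch.randn(Nv, d)

# Cosine similarity: one matmul over pre-normalized embeddings.
cos_sim = F.normalize(text, dim=-1) @ F.normalize(video, dim=-1).T

# Pairwise MLP: every (query, candidate) concatenation goes through the net.
mlp = nn.Sequential(nn.Linear(2 * d, d), nn.ReLU(), nn.Linear(d, 1))
pairs = torch.cat([text[:, None, :].expand(Nq, Nv, d),
                   video[None, :, :].expand(Nq, Nv, d)], dim=-1)
mlp_sim = mlp(pairs).squeeze(-1)       # (Nq, Nv): cost scales with Nq * Nv
```

The cosine path also lets candidate embeddings be indexed offline, which the pairwise MLP path does not allow.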

Training on one GPU

Hello,
Thank you for the repo, and well done on the project.

I have a question about whether, and how, it is possible to train on a single GPU.

qb_norm issues

Hello author, I found that qb_norm is used during inference in the code, but it does not seem to be mentioned in the paper. Could you clarify?
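For context, qb_norm presumably refers to querybank normalisation (QB-Norm) from "Cross Modal Retrieval with Querybank Normalisation". A minimal sketch of its inverted-softmax core, omitting the dynamic top-k gating of the full method (the beta value and shapes here are illustrative):

```python
import numpy as np

def inverted_softmax(test_sims, bank_sims, beta=20.0):
    """Renormalize each gallery column of the test similarity matrix by how
    strongly a bank of held-out queries activates that gallery item."""
    denom = np.exp(beta * bank_sims).sum(axis=0)   # (n_gallery,)
    return np.exp(beta * test_sims) / denom        # broadcast over test queries

rng = np.random.default_rng(0)
test = rng.normal(size=(4, 6))    # 4 test queries x 6 gallery items
bank = rng.normal(size=(10, 6))   # 10 querybank queries
normed = inverted_softmax(test, bank)
```

Hub gallery items that score highly against many bank queries get a large denominator, which suppresses their scores at test time.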

MSVD checkpoint

Hi,

Congrats on your amazing work! Could you please upload the MSVD checkpoint and the steps for inference?

Discrepancy between paper and code regarding attention pooling temperature

Hello,

While going through your paper and code, I noticed a discrepancy regarding the temperature parameter used in attention pooling. In the paper, it's mentioned that the softmax temperature is set to 0.01. However, in the code, the default temperature value appears to be 5, and in practice it seems to be set to 3.

Could you please clarify what the correct value of the temperature should be? It would be greatly appreciated if you could provide an explanation for the differences between these values.
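For reference, the two common temperature conventions give very different sharpness, which may be the source of the apparent mismatch (the scores below are illustrative):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())  # subtract max for numerical stability
    return e / e.sum()

scores = np.array([1.0, 0.8, 0.2])

# Convention A: divide by the temperature -- tau = 0.01 is nearly one-hot.
sharp = softmax(scores / 0.01)

# Convention B: multiply by a scale -- a factor of 5 is only moderately peaked.
moderate = softmax(scores * 5.0)
```

A paper quoting tau = 0.01 under convention A and code using a multiplicative scale of 5 under convention B are not directly comparable, so it is worth checking which convention each side uses.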

Thanks in advance for your time and assistance.

Best regards
