efficient-prompt's People

Contributors

ju-chen


efficient-prompt's Issues

Questions on calculating the accuracy in validation phase

In the val.py file:

```python
sim_ensemble = torch.zeros(0).to(device)
test_num = int(len(similarity) / valEnsemble)
# Stack the similarity scores of the valEnsemble chunks into (valEnsemble, test_num, num_classes).
for enb in range(valEnsemble):
    sim_ensemble = torch.cat([sim_ensemble, similarity[enb * test_num:enb * test_num + test_num].unsqueeze(0)], dim=0)
target_final = targets[:test_num]

# Average over the ensemble dimension, then score top-1 / top-5 against the first chunk's targets.
sim_final = torch.mean(sim_ensemble, 0)
values, indices = sim_final.topk(5)
top1 = (indices[:, 0] == target_final).tolist()
top5 = ((indices == einops.repeat(target_final, 'b -> b k', k=5)).sum(-1)).tolist()

top1ACC = np.array(top1).sum() / len(top1)
top5ACC = np.array(top5).sum() / len(top5)
```

I see that you split the validation set into `valEnsemble` parts, average the similarity scores across those parts, and then compute the accuracy on that averaged part only, not on the full `0:len(similarity)` range. Could you explain why you do this? To my knowledge, wouldn't it be more convincing to compute the accuracy over the whole `0:len(similarity)` range?
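
For reference, this is roughly what I had in mind when suggesting accuracy over the full range. The function below is only a sketch of mine, not code from the repo, and it runs on random toy inputs just to show the shapes involved:

```python
import einops
import numpy as np
import torch

def full_range_accuracy(similarity: torch.Tensor, targets: torch.Tensor):
    """Top-1 / top-5 accuracy over every row of `similarity` (N x C) against `targets` (N,)."""
    _, indices = similarity.topk(5, dim=-1)
    top1 = (indices[:, 0] == targets).tolist()
    top5 = (indices == einops.repeat(targets, 'b -> b k', k=5)).sum(-1).tolist()
    return np.array(top1).sum() / len(top1), np.array(top5).sum() / len(top5)

# Toy usage with random scores, only to illustrate the shapes involved.
sim = torch.randn(20, 51)            # 20 samples, 51 classes
tgt = torch.randint(0, 51, (20,))
print(full_range_accuracy(sim, tgt))
```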

I am looking forward to your reply!
Thanks!

Query regarding the testing setup for Action Localization

In the supplementary file, it is written that for action localization on the ActivityNet dataset, each snippet consists of 768 frames. This is an extremely dense feature, because the usual benchmarked approaches use 8 or 16 frames! Is this claim correct? If yes, could you at least share these features, if not the code?
[Screenshot from 2021-12-17 01-46-38]

About feature extraction

Thank you for your excellent work! But I have some questions about the feature extraction.
Could you please upload the feature extraction code? I want to reproduce the work on my own dataset.
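
In the meantime, this is the kind of per-frame CLIP feature extraction I am assuming. The model variant (ViT-B/16), the frame stride, and the decord-based decoding are my own guesses, not the authors' settings:

```python
# Sketch of per-frame CLIP feature extraction; assumes the OpenAI CLIP package
# (pip install git+https://github.com/openai/CLIP.git) and decord for video decoding.
import clip
import numpy as np
import torch
from decord import VideoReader
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/16", device=device)

def extract_frame_features(video_path: str, stride: int = 16) -> np.ndarray:
    """Encode every `stride`-th frame with the CLIP image encoder -> (T, 512) array."""
    vr = VideoReader(video_path)
    feats = []
    with torch.no_grad():
        for i in range(0, len(vr), stride):
            frame = Image.fromarray(vr[i].asnumpy())
            image = preprocess(frame).unsqueeze(0).to(device)
            feats.append(model.encode_image(image).float().cpu())
    return torch.cat(feats, dim=0).numpy()

# np.save("my_video.npy", extract_frame_features("my_video.mp4"))
```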
Thanks a lot!

About the K400 checkpoint

Hello, thank you for releasing the code. I would like to ask if you plan to release the model checkpoint trained on the Kinetics400 dataset.

About the K700 open-set split

Thanks for your wonderful work! I would like to know the details of your K700 open-set split.

Is it correct that videos from 400 classes of the original K700 training split are used for open-set training, and videos from the other 300 classes of the original K700 val split are used for open-set validation?

Or do you directly gather all K700 videos from both the training and val lists, pick 400 classes for training, and use the remaining videos from the other 300 classes for validation?
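
To make the two readings concrete, here is a toy sketch of both. The class names and video lists are placeholders I made up, not the actual K700 metadata:

```python
from collections import namedtuple

Video = namedtuple("Video", "path label")

# Toy stand-ins for the real K700 lists (placeholders, not the repo's data).
all_classes = [f"class_{i}" for i in range(700)]
seen, unseen = set(all_classes[:400]), set(all_classes[400:])
k700_train_list = [Video(f"train_{i}.mp4", all_classes[i % 700]) for i in range(1400)]
k700_val_list = [Video(f"val_{i}.mp4", all_classes[i % 700]) for i in range(700)]

# Reading 1: open-set training videos come only from the original train split (seen classes),
#            open-set validation videos only from the original val split (unseen classes).
train_r1 = [v for v in k700_train_list if v.label in seen]
val_r1 = [v for v in k700_val_list if v.label in unseen]

# Reading 2: pool the original train+val lists first, then split purely by class.
pooled = k700_train_list + k700_val_list
train_r2 = [v for v in pooled if v.label in seen]
val_r2 = [v for v in pooled if v.label in unseen]
```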

About CLIP feature extraction?

There appears to be a discrepancy in the extracted ActivityNet features. The features you obtained from the video __c8enCfzqw have dimension (4130, 512), while the features in ANet_CLIP for v___c8enCfzqw.npy have dimension (864, 512). This suggests a difference in processing.
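
For reference, this is how I compared the two feature files; the paths are placeholders for wherever the files live locally:

```python
import numpy as np

# Placeholder paths; adjust to wherever the two feature sets are stored locally.
feats_a = np.load("extracted/v___c8enCfzqw.npy")   # the (4130, 512) features mentioned above
feats_b = np.load("ANet_CLIP/v___c8enCfzqw.npy")   # the (864, 512) features from ANet_CLIP

print(feats_a.shape, feats_b.shape)
# Ratio of temporal lengths, to see how much denser one sampling is than the other.
print(feats_a.shape[0] / feats_b.shape[0])
```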
So I am confused about this. Could you tell me how you processed ActivityNet? Thanks!

What are the training epochs for the new parameters?

Thanks for your great work and the upcoming code. Pretraining on images and fine-tuning on video is a standard pipeline. Usually, only a few new parameters are introduced in the fine-tuning process, and, as we all know, a well-trained pre-trained model can reduce the number of fine-tuning epochs. However, this work introduces more new parameters than others (apologies if I am wrong), even though the pre-trained model is very strong. I am just wondering: does this method need more fine-tuning epochs?

About few-shot hyperparameters for HMDB51

Hi,

Thank you for your solid work and code releases.
I'm trying to reproduce the results of the few-shot settings, but I'm not able to get the same result for the HMDB51 dataset as reported in the paper. I wonder if the hyperparameter setup for HMDB51 differs greatly from that for UCF101.

Thanks!

Questions on action localization

Hi,

Thank you for sharing your fantastic work. I have a few questions related to the action localization application.

  1. In your work you mention that you follow a two-stage pipeline: class-agnostic localization, then action classification (my understanding of this pipeline is sketched after this list). In the class-agnostic proposal generation step, it is understood that a generic detector is trained from scratch using CLIP image features (instead of I3D features). Could you please clarify whether the detector is trained in a class-agnostic way in your implementation, or whether the class predictions are simply discarded?

  2. In step 1, it is mentioned in the supplementary that you "utilise three parallel prediction heads to determine" the localization. Can you please explain why three heads are used?

  3. In Section 3.2 (training loss), it is explained that for the localization task the mean pool of dense features from the stage-1 proposals is used to obtain v_i. So in the second step, action classification, is the model (prompting) trained for classification? If so, is the training data the original dataset videos sampled at 10 fps with a length of 256 frames (following AFSD), together with the corresponding action class labels?

  4. My last question: could you provide some insight into why the off-the-shelf detector is trained on the CLIP image features instead of purely using off-the-shelf detections?
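
For context, the sketch below is my current mental model of the two-stage pipeline referred to in question 1. All names, shapes, and the cosine-similarity scoring are my own assumptions, not the released implementation:

```python
import torch

def localize_and_classify(frame_feats: torch.Tensor,
                          proposal_net,              # stage 1: class-agnostic proposal generator
                          text_feats: torch.Tensor): # prompted class embeddings from the CLIP text encoder
    """frame_feats: (T, 512) CLIP image features; text_feats: (num_classes, 512)."""
    # Stage 1: class-agnostic proposals over the feature sequence, e.g. a list of (start, end) indices.
    proposals = proposal_net(frame_feats)

    results = []
    for start, end in proposals:
        # Stage 2: mean-pool the dense features inside each proposal to obtain v_i,
        # then classify it by cosine similarity to the prompted text embeddings.
        v_i = frame_feats[start:end].mean(dim=0, keepdim=True)   # (1, 512)
        v_i = v_i / v_i.norm(dim=-1, keepdim=True)
        t = text_feats / text_feats.norm(dim=-1, keepdim=True)
        scores = v_i @ t.t()                                     # (1, num_classes)
        results.append(((start, end), scores.argmax(dim=-1).item()))
    return results

# Toy usage: random features, a dummy proposal generator, random "text" embeddings.
feats = torch.randn(256, 512)
dummy_proposals = lambda f: [(0, 64), (100, 180)]
texts = torch.randn(200, 512)
print(localize_and_classify(feats, dummy_proposals, texts))
```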

I would really appreciate it if you could provide answers at your earliest convenience.
Thank you.

Intersection between test and train categories

Thank you for open-sourcing the code. For zero-shot temporal action recognition, I used the HMDB51 dataset and read train_split01.txt and test_split01.txt, which I assume are the training set and the test set; since this is a zero-shot task, the action categories should not overlap between them. But why are the action categories duplicated across these two files? Looking forward to your reply.
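
For what it's worth, this is how I checked the overlap. I am assuming each line of the split files ends with the class label, which may not match the actual format:

```python
# Sketch of the overlap check; the "<video_path> <class_label>" line layout is my assumption.
def read_classes(split_file: str) -> set:
    with open(split_file) as f:
        return {line.split()[-1] for line in f if line.strip()}

train_classes = read_classes("train_split01.txt")
test_classes = read_classes("test_split01.txt")
print(sorted(train_classes & test_classes))   # expected to be empty for a zero-shot split
```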

About K-400 Performance

Hello, thank you for kindly releasing the code!
I'm trying to reproduce the closed-set action recognition results (Table 1), but I fail to get comparable performance either with or without temporal modeling (the results are actually much worse than reported) :(
Can you share the extracted K-400 features for a further check? @ju-chen
