
psp_cvpr_2021's Introduction

PyTorch implementation of the CVPR-2021 paper: Positive Sample Propagation along the Audio-Visual Event Line

Audio-Visual Event (AVE) Localization task

AVE localization aims to find the video segments that contain an audio-visual event and to classify the event's category. An audio-visual event is both audible and visible: the sound source must appear in the visual frames (visible) while the sound it makes is present in the audio track (audible).

AVE localization

Our Framework

framework

Data preparation

The AVE dataset and the extracted audio and visual features can be downloaded from https://github.com/YapengTian/AVE-ECCV18. Other preprocessed files used in this repository can be downloaded from here. All the required files are listed below and should be placed in the data folder; a quick way to inspect them is sketched after the list.


audio_feature.h5  visual_feature.h5  audio_feature_noisy.h5 visual_feature_noisy.h5
right_label.h5  prob_label.h5  labels_noisy.h5  mil_labels.h5
train_order.h5  val_order.h5  test_order.h5
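
After downloading, a quick sanity check can be run on the feature files with h5py. This is only a minimal sketch: the internal dataset key names are assumptions, so print the keys first and adjust accordingly.

import h5py

# Minimal sanity check on one of the downloaded feature files.
# The internal dataset key is an assumption -- inspect f.keys() to confirm.
with h5py.File('data/audio_feature.h5', 'r') as f:
    keys = list(f.keys())
    print(keys)            # confirm the actual dataset name
    audio = f[keys[0]][:]  # load the feature array into memory
    print(audio.shape)     # expected: (num_videos, 10 segments, feature_dim)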

Fully supervised setting

  • Train:

CUDA_VISIBLE_DEVICES=0 python fully_supervised_main.py --model_name PSP --threshold=0.099 --train

  • Test:

CUDA_VISIBLE_DEVICES=0 python fully_supervised_main.py --model_name PSP --threshold=0.099 --trained_model_path ./model/PSP_fully.pt

Weakly supervised setting

  • Train:

CUDA_VISIBLE_DEVICES=0 python weakly_supervised_main.py --model_name PSP --threshold=0.095 --train

  • Test:

CUDA_VISIBLE_DEVICES=0 python weakly_supervised_main.py --model_name PSP --threshold=0.095 --trained_model_path ./model/PSP_weakly.pt

Note: The pre-trained models can be downloaded here and should be placed in the model folder. If you would like to train from scratch for both settings, you may make some adjustments to further improve the performance (e.g., try another threshold value, choose a different initialization method, and so on).
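
For instance, a simple threshold sweep can be scripted around the training command shown above. This is only a sketch; it assumes the same flags as the fully supervised command and launches one full training run per value.

import os
import subprocess

# Train with a few threshold values around the default 0.099 used in the
# fully supervised setting; each call launches one training run.
env = {**os.environ, 'CUDA_VISIBLE_DEVICES': '0'}
for thr in ['0.090', '0.095', '0.099', '0.105']:
    subprocess.run(
        ['python', 'fully_supervised_main.py', '--model_name', 'PSP',
         '--threshold=' + thr, '--train'],
        check=True, env=env)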

Citation

If our paper is useful for your research, please consider citing it:

@InProceedings{zhou2021positive,
    title={Positive Sample Propagation along the Audio-Visual Event Line},
    author={Zhou, Jinxing and Zheng, Liang and Zhong, Yiran and Hao, Shijie and Wang, Meng},
    booktitle = {Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
    year={2021},
}

Acknowledgements

This code builds on YapengTian/AVE-ECCV18; thanks for their great work. We also hope our source code can help people who are interested in our work or in related audio-visual problems. If you have any questions about our paper or the code, please feel free to open an issue or contact us by email.

psp_cvpr_2021's People

Contributors

dependabot[bot], jasongief


psp_cvpr_2021's Issues

Why does right_labels.h5 not contain all the video labels?

Hello, I would like to ask why right_labels.h5 only seems to contain per-second labels for the first category (Church bell): only the first 188 rows indicate whether an event occurs in each second, and all subsequent rows are 0. Logically, it should contain the per-second labels of all videos over their 10 seconds, with 1 where an event occurs and 0 for background.
Sorry to keep bothering you, and thank you very much for your contribution.

Some questions about the paper

I also read the original 'AVE' paper. I don't understand why you say it tries to automatically filter out unpaired samples. How should I understand the meaning of 'filter out'?

Why do some Linear layers have bias set to False?

Hi,

Thanks for your code and congratulations on the paper acceptance. I was going through your code and your model definition, and I have two questions:

  1. Why do some Linear layers have bias set to False?
  2. You init linear layers with Xavier Uniform initialization. Does initialization with other methods influence your result?

Thanks!
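
For reference, the pattern these questions refer to looks roughly like the following generic PyTorch sketch (the dimensions are placeholders, not the repository's exact layers); swapping in a different initialization method is a one-line change.

import torch.nn as nn

# Generic illustration (dimensions are placeholders): a projection layer
# defined without a bias term and initialized with Xavier uniform weights.
proj = nn.Linear(512, 256, bias=False)
nn.init.xavier_uniform_(proj.weight)

# Trying another initialization method is a one-line swap, e.g.:
# nn.init.kaiming_uniform_(proj.weight, nonlinearity='relu')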

A question about the results

Hi, nice work!

Sorry to bother you. When I ran your code, it reported two results (val and test). I read your paper, but I did not see which of them is reported. So, which result did you use in your paper: the val or the test set?

Thank you very much!

weakly_model.py

Hi, I think "v_dim=hidden_dim" should be changed to "v_dim=v_dim" on line 197.

training time and environment

Dear authors,

Could you please kindly let me know how many GPUs you used for training the model, and how long training takes? Do you use early stopping or a fixed number of training epochs?

Thanks a lot for your help!

data

Dear authors,

Thanks for open-sourcing this work!
I have downloaded the data as:
data/AVE_Dataset
data/audio_feature.h5 prob_label.h5 right_labels.h5 visual_feature.h5

It looks like the other files are missing, e.g. audio_feature_noisy.h5 and visual_feature_noisy.h5.

Besides, could you please let me know the environment you use?

What information is prob_label.h5 generated based on?

Hello, I would like to ask what information prob_label is generated based on. At first I thought it corresponded to the information in annotation.txt, because both have 4143 lines, but I found that prob_label does not actually correspond to the annotations, because the maximum probability in the second line of prob_label is the background class.

Train from scratch

Good work! When I train from scratch in the fully supervised setting, the AVE localization accuracy reaches 75.8% or 75.2% with different seeds. Any suggestions?

How to generate attention map?

Can you provide the code to generate the attention map? I tried to run "attention_visualization" from AVE-ECCV18 and replaced the original model with "PSP_fully.pt", but the error "NoneType object has no attribute register_forward_hook" appeared. I know it was caused by replacing the model, but I don't know how to solve it.
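
For context, a forward hook must be registered on a module that actually exists in the loaded model; the NoneType error means the attribute name copied from the AVE-ECCV18 script is not present in the PSP model. A generic sketch of the mechanism follows (the model and layer below are placeholders, not the repository's code).

import torch
import torch.nn as nn

# Generic example of capturing an intermediate activation with a forward hook.
# Hooking an attribute that is None (e.g. a layer name taken from another
# codebase) raises exactly the AttributeError described above.
model = nn.Sequential(nn.Linear(128, 128), nn.ReLU(), nn.Linear(128, 29))
captured = {}

def save_output(module, inputs, output):
    captured['feat'] = output.detach()

handle = model[0].register_forward_hook(save_output)  # hook an existing layer
model(torch.randn(4, 128))
handle.remove()
print(captured['feat'].shape)  # torch.Size([4, 128])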
