jomold's Introduction

Joint-Modal Label Denoising for Weakly-Supervised Audio-Visual Video Parsing

Haoyue Cheng, Zhaoyang Liu, Hang Zhou, Chen Qian, Wayne Wu and Limin Wang

Code for the ECCV 2022 paper "Joint-Modal Label Denoising for Weakly-Supervised Audio-Visual Video Parsing".

Paper Overview

Modality-specific label noise

The procedure of modality-specific label denoising

The results on LLP dataset

Get Started

Prepare data

  1. Please download the preprocessed audio and visual features from https://github.com/YapengTian/AVVP-ECCV20.
  2. Put the downloaded features into data/feats/.
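Before training, it can help to confirm that the features landed where the scripts expect them. The following is a minimal sketch and not part of the repository: the helper name `list_feature_dirs` is hypothetical, and the only assumption is that the features live under data/feats/ as described above. It does not assume any particular feature folder names.

```python
from pathlib import Path


def list_feature_dirs(root="data/feats"):
    """Return the sorted names of the entries under the feature directory.

    Hypothetical helper: it only reports what is present and makes no
    assumption about which feature folders the download contains.
    """
    root_path = Path(root)
    if not root_path.is_dir():
        raise FileNotFoundError(f"feature directory not found: {root_path}")
    return sorted(p.name for p in root_path.iterdir())


if __name__ == "__main__":
    try:
        print(list_feature_dirs())
    except FileNotFoundError as err:
        # Reached when data/feats/ has not been created yet.
        print(err)
```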

Train the model

1. Train noise estimator:

python main.py --mode train_noise_estimator --save_model true --model_save_dir ckpt --checkpoint noise_estimater.pt

2. Calculate noise ratios:

python main.py --mode calculate_noise_ratio --model_save_dir ckpt --checkpoint noise_estimater.pt --noise_ratio_file noise_ratios.npz

3. Train model with label denoising:

python main.py --mode train_label_denoising --save_model true --model_save_dir ckpt --checkpoint JoMoLD.pt --noise_ratio_file noise_ratios.npz
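The three steps above must run in order, since each one consumes a file written by the previous one. As a convenience, they could be chained from a small driver script. This is a sketch only, not something the repository ships: `build_commands` and `run_pipeline` are hypothetical names, while the flags and file names are copied verbatim from the commands above. It defaults to a dry run that only prints the commands.

```python
import subprocess
import sys

CKPT_DIR = "ckpt"
NOISE_RATIO_FILE = "noise_ratios.npz"


def build_commands(python=sys.executable):
    """Build the three training commands in the order listed in this README."""
    base = [python, "main.py"]
    return [
        # Step 1: train the noise estimator.
        base + ["--mode", "train_noise_estimator", "--save_model", "true",
                "--model_save_dir", CKPT_DIR,
                "--checkpoint", "noise_estimater.pt"],
        # Step 2: calculate noise ratios with the trained estimator.
        base + ["--mode", "calculate_noise_ratio",
                "--model_save_dir", CKPT_DIR,
                "--checkpoint", "noise_estimater.pt",
                "--noise_ratio_file", NOISE_RATIO_FILE],
        # Step 3: train the final model with label denoising.
        base + ["--mode", "train_label_denoising", "--save_model", "true",
                "--model_save_dir", CKPT_DIR,
                "--checkpoint", "JoMoLD.pt",
                "--noise_ratio_file", NOISE_RATIO_FILE],
    ]


def run_pipeline(dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in build_commands():
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the pipeline if a step fails.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_pipeline(dry_run=True)
```

Call `run_pipeline(dry_run=False)` from the repository root to actually launch the three stages.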

Test

We provide the pre-trained JoMoLD checkpoint for evaluation. Download the checkpoint, place it in the "./ckpt" directory, and run the following command to test:

python main.py --mode test_JoMoLD --model_save_dir ckpt --checkpoint JoMoLD.pt

Citation

If you find this work useful, please consider citing it.

@article{cheng2022joint,
  title={Joint-Modal Label Denoising for Weakly-Supervised Audio-Visual Video Parsing},
  author={Cheng, Haoyue and Liu, Zhaoyang and Zhou, Hang and Qian, Chen and Wu, Wayne and Wang, Limin},
  journal={Proceedings of the European Conference on Computer Vision (ECCV)},
  year={2022}
}

jomold's People

Contributors

carolinecheng233

jomold's Issues

Questions about eval function and calculate_noise_ratio function

Thanks for your great work! Your code is easy to understand and follow. However, I am confused about some implementation details:

  • I notice that your eval function is slightly different from the one in ECCV2020-AVVP. They use "output" as the predicted weak labels to filter out false-positive events:
    o = (output.cpu().detach().numpy() >= 0.5).astype(np.int_)
    Pa = (Pa >= 0.5).astype(np.int_) * np.repeat(o, repeats=10, axis=0)
    Pv = (Pv >= 0.5).astype(np.int_) * np.repeat(o, repeats=10, axis=0)
    while you use "a_prob" and "v_prob" instead:
    oa = (a_prob.cpu().detach().numpy() >= 0.5).astype(np.int_)
    ov = (v_prob.cpu().detach().numpy() >= 0.5).astype(np.int_)
    Pa = (Pa >= 0.5).astype(np.int_) * np.repeat(oa, repeats=10, axis=0)
    Pv = (Pv >= 0.5).astype(np.int_) * np.repeat(ov, repeats=10, axis=0)
    Does this change matter, and why did you modify the code this way?

  • In lines 144 and 145 of your main.py, you use "a_prob", "Pa", "v_prob", and "Pv" to calculate the noise ratio. Why do you use
    a = a * Pa
    v = v * Pv
    instead of directly using "a" and "v"?

  • Regarding the calculation of event_nums: would the result be the same if I used
    event_nums[c] += 1
    after line 148, "if label[b][c] != 0:"?
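To make the first question concrete, the two gating schemes quoted above can be compared on toy inputs. This sketch does not explain the authors' reasoning; it only shows how the outputs can differ. The function names, the two-segment and two-class shapes, and the probability values are all made up for illustration, and plain NumPy arrays stand in for the tensors used in the real code.

```python
import numpy as np


def mask_with_video_level(Pa, Pv, output, num_segments):
    """ECCV2020-AVVP style: one shared video-level prediction gates both modalities."""
    o = (output >= 0.5).astype(np.int_)
    gate = np.repeat(o, repeats=num_segments, axis=0)
    Pa_bin = (Pa >= 0.5).astype(np.int_) * gate
    Pv_bin = (Pv >= 0.5).astype(np.int_) * gate
    return Pa_bin, Pv_bin


def mask_with_modality_level(Pa, Pv, a_prob, v_prob, num_segments):
    """Variant quoted from this repository: each modality is gated by its own prediction."""
    oa = (a_prob >= 0.5).astype(np.int_)
    ov = (v_prob >= 0.5).astype(np.int_)
    Pa_bin = (Pa >= 0.5).astype(np.int_) * np.repeat(oa, repeats=num_segments, axis=0)
    Pv_bin = (Pv >= 0.5).astype(np.int_) * np.repeat(ov, repeats=num_segments, axis=0)
    return Pa_bin, Pv_bin


if __name__ == "__main__":
    # Toy values: 2 segments x 2 classes (the quoted code uses repeats=10).
    Pa = np.array([[0.7, 0.7], [0.2, 0.9]])  # segment-level audio probabilities
    Pv = np.array([[0.8, 0.1], [0.6, 0.6]])  # segment-level visual probabilities
    output = np.array([[0.9, 0.6]])          # video-level prediction
    a_prob = np.array([[0.9, 0.2]])          # audio-level prediction
    v_prob = np.array([[0.3, 0.8]])          # visual-level prediction

    print(mask_with_video_level(Pa, Pv, output, num_segments=2))
    print(mask_with_modality_level(Pa, Pv, a_prob, v_prob, num_segments=2))
```

With these toy values the shared gate keeps both classes for both modalities, while the per-modality gates remove the second class from the audio predictions and the first class from the visual predictions. The two schemes therefore agree only when the audio-level and visual-level predictions both match the video-level one.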
