Git Product home page Git Product logo

funnynet-w's Introduction

FunnyNet-W (Multimodal Learning of Funny Moments in Videos in the Wild)

By Zhi-Song Liu, Robin Courant and Vicky Kalogeiton

We present FunnyNet-W, a versatile and efficient framework for funny moment detection in the video.

BibTex

    @InProceedings{funnynet-w,
        author = {Liu, Zhi-Song and Courant, Robin and Kalogeiton, Vicky},
        title = {FunnyNet-W: Multimodal Learning of Funny Moments in Videos in the Wild},
        booktitle = {International Journal of Computer Vision},
        year = {2024},
        pages={},
        doi={}
    }

Dependencies

Python 3.8 OpenCV library Pytorch 1.12.0 CUDA 11.3

Environment setup

  1. Clone code to your local computer.
git clone https://github.com/Holmes-Alan/FunnyNet-W.git
cd FunnyNet-W
  1. Create working environment.
conda create --name funnynet -y python=3.8
conda activate funnynet
  1. Install the dependencies.
conda install pytorch==1.12.0 torchvision==0.13.0 torchaudio==0.12.0 cudatoolkit=11.3 -c pytorch
pip install -r requirements.txt
  1. Run the setup script to intsall all the dependencies.
./setup.sh
  1. Download friends data:
gdown https://drive.google.com/drive/folders/1ZM6agmEnheiyP0IIrD3Fc7DOubjyu5eO -O ./data --folder

Note: label files are strutured as follow: [season, episode, funny-label, start, end]

The dataset directory is organized as followed:

FunnyNet-data/
└── tv_show_name/
    ├── audio/
    │   ├── diff/              # `.wav` files with stereo channel difference
    │   ├── embedding/         # `.pt` files with audio embedding vectors
    │   ├── laughter/          # `.pickle` files with laughter timecodes
    │   ├── laughter_segment/  # `.wav` files with detected laughters
    │   ├── left/              # `.wav` files with the surround left channel
    │   └── raw/               # `.wav` files with extracted raw audio from videos
    ├── laughter/              # `.pk` files with laughter labels
    ├── sub/                   # `.pk` files with subtitles
    ├── episode/               # `.mkv` files with videos
    ├── audio_split/           # `.wav` files with audio 8 seconds windows
    │   ├── test_8s/
    │   ├── train_8s/
    │   └── validation_8s/
    ├── video_split/           # `.mp4` files with video 8 seconds windows
    │   ├── test_8s/
    │   ├── train_8s/
    │   └── validation_8s/
    └── sub_split/             # `.pk` files with subtitles 8 seconds windows
    |   ├── sub_test_8s.pk
    |   ├── sub_train_8s.pk
    |   └── sub_validation_8s.pk
    └── automatic_sub_split/   # `.pk` files with automatic subtitles 8 seconds windows
        ├── sub_test_8s.pk
        ├── sub_train_8s.pk
        └── sub_validation_8s.pk

Note: we cannot provide audio and video data for obvious copyright issues.

FunnyNet

Data processing

Split audio, subtitles and videos into segments of n seconds (default 8 seconds), and use Whisper to generate automatic subtitles from audio in the wild:

python data_processing/mask_audio.py DATA_DIR/audio/raw DATA_DIR/audio/laughter DATA_DIR/audio/processed
python data_processing/audio_processing.py DATA_DIR/audio/processed DATA_DIR/laughter/xx.pk DATA_DIR/audio_split
python data_processing/sub_processing.py DATA_DIR/sub DATA_DIR/laughter/xx.pk DATA_DIR/sub_split
python data_processing/video_processing.py DATA_DIR/episode DATA_DIR/laughter/xx.pk DATA_DIR/video_split
python data_processing/whisper_extractor.py DATA_DIR/audio_split DATA_DIR/laughter/xx.pk DATA_DIR/automatic_sub_split

Training

  1. Train multimodality with audio, vision and subtitle
python main_audio+vision+sub_videomae_llama friends_path llama2_pts_path

Testing

  1. Download pretrained model from this link, and put it under "./models"
  2. Test multimodality with audio, vision and subtitle
python eval_audio+vision+sub_videomae_llama friends_path llama2_pts_path --model_file models/audio+vision+sub_videomae_llama_whisper.pth

Laughter detection

Please follow our previous work on FunnyNet

Reference

If you find our work useful and interesting to you, please consider citing our papers

    @InProceedings{funnynet,
        author = {Liu, Zhi-Song and Courant, Robin and Kalogeiton, Vicky},
        title = {FunnyNet: Audiovisual Learning of Funny Moments in Videos},
        booktitle = {Asian Conference on Computer Vision (ACCV)},
        year = {2023},
        pages={433-450},
        doi = {10.1007/978-3-031-26316-3_26}
    }
    
    @InProceedings{funnynet-w,
        author = {Liu, Zhi-Song and Courant, Robin and Kalogeiton, Vicky},
        title = {FunnyNet-W: Multimodal Learning of Funny Moments in Videos in the Wild},
        booktitle = {International Journal of Computer Vision},
        year = {2024},
        pages={},
        doi={}
    }

funnynet-w's People

Contributors

holmes-alan avatar

Stargazers

Thanos Delatolas avatar Vicky Kalogeiton avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.