
cncvs_data_collector

Data collector of CN-CVS

You are on branch Speech.

This repo contains the code for collecting the paired audio-video data of CN-CVS. The original paper is:

CN-CVS: A Mandarin Audio-Visual Dataset for Large Vocabulary Continuous Visual to Speech Synthesis
Chen Chen, Dong Wang, Thomas Fang Zheng
[Paper] [Web]

You can also use this repo to collect paired audio-video data from any video that contains only one speaker at a time.

Data Collection Pipeline

1. Collect metadata and generate a JSON file:

[
    {
        "speaker_name": "",
        "id": "",
        "save_file": "",
        "video_url": "",
        "audio_url": null
    },
    {
        "speaker_name": "",
        "id": "",
        "save_file": "",
        "video_url": "",
        "audio_url": null
    }
]

Each item describes one video. The fields video_url, save_file and id are required.

The video is downloaded from ${video_url} and saved as ${save_file}.mp4 (or .mkv, or another video format).
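Before running the pipeline, it can help to check the metadata file for missing required fields. The following is a minimal sketch, not part of the repo: the function names `load_metadata`, `validate_items` and `output_path` are hypothetical, and treating an empty string as "missing" is an assumption based on the empty placeholders in the template above.

```python
import json
from pathlib import Path

# Fields the README marks as required for every item.
REQUIRED_FIELDS = ("id", "save_file", "video_url")


def validate_items(items):
    """Raise ValueError if any item lacks a required field (empty strings count as missing)."""
    if not isinstance(items, list):
        raise ValueError("metadata must be a JSON list")
    for index, item in enumerate(items):
        if not isinstance(item, dict):
            raise ValueError(f"item {index} is not a JSON object")
        missing = [key for key in REQUIRED_FIELDS if not item.get(key)]
        if missing:
            raise ValueError(f"item {index} is missing required field(s): {', '.join(missing)}")
    return items


def load_metadata(path):
    """Load the metadata JSON file and validate every item."""
    with open(path, encoding="utf-8") as f:
        return validate_items(json.load(f))


def output_path(item, ext="mp4"):
    """Path the video would be saved to: ${save_file}.<ext> (hypothetical default: mp4)."""
    return Path(f"{item['save_file']}.{ext}")
```

For example, an item with `"save_file": "videos/0001"` maps to `videos/0001.mp4`, or to `videos/0001.mkv` when `ext="mkv"` is passed.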

2. Run the download and processing script

Modify the path arguments in run.sh, then run sh run.sh.

Branches

There are two branches in this repo.

Speech

This branch is for the processing of speech videos.

News

This branch is for the processing of newscast videos.

Main differences between the two branches

Shots

In News, our strategy for dealing with shots is different.

In my experience, videos in "News 30 minutes" consist of shots of the presenter and live scenes from outdoor interviews. We only need the shots of the presenter, so all other shots are discarded.
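The README does not say how the presenter's shots are identified. One plausible heuristic, shown here as a hypothetical sketch rather than the repo's actual method, assumes shot boundaries have already been detected and each shot carries a face-identity label (for example from face clustering); the presenter is then taken to be the identity with the most total screen time, and every other shot is dropped.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Shot:
    start: float              # shot start time in seconds
    end: float                # shot end time in seconds
    face_id: Optional[str]    # hypothetical identity label of the face in the shot; None if no face


def keep_presenter_shots(shots: List[Shot]) -> List[Shot]:
    """Keep only the shots of the identity with the largest total on-screen time."""
    screen_time = defaultdict(float)
    for shot in shots:
        if shot.face_id is not None:
            screen_time[shot.face_id] += shot.end - shot.start
    if not screen_time:
        return []
    # Assumption: in a newscast the presenter is on screen longer than anyone else.
    presenter = max(screen_time, key=screen_time.get)
    return [shot for shot in shots if shot.face_id == presenter]
```

With an anchor on screen for 70 s in total and an interview scene lasting 20 s, only the anchor's shots are kept. The screen-time rule is simple but would fail on programs where interview footage dominates, which is one reason the parameters of such a step may need tuning per source.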

VAD

In News, the speech is much faster, so we use different VAD parameters.

You may need to adjust these parameters when applying the pipeline to your own videos.
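The README does not list the actual VAD parameters. To illustrate why faster speech calls for different settings, here is a hypothetical energy-threshold VAD over precomputed per-frame energies (for example one value per 10 ms frame, an assumption). The `VadParams` fields and both presets are invented for illustration and are not the repo's configuration: faster speech has shorter pauses, so a shorter minimum silence is needed before a segment is split.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class VadParams:
    threshold: float         # frame energy above this counts as speech
    min_silence_frames: int  # a pause must last this many frames to end a segment
    min_speech_frames: int   # segments shorter than this are dropped


# Hypothetical presets; real values would need tuning on your own videos.
SPEECH_PRESET = VadParams(threshold=0.01, min_silence_frames=30, min_speech_frames=20)
NEWS_PRESET = VadParams(threshold=0.01, min_silence_frames=15, min_speech_frames=10)


def detect_segments(energies: List[float], params: VadParams) -> List[Tuple[int, int]]:
    """Return (start_frame, end_frame) speech segments; end_frame is exclusive."""
    segments = []
    start = None   # first frame of the current segment, or None outside speech
    silence = 0    # consecutive silent frames inside the current segment
    for i, energy in enumerate(energies):
        if energy > params.threshold:
            if start is None:
                start = i
            silence = 0
        elif start is not None:
            silence += 1
            if silence >= params.min_silence_frames:
                end = i - silence + 1  # one past the last speech frame
                if end - start >= params.min_speech_frames:
                    segments.append((start, end))
                start, silence = None, 0
    if start is not None:  # close a segment still open at the end of the audio
        end = len(energies) - silence
        if end - start >= params.min_speech_frames:
            segments.append((start, end))
    return segments
```

For 40 frames of speech, a 20-frame pause and another 40 frames of speech, `SPEECH_PRESET` keeps everything as one segment `(0, 100)`, while `NEWS_PRESET` splits it into `(0, 40)` and `(60, 100)`.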

Cite

If you find this work useful in your research, please cite the paper:

@INPROCEEDINGS{10095796,
  author={Chen, Chen and Wang, Dong and Zheng, Thomas Fang},
  booktitle={ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)}, 
  title={CN-CVS: A Mandarin Audio-Visual Dataset for Large Vocabulary Continuous Visual to Speech Synthesis}, 
  year={2023},
  volume={},
  number={},
  pages={1-5},
  doi={10.1109/ICASSP49357.2023.10095796}}

Contact

My email: [email protected]
CSLT web: cslt.org
