pb_sed: Paderborn Sound Event Detection

This repository provides the source code for our first-ranked solution to DCASE 2022 Challenge Task 4, which evolved from our 3rd-ranked and 4th-ranked solutions to DCASE 2020 Challenge Task 4 and DCASE 2021 Challenge Task 4, respectively.

This repository also provides our final strongly pseudo-labeled datasets, which allow training CRNN ensembles that achieve the following on the public evaluation set:

without external data: >55% PSDS1, >82% PSDS2 and >65% collar-based F1-score,
with external data: >58% PSDS1, >86% PSDS2 and >70% collar-based F1-score.

In both cases, an FBCRNN ensemble is used for tagging and PSDS2, a tag-conditioned BiCRNN ensemble for the collar-based F1-score, and an unconditioned BiCRNN ensemble for PSDS1.

If you use our system or our pseudo labels, please consider citing our papers:

[1] J. Ebbers and R. Haeb-Umbach, "Pre-Training and Self-Training for Sound Event Detection in Domestic Environments", Technical Report for Challenge on Detection and Classification of Acoustic Scenes and Events 2022.

[2] J. Ebbers and R. Haeb-Umbach, "Self-Trained Audio Tagging and Sound Event Detection in Domestic Environments", in Proc. Workshop on Detection and Classification of Acoustic Scenes and Events 2021.

[3] J. Ebbers and R. Haeb-Umbach, "Forward-Backward Convolutional Recurrent Neural Networks and Tag-Conditioned Convolutional Neural Networks for Weakly Labeled Semi-Supervised Sound Event Detection", in Proc. Workshop on Detection and Classification of Acoustic Scenes and Events 2020.

Installation

Install requirements:

$ pip install --user git+https://github.com/fgnt/padertorch.git@47bf5b1850cf0f2ab8e9cb69f43181c61bd78f12
$ pip install --user git+https://github.com/fgnt/paderbox.git@809b27251c478f1997d2720b89fe455aac23234e
$ pip install --user git+https://github.com/fgnt/lazy_dataset.git@c31ea9837bce58569d77525f6cb5e3c043b8a4e9
$ pip install --user git+https://github.com/fgnt/sed_scores_eval.git@a922e0a4692957d56b307a2eec942422ab22b55a

Clone the repository:

$ git clone https://github.com/fgnt/pb_sed.git

Install package:

$ pip install --user -e pb_sed

Database

DESED

Install requirements:

$ pip install --user git+https://github.com/turpaultn/DESED@af3a5d5be9213239f42cf1c72f538e8058d8d2e4

Download the database by running

$ python -m pb_sed.database.desed.download -db /path/to/desed

yielding the following database structure:

├── audio
│   ├── eval
│   │   └── public
│   │       └── <clip_id>.wav
│   ├── train
│   │   ├── strong
│   │   │   └── <clip_id>.wav
│   │   ├── synthetic20
│   │   │   └── <clip_id>.wav
│   │   ├── synthetic21
│   │   │   └── <clip_id>.wav
│   │   ├── unlabel_in_domain
│   │   │   └── <clip_id>.wav
│   │   └── weak
│   │       └── <clip_id>.wav
│   └── validation
│       └── validation
│           └── <clip_id>.wav
├── metadata
│   ├── eval
│   │   └── public.tsv
│   ├── train
│   │   ├── strong.tsv
│   │   ├── synthetic20.tsv
│   │   ├── synthetic21.tsv
│   │   ├── unlabel_in_domain.tsv
│   │   └── weak.tsv
│   └── validation
│       └── validation.tsv
└── missing_files
    ├── missing_files_strong.tsv
    ├── missing_files_unlabel_in_domain.tsv
    ├── missing_files_validation.tsv
    └── missing_files_weak.tsv

Follow the description in https://github.com/turpaultn/DESED to request missing files and copy them to the corresponding audio directories.
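Before creating the JSON files, it can help to check which of the listed files are still absent. The following is a minimal sketch, not part of pb_sed: the helper name still_missing is hypothetical, the mapping from TSV to audio directory is read off the tree above, and we assume (unverified) that each missing_files TSV has a header row with a filename column. Inspect one of the TSVs and adjust the column name if it differs.

```python
import csv
from pathlib import Path

# Audio directory each missing_files TSV refers to (taken from the tree above).
MISSING_TO_AUDIO_DIR = {
    "missing_files_strong.tsv": "audio/train/strong",
    "missing_files_unlabel_in_domain.tsv": "audio/train/unlabel_in_domain",
    "missing_files_validation.tsv": "audio/validation/validation",
    "missing_files_weak.tsv": "audio/train/weak",
}


def still_missing(db_root):
    """Return {tsv_name: [listed files not yet present in the audio dir]}.

    Assumption: each TSV has a header row with a 'filename' column.
    TSVs that do not exist are skipped.
    """
    db_root = Path(db_root)
    result = {}
    for tsv_name, audio_dir in MISSING_TO_AUDIO_DIR.items():
        tsv_path = db_root / "missing_files" / tsv_name
        if not tsv_path.exists():
            continue
        with open(tsv_path, newline="") as f:
            names = [row["filename"] for row in csv.DictReader(f, delimiter="\t")]
        # Keep only files that have not been copied into the audio directory yet.
        result[tsv_name] = [n for n in names if not (db_root / audio_dir / n).exists()]
    return result
```

For example, still_missing("/path/to/desed") should map every TSV to an empty list once all requested files have been copied.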

Run

$ python -m pb_sed.database.desed.create_json -db /path/to/desed

to create the json files /path/to/pb_sed/jsons/desed.json, /path/to/pb_sed/jsons/desed_pseudo_labeled_without_external.json and /path/to/pb_sed/jsons/desed_pseudo_labeled_with_external.json (describing the database).
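To sanity-check a generated JSON file, you can count the examples per dataset. This is a minimal sketch using only the standard library; the helper name is hypothetical, and it assumes the lazy_dataset-style layout {"datasets": {<dataset_name>: {<example_id>: {...}}}}. If your generated file is structured differently, adapt the key accesses.

```python
import json


def summarize_database_json(json_path):
    """Return {dataset_name: number_of_examples} for a database JSON file.

    Assumption: top-level key "datasets" mapping dataset names to
    dicts of examples keyed by example id.
    """
    with open(json_path) as f:
        database = json.load(f)
    return {name: len(examples) for name, examples in database["datasets"].items()}
```

Usage: summarize_database_json("/path/to/pb_sed/jsons/desed.json").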

AudioSet

To download the whole AudioSet, run

$ python -m pb_sed.database.audioset.download -db /path/to/audioset

yielding the following database structure:

├── audio
│   ├── balanced_train
│   │   └── <clip_id>.wav
│   ├── eval
│   │   └── <clip_id>.wav
│   └── unbalanced_train
│       └── <clip_id>.wav
├── audioset_eval_strong.tsv
├── audioset_train_strong.tsv
├── balanced_train_segments.csv
├── class_labels_indices.csv
├── eval_segments.csv
├── mid_to_display_name.tsv
└── unbalanced_train_segments.csv

Note that this can take multiple days, as AudioSet is huge. If you already have a local copy of AudioSet, you may prefer to set up the above database structure with symlinks to your existing download.
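A sketch along the following lines could create the audio part of the structure with symlinks. It is not part of pb_sed: the function name is hypothetical, and it assumes your existing directories already contain files named <clip_id>.wav following the naming scheme pb_sed expects. The metadata files at the top level (the .csv and .tsv files) still need to be placed or linked next to the audio directory.

```python
import os
from pathlib import Path

# Subset directories expected below <db>/audio (see the tree above).
AUDIOSET_SUBSETS = ("balanced_train", "eval", "unbalanced_train")


def link_existing_audioset(existing, db_root):
    """Symlink db_root/audio/<subset> to directories of an existing download.

    existing: dict mapping a subset name to a directory that already holds
    the <clip_id>.wav files of that subset (assumed naming, not verified).
    Returns the sorted names of the entries in db_root/audio.
    """
    audio_dir = Path(db_root) / "audio"
    audio_dir.mkdir(parents=True, exist_ok=True)
    for subset, source in existing.items():
        if subset not in AUDIOSET_SUBSETS:
            raise ValueError(f"unknown subset: {subset}")
        link = audio_dir / subset
        # Skip entries that already exist (including dangling symlinks).
        if link.exists() or link.is_symlink():
            continue
        os.symlink(Path(source).resolve(), link, target_is_directory=True)
    return sorted(p.name for p in audio_dir.iterdir())
```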

Run

$ python -m pb_sed.database.audioset.create_json -db /path/to/audioset

to create the json file /path/to/pb_sed/jsons/audioset.json (describing the database).

Experiments

Forward-Backward CRNN (FBCRNN)

To train an FBCRNN from scratch, run

$ python -m pb_sed.experiments.weak_label_crnn.training

Each training run stores checkpoints and metadata (incl. a TensorBoard event file) in a directory /path/to/storage_root/weak_label_crnn/desed/training/<group_timestamp>/<model_timestamp>. By default, /path/to/storage_root is /path/to/pb_sed/exp, but it can be changed by setting the environment variable STORAGE_ROOT:

$ export STORAGE_ROOT=/path/to/custom/storage_root
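This directory layout can also be resolved programmatically, e.g. to list existing groups and the models they contain. A minimal sketch with hypothetical helper names, assuming only the layout described above; the default argument is a stand-in for your actual /path/to/pb_sed/exp.

```python
import os
from pathlib import Path


def storage_root(default="/path/to/pb_sed/exp"):
    """Return STORAGE_ROOT if it is set, else the given default."""
    return Path(os.environ.get("STORAGE_ROOT", default))


def training_runs(root, experiment="weak_label_crnn", database="desed"):
    """Return {group_timestamp: [model_timestamp, ...]} found below
    root/<experiment>/<database>/training (empty dict if it does not exist)."""
    training_dir = Path(root) / experiment / database / "training"
    if not training_dir.is_dir():
        return {}
    return {
        group.name: sorted(m.name for m in group.iterdir() if m.is_dir())
        for group in sorted(training_dir.iterdir())
        if group.is_dir()
    }
```

For the BiCRNN experiments below, pass experiment="strong_label_crnn".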

To train a second model and add it to an existing group (ensemble), run

$ python -m pb_sed.experiments.weak_label_crnn.training with group_name=<group_timestamp>
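To grow an ensemble by several models, the command can be repeated in a loop. A minimal sketch with hypothetical helper names, assuming you substitute the real <group_timestamp> of the existing group; with dry_run=True it only builds the command lines without launching anything. The models are trained one after another here; running them in parallel would be a matter for your own job scheduler.

```python
import subprocess
import sys


def training_command(module, group_name=None, overrides=()):
    """Build argv for `python -m <module> [with key=value ...]`."""
    updates = list(overrides)
    if group_name is not None:
        updates.append(f"group_name={group_name}")
    cmd = [sys.executable, "-m", module]
    if updates:
        cmd += ["with", *updates]
    return cmd


def add_models_to_group(
    group_name,
    n_models,
    module="pb_sed.experiments.weak_label_crnn.training",
    overrides=(),
    dry_run=True,
):
    """Train n_models more ensemble members into an existing group, sequentially."""
    cmds = [training_command(module, group_name, overrides) for _ in range(n_models)]
    if not dry_run:
        for cmd in cmds:
            subprocess.run(cmd, check=True)  # stop on the first failing run
    return cmds
```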

To train on our provided pseudo-labeled data, add data_provider.json_path=/path/to/pb_sed/jsons/desed_pseudo_labeled_{with,without}_external.json and data_provider.train_set.train_unlabel_in_domain=2 to the command, e.g.:

$ python -m pb_sed.experiments.weak_label_crnn.training with data_provider.json_path=/path/to/pb_sed/jsons/desed_pseudo_labeled_with_external.json data_provider.train_set.train_unlabel_in_domain=2

Add external_data=False to the commands to exclude external data from FBCRNN training.

For hyper-parameter tuning, run

$ python -m pb_sed.experiments.weak_label_crnn.tuning with group_dir=/path/to/storage_root/weak_label_crnn/desed/training/<group_timestamp>

which saves hyper-parameters in a directory /path/to/storage_root/weak_label_crnn/desed/hyper_params/<timestamp>.
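Since each tuning run creates a new <timestamp> directory, a small helper can pick the most recent one to pass on as hyper_params_dir. A sketch with a hypothetical helper name; it selects by modification time rather than by parsing the directory name, so it makes no assumption about the timestamp format.

```python
from pathlib import Path


def latest_subdir(parent):
    """Return the most recently modified subdirectory of parent, or None if there is none."""
    subdirs = [p for p in Path(parent).iterdir() if p.is_dir()]
    return max(subdirs, key=lambda p: p.stat().st_mtime, default=None)
```

Usage: latest_subdir("/path/to/storage_root/weak_label_crnn/desed/hyper_params").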

For evaluation on the public evaluation set, run

$ python -m pb_sed.experiments.weak_label_crnn.inference with hyper_params_dir=/path/to/storage_root/weak_label_crnn/desed/hyper_params/<timestamp>

Bidirectional CRNN (requiring strong labels)

To train an unconditioned bidirectional CRNN (BiCRNN) with our provided strong pseudo labels (with external data), run

$ python -m pb_sed.experiments.strong_label_crnn.training with weak_label_crnn_hyper_params_dir=/path/to/storage_root/weak_label_crnn/desed/hyper_params/<timestamp>

Each training run stores checkpoints and metadata (incl. a TensorBoard event file) in a directory /path/to/storage_root/strong_label_crnn/desed/training/<group_timestamp>/<model_timestamp>.

To train a second model and add it to an existing group (ensemble), run

$ python -m pb_sed.experiments.strong_label_crnn.training with weak_label_crnn_hyper_params_dir=/path/to/storage_root/weak_label_crnn/desed/hyper_params/<timestamp> group_name=<group_timestamp>

To train tag-conditioned BiCRNNs instead, add trainer.model.tag_conditioning=True to the commands.

Add external_data=False to the commands to exclude external data from BiCRNN training and to use pseudo labels obtained without external data.
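The two switches above combine into four BiCRNN variants. The sketch below only enumerates the corresponding with overrides; the helper name is hypothetical. It assumes you pass a weak_label_crnn_hyper_params_dir tuned under the same external-data setting as the BiCRNN, which is our assumption rather than a stated requirement.

```python
import itertools


def bicrnn_variants(weak_hyper_params_dir):
    """Return {(tag_conditioned, external_data): list of `with` overrides}."""
    variants = {}
    for tag_conditioned, external_data in itertools.product([False, True], repeat=2):
        overrides = [f"weak_label_crnn_hyper_params_dir={weak_hyper_params_dir}"]
        if tag_conditioned:
            overrides.append("trainer.model.tag_conditioning=True")
        if not external_data:
            overrides.append("external_data=False")
        variants[(tag_conditioned, external_data)] = overrides
    return variants
```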

For hyper-parameter tuning, run

$ python -m pb_sed.experiments.strong_label_crnn.tuning with strong_label_crnn_group_dir=/path/to/storage_root/strong_label_crnn/desed/training/<group_timestamp> weak_label_crnn_hyper_params_dir=/path/to/storage_root/weak_label_crnn/desed/hyper_params/<timestamp>

which saves hyper-parameters in a directory /path/to/storage_root/strong_label_crnn/desed/hyper_params/<timestamp>.

For evaluation on the public evaluation set, run

$ python -m pb_sed.experiments.strong_label_crnn.inference with strong_label_crnn_hyper_params_dir=/path/to/storage_root/strong_label_crnn/desed/hyper_params/<timestamp>

AudioSet Pre-training

To pre-train a deeper and wider FBCRNN on AudioSet (excluding DESED validation clips), run

$ python -m pb_sed.experiments.weak_label_crnn.training with database=audioset net_config=deep m=2 filter_desed_test_clips=True

To train an FBCRNN from the pretrained model (with some frozen layers), run

$ python -m pb_sed.experiments.weak_label_crnn.training with net_config=deep m=2 init_ckpt_path=/path/to/storage_root/weak_label_crnn/audioset/training/<group_timestamp>/<model_timestamp> frozen_cnn_2d_layers=18 frozen_cnn_1d_layers=1

To train an unconditioned BiCRNN from the pretrained model (with some frozen layers), run

$ python -m pb_sed.experiments.strong_label_crnn.training with net_config=deep m=2 init_ckpt_path=/path/to/storage_root/weak_label_crnn/audioset/training/<group_timestamp>/<model_timestamp> frozen_cnn_2d_layers=18 frozen_cnn_1d_layers=1 weak_label_crnn_hyper_params_dir=/path/to/storage_root/weak_label_crnn/desed/hyper_params/<timestamp>

To train a tag-conditioned BiCRNN instead, add trainer.model.tag_conditioning=True to the command.
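The three fine-tuning commands above share the same pre-training overrides. The sketch below builds them from a single list so that the frozen-layer settings stay consistent; helper names are hypothetical, and it assumes the BiCRNN training module is pb_sed.experiments.strong_label_crnn.training, as in the Bidirectional CRNN section. Each returned command can be launched with subprocess.run(cmd, check=True).

```python
import sys


def finetune_overrides(init_ckpt_path, frozen_2d=18, frozen_1d=1):
    """Overrides shared by all fine-tuning runs from the AudioSet-pretrained model."""
    return [
        "net_config=deep",
        "m=2",
        f"init_ckpt_path={init_ckpt_path}",
        f"frozen_cnn_2d_layers={frozen_2d}",
        f"frozen_cnn_1d_layers={frozen_1d}",
    ]


def finetune_commands(init_ckpt_path, weak_hyper_params_dir):
    """Return argv lists for FBCRNN, BiCRNN and tag-conditioned BiCRNN fine-tuning."""
    shared = finetune_overrides(init_ckpt_path)
    base = [sys.executable, "-m"]
    strong = [
        "pb_sed.experiments.strong_label_crnn.training", "with", *shared,
        f"weak_label_crnn_hyper_params_dir={weak_hyper_params_dir}",
    ]
    return {
        "fbcrnn": base + ["pb_sed.experiments.weak_label_crnn.training", "with", *shared],
        "bicrnn": base + strong,
        "tag_conditioned_bicrnn": base + strong + ["trainer.model.tag_conditioning=True"],
    }
```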
