CERBERUS Network for Music Separation & Transcription
This is a rough implementation of "Simultaneous Separation and Transcription of Mixtures with Multiple Polyphonic and Percussive Instruments" (Ethan Manilow et al., ICASSP 2020).
Note
This implementation does not reach the performance reported in the paper.
Demo (Source Separation)
Quantitative Evaluation (Transcription)
|       | Precision | Recall | Accuracy |
|-------|-----------|--------|----------|
| Piano | 0.585     | 0.566  | 0.460    |
| Bass  | 0.797     | 0.817  | 0.747    |
| Drums | 0.230     | 0.417  | 0.133    |
- Note: There is no standard benchmark dataset for this task. These results were measured on mixtures I randomly created from the test set of the Slakh2100 dataset, so it is not appropriate to compare them quantitatively with the results reported in the paper.
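For reference, frame-level precision, recall, and accuracy of this kind are commonly computed from binary piano rolls as below. This is a minimal sketch of that standard formulation, not the evaluation code used for the table above; the function name and shapes are my own.

```python
import numpy as np

def frame_metrics(ref: np.ndarray, est: np.ndarray):
    """Frame-level precision/recall/accuracy for binary piano rolls
    of equal shape (frames x pitches). Sketch only; not the repo's
    actual evaluation code."""
    ref = ref.astype(bool)
    est = est.astype(bool)
    tp = np.logical_and(ref, est).sum()   # active in both
    fp = np.logical_and(~ref, est).sum()  # predicted but not in reference
    fn = np.logical_and(ref, ~est).sum()  # in reference but missed
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    accuracy = tp / (tp + fp + fn)  # common transcription "accuracy" definition
    return precision, recall, accuracy
```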
Pretrained Network & config
Inference
```
python inference.py hparams.yaml weight.ckpt input.wav output_dir/
```
Training with Slakh2100 Dataset
- Get the Slakh2100 dataset (see the Slakh2100 project)
- Downsample the audio to 16 kHz
- Modify configs/config.yaml
```yaml
data_dir: "/path/to/slakh2100_flac_16k/"
# see: validation_epoch_end() in network/cerberus_wrapper.py
sample_audio:
  path: "/path/to/sample/audio/sample_rate_16k.wav"
  offset: 1264000
  num_frames: 160000
```
- Run training
```
python train.py
```
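The downsampling step above can be sketched with SciPy's polyphase resampler; the helper name is my own, and the 44.1 kHz source rate is an assumption about the original audio:

```python
import numpy as np
from scipy.signal import resample_poly

def to_16k(audio: np.ndarray, sr: int, target_sr: int = 16000) -> np.ndarray:
    """Resample a 1-D signal to target_sr via polyphase filtering.

    Sketch only: dividing by the gcd keeps the up/down factors small
    (e.g. 44100 -> 16000 becomes up=160, down=441).
    """
    g = int(np.gcd(sr, target_sr))
    return resample_poly(audio, target_sr // g, sr // g)
```

In practice you would load each FLAC (e.g. with a library such as soundfile), resample it, and write the result under the `slakh2100_flac_16k/` directory referenced by `data_dir`.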
Contact
- Jongho Choi ([email protected])
- Jiwon Kim
- Ahyeon Choi