
Discriminative Neural Clustering (DNC) for Speaker Diarisation

This repository contains the code used in our paper:

Discriminative Neural Clustering for Speaker Diarisation

Qiujia Li*, Florian Kreyssig*, Chao Zhang, Phil Woodland (* indicates equal contribution)

Overview

We propose to use encoder-decoder models for supervised clustering. This repository contains:

  • a submodule for spectral clustering, a modified version of this repository by Google
  • a submodule for DNC using Transformers, implemented in ESPnet
  • data processing procedures for data augmentation & curriculum learning in our paper

Dependencies

First, as this repository contains two submodules, please run the following after cloning:

git submodule update --init --recursive

Then execute the following commands to install Miniconda for the virtual environment, together with the related packages:

cd DNC
./install.sh

Note that you may want to change the CUDA version for PyTorch in install.sh according to your own driver.
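
For example, if install.sh installs PyTorch through pip (an assumption; check the script for the actual line), the CUDA build is selected via the +cuXXX suffix. Version numbers below are placeholders:

# Hypothetical pip line -- match the +cuXXX suffix to your driver,
# e.g. cu100 for CUDA 10.0.
pip install torch==1.4.0+cu100 -f https://download.pytorch.org/whl/torch_stable.html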

Data generation

First activate the virtual environment:

source venv/bin/activate

To generate training and validation data with sub-meeting length 50 and 1000 random shifts:

python3 datapreperation/gen_augment_data.py --input-scps data/train.scp --input-mlfs data/train.mlf --filtEncomp --maxlen 50 --augment 1000 --varnormalise /path/to/datadir/m50.real.augment

python3 datapreperation/gen_augment_data.py --input-scps data/dev.scp --input-mlfs data/dev.mlf --filtEncomp --maxlen 50 --augment 1000 --varnormalise /path/to/datadir/m50.real.augment

To generate training data with sub-meeting length 50 and 1000 random shifts using the meeting randomisation:

python3 datapreperation/gen_dvecdict.py --input-scps data/train.scp --input-mlfs data/train.mlf --filtEncomp --segLenConstraint 100 --meetingLevelDict /path/to/datadir/dvecdict.meeting.split100

python3 datapreperation/gen_augment_data.py --input-scps data/train.scp --input-mlfs data/train.mlf --filtEncomp --maxlen 50 --augment 100 --varnormalise --randomspeaker  --dvectordict /path/to/datadir/dvecdict.meeting.split100/train.npz /path/to/datadir/m50.meeting.augment/

To generate evaluation data:

python3 datapreperation/gen_augment_data.py --input-scps data/eval.scp --input-mlfs data/eval.mlf --filtEncomp --maxlen 50 --varnormalise /path/to/datadir/m50.real
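
Each of these commands writes JSON files that the later stages consume. As a quick sanity check, one can inspect a generated file (a sketch assuming an ESPnet-style layout with a top-level "utts" dictionary; adjust the key if the actual format differs):

import json

# Hypothetical inspection of the generated data; the 'utts' key is an
# assumption based on ESPnet's usual JSON layout.
with open('/path/to/datadir/m50.real.augment/train.json') as f:
    data = json.load(f)
print(len(data['utts']), 'sub-meetings')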

Training and decoding of DNC models

Train a DNC Transformer

The example setup for AMI is in

cd espnet/egs/ami/dnc1

There are multiple configuration files you may want to change:

  • model training config: conf/tuning/train_transformer.yaml
  • model decoding config: conf/decode.yaml
  • submission config: the cmd_backend variable should be set in cmd.sh to use your preferred setup. You may also want to modify the corresponding submission settings for the queuing system, e.g. conf/queue.conf for SGE or conf/slurm.conf for SLURM.
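
For orientation, an ESPnet-style Transformer training config contains entries along the following lines (illustrative keys with placeholder values, not the settings used in the paper; consult conf/tuning/train_transformer.yaml for the real ones):

# Illustrative ESPnet-style options only -- check the repository's config.
elayers: 6        # number of encoder layers
dlayers: 6        # number of decoder layers
adim: 256         # attention dimension
aheads: 4         # number of attention heads
opt: noam         # Noam optimiser with warmup
epochs: 100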

To start training, run

./run.sh --stage 4 --stop_stage 4 --train_json path/to/train.json --dev_json path/to/dev.json --tag tag.for.model --init_model path/to/model/for/initialisation

If the model is trained from scratch, the --init_model option should be omitted. For more options, please look into run.sh and conf/tuning/train_transformer.yaml.

To track the progress of the training, run

tail -f exp/mdm_train_pytorch_tag.for.model/train.log

Decode a DNC Transformer

Similar to the command used for training, run

./run.sh --stage 5 --decode_json path/to/eval.json --tag tag.for.model

For more options, please look into run.sh and conf/decode.yaml.

The decoding results are, by default, stored in multiple JSON files exp/mdm_train_pytorch_tag.for.model/decode_dev_xxxxx/data.JOB.json, where JOB indexes the parallel decoding jobs.

Running spectral clustering

To run spectral clustering on previously generated evaluation data, for example for sub-meeting length 50:

python3 scoring/run_spectralclustering.py --p-percentile 0.95 --custom-dist cosine --json-out /path/to/scoringdir/eval95k24.1.json  /path/to/datadir/m50.real/eval.json
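
This script wraps the (modified) Google spectralcluster library. For intuition, a minimal sketch of the equivalent direct call using the library's SpectralClusterer API is shown below; the parameter mapping is an assumption, and the --custom-dist cosine option is specific to the modified fork:

from spectralcluster import SpectralClusterer

# Illustrative parameters only; run_spectralclustering.py wires these up
# from its command-line flags.
clusterer = SpectralClusterer(
    min_clusters=2,
    max_clusters=4,
    p_percentile=0.95,
)
labels = clusterer.predict(dvectors)  # dvectors: (N, D) numpy array of d-vectors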

Evaluation of clustering results

First, the DNC or SC output has to be converted into the RTTM format. For SC:

python3 scoring/gen_rttm.py --input-scp data/eval.scp --js-dir /path/to/scoringdir --js-num 1 --js-name eval95k24 --rttm-name eval95k24

For DNC:

python3 scoring/gen_rttm.py --input-scp data/eval.scp --js-dir espnet/egs/ami/dnc1/exp/mdm_train_pytorch_tag.for.model/decode_dev_xxxxx/ --js-num 16 --js-name data --rttm-name evaldnc
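
For reference, RTTM is the standard NIST diarisation format: each SPEAKER line gives the recording ID, channel, onset and duration in seconds, and the speaker label, with unused fields set to <NA>, for example:

SPEAKER EN2001a 1 12.34 5.67 <NA> <NA> spk0 <NA> <NA>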

To score the result, the reference RTTM first has to be split into the appropriate sub-meeting lengths:

python3 scoring/split_rttm.py --submeeting-rttm /path/to/scoringdir/eval95k24.rttm --input-rttm scoring/refoutputeval.rttm --output-rttm /path/to/scoringdir/reference.rttm

Finally, the speaker error rate is calculated using:

python3 scoring/score_rttm.py --score-rttm /path/to/scoringdir/eval95k24.rttm --ref-rttm /path/to/scoringdir/reference.rttm --output-scoredir /path/to/scoringdir/eval95k24

Reference

@misc{LiKreyssig2019DNC,
  title={Discriminative Neural Clustering for Speaker Diarisation},
  author={Li, Qiujia and Kreyssig, Florian L. and Zhang, Chao and Woodland, Philip C.},
  eprint={1910.09703},
  archivePrefix={arXiv},
  year={2019},
  url={https://arxiv.org/abs/1910.09703}
}

Contributors

floriankrey, qiujiali


Issues

Trouble during DNC Transformer training

Hello, thanks for sharing your work!
I'm a graduate student from Taiwan.
I tried to follow your steps to train a DNC Transformer, but when I use
"./run.sh --stage 4 --stop_stage 4 --train_json /home/erichong0318/DNC/gen_data/m50.meeting.augment/train.json --dev_json /home/erichong0318/DNC/gen_data/m50.real.augment/dev.json --tag test"
to train my model, I encounter this problem:
[screenshot of the error omitted]
The data preparation steps were copied exactly from your GitHub code.
What should I do to solve this?
Thanks a lot!

Custom Kmeans

Hi, I think there is an over-simplification in Custom Kmeans in the way the centroids are estimated:

centres[each_center] = np.mean(X[each_center_samples], axis=0)

doesn't actually yield the point that minimises the average custom distance within a cluster.
The mean is the optimal solution for the Euclidean distance but not for an arbitrary distance. For instance, in the case of the cosine distance, the mean calculated as above gives the optimal cluster centre only if the rows of X are l2-normalised.

A more general solution would be to use sklearn_extra.cluster.KMedoids, as sketched below.
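
A minimal sketch of that suggestion (assuming sklearn_extra is installed; the number of clusters is a placeholder):

from sklearn_extra.cluster import KMedoids

# K-medoids picks an actual data point as each cluster centre, so it
# minimises the average distance within a cluster for any metric,
# including cosine.
kmedoids = KMedoids(n_clusters=4, metric='cosine', random_state=0)
labels = kmedoids.fit_predict(X)  # X: (N, D) array of d-vectors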

How to get d-vectors

Hi,
thank you for your work.

I am trying to replicate your setup with AMI, but I have no idea how to get the d-vectors. You propose an augmentation technique, but when running those commands, the scripts try to find ark files I do not have:

FileNotFoundError: [Errno 2] No such file or directory: 'data/train/train.00.ark'

I was able to download AMI (ihm) and process the data, but I cannot find anything on d-vectors.

Thanks.

Issues about training with an init model

Hi,
I tried to replicate the experiment, following your steps to train a DNC Transformer with the same configuration. The spectral clustering DER matches yours, but the DNC results are inconsistent: my DNC DER is 32.52%, whereas the paper reports 13.90%.
My shell commands are as follows:

run.sh
#current path is DNC/
dnc_root=espnet/egs/ami/dnc1
path_to_datadir=data/augment_data
m50_real_augment_path=$path_to_datadir/m50.real.augment
m50_meeting_augment_path=$path_to_datadir/m50.meeting.augment
dvecdict_meeting_path=$path_to_datadir/dvecdict.meeting.split100
m50_real_path=$path_to_datadir/m50.real
SC_scoring_path=scoring/sys_rttm/SC_result
DNC_scoring_path=scoring/sys_rttm/DNC_result
data_path=data	
model_init=exp/mdm_train_pytorch_tag.for.model/results/model.acc.best
resume_path=exp/mdm_train_pytorch_tag.for.model/results/snapshot.ep.50

./path.sh
ln -s $dnc_root dnc1
#To generate training and validation data with sub-meeting length 50 and 1000 random shifts
python3 datapreperation/gen_augment_data.py --input-scps data/train.scp --input-mlfs data/train.mlf \
	--filtEncomp --maxlen 50 --augment 1000 --varnormalise $m50_real_augment_path
python3 datapreperation/gen_augment_data.py --input-scps data/dev.scp --input-mlfs data/dev.mlf \
	--filtEncomp --maxlen 50 --augment 1000 --varnormalise $m50_real_augment_path
	
#To generate training data with sub-meeting length 50 and 1000 random shifts using the meeting randomisation
python3 datapreperation/gen_dvecdict.py --input-scps data/train.scp --input-mlfs data/train.mlf \
	--filtEncomp --segLenConstraint 100 --meetingLevelDict $dvecdict_meeting_path
	
python3 datapreperation/gen_augment_data.py --input-scps data/train.scp --input-mlfs data/train.mlf \
	--filtEncomp --maxlen 50 --augment 100 --varnormalise --randomspeaker  \
	--dvectordict $dvecdict_meeting_path/train.npz $m50_meeting_augment_path
	
#To generate evaluation data
python3 datapreperation/gen_augment_data.py --input-scps data/eval.scp --input-mlfs data/eval.mlf \
	--filtEncomp --maxlen 50 --varnormalise $m50_real_path
	
cd $dnc_root

#To start training, run
CUDA_VISIBLE_DEVICES=1,2,3 ./run.sh --stage 4 --stop_stage 4 --train_json ../../../../$m50_real_augment_path/train.json \
	--ngpu 3 --dev_json ../../../../$m50_real_augment_path/dev.json --tag tag.for.model

#To track the progress of the training, run
tail -f exp/mdm_train_pytorch_tag.for.model/train.log

#Decode a DNC Transformer
#Similar to the command used for training, run
#The decoding results are, by default, stored in multiple json files in exp/mdm_train_pytorch_tag.for.model/decode_dev_xxxxx/data.JOB.json
./run.sh --stage 5 --decode_json ../../../../$m50_real_path/eval.json --tag tag.for.model

cd ../../../../
#Running spectral clustering
#To run spectral clustering on previously generated evaluation data, for example for sub-meeting length 50:
python3 scoring/run_spectralclustering.py --p-percentile 0.95 --custom-dist cosine \
	--json-out $SC_scoring_path/eval95k24.1.json  $m50_real_path/eval.json
#Evaluation of clustering results
#First the DNC or SC output has to be converted into the RTTM format: 
#For SC:
python3 scoring/gen_rttm.py --input-scp $data_path/eval.scp --js-dir $SC_scoring_path \
	--js-num 1 --js-name eval95k24 --rttm-name eval95k24
#To score the result the reference rttm has to first be split into the appropriate sub-meeting lengths:
python3 scoring/split_rttm.py --submeeting-rttm $SC_scoring_path/eval95k24.rttm \
	--input-rttm scoring/refoutputeval.rttm --output-rttm $SC_scoring_path/reference.rttm
#Finally, the speaker error rate has to be calculated using:
python3 scoring/score_rttm.py --score-rttm $SC_scoring_path/eval95k24.rttm \
	--ref-rttm $SC_scoring_path/reference.rttm --output-scoredir $SC_scoring_path/result


#For DNC:
\cp $dnc_root/exp/mdm_train_pytorch_tag.for.model/decode_mdm_dev_decode/data* $DNC_scoring_path
python3 scoring/gen_rttm.py --input-scp $data_path/eval.scp --js-dir $DNC_scoring_path \
	--js-num 16 --js-name data --rttm-name evaldnc
#To score the result the reference rttm has to first be split into the appropriate sub-meeting lengths:
python3 scoring/split_rttm.py --submeeting-rttm $DNC_scoring_path/evaldnc.rttm \
	--input-rttm scoring/refoutputeval.rttm --output-rttm $DNC_scoring_path/reference.rttm
#Finally, the speaker error rate has to be calculated using:
python3 scoring/score_rttm.py --score-rttm $DNC_scoring_path/evaldnc.rttm \
	--ref-rttm $DNC_scoring_path/reference.rttm --output-scoredir $DNC_scoring_path/result

I think the problem is that the training dataset in the paper was augmented in three ways, while I only used the m50.real.augment dataset to train the model.
When I then added the parameter --init_model $model_init to train on the m50.meeting.augment dataset, I got this error:

train.log
# asr_train.py --config conf/tuning/train_transformer.yaml --ngpu 3 --backend pytorch --outdir exp/mdm_train_pytorch_tag.for.model/results --tensorboard-dir tensorboard/mdm_train_pytorch_tag.for.model --debugmode 1 --dict data/lang_1char/mdm_train_units.txt --debugdir exp/mdm_train_pytorch_tag.for.model --minibatches 0 --verbose 0 --resume --asr-model exp/mdm_train_pytorch_tag.for.model/results/model.acc.best --train-sample-rate 0.2 --rotate true --seed 1 --train-json ../../../../data/augment_data/m50.real.augment/train.json --valid-json ../../../../data/augment_data/m50.real.augment/dev.json 
# Started at Wed May  5 22:22:08 EDT 2021
#
/work/wj/DNC/venv/lib/python3.7/site-packages/torch/nn/_reduction.py:49: UserWarning: size_average and reduce args will be deprecated, please use reduction='none' instead.
  warnings.warn(warning.format(ret))
2021-05-05 22:22:09,421 (asr_train:322) WARNING: Skip DEBUG/INFO messages
None
Traceback (most recent call last):
  File "/work/wj/DNC/espnet/egs/ami/dnc1/../../../espnet/bin/asr_train.py", line 386, in <module>
    main(sys.argv[1:])
  File "/work/wj/DNC/espnet/egs/ami/dnc1/../../../espnet/bin/asr_train.py", line 374, in main
    train(args)
  File "/work/wj/DNC/espnet/espnet/asr/pytorch_backend/asr.py", line 333, in train
    model = model_class(idim, odim, args, asr_model=asr_model, mt_model=mt_model)
TypeError: __init__() got an unexpected keyword argument 'asr_model'
# Accounting: time=2 threads=1
# Ended (code 1) at Wed May  5 22:22:10 EDT 2021, elapsed time 2 seconds

Which steps did I get wrong?
Thank you very much!

Why does data/train.scp have the same features in meetings EN2001a, EN2001d and EN2001e?

I have found that the utterances of meetings EN2001a, EN2001d and EN2001e appear four times, each copy with identical features.

Why is it like this?

import numpy as np
from kaldiio import ReadHelper

feats = []
with ReadHelper('scp:data/train.scp') as reader:
    for key, numpy_array in reader:
        #if "0007881_0007913" in key:
        if "0056607_0056837" in key:
            print(key)
            feats.append(numpy_array)

#AMIXXX-00001-1EN2001a-XXXXXX-11_XXXXXXX_0056607_0056837
#AMIXXX-00001-3EN2001a-XXXXXX-11_XXXXXXX_0056607_0056837
#AMIXXX-00001-4EN2001a-XXXXXX-11_XXXXXXX_0056607_0056837
#AMIXXX-00001-5EN2001a-XXXXXX-11_XXXXXXX_0056607_0056837
print(np.array_equal(feats[0], feats[1]))
print(np.array_equal(feats[1], feats[2]))
print(np.array_equal(feats[2], feats[3]))

#True
#True
#True
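
A quick way to check how widespread this is (a minimal sketch, not from the repository) is to hash every feature matrix in the scp and report keys whose features are byte-identical:

import hashlib
from collections import defaultdict

from kaldiio import ReadHelper

groups = defaultdict(list)
with ReadHelper('scp:data/train.scp') as reader:
    for key, feats in reader:
        # Group utterance keys by a hash of the raw feature bytes.
        groups[hashlib.md5(feats.tobytes()).hexdigest()].append(key)

for keys in groups.values():
    if len(keys) > 1:
        print('identical features:', keys)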

Consider directly using the newest version of SpectralCluster

The SpectralCluster library has gone through many versions.

The newest versions have many more functionalities, including custom distances such as cosine for K-means.

Please consider directly importing the newest version of this library instead of a nested fork 😄

Also, you can directly use the configuration equivalent to the ICASSP 2018 paper in a few lines:

from spectralcluster import configs

labels = configs.icassp2018_clusterer.predict(X)

Is it feasible to use DNC on your own data?

The relevant parts of run.sh are removed (stages -1 to 3), and the TDNN used for the 32-dimensional d-vector embedding is unclear to me from Section 5.2 of the paper.
In issue #2 you mention that any d-vector embedding could be used, but is this really true? Some parameters would have to be identical, wouldn't they (window size, overlap, sample rate, ...)?
How would one have to edit the AMI label files (the offsets, presumably)?
What files would I have to create for my own data in order to run the decoder on it?

tl;dr: Is it feasible to use DNC on your own data?
