richermans / datadriven-gpvad

The codebase for Data-driven general-purpose voice activity detection.

License: MIT License

Python 100.00%

machine-learning noise-robust pytorch speech-activity-detection voice-activity-detection

datadriven-gpvad's Issues

Running forward.py on "example.wav" does not give the same result as the README

Hello, I have pulled the code with git and installed the requirements with pip.
When I run "python forward.py -w ./example/example.wav", the result below is different from the README. Is there a problem? Thank you very much.

assert len(cv_df) > 0, "Fraction a bit too large?"

Thanks for your code.
I'm trying to train from scratch with teacher1.
But I hit this error when I run 'run.py'.
How can I solve this problem?
Thank you in advance!

(env_gpvad)my_account:~/Datadriven-GPVAD$ python run.py train configs/example.yaml
[2022-01-24 20:46:21] Storing files in experiments/CRNN/2022-01-24_20-46-01_400e8c547d0b11ec9397a0423f3aed9a
[2022-01-24 20:46:21] batch_size: 64
[2022-01-24 20:46:21] data: data/csv_labels/balanced.csv
[2022-01-24 20:46:21] data_args:
[2022-01-24 20:46:21] mode: null
[2022-01-24 20:46:21] early_stop: 15
[2022-01-24 20:46:21] epochs: 15
[2022-01-24 20:46:21] itercv: 10000
[2022-01-24 20:46:21] label: data/softlabels/csv/balanced.csv
[2022-01-24 20:46:21] label_type: soft
[2022-01-24 20:46:21] loss: FrameBCELoss
[2022-01-24 20:46:21] model: CRNN
[2022-01-24 20:46:21] model_args: {}
[2022-01-24 20:46:21] num_workers: 8
[2022-01-24 20:46:21] optimizer: AdamW
[2022-01-24 20:46:21] optimizer_args:
[2022-01-24 20:46:21] lr: 0.001
[2022-01-24 20:46:21] outputpath: experiments/
[2022-01-24 20:46:21] postprocessing: double
[2022-01-24 20:46:21] save: best
[2022-01-24 20:46:21] scheduler_args:
[2022-01-24 20:46:21] factor: 0.1
[2022-01-24 20:46:21] patience: 10
[2022-01-24 20:46:21] threshold: null
[2022-01-24 20:46:21] transforms:
[2022-01-24 20:46:21] - timemask
[2022-01-24 20:46:21] - freqmask
[2022-01-24 20:46:21]
[2022-01-24 20:46:21] Running on device cpu
[2022-01-24 20:46:21] train_df
[2022-01-24 20:46:21] cv_df
[2022-01-24 20:46:21] Transforms:
[2022-01-24 20:46:21] Sequential(
[2022-01-24 20:46:21] (0): TimeMask()
[2022-01-24 20:46:21] (1): FreqMask()
[2022-01-24 20:46:21] )
Traceback (most recent call last):
  File "run.py", line 639, in <module>
    fire.Fire(Runner)
  File "/home/t3qadmin/anaconda3/envs/env_gpvad/lib/python3.7/site-packages/fire/core.py", line 127, in Fire
    component_trace = _Fire(component, args, context, name)
  File "/home/t3qadmin/anaconda3/envs/env_gpvad/lib/python3.7/site-packages/fire/core.py", line 366, in _Fire
    component, remaining_args)
  File "/home/t3qadmin/anaconda3/envs/env_gpvad/lib/python3.7/site-packages/fire/core.py", line 542, in _CallCallable
    result = fn(*varargs, **kwargs)
  File "run.py", line 118, in train
    assert len(cv_df) > 0, "Fraction a bit too large?"
AssertionError: Fraction a bit too large?
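
The assertion fires when the cross-validation frame ends up with zero rows. The split code around run.py line 118 is not shown in this issue, so the following is only a minimal sketch, assuming a pandas-based fractional split. `split_train_cv` and its `frac` parameter are hypothetical names, not the repository's API. It illustrates that a very small input frame can leave nothing for validation.

```python
import pandas as pd


def split_train_cv(df: pd.DataFrame, frac: float = 0.9, seed: int = 0):
    """Hypothetical split: sample `frac` of the rows for training, keep the rest for CV."""
    train_df = df.sample(frac=frac, random_state=seed)
    cv_df = df.drop(train_df.index)
    return train_df, cv_df


# 100 rows: 90 are sampled for training, 10 remain for cross-validation.
big = pd.DataFrame({"filename": [f"clip_{i}.wav" for i in range(100)]})
train_big, cv_big = split_train_cv(big)

# 3 rows: 0.9 * 3 rounds to 3, so every row goes to training and CV is empty.
small = pd.DataFrame({"filename": ["a.wav", "b.wav", "c.wav"]})
train_small, cv_small = split_train_cv(small)

print(len(cv_big), len(cv_small))
```

If something like this is what happens, checking how many rows survive the join between the data and label files before the split would be a reasonable first diagnostic.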

The error about “python3 extract_features.py wavs.txt -o hdf5/balanced.h5”

Hi,

I have some issues with feature extraction.

1. In the file "configs/example.yaml":

data: data/softlabels/hdf5/balanced.h5
label: data/softlabels/csv/balanced.csv -> csv_labels/balanced.csv

2. When I run the "python3 extract_features.py" command, there is an error!

prepare_labels.py can't find "encoders/balanced.pth". Should it be "labelencoders/vad.pth"? But what about when the model is 'gpvb'?

Could you give me some advice about it?

MODELS = {
    'crnn': {
        'model': crnn,
        'encoder': torch.load('encoders/balanced.pth'),
        'outputdim': 527,
    },
    'gpvb': {
        'model': crnn,
        'encoder': torch.load('../labelencoders/vad.pth'),  # ('encoders/balanced_binary.pth'),
        'outputdim': 2,
    }
}

Thanks for your response!

Provide pretrained teacher models for the project

Thanks for the great work!
Can you provide the two pretrained teacher files, balanced.pth and balanced_binary.pth?

I appreciate your help!

The error about "python3 extract_features.py wavs.txt -o hdf5/balanced.h5", too

Hi, thanks for your excellent work.
When I rerun the program, I get some errors,
such as:

I checked it with "python3 extract_feature.py wavs.txt -o hdf5/balanced.h5", but I got another error:

Prepare label error

When I run python3 prepare_labels.py --pre ../pretrained_models/teacher1/model.pth csv_labels/balanced.csv softlabels/hdf5/balanced.h5 softlabels/csv/balanced.csv
I get an error: encoders/balanced.pth does not exist. Following the other issues, I downloaded https://github.com/RicherMans/GPV/blob/master/pretrained/gpv_f.pth to encoders/balanced.pth, but then I get an error that the encoder has no classes_.

Which DCASE18 task does test set C come from?

Hi guys, this is great work.
In the dataset description, test set C comes from DCASE18, but DCASE18 has 5 tasks. So which task does test set C come from? Thanks.

Training from scratch [Data format query]

Hi,
Thank you for your wonderful work with GPVAD. I am looking at training the student model from scratch on my own dataset(s). My dataset consists of audio signals (wav), with regions tagged within each audio sample.
For example:
[{'type': 'BACKGROUND NOISE', 'time-range': [3.041, 3.169]}, {'type': 'SPEECH', 'time-range': [5.208, 5.544]}, {'type': 'BACKGROUND NOISE', 'time-range': [4.339, 5.069]}] is a tagged audio sample. Can your data pipeline support training on such data formats? If not, what do you suggest I do to work around this?
Thanks a lot!
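
One possible starting point is to turn the tagged regions into frame-level speech/non-speech targets. This is a minimal sketch only: the 20 ms frame hop, the clip duration, and the helper name `regions_to_frame_labels` are assumptions for illustration, and the label format the repository's training pipeline actually expects is not described in this issue.

```python
import math
from typing import Dict, List


def regions_to_frame_labels(regions: List[Dict], duration_s: float,
                            hop_s: float = 0.02,
                            speech_tag: str = "SPEECH") -> List[int]:
    """Mark every frame that overlaps a speech region with 1, all others with 0."""
    n_frames = int(round(duration_s / hop_s))
    labels = [0] * n_frames
    for region in regions:
        if region["type"] != speech_tag:
            continue  # everything that is not speech stays 0
        start_s, end_s = region["time-range"]
        first = max(0, int(start_s / hop_s))             # frame containing the onset
        last = min(n_frames, math.ceil(end_s / hop_s))   # exclusive end frame
        for i in range(first, last):
            labels[i] = 1
    return labels


tags = [
    {"type": "BACKGROUND NOISE", "time-range": [3.041, 3.169]},
    {"type": "SPEECH", "time-range": [5.208, 5.544]},
    {"type": "BACKGROUND NOISE", "time-range": [4.339, 5.069]},
]
# Assumed clip length of 6 seconds, purely for the demonstration.
frame_labels = regions_to_frame_labels(tags, duration_s=6.0)
print(len(frame_labels), sum(frame_labels))
```

Frame-level targets like these would still need to be written out in whatever file layout the training script reads.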

Could you provide the evaluation set?

Hello! I noticed the evaluate function in run.py, which is shown below.

Actually, I don't know the format of labels.tsv. Could you provide the evaluation set? If not, could you share a screenshot of labels.tsv?
By the way, is data.h5 produced the same way as the training set, i.e. extracted by extract_feature.py?
Thanks!

Is 'filename' also needed in data/softlabels/hdf5/balanced.h5?

When I was trying to train the model, I ran into a new problem: a UnicodeDecodeError.

  File "run.py", line 97, in train
    data_df = pd.read_csv(config_parameters['data'], sep='\s+')
  File "/data/anaconda3/envs/gpvad/lib/python3.6/site-packages/pandas/io/parsers.py", line 702, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/data/anaconda3/envs/gpvad/lib/python3.6/site-packages/pandas/io/parsers.py", line 429, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/data/anaconda3/envs/gpvad/lib/python3.6/site-packages/pandas/io/parsers.py", line 895, in __init__
    self._make_engine(self.engine)
  File "/data/anaconda3/envs/gpvad/lib/python3.6/site-packages/pandas/io/parsers.py", line 1122, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)
  File "/data/anaconda3/envs/gpvad/lib/python3.6/site-packages/pandas/io/parsers.py", line 1853, in __init__
    self._reader = parsers.TextReader(src, **kwds)
  File "pandas/_libs/parsers.pyx", line 542, in pandas._libs.parsers.TextReader.__cinit__
  File "pandas/_libs/parsers.pyx", line 782, in pandas._libs.parsers.TextReader._get_header
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x89 in position 0: invalid start byte

I changed data/softlabels/hdf5/balanced.h5 to utf8 and it looks like this:

8948 4446 0d0a 1a0a 0000 0000 0008 0800
0400 1000 0000 0000 0000 0000 0000 0000
ffff ffff ffff ffff ccda 4b01 0000 0000
ffff ffff ffff ffff 0000 0000 0000 0000
6000 0000 0000 0000 0100 0000 0000 0000
8800 0000 0000 0000 a802 0000 0000 0000
0100 0100 0100 0000 1800 0000 0000 0000
1100 1000 0000 0000 8800 0000 0000 0000
......
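
The first eight bytes of the dump above, 89 48 44 46 0d 0a 1a 0a, are the HDF5 file signature, so the file is binary HDF5 rather than text, and the 0x89 at position 0 is the byte the UTF-8 decoder rejects when the file is passed to pd.read_csv. A minimal sketch of a guard that detects this before parsing follows. `looks_like_hdf5` is a hypothetical helper, not part of the repository, and it only checks the signature at offset 0.

```python
import os
import tempfile

# Standard 8-byte HDF5 file signature: 89 48 44 46 0d 0a 1a 0a
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def looks_like_hdf5(path: str) -> bool:
    """Return True if the file starts with the HDF5 signature (hypothetical helper)."""
    with open(path, "rb") as handle:
        return handle.read(len(HDF5_SIGNATURE)) == HDF5_SIGNATURE


# Small demonstration with two throwaway files.
with tempfile.TemporaryDirectory() as tmp:
    h5_path = os.path.join(tmp, "fake.h5")
    csv_path = os.path.join(tmp, "labels.csv")
    with open(h5_path, "wb") as handle:
        handle.write(HDF5_SIGNATURE + b"\x00" * 8)   # signature plus padding
    with open(csv_path, "w", encoding="utf-8") as handle:
        handle.write("filename\tlabel\n")            # plain text table
    h5_detected = looks_like_hdf5(h5_path)
    csv_detected = looks_like_hdf5(csv_path)

print(h5_detected, csv_detected)
```

A file that passes this check should be opened with an HDF5 reader rather than a CSV parser.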

A new problem relating to 'filename' occurred.
Does this line of code in run.py indicate that data_df also needs a 'filename' column?
merged = data_df.merge(label_df, on='filename')
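
As far as pandas is concerned, `merge(..., on='filename')` requires a 'filename' column (or index level) in both frames and otherwise raises a KeyError. The sketch below illustrates only that pandas behaviour. The column names `hdf5path` and `event_labels` and all values are made up for the example and are not taken from the repository.

```python
import pandas as pd

# Hypothetical feature index and label table, both keyed by 'filename'.
data_df = pd.DataFrame({"filename": ["a.wav", "b.wav"],
                        "hdf5path": ["feats.h5", "feats.h5"]})
label_df = pd.DataFrame({"filename": ["a.wav", "c.wav"],
                         "event_labels": ["Speech", "Music"]})

# Default inner join: only filenames present in both frames survive.
merged = data_df.merge(label_df, on="filename")

# Without a 'filename' column on the left side, the merge fails.
no_key_df = data_df.rename(columns={"filename": "file"})
try:
    no_key_df.merge(label_df, on="filename")
    missing_key_error = None
except KeyError as err:
    missing_key_error = err

print(list(merged["filename"]), repr(missing_key_error))
```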

Something went wrong when I tried to extract features

Hi,

Something went wrong when I tried to extract features with "python extract_feature.py wavs.txt -o hdf5/balanced.h5"

Traceback (most recent call last):
  File "extract_feature.py", line 86, in <module>
    DF[ARGS.col].unique(),
  File "/data/anaconda3/envs/gpvad/lib/python3.6/site-packages/pandas/core/frame.py", line 2927, in __getitem__
    indexer = self.columns.get_loc(key)
  File "/data/anaconda3/envs/gpvad/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 2659, in get_loc
    return self._engine.get_loc(self._maybe_cast_indexer(key))
  File "pandas/_libs/index.pyx", line 108, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 132, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1601, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1608, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'filename'

Is the pandas version wrong, or is it something else?
Please help. Thanks.
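
The traceback shows the script indexing a DataFrame with the column name 'filename', so one likely cause is that wavs.txt has no header row with that name. The sketch below assumes the file list is read with pandas using whitespace separation (as run.py does for its data file). How extract_feature.py actually loads the list is not shown here, and the paths are made up.

```python
import io

import pandas as pd

# A plain list of paths: pandas treats the first path as the column name.
without_header = "/data/a.wav\n/data/b.wav\n"
# The same list with an explicit 'filename' header line.
with_header = "filename\n/data/a.wav\n/data/b.wav\n"

bad_df = pd.read_csv(io.StringIO(without_header), sep=r"\s+")
good_df = pd.read_csv(io.StringIO(with_header), sep=r"\s+")

bad_columns = list(bad_df.columns)            # no 'filename' column here
good_paths = list(good_df["filename"].unique())

print(bad_columns, good_paths)
```

If that is the cause, adding a first line containing only `filename` to wavs.txt should avoid the KeyError, independent of the pandas version.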

About how to perform fine-tuning

Hi,

Do you have any ideas about fine-tuning a pretrained model (such as sre) for a more complicated scenario using a small related dataset? I tried using the teacher model to label the new dataset and then training for a few epochs with a very small learning rate. However, the performance drops drastically. Quite sad.

How to train "teacher"?

Script for Audioset downloader

Hi everyone,
First of all, thanks for the great work!
Can you provide the script for downloading the Audioset?
Thank you so much!

How was the ground truth in the article set? How can I get it?

Using the SRE model for other languages

Hi,
Thank you for your work on Datadriven-GPVAD. I was able to set it up and do some inferencing for my data quickly.
I wanted to know if I can use your model SRE (or any) for languages other than English. I wanted to use your model for Hindi. Or would you suggest training your model from scratch for other languages?
Also, I wanted to know if you would recommend mixing the data points for both English and Hindi and trying to train a language-agnostic model using your work.
Thanks a lot!
