richermans / datadriven-gpvad
The codebase for Data-driven general-purpose voice activity detection.
License: MIT License
Thanks for your code.
I'm trying to train from scratch using teacher1.
But I hit this error when I run 'run.py'.
How can I solve this problem?
Thank you in advance!
(env_gpvad)my_account:~/Datadriven-GPVAD$ python run.py train configs/example.yaml
[2022-01-24 20:46:21] Storing files in experiments/CRNN/2022-01-24_20-46-01_400e8c547d0b11ec9397a0423f3aed9a
[2022-01-24 20:46:21] batch_size: 64
[2022-01-24 20:46:21] data: data/csv_labels/balanced.csv
[2022-01-24 20:46:21] data_args:
[2022-01-24 20:46:21] mode: null
[2022-01-24 20:46:21] early_stop: 15
[2022-01-24 20:46:21] epochs: 15
[2022-01-24 20:46:21] itercv: 10000
[2022-01-24 20:46:21] label: data/softlabels/csv/balanced.csv
[2022-01-24 20:46:21] label_type: soft
[2022-01-24 20:46:21] loss: FrameBCELoss
[2022-01-24 20:46:21] model: CRNN
[2022-01-24 20:46:21] model_args: {}
[2022-01-24 20:46:21] num_workers: 8
[2022-01-24 20:46:21] optimizer: AdamW
[2022-01-24 20:46:21] optimizer_args:
[2022-01-24 20:46:21] lr: 0.001
[2022-01-24 20:46:21] outputpath: experiments/
[2022-01-24 20:46:21] postprocessing: double
[2022-01-24 20:46:21] save: best
[2022-01-24 20:46:21] scheduler_args:
[2022-01-24 20:46:21] factor: 0.1
[2022-01-24 20:46:21] patience: 10
[2022-01-24 20:46:21] threshold: null
[2022-01-24 20:46:21] transforms:
[2022-01-24 20:46:21] - timemask
[2022-01-24 20:46:21] - freqmask
[2022-01-24 20:46:21]
[2022-01-24 20:46:21] Running on device cpu
[2022-01-24 20:46:21] train_df
[2022-01-24 20:46:21] cv_df
[2022-01-24 20:46:21] Transforms:
[2022-01-24 20:46:21] Sequential(
[2022-01-24 20:46:21] (0): TimeMask()
[2022-01-24 20:46:21] (1): FreqMask()
[2022-01-24 20:46:21] )
Traceback (most recent call last):
File "run.py", line 639, in <module>
fire.Fire(Runner)
File "/home/t3qadmin/anaconda3/envs/env_gpvad/lib/python3.7/site-packages/fire/core.py", line 127, in Fire
component_trace = _Fire(component, args, context, name)
File "/home/t3qadmin/anaconda3/envs/env_gpvad/lib/python3.7/site-packages/fire/core.py", line 366, in _Fire
component, remaining_args)
File "/home/t3qadmin/anaconda3/envs/env_gpvad/lib/python3.7/site-packages/fire/core.py", line 542, in _CallCallable
result = fn(*varargs, **kwargs)
File "run.py", line 118, in train
assert len(cv_df) > 0, "Fraction a bit too large?"
AssertionError: Fraction a bit too large?
Hi,
I have some issues with feature extraction.
1. In the file "configs/example.yaml":
data: data/softlabels/hdf5/balanced.h5
label: data/softlabels/csv/balanced.csv -> csv_labels/balanced.csv
2. When I run the "python3 extract_features.py" command, there is an error:
prepare_labels.py can't find "encoders/balanced.pth". Should it be "labelencoders/vad.pth"? And what about when using the 'gpvb' model?
Could you give me some advice?
MODELS = {
    'crnn': {
        'model': crnn,
        'encoder': torch.load('encoders/balanced.pth'),
        'outputdim': 527,
    },
    'gpvb': {
        'model': crnn,
        'encoder': torch.load('../labelencoders/vad.pth'),  # ('encoders/balanced_binary.pth'),
        'outputdim': 2,
    }
}
Thanks for your response!
Thanks for the great work!
Can you provide the two teachers' pretrained balanced.pth and balanced_binary.pth files?
I appreciate your help!
When I run python3 prepare_labels.py --pre ../pretrained_models/teacher1/model.pth csv_labels/balanced.csv softlabels/hdf5/balanced.h5 softlabels/csv/balanced.csv
I get an error: encoders/balanced.pth does not exist. Following the other issues, I downloaded https://github.com/RicherMans/GPV/blob/master/pretrained/gpv_f.pth to encoders/balanced.pth, but then I get an error saying the encoder has no classes_ attribute.
Hi guys, this is great work.
In the dataset, test set C comes from DCASE18, but DCASE18 has 5 tasks. Which task does test set C come from? Thanks.
Hi,
Thank you for your wonderful work with GPVAD. I am looking at training the student model from scratch on my own dataset(s). My dataset consists of audio signals (wav), with regions tagged within each audio sample.
For example:
[{'type': 'BACKGROUND NOISE', 'time-range': [3.041, 3.169]}, {'type': 'SPEECH', 'time-range': [5.208, 5.544]}, {'type': 'BACKGROUND NOISE', 'time-range': [4.339, 5.069]}]
is the tagging for one audio file. Can your data pipeline support training on such data formats? If not, what do you suggest I do to work around this?
Thanks a lot!
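One possible workaround for the annotation format above is to flatten each file's tagged regions into one row per region before building labels. The following is only a sketch under stated assumptions: the column names `onset`, `offset` and `event_label` follow the common DCASE-style layout and are not confirmed by this thread (only a `filename` column is, via the merge call in run.py), and `TAG_TO_LABEL` and `clip_0001.wav` are hypothetical names chosen for illustration.

```python
import csv
import io

# Assumed column layout (DCASE-style). Only 'filename' is confirmed by
# the thread; the other three names are an assumption.
FIELDS = ["filename", "onset", "offset", "event_label"]

# Hypothetical mapping from the poster's tags to single-token label
# names, which are safer if a table is later split on whitespace.
TAG_TO_LABEL = {"SPEECH": "Speech", "BACKGROUND NOISE": "Noise"}


def regions_to_rows(filename, regions):
    """Flatten one file's tagged regions into rows sorted by onset."""
    rows = []
    for region in regions:
        onset, offset = region["time-range"]
        rows.append({
            "filename": filename,
            "onset": onset,
            "offset": offset,
            # Unknown tags are passed through unchanged.
            "event_label": TAG_TO_LABEL.get(region["type"], region["type"]),
        })
    return sorted(rows, key=lambda row: row["onset"])


def write_tsv(rows, handle):
    """Write the rows as a tab-separated table with a header line."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS, delimiter="\t",
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)


example = [
    {'type': 'BACKGROUND NOISE', 'time-range': [3.041, 3.169]},
    {'type': 'SPEECH', 'time-range': [5.208, 5.544]},
    {'type': 'BACKGROUND NOISE', 'time-range': [4.339, 5.069]},
]
rows = regions_to_rows("clip_0001.wav", example)
buffer = io.StringIO()
write_tsv(rows, buffer)
print(buffer.getvalue())
```

Whether the repository's training code accepts such a table directly is not stated in the thread, so the output should be treated as an intermediate format to adapt.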
Hello! I noticed the evaluate function in run.py, which is shown below.
Actually, I don't know the format of labels.tsv. Could you provide the evaluation set? If not, could you share a screenshot of labels.tsv?
By the way, is data.h5 the same as the training set extracted by extract_feature.py?
Thanks!
When I tried to train the model, I ran into a new problem: a UnicodeDecodeError.
File "run.py", line 97, in train
data_df = pd.read_csv(config_parameters['data'], sep='\s+')
File "/data/anaconda3/envs/gpvad/lib/python3.6/site-packages/pandas/io/parsers.py", line 702, in parser_f
return _read(filepath_or_buffer, kwds)
File "/data/anaconda3/envs/gpvad/lib/python3.6/site-packages/pandas/io/parsers.py", line 429, in _read
parser = TextFileReader(filepath_or_buffer, **kwds)
File "/data/anaconda3/envs/gpvad/lib/python3.6/site-packages/pandas/io/parsers.py", line 895, in __init__
self._make_engine(self.engine)
File "/data/anaconda3/envs/gpvad/lib/python3.6/site-packages/pandas/io/parsers.py", line 1122, in _make_engine
self._engine = CParserWrapper(self.f, **self.options)
File "/data/anaconda3/envs/gpvad/lib/python3.6/site-packages/pandas/io/parsers.py", line 1853, in __init__
self._reader = parsers.TextReader(src, **kwds)
File "pandas/_libs/parsers.pyx", line 542, in pandas._libs.parsers.TextReader.__cinit__
File "pandas/_libs/parsers.pyx", line 782, in pandas._libs.parsers.TextReader._get_header
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x89 in position 0: invalid start byte
I converted data/softlabels/hdf5/balanced.h5 to UTF-8, and it looks like this:
8948 4446 0d0a 1a0a 0000 0000 0008 0800
0400 1000 0000 0000 0000 0000 0000 0000
ffff ffff ffff ffff ccda 4b01 0000 0000
ffff ffff ffff ffff 0000 0000 0000 0000
6000 0000 0000 0000 0100 0000 0000 0000
8800 0000 0000 0000 a802 0000 0000 0000
0100 0100 0100 0000 1800 0000 0000 0000
1100 1000 0000 0000 8800 0000 0000 0000
......
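The byte 0x89 at position 0 in the error, and the first eight bytes of the dump above (89 48 44 46 0d 0a 1a 0a), match the signature that normally starts an HDF5 file. That suggests the `data` entry pointed at the binary .h5 file while run.py passes that entry to pd.read_csv, so re-encoding the file would not help. The following is a minimal sketch for checking a path before handing it to a text parser; the helper name `looks_like_hdf5` and the temporary file names are hypothetical.

```python
import os
import tempfile

# The 8-byte signature that normally starts an HDF5 file; it matches
# the first bytes of the dump above (89 48 44 46 0d 0a 1a 0a).
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def looks_like_hdf5(path):
    """Return True if the file at `path` starts with the HDF5 signature."""
    with open(path, "rb") as handle:
        return handle.read(len(HDF5_SIGNATURE)) == HDF5_SIGNATURE


# Demo on two throwaway files (hypothetical stand-ins for real paths).
with tempfile.TemporaryDirectory() as tmp:
    h5_path = os.path.join(tmp, "balanced.h5")
    csv_path = os.path.join(tmp, "balanced.csv")
    with open(h5_path, "wb") as handle:
        handle.write(HDF5_SIGNATURE + b"\x00" * 8)
    with open(csv_path, "w", encoding="utf-8") as handle:
        handle.write("filename\nclip_0001.wav\n")
    results = {
        "h5": looks_like_hdf5(h5_path),
        "csv": looks_like_hdf5(csv_path),
    }

print(results)
```

In the configuration logged in the first post of this page, `data` is a .csv path (data/csv_labels/balanced.csv), which is consistent with that entry being expected to name a text table rather than the feature file.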
A new problem relating to 'filename' then occurred.
Does this line of code in run.py mean that data_df also needs a 'filename' column?
merged = data_df.merge(label_df, on='filename')
Hi,
Something went wrong when I tried to extract features with "python extract_feature.py wavs.txt -o hdf5/balanced.h5":
Traceback (most recent call last):
File "extract_feature.py", line 86, in <module>
DF[ARGS.col].unique(),
File "/data/anaconda3/envs/gpvad/lib/python3.6/site-packages/pandas/core/frame.py", line 2927, in __getitem__
indexer = self.columns.get_loc(key)
File "/data/anaconda3/envs/gpvad/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 2659, in get_loc
return self._engine.get_loc(self._maybe_cast_indexer(key))
File "pandas/_libs/index.pyx", line 108, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 132, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 1601, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 1608, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'filename'
Is the pandas version wrong or something else?
Plz help. Thx
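The KeyError above is raised by DF[ARGS.col], which means the table loaded from wavs.txt has no column named 'filename'; this does not by itself point to a pandas version problem. One plausible cause, which is an assumption and not confirmed in this thread, is that wavs.txt is a bare list of paths with no header row, so the first path is taken as the column name. The following sketch checks for the header and prepends it when missing; the helper names and the sample paths are hypothetical, and the separator the script expects is not stated here.

```python
def has_column(table_text, column="filename"):
    """Check whether the first line of a whitespace-separated table names `column`."""
    lines = table_text.splitlines()
    return bool(lines) and column in lines[0].split()


def add_header(table_text, column="filename"):
    """Prepend a header line naming `column` unless it is already there."""
    if has_column(table_text, column):
        return table_text
    return column + "\n" + table_text


# Hypothetical content of a header-less wavs.txt.
wav_list = "audio/clip_0001.wav\naudio/clip_0002.wav\n"
fixed = add_header(wav_list)
print(fixed)
```

Calling `add_header` on text that already starts with the header leaves it unchanged, so the helper can be applied to a file list without first checking which form it is in.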
Hi,
Do you have any ideas about fine-tuning the pretrained model (such as sre) for a more complicated scenario using a small related dataset? I tried using the teacher model to label the new dataset and then training for a few epochs with a very small learning rate. However, the performance drops drastically. Quite sad.
How do I train the "teacher"?
Hi everyone,
First of all, thanks for the great work!
Can you provide the script for downloading AudioSet?
Thank you so much!
How was the ground truth in the article set? How can I get it?
Hi,
Thank you for your work on Datadriven-GPVAD. I was able to set it up and quickly run inference on my data.
I wanted to know if I can use your model SRE (or any) for languages other than English. I wanted to use your model for Hindi. Or would you suggest training your model from scratch for other languages?
Also, I wanted to know if you would recommend mixing the data points for both English and Hindi and trying to train a language-agnostic model using your work.
Thanks a lot!