Speaker Profiling

This repository contains code for estimating the age, height, and gender of a speaker from their speech signal. It includes experiments on both the TIMIT and NISP datasets.

Model architecture

DEMO Colab Notebook

Installation

Use the package manager pip to install the required packages for preparing the dataset, training and testing the model.

pip install -r requirements.txt

Usage

Download the dataset

# TIMIT Dataset
wget https://data.deepai.org/timit.zip
unzip timit.zip -d 'path to timit data folder'

# NISP Dataset
git clone https://github.com/iiscleap/NISP-Dataset.git

Prepare the dataset for training and testing

# TIMIT Dataset
python TIMIT/prepare_timit_data.py --path='path to timit data folder'

# NISP Dataset
python NISP/prepare_nisp_data.py --nisp_repo_path='path to nisp data repo folder'

Update Config and Logger

Edit config.py to set batch_size, gpus, lr, etc., and select the preferred logger in the train_.py files.

Training (dev model, to make sure everything is set up as expected for training)

# TIMIT Dataset
python train_timit.py --dev=True --data_path='path to final data folder'

# NISP Dataset
python train_nisp.py --dev=True --data_path='path to final data folder'

Training (also check the other arguments in the train_....py files)

# TIMIT Dataset
python train_timit.py --data_path='path to final data folder'

# NISP Dataset
python train_nisp.py --data_path='path to final data folder'

Test the Model

# TIMIT Dataset
python test_timit.py --data_path='path to final data folder' --model_checkpoint='path to saved model checkpoint'

# NISP Dataset
python test_nisp.py --data_path='path to final data folder' --model_checkpoint='path to saved model checkpoint'

Results

Wandb Runs

TIMIT Baseline

| Model | Height RMSE (M) | Height RMSE (F) | Height MAE (M) | Height MAE (F) | Age RMSE (M) | Age RMSE (F) | Age MAE (M) | Age MAE (F) | Gender Acc |
|---|---|---|---|---|---|---|---|---|---|
| MFCC_LSTM-Attn | 7.5 | 6.6 | 5.5 | 5.2 | 7.7 | 8.4 | 5.6 | 5.9 | 0.975 |
| MelSpec_LSTM-Attn | 7.7 | 8.1 | 5.8 | 6.5 | 7.7 | 8.7 | 5.5 | 6.1 | 0.669 |
| MFCC_CNN-LSTM-Attn | 7.5 | 6.8 | 5.7 | 5.3 | 8.2 | 8.7 | 5.4 | 6.1 | 0.989 |
| MelSpec_CNN-LSTM-Attn | 7.5 | 7.4 | 5.8 | 5.8 | 8.2 | 8.4 | 5.8 | 5.9 | 0.96 |
| wav2vec(no-finetune)-LSTM-Attn | 7.4 | 6.4 | 5.5 | 5.1 | 7.2 | 8.2 | 5.0 | 5.7 | 0.994 |
| wav2vec(finetune 56)-LSTM-Attn | 7.5 | 6.2 | 5.5 | 4.9 | 7.5 | 7.9 | 5.5 | 5.7 | 0.994 |
| wav2vec(finetune 6)-LSTM-Attn | 7.6 | 6.7 | 5.6 | 5.3 | 7.0 | 8.2 | 4.9 | 5.6 | 0.993 |
| wav2vec(finetune 56)-LSTM-Attn(Only H) | 7.4 | 6.2 | 5.6 | 4.9 | | | | | |
| multi-scale-cnn(Only H) | 7.5 | 6.1 | 5.9 | 4.7 | | | | | |

TIMIT Previous Results

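| Model | Height RMSE (M) | Height RMSE (F) | Height MAE (M) | Height MAE (F) | Age RMSE (M) | Age RMSE (F) | Age MAE (M) | Age MAE (F) | Gender Acc |
|---|---|---|---|---|---|---|---|---|---|
| [1] 2019 | 6.85 | 6.29 | - | - | 7.6 | 8.63 | - | - | |
| [2] 2016 (fusion) | 6.7 | 6.1 | 5.0 | 5.0 | 7.8 | 8.9 | 5.5 | 6.5 | |
| [2] 2016 (baseline) | 7.0 | 6.5 | 5.3 | 5.2 | 8.1 | 9.1 | 5.7 | 6.2 | |
| [3] 2020 | - | - | - | - | 7.24 | 8.12 | 5.12 | 5.29 | 0.996 |
| [4] 2009 | 6.8 | 6.3 | 5.3 | 5.1 | - | - | - | - | |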

NISP

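| Model | Height RMSE (M) | Height RMSE (F) | Height MAE (M) | Height MAE (F) | Age RMSE (M) | Age RMSE (F) | Age MAE (M) | Age MAE (F) | Gender Acc |
|---|---|---|---|---|---|---|---|---|---|
| [5] TMP | 6.17 | 6.93 | 5.22 | 5.30 | 5.60 | 5.57 | 4.40 | 4.42 | |
| [5] Comb-3 | 6.13 | 6.70 | 5.16 | 5.30 | 5.63 | 4.99 | 3.80 | 3.76 | |
| Our Method | 6.49 | 6.37 | 5.32 | 5.12 | 5.48 | 5.71 | 3.70 | 4.22 | 0.984 |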
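The result tables report RMSE and MAE for height and age separately for male and female speakers, along with gender classification accuracy. As a minimal sketch of how such per-gender metrics can be computed (the helper names `per_gender_errors` and `accuracy` and the toy numbers are illustrative assumptions, not the repository's evaluation code):

```python
import math
from collections import defaultdict


def per_gender_errors(targets, predictions, genders):
    """Return {gender: (rmse, mae)} for paired targets and predictions.

    Hypothetical helper for illustration; not the repository's own code.
    """
    grouped = defaultdict(list)
    for target, pred, gender in zip(targets, predictions, genders):
        grouped[gender].append(pred - target)

    results = {}
    for gender, errors in grouped.items():
        mae = sum(abs(e) for e in errors) / len(errors)
        rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
        results[gender] = (rmse, mae)
    return results


def accuracy(true_labels, predicted_labels):
    """Fraction of labels predicted correctly."""
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


# Toy heights (made-up values, for illustration only).
heights_true = [180.0, 170.0, 160.0, 165.0]
heights_pred = [177.0, 174.0, 160.0, 171.0]
genders = ["M", "M", "F", "F"]
metrics = per_gender_errors(heights_true, heights_pred, genders)
```

RMSE penalizes large errors more heavily than MAE, so RMSE is always at least as large as MAE on the same set of predictions.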

Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

Please make sure to update tests as appropriate.

License

MIT

References

  • [1] S. B. Kalluri, D. Vijayasenan and S. Ganapathy, "A Deep Neural Network Based End to End Model for Joint Height and Age Estimation from Short Duration Speech," ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brighton, United Kingdom, 2019, pp. 6580-6584, doi: 10.1109/ICASSP.2019.8683397.
  • [2] R. Singh, B. Raj and J. Baker, "Short-term analysis for estimating physical parameters of speakers," in Proc. of IWBF, IEEE, 2016, pp. 1–6.
  • [3] "Joint gender and age estimation based on speech signals using x-vectors and transfer learning," ICASSP 2021.
  • [4] I. Mporas and T. Ganchev, "Estimation of unknown speaker's height from speech," Int J Speech Technol 12, 149–160 (2009). https://doi.org/10.1007/s10772-010-9064-2

Contributors

shangeth

speakerprofiling's Issues

Input data size

Hi,
I'm trying to convert your model to a Core ML model using the following code.

example_input = torch.rand(128, 40, 3)
traced_model = torch.jit.trace(model, example_input)

I keep getting errors about the size of example input being wrong. Can you correct me on this?
thanks

Error in training (using Colab)

!python /content/SpeakerProfiling/train_timit.py --data_path='/content/dataset/wav_data' --speaker_csv_path='/content/SpeakerProfiling/Dataset/data_info_height_age.csv' --noise_dataset_path='/content/noise_dataset'

Global seed set to 100
Training Model on TIMIT Dataset
#Cores = 4 #GPU = -1
/usr/local/lib/python3.7/dist-packages/torchaudio/functional/functional.py:433: UserWarning: At least one mel filterbank has all zero values. The value for n_mels (128) may be set too high. Or, the value for n_freqs (201) may be set too low.
"At least one mel filterbank has all zero values. "
/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py:481: UserWarning: This DataLoader will create 4 worker processes in total. Our suggested max number of worker in current system is 2, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
cpuset_checked))
Dataset Split (Train, Validation, Test)= 3920 700 1680
/usr/local/lib/python3.7/dist-packages/deprecate/deprecation.py:115: LightningDeprecationWarning: The MeanSquaredError was deprecated since v1.3.0 in favor of torchmetrics.regression.mean_squared_error.MeanSquaredError. It will be removed in v1.5.0.
stream(template_mgs % msg_args)
/usr/local/lib/python3.7/dist-packages/deprecate/deprecation.py:115: LightningDeprecationWarning: The MeanAbsoluteError was deprecated since v1.3.0 in favor of torchmetrics.regression.mean_absolute_error.MeanAbsoluteError. It will be removed in v1.5.0.
stream(template_mgs % msg_args)
Model Details: #Params = 671345 #Trainable Params = 671345
/usr/local/lib/python3.7/dist-packages/pytorch_lightning/trainer/connectors/accelerator_connector.py:111: LightningDeprecationWarning: Trainer(distributed_backend=ddp) has been deprecated and will be removed in v1.5. Use Trainer(accelerator=ddp) instead.
f"Trainer(distributed_backend={distributed_backend}) has been deprecated and will be removed in v1.5."
Traceback (most recent call last):
  File "/content/SpeakerProfiling/train_timit.py", line 131, in <module>
    distributed_backend='ddp'
  File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/trainer/connectors/env_vars_connector.py", line 40, in insert_env_defaults
    return fn(self, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/trainer/trainer.py", line 421, in __init__
    max_time,
  File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/trainer/connectors/callback_connector.py", line 52, in on_trainer_init
    self._configure_checkpoint_callbacks(checkpoint_callback)
  File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/trainer/connectors/callback_connector.py", line 77, in _configure_checkpoint_callbacks
    raise MisconfigurationException(error_msg)
pytorch_lightning.utilities.exceptions.MisconfigurationException: Invalid type provided for checkpoint_callback: Expected bool but received <class 'pytorch_lightning.callbacks.model_checkpoint.ModelCheckpoint'>. Pass callback instances to the callbacks argument in the Trainer constructor instead.
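
The exception message points to the fix: in the installed PyTorch Lightning version, `checkpoint_callback` must be a bool, and callback instances such as `ModelCheckpoint` belong in the `callbacks` list. The sketch below shows that change on a plain dictionary of keyword arguments so that it runs without PyTorch Lightning installed. `move_checkpoint_to_callbacks` is a hypothetical helper, and `object()` stands in for a real `ModelCheckpoint` instance.

```python
def move_checkpoint_to_callbacks(trainer_kwargs):
    """Return a copy of trainer_kwargs with a non-bool checkpoint_callback
    moved into the callbacks list.

    Hypothetical helper for illustration; not part of PyTorch Lightning.
    """
    fixed = dict(trainer_kwargs)
    checkpoint = fixed.get("checkpoint_callback")
    if checkpoint is not None and not isinstance(checkpoint, bool):
        # Append to any existing callbacks instead of overwriting them.
        callbacks = list(fixed.get("callbacks") or [])
        callbacks.append(checkpoint)
        fixed["callbacks"] = callbacks
        del fixed["checkpoint_callback"]
    return fixed


# Stand-in for a ModelCheckpoint instance (assumption for the demo).
model_checkpoint = object()
old_kwargs = {"checkpoint_callback": model_checkpoint, "distributed_backend": "ddp"}
new_kwargs = move_checkpoint_to_callbacks(old_kwargs)
```

In `train_timit.py` the equivalent manual change would presumably be to pass the checkpoint object as `callbacks=[...]` in the `Trainer(...)` call instead of `checkpoint_callback=...`. Pinning an older PyTorch Lightning release that still accepted the original argument may be another option.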

Age estimation results are not satisfying

Hi,
Thanks for the relatively easy-to-understand repository.

However, the age estimation results do not seem to be meaningful. The MAE obtained by simply predicting the dataset's mean age for every speaker (i.e., E[ |age - E[ages_dataset]| ]) is about 5.6, which suggests the model may not be learning anything significant for age estimation.

I suggest checking this on a more varied dataset to see whether there is a problem with the whole network.
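
The baseline described in this issue, which always predicts the mean age, can be computed in a few lines. This is a minimal sketch with made-up ages. `mean_predictor_mae` is a hypothetical name, and the numbers are not taken from TIMIT or NISP.

```python
from statistics import mean


def mean_predictor_mae(reference_ages, eval_ages):
    """MAE of always predicting the mean of reference_ages."""
    constant_prediction = mean(reference_ages)
    return mean(abs(age - constant_prediction) for age in eval_ages)


# Made-up ages for illustration only.
ages = [22, 25, 31, 40, 57]
baseline_mae = mean_predictor_mae(ages, ages)
```

A model's age MAE is only informative when compared against this kind of constant-prediction baseline computed on the same evaluation split.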

Error in TIMIT height conversion

It seems that, due to a missing " in the original TIMIT speaker info file, this repository contains an error in its height conversion.
Speaker PAM1 is 5'11, which is 180.34, not 154.94. Unfortunately, this speaker is part of the test set, which invalidates the presented results. On the other hand, the real results will probably be better :)
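
For reference, 154.94 cm is exactly 5'1" (61 inches × 2.54), which suggests the second inch digit was dropped during parsing. The sketch below parses a feet-and-inches string with the closing double quote treated as optional. `height_to_cm` and `HEIGHT_PATTERN` are hypothetical names, and the assumed input format (`5'11"` or `5'11`) is inferred from this issue, not from the repository's code.

```python
import re

CM_PER_INCH = 2.54
# Feet, an apostrophe, one or two inch digits, and an optional closing quote.
HEIGHT_PATTERN = re.compile(r"""^\s*(\d+)'\s*(\d{1,2})"?\s*$""")


def height_to_cm(height_text):
    """Convert a feet'inches string such as 5'11" or 5'11 to centimetres."""
    match = HEIGHT_PATTERN.match(height_text)
    if match is None:
        raise ValueError(f"Unrecognised height format: {height_text!r}")
    feet, inches = int(match.group(1)), int(match.group(2))
    if inches >= 12:
        raise ValueError(f"Inches out of range in {height_text!r}")
    return (feet * 12 + inches) * CM_PER_INCH
```

Matching the whole string and rejecting unknown formats makes a malformed entry fail loudly instead of silently producing a wrong height.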

File under TIMIT/dataset.py is broken

The file TIMIT/dataset.py has an unresolved Git merge conflict from line 91 to line 101 on the main branch of the repo.

<<<<<<< HEAD
        return wav, height, age, gender
=======

        if type(wav).__module__ == np.__name__:
            wav = torch.tensor(wav)
        
        # wav = self.spec_aug(self.spectral_transform(wav))/100
        # print(wav.min(), wav.max(), wav.mean(), wav.std())
        return wav, height, age, gender
>>>>>>> 9533cd02cfa4ddb9aee40945d53e3354b5d5d960
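
Until the conflict is resolved, importing the file fails with a `SyntaxError`. A small check like the following can flag leftover conflict markers before they are committed. It is a minimal sketch: `find_conflict_markers` is a hypothetical helper, not a tool that ships with the repository or with Git.

```python
def find_conflict_markers(text):
    """Return (line_number, marker) pairs for Git conflict markers in text."""
    found = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        if line.startswith("<<<<<<< "):
            found.append((line_number, "<<<<<<<"))
        elif line.rstrip() == "=======":
            found.append((line_number, "======="))
        elif line.startswith(">>>>>>> "):
            found.append((line_number, ">>>>>>>"))
    return found


# Shortened version of the conflict quoted in this issue.
sample = "\n".join([
    "<<<<<<< HEAD",
    "        return wav, height, age, gender",
    "=======",
    "        return wav, height, age, gender",
    ">>>>>>> 9533cd02cfa4ddb9aee40945d53e3354b5d5d960",
])
markers = find_conflict_markers(sample)
```

To check a real file, read its contents and pass them in, for example `find_conflict_markers(open("TIMIT/dataset.py").read())`.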

The prepare_nisp_data.py script is not working

The script under NISP/prepare_nisp_data.py has many lines that cannot be executed because the code has some errors. For example, some imports are missing. Additionally, the train_test_split line seems to have incorrect input parameters.
