MSc-VT-Thesis

This repository contains the scripts and CSV files that I used throughout my master's thesis for training and testing automatic speech recognition (ASR) models, as well as for analyzing the datasets. I completed the M.Sc. Voice Technology at Rijksuniversiteit Groningen - Campus Fryslân in June 2023.

The experiments were conducted on data from Mozilla's Common Voice project. The version used is 8.0, and the language is Frisian, marked as fy-NL in the metadata.

Link to thesis

https://campus-fryslan.studenttheses.ub.rug.nl/360/

Prerequisites

  • Python 3.9.6 (the version used in the experiments and known to work; other 3.9.x releases and newer versions such as 3.10 may also work, but I have not tested them)
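
A quick way to see whether the running interpreter matches the tested version is a short check like the sketch below. The helper name `check_python_version` and the three status labels are hypothetical, chosen for illustration; the only fact taken from this README is that 3.9.6 was the version used.

```python
import sys

TESTED_VERSION = (3, 9, 6)  # version used in the experiments


def check_python_version(version_info=None):
    """Classify a (major, minor, micro) version against the tested one.

    Returns "tested" for 3.9.6, "untested" for anything from 3.9.0 upward,
    and "unsupported" for older interpreters (an assumption, not a guarantee).
    """
    current = tuple(version_info or sys.version_info)[:3]
    if current == TESTED_VERSION:
        return "tested"
    if current >= (3, 9, 0):
        return "untested"
    return "unsupported"


if __name__ == "__main__":
    # Print the running interpreter version and its status.
    print(sys.version.split()[0], "->", check_python_version())
```

An "untested" result does not mean the notebooks will fail, only that the version was not part of the experiments.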

Installing dependencies

The scripts are contained within Jupyter Notebook files (.ipynb). Therefore, Jupyter Notebook must be installed. In a high-performance computing (HPC) cluster setting (Linux environment), this can be done by first creating a Python virtual environment:

python -m venv name_of_venv

Then, activate the newly created environment:

source /path/to/name_of_venv/bin/activate

On Windows:

path\to\name_of_venv\Scripts\activate

If you want to install Jupyter and the dependencies from all notebooks, you can run:

pip install -r requirements.txt

If not, then you can first install Jupyter Notebook:

pip install jupyter

Then install only the dependencies you need by running the first cell of the notebook you are editing.

Usage

Simply run a Jupyter instance:

jupyter notebook

Then access the notebooks via the browser.

Files

  • The .ipynb files that start with XLS_R are the training scripts I used throughout my research
    • XLS_R_fine_tune_train_from_hf.ipynb corresponds to the script used for experiment 1
    • XLS_R_fine_tune_local_train.ipynb corresponds to the scripts used for experiments 2-7
  • evaluation.ipynb is the file used for evaluating the models
  • data_split_analysis.ipynb is the file used for analyzing the datasets, as well as generating the splits for experiments 2-7
  • Inside cv-corpus-8.0-2022-01-19/fy-NL you can find the .csv splits of experiments 3, 4, and 5 (10 hours, 1 hour, and 10 minutes of training data respectively), along with statistics grouped by age and sex of the speaker (marked by _stats.csv at the end)
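
The per-split statistics mentioned in the list above could be reproduced along these lines. This is a minimal sketch, not the code from data_split_analysis.ipynb: it assumes the split files keep Common Voice's `age` and `gender` columns, the function name `count_by_age_and_gender` is hypothetical, and the sample rows are made up purely for illustration.

```python
import csv
import io
from collections import Counter


def count_by_age_and_gender(csv_file):
    """Count clips per (age, gender) pair; empty fields count as 'unknown'."""
    counts = Counter()
    for row in csv.DictReader(csv_file):
        age = (row.get("age") or "").strip() or "unknown"
        gender = (row.get("gender") or "").strip() or "unknown"
        counts[(age, gender)] += 1
    return counts


# Made-up rows for illustration only; these are not taken from the corpus.
SAMPLE = """path,sentence,age,gender
clip_a.mp3,placeholder one,twenties,female
clip_b.mp3,placeholder two,twenties,female
clip_c.mp3,placeholder three,fifties,male
clip_d.mp3,placeholder four,,
"""

if __name__ == "__main__":
    stats = count_by_age_and_gender(io.StringIO(SAMPLE))
    for (age, gender), n in sorted(stats.items()):
        print(f"{age:10s} {gender:10s} {n}")
```

To run it on a real split, pass an open file handle (for example `open(path, newline="", encoding="utf-8")`) instead of the in-memory sample.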

Hardware setup

The GPU and its video RAM (VRAM) matter most for running the training and evaluation notebooks without out-of-memory errors. The experiments were conducted on an Nvidia A100 GPU with 40 GB of VRAM. At least 20 GB of system RAM (not VRAM) should also be available. I recommend running the notebooks either on Google Colab (a Pro subscription is needed for access to the better GPUs) or on a high-performance computing cluster, if possible. Otherwise, certain settings can be adjusted so that the notebooks do not crash from running out of memory.
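
This README does not say which settings to adjust. One common option for Hugging Face `Trainer`-based training is to lower the per-device batch size and raise the number of gradient accumulation steps so that the effective batch size stays the same. The sketch below computes such a pair. The key names match arguments of `transformers.TrainingArguments`, but whether the notebooks use them, the helper name `fit_batch_to_memory`, and the example numbers are all assumptions.

```python
def fit_batch_to_memory(effective_batch_size, max_per_device_batch_size):
    """Split an effective batch size into per-device size and accumulation steps.

    Picks the largest per-device batch size that does not exceed the limit and
    divides the effective batch size evenly, so the product stays unchanged.
    """
    if effective_batch_size < 1 or max_per_device_batch_size < 1:
        raise ValueError("both sizes must be positive integers")
    start = min(effective_batch_size, max_per_device_batch_size)
    for per_device in range(start, 0, -1):
        if effective_batch_size % per_device == 0:
            return {
                "per_device_train_batch_size": per_device,
                "gradient_accumulation_steps": effective_batch_size // per_device,
            }


if __name__ == "__main__":
    # Hypothetical case: an effective batch of 16 when only 4 samples fit on the GPU.
    print(fit_batch_to_memory(16, 4))
```

The returned dictionary can be merged into whatever training configuration a notebook already builds. Smaller per-device batches lower peak VRAM use at the cost of more steps per update.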

The time required for training the models varies depending on the amount of data used and on the hyperparameters related to checkpointing and the number of epochs. Testing takes about 3.5 hours (using the test set from Common Voice 8.0) for XLS-R with 1 billion parameters.

Research output

Here is a list of the XLS-R models fine-tuned throughout my thesis that were uploaded to Hugging Face (if the number of parameters is not mentioned, then it is 1B parameters):

  • Experiment 1 (train split of CV 8.0, 5 hours of data, training time: 4 hours)
  • Experiment 2 (41 hours of data, training time: 22 hours)
  • Experiment 3 (10 hours of data, training time: 7.5 hours)
  • Experiment 4 (1 hour of data, training time: 1.5 hours)
  • Experiment 5 (10 minutes of data, training time: 45 minutes)
  • Experiment 6 (41 hours of data, XLS-R with 300M parameters, training time: 16 hours)
  • Experiment 7 (41 hours of data, XLS-R with 2B parameters, training time: 1 day)
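
To compare the runs listed above, the training time can be expressed per hour of training data. The sketch below only rearranges the numbers from the list; it converts "10 minutes" to 10/60 hours and reads "1 day" as 24 hours, which is an assumption. The names `RUNS` and `training_hours_per_data_hour` are hypothetical.

```python
# (experiment, hours of training data, training time in hours), as listed above.
# "10 minutes" is written as 10/60 hours and "1 day" is assumed to mean 24 hours.
RUNS = [
    (1, 5.0, 4.0),
    (2, 41.0, 22.0),
    (3, 10.0, 7.5),
    (4, 1.0, 1.5),
    (5, 10 / 60, 0.75),
    (6, 41.0, 16.0),
    (7, 41.0, 24.0),
]


def training_hours_per_data_hour(runs):
    """Map each experiment number to training hours per hour of training data."""
    return {exp: train_hours / data_hours for exp, data_hours, train_hours in runs}


if __name__ == "__main__":
    for exp, ratio in training_hours_per_data_hour(RUNS).items():
        print(f"Experiment {exp}: {ratio:.2f} training hours per hour of data")
```

The ratios are only a rough comparison, since the runs may differ in the number of epochs and in checkpointing settings.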

Contact

If you have any suggestions, feel free to create a PR with your changes and I'll review it as soon as possible.

If you have any questions, open an issue. Disclaimer: the more time passes, the less able I will be to answer questions, as I will most likely forget details. I apologize for that :)
