Vocal Pitch Modulator

A vocal pitch modulator that uses Machine Learning for realistic voice change. This is a project for NUS's CS4347 (Sound and Music Computing).

Motivation

The goal of vocal pitch modulation in this project is to keep the voice sounding “realistic” when its pitch is changed. With conventional modulation techniques, raising the pitch of an audio file by more than around 3 semitones tends to make people sound like chipmunks, while lowering it by more than around 3 semitones makes them sound demonic or dopey. However, there are people who naturally speak at lower and higher pitches without sounding this way. The problem is therefore not the pitch itself: the spectral characteristics of the voice must also be adjusted suitably to keep it realistic. In this project, we aim to use machine learning to adjust pitch without losing realism.
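To make the 3-semitone figure concrete, here is a minimal sketch of the arithmetic. It assumes a resampling-style shift, where every frequency in the signal is multiplied by the same ratio. The function names and the example frequencies are hypothetical and are not taken from VPM.py.

```python
def semitone_ratio(n_semitones: float) -> float:
    """Frequency ratio for a shift of n semitones in equal temperament."""
    return 2.0 ** (n_semitones / 12.0)


def naive_shift(freqs_hz, n_semitones):
    """Scale every frequency by the same ratio, as a resampling-style shift does."""
    ratio = semitone_ratio(n_semitones)
    return [f * ratio for f in freqs_hz]


# Hypothetical speaker: one pitch (F0) and two resonance peaks, all in Hz.
f0_hz = 200.0
resonances_hz = [700.0, 1200.0]

# A +3 semitone shift multiplies frequencies by about 1.19.
ratio_up = semitone_ratio(3)

# The naive shift moves the pitch AND the resonances by that same ratio,
# so the spectral envelope no longer matches the original speaker.
shifted = naive_shift([f0_hz] + resonances_hz, 3)
```

In this sketch the resonances move together with the pitch, which is one way to picture why the spectral characteristics need separate treatment.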

Repository Structure

Figures detailing the implementation of the Vocal Pitch Modulator can be found in the Documentation folder.

The tentative API can be found in VPM.py.

The dataset used to train our artificial neural networks can be found in the Data/dataset folder. The Data folder also contains dataset_files.csv, which lists the files along with their labels, as well as the Jupyter Notebooks used to generate the dataset and the raw file list. The raw files themselves are not included in this repository, so the notebooks and the raw file list are provided for reference rather than for use.
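As a rough sketch of how a file list with labels could be read, the snippet below parses CSV text with Python's standard csv module. The column names `filename` and `label` and the sample rows are assumptions made for illustration only; the actual layout of dataset_files.csv may differ.

```python
import csv
import io


def load_file_labels(csv_file, name_col="filename", label_col="label"):
    """Return (file name, label) pairs from an open CSV file with a header row.

    The default column names are hypothetical; pass the real ones if they differ.
    """
    reader = csv.DictReader(csv_file)
    return [(row[name_col], row[label_col]) for row in reader]


# Hypothetical sample content standing in for the real file.
SAMPLE = "filename,label\nexample_a.wav,a\nexample_i.wav,i\n"

pairs = load_file_labels(io.StringIO(SAMPLE))
```

To read the real file instead, one would pass `open("Data/dataset_files.csv", newline="")` and adjust the column names to match its header.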

System Pipeline

The following is the proposed modulation pipeline:
[Figure: Modulation Pipeline]

Data Organization

The following illustrates the organization of our data.
[Figure: Data List]
[Figure: Data-Label Pairs]

Training Pipeline

The following illustrates the training pipelines for the encoder and decoders. We attempted two possible timbre-extracting neural networks. The first is a classifier that takes an MFCC and tries to output the vowel label. The second is an autoencoder that takes an MFCC, reduces it from 20 to 4 dimensions, and attempts to reconstruct the original MFCC.
[Figure: Encoder Training Pipeline]
[Figure: TimbreVAE Training Pipeline]
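The 20-to-4-to-20 bottleneck of the autoencoder can be sketched in a few lines. The version below is a deliberately simplified linear autoencoder in pure Python, trained on a single random vector that stands in for an MFCC frame. It is not the project's network: the real model, its framework, its activations, and its training data are not specified here, and every name in the snippet is hypothetical.

```python
import random

MFCC_DIM = 20    # input size stated in the text
LATENT_DIM = 4   # bottleneck size stated in the text


def init_matrix(rows, cols, rng):
    """Small random weights; the scale is an arbitrary choice for this sketch."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(W, x):
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def encode(W_enc, x):
    """Map a 20-dimensional vector to a 4-dimensional code."""
    return matvec(W_enc, x)


def decode(W_dec, z):
    """Map a 4-dimensional code back to 20 dimensions."""
    return matvec(W_dec, z)


def mse(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)


def train_step(W_enc, W_dec, x, lr):
    """One gradient-descent step on the reconstruction MSE; returns the loss before the update."""
    z = encode(W_enc, x)
    x_hat = decode(W_dec, z)
    err = [xh - xi for xh, xi in zip(x_hat, x)]
    scale = 2.0 / len(x)
    # Gradient w.r.t. the code, computed before W_dec is modified.
    grad_z = [scale * sum(err[i] * W_dec[i][j] for i in range(len(err)))
              for j in range(len(z))]
    for i in range(len(W_dec)):
        for j in range(len(z)):
            W_dec[i][j] -= lr * scale * err[i] * z[j]
    for j in range(len(W_enc)):
        for k in range(len(x)):
            W_enc[j][k] -= lr * grad_z[j] * x[k]
    return mse(x_hat, x)


rng = random.Random(0)
W_enc = init_matrix(LATENT_DIM, MFCC_DIM, rng)   # 4 x 20
W_dec = init_matrix(MFCC_DIM, LATENT_DIM, rng)   # 20 x 4

# Random stand-in for one MFCC frame (not real audio features).
x = [rng.uniform(-1.0, 1.0) for _ in range(MFCC_DIM)]

losses = [train_step(W_enc, W_dec, x, lr=0.05) for _ in range(200)]
code = encode(W_enc, x)
reconstruction = decode(W_dec, code)
```

After training, `code` is the 4-dimensional representation that a timbre encoder of this shape would hand to later stages, and `reconstruction` is the attempt to recover the input from it.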

This is the proposed decoder that makes use of the timbre encoder:
[Figure: Decoder Training Pipeline]

Contributors

racheltanxueqi, zioul123, zachary-feng