FastTune

FastTune is an AI tuner for guitar, ukulele, bass, banjo, mandolin, violin, and other instruments.

It uses the transformer-based TuneNN network model for abstract timbre modeling and supports tuning for 12+ instrument types.

Instrument         | Supported strings                        | More
-------------------|------------------------------------------|-----------------------------------------------------
Guitar             | 6-string, 4-string, 7-string, 12-string  | 30+ tune-modes; acoustic, electric, overdriven, etc.
Ukulele            | 4-string                                 | 4+ tune-modes
Bass               | 4-string, 5-string                       | 18+ tune-modes; acoustic, electric, slap, etc.
Banjo              | 4-string                                 | 3+ tune-modes
Mandolin           | 8-string                                 | standard tune-mode
Violin/Viola/Cello | 4-string                                 | standard tune-mode

The online experience is based on Web Audio and WebAssembly (wasm). See the site here.

Features

  • accurate  For all the instruments listed above, it accurately identifies pitch in the low, mid, and high-frequency regions, covering a frequency range of 30Hz to 2000Hz. At every stage of a musical note, from onset to decay, the model output is combined with real-time tracking and correction based on the Wigner-Ville distribution. It reflects pitch changes and subtle fluctuations at each ADSR stage of the note, with intonation errors within 0.5% of a semitone (0.5 cents).
  • fast   Quick response, close to the endpoints and onset stages of musical notes, and 100ms-200ms faster than most algorithms. It responds swiftly to string plucking, string cutting, and fret releasing.
  • smooth    Tuning by turning the pegs is smooth and seamless, and pitch bends are tracked accurately and responsively. During continuous tuning and finger movement, the display responds quickly and faithfully, like a heartbeat pulse. Every pluck gives precise, real-time feedback.
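
To put the accuracy figure in context: a semitone is 100 cents, so an error of 0.5% of a semitone is 0.5 cents. The sketch below shows how a tuner display might convert a detected frequency into a note name and a deviation in cents. It is only an illustration and not FastTune code; it assumes twelve-tone equal temperament with A4 = 440Hz, and the function name `nearest_note` is hypothetical.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def nearest_note(freq_hz, a4_hz=440.0):
    """Return (note name, deviation in cents) for a detected frequency.

    Assumes equal temperament; a4_hz is the reference pitch for A4.
    """
    # Fractional MIDI note number: 69 is A4, one unit is one semitone.
    midi = 69 + 12 * math.log2(freq_hz / a4_hz)
    nearest = round(midi)
    # 100 cents per semitone; positive means sharp, negative means flat.
    cents = 100 * (midi - nearest)
    name = NOTE_NAMES[nearest % 12] + str(nearest // 12 - 1)
    return name, cents


name, cents = nearest_note(82.41)  # low E string of a guitar
print(f"{name} {cents:+.2f} cents")
```

A reading within ±0.5 cents of the target note would fall inside the error bound quoted above.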


Principle

Pitch tracking across instruments currently suffers from two common issues: latency and accuracy.

  • latency issue   Some instruments exhibit strong structural resonances at the onset, and the interference from these resonance peaks varies across instrument categories, as do their intensity and duration. On the guitar, for example, the resonance peaks can be stronger than the fundamental frequency of the note and last 200ms or more. Such strong resonance peaks make recognition difficult. Many algorithms reduce the risk of resonance interference by correlating probabilities across consecutive packets before and after the current one, or by introducing latency.
  • accuracy issue  Due to the complexity of instrument timbre, achieving 100% accurate recognition is extremely challenging. The timbre varies significantly across the stages of a musical note, so it is hard to identify the pitch accurately at each stage of the ADSR (Attack, Decay, Sustain, Release) process, from the beginning of the note to its end. The wide frequency range complicates this further, since timbre differs greatly between very low and very high frequencies.

Classic Algorithm

For pitch estimation, there are many related algorithms, mainly based on frequency domain processing or autocorrelation processing. We have conducted numerous evaluations and comparisons using the following algorithms:

  • PEF - A pitch estimation filter is designed, and pitch is estimated by performing cross-correlation operations in the frequency domain.
  • NCF - Pitch is estimated using normalized time-domain autocorrelation.
  • HPS/LHS - Pitch is estimated by taking the product (HPS) or the sum (LHS) of the harmonics of the spectrum.
  • CEP - Pitch is estimated by performing a second FFT transformation on the spectrum and using cepstral analysis.
  • YIN - Pitch is estimated using time-domain differential autocorrelation.
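
To make the time-domain family concrete, here is a minimal NumPy sketch of the normalized autocorrelation idea behind NCF. It is not the audioFlux implementation: the function name `ncf_pitch`, its parameters, and the "first strong local maximum" rule used to avoid octave-down errors are assumptions chosen for illustration.

```python
import numpy as np


def ncf_pitch(frame, sr, fmin=50.0, fmax=1000.0, threshold=0.9):
    """Estimate the pitch (Hz) of one frame with normalized autocorrelation."""
    x = np.asarray(frame, dtype=np.float64)
    x = x - x.mean()
    n = len(x)
    lag_min = max(1, int(sr / fmax))
    lag_max = min(n - 2, int(sr / fmin))
    lags = np.arange(lag_min, lag_max + 1)

    ncf = np.empty(len(lags))
    for i, lag in enumerate(lags):
        a, b = x[:n - lag], x[lag:]
        # Normalize by the energy of both overlapping segments.
        denom = np.sqrt(np.dot(a, a) * np.dot(b, b))
        ncf[i] = np.dot(a, b) / denom if denom > 0 else 0.0

    # Take the first local maximum that is close to the global maximum,
    # so that lags of two or three periods are not preferred.
    limit = threshold * ncf.max()
    for i in range(1, len(ncf) - 1):
        if ncf[i] >= limit and ncf[i] > ncf[i - 1] and ncf[i] >= ncf[i + 1]:
            return sr / lags[i]
    return sr / lags[int(np.argmax(ncf))]


# Synthetic test tone: 200 Hz fundamental with two weaker harmonics.
sr = 16000
t = np.arange(2048) / sr
tone = (np.sin(2 * np.pi * 200 * t)
        + 0.5 * np.sin(2 * np.pi * 400 * t)
        + 0.25 * np.sin(2 * np.pi * 600 * t))
print(ncf_pitch(tone, sr))
```

On a clean synthetic tone this works well; on a real plucked string the resonance peaks at the onset are exactly what makes such a simple estimator unreliable.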

The command line tools:

# You need to install the audioFlux package: pip install audioflux
$ ./pitch.py -p pef -r 32000 -i test.wav -o test.txt

-p, --pitch, select the pitch detection algorithm, one of pef|ncf|hps|lhs|cep|yin
-r, --samplate, select the sample rate
-h, --help, all parameter information

Given the characteristics of these algorithms, most of them, except HPS/LHS, are ineffective for low-frequency pitch estimation of musical instruments (below 55Hz). HPS/LHS, however, perform relatively poorly in high-frequency pitch estimation (above 1000Hz). All of the algorithms suffer from strong resonance peak interference and misjudgments, although PEF handles pitch-related resonance peaks better. With slightly stronger background noise, the latency of all algorithms increases rapidly.
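
As an illustration of the frequency-domain family, the following is a minimal harmonic product spectrum (HPS) sketch in NumPy. It is a textbook-style version written for this explanation, not the audioFlux implementation; the function name `hps_pitch` and its default parameters are assumptions.

```python
import numpy as np


def hps_pitch(frame, sr, n_harmonics=4, fmin=30.0, fmax=1000.0):
    """Estimate the pitch (Hz) of one frame with the harmonic product spectrum."""
    x = np.asarray(frame, dtype=np.float64)
    spec = np.abs(np.fft.rfft(x * np.hanning(len(x))))

    # Only bins whose n_harmonics-th multiple still lies inside the spectrum
    # can be evaluated.
    n_bins = len(spec) // n_harmonics
    hps = spec[:n_bins].copy()
    for h in range(2, n_harmonics + 1):
        # spec[::h] compresses the spectrum by h, aligning harmonic h with bin k.
        hps *= spec[::h][:n_bins]

    freqs = np.fft.rfftfreq(len(x), d=1.0 / sr)[:n_bins]
    band = (freqs >= fmin) & (freqs <= fmax)
    return freqs[band][np.argmax(hps[band])]


# Synthetic low A (110 Hz) with five harmonics of decreasing strength.
sr = 8000
t = np.arange(8192) / sr
tone = sum(np.sin(2 * np.pi * 110 * k * t) / k for k in range(1, 6))
print(hps_pitch(tone, sr))
```

Because the product needs several harmonics below the Nyquist frequency, the usable search range in this sketch shrinks as `n_harmonics` grows, and the frequency resolution is limited by the FFT bin width.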

FastTune Architecture


To better address the two issues mentioned above, FastTune is designed with three modules, which relate to each other as follows:

  • track module -   Identifies entry points, updates, and terminates loops. It manages communication between the feature module and the network module, and performs real-time tracking and correction using the Wigner-Ville distribution (WVD).
  • feature module -   Extracts feature peaks, harmonics, endpoints, the STFT spectrum, Bark spectrum, cepstrum, CQHC, etc.
  • network module -   The TuneNN (TNN) deep learning network model, responsible for identification.

TuneNN

The timbre of musical notes is the result of various combinations and transformations of harmonic relationships, harmonic strengths and weaknesses, instrument resonant peaks, and structural resonant peaks over time.

  • STFT spectrum: the most primitive spectrum, which accurately reflects the harmonic relationships and harmonic strengths in musical notes.
  • Bark spectrum: a nonlinear compression of the STFT spectrum that matches the psychoacoustic perception of the human ear more closely than the Mel spectrum. It is a psychoacoustic abstraction feature that focuses on harmonic relationships and strengths.
  • Cepstrum: captures the envelope characteristics of instrument resonant peaks.
  • CQHC: MFCC features are designed to address pitch variations in speech. CQCC, which is based on CQT, better reflects instrument resonant peaks and structural resonant peaks, while CQHC, which uses a deconvolution approach, yields more prominent results than CQCC.

1D value and 2D time transformers process the features over sliding adjacent windows.

For the specific feature extraction, refer to featureExtract.py; for the model structure, refer to tuneNN.py.
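
For readers unfamiliar with two of the features listed above, the sketch below computes an STFT magnitude spectrum and a real cepstrum with plain NumPy. It is independent of featureExtract.py and does not reproduce it: the function names, frame length, and hop size are assumptions, and the Bark spectrum and CQHC are not covered.

```python
import numpy as np


def stft_magnitude(x, frame_len=1024, hop=256):
    """Magnitude STFT with a Hann window; returns (n_frames, frame_len // 2 + 1)."""
    x = np.asarray(x, dtype=np.float64)
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack(
        [x[i * hop:i * hop + frame_len] * window for i in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=1))


def real_cepstrum(magnitude, eps=1e-10):
    """Real cepstrum: inverse FFT of the log magnitude spectrum, per frame."""
    return np.fft.irfft(np.log(magnitude + eps), axis=-1)


# Synthetic pulse train with a period of 40 samples (200 Hz at sr = 8000).
sr = 8000
x = np.zeros(4096)
x[::40] = 1.0

mag = stft_magnitude(x)
cep = real_cepstrum(mag)

# Low quefrencies describe the spectral envelope; a peak at a higher
# quefrency marks the period of the harmonic structure.
period = 20 + int(np.argmax(cep[0, 20:200]))
print(mag.shape, cep.shape, sr / period)
```

The low-quefrency part of the cepstrum is the portion that relates to the resonance envelope described in the list above, while the STFT magnitude keeps the individual harmonics visible.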


Dataset

Because instruments are diverse and their timbres complex, for guitar alone we collected recordings from 300+ guitars of different brands and price ranges, forming a training and testing dataset of 100GB. The brands include Yamaha, Fender, Gibson, Guild, Taylor, Martin, Enya, Rosen, Mosen, Weibo, Saga, Kama, and more, totaling 50+ brands at prices from $10 to $5000.

A robust dataset is crucial for training network models. Our goal is not only to share the code; we are also actively preparing to publicly release the training and testing datasets. We cannot currently commit to a specific release date, but the intention to provide access to the dataset is firm.

Thank you for your understanding and continued support.
