ML4SA

A curated list of machine-learning-based techniques for sound art in creative media.

Audio Recognition

  • Audio recognition involves the identification and classification of audio signals or specific features within audio data.

  • Examples in Sound Art: Audio recognition can be used in sound art to analyze and classify sounds based on their content or characteristics. For instance, using machine learning techniques, sound artists can create installations that classify and respond to specific sound events, such as distinguishing between footsteps and bird chirping.

  • State-of-the-Art Methods: Popular machine learning libraries for audio recognition include:

    • Librosa: A Python library for music and audio analysis, providing features for audio preprocessing, feature extraction, and classification.

    • TensorFlow Audio: A library that extends TensorFlow with audio-specific functionality, including audio preprocessing, feature extraction, and model building.

    • YAMNet: A deep learning model developed by Google that can recognize a wide range of audio events from 521 audio classes.

  • ML techniques explained

    • Convolutional Neural Networks (CNNs): CNNs are commonly used for audio recognition tasks such as speech recognition, music genre classification, and environmental sound classification. They excel at capturing local audio features by applying convolutional filters to spectrograms or raw audio waveforms.

    • Recurrent Neural Networks (RNNs): RNNs are suitable for tasks that require sequential modeling, such as speech recognition and music transcription. They can capture temporal dependencies in audio data by utilizing recurrent connections to maintain memory of past information.

    • Hidden Markov Models (HMMs): HMMs are statistical models often used in speech recognition. They represent audio as a sequence of states and use probability distributions to model the transitions between states.

    • Attention Mechanisms: Attention mechanisms help models focus on specific parts of an audio sequence, allowing them to capture important temporal dependencies. They have improved the performance of tasks like speech recognition and audio event detection.

    • Transformer Models: Transformer models, originally developed for natural language processing, have also been applied to audio recognition. They use self-attention mechanisms to capture long-range dependencies in audio signals, leading to state-of-the-art performance in tasks like music genre classification.
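
As a concrete illustration of the spectrogram front end that CNN-based recognizers operate on, here is a minimal NumPy sketch of a short-time Fourier transform. The function name and parameters are illustrative, not taken from any particular library:

```python
import numpy as np

def stft_magnitude(signal, frame_len=1024, hop=256):
    """Magnitude spectrogram via a Hann-windowed short-time Fourier transform."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    # rfft keeps only the non-negative frequency bins
    return np.abs(np.fft.rfft(frames, axis=1))

# Synthetic test tone: a 440 Hz sine at a 22.05 kHz sample rate
sr = 22050
t = np.arange(sr) / sr
spec = stft_magnitude(np.sin(2 * np.pi * 440 * t))
print(spec.shape)  # (n_frames, frame_len // 2 + 1)
```

In practice a library such as Librosa would compute an optimized (often mel-scaled) version of this; the resulting 2-D time-frequency array is then fed to a CNN as an image-like input.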

Audio Synthesis

  • Audio synthesis involves the generation or creation of new audio signals using various algorithms and techniques.
  • Examples in Sound Art: Sound artists can use audio synthesis to create unique and expressive sounds for their installations or compositions. They can generate abstract textures, atmospheric tones, or even imitate real-world sounds using synthesis techniques. For example, a sound artist may use granular synthesis to manipulate and transform recorded environmental sounds into a mesmerizing sonic landscape.
  • State-of-the-Art Methods: Notable machine learning libraries and models for audio synthesis include:
    • Magenta: An open-source project by Google that explores the intersection of machine learning and music generation. It provides models like MusicVAE and NSynth for generating new musical compositions and timbres.
    • WaveGAN: A Generative Adversarial Network (GAN)-based model that can generate high-quality audio samples. It can learn and mimic the distribution of real audio data to create realistic and diverse sounds.
  • ML techniques explained
    • Generative Adversarial Networks (GANs): GANs can be used for audio synthesis tasks, such as generating realistic instrument sounds or speech. The generator network learns to produce audio samples that are indistinguishable from real audio, while the discriminator network learns to differentiate between real and generated samples.
    • Variational Autoencoders (VAEs): VAEs are used for learning latent representations of audio data. They can generate new audio samples by sampling from the learned latent space, enabling tasks like music generation and voice cloning.
    • WaveNet: WaveNet is a deep generative model for audio synthesis. It models the raw waveform directly, allowing high-quality audio generation with realistic details and nuances.
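
The granular-synthesis idea mentioned above can be sketched in a few lines of NumPy: short windowed "grains" are cut from a source recording and scattered across an output buffer by overlap-add. All names and parameter values here are illustrative, not from any library:

```python
import numpy as np

def granular_resynthesis(source, n_grains=200, grain_len=2048, out_len=44100, seed=0):
    """Scatter Hann-windowed grains from `source` at random positions."""
    rng = np.random.default_rng(seed)
    window = np.hanning(grain_len)
    out = np.zeros(out_len)
    for _ in range(n_grains):
        src_pos = rng.integers(0, len(source) - grain_len)
        dst_pos = rng.integers(0, out_len - grain_len)
        out[dst_pos : dst_pos + grain_len] += source[src_pos : src_pos + grain_len] * window
    # Normalize to avoid clipping after overlap-add
    return out / np.max(np.abs(out))

sr = 44100
t = np.arange(sr) / sr
texture = granular_resynthesis(np.sin(2 * np.pi * 220 * t))
```

Replacing the sine with a field recording, and the uniform random positions with learned or gesture-driven distributions, is where the technique becomes interesting for installations.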

Audio Transformation

  • Audio transformation involves modifying or manipulating audio signals to achieve desired effects or transformations.
  • Examples in Sound Art: Sound artists can use audio transformation techniques to alter and shape sound elements in their compositions or installations. They can apply effects like time stretching, pitch shifting, or spectral manipulation to create unique sonic experiences. For instance, a sound artist might use time stretching to slow down or stretch out a sound sample, creating an ethereal and atmospheric effect.
  • State-of-the-Art Methods: Some machine learning libraries and models for audio transformation include:
    • Essentia: An open-source library for audio analysis and transformation, providing a wide range of audio processing algorithms and tools.
    • Spleeter: A library developed by Deezer that uses deep learning to perform source separation, allowing users to separate vocals, drums, and other sound sources from mixed audio.
  • ML techniques explained
    • Time-domain and Frequency-domain Manipulation: Machine learning techniques can be used to manipulate audio signals in both the time domain and frequency domain.

      • Time Stretching and Pitch Shifting: These techniques modify the speed and pitch of audio signals without changing the overall content. They are commonly used in audio editing and music production.
      • Spectral Processing: Spectral processing techniques operate on the frequency domain representation of audio signals. They can be used for tasks like noise reduction, audio effects, and source separation.
    • Source Separation: Source separation techniques aim to separate individual sound sources from a mixture, such as isolating vocals from music. Deep learning models, including recurrent neural networks and convolutional neural networks, have been utilized to improve the accuracy and quality of source separation.

    • Audio Style Transfer: Style transfer techniques transform audio signals into different styles or genres. By leveraging machine learning models, it is possible to modify characteristics of the audio, such as changing the timbre or adding specific musical attributes.
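
Time stretching without a pitch change, as described above, is classically done with a phase vocoder: analyze the signal with an STFT, resample the frames along time while accumulating per-bin phase increments, and resynthesize by overlap-add. Below is a simplified NumPy sketch (illustrative only, not production quality):

```python
import numpy as np

def time_stretch(x, rate, n_fft=1024, hop=256):
    """Phase-vocoder time stretch: rate > 1 speeds up, rate < 1 slows down."""
    window = np.hanning(n_fft)
    # Analysis STFT
    frames = np.stack([x[i:i + n_fft] * window
                       for i in range(0, len(x) - n_fft, hop)])
    stft = np.fft.rfft(frames, axis=1)
    # Resample frame positions along time, accumulating phase differences
    steps = np.arange(0, stft.shape[0] - 1, rate)
    bin_freqs = 2 * np.pi * np.arange(stft.shape[1]) * hop / n_fft
    phase = np.angle(stft[0])
    out_frames = []
    for step in steps:
        i = int(step)
        out_frames.append(np.abs(stft[i]) * np.exp(1j * phase))
        # Phase advance between consecutive analysis frames, wrapped to [-pi, pi)
        dphi = np.angle(stft[i + 1]) - np.angle(stft[i]) - bin_freqs
        dphi -= 2 * np.pi * np.round(dphi / (2 * np.pi))
        phase = phase + bin_freqs + dphi
    # Overlap-add resynthesis
    out = np.zeros(hop * len(out_frames) + n_fft)
    for k, frame in enumerate(np.fft.irfft(out_frames, axis=1)):
        out[k * hop : k * hop + n_fft] += frame * window
    return out

sr = 22050
t = np.arange(sr) / sr
stretched = time_stretch(np.sin(2 * np.pi * 440 * t), rate=0.5)  # roughly twice as long
```

Pitch shifting can then be built on top of this: time-stretch by the inverse of the desired pitch ratio, then resample back to the original duration.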

Audio Related Packages

  • Total number of packages: 66

Read-Write

Transformations - General DSP

Feature extraction

Data augmentation

Speech Processing

Environmental Sounds

Perceptual Models - Auditory Models

Source Separation

  • commonfate :octocat: 📦 - Common Fate Model and Transform.
  • NTFLib :octocat: - Sparse Beta-Divergence Tensor Factorization.
  • NUSSL :octocat: 📦 - Holistic source separation framework including DSP methods and deep learning methods.
  • NIMFA :octocat: 📦 - Several flavors of non-negative-matrix factorization.
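
Several of the packages above (e.g. NIMFA) center on non-negative matrix factorization, a classic model-based approach to source separation: a magnitude spectrogram V is factored into non-negative spectral templates W and time activations H. A minimal sketch using the Lee–Seung multiplicative updates; all names and the toy data are illustrative:

```python
import numpy as np

def nmf(V, k=2, n_iter=200, seed=0):
    """Multiplicative-update NMF: factor V ≈ W @ H with non-negative factors."""
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], k)) + 1e-3
    H = rng.random((k, V.shape[1])) + 1e-3
    for _ in range(n_iter):
        # Lee & Seung updates for the Frobenius-norm objective
        H *= (W.T @ V) / (W.T @ W @ H + 1e-12)
        W *= (V @ H.T) / (W @ H @ H.T + 1e-12)
    return W, H

# Toy "mixture" spectrogram: two spectral templates active at different times
V = np.outer([1, 0, 2, 0], [1, 1, 0, 0]) + np.outer([0, 3, 0, 1], [0, 0, 1, 1])
W, H = nmf(V.astype(float))
```

Masking the mixture STFT with each component's contribution W[:, j:j+1] @ H[j:j+1, :] then yields the separated sources.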

Music Information Retrieval

  • Catchy :octocat: - Corpus Analysis Tools for Computational Hook Discovery.
  • Madmom :octocat: 📦 - MIR package with a strong focus on beat detection, onset detection, and chord recognition.
  • mir_eval :octocat: 📦 - Common scores for various MIR tasks. Also includes a bss_eval implementation.
  • msaf :octocat: 📦 - Music Structure Analysis Framework.
  • librosa :octocat: 📦 - General audio and music analysis.

Deep Learning

Symbolic Music - MIDI - Musicology

Realtime applications

  • Jupylet :octocat: - Subtractive, additive, FM, and sample-based sound synthesis.
  • PYO :octocat: - Realtime audio DSP engine.
  • python-sounddevice :octocat: 📦 - PortAudio wrapper providing realtime audio I/O with NumPy.
  • ReTiSAR :octocat: - Binaural rendering of streamed or IR-based high-order spherical microphone array signals.

Web Audio

  • TimeSide (Beta) :octocat: - High-level audio analysis, imaging, transcoding, streaming and labelling.

Audio Dataset and Dataloaders

Wrappers for Audio Plugins

References
