ML4SA

A curated list of machine-learning-based techniques for sound art in creative media.

Audio Recognition

  • Audio recognition involves the identification and classification of audio signals or specific features within audio data.

  • Examples in Sound Art: Audio recognition can be used in sound art to analyze and classify sounds based on their content or characteristics. For instance, using machine learning techniques, sound artists can create installations that classify and respond to specific sound events, such as distinguishing between footsteps and bird chirping.

  • State-of-the-Art Methods: Popular machine learning libraries for audio recognition include:

    • Librosa: A Python library for music and audio analysis, providing features for audio preprocessing, feature extraction, and classification.

    • TensorFlow Audio: A library that extends TensorFlow with audio-specific functionality, including audio preprocessing, feature extraction, and model building.

    • YAMNet: A deep learning model developed by Google that can recognize a wide range of audio events from 521 audio classes.

  • ML techniques explained

    • Convolutional Neural Networks (CNNs): CNNs are commonly used for audio recognition tasks such as speech recognition, music genre classification, and environmental sound classification. They excel at capturing local audio features by applying convolutional filters to spectrograms or raw audio waveforms.

    • Recurrent Neural Networks (RNNs): RNNs are suitable for tasks that require sequential modeling, such as speech recognition and music transcription. They can capture temporal dependencies in audio data by utilizing recurrent connections to maintain memory of past information.

    • Hidden Markov Models (HMMs): HMMs are statistical models often used in speech recognition. They represent audio as a sequence of states and use probability distributions to model the transitions between states.

    • Attention Mechanisms: Attention mechanisms help models focus on specific parts of an audio sequence, allowing them to capture important temporal dependencies. They have improved the performance of tasks like speech recognition and audio event detection.

    • Transformer Models: Transformer models, originally developed for natural language processing, have also been applied to audio recognition. They use self-attention mechanisms to capture long-range dependencies in audio signals, leading to state-of-the-art performance in tasks like music genre classification.
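
As a concrete illustration of the spectrogram front end that CNN-based recognizers operate on, here is a minimal NumPy sketch of a short-time Fourier transform. The function name and parameters are illustrative, not taken from any particular library:

```python
import numpy as np

def stft_magnitude(signal, frame_len=1024, hop=256):
    """Magnitude spectrogram via a Hann-windowed short-time Fourier transform."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    # rfft keeps only the non-negative frequency bins
    return np.abs(np.fft.rfft(frames, axis=1))

# Synthetic test tone: a 440 Hz sine at a 22.05 kHz sample rate
sr = 22050
t = np.arange(sr) / sr
spec = stft_magnitude(np.sin(2 * np.pi * 440 * t))
print(spec.shape)  # (n_frames, frame_len // 2 + 1)
```

In practice a library such as Librosa would compute an optimized (often mel-scaled) version of this; the resulting 2-D time-frequency array is then fed to a CNN as an image-like input.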

Audio Synthesis

  • Audio synthesis involves the generation or creation of new audio signals using various algorithms and techniques.
  • Examples in Sound Art: Sound artists can use audio synthesis to create unique and expressive sounds for their installations or compositions. They can generate abstract textures, atmospheric tones, or even imitate real-world sounds using synthesis techniques. For example, a sound artist may use granular synthesis to manipulate and transform recorded environmental sounds into a mesmerizing sonic landscape.
  • State-of-the-Art Methods: Notable machine learning libraries and models for audio synthesis include:
    • Magenta: An open-source project by Google that explores the intersection of machine learning and music generation. It provides models like MusicVAE and NSynth for generating new musical compositions and timbres.
    • WaveGAN: A Generative Adversarial Network (GAN)-based model that can generate high-quality audio samples. It can learn and mimic the distribution of real audio data to create realistic and diverse sounds.
  • ML techniques explained
    • Generative Adversarial Networks (GANs): GANs can be used for audio synthesis tasks, such as generating realistic instrument sounds or speech. The generator network learns to produce audio samples that are indistinguishable from real audio, while the discriminator network learns to differentiate between real and generated samples.
    • Variational Autoencoders (VAEs): VAEs are used for learning latent representations of audio data. They can generate new audio samples by sampling from the learned latent space, enabling tasks like music generation and voice cloning.
    • WaveNet: WaveNet is a deep generative model for audio synthesis. It models the raw waveform directly, allowing high-quality audio generation with realistic details and nuances.
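
The granular-synthesis idea mentioned above can be sketched in a few lines of NumPy: short windowed "grains" are cut from a source recording and scattered across an output buffer by overlap-add. All names and parameter values here are illustrative, not from any library:

```python
import numpy as np

def granular_resynthesis(source, n_grains=200, grain_len=2048, out_len=44100, seed=0):
    """Scatter Hann-windowed grains from `source` at random positions."""
    rng = np.random.default_rng(seed)
    window = np.hanning(grain_len)
    out = np.zeros(out_len)
    for _ in range(n_grains):
        src_pos = rng.integers(0, len(source) - grain_len)
        dst_pos = rng.integers(0, out_len - grain_len)
        out[dst_pos : dst_pos + grain_len] += source[src_pos : src_pos + grain_len] * window
    # Normalize to avoid clipping after overlap-add
    return out / np.max(np.abs(out))

sr = 44100
t = np.arange(sr) / sr
texture = granular_resynthesis(np.sin(2 * np.pi * 220 * t))
```

Replacing the sine with a field recording, and the uniform random positions with learned or gesture-driven distributions, is where the technique becomes interesting for installations.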

Audio Transformation

  • Audio transformation involves modifying or manipulating audio signals to achieve desired effects or transformations.
  • Examples in Sound Art: Sound artists can use audio transformation techniques to alter and shape sound elements in their compositions or installations. They can apply effects like time stretching, pitch shifting, or spectral manipulation to create unique sonic experiences. For instance, a sound artist might use time stretching to slow down or stretch out a sound sample, creating an ethereal and atmospheric effect.
  • State-of-the-Art Methods: Some machine learning libraries and models for audio transformation include:
    • Essentia: An open-source library for audio analysis and transformation, providing a wide range of audio processing algorithms and tools.
    • Spleeter: A library developed by Deezer that uses deep learning to perform source separation, allowing users to separate vocals, drums, and other sound sources from mixed audio.
  • ML techniques explained
    • Time-domain and Frequency-domain Manipulation: Machine learning techniques can be used to manipulate audio signals in both the time domain and frequency domain.

      • Time Stretching and Pitch Shifting: These techniques modify the speed and pitch of audio signals without changing the overall content. They are commonly used in audio editing and music production.
      • Spectral Processing: Spectral processing techniques operate on the frequency domain representation of audio signals. They can be used for tasks like noise reduction, audio effects, and source separation.
    • Source Separation: Source separation techniques aim to separate individual sound sources from a mixture, such as isolating vocals from music. Deep learning models, including recurrent neural networks and convolutional neural networks, have been utilized to improve the accuracy and quality of source separation.

    • Audio Style Transfer: Style transfer techniques transform audio signals into different styles or genres. By leveraging machine learning models, it is possible to modify characteristics of the audio, such as changing the timbre or adding specific musical attributes.
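
Time stretching without a pitch change, as described above, is classically done with a phase vocoder: analyze the signal with an STFT, resample the frames along time while accumulating per-bin phase increments, and resynthesize by overlap-add. Below is a simplified NumPy sketch (illustrative only, not production quality):

```python
import numpy as np

def time_stretch(x, rate, n_fft=1024, hop=256):
    """Phase-vocoder time stretch: rate > 1 speeds up, rate < 1 slows down."""
    window = np.hanning(n_fft)
    # Analysis STFT
    frames = np.stack([x[i:i + n_fft] * window
                       for i in range(0, len(x) - n_fft, hop)])
    stft = np.fft.rfft(frames, axis=1)
    # Resample frame positions along time, accumulating phase differences
    steps = np.arange(0, stft.shape[0] - 1, rate)
    bin_freqs = 2 * np.pi * np.arange(stft.shape[1]) * hop / n_fft
    phase = np.angle(stft[0])
    out_frames = []
    for step in steps:
        i = int(step)
        out_frames.append(np.abs(stft[i]) * np.exp(1j * phase))
        # Phase advance between consecutive analysis frames, wrapped to [-pi, pi)
        dphi = np.angle(stft[i + 1]) - np.angle(stft[i]) - bin_freqs
        dphi -= 2 * np.pi * np.round(dphi / (2 * np.pi))
        phase = phase + bin_freqs + dphi
    # Overlap-add resynthesis
    out = np.zeros(hop * len(out_frames) + n_fft)
    for k, frame in enumerate(np.fft.irfft(out_frames, axis=1)):
        out[k * hop : k * hop + n_fft] += frame * window
    return out

sr = 22050
t = np.arange(sr) / sr
stretched = time_stretch(np.sin(2 * np.pi * 440 * t), rate=0.5)  # roughly twice as long
```

Pitch shifting can then be built on top of this: time-stretch by the inverse of the desired pitch ratio, then resample back to the original duration.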

Audio Related Packages

  • Total number of packages: 66

Read-Write

Transformations - General DSP

Feature extraction

Data augmentation

Speech Processing

Environmental Sounds

Perceptual Models - Auditory Models

Source Separation

  • commonfate :octocat: 📦 - Common Fate Model and Transform.
  • NTFLib :octocat: - Sparse Beta-Divergence Tensor Factorization.
  • NUSSL :octocat: 📦 - Holistic source separation framework including DSP methods and deep learning methods.
  • NIMFA :octocat: 📦 - Several flavors of non-negative-matrix factorization.
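
Several of the packages above (e.g. NIMFA) center on non-negative matrix factorization, a classic model-based approach to source separation: a magnitude spectrogram V is factored into non-negative spectral templates W and time activations H. A minimal sketch using the Lee–Seung multiplicative updates; all names and the toy data are illustrative:

```python
import numpy as np

def nmf(V, k=2, n_iter=200, seed=0):
    """Multiplicative-update NMF: factor V ≈ W @ H with non-negative factors."""
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], k)) + 1e-3
    H = rng.random((k, V.shape[1])) + 1e-3
    for _ in range(n_iter):
        # Lee & Seung updates for the Frobenius-norm objective
        H *= (W.T @ V) / (W.T @ W @ H + 1e-12)
        W *= (V @ H.T) / (W @ H @ H.T + 1e-12)
    return W, H

# Toy "mixture" spectrogram: two spectral templates active at different times
V = np.outer([1, 0, 2, 0], [1, 1, 0, 0]) + np.outer([0, 3, 0, 1], [0, 0, 1, 1])
W, H = nmf(V.astype(float))
```

Masking the mixture STFT with each component's contribution W[:, j:j+1] @ H[j:j+1, :] then yields the separated sources.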

Music Information Retrieval

  • Catchy :octocat: - Corpus Analysis Tools for Computational Hook Discovery.
  • Madmom :octocat: 📦 - MIR package with a strong focus on beat detection, onset detection, and chord recognition.
  • mir_eval :octocat: 📦 - Common scores for various MIR tasks. Also includes a bss_eval implementation.
  • msaf :octocat: 📦 - Music Structure Analysis Framework.
  • librosa :octocat: 📦 - General audio and music analysis.

Deep Learning

Symbolic Music - MIDI - Musicology

Realtime applications

  • Jupylet :octocat: - Subtractive, additive, FM, and sample-based sound synthesis.
  • PYO :octocat: - Realtime audio DSP engine.
  • python-sounddevice :octocat: 📦 - PortAudio wrapper providing realtime audio I/O with NumPy.
  • ReTiSAR :octocat: - Binaural rendering of streamed or IR-based high-order spherical microphone array signals.

Web Audio

  • TimeSide (Beta) :octocat: - High-level audio analysis, imaging, transcoding, streaming and labelling.

Audio Dataset and Dataloaders

Wrappers for Audio Plugins

References
