Listen, Chat, And Edit on Edge: Text-Guided Soundscape Modification for Real-Time Auditory Experience

What is this project about?

Listen, Chat, and Edit (LCE) is a multimodal sound mixture editor that modifies each sound source in a mixture according to user-provided text instructions. It offers a chat interface and can edit multiple sound sources in a mixture simultaneously, without first separating them. A large language model interprets open-vocabulary text prompts to produce a semantic filter; the sound mixture is then decomposed, filtered, and reassembled into the desired output.
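The decompose, filter, and reassemble idea can be illustrated with a toy sketch. This is not the LCE model: here the "decomposition" is plain non-overlapping framing and the "filter" is a hand-written list of per-frame gains, whereas the real system relies on a learned representation and a filter derived from the text prompt. All function names below are hypothetical.

```python
from typing import List


def decompose(mixture: List[float], frame_len: int) -> List[List[float]]:
    """Split a waveform into non-overlapping frames (toy stand-in for a learned encoder)."""
    return [mixture[i:i + frame_len] for i in range(0, len(mixture), frame_len)]


def apply_filter(frames: List[List[float]], gains: List[float]) -> List[List[float]]:
    """Scale each frame by its gain (toy stand-in for the text-derived semantic filter)."""
    if len(frames) != len(gains):
        raise ValueError("need exactly one gain per frame")
    return [[g * x for x in frame] for frame, g in zip(frames, gains)]


def reassemble(frames: List[List[float]]) -> List[float]:
    """Concatenate the filtered frames back into one waveform (toy stand-in for a decoder)."""
    return [x for frame in frames for x in frame]


def edit_mixture(mixture: List[float], gains: List[float], frame_len: int) -> List[float]:
    """Run the three stages in order: decompose, filter, reassemble."""
    return reassemble(apply_filter(decompose(mixture, frame_len), gains))


if __name__ == "__main__":
    # Attenuate the first frame and amplify the second.
    print(edit_mixture([1.0, 2.0, 3.0, 4.0], gains=[0.5, 2.0], frame_len=2))
```

Keeping the three stages as separate functions mirrors the description above: only the filter stage depends on the user's instruction, so the other two could stay fixed across prompts.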

Project Structure

  • data/datasets: Scripts used to process the dataset and prompts.
  • demonstration: A demonstration of an input mixture and its edited version.
  • embeddings: The pkl files received from the LLM are stored in this folder.
  • hparams: Hyperparameter settings for the models.
  • llm_cloud: Configuration and scripts for cloud-based language model interactions.
  • modules: Core modules and utilities for the project.
  • prompts: Handling and processing of text prompts.
  • pubsub: Setup for publish-subscribe messaging patterns.
  • utils: Utility scripts for general purposes.
  • E6692.2022Spring.LCEE.ss6928.pkk2125.presentationFinal.pptx: Final presentation file detailing project overview and results.
  • profiling.ipynb: Jupyter notebook for profiling the modules in terms of inference speed and GPU memory usage.
  • run_lce.ipynb: Main executable notebook for the LCE system.
  • run_prompt_reader.ipynb: Notebook for reading and processing prompts.
  • run_prompt_reader_profiling.ipynb: Profiling for the prompt reader.
  • run_sound_editor_nosb.ipynb: Notebook for the sound editor module without SpeechBrain.
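Since the embeddings folder listed above holds pkl files, a minimal sketch of writing and reading such a file with Python's standard `pickle` module may help. The helper names and the example path are hypothetical, and the actual structure of the stored object (list, tensor, dict) is not specified here, so the sketch treats it as an opaque Python object.

```python
import pickle
from pathlib import Path
from typing import Any


def save_embedding(embedding: Any, path: Path) -> None:
    """Serialize an embedding object to a .pkl file, creating parent folders if needed."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as f:
        pickle.dump(embedding, f)


def load_embedding(path: Path) -> Any:
    """Read an embedding object back from a .pkl file."""
    with path.open("rb") as f:
        return pickle.load(f)


if __name__ == "__main__":
    # Hypothetical file name, used only for illustration.
    example_path = Path("embeddings") / "example_prompt.pkl"
    save_embedding([0.1, 0.2, 0.3], example_path)
    print(load_embedding(example_path))
```

Unpickling can execute arbitrary code, so files should only be loaded when they come from a trusted source such as your own server.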

Installation

  1. Clone the repository:
    git clone https://github.com/SiavashShams/Listen-Chat-Edit-on-Edge.git
  2. Install required dependencies:
    pip install -r requirements.txt

Usage

To run the main LCE application, open and execute the main notebook:

run_lce.ipynb

For a demonstration of the system's capabilities, refer to the demonstration folder.

Implementation

  • Deploy Conv-TasNet on the Jetson Nano.
  • Deploy Llama 2 on a GCP server.
  • Send a prompt to the server. Communication is supported in two ways: over SSH or through the Pub/Sub service.
  • The LLM computes the embedding and publishes it back; the embedding is then fed to the Conv-TasNet model.
  • The edited audio mixture is then ready to be played.
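The round trip described in the steps above can be sketched with in-memory queues standing in for the two messaging channels. Everything here is an assumption made for illustration: the queues replace the real Pub/Sub service, `fake_llm_embedding` replaces the Llama 2 server, and `fake_sound_editor` replaces Conv-TasNet. None of these names come from the repository.

```python
import pickle
import queue
from typing import List

# Hypothetical in-memory stand-ins for the prompt and embedding channels.
prompt_topic = queue.Queue()
embedding_topic = queue.Queue()


def fake_llm_embedding(prompt: str, dim: int = 4) -> List[float]:
    """Placeholder for the LLM server: derive a deterministic vector from the prompt text."""
    codes = [ord(c) for c in prompt]
    return [sum(codes[i::dim]) / 1000.0 for i in range(dim)]


def edge_send_prompt(prompt: str) -> None:
    """Edge device publishes the user's prompt as bytes."""
    prompt_topic.put(prompt.encode("utf-8"))


def server_handle_one() -> None:
    """Server consumes one prompt, computes the embedding, and publishes it back pickled."""
    prompt = prompt_topic.get(timeout=1).decode("utf-8")
    embedding_topic.put(pickle.dumps(fake_llm_embedding(prompt)))


def edge_receive_embedding() -> List[float]:
    """Edge device consumes the pickled embedding."""
    return pickle.loads(embedding_topic.get(timeout=1))


def fake_sound_editor(mixture: List[float], embedding: List[float]) -> List[float]:
    """Placeholder for the editing model: scale the mixture by the mean of the embedding."""
    gain = sum(embedding) / len(embedding)
    return [gain * x for x in mixture]


def run_round_trip(prompt: str, mixture: List[float]) -> List[float]:
    """Prompt out, embedding back, then edit the mixture on the edge device."""
    edge_send_prompt(prompt)
    server_handle_one()
    return fake_sound_editor(mixture, edge_receive_embedding())


if __name__ == "__main__":
    print(run_round_trip("make the speech louder", [1.0, -1.0, 0.5]))
```

In a real deployment the two halves run on different machines, so `server_handle_one` would live in a long-running subscriber on the server while the edge device waits for the reply.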

Links

Presentation

Report

References

Thanks to the authors of Listen, Chat, And Edit for their amazing work.

Contributors

siavashshams, kpk101, github-classroom[bot]