audo-ai / magic-mic Goto Github PK

Open Source Noise Cancellation App for Virtual Meetings

JavaScript 7.49% C++ 57.99% CMake 13.47% Shell 0.84% Emacs Lisp 0.10% Rust 11.67% HTML 1.23% SCSS 3.83% Dockerfile 3.39%

machine-learning deep-learning audio microphone noise-reduction noise-cancellation

magic-mic's Introduction

Magic Mic

Realtime Audio Processing

This is the open source component of Magic Mic, an app created by the folks at audo.ai to provide easy access to a realtime version of our custom machine learning based noise removal tool. Just run the app, switch the microphone in whatever audio consuming app you are using (zoom, discord, google meet) to Magic Mic and you're off! This is still in early alpha and so is only available for pulseaudio on linux right now. This is still in active development and so bug fixes and new features will be coming, along with OS X and Windows support in the future.

You can get a prebuilt version of magic mic from our releases.

Usage

Using Magic Mic should pretty much be as simple as executing the AppImage. First download the AppImage from our releases then either make it executable in your file manager or run the following command in the terminal

chmod a+x /path/to/MagicMic.appimage

Then to run the AppImage you can either execute it from your file manager (for example by double clicking) or from the terminal

/path/to/MagicMic.appimage

We are still working on a more automated installation.

After executing the AppImage, Magic Mic should open and you should see an icon in your systray. From the magic mic window you can select your microphone, enable and disable denoising, and select your denoising engine. Once that is set up you can test that everythng is working by clicking on the "Mic Check" button. You can feel free to close the Magic Mic window whenever. It will continue to run in the background. If you did not move the Magic Mic AppImage, you can reopen the Magic Mic window by clicking on the icon in the systray and selecting "Open". You can also completely quit all of Magic Mic by clicking on "Quit."

To use Magic Mic in an app that listens to your microphone, all you have to do is configure the app to use "Magic Mic" as the listening microphone.

We don't have auto update implemeted yet, so please check back here every once in a while to check out new releases with new features!

Open Source

Our custom denoising model is proprietary. Only us at Audo can create builds using it as the denoising engine. If you would like to build Magic Mic yourself, we support using rnnoise as the Audio Processor.

Vision

We imagine Magic Mic providing a common interface for realtime audio processing on many platforms. We want Magic Mic to be an open source tool that enables all developers to build proprietary or free real time audio processing tools. Right now, implementing a custom audio processor is just a matter of writing some c++ (we should have a guide soon), but the interface is quite immature and we're still quite far from realizing this vision, so reach out if you want to help! Contact us in the github discussions tab or the issues.

Development

Structure

This project has esentially 3 components. First, there is the code in src-native which interacts with the audio system and actually creates the virtual microphone and does the denoising. Then there is the tauri code in src-tauri which deals with creating the system webview and interacting with the frontend code. The naming of these directories is somewhat misleading, because both the code in src-native and the code in src-tauri are compiled to native code. Additionaly, there is src-web which contains a create-react-app project which is displayed by tauri.

Building with Docker

Run

DOCKER_BUILDKIT=1 docker build --output . .

and the appimage should be copied into your working directory.

Building without Docker

Run

mkdir build
cd build
cmake -DVIRTMIC_ENGINE="PIPESOURCE" -DAUDIOPROC_CMAKES="$PWD/../src-native/RNNoiseAP.cmake" ..
make build_tauri

This should place an appimage in src-tauri/target/release/bundle/appimage.

PIPESOURCE is the only virtmic engine available at the moment; in the future this may change to support other platforms or if we implement other virtual microphones.

Custom Audio Processors

There is support for building with custom audio processors, and the documentation will be coming soon. For the meantime you could look at src-native/RNNoiseAP.cmake (and the files it references) as an example. Audio processors are added to the build by adding there cmake files to the semicolon seperated list AUDIOPROC_CMAKES when configuring the build.

magic-mic's People

Contributors

Stargazers

Watchers

magic-mic's Issues

Add support for RNN noise for open source build

Update denosier to Michael's newest code

Disable logging in production for now

For now we should just completely disable logging in prod. We can try to figure out a good way to maybe do some rotating log files in the future, but for now just have it off by default.
We should make sure you can renable them maybe with a MAGICMIC_LOGvariable

Proof of concept based on builtin modules

Simplest route might be to use a null-sink or virtual-sink and its monitor as the mic. I think we need to use the pulseaudio api (rather than eg. portaudio) so we can choose our mic and sink.

Give user feedback on errors

Indicate when the app needs to be restarted when the server dies or something.

Improve RPC

Right now the rpc interface is pretty confusing, brittle and poorly documented. I want to improve it. Right now I'm sort of using JSON-RPC but not really. I don't support concurrent/asynchronous requests and it's very prone to race conditions. The methods are poorly documented. I'd like to fix all of this in as easy a way as possible which would probably mean sticking mostly with the current implementation, but I'm definitely on the look out for a tiny rpc framework that would make this easier. That said, I do think that fixing these issues probably won't be too hard with the current implementation.

Latency problems but awesome

First of all, this is a terrific project! Very useful and easy to use.

In my first try all went ok (probably bacause the cpu consumption was low), but when I started a meeting using brave-browser the noise filter was disabled due to high latency. I've changed to lightweight filter but is not as good as yours is.

Amazing work, thank you for making it open source!

Pass torchscript path as cmdline arg

Integrate dbus control in module

Figure out how to control the module through dbus. For now this will just be enabling and disabling the denoising.

Figure out what to do about speaker display

Real Bidirectional Communication

Right now we're following JSON-RPC pretty closely for our ui -> server interactions, but I'm pretty sure json-rpc does not allow the server to send notifications to the client. This would be useful for us because sometimes the server makes changes that need to be reflected in the ui. Right now that is implemented using polling, but it might be better to actually just have the server send updates when it needs to. The basic infrastructure is already in the server (VirtualMicUpdate) but the ui doesn't support it. To get the ui to suport it we would need to have some sort of rust control over an event emitter in the javascript. I don't think tauri alpha has anything like this but the tauri beta might, so this might be blocked on #53.

Is 16kb 16000 or 16384?

Find out what we're using. It might not actually matter, but I think it would be good to be on the same page everywhere (talking about denoiser and me).

Make Instructions for how to use it in various apps

Show how to disable default noise removal, and set custom microphone

Check if pipesource is already loaded

We shouldn't try to load it if its already there, even though if its already there thats evidence of a failed unload

Implement GUI MVP

Starting out with electron-webpack, implementing Michael's figma design

What if Queue falls behind realtime?

Right now there is no consideration given in the module to what to do if the queue falls behind realtime. We need to check for this and respond accordingly (ie. drop samples, speed things up [by resampling?], or something else.

Close to system tray

Ideally, I think we would want it so that closing the app actually minimizes to tray and right click on tray has a quit option.

Implement RunningSTD

checkout something like (this)[https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Welford's_online_algorithm]

Right now I'm just returning .2 which was the std of a random test file.

Figure out how to handle rewinds in the module

The module is a pretty smooth loopback right now, but whenever pulseaudio logs a rewind there is a pop. This is probably because I haven't handled rewinds (I don't really even understand what they are right now). Rewinds need to be handled correctly.

Verify performance issues are due to model

If so maybe look into using a better blas and lapack

Remove proprietary files

Apparently theyre not gone. Maybe just squash history

Update to Tauri beta

We are using the alpha, but tauri is now in beta. Apparently its a pretty awesome change, so we should get on that.

Look into module version checks

My default archlinux pulseaudio daemon has no problem loading a module with any PA_MODULE_VERSION set but my pulseaudio built from source refuses to load the module if the version is not MODULE_VERSION. I need to look into what is causing this (is it just a new feature?) and if it is a problem on actual platforms. If the version is checked on other platforms we can probably live patch the module based on pulseaudio --version but ideally that won't be necessary.

Detect if rust rpc thread dies

And show error to user

Allow project to be built without proprietary libdenoiser

Clean up UI a little

A little too much space on the edges/not vertically centered and personally I think the text seems a little big (even tho it was big in the mockup)

Update the readme with full build instructions

Pulseaudio can't load module when linked with denoiser

Right now the module can not load into pulseaudio when it is linked against the denosier. My guess is that it is some problem to do with dynamic linking, so I tried to build pytorch as a static library which I have not succeeded at yet. It also could be due to c++ name mangling not being handled properly somewhere.

I will try to update this issue with screenshots of the message when I get a chance.

Improve denoiser api

Right now its pretty naive. I don't think the feed/spew is the best interface. Too many buffers to keep track of. Better if it just processes one chunk of audio at a time, and keeps track of whatever additional context it needs. Should think about this before #36

Make sure server dies if parent dies

Figure out what to do with denoiser lib

Need to compile the model into it, but also should it be in a different repo? If so we need to scrub the history here or decide what to do about that

Deal with pipes blocking in pipesource-mvp

The module-pipesource-pipe fills up and we block on it. This is bad for many reasons including its implications on latency but right now the main problem is that it interferes with signal handling. It seems like c++ io doesn't give us enough control over this so we need to use lower level io. Shouldn't be a problem, I just need to do it and I'm to tired right now.

Redesign Audio Processor api

Make sure promises give errors on errors

Change default microphone on startup to default input mic

Optionally: Save last mic to a config file in xdg config folder (ie. ~/.config/magic-mic/config.json) and try setting it to that on bootup.

Set default input source correctly

Right now I think the source is set arbitrarily via pa_stream_connect_record(..., source, ...) where source is nullptr at startup (which pulseaudio defines as setting the source as the system pleases). In my system this sets it to my audio monitor (alsa_output.pci-0000_00_1f.3.analog-stereo.monitor). We should figure out how to set the default input to the default source on startup.

While super clunky (as with everything in pulseaudio xD), one option is to get this via pa_context_get_server_info which triggers a callback including the server info which includes the default source name:

void magic_mic_server_info_cb(pa_context *c, const pa_server_info *info, void *userdata) {
    (void) info->default_source_name;
}

Another option is to set source = "@DEFAULT_SOURCE@" on startup, but when the default source gets set to magic mic, then this would also change. So if we did this we would want to immediately read the current source and then reassign it to the actual name.

Alternatively, to ease this issue, we could simply hide monitors so in cases where users only have one non-monitor mic, the default is correct. For reference, the output of pactl list sources for me is:

Click to reveal

Source #0
	State: RUNNING
	Name: alsa_output.pci-0000_00_1f.3.analog-stereo.monitor
	Description: Monitor of Built-in Audio Analog Stereo
	Driver: module-alsa-card.c
	Sample Specification: s16le 2ch 48000Hz
	Channel Map: front-left,front-right
	Owner Module: 6
	Mute: no
	Volume: front-left: 63250 /  97% / -0.93 dB,   front-right: 63250 /  97% / -0.93 dB
	        balance 0.00
	Base Volume: 65536 / 100% / 0.00 dB
	Monitor of Sink: alsa_output.pci-0000_00_1f.3.analog-stereo
	Latency: 0 usec, configured 25000 usec
	Flags: DECIBEL_VOLUME LATENCY 
	Properties:
		device.description = "Monitor of Built-in Audio Analog Stereo"
		device.class = "monitor"
		alsa.card = "0"
		alsa.card_name = "HDA Intel PCH"
		alsa.long_card_name = "HDA Intel PCH at 0x94520000 irq 132"
		alsa.driver_name = "snd_hda_intel"
		device.bus_path = "pci-0000:00:1f.3"
		sysfs.path = "/devices/pci0000:00/0000:00:1f.3/sound/card0"
		device.bus = "pci"
		device.vendor.id = "8086"
		device.vendor.name = "Intel Corporation"
		device.product.id = "a171"
		device.product.name = "CM238 HD Audio Controller"
		device.form_factor = "internal"
		device.string = "0"
		module-udev-detect.discovered = "1"
		device.icon_name = "audio-card-pci"
	Formats:
		pcm

Source #1
	State: RUNNING
	Name: alsa_input.pci-0000_00_1f.3.analog-stereo
	Description: Built-in Audio Analog Stereo
	Driver: module-alsa-card.c
	Sample Specification: s16le 2ch 48000Hz
	Channel Map: front-left,front-right
	Owner Module: 6
	Mute: no
	Volume: front-left: 14389 /  22% / -39.51 dB,   front-right: 14389 /  22% / -39.51 dB
	        balance 0.00
	Base Volume: 6554 /  10% / -60.00 dB
	Monitor of Sink: n/a
	Latency: 8235 usec, configured 40000 usec
	Flags: HARDWARE HW_MUTE_CTRL HW_VOLUME_CTRL DECIBEL_VOLUME LATENCY 
	Properties:
		alsa.resolution_bits = "16"
		device.api = "alsa"
		device.class = "sound"
		alsa.class = "generic"
		alsa.subclass = "generic-mix"
		alsa.name = "ALC255 Analog"
		alsa.id = "ALC255 Analog"
		alsa.subdevice = "0"
		alsa.subdevice_name = "subdevice #0"
		alsa.device = "0"
		alsa.card = "0"
		alsa.card_name = "HDA Intel PCH"
		alsa.long_card_name = "HDA Intel PCH at 0x94520000 irq 132"
		alsa.driver_name = "snd_hda_intel"
		device.bus_path = "pci-0000:00:1f.3"
		sysfs.path = "/devices/pci0000:00/0000:00:1f.3/sound/card0"
		device.bus = "pci"
		device.vendor.id = "8086"
		device.vendor.name = "Intel Corporation"
		device.product.id = "a171"
		device.product.name = "CM238 HD Audio Controller"
		device.form_factor = "internal"
		device.string = "front:0"
		device.buffering.buffer_size = "352800"
		device.buffering.fragment_size = "176400"
		device.access_mode = "mmap+timer"
		device.profile.name = "analog-stereo"
		device.profile.description = "Analog Stereo"
		device.description = "Built-in Audio Analog Stereo"
		module-udev-detect.discovered = "1"
		device.icon_name = "audio-card-pci"
	Ports:
		analog-input-internal-mic: Internal Microphone (type: Mic, priority: 8900, availability unknown)
		analog-input-mic: Microphone (type: Mic, priority: 8700, not available)
	Active Port: analog-input-internal-mic
	Formats:
		pcm

Improve Latency

In pipesource-mvp there is a huge amount of latency. This is probably due in part to both denoiser and maybe something in pipesource app itself. Need to do more investigation to see if anything is coming from pipesource, but on the denoiser end here are some ideas to improve latency:

use prefilled audio buffer like in example python
always spew everything no matter what by zero padding

I'll add more info to this issue if I find more.

ConnectionRefused with RNNoise module

When running the RNNoise module, it consistently errors with:

[2021-04-29 10:17:18.535] [server] [info] Loading Audio Processor from /tmp/.mount_magic-2h02QS/usr/lib/magic-mic/native/runtime_libs/audioproc.so
[2021-04-29 10:17:18.536] [server] [error] Cannot load Audio Processor create symbol: /tmp/.mount_magic-2h02QS/usr/bin/server-x86_64-unknown-linux-gnu: undefined symbol: create
thread 'main' panicked at 'Failed to connect to socket; FIX THIS RACE CONDITON: Os { code: 111, kind: ConnectionRefused, message: "Connection refused" }', src/main.rs:118:10
note: run with RUST_BACKTRACE=1 environment variable to display a backtrace

Despite this, the Audo-AI module consistently works.

Version: eae88f0

Fix Docker builds

Right now the dockerfile setup is pretty annoying for a few reasons:

It relies on caching build stages for the build not to take forever. This works, but I'd rather not rely on it. Better if the caching were a bit more explicit and maybe a bit more extensive
It only builds from git, so you can't really test things locally without pushing them. That is pretty horrible

Add something like a force kill and unload option

unloads modules, kills server

Wait for getStatus to be true before making other requests

Detect High Latency

If load gets high disable denoising and notify user using maybe something from here maybe.

We already detect high latency in the audio processor queue (which shouldn't even exist in the first place #48) and when load gets bad latency tends to accumualate in the recording stream queue which can be detected by pa_stream_get_readable_size

Investigate Microphone Volume

For some reason, on @MatthewScholefield's computer the microphone volume gets set to 100%. Need to investigate why this may happen.

Update UI possibly with new logo

E2E server tests

Try setting up some e2e tests on the server on various distros. Not exactly sure how to go about this but it's worth looking in to.

Integrate pipesource-mvp with GUI

This should be the last step before a real MVP. I'm thinking maybe use rpc (grpc), but I'm not sure yet. Need to do some more investigating on that front.
Whatever we do might be a good idea to have a non electron intermediary which implements the rpc (or whatever) api and then does the platform specific calls itself. Probably easiest to not deal with that stuff within electorn.

Don't stream if no one is listening

Look into https://freedesktop.org/software/pulseaudio/doxygen/introspect_8h.html#a55f2fbae1ce6b964e65c82c848280c06

Figure out distribution

I need to figure out which shared libraries to ship with and how to package them.

I think I can just put them in resources and set LD_LIBRARY_PATH appropriately.

Which shared libraries to ship is a different question. Maybe start with everything, see how big that is and slim down from there?

Auto setup modules in pipesource-mvp

Get pipesource-mvp to automatically load the requried pusleaudio modules and clean them up on start and stop

Add linting and formatting to everything

Right now the style of everything is all over the place. I should really add linting and formatting for the c++, web stuff, and rust.