
lyra's Introduction

Lyra: a generative low bitrate speech codec

What is Lyra?

Lyra is a high-quality, low-bitrate speech codec that makes voice communication available even on the slowest networks. To do this, it applies traditional codec techniques while leveraging advances in machine learning (ML), with models trained on thousands of hours of data, to create a novel method for compressing and transmitting voice signals.

Overview

The basic architecture of the Lyra codec is quite simple. Features are extracted from speech every 20ms and are then compressed for transmission at a desired bitrate between 3.2kbps and 9.2kbps. On the other end, a generative model uses those features to recreate the speech signal.
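
As a back-of-the-envelope sketch of what those numbers imply: one packet is emitted per 20 ms frame (50 frames per second), so the packet size is the bitrate divided by the frame rate. The helper below is illustrative, not part of Lyra's API; 3200 and 9200 bps come from the range above, and 6000 bps is included only as an assumed midpoint.

```python
# Illustrative only: one packet per 20 ms frame (50 frames/s),
# so packet size in bytes is bitrate / 8 / 50.
def bytes_per_packet(bitrate_bps: int, frame_ms: int = 20) -> int:
    frames_per_second = 1000 // frame_ms
    return bitrate_bps // 8 // frames_per_second

for bitrate_bps in (3200, 6000, 9200):
    print(bitrate_bps, "bps ->", bytes_per_packet(bitrate_bps), "bytes/packet")
# 3200 bps -> 8, 6000 bps -> 15, 9200 bps -> 23 bytes per packet
```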

Lyra harnesses the power of new natural-sounding generative models to maintain the low bitrate of parametric codecs while achieving high quality, on par with state-of-the-art waveform codecs used in most streaming and communication platforms today.

Computational complexity is reduced by using a cheaper convolutional generative model called SoundStream, which enables Lyra to not only run on cloud servers, but also on-device on low-end phones in real time (with a processing latency of 20ms). This whole system is then trained end-to-end on thousands of hours of speech data with speakers in over 90 languages and optimized to accurately recreate the input audio.

Lyra is supported on Android, Linux, Mac and Windows.

Prerequisites

There are a few things you'll need to do to set up your computer to build Lyra.

Common setup

Lyra is built using Google's build system, Bazel. Install it following these instructions. Bazel version 5.0.0 is required, and some Linux distributions may make an older version available in their application repositories, so make sure you are using the required version or newer. The latest version can be downloaded via GitHub.

You will also need python3 and numpy installed.

Lyra can be built from Linux using Bazel for either an ARM Android target or a Linux target, and natively on Mac and Windows.

Android requirements

Building for Android requires downloading a specific version of the Android NDK toolchain. If you already develop with Android Studio, you might not need to do these steps if ANDROID_HOME and ANDROID_NDK_HOME are defined and pointing at the right version of the NDK.

  1. Download command line tools from https://developer.android.com/studio

  2. Unzip and cd to the directory

  3. Check the available packages to install in case they don't match the following steps.

    bin/sdkmanager  --sdk_root=$HOME/android/sdk --list

    Some systems will already have the Java runtime set up, but if you see an error here like ERROR: JAVA_HOME is not set and no 'java' command could be found on your PATH., you need to install the Java runtime first with sudo apt install default-jdk. You will also need to add export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64 (run ls /usr/lib/jvm to see which path was installed) to your $HOME/.bashrc and reload it with source $HOME/.bashrc.

  4. Install the r21 NDK, Android SDK 30, and build tools:

    bin/sdkmanager  --sdk_root=$HOME/android/sdk --install  "platforms;android-30" "build-tools;30.0.3" "ndk;21.4.7075529"

  5. Add the following to .bashrc (or export the variables):

    export ANDROID_NDK_HOME=$HOME/android/sdk/ndk/21.4.7075529
    export ANDROID_HOME=$HOME/android/sdk

  6. Reload .bashrc (with source $HOME/.bashrc)

Building

The building and running process differs slightly depending on the selected platform.

Building for Linux

You can build the cc_binaries with the default config. encoder_main is an example of a file encoder.

bazel build -c opt lyra/cli_example:encoder_main

You can run encoder_main to encode a test .wav file with some speech in it, specified by --input_path. The --output_dir specifies where to write the encoded (compressed) representation, and the desired bitrate can be specified using the --bitrate flag.

bazel-bin/lyra/cli_example/encoder_main --input_path=lyra/testdata/sample1_16kHz.wav --output_dir=$HOME/temp --bitrate=3200

Similarly, you can build decoder_main and use it on the output of encoder_main to decode the encoded data back into speech.

bazel build -c opt lyra/cli_example:decoder_main
bazel-bin/lyra/cli_example/decoder_main --encoded_path=$HOME/temp/sample1_16kHz.lyra --output_dir=$HOME/temp/ --bitrate=3200
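
Because the encoded stream is fixed-rate, the size of the .lyra file produced above can be estimated from the clip duration and bitrate alone. The sketch below assumes exactly one packet per 20 ms frame and no container overhead, which may not hold exactly for the real file format.

```python
# Rough size estimate for a Lyra-encoded clip, assuming a fixed-rate
# stream of one packet per 20 ms frame and no container overhead.
def encoded_size_bytes(duration_s: float, bitrate_bps: int) -> int:
    frames = round(duration_s / 0.020)         # 50 frames per second
    bytes_per_frame = bitrate_bps // 8 // 50   # e.g. 8 bytes at 3200 bps
    return frames * bytes_per_frame

# A 10-second clip at 3200 bps: 500 frames * 8 bytes = 4000 bytes,
# i.e. about 400 bytes per second of speech.
print(encoded_size_bytes(10.0, 3200))
```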

Note: the default Bazel toolchain is automatically configured and likely uses gcc/libstdc++ on Linux. This should be satisfactory for most users, but will differ from the NDK toolchain, which uses clang/libc++. To use a custom clang toolchain on Linux, see toolchain/README.md and .bazelrc.

Building for Android

Android App

There is an example APK target called lyra_android_example that you can build after you have set up the NDK.

This example is an app with a minimal GUI that has buttons for two options. One option is to record from the microphone and encode/decode with Lyra so you can test what Lyra would sound like for your voice. The other option runs a benchmark that encodes and decodes in the background and prints the timings to logcat.

bazel build -c opt lyra/android_example:lyra_android_example --config=android_arm64 --copt=-DBENCHMARK
adb install bazel-bin/lyra/android_example/lyra_android_example.apk

After this you should see an app called "Lyra Example App".

You can open it, and you will see a simple TextView that says the benchmark is running, and reports when it finishes.

Press "Record from microphone", say a few words, and then press "Encode and decode to speaker". You should hear your voice being played back after being coded with Lyra.

If you press 'Benchmark', you should see something like the following in logcat on a Pixel 6 Pro when running the benchmark:

lyra_benchmark:  feature_extractor:  max: 1.836 ms  min: 0.132 ms  mean: 0.153 ms  stdev: 0.042 ms
lyra_benchmark: quantizer_quantize:  max: 1.042 ms  min: 0.120 ms  mean: 0.130 ms  stdev: 0.028 ms
lyra_benchmark:   quantizer_decode:  max: 0.103 ms  min: 0.026 ms  mean: 0.029 ms  stdev: 0.003 ms
lyra_benchmark:       model_decode:  max: 0.820 ms  min: 0.191 ms  mean: 0.212 ms  stdev: 0.031 ms
lyra_benchmark:              total:  max: 2.536 ms  min: 0.471 ms  mean: 0.525 ms  stdev: 0.088 ms

This shows that decoding one frame (each frame is 20 milliseconds, at 50 frames per second) takes 0.525 milliseconds on average. So decoding runs at around 38 times (20/0.525) faster than real time.
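
That real-time factor falls straight out of the numbers in the log:

```python
# Real-time factor from the benchmark above: each frame covers 20 ms of
# audio, so speed relative to real time is frame duration / decode time.
frame_ms = 20.0
mean_decode_ms = 0.525  # mean "total" from the logcat output above
realtime_factor = frame_ms / mean_decode_ms
print(f"{realtime_factor:.1f}x real time")  # about 38.1x
```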

To build your own Android app, you can either use the cc_library target outputs to create a .so for use in your own build system, or use an android_binary rule within Bazel to create an .apk file, as in this example.

There is a tutorial on building for Android with Bazel in the Bazel docs.

Android command-line binaries

There are also binary targets that you can use to experiment with encoding and decoding .wav files.

You can build the example cc_binary targets with:

bazel build -c opt lyra/cli_example:encoder_main --config=android_arm64
bazel build -c opt lyra/cli_example:decoder_main --config=android_arm64

This builds an executable binary that can be run on 64-bit ARM Android devices (not an Android app). You can then push it to your Android device and run it as a binary through the shell.

# Push the binary and the data it needs, including the model and .wav files:
adb push bazel-bin/lyra/cli_example/encoder_main /data/local/tmp/
adb push bazel-bin/lyra/cli_example/decoder_main /data/local/tmp/
adb push lyra/model_coeffs/ /data/local/tmp/
adb push lyra/testdata/ /data/local/tmp/

adb shell
cd /data/local/tmp
./encoder_main --model_path=/data/local/tmp/model_coeffs --output_dir=/data/local/tmp --input_path=testdata/sample1_16kHz.wav
./decoder_main --model_path=/data/local/tmp/model_coeffs --output_dir=/data/local/tmp --encoded_path=sample1_16kHz.lyra

The encoder_main/decoder_main flags described in the Building for Linux section work the same way here.

Building for Mac

You will need to install the Xcode command line tools in addition to the prerequisites common to all platforms. Xcode setup is a required step for using Bazel on Mac. See this guide for how to install the Xcode command line tools. Lyra has been built successfully using Xcode 13.3.

You can follow the instructions in the Building for Linux section once this is completed.

Building for Windows

You will need to install Build Tools for Visual Studio 2019 in addition to the prerequisites common to all platforms. Visual Studio setup is a required step for building C++ with Bazel on Windows. See this guide for how to install MSVC. You may also need to install Python 3 support, which is also described in the guide.

You can follow the instructions in the Building for Linux section once this is completed.

API

To integrate Lyra into a project, only two APIs are relevant: LyraEncoder and LyraDecoder.

DISCLAIMER: At this time Lyra's API and bit-stream are not guaranteed to be stable and might change in future versions of the code.

On the sending side, LyraEncoder can be used to encode an audio stream using the following interface:

class LyraEncoder : public LyraEncoderInterface {
 public:
  static std::unique_ptr<LyraEncoder> Create(
      int sample_rate_hz, int num_channels, int bitrate, bool enable_dtx,
      const ghc::filesystem::path& model_path);

  std::optional<std::vector<uint8_t>> Encode(
      const absl::Span<const int16_t> audio) override;

  bool set_bitrate(int bitrate) override;

  int sample_rate_hz() const override;

  int num_channels() const override;

  int bitrate() const override;

  int frame_rate() const override;
};

The static Create method instantiates a LyraEncoder with the desired sample rate in Hertz, number of channels and bitrate, as long as those parameters are supported (see lyra_encoder.h for supported parameters). Otherwise it returns a nullptr. The Create method also needs to know if DTX should be enabled and where the model weights are stored. It also checks that these weights exist and are compatible with the current Lyra version.

Given a LyraEncoder, any audio stream can be compressed using the Encode method. The provided span of int16-formatted samples is assumed to contain 20ms of data at the sample rate chosen at Create time. As long as this condition is met the Encode method returns the encoded packet as a vector of bytes that is ready to be stored or transmitted over the network.
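
As an illustration of that contract (the helper below is not part of the Lyra API, and the listed sample rates are assumed examples; see lyra_encoder.h for what is actually supported), the number of int16 samples per Encode call is simply the sample rate times 20 ms:

```python
# Illustrative helper: how many int16 samples one 20 ms Encode call
# expects at a given Create-time sample rate.
def samples_per_frame(sample_rate_hz: int, frame_ms: int = 20) -> int:
    return sample_rate_hz * frame_ms // 1000

for rate_hz in (8000, 16000, 32000, 48000):
    print(rate_hz, "Hz ->", samples_per_frame(rate_hz), "samples per frame")
# e.g. 16000 Hz -> 320 samples per 20 ms frame
```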

The bitrate can be dynamically modified using the set_bitrate setter. It returns true if the desired bitrate is supported and correctly set.

The rest of the LyraEncoder methods are just getters for the different predetermined parameters.

On the receiving end, LyraDecoder can be used to decode the encoded packet using the following interface:

class LyraDecoder : public LyraDecoderInterface {
 public:
  static std::unique_ptr<LyraDecoder> Create(
      int sample_rate_hz, int num_channels,
      const ghc::filesystem::path& model_path);

  bool SetEncodedPacket(absl::Span<const uint8_t> encoded) override;

  std::optional<std::vector<int16_t>> DecodeSamples(int num_samples) override;

  int sample_rate_hz() const override;

  int num_channels() const override;

  int frame_rate() const override;

  bool is_comfort_noise() const override;
};

Once again, the static Create method instantiates a LyraDecoder with the desired sample rate in Hertz and number of channels, as long as those parameters are supported; otherwise it returns a nullptr. These parameters don't need to match the ones used in LyraEncoder. As with the encoder, the Create method needs to know where the model weights are stored, and it checks that these weights exist and are compatible with the current Lyra version.

Given a LyraDecoder, any packet can be decoded by first feeding it into SetEncodedPacket, which returns true if the provided span of bytes is a valid Lyra-encoded packet.

Then the int16-formatted samples can be obtained by calling DecodeSamples. If there isn't a packet available, but samples still need to be generated, the decoder might switch to a comfort noise generation mode, which can be checked using is_comfort_noise.

The rest of the LyraDecoder methods are just getters for the different predetermined parameters.

For an example on how to use LyraEncoder and LyraDecoder to encode and decode a stream of audio, please refer to the integration test.

License

Use of this source code is governed by an Apache 2.0 license that can be found in the LICENSE file.

Papers

  1. Kleijn, W. B., Lim, F. S., Luebs, A., Skoglund, J., Stimberg, F., Wang, Q., & Walters, T. C. (2018, April). Wavenet based low rate speech coding. In 2018 IEEE international conference on acoustics, speech and signal processing (ICASSP) (pp. 676-680). IEEE.
  2. Denton, T., Luebs, A., Chinen, M., Lim, F. S., Storus, A., Yeh, H., Kleijn, W. B., & Skoglund, J. (2020, November). Handling Background Noise in Neural Speech Generation. In 2020 54th Asilomar Conference on Signals, Systems, and Computers (pp. 667-671). IEEE.
  3. Kleijn, W. B., Storus, A., Chinen, M., Denton, T., Lim, F. S., Luebs, A., Skoglund, J., & Yeh, H. (2021, June). Generative speech coding with predictive variance regularization. In ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 6478-6482). IEEE.
  4. Zeghidour, N., Luebs, A., Omran, A., Skoglund, J., & Tagliasacchi, M. (2021). SoundStream: An end-to-end neural audio codec. IEEE/ACM Transactions on Audio, Speech, and Language Processing.

lyra's People

Contributors

a-rose, aluebs, ewouth, jsoref, lilinxiong, mchinen, reekystive, siddhant-k-code, yeroro


lyra's Issues

packet_loss_rate in decoder

What is the function of packet_loss_rate in GilbertModel::Create? Is it necessary? Does it need to be modified according to the actual situation?

encoded file size, audio quality and evaluation

Hi,

  1. I tried to encode the "Original" file from the Google AI blog using the original 16 kHz sample rate, and again after changing the sample rate to 8 kHz.
    The original size was 163.5 KB at 16 kHz and 81.8 KB at 8 kHz.
    When I encoded the files with Lyra, both were 1.9 KB.
    I'm wondering: is this just random, or does Lyra not care about the source sample rate?

  2. When I decoded at a 16 kHz rate, the size of my Lyra output was different from the decoded file on the Google AI blog; of course the quality was different too.
    I would appreciate any explanation of that.

  3. How can I check Lyra quality? I figured out that PESQ and POLQA will not give me a correct score because of the changes in alignment and phase.

bazel build -c opt :encoder_main error on 20.04-Ubuntu

When using bazel build -c opt :encoder_main, the following error occurs:

ERROR: /home/w/lyra-main/BUILD:860:10: Compiling encoder_main.cc failed: undeclared inclusion(s) in rule '//:encoder_main':
this rule is missing dependency declarations for the following files included by 'encoder_main.cc':
'/usr/local/lib/clang/14.0.0/include/stddef.h'
'/usr/local/lib/clang/14.0.0/include/__stddef_max_align_t.h'
'/usr/local/lib/clang/14.0.0/include/stdarg.h'
'/usr/local/lib/clang/14.0.0/include/stdint.h'
'/usr/local/lib/clang/14.0.0/include/limits.h'
Target //:encoder_main failed to build
Use --verbose_failures to see the command lines of failed build steps.
INFO: Elapsed time: 6.123s, Critical Path: 4.18s
INFO: 3 processes: 3 internal.
FAILED: Build did NOT complete successfully

Hearing artifacts in android example in Lyra

Tried the Lyra Android example on a Pixel 4 but heard quite audible artifacts during decoding. There were also frequent truncations where I would hear only part of a sentence on playback, with a sudden cutoff.

Building with CMake

It would be nice if all the components of the software not already present in the distro on the user's machine could be built with CMake, and the ones that are present just linked.

Delay in decoder

Hi,
I was wondering about the decoder delay. I noticed that the decoded audio has a delay at the beginning compared to the original audio. How long is it, 60 ms or 80 ms? Is it a fixed delay?
Thanks.

quality is inferior to that advertised in the Lyra Google AI blog

I ran the samples published in the Lyra blog through the encoder_main and decoder_main binaries compiled from the open-sourced Lyra code (this repo). I compiled the floating-point version of the code. I tried both v0.0.1 and the latest commit; the result is the same. The decoded outputs (produced by running the original through encoder_main and then decoding the bitstream with decoder_main) are attached:
lyra_reproduce_blog.zip

It is clear that the quality of the samples decoded from this code is inferior to the quality of the samples advertised in the Lyra blog.

Why is this? Weren't the samples in the blog produced by the same code?

Thank you in advance for clarifying.

bazel build -c opt :encoder_main error on 16.04.1-Ubuntu x86_64

When using bazel build -c opt :encoder_main, the following error occurs:

external/com_google_absl/absl/strings/str_cat.h: In function 'chromemedia::absl::lts_2020_09_23::strings_internal::AlphaNumBuffer<16ul> chromemedia::absl::lts_2020_09_23::SixDigits(double)':
external/com_google_absl/absl/strings/str_cat.h:401:64: error: 'struct chromemedia::absl::lts_2020_09_23::strings_internal::AlphaNumBuffer<16ul>' has no member named 'data'
result.size = numbers_internal::SixDigitsToBuffer(d, &result.data[0]);
^
wav_util.cc: At global scope:
wav_util.cc:22:22: error: expected '{' before '::' token
namespace chromemedia::codec {
^
wav_util.cc:22:24: error: 'codec' in namespace '::' does not name a type
namespace chromemedia::codec {
^
wav_util.cc:56:1: error: expected '}' at end of input
} // namespace chromemedia::codec
^
wav_util.cc:56:1: error: expected '}' at end of input
Target //:encoder_main failed to build
Use --verbose_failures to see the command lines of failed build steps.
INFO: Elapsed time: 2.016s, Critical Path: 1.85s
INFO: 13 processes: 13 internal.
FAILED: Build did NOT complete successfully

"cannot find Foundation" error for Android example

System Information:
MacOS: 10.15.7
bazel 4.0.0-homebrew
ANDROID_NDK_HOME=$HOME/android/sdk/ndk/21.4.7075529
ANDROID_HOME=$HOME/android/sdk
Python 3.8.3
Apple clang version 12.0.0 (clang-1200.0.31.1), Target: x86_64-apple-darwin19.6.0
javac 1.8.0_181

external/androidndk/ndk/toolchains/aarch64-linux-android-4.9/prebuilt/darwin-x86_64/lib/gcc/aarch64-linux-android/4.9.x/../../../../aarch64-linux-android/bin/ld: cannot find Foundation: No such file or directory
clang: error: linker command failed with exit code 1 (use -v to see invocation)
Target //android_example:lyra_android_example failed to build

Are there any instructions teaching me, step by step, how to use Lyra in my own app with Android Studio?

As readme.md says:
To build your own android app, you can either use the cc_library target outputs to create a .so that you can use in your own build system. Or you can use it with an android_binary rule within bazel to create an .apk file as in this example.
But I can't figure out how to create the .so files I need and make them work with Android Studio.
Besides, I already have Gradle in my project, so I don't want Bazel to participate in it.

The conditioning_only decoder.

What is the meaning of the conditioning_only decoder? I checked the source code but still don't understand the difference between conditioning_only and model_only. Is there any explanation of them?

failure in building Lyra

I am building Lyra on Ubuntu 18.04.1 for Linux.
The bazel version that I have is 4.0.0 (although I have tried with 4.1.0 as well).
The gcc/g++ version that I have is 7.5.0.

The error appears to be that Bazel is invoking gcc with -std=c++0x, yet the code uses functionality that is only supported starting with C++17.

Any help would be appreciated.

In file included from layer_wrapper.h:29:0,
from conv1d_layer_wrapper.h:27,
from layer_wrappers_lib.h:21,
from causal_convolutional_conditioning.h:28,
from wavegru_model_impl.h:28,
from wavegru_model_impl.cc:15:
layer_wrapper_interface.h: At global scope:
layer_wrapper_interface.h:82:8: error: 'variant' in namespace 'std' does not name a template type
std::variant<FromDisk, FromConstant> from = FromDisk();

Questions on Lyra

First, kudos to Google Lyra team to open source this awesome technology!

Second, this post is not about an issue with the project; I am writing it to try to understand Lyra more.

  1. What is the optimal sample rate?

I played with the encoder and decoder using audio data of different sample rates; the audio data is from the testdata directory in the source code.

// 8khz sample rate audio
-rw-rw-r-- 1 kxie kxie 130604 Jul 28 13:40 8khz_sample_000000_decoded.wav
-rw-rw-r-- 1 kxie kxie 1530 Jul 28 13:38 8khz_sample_000000.lyra
-rw-rw-r-- 1 kxie kxie 65708 Jul 28 13:25 8khz_sample_000000.wav

// 16khz sample rate audio
-rw-rw-r-- 1 kxie kxie 241964 Jul 28 13:41 16khz_sample_000001_decoded.wav
-rw-rw-r-- 1 kxie kxie 2835 Jul 28 13:38 16khz_sample_000001.lyra
-rw-rw-r-- 1 kxie kxie 241992 Jul 28 13:25 16khz_sample_000001.wav

// 32khz sample rate audio
-rw-rw-r-- 1 kxie kxie 161324 Jul 28 13:42 32khz_sample_000002_decoded.wav
-rw-rw-r-- 1 kxie kxie 1890 Jul 28 13:38 32khz_sample_000002.lyra
-rw-rw-r-- 1 kxie kxie 324140 Jul 28 13:25 32khz_sample_000002.wav

// 48khz sample rate audio
-rw-rw-r-- 1 kxie kxie 212524 Jul 28 13:42 48khz_sample_000003_decoded.wav
-rw-rw-r-- 1 kxie kxie 2490 Jul 28 13:38 48khz_sample_000003.lyra
-rw-rw-r-- 1 kxie kxie 638252 Jul 28 13:26 48khz_sample_000003.wav

From the above, you can see the following:

  1. For 8khz sample rate audio, the size of decoded audio is about 2 times of the size of the original audio.
  2. For 16khz sample rate audio, the size of decoded audio is about same as the size of the original audio.
  3. For 32khz sample rate audio, the size of decoded audio is about half of the size of the original audio.
  4. For 48khz sample rate audio, the size of the decoded audio is about 1/3 of the size of the original audio.

So what is the optimal sample rate for input audio? Does that mean Lyra works best for audio with an 8 kHz sample rate?

  2. Features are extracted from speech every 40 ms and are then compressed for transmission at a bitrate of 3 kbps.

Does this mean we have to signal the ptime as 40 in SDP?

  3. This trick, plus 64-bit ARM optimisations, enables Lyra to not only run on cloud servers, but also on-device on mid-range phones, such as Pixel phones, in real time (with a processing latency of 100 ms).

Lyra encodes frames of 40 ms, and the processing delay is 100 ms; does this mean it will take at least 140 ms to hear the first word on the receiver side?

Kudos to Lyra team again.

/Kaiduan

Document library architecture

It'd be nice if the lib had some docs (maybe in a form of a graph), describing its architecture at different levels of detail, including:

  • components present within the lib
  • their functionality, semantics and interfaces (not class interfaces, but what it consumes and produces)
  • components' interactions to other components and flows of data between them
  • possibilities to replace that component with something else, both other implementations of the same thing and implementations of completely different thing that may have similar effect
  • why each component is needed and if it is avoidable and impact of it being skipped or replaced

Compile Error

[screenshot of the compile error]
I encountered this error when compiling this project. Before compiling, I installed the other dependencies according to the instructions without any problems, but when I execute this command (bazel build -c opt :encoder_main), an error occurs. Can you tell me the reason and how to fix it? I haven't used Bazel before, so I don't know how to modify it. Can you help me?

Voice has a nasal sound in WebRTC p2p call

My subjective feeling is that the voice has a nasal sound. I attached two files: lyra-speaking.wav is the audio recorded from the mic of the person speaking, and lyra-listening.wav is the audio recorded from the playout device that is listening.
Two devices are Google pixel 3(android 11)
Two settings files are audio processing parameters from WebRTC.
lyra-webrtc-audio-dump.zip

Looking for function definitions

I am not able to find a few function definitions, such as GruWithARInput, SpMM_bias, etc.,
or any other conv1D or GRU-related computational function definitions.
or for any other conv1D, GRU related computational function definitions.

I can see declarations in ".h" files, but not the definitions.

Are those definitions available in this repository,
or are the obj files being directly imported from other git repositories?

Where can I see code for those functions?

Thanks for support.

Building on Linux (Debian) without Android support

Hello,

When building on Linux using the command given in the README (bazel build -c opt :encoder_main), I get the following errors:

ERROR: /usr/local/src/lyra/WORKSPACE:121:1: name 'android_sdk_repository' is not defined
ERROR: /usr/local/src/lyra/WORKSPACE:128:1: name 'android_ndk_repository' is not defined

If I remove the lines related to android_sdk_repository and android_ndk_repository from the WORKSPACE file, I can build the encoder without issues.

I would like to find the correct way to deal with this and create a PR, but I don't know Bazel, so I haven't found a way to ignore Android targets when building for Linux. Hopefully someone can fix it or point me in the right direction :)

edit: I'm using Debian in Docker, if that's any help:

FROM debian:bullseye-slim

WORKDIR /usr/local/src

RUN mkdir -p /usr/share/man/man1 \
    && apt-get update \
    && apt-get install -y \
        ninja-build git cmake clang python bazel-bootstrap

RUN git clone https://github.com/llvm/llvm-project.git \
    && cd llvm-project \
    && git checkout 96ef4f307df2 \
    && mkdir build_clang \
    && cd build_clang \
    && cmake -G Ninja -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ -DLLVM_ENABLE_PROJECTS="clang" -DCMAKE_BUILD_TYPE=release ../llvm \
    && ninja \
    && $(which ninja) install \
    && cd .. \
    && mkdir build_libcxx \
    && cd build_libcxx \
    && cmake -G Ninja -DCMAKE_C_COMPILER=/usr/local/bin/clang -DCMAKE_CXX_COMPILER=/usr/local/bin/clang++ -DLLVM_ENABLE_PROJECTS="libcxx;libcxxabi" -DCMAKE_BUILD_TYPE=release ../llvm \
    && ninja \
    && $(which ninja) install \
    && ldconfig

RUN git clone --depth 1 --branch v0.0.1 https://github.com/google/lyra \
    && cd lyra \
    && bazel build -c opt :encoder_main \
    && bazel build -c opt :decoder_main

quality issues

Hardware:
modest gaming laptop speaker running Win10 & Firefox

Example: demo blogpost

Issues:
Too much sibilance (sharp s & ch & some other transients)
Slow 'attack' at the start of some words (e.g. video of girl: the M in "made", W in "What", & TH in "Think" have their initial loudness cut off).
Overall EQ seems too bright

Pros:
Speech is easier to understand than with Opus. And thanks for improving speech compression; many times I have had to manually tweak clients' talk audio files to compress better.

I could do more deeper audio analysis if you need; I used to do audio DSP programming....

About the wavegru architecture

Hi, I am a little confused about the WaveGRU built in the code, and I can't work out the meaning of the rows and cols of the ar_to_gate layer and the GRU layer. It seems that ar_to_gate and the conditioning stack both have outputs with dim = 3*1024, but the in_channel of the GRU layer is 1024... Is the model architecture different from the paper "Generative Speech Coding with Predictive Variance Regularization"?

Request: Investigate utilisation of versioned toolchains for improved hermeticity

In Bazel it's possible to force a specific compiler version with a custom toolchain that includes the download as part of the build process. For example, https://github.com/grailbio/bazel-toolchain.

I wonder if it's feasible to utilise something like this for this project, instead of requiring the host to have a specific clang version installed. Benefits could include more user-friendly installation, seamless updating, and a tighter coupling between each commit and the required compiler version (the object files might even be built with the same toolchain to guarantee a match).

Not sure on the downsides, so that's why I'm making this issue a request to investigate.

MOS value is very low in WebRTC p2p call

I have integrated Lyra into WebRTC and tested it on two Pixel 3 phones (Android 11). The MOS (measured by a Malden red box) of a p2p call is only about 2.2; it can't reach the VoIP call standard. Is there a plan to optimize its quality?

Clang version to build llvm

When I use Clang 3.9.1 to build LLVM 12.0, it fails:
llvm-project/llvm/include/llvm/ADT/DenseMap.h:550:37: error: no matching constructor for initialization of 'llvm::ValueEnumerator::MDRange'
llvm-project/llvm/include/llvm/ADT/DenseMap.h:201:12: error: no matching constructor for initialization of 'llvm::ValueEnumerator::MDRange'
llvm-project/llvm/lib/Bitcode/Writer/ValueEnumerator.cpp:811:11: error: no matching constructor for initialization of 'llvm::ValueEnumerator::MDRange'
Which version of Clang should I use to build LLVM?

Lyra bitrate is way too high for a vocoder. How to reduce the bitrate?

Hi, I'm looking at the code and trying to guess where to make changes to reduce the bitrate / change quantizers. There are 25 frames/s of 15 bytes, correct? Is there a way to change this without having to re-train the network?

3 kbit/s is way too high a bitrate for a vocoder. State-of-the-art uses 1.6 kbit/s or less, for example with LPCNet, or much less with AMBE, TWELP, codec2, or even the 20-year-old MELPe. For use in HF radios, for example, 3 kbit/s is totally a no-go, way too high.

Is it possible to get in the range of 1.5 kbit/s with Lyra? Even with a degraded quality, having a 1 kbit/s option is important, otherwise all the "standard narrow-band" HF radio use cases are definitely lost.

use of undeclared identifier 'prefix_len'

Hi, I used

"bazel build android_example:lyra_android_example --config=android_arm64 --copt=-DBENCHMARK" 

to build an APK for Android, but it failed with the following error:

external/com_google_glog/src/logging.cc:798:35: error: use of undeclared identifier 'prefix_len'
                        message + prefix_len);
                                  ^
1 error generated.

My Bazel version is 4.0.0, and my system is macOS 11.3.1.

Could you help me? Thank you.

decoder was much slower than encoder

Here is the test log:

scguo@scguo-vm:~/Documents/audios$ ../lyra/bazel-bin/encoder_main --model_path=../lyra/wavegru --output_dir=./ --input_path=./12s.wav
WARNING: Logging before InitGoogleLogging() is written to STDERR
I20210422 15:47:21.758337 22733 encoder_main_lib.cc:94] Elapsed seconds : 0
I20210422 15:47:21.758489 22733 encoder_main_lib.cc:95] Samples per second : 9.08575e+06
scguo@scguo-vm:~/Documents/audios$ ../lyra/bazel-bin/decoder_main --model_path=../lyra/wavegru --output_dir=./ --encoded_path=./12s.lyra
WARNING: Logging before InitGoogleLogging() is written to STDERR
W20210422 15:47:34.122872 22742 lyra_wavegru.h:80] lyra_wavegru running in slow generic mode.
I20210422 15:47:34.124763 22742 layer_wrapper.h:96] |lyra_16khz_ar_to_gates_| layer:  Shape: [3072, 4]. Sparsity: 0
I20210422 15:47:34.228018 22742 layer_wrapper.h:96] |lyra_16khz_gru_layer_| layer:  Shape: [3072, 1024]. Sparsity: 0.9375
I20210422 15:47:34.241358 22742 lyra_wavegru.h:226] Model size: 1271266 bytes
I20210422 15:47:34.241510 22742 wavegru_model_impl.cc:87] Feature size: 160
I20210422 15:47:34.241612 22742 wavegru_model_impl.cc:88] Number of samples per hop: 640
I20210422 15:47:34.246847 22742 layer_wrapper.h:96] |lyra_16khz_conv1d_| layer:  Shape: [512, 480]. Sparsity: 0.919987
I20210422 15:47:34.257881 22742 layer_wrapper.h:96] |lyra_16khz_conditioning_stack_0_| layer:  Shape: [512, 1024]. Sparsity: 0.920013
I20210422 15:47:34.268566 22742 layer_wrapper.h:96] |lyra_16khz_conditioning_stack_1_| layer:  Shape: [512, 1024]. Sparsity: 0.920013
I20210422 15:47:34.279476 22742 layer_wrapper.h:96] |lyra_16khz_conditioning_stack_2_| layer:  Shape: [512, 1024]. Sparsity: 0.920013
I20210422 15:47:34.289640 22742 layer_wrapper.h:96] |lyra_16khz_transpose_0_| layer:  Shape: [1024, 512]. Sparsity: 0.920013
I20210422 15:47:34.300124 22742 layer_wrapper.h:96] |lyra_16khz_transpose_1_| layer:  Shape: [1024, 512]. Sparsity: 0.920013
I20210422 15:47:34.310822 22742 layer_wrapper.h:96] |lyra_16khz_transpose_2_| layer:  Shape: [1024, 512]. Sparsity: 0.920013
I20210422 15:47:34.320670 22742 layer_wrapper.h:96] |lyra_16khz_conv_cond_| layer:  Shape: [1024, 512]. Sparsity: 0.920013
I20210422 15:47:34.389773 22742 layer_wrapper.h:96] |lyra_16khz_conv_to_gates_| layer:  Shape: [3072, 1024]. Sparsity: 0.919998
WARNING: Logging before InitGoogleLogging() is written to STDERR
W20210422 15:47:34.394865 22742 kernels_generic.h:241] SumVectors: using generic kernel!
I20210422 15:47:40.876049 22742 decoder_main_lib.cc:96] Elapsed seconds : 6
I20210422 15:47:40.876096 22742 decoder_main_lib.cc:97] Samples per second : 30309

The encoder takes only 0 seconds, but the decoder takes 6 seconds.
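For scale, the throughput figures in the log can be converted to real-time factors at Lyra's 16 kHz internal sample rate (the numbers below are taken from the log above):

```python
# Convert the "Samples per second" log lines into real-time factors.
sample_rate_hz = 16000      # Lyra's internal sample rate
encoder_sps = 9.08575e6     # from the encoder_main log above
decoder_sps = 30309         # from the decoder_main log above

print(f"encoder: {encoder_sps / sample_rate_hz:.0f}x real time")
print(f"decoder: {decoder_sps / sample_rate_hz:.1f}x real time")
# The decoder is still faster than real time here, but note the
# "running in slow generic mode" warning: optimized kernels were not used.
```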

Is the model and code of open source the same as the one used in the Lyra Google AI blog?

I built Lyra from Linux (Ubuntu 20.04) using Bazel for a Linux target. Then I ran the samples published in the Lyra blog through the encoder_main and decoder_main binaries compiled from the open-sourced Lyra code (both v0.0.1 and the latest commit). However, the quality of the samples decoded with the open-source code is inferior to the quality of the samples showcased in the Lyra blog.
Are the open-source model and code the same as those used in the Lyra Google AI blog? Why is there such a big difference in sound quality?
Thank you.

android install problem

I am trying to build the Android example.

So I connected my laptop to the device and ran:

bazel build android_example:lyra_android_example --config=android_arm64 --copt=-DBENCHMARK --verbose_failures --sandbox_debug

But this gives me the following error, and I don't know what's wrong:

external/androidndk/ndk/toolchains/aarch64-linux-android-4.9/prebuilt/darwin-x86_64/lib/gcc/aarch64-linux-android/4.9.x/../../../../aarch64-linux-android/bin/ld: cannot find Foundation: No such file or directory
clang: error: linker command failed with exit code 1 (use -v to see invocation)
Target //android_example:lyra_android_example failed to build
INFO: Elapsed time: 217.372s, Critical Path: 48.72s
INFO: 695 processes: 16 internal, 637 darwin-sandbox, 42 worker.
FAILED: Build did NOT complete successfully

Android no matching toolchains found for types @bazel_tools

It is OK when I run
bazel build -c opt :encoder_main
or
bazel build -c opt :decoder_main

However, when it comes to building the Android app, the error below occurs.

I'm sure that I have added the paths of the SDK and NDK to ~/.bashrc and reloaded it, and I can't understand what Bazel is complaining about.

/////////////////////////////////////////////
ERROR: While resolving toolchains for target //android_example:lyra_android_example: no matching toolchains found for types @bazel_tools//tools/android:sdk_toolchain_type
ERROR: Analysis of target '//android_example:lyra_android_example' failed; build aborted: no matching toolchains found for types @bazel_tools//tools/android:sdk_toolchain_type
INFO: Elapsed time: 1.526s
INFO: 0 processes.
FAILED: Build did NOT complete successfully (14 packages loaded, 24 targets configured)
//////////////////////////////////////////////
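One common cause of this `sdk_toolchain_type` error is that the SDK/NDK locations are only exported as shell variables and never declared to Bazel in the WORKSPACE. A minimal sketch of such a declaration follows; the paths and version numbers are hypothetical placeholders, not taken from this repo:

```python
# WORKSPACE (sketch) - declare Android repositories so Bazel can resolve
# @bazel_tools//tools/android:sdk_toolchain_type. If 'path' is omitted,
# Bazel falls back to $ANDROID_HOME / $ANDROID_NDK_HOME.
android_sdk_repository(
    name = "androidsdk",
    path = "/home/user/Android/Sdk",  # hypothetical path
    api_level = 29,
)

android_ndk_repository(
    name = "androidndk",
    path = "/home/user/Android/Sdk/ndk/21.4.7075529",  # hypothetical path
    api_level = 21,
)
```

Whether this resolves the error depends on the Bazel version in use; these native rules are the pre-Bazel-7 mechanism for registering Android toolchains.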

Fails to build on Ubuntu 16.04

Hi,

The latest code fails to build on Ubuntu 16.04 with Bazel 4.1.0. The following is the output of

bazel build -c opt :encoder_main

Starting local Bazel server and connecting to it...
Loading:
Loading: 0 packages loaded
Loading: 0 packages loaded
Loading: 0 packages loaded
currently loading:
Analyzing: target //:encoder_main (1 packages loaded, 0 targets configured)
Analyzing: target //:encoder_main (10 packages loaded, 6 targets configured)
Analyzing: target //:encoder_main (11 packages loaded, 6 targets configured)
Analyzing: target //:encoder_main (30 packages loaded, 123 targets configured)
Analyzing: target //:encoder_main (50 packages loaded, 329 targets configured)
WARNING: /home/kxie/.cache/bazel/_bazel_kxie/77ff6c6c1440df355d63577cfba068c9/external/com_google_audio_dsp/third_party/fft2d/BUILD:3:11: in linkstatic attribute of cc_library rule @com_google_audio_dsp//third_party/fft2d:fft2d: setting 'linkstatic=1' is recommended if there are no object files
Analyzing: target //:encoder_main (58 packages loaded, 1467 targets configured)
INFO: Analyzed target //:encoder_main (60 packages loaded, 1648 targets configured).
INFO: Found 1 target...
[2 / 5] [Prepa] BazelWorkspaceStatusAction stable-status.txt
[215 / 448] Compiling src/google/protobuf/compiler/importer.cc; 1s linux-sandbox ... (4 actions running)
[217 / 448] Compiling src/google/protobuf/message.cc; 4s linux-sandbox ... (4 actions, 3 running)
ERROR: /home/kxie/lyra/lyra/BUILD:458:11: Compiling lyra_encoder.cc failed: (Exit 1): gcc failed: error executing command /usr/bin/gcc -U_FORTIFY_SOURCE -fstack-protector -Wall -Wunused-but-set-parameter -Wno-free-nonheap-object -fno-omit-frame-pointer -g0 -O2 '-D_FORTIFY_SOURCE=1' -DNDEBUG -ffunction-sections ... (remaining 68 argument(s) skipped)

Use --sandbox_debug to see verbose messages from the sandbox
In file included from ./sparse_matmul/compute/gru_gates.h:28:0,
from sparse_matmul/sparse_matmul.h:21,
from dsp_util.h:24,
from lyra_encoder.cc:31:
./sparse_matmul/compute/matmul.h: In constructor 'csrblocksparse::MatmulBase::MatmulBase()':
./sparse_matmul/compute/matmul.h:49:55: error: '__get_cpuid_count' was not declared in this scope
__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx);
^
In file included from lyra_encoder.cc:36:0:
lyra_config.h: At global scope:
lyra_config.h:55:44: error: 'chromemedia::codec::kSupportedSampleRates' declared as an 'inline' variable
inline constexpr int kSupportedSampleRates[] = {8000, 16000, 32000, 48000};
^
lyra_config.h:56:22: error: 'chromemedia::codec::kInternalSampleRateHz' declared as an 'inline' variable
inline constexpr int kInternalSampleRateHz = 16000;
^
lyra_config.h:57:22: error: 'chromemedia::codec::kNumQuantizationBits' declared as an 'inline' variable
inline constexpr int kNumQuantizationBits = 120;
^
Target //:encoder_main failed to build
Use --verbose_failures to see the command lines of failed build steps.
INFO: Elapsed time: 25.939s, Critical Path: 8.56s
INFO: 7 processes: 5 internal, 2 linux-sandbox.
FAILED: Build did NOT complete successfully
FAILED: Build did NOT complete successfully

The following is the output of the lscpu command:

lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 4
On-line CPU(s) list: 0-3
Thread(s) per core: 2
Core(s) per socket: 2
Socket(s): 1
NUMA node(s): 1
Vendor ID: AuthenticAMD
CPU family: 21
Model: 96
Model name: AMD A10-8700P Radeon R6, 10 Compute Cores 4C+6G
Stepping: 1
CPU MHz: 1296.898
CPU max MHz: 1800.0000
CPU min MHz: 1300.0000
BogoMIPS: 3593.16
Virtualization: AMD-V
L1d cache: 32K
L1i cache: 96K
L2 cache: 1024K
NUMA node0 CPU(s): 0-3
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good acc_power nopl nonstop_tsc cpuid extd_apicid aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs xop skinit wdt lwp fma4 tce nodeid_msr tbm topoext perfctr_core perfctr_nb bpext ptsc mwaitx cpb hw_pstate ssbd vmmcall fsgsbase bmi1 avx2 smep bmi2 xsaveopt arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif overflow_recov

Encoder and decoder application

Hi,

I would like to evaluate the performance of the Lyra codec, but I'm not able to build it on a Red Hat Linux machine (because I can't find the Clang 3.5 RPM). Is it possible to provide the encoder and decoder binaries?

fail to build lyra_android_example

I used the command "bazel build android_example:lyra_android_example --config=android_arm64 --copt=-DBENCHMARK" following the README.md and got the error below:

WARNING: /private/var/tmp/_bazel_jiahong/91206591588ab49765e9be8ccee0dd3b/external/com_google_audio_dsp/third_party/fft2d/BUILD:3:11: in linkstatic attribute of cc_library rule @com_google_audio_dsp//third_party/fft2d:fft2d: setting 'linkstatic=1' is recommended if there are no object files
INFO: Analyzed target //android_example:lyra_android_example (0 packages loaded, 0 targets configured).
INFO: Found 1 target...
ERROR: /private/var/tmp/_bazel_jiahong/91206591588ab49765e9be8ccee0dd3b/external/androidsdk/BUILD.bazel:13:25: Middleman _middlemen/external_Sandroidsdk_Saapt2_Ubinary-runfiles failed: missing input file 'external/androidsdk/build-tools/29.0.3/aapt2', owner: '@androidsdk//:build-tools/29.0.3/aapt2'
Target //android_example:lyra_android_example failed to build
Use --verbose_failures to see the command lines of failed build steps.
ERROR: /private/var/tmp/_bazel_jiahong/91206591588ab49765e9be8ccee0dd3b/external/androidsdk/BUILD.bazel:13:25 Middleman _middlemen/external_Sandroidsdk_Saapt2_Ubinary-runfiles failed: 1 input file(s) do not exist
INFO: Elapsed time: 5.936s, Critical Path: 5.68s
INFO: 14 processes: 6 internal, 8 darwin-sandbox.
FAILED: Build did NOT complete successfully


System: Mac
I installed Android Studio, and installed Android SDK 29 and NDK 21 per the requirements. How do I solve this? Thanks!

Using in Python/Tensorflow

Hi,

I'd love to use Lyra to improve the quality of Zoom calls I have recorded as mp3/mp4 files. How do I use this in Python / TensorFlow to process audio files? Any help would be appreciated!
