
sonic's Introduction

Sonic is a simple algorithm for speeding up or slowing down speech.  Unlike
previous algorithms for changing speech rate, it is optimized for speed-ups of
over 2X.  The Sonic library is a very simple ANSI C library that is designed
to be easily integrated into streaming voice applications, like TTS back ends.

The primary motivation behind Sonic is to enable the blind and visually impaired
to improve their productivity with open source speech engines, like espeak.
Sonic can also be used by the sighted.  For example, Sonic can improve the
experience of listening to an audio book on an Android phone.

A native Java port of Sonic is in Sonic.java.  Main.java is a simple example of
how to use Sonic.java.  To play with it, you'll need a "talking.wav" file in the
current directory, and you'll want to change the speed, pitch or other
parameters manually in Main.java, in the main method.

Sonic is Copyright 2010, 2011, Bill Cox, all rights reserved.  It is released
under the Apache 2.0 license, to promote usage as widely as possible.

Performance test:

I sped up a 751958176-byte WAV file with sonic (a 9 hour, 28 minute mono audio
file encoded at 16-bit, 11 kHz), with output writing disabled.  The
reported time, running Ubuntu 11.04 on my HP Pavilion dm4 laptop, was:

real    0m50.839s
user    0m47.370s
sys     0m0.620s

The Java version is not much slower.  It reported:

real    0m52.043s
user    0m51.190s
sys     0m0.310s

Update, May 7, 2017
-------------------
I upgraded the pitch change algorithm to use a 12-point sinc FIR filter for
interpolation, rather than linearly interpolating between points.  This
significantly reduces noise introduced by the pitch change algorithm.  It is
most noticeable in low-sample-rate streams, such as the 11,025 Hz output of the
Eloquence TTS engine.  The upgrade is in both the C and Java versions.
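
For context, the linear interpolation that this update describes replacing can
be sketched as follows.  This is a generic illustration, not code taken from
sonic.c; the name lerpSample and its parameters are hypothetical.

```c
#include <assert.h>

/* Linearly interpolate between two 16-bit samples.
 * position/width is the fractional distance from left to right,
 * with width > 0 and 0 <= position <= width. */
short lerpSample(short left, short right, int position, int width) {
    long value;
    assert(width > 0 && position >= 0 && position <= width);
    /* Widen to long so the products cannot overflow a narrow int. */
    value = ((long)left * (width - position) + (long)right * position) / width;
    return (short)value;
}
```

A sinc FIR filter instead forms each output sample from a weighted sum of
several neighbouring input samples, which is why it needs more input around
each output position than this two-point version does.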


Author: Bill Cox
email: [email protected]

sonic's People

Contributors

abylouw, danielbair, drodsou, elelel, ftyghome, jcoffland, jwilk, malcolmslaney, nshmyrev, paulwoitaschek, sthibaul, waywardgeek

sonic's Issues

ArrayIndexOutOfBoundsException when changing pitch while playing

I get it randomly while stress-testing (every few hundred ms I change pitch between 1.0 and 2.0 while playing) ->

java.lang.ArrayIndexOutOfBoundsException: length=6916; index=6916
    at com.github.waywardgeek.sonic.Sonic.interpolate(Sonic.java:828)
    at com.github.waywardgeek.sonic.Sonic.adjustRate(Sonic.java:867)
    at com.github.waywardgeek.sonic.Sonic.processStreamInput(Sonic.java:978)
    at com.github.waywardgeek.sonic.Sonic.writeBytesToStream(Sonic.java:1018)

Exception is thrown here:

value = in[inPos + i*numChannels]*weight;

I see that this is possible because the loop bounds ignore the sinc filter points here ->

for(position = 0; position < numPitchSamples - 1; position++) {

If I change:

for (position = 0; position < numPitchSamples - 1; position++) {

to

for (position = 0; position < numPitchSamples - 1 - SINC_FILTER_POINTS; position++) {

it works, but IDK how valid it is as I'm not an expert in audio maths.
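
To illustrate the bounds question in general terms, here is a minimal sketch of how many start positions are safe when each output reads a fixed number of consecutive input samples. It is not the Sonic code: NUM_TAPS, numSafePositions, and applyTaps are hypothetical names, it handles a single channel only, and the weights are caller-supplied placeholders.

```c
#include <assert.h>

#define NUM_TAPS 12 /* assumed filter length, for illustration only */

/* Number of start positions pos for which in[pos] .. in[pos + NUM_TAPS - 1]
 * all lie inside a buffer holding numSamples samples. */
int numSafePositions(int numSamples) {
    return numSamples >= NUM_TAPS ? numSamples - NUM_TAPS + 1 : 0;
}

/* Weighted sum of NUM_TAPS consecutive samples starting at pos. */
long applyTaps(const short *in, int numSamples, int pos, const int *weights) {
    long sum = 0;
    int i;
    /* Reject any start position whose window would run past the buffer. */
    assert(pos >= 0 && pos < numSafePositions(numSamples));
    for (i = 0; i < NUM_TAPS; i++) {
        sum += (long)in[pos + i] * weights[i];
    }
    return sum;
}
```

With a 6916-sample buffer, as in the trace above, this gives 6905 safe start positions. Whether shortening the loop this way is the right fix for Sonic depends on how the remaining samples are carried over to the next call, which this sketch does not model.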

Potential Integer Overflow at function `insertPitchPeriod`

Dear authors,
There exists a potential integer overflow at the function insertPitchPeriod at

sonic/sonic.c

Line 1056 in 8694c59

if (!enlargeOutputBufferIfNeeded(stream, period + newSamples)) {

caused by period + newSamples which can lead to an allocation error at sonic.c:465:37 enlargeOutputBufferIfNeeded.

sonic/sonic.c

Lines 460 to 469 in 8694c59

static int enlargeOutputBufferIfNeeded(sonicStream stream, int numSamples) {
  int outputBufferSize = stream->outputBufferSize;
  if (stream->numOutputSamples + numSamples > outputBufferSize) {
    stream->outputBufferSize += (outputBufferSize >> 1) + numSamples;
    stream->outputBuffer = (short*)sonicRealloc(
        stream->outputBuffer,
        outputBufferSize,
        stream->outputBufferSize,
        sizeof(short) * stream->numChannels);

When the sum overflows, the argument numSamples becomes a negative value.
The allocation function potentially fails because the if guard at sonic.c:463 fails to filter the value of outputBufferSize.

A possible fix suggestion would be adding an additional safety function and using it before calling the function.
For example,

size_t sonicSafeAdd(size_t a, size_t b) {
    size_t sum = a + b;
    if (sum >= SIZE_MAX || sum < a) {
        /// handle exit
    }
    return sum;
}

Could be used as

- enlargeOutputBufferIfNeeded(stream, newSamples + period);
+ enlargeOutputBufferIfNeeded(stream, sonicSafeAdd(newSamples, period));

Thank you
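
Since period and newSamples are passed as int in the quoted code, a signed variant of the same idea may be closer to what is needed. The following is only a sketch under that assumption; checkedAddInt is a hypothetical helper, not part of sonic.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Add two ints without relying on signed overflow, which is undefined in C.
 * Returns 1 and stores the sum in *out on success, 0 if a + b would overflow. */
int checkedAddInt(int a, int b, int *out) {
    assert(out != NULL);
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b)) {
        return 0;
    }
    *out = a + b;
    return 1;
}
```

A caller could treat a 0 return the same way it treats a failed allocation, so the error path stays in one place.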

Please improve the makefile

Several items that can be improved:

  • Don't hardcode gcc: you should use $(CC)
  • Every linking line needs $(LDFLAGS)
  • The executable could link against the shared library, not static library
  • No need to build/install the static library

Pkgsrc port, make the definition of PREFIX, CFLAGS, &c conditional

Hi, I'm making a pkgsrc port of sonic. Could you take a look at this small patch? It just preserves PREFIX. I left out the
implicit variables that can be overridden by Make (I edited the issue).

--- Makefile.orig	2024-01-30 07:54:18.202807498 +0000
+++ Makefile
@@ -11,11 +11,11 @@
 # speech recognition.
 #USE_SPECTROGRAM=1
 
-PREFIX=/usr
+PREFIX?=/usr
 
 UNAME := $(shell uname)
 ifeq ($(UNAME), Darwin)
-  PREFIX=/usr/local
+  PREFIX?=/usr/local
 endif
 
 BINDIR=$(PREFIX)/bin

sonic-patch-Makefile.patch

Build fails with -DSONIC_SPECTROGRAM: undefined symbol: sonicConvertSpectrogramToBitmap

===>  Building for libsonic-0.2.0.65
gmake[1]: Entering directory '/disk-samsung/freebsd-ports/audio/libsonic/work/sonic-release-0.2.0-65-gba33141'
cc -O2 -pipe -fno-omit-frame-pointer  -ansi -fPIC -pthread -Wno-unused-function -DSONIC_SPECTROGRAM -fstack-protector-strong -isystem /usr/local/include -fno-strict-aliasing   -fstack-protector-strong -L/usr/local/lib  -o sonic wave.o main.o libsonic.a -lm 
cc: warning: argument unused during compilation: '-ansi' [-Wunused-command-line-argument]
ld: error: undefined symbol: sonicConvertSpectrogramToBitmap
>>> referenced by main.c
>>>               main.o:(main)

ld: error: undefined symbol: sonicWritePGM
>>> referenced by main.c
>>>               main.o:(main)

ld: error: undefined symbol: sonicDestroyBitmap
>>> referenced by main.c
>>>               main.o:(main)

ld: error: undefined symbol: sonicCreateSpectrogram
>>> referenced by sonic.c
>>>               sonic.o:(sonicComputeSpectrogram) in archive libsonic.a

ld: error: undefined symbol: sonicDestroySpectrogram
>>> referenced by sonic.c
>>>               sonic.o:(sonicDestroyStream) in archive libsonic.a

ld: error: undefined symbol: sonicAddPitchPeriodToSpectrogram
>>> referenced by sonic.c
>>>               sonic.o:(processStreamInput) in archive libsonic.a
cc: error: linker command failed with exit code 1 (use -v to see invocation)

Numerical truncation bug in function `addFloatSamplesToInputBuffer`

Hi, I found what could be a numerical truncation bug in

sonic/sonic.c

Line 519 in 8694c59

*buffer++ = (*samples++) * 32767.0f;

*samples could be out of range [-1.0, 1.0].

It should be guaranteed that the value of (*samples++) * 32767.0f is in the range [-32767, 32767], otherwise noise will be introduced. Just like here:

sonic/sonic.c

Lines 271 to 277 in 8694c59

value = (*samples * fixedPointVolume) >> 8;
if (value > 32767) {
  value = 32767;
} else if (value < -32767) {
  value = -32767;
}
*samples++ = value;

Thanks.
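
Applied to the float path, the same clamping might look like the sketch below. The helper name floatToSample is hypothetical, and mapping NaN to silence is an assumption of this sketch rather than anything sonic does.

```c
/* Convert a float sample, nominally in [-1.0, 1.0], to 16-bit PCM,
 * saturating out-of-range input instead of letting the cast overflow. */
short floatToSample(float x) {
    float scaled = x * 32767.0f;
    if (scaled != scaled) {
        return 0; /* NaN compares unequal to itself; treat it as silence */
    }
    if (scaled > 32767.0f) {
        return 32767;
    }
    if (scaled < -32767.0f) {
        return -32767;
    }
    return (short)scaled; /* in range, so the cast is well defined */
}
```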

linear pitch scaling not working on m1 mac without errors

I'll preface this by saying I am in no way familiar with C compilation or how to interpret errors or warnings. I completed what seemed like a successful compilation by cloning the repo and running "make", then I added the sonic executable to my path. Everything seemed to go as planned, except that the outputs of the commands "sonic -c -p 0.5 test.wav testsonic.wav" and "sonic -p 0.5 test.wav testsonic.wav" are identical. In other words, it doesn't scale pitch linearly even when asked to, although when the -c flag is used the "Scaling pitch linearly." message appears. I then tried running "make install" to see if it was a problem with linking some kind of internal library (i.e. itself with sonic.o), but the output is the same. Here is the output of the command "gcc --version" (again, I don't know anything about this, so it was the first thing that came to mind):"""
Apple clang version 15.0.0 (clang-1500.1.0.2.5)
Target: arm64-apple-darwin23.3.0
Thread model: posix
InstalledDir: /Library/Developer/CommandLineTools/usr/bin
"""
It probably came installed when I ran "xcode-select install" and I keep the dev tools regularly updated from system settings, so I think you might be able to reproduce my environment pretty easily.
I tested it on another system and compared the outputs, they are the same as if I never specified the -c flag even though the program is obviously aware that's the case.
If it can help I could try cleaning everything up and recompiling it on a whole other machine rented from scaleway to post a clean output here.
Thanks!

Output seems to be slightly too fast

I am using sonic to speed up the audio of videos, and have been noticing that I am getting a slight audio/video desync. At 2.5x this seems to be around half a second for every half an hour of video (original length).

Example

$ youtube-dl -f bestaudio "https://www.youtube.com/watch?v=8sBYyrJDC0o" -o aud.webm
$ ffmpeg -loglevel panic -i aud.webm -f wav - | sox -t wav - -t wav slow.wav
$ sonic -s 2.5 slow.wav fast.wav

$ ffmpeg -i slow.wav 2>&1 | grep Duration | awk '{print $2}' | tr -d ,
00:31:00.10

$ ffmpeg -i fast.wav 2>&1 | grep Duration | awk '{print $2}' | tr -d ,
00:12:23.40

The first time is the length of the original audio, the second is the length of the audio sped up to 2.5x with sonic.

ffmpeg prints the fractional part of Duration in hundredths of a second, so:

s = 31*60 +  0 + 0.10 = 1860.10
f = 12*60 + 23 + 0.40 = 0743.40
s/2.5                 = 0744.04
s/f                   = 2.50215

In this case the sped-up audio was about 640 ms shorter than I was expecting (743.40 seconds rather than 744.04), and the speed-up factor was about 2.502 rather than the specified 2.5.
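
As a cross-check of this kind of arithmetic, here is a small sketch that converts the two Duration strings to seconds and divides them. It assumes the HH:MM:SS.ff layout that ffmpeg prints, with the part after the dot read as a decimal fraction of a second; durationToSeconds and measuredSpeed are hypothetical helper names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Parse "HH:MM:SS.ff" into seconds. Returns -1.0 on malformed input. */
double durationToSeconds(const char *text) {
    int hours, minutes;
    double seconds;
    assert(text != NULL);
    if (sscanf(text, "%d:%d:%lf", &hours, &minutes, &seconds) != 3) {
        return -1.0;
    }
    return hours * 3600.0 + minutes * 60.0 + seconds;
}

/* Ratio of original to sped-up duration, i.e. the speed-up actually achieved.
 * Both arguments are expected to be well-formed duration strings. */
double measuredSpeed(const char *original, const char *spedUp) {
    return durationToSeconds(original) / durationToSeconds(spedUp);
}
```

Calling measuredSpeed("00:31:00.10", "00:12:23.40") gives the achieved factor to compare against the requested 2.5.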

Weird bug: Assertion failed: stream->newRatePosition != newSampleRate (For multiple pitch changes first and only for specific text at specific pitch)

Seems I'm not the only one with this problem:
#39

I will do Pitch 1 and the results are as expected:
dopus_Wpyw12wV8Q

I will do Pitch -1 this time and the console will show me: Assertion failed: stream->newRatePosition != newSampleRate
dopus_WHhNF3w9U1

I will do Pitch -1 again and change only one letter of a word and its fine again:
dopus_HEwG1opAhl

I generated the voice file over 10 times each, every time the same output.
To reproduce this failure I have to make several pitch changes first, and it only happens for certain texts at certain pitches, which is weird.

Build failure on Mac

Undefined symbols for architecture x86_64:
"_fftw_destroy_plan", referenced from:
_sonicAddPitchPeriodToSpectrogram in spectrogram.o
"_fftw_execute", referenced from:
_sonicAddPitchPeriodToSpectrogram in spectrogram.o
"_fftw_plan_dft_r2c_1d", referenced from:
_sonicAddPitchPeriodToSpectrogram in spectrogram.o
ld: symbol(s) not found for architecture x86_64

I had to run make with USE_SPECTROGRAM=0 to get it to build on Mac, but I don't know what functionality this will impact.

slowing down introduces silence at end

There seems to be a bug where a short sample, when slowed down substantially, has a significant amount of silence (0s) added at the end of the output that is not present in the input sample.
First is original "eh" sound, unmodified; you'll note it has no silence in the file.
image
Second is after running sonic C binary:

./sonic -s 0.060000002 eh.wav eh_slow.wav

You'll note that a large amount of silence (0 values from calloc I assume) has been introduced.
image
This does not seem right to me; I would not expect any silence at all to be introduced in the output sample.

A secondary point is that I don't think the final output buffer that sonic creates has the correct duration. For example, the sample is 0.053 seconds and the slowdown factor is .060000002, which I'd expect to generate a sample of duration 0.053 / 0.060000002 = 0.883333304, whereas the output sample is 0.732 seconds (including the zeros; without the zeros it's even smaller).

I've attached the files for perusal/repro if you like. They have to be gzipped because github won't let me attach .wav files for some reason.
eh.wav.gz
eh_slow.wav.gz
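
For anyone reproducing this, a sketch like the following could quantify both observations: how many zero-valued samples sit at the end of the output, and what duration a pure speed change would predict. The helper names are hypothetical, and the code assumes mono 16-bit samples already loaded into memory.

```c
#include <assert.h>

/* Count zero-valued samples at the end of a mono 16-bit buffer. */
int countTrailingZeros(const short *samples, int numSamples) {
    int count = 0;
    assert(numSamples >= 0);
    while (count < numSamples && samples[numSamples - 1 - count] == 0) {
        count++;
    }
    return count;
}

/* Duration expected after a pure speed change: input duration / speed. */
double expectedDuration(double inputSeconds, double speed) {
    assert(speed > 0.0);
    return inputSeconds / speed;
}
```

Dividing the trailing-zero count by the sample rate gives the padded silence in seconds, which can then be compared with the gap between the expected and actual durations.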

3 bugs found using afl

1 SEGV bug

=================================================================
==20417==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000028 (pc 0x00000040853a bp 0x000000000000 sp 0x7fff7d053cc8 T0)
==20417==The signal is caused by a WRITE memory access.
==20417==Hint: address points to the zero page.
#0 0x408539 in sonicSetSpeed /root/Sec/Fuzzing/projects/sonic_asan/sonic.c:285
#1 0x405e58 in runSonic /root/Sec/Fuzzing/projects/sonic_asan/main.c:43
#2 0x4015f2 in main /root/Sec/Fuzzing/projects/sonic_asan/main.c:184
#3 0x7f538a7f5554 in __libc_start_main (/lib64/libc.so.6+0x22554)
#4 0x401a0b (/root/Sec/Fuzzing/projects/sonic_asan/sonic+0x401a0b)

AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV /root/Sec/Fuzzing/projects/sonic_asan/sonic.c:285 in sonicSetSpeed
==20417==ABORTING

2 FPE bugs

=================================================================
==12489==ERROR: AddressSanitizer: FPE on unknown address 0x000000405ee5 (pc 0x000000405ee5 bp 0x60d000000040 sp 0x7ffe49cf7dc0 T0)
#0 0x405ee4 in runSonic /root/Sec/Fuzzing/projects/sonic_asan/main.c:55
#1 0x4015f2 in main /root/Sec/Fuzzing/projects/sonic_asan/main.c:184
#2 0x7f64375d7554 in __libc_start_main (/lib64/libc.so.6+0x22554)
#3 0x401a0b (/root/Sec/Fuzzing/projects/sonic_asan/sonic+0x401a0b)

AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: FPE /root/Sec/Fuzzing/projects/sonic_asan/main.c:55 in runSonic
==12489==ABORTING

=================================================================
==12995==ERROR: AddressSanitizer: FPE on unknown address 0x000000407309 (pc 0x000000407309 bp 0x000000000000 sp 0x7ffcc9c9bd70 T0)
#0 0x407308 in findPitchPeriodInRange /root/Sec/Fuzzing/projects/sonic_asan/sonic.c:778
#1 0x407308 in findPitchPeriod /root/Sec/Fuzzing/projects/sonic_asan/sonic.c:822
#2 0x407308 in changeSpeed /root/Sec/Fuzzing/projects/sonic_asan/sonic.c:1109
#3 0x407308 in processStreamInput /root/Sec/Fuzzing/projects/sonic_asan/sonic.c:1158
#4 0x405f56 in runSonic /root/Sec/Fuzzing/projects/sonic_asan/main.c:59
#5 0x4015f2 in main /root/Sec/Fuzzing/projects/sonic_asan/main.c:184
#6 0x7f280e242554 in __libc_start_main (/lib64/libc.so.6+0x22554)
#7 0x401a0b (/root/Sec/Fuzzing/projects/sonic_asan/sonic+0x401a0b)

AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: FPE /root/Sec/Fuzzing/projects/sonic_asan/sonic.c:778 in findPitchPeriodInRange
==12995==ABORTING
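
The traces do not state the root cause. One plausible reading, which is an assumption here, is that fuzzed WAV headers yield a zero sample rate or channel count that later reaches an integer division, and that stream creation fails and the resulting NULL stream is then used. A defensive check along those lines might look like this sketch; the function names and the upper limits are hypothetical.

```c
/* Returns 1 if header values are safe to use as divisors and for buffer
 * sizing, 0 otherwise. The upper bounds are arbitrary illustrative limits. */
int waveParamsAreSane(int sampleRate, int numChannels) {
    return sampleRate > 0 && sampleRate <= 1000000 &&
           numChannels > 0 && numChannels <= 64;
}

/* Frames that fit in a buffer of bufferSize samples, or 0 when
 * numChannels cannot be used as a divisor. */
int framesPerBuffer(int bufferSize, int numChannels) {
    if (numChannels <= 0) {
        return 0;
    }
    return bufferSize / numChannels;
}
```

Rejecting the file before a stream is created would also avoid calling setters on a stream that failed to allocate.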

There is noise when playing at low speed

I'm using sonic with ExoPlayer.
There is noise when playing at 0.5x.
It's a lot worse than what you hear on a MacBook.

Can you help me?

ExoPlayer

I replaced the old one with the latest sonic.java, and implemented the following methods

    public void getOutput(ShortBuffer buffer) {
        // Use remaining() rather than limit() so a non-zero buffer position is respected.
        int framesToRead = Math.min(buffer.remaining() / numChannels, numOutputSamples);
        int size = framesToRead * numChannels;
        if (bufferToRead == null || bufferToRead.length < size) {
            bufferToRead = new short[size];
        }
        readShortFromStream(bufferToRead, framesToRead);
        buffer.put(bufferToRead, 0, size);
    }

    public void queueInput(ShortBuffer buffer) {
        int bufferSize = buffer.remaining();
        int framesToWrite = bufferSize / numChannels;
        if (bufferToWrite == null || bufferToWrite.length < bufferSize) {
            bufferToWrite = new short[bufferSize];
        }
        // Read only what is available: get(dst, 0, limit()) throws
        // BufferUnderflowException whenever position() > 0.
        buffer.get(bufferToWrite, 0, bufferSize);
        writeShortToStream(bufferToWrite, framesToWrite);
    }

    public void queueEndOfStream() {
        int remainingFrameCount = numInputSamples;
        float s = speed / pitch;
        float r = rate * pitch;
        int expectedOutputFrames =
                numOutputSamples + (int) ((remainingFrameCount / s + numPitchSamples) / r + 0.5f);

        // Add enough silence to flush both input and pitch buffers.
        enlargeInputBufferIfNeeded(remainingFrameCount + 2 * maxRequired);
        for (int xSample = 0; xSample < 2 * maxRequired * numChannels; xSample++) {
            inputBuffer[remainingFrameCount * numChannels + xSample] = 0;
        }
        numInputSamples += 2 * maxRequired;
        processStreamInput();
        // Throw away any extra frames we generated due to the silence we added.
        if (numOutputSamples > expectedOutputFrames) {
            numOutputSamples = expectedOutputFrames;
        }
        // Empty input and pitch buffers.
        numInputSamples = 0;
        this.remainingInputToCopy = 0;
        numPitchSamples = 0;
    }

    public int getOutputSize() {
        return numOutputSamples * numChannels * 2;
    }

Mp3 files for testing

w.mp3.zip

Removing Integer allocations from the Sonic Java port

Hi. I'm using Sonic in an Android application and it works great, much thanks for the library and for the Java port. I made a couple of small tweaks to avoid allocating Integers on what is a hot code path for me, and I'm wondering if you're interested in a pull request with my changes? Thanks.

sonic for python project

Hi
I wanted to use your algorithm for an app of mine. (Windows)

There is something called "sonic pi", is it related to your project?
If not, do you have any suggestions on how I could implement your algorithm in Python?
(I'm searching for a solution without Java.)

I hope you can help me :D

How am I supposed to compile and use the library instead of the sonic program?

How am I supposed to compile and use the library instead of the sonic program?

I'm interested in computing a spectrogram.

I download, unzip, and then do:

make

After this I do:

gcc main.c -L. -lsonic

which fails with:


/usr/lib64/gcc/x86_64-suse-linux/11/../../../../x86_64-suse-linux/bin/ld: /tmp/ccIksmQF.o: in function `runSonic':
main.c:(.text+0x77): undefined reference to `openInputWaveFile'
/usr/lib64/gcc/x86_64-suse-linux/11/../../../../x86_64-suse-linux/bin/ld: main.c:(.text+0xe7): undefined reference to `openOutputWaveFile'
/usr/lib64/gcc/x86_64-suse-linux/11/../../../../x86_64-suse-linux/bin/ld: main.c:(.text+0xfe): undefined reference to `closeWaveFile'
/usr/lib64/gcc/x86_64-suse-linux/11/../../../../x86_64-suse-linux/bin/ld: main.c:(.text+0x1e7): undefined reference to `readFromWaveFile'
/usr/lib64/gcc/x86_64-suse-linux/11/../../../../x86_64-suse-linux/bin/ld: main.c:(.text+0x271): undefined reference to `writeToWaveFile'
/usr/lib64/gcc/x86_64-suse-linux/11/../../../../x86_64-suse-linux/bin/ld: main.c:(.text+0x299): undefined reference to `closeWaveFile'
/usr/lib64/gcc/x86_64-suse-linux/11/../../../../x86_64-suse-linux/bin/ld: main.c:(.text+0x2ae): undefined reference to `closeWaveFile'
collect2: error: ld returned 1 exit status

Error after modifying pitch several times

I use Sonic.java:

Sonic sonic = new Sonic(48000, 2);

public void updatePitch(float p) {
    sonic.setPitch(p);
}

When I call ‘updatePitch’ multiple times, the program stops. I noticed that the call to
‘writeBytesToStream(byte[] inBuffer, int numBytes)’ never even gets to the next step.

Start and End Time

Hi, can we set a start time and end time for playback with the sonic library? Thank you.

Possible bug in pitchBuffer memory allocation

Hi,

Thank you for your work on this library.

I updated one of my libraries that uses sonic and one of the changes was here:

sonic/sonic.c

Line 368 in e06dbb9

stream->pitchBufferSize = maxRequired + (maxRequired >> 1);

where my older version had:

stream->pitchBuffer = (short*)calloc(maxRequired, sizeof(short) * numChannels);

Now I am getting an invalid read memory error here. If I change the new version's line to:

stream->pitchBufferSize = maxRequired + (maxRequired >> 2);

to be the same as stream->inputBufferSize and stream->outputBufferSize then everything works again as expected.

Is this maybe a bug or am I possibly doing something wrong?

Library location

On 64-bit Linux systems, the libs should be installed in /usr/lib64/. Currently they are installed under /usr/lib.

Waveform Concatenation

Thanks for your code. I want to add a simple interface for splicing two waveforms with reference to your code. Can you give me some hints?
