waywardgeek / sonic
Simple library to speed up or slow down speech
License: Apache License 2.0
Sonic is a simple algorithm for speeding up or slowing down speech. Unlike previous algorithms for changing speech rate, it is optimized for speed-ups of over 2X. The Sonic library is a very simple ANSI C library that is designed to be easily integrated into streaming voice applications, like TTS back ends.

The primary motivation behind Sonic is to enable the blind and visually impaired to improve their productivity with open source speech engines, like espeak. Sonic can also be used by the sighted. For example, Sonic can improve the experience of listening to an audio book on an Android phone.

A native Java port of Sonic is in Sonic.java. Main.java is a simple example of how to use Sonic.java. To play with it, you'll need a "talking.wav" file in the current directory, and you'll want to change the speed, pitch or other parameters manually in Main.java, in the main method.

Sonic is Copyright 2010, 2011, Bill Cox, all rights reserved. It is released under the Apache 2.0 license, to promote usage as widely as possible.

Performance test: I sped up a 751958176 byte wav file with sonic (a 9 hour, 28 minute mono audio file encoded at 16-bit 11 kHz), but with the output writing disabled. The reported time, running Ubuntu 11.04 on my HP Pavilion dm4 laptop, was:

real 0m50.839s
user 0m47.370s
sys 0m0.620s

The Java version is not much slower. It reported:

real 0m52.043s
user 0m51.190s
sys 0m0.310s

Update, May 7, 2017
-------------------

I upgraded the pitch change algorithm to use a 12-point sinc FIR filter for interpolation, rather than linearly interpolating between points. This significantly reduces noise introduced by the pitch change algorithm. It is most noticeable in low-sample-rate streams, such as the 11,025 Hz output of the Eloquence TTS engine. The upgrade is in both the C and Java versions.

Author: Bill Cox
email: [email protected]
I get it randomly while stress-testing (every few hundred ms I change pitch between 1.0 and 2.0 while playing) ->
java.lang.ArrayIndexOutOfBoundsException: length=6916; index=6916 at com.github.waywardgeek.sonic.Sonic.interpolate(Sonic.java:828) at com.github.waywardgeek.sonic.Sonic.adjustRate(Sonic.java:867) at com.github.waywardgeek.sonic.Sonic.processStreamInput(Sonic.java:978) at com.github.waywardgeek.sonic.Sonic.writeBytesToStream(Sonic.java:1018)
Exception is thrown here:
Line 829 in 71c5119
I see that this is possible because the loop bounds on the buffer length do not account for the SINC filter points here ->
Line 865 in 71c5119
If I change:
for (position = 0; position < numPitchSamples - 1; position++) {
to
for (position = 0; position < numPitchSamples - 1 - SINC_FILTER_POINTS; position++) {
it works, but IDK how valid it is as I'm not an expert in audio maths.
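The idea behind the proposed loop bound can be sketched in C. An N-tap FIR filter reads N consecutive samples starting at each position, so the last safe starting position is the buffer length minus N. This is a minimal sketch with hypothetical names (`safeFilterPositions`, `filterInBounds`), not Sonic's actual interpolation code, and it does not settle whether Sonic should shorten the loop or keep extra samples buffered instead.

```c
#include <stddef.h>

/* Hypothetical helper, not part of Sonic: number of starting positions at
 * which a filter with numTaps taps can read numTaps consecutive samples
 * without running past the end of a buffer holding numSamples samples. */
size_t safeFilterPositions(size_t numSamples, size_t numTaps) {
  if (numTaps == 0 || numSamples < numTaps) {
    return 0;
  }
  return numSamples - numTaps + 1;
}

/* Apply the filter at every position where all taps stay in bounds.
 * Returns the number of output samples written to out. */
size_t filterInBounds(const short* in, size_t numSamples,
                      const float* taps, size_t numTaps, float* out) {
  size_t numPositions = safeFilterPositions(numSamples, numTaps);
  size_t position, i;

  for (position = 0; position < numPositions; position++) {
    float sum = 0.0f;
    for (i = 0; i < numTaps; i++) {
      /* position + i never exceeds numSamples - 1 here. */
      sum += taps[i] * in[position + i];
    }
    out[position] = sum;
  }
  return numPositions;
}
```

With a 5-sample buffer and a 2-tap filter, only 4 positions are processed, so the read at `position + 1` never touches index 5.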
In doc/index.md this url is referenced: http://keizai.yokkaichi-u.ac.jp/~ikeda/research/picola.html
The last cached version I found is this:
https://web.archive.org/web/20120731100136/http://keizai.yokkaichi-u.ac.jp/~ikeda/research/picola.html
Slightly off topic: the sonic.c source mentions 'AMDF'. Could you spell this out once (e.g. in doc/index.md)? I only found out after more searching that it stands for "Average Magnitude Difference Function".
Thanks!
Dear authors,
There exists a potential integer overflow in the function insertPitchPeriod at
Line 1056 in 8694c59
caused by period + newSamples, which can lead to an allocation error at sonic.c:465:37 in enlargeOutputBufferIfNeeded.
Lines 460 to 469 in 8694c59
When the sum overflows, the argument numSamples becomes a negative value.
The allocation function potentially fails because the if guard at sonic.c:463 fails to filter the value of outputBufferSize.
A possible fix suggestion would be adding an additional safety function and using it before calling the function.
For example,
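#include <stddef.h>
#include <stdint.h>

size_t sonicSafeAdd(size_t a, size_t b) {
  /* Unsigned addition wraps around on overflow, so a wrapped sum is smaller than a. */
  size_t sum = a + b;
  if (sum >= SIZE_MAX || sum < a) {
    /* handle exit */
  }
  return sum;
}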
Could be used as
- enlargeOutputBufferIfNeeded(stream, (newSamples + period));
+ enlargeOutputBufferIfNeeded(stream, sonicSafeAdd(newSamples, period));
Thank you
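Because the report says the sum becomes negative, the sample counts appear to be signed. Signed overflow is undefined behavior in C, so the check has to happen before the addition rather than after it. The following is a minimal sketch of that variant, assuming `int` counts; `checkedAddSamples` is a hypothetical name, not a Sonic function.

```c
#include <limits.h>

/* Hypothetical helper, not part of Sonic. Adds two non-negative sample
 * counts and stores the result in *sum. Returns 1 on success, or 0 if an
 * operand is negative or the result would not fit in an int; in that case
 * *sum is left untouched. */
int checkedAddSamples(int a, int b, int* sum) {
  if (a < 0 || b < 0 || a > INT_MAX - b) {
    return 0;
  }
  *sum = a + b;
  return 1;
}
```

A caller would test the return value and refuse to enlarge the buffer when it is 0, instead of passing a wrapped count on to the allocator.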
Several items that can be improved:
Hi, I'm making a pkgsrc port of sonic. Could you take a look at this small patch? It just preserves PREFIX. I left out the
implicit variables that can be overridden by Make (I edited the issue).
--- Makefile.orig 2024-01-30 07:54:18.202807498 +0000
+++ Makefile
@@ -11,11 +11,11 @@
# speech recognition.
#USE_SPECTROGRAM=1
-PREFIX=/usr
+PREFIX?=/usr
UNAME := $(shell uname)
ifeq ($(UNAME), Darwin)
- PREFIX=/usr/local
+ PREFIX?=/usr/local
endif
BINDIR=$(PREFIX)/bin
===> Building for libsonic-0.2.0.65
gmake[1]: Entering directory '/disk-samsung/freebsd-ports/audio/libsonic/work/sonic-release-0.2.0-65-gba33141'
cc -O2 -pipe -fno-omit-frame-pointer -ansi -fPIC -pthread -Wno-unused-function -DSONIC_SPECTROGRAM -fstack-protector-strong -isystem /usr/local/include -fno-strict-aliasing -fstack-protector-strong -L/usr/local/lib -o sonic wave.o main.o libsonic.a -lm
cc: warning: argument unused during compilation: '-ansi' [-Wunused-command-line-argument]
ld: error: undefined symbol: sonicConvertSpectrogramToBitmap
>>> referenced by main.c
>>> main.o:(main)
ld: error: undefined symbol: sonicWritePGM
>>> referenced by main.c
>>> main.o:(main)
ld: error: undefined symbol: sonicDestroyBitmap
>>> referenced by main.c
>>> main.o:(main)
ld: error: undefined symbol: sonicCreateSpectrogram
>>> referenced by sonic.c
>>> sonic.o:(sonicComputeSpectrogram) in archive libsonic.a
ld: error: undefined symbol: sonicDestroySpectrogram
>>> referenced by sonic.c
>>> sonic.o:(sonicDestroyStream) in archive libsonic.a
ld: error: undefined symbol: sonicAddPitchPeriodToSpectrogram
>>> referenced by sonic.c
>>> sonic.o:(processStreamInput) in archive libsonic.a
cc: error: linker command failed with exit code 1 (use -v to see invocation)
Hi, I found what could be a numerical truncation bug in
Line 519 in 8694c59
*samples could be outside the range [-1.0, 1.0]. It should be guaranteed that the value of (*samples++) * 32767.0f is in the range [-32767, 32767], otherwise noise will be introduced. Just like here:
Lines 271 to 277 in 8694c59
Thanks.
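The clamping being asked for can be sketched as follows. This is an illustration with a hypothetical helper name (`floatSampleToShort`), not the code at the referenced lines: the sample is limited to [-1.0, 1.0] before scaling, so the result always fits in the range [-32767, 32767].

```c
/* Hypothetical helper, not Sonic's code: convert one float sample to a
 * 16-bit value. The sample is clamped to [-1.0, 1.0] first, so the scaled
 * result always lies in [-32767, 32767]. NaN is mapped to silence. */
short floatSampleToShort(float sample) {
  if (sample != sample) { /* true only for NaN */
    return 0;
  }
  if (sample > 1.0f) {
    sample = 1.0f;
  } else if (sample < -1.0f) {
    sample = -1.0f;
  }
  return (short)(sample * 32767.0f);
}
```

Without the clamp, a value such as 2.5f would scale to 81917.5, which cannot be represented in a 16-bit short.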
I'll preface this by saying I am in no way familiar with C compilation or with how to interpret errors or warnings. I completed what seemed like a successful compilation by cloning the repo and running "make", then I added the sonic executable to my path. Everything seemed to go as planned, except that the outputs of the commands "sonic -c -p 0.5 test.wav testsonic.wav" and "sonic -p 0.5 test.wav testsonic.wav" are identical. In other words, it doesn't scale pitch linearly even when asked to, although when the -c flag is used the "Scaling pitch linearly." message appears. I then tried running "make install" to see if it was a problem with linking some kind of internal library (i.e. itself with sonic.o), but the output is the same. Here is the output of the command "gcc --version"; again, I don't know anything about this, so it was the first thing that came to mind: """
Apple clang version 15.0.0 (clang-1500.1.0.2.5)
Target: arm64-apple-darwin23.3.0
Thread model: posix
InstalledDir: /Library/Developer/CommandLineTools/usr/bin
"""
It probably came installed when I ran "xcode-select --install", and I keep the dev tools regularly updated from System Settings, so I think you should be able to reproduce my environment pretty easily.
I tested it on another system and compared the outputs: they are the same as if I had never specified the -c flag, even though the program is obviously aware of it.
If it helps, I could try cleaning everything up and recompiling on a completely different machine rented from Scaleway to post a clean output here.
Thanks!
I am using sonic to speed up the audio of videos, and have been noticing that I am getting a slight audio/video desync. At 2.5x this seems to be around half a second for every half an hour of video (original length).
$ youtube-dl -f bestaudio "https://www.youtube.com/watch?v=8sBYyrJDC0o" -o aud.webm
$ ffmpeg -loglevel panic -i aud.webm -f wav - | sox -t wav - -t wav slow.wav
$ sonic -s 2.5 slow.wav fast.wav
$ ffmpeg -i slow.wav 2>&1 | grep Duration | awk '{print $2}' | tr -d ,
00:31:00.10
$ ffmpeg -i fast.wav 2>&1 | grep Duration | awk '{print $2}' | tr -d ,
00:12:23.40
The first time is the length of the original audio, the second is the length of the audio sped up to 2.5x with sonic.
ffmpeg prints the fractional part of a duration in hundredths of a second, so:
s = 31*60 + 0 + 0.10 = 1860.10
f = 12*60 + 23 + 0.40 = 743.40
s/2.5 = 744.04
s/f = 2.5022
In this case the sped-up audio was about 640 ms shorter than I was expecting (743.40 seconds rather than 744.04), and the speed-up factor was about 2.502 rather than the specified 2.5.
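The comparison can be automated with a small C sketch. It assumes duration strings of the form HH:MM:SS.ss, as in the ffmpeg output quoted in this report, and treats the fractional part as decimal seconds. The function names are hypothetical and exist only for this illustration.

```c
#include <stdio.h>

/* Hypothetical helper: convert a duration string of the form HH:MM:SS.ss
 * (the fraction is a decimal fraction of a second) to seconds.
 * Returns a negative value if the string cannot be parsed. */
double durationToSeconds(const char* text) {
  int hours, minutes;
  double seconds;

  if (sscanf(text, "%d:%d:%lf", &hours, &minutes, &seconds) != 3) {
    return -1.0;
  }
  return hours * 3600.0 + minutes * 60.0 + seconds;
}

/* Ratio of original to processed duration, i.e. the speed-up actually
 * achieved. Both strings are assumed to parse successfully. */
double measuredSpeed(const char* original, const char* processed) {
  return durationToSeconds(original) / durationToSeconds(processed);
}
```

Calling `measuredSpeed("00:31:00.10", "00:12:23.40")` gives the achieved factor directly, which can then be compared with the value passed to `-s`.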
Seems I'm not the only one with this problem:
#39
With pitch 1, the results are as expected:
With pitch -1, the console shows: Assertion failed: stream->newRatePosition != newSampleRate
With pitch -1 again, but with a single letter of one word changed, it works fine again:
I generated the voice file over 10 times for each case, with the same output every time.
To reproduce this failure I have to make several pitch changes first, and it only occurs for certain texts at certain pitches, which is weird.
Undefined symbols for architecture x86_64:
"_fftw_destroy_plan", referenced from:
_sonicAddPitchPeriodToSpectrogram in spectrogram.o
"_fftw_execute", referenced from:
_sonicAddPitchPeriodToSpectrogram in spectrogram.o
"_fftw_plan_dft_r2c_1d", referenced from:
_sonicAddPitchPeriodToSpectrogram in spectrogram.o
ld: symbol(s) not found for architecture x86_64
I had to run make with USE_SPECTROGRAM=0 to get it to build on Mac, but I don't know what functionality this will impact.
There seems to be a bug where a short sample, when slowed down a lot, has a significant amount of silence (0s) added at the end of the output that is not present in the input sample.
First is original "eh" sound, unmodified; you'll note it has no silence in the file.
Second is after running sonic C binary:
./sonic -s 0.060000002 eh.wav eh_slow.wav
You'll note that a large amount of silence (0 values from calloc I assume) has been introduced.
This does not seem right to me; I would not expect any silence at all to be introduced in the output sample.
A secondary point is that I don't think the final output buffer that sonic creates has the correct duration. For example, the sample is 0.053 seconds long and the slowdown factor is .060000002, which I'd expect to generate a sample of duration 0.053 / 0.060000002 = 0.883333304, whereas the output sample is 0.732 seconds (including the zeros; without the zeros it's even smaller).
I've attached the files for perusal/repro if you like. They have to be gzipped because github won't let me attach .wav files for some reason.
eh.wav.gz
eh_slow.wav.gz
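The expectation stated above, output length equals input length divided by the speed factor, can be written as a small check in C. This is a sketch of the reporter's arithmetic with a hypothetical helper name, not a description of what Sonic actually computes internally.

```c
/* Hypothetical helper: number of output frames expected when numInputFrames
 * frames are played at the given speed factor (a factor below 1 slows the
 * audio down). Returns -1 for a speed that is not positive. */
long expectedOutputFrames(long numInputFrames, double speed) {
  if (!(speed > 0.0)) { /* also rejects NaN */
    return -1;
  }
  return (long)(numInputFrames / speed + 0.5);
}
```

Comparing this value with the number of frames in the output file, after trimming trailing zeros, would quantify both effects described in the report.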
1 SEGV bug
=================================================================
==20417==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000028 (pc 0x00000040853a bp 0x000000000000 sp 0x7fff7d053cc8 T0)
==20417==The signal is caused by a WRITE memory access.
==20417==Hint: address points to the zero page.
#0 0x408539 in sonicSetSpeed /root/Sec/Fuzzing/projects/sonic_asan/sonic.c:285
#1 0x405e58 in runSonic /root/Sec/Fuzzing/projects/sonic_asan/main.c:43
#2 0x4015f2 in main /root/Sec/Fuzzing/projects/sonic_asan/main.c:184
#3 0x7f538a7f5554 in __libc_start_main (/lib64/libc.so.6+0x22554)
#4 0x401a0b (/root/Sec/Fuzzing/projects/sonic_asan/sonic+0x401a0b)
AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV /root/Sec/Fuzzing/projects/sonic_asan/sonic.c:285 in sonicSetSpeed
==20417==ABORTING
2 FPE bugs
=================================================================
==12489==ERROR: AddressSanitizer: FPE on unknown address 0x000000405ee5 (pc 0x000000405ee5 bp 0x60d000000040 sp 0x7ffe49cf7dc0 T0)
#0 0x405ee4 in runSonic /root/Sec/Fuzzing/projects/sonic_asan/main.c:55
#1 0x4015f2 in main /root/Sec/Fuzzing/projects/sonic_asan/main.c:184
#2 0x7f64375d7554 in __libc_start_main (/lib64/libc.so.6+0x22554)
#3 0x401a0b (/root/Sec/Fuzzing/projects/sonic_asan/sonic+0x401a0b)
AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: FPE /root/Sec/Fuzzing/projects/sonic_asan/main.c:55 in runSonic
==12489==ABORTING
=================================================================
==12995==ERROR: AddressSanitizer: FPE on unknown address 0x000000407309 (pc 0x000000407309 bp 0x000000000000 sp 0x7ffcc9c9bd70 T0)
#0 0x407308 in findPitchPeriodInRange /root/Sec/Fuzzing/projects/sonic_asan/sonic.c:778
#1 0x407308 in findPitchPeriod /root/Sec/Fuzzing/projects/sonic_asan/sonic.c:822
#2 0x407308 in changeSpeed /root/Sec/Fuzzing/projects/sonic_asan/sonic.c:1109
#3 0x407308 in processStreamInput /root/Sec/Fuzzing/projects/sonic_asan/sonic.c:1158
#4 0x405f56 in runSonic /root/Sec/Fuzzing/projects/sonic_asan/main.c:59
#5 0x4015f2 in main /root/Sec/Fuzzing/projects/sonic_asan/main.c:184
#6 0x7f280e242554 in __libc_start_main (/lib64/libc.so.6+0x22554)
#7 0x401a0b (/root/Sec/Fuzzing/projects/sonic_asan/sonic+0x401a0b)
AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: FPE /root/Sec/Fuzzing/projects/sonic_asan/sonic.c:778 in findPitchPeriodInRange
==12995==ABORTING
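The traces do not state the root causes. A write to address 0x28 is consistent with dereferencing a NULL stream pointer, and an FPE in integer code usually means a division by zero, for example by a channel count or sample rate of 0 read from a malformed file. Under those assumptions, a caller could validate the parameters before using them, and should also check the pointer returned when creating the stream. The function below is a hypothetical sketch, not part of Sonic.

```c
/* Hypothetical checks, not part of Sonic: reject stream parameters that
 * could lead to a division by zero or a failed allocation later on.
 * Returns 1 if the parameters look usable, 0 otherwise. */
int streamParamsAreValid(int sampleRate, int numChannels, float speed) {
  if (sampleRate <= 0 || numChannels <= 0) {
    return 0;
  }
  if (!(speed > 0.0f)) { /* also rejects NaN */
    return 0;
  }
  return 1;
}
```

A command-line front end could call this right after reading the WAV header and exit with an error message when it returns 0.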
I asked a question on Stack Overflow about a problem I'm trying to solve with your library. Can you provide some help, please? :)
I'm using sonic with exoplayer
There is noise when playing at x0.5
It's a lot worse than what you hear on a macbook
Can you help me?
I replaced the old one with the latest Sonic.java, and implemented the following methods:
public void getOutput(ShortBuffer buffer) {
    // Read no more frames than the destination buffer has room for.
    int framesToRead = Math.min(buffer.remaining() / numChannels, numOutputSamples);
    int size = framesToRead * numChannels;
    if (bufferToRead == null || bufferToRead.length < size) {
        bufferToRead = new short[size];
    }
    readShortFromStream(bufferToRead, framesToRead);
    buffer.put(bufferToRead, 0, size);
}

public void queueInput(ShortBuffer buffer) {
    // Use remaining() rather than limit() so a non-zero position is handled correctly.
    int bufferSize = buffer.remaining();
    int framesToWrite = bufferSize / numChannels;
    if (bufferToWrite == null || bufferToWrite.length < bufferSize) {
        bufferToWrite = new short[bufferSize];
    }
    buffer.get(bufferToWrite, 0, bufferSize);
    writeShortToStream(bufferToWrite, framesToWrite);
}

public void queueEndOfStream() {
    int remainingFrameCount = numInputSamples;
    float s = speed / pitch;
    float r = rate * pitch;
    int expectedOutputFrames =
        numOutputSamples + (int) ((remainingFrameCount / s + numPitchSamples) / r + 0.5f);
    // Add enough silence to flush both input and pitch buffers.
    enlargeInputBufferIfNeeded(remainingFrameCount + 2 * maxRequired);
    for (int xSample = 0; xSample < 2 * maxRequired * numChannels; xSample++) {
        inputBuffer[remainingFrameCount * numChannels + xSample] = 0;
    }
    numInputSamples += 2 * maxRequired;
    processStreamInput();
    // Throw away any extra frames we generated due to the silence we added.
    if (numOutputSamples > expectedOutputFrames) {
        numOutputSamples = expectedOutputFrames;
    }
    // Empty input and pitch buffers.
    numInputSamples = 0;
    this.remainingInputToCopy = 0;
    numPitchSamples = 0;
}

public int getOutputSize() {
    // Size in bytes: frames * channels * 2 bytes per 16-bit sample.
    return numOutputSamples * numChannels * 2;
}
Hi. I'm using Sonic in an Android application and it works great, much thanks for the library and for the Java port. I made a couple of small tweaks to avoid allocating Integers on what is a hot code path for me, and I'm wondering if you're interested in a pull request with my changes? Thanks.
Hi
I wanted to use your algorithm for an app of mine (Windows).
There is something called "Sonic Pi". Is it related to your project?
If not, do you have any suggestions on how I could implement your algorithm in Python?
(I'm searching for a solution without Java.)
I hope you can help me :D
Where are the processed files?
How am I supposed to compile and use the library instead of the sonic program?
I'm interested in computing a spectrogram.
I download, unzip, and then do:
make
After this I do:
gcc main.c -L. -lsonic
which fails with:
/usr/lib64/gcc/x86_64-suse-linux/11/../../../../x86_64-suse-linux/bin/ld: /tmp/ccIksmQF.o: in function `runSonic':
main.c:(.text+0x77): undefined reference to `openInputWaveFile'
/usr/lib64/gcc/x86_64-suse-linux/11/../../../../x86_64-suse-linux/bin/ld: main.c:(.text+0xe7): undefined reference to `openOutputWaveFile'
/usr/lib64/gcc/x86_64-suse-linux/11/../../../../x86_64-suse-linux/bin/ld: main.c:(.text+0xfe): undefined reference to `closeWaveFile'
/usr/lib64/gcc/x86_64-suse-linux/11/../../../../x86_64-suse-linux/bin/ld: main.c:(.text+0x1e7): undefined reference to `readFromWaveFile'
/usr/lib64/gcc/x86_64-suse-linux/11/../../../../x86_64-suse-linux/bin/ld: main.c:(.text+0x271): undefined reference to `writeToWaveFile'
/usr/lib64/gcc/x86_64-suse-linux/11/../../../../x86_64-suse-linux/bin/ld: main.c:(.text+0x299): undefined reference to `closeWaveFile'
/usr/lib64/gcc/x86_64-suse-linux/11/../../../../x86_64-suse-linux/bin/ld: main.c:(.text+0x2ae): undefined reference to `closeWaveFile'
collect2: error: ld returned 1 exit status
I use Sonic.java
Sonic sonic = new Sonic(48000, 2);
public void updatePitch(float p) {
    sonic.setPitch(p);
}
When I call 'updatePitch' multiple times, the program stops. I noticed that the call to 'writeBytesToStream(byte[] inBuffer, int numBytes)' never returns, so execution doesn't even get to the next step.
Hi, can we set a start time and an end time for playback with the sonic library? Thank you.
https://github.com/waywardgeek/sonic links to http://dev.vinux-project.org/sonic, but the latter times out:
$ wget http://dev.vinux-project.org/sonic
--2017-02-15 21:04:39-- http://dev.vinux-project.org/sonic
Resolving dev.vinux-project.org (dev.vinux-project.org)... 99.32.249.89
Connecting to dev.vinux-project.org (dev.vinux-project.org)|99.32.249.89|:80... failed: Connection timed out.
Hi,
Thank you for your work on this library.
I updated one of my libraries that use sonic and one of the changes was here:
Line 368 in e06dbb9
where my older version had:
stream->pitchBuffer = (short*)calloc(maxRequired, sizeof(short) * numChannels);
Now I am getting an invalid read memory error here. If I change the new version's line to:
stream->pitchBufferSize = maxRequired + (maxRequired >> 2);
to be the same as stream->inputBufferSize and stream->outputBufferSize, then everything works again as expected.
Is this maybe a bug or am I possibly doing something wrong?
After use, the audio is different from the original audio
The latest release is release-0.2.0, which was built on 28 Feb 2015. That is very old. Can you build a newer release? Thanks!
Please provide tags for new releases from now on.
Thanks.
On 64-bit Linux systems, the libs should be installed in /usr/lib64/. Currently they are installed under /usr/lib.
Thanks for your code. I want to add a simple interface for splicing two waveforms with reference to your code. Can you give me some hints?