reaper's Introduction

REAPER: Robust Epoch And Pitch EstimatoR

This is a speech processing system. The reaper program uses the EpochTracker class to simultaneously estimate the location of voiced-speech "epochs" or glottal closure instants (GCI), voicing state (voiced or unvoiced) and fundamental frequency (F0 or "pitch"). We define the local (instantaneous) F0 as the inverse of the time between successive GCI.
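
As a minimal sketch of the F0 definition above (not code from REAPER itself), the local F0 can be computed from a list of GCI times. The function name `local_f0` and the example times are hypothetical.

```python
def local_f0(gci_times):
    """Return the local F0 in Hz for each pair of successive GCI times (seconds)."""
    f0 = []
    for prev, cur in zip(gci_times, gci_times[1:]):
        period = cur - prev
        if period <= 0:
            raise ValueError("GCI times must be strictly increasing")
        f0.append(1.0 / period)  # local F0 is the inverse of the GCI spacing
    return f0

# Hypothetical GCI times spaced 8 ms apart, i.e. a voice at about 125 Hz.
gci = [0.100, 0.108, 0.116, 0.124]
f0_values = local_f0(gci)
```

Three values result from four GCIs, because each F0 estimate needs a pair of successive closure instants.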

This code was developed by David Talkin at Google. It is not an official Google product (experimental or otherwise); it is just code that happens to be owned by Google.

Downloading and Building reaper

cd convenient_place_for_repository
git clone https://github.com/google/REAPER.git
cd REAPER
mkdir build   # In the REAPER top-level directory
cd build
cmake ..
make

reaper will now be in convenient_place_for_repository/REAPER/build/reaper

You may want to add that path to your PATH environment variable or move reaper to your favorite bin directory.

Example:

To compute F0 (pitch) and pitchmark (GCI) tracks and write them out as ASCII files:

reaper -i /tmp/bla.wav -f /tmp/bla.f0 -p /tmp/bla.pm -a
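
When calling reaper from a script, the same invocation can be assembled as an argument list. This is a sketch that uses only the flags shown in the example above (-i, -f, -p, -a); the helper names `reaper_command` and `run_reaper` are hypothetical, and it assumes reaper is on your PATH.

```python
import subprocess

def reaper_command(wav_path, f0_path, pm_path, reaper_bin="reaper"):
    """Build the argument list for the example invocation shown above."""
    return [reaper_bin, "-i", wav_path, "-f", f0_path, "-p", pm_path, "-a"]

def run_reaper(wav_path, f0_path, pm_path, reaper_bin="reaper"):
    """Run reaper; raises CalledProcessError if it exits with a non-zero status."""
    subprocess.run(reaper_command(wav_path, f0_path, pm_path, reaper_bin), check=True)

cmd = reaper_command("/tmp/bla.wav", "/tmp/bla.f0", "/tmp/bla.pm")
```

Passing a list rather than a shell string avoids quoting problems with file names that contain spaces.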

Input Signals:

As written, the input stage expects 16-bit, signed integer samples. Any reasonable sample rate may be used, but rates below 16 kHz will introduce increasingly coarse quantization of the results, and higher rates will incur a quadratic increase in computational requirements without gaining much in output accuracy.
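
A WAV file can be checked against these expectations before it is handed to reaper. The sketch below uses Python's standard `wave` module; the function name `check_wav_for_reaper` is hypothetical. The single-channel check is an assumption that is not stated in this section; it is based on the "Attempt to load multi channel audio" message quoted in one of the issues further down.

```python
import wave

def check_wav_for_reaper(path):
    """Return a list of problems; an empty list means the file looks suitable."""
    problems = []
    with wave.open(path, "rb") as wav:
        width = wav.getsampwidth()  # bytes per sample
        if width != 2:
            problems.append("samples are %d-bit, expected 16-bit" % (8 * width))
        # Assumption: only single-channel input is accepted.
        if wav.getnchannels() != 1:
            problems.append("file has %d channels, expected 1" % wav.getnchannels())
        if wav.getframerate() < 16000:
            problems.append("sample rate %d Hz is below 16 kHz" % wav.getframerate())
    return problems
```

A rate below 16 kHz is reported only as a warning-style entry, since the text above says lower rates work but give coarser results.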

While REAPER is fairly robust to recording quality, it is designed for use with studio-quality speech signals, such as those recorded for concatenation text-to-speech systems. For best results, avoid phase distortion, such as that introduced by some close-talking microphones or by well-intended recording-studio filtering, including rumble removal. A rumble filter is provided within REAPER as the recommended (default) high-pass pre-filtering option, and is implemented as a symmetric FIR filter that introduces no phase distortion.

The help text (-h) provided by the reaper program describes various output options, including debug output of some of the feature signals. Of special interest is the residual waveform, which may be used to check for the expected waveshape. (The residual has a .resid filename extension.) During non-nasalized, open vocal tract vocalizations (such as /a/), each period should show a somewhat noisy version of the derivative of the idealized glottal flow. If the computed residual deviates radically from this ideal, the Hilbert transform option (-t) might improve matters.

The REAPER Algorithm:

The process can be broken down into the following phases:

  • Signal Conditioning
  • Feature Extraction
  • Lattice Generation
  • Dynamic Programming
  • Backtrace and Output Generation

Signal Conditioning

DC bias and low-frequency noise are removed by high-pass filtering, and the signal is converted to floating point. If the input is known to have phase distortion that is impacting tracker performance, a Hilbert transform, optionally done at this point, may improve performance.
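
To illustrate why a symmetric FIR high-pass filter removes DC without distorting phase, here is a generic sketch. It is not REAPER's filter: the windowed-sinc design, the 80 Hz cutoff, the 101-tap length, and the function names are all assumptions chosen for illustration.

```python
import math

def symmetric_highpass(cutoff_hz, sample_rate_hz, num_taps=101):
    """Design an odd-length symmetric FIR high-pass filter.

    A Hann-windowed sinc low-pass is normalized to unity DC gain and then
    spectrally inverted, so the resulting taps sum to zero (no DC passes).
    """
    if num_taps % 2 == 0:
        raise ValueError("num_taps must be odd")
    mid = num_taps // 2
    fc = cutoff_hz / sample_rate_hz  # cutoff in cycles per sample
    lowpass = []
    for n in range(num_taps):
        k = n - mid
        sinc = 2.0 * fc if k == 0 else math.sin(2.0 * math.pi * fc * k) / (math.pi * k)
        hann = 0.5 - 0.5 * math.cos(2.0 * math.pi * n / (num_taps - 1))
        lowpass.append(sinc * hann)
    gain = sum(lowpass)
    highpass = [-c / gain for c in lowpass]
    highpass[mid] += 1.0  # unit impulse minus low-pass
    return highpass

def apply_zero_phase(signal, taps):
    """Filter and compensate the (len(taps) - 1) / 2 sample delay of the symmetric taps."""
    mid = len(taps) // 2
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for j, tap in enumerate(taps):
            idx = n + mid - j
            if 0 <= idx < len(signal):  # samples outside the signal count as zero
                acc += tap * signal[idx]
        out.append(acc)
    return out

taps = symmetric_highpass(80.0, 16000.0)
filtered = apply_zero_phase([1.0] * 300, taps)  # a pure DC offset
```

Because the taps are mirror-symmetric about the center, every frequency is delayed by the same number of samples, and shifting the output by that delay leaves waveform shapes aligned with the input.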

Feature Extraction

The following feature signals are derived from the conditioned input:

  • Linear Prediction residual: This is computed using the autocorrelation method and continuous interpolation of the filter coefficients. It is checked for the expected polarity (negative impulses), and inverted, if necessary.
  • Amplitude-normalized prediction residual: The normalization factor is based on the running, local RMS.
  • Pseudo-probability of voicing: This is based on a local measure of low-frequency energy normalized by the peak energy in the utterance.
  • Pseudo-probability of voicing onset: Based on a forward delta of lowpassed energy.
  • Pseudo-probability of voicing offset: Based on a backward delta of lowpassed energy.
  • Graded GCI candidates: Each negative peak in the normalized residual is compared with the local RMS. Peaks exceeding a threshold are selected as GCI candidates, and then graded by a weighted combination of peak amplitude, skewness, and sharpness. Each of the resulting candidates is associated with the other feature values that occur closest in time to the candidate.
  • Normalized cross-correlation functions (NCCF) for each GCI candidate: The correlations are computed on a weighted combination of the speech signal and its LP residual. The correlation reference window for each GCI candidate impulse is centered on the impulse, and correlations are computed for all lags in the expected pitch period range.
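
The last feature in the list, the NCCF, can be sketched in a few lines. This is a textbook formulation under stated assumptions, not REAPER's implementation: the function name `nccf`, the forward direction of the lags, and the plain (unweighted) input signal are choices made for illustration.

```python
import math

def nccf(signal, center, window, min_lag, max_lag):
    """Normalized cross-correlation of a reference window centered on `center`
    with equally long windows shifted forward by each lag in [min_lag, max_lag].

    Returns a dict mapping lag (in samples) to a value in [-1, 1].
    """
    start = center - window // 2
    if start < 0 or start + window + max_lag > len(signal):
        raise ValueError("window and lag range must fit inside the signal")
    ref = signal[start:start + window]
    ref_energy = sum(v * v for v in ref)
    result = {}
    for lag in range(min_lag, max_lag + 1):
        seg = signal[start + lag:start + lag + window]
        cross = sum(a * b for a, b in zip(ref, seg))
        denom = math.sqrt(ref_energy * sum(v * v for v in seg))
        result[lag] = cross / denom if denom > 0.0 else 0.0
    return result

# Hypothetical test signal: a sinusoid with a period of exactly 50 samples.
tone = [math.sin(2.0 * math.pi * n / 50.0) for n in range(400)]
correlations = nccf(tone, center=100, window=100, min_lag=20, max_lag=80)
best_lag = max(correlations, key=correlations.get)
```

For a periodic signal the NCCF peaks at lags equal to the period, which is why it serves as local evidence for a period hypothesis.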

Lattice Generation

Each GCI candidate (pulse) is set into a lattice structure that links preceding and following pulses that occur within minimum and maximum pitch period limits that are being considered for the utterance. These links establish all of the period hypotheses that will be considered for the pulse. Each hypothesis is scored on "local" evidence derived from the NCCF and peak quality measures. Each pulse is also assigned an unvoiced hypothesis, which is also given a score based on the available local evidence. The lattice is checked, and modified if necessary, to ensure that each pulse has at least one voiced and one unvoiced hypothesis preceding and following it, to maintain continuity for the dynamic programming to follow. (Note that the "scores" are used as costs during dynamic programming, so that low scores encourage selection of hypotheses.)
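
The linking step can be sketched as follows. This covers only the voiced period links; the scoring, the unvoiced hypotheses, and the continuity check described above are left out. The function name `build_lattice` and the example limits are hypothetical.

```python
def build_lattice(pulse_times, min_period, max_period):
    """Link each pulse to the following pulses that lie within the period limits.

    `pulse_times` must be sorted in increasing order (seconds).
    Returns one list per pulse; each entry is (index_of_following_pulse, period).
    """
    links = [[] for _ in pulse_times]
    for i, t in enumerate(pulse_times):
        for j in range(i + 1, len(pulse_times)):
            period = pulse_times[j] - t
            if period > max_period:
                break  # times are sorted, so later pulses are farther away still
            if period >= min_period:
                links[i].append((j, period))
    return links

# Hypothetical candidates, with period limits corresponding to 400 Hz and 80 Hz.
pulses = [0.000, 0.004, 0.008, 0.016]
lattice = build_lattice(pulses, min_period=0.0025, max_period=0.0125)
link_targets = [[j for j, _ in out] for out in lattice]
```

Each entry in `lattice[i]` is one period hypothesis that starts at pulse `i`; read in the other direction, the same links give the hypotheses that end at each pulse.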

Dynamic Programming

For each pulse in the utterance:
  For each period hypothesis following the pulse:
    For each period hypothesis preceding the pulse:
      Score the transition cost of connecting the periods.
    Choose the preceding period hypothesis with the minimum overall
    cost (cumulative+local+transition), and save its cost and a
    backpointer to it.

The costs of making a voicing state change are modulated by the
probability of voicing onset and offset.  The cost of
voiced-to-voiced transition is based on the delta F0 that
occurs, and the cost of staying in the unvoiced state is a
constant system parameter.
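
A minimum-cost search with backpointers of this kind can be sketched as below. It is a simplified, Viterbi-style illustration and not REAPER's code: every hypothesis at one pulse is allowed to follow every hypothesis at the previous pulse (the real lattice restricts the connections), and the cost values and function names are hypothetical.

```python
import math

def transition_cost(prev_f0, cur_f0, unvoiced_stay=0.5, voicing_change=1.0):
    """Hypothetical transition costs; an F0 of None marks an unvoiced hypothesis."""
    if prev_f0 is None and cur_f0 is None:
        return unvoiced_stay   # constant cost of staying unvoiced
    if prev_f0 is None or cur_f0 is None:
        return voicing_change  # the text above modulates this by onset/offset probability
    return abs(math.log(cur_f0 / prev_f0))  # grows with the F0 change

def best_path(steps):
    """steps[t] is a list of (f0_or_None, local_cost) hypotheses for pulse t.

    Returns the index of the selected hypothesis at every pulse.
    """
    cumulative = [[local for _, local in steps[0]]]
    backpointers = [[None] * len(steps[0])]
    for t in range(1, len(steps)):
        cum_t, back_t = [], []
        for f0, local in steps[t]:
            costs = [cumulative[t - 1][k] + transition_cost(prev_f0, f0)
                     for k, (prev_f0, _) in enumerate(steps[t - 1])]
            k_best = min(range(len(costs)), key=costs.__getitem__)
            cum_t.append(costs[k_best] + local)
            back_t.append(k_best)
        cumulative.append(cum_t)
        backpointers.append(back_t)
    # Follow the backpointers from the cheapest final hypothesis, then reverse.
    idx = min(range(len(cumulative[-1])), key=cumulative[-1].__getitem__)
    path = [idx]
    for t in range(len(steps) - 1, 0, -1):
        idx = backpointers[t][idx]
        path.append(idx)
    path.reverse()
    return path

# The 200 Hz hypothesis at the second pulse has the lowest local cost,
# but the octave jump it implies makes the 100 Hz path cheaper overall.
example = [
    [(100.0, 0.1), (None, 1.0)],
    [(100.0, 0.3), (200.0, 0.1), (None, 1.0)],
    [(100.0, 0.1), (None, 1.0)],
]
chosen = best_path(example)
```

The example shows the purpose of the transition cost: a hypothesis that looks best locally can still lose to one that keeps the F0 contour smooth.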

Backtrace and Output Generation

Starting at the last peak in the utterance, the lowest cost period candidate ending on that peak is found. This is the starting point for backtracking. The backpointers to the best preceding period candidates are then followed backwards through the utterance. As each "best candidate" is found, the time location of the terminal peak is recorded, along with the F0 corresponding to the period, or 0.0 if the candidate is unvoiced. Instead of simply taking the inverse of the period between GCI estimates as F0, the system refers back to the NCCF for that GCI, and takes the location of the NCCF maximum closest to the GCI-based period as the actual period. The output array of F0 and estimated GCI location is then time-reversed for final output.
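
The period refinement in this step can be sketched as picking the NCCF peak nearest the GCI-based period. The sketch assumes that "NCCF maximum" means a local maximum over lags, that lags are in samples, and that the NCCF is available as a lag-to-value mapping; the function name `refine_period` and the numbers are hypothetical.

```python
def refine_period(nccf_by_lag, gci_period_samples):
    """Return the lag of the NCCF local maximum closest to the GCI-based period."""
    lags = sorted(nccf_by_lag)
    peaks = [lag for prev, lag, nxt in zip(lags, lags[1:], lags[2:])
             if nccf_by_lag[lag] > nccf_by_lag[prev]
             and nccf_by_lag[lag] >= nccf_by_lag[nxt]]
    if not peaks:
        return gci_period_samples  # no interior peak: keep the GCI-based period
    return min(peaks, key=lambda lag: abs(lag - gci_period_samples))

# Hypothetical NCCF values around a GCI-based period of 100 samples.
nccf_values = {96: 0.2, 97: 0.5, 98: 0.9, 99: 0.7, 100: 0.6,
               101: 0.3, 102: 0.4, 103: 0.8, 104: 0.5}
sample_rate = 16000.0
period = refine_period(nccf_values, gci_period_samples=100)
f0 = sample_rate / period
```

Here the peaks are at lags 98 and 103; lag 98 is nearer to the GCI-based period of 100 samples, so it is taken as the period and F0 is the sample rate divided by it.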

reaper's People

Contributors

dtalkin, gillesdegottex, jason-cooke, loverszhaokai, troughton, xavigonzalvo

reaper's Issues

Voiceless fricatives

Hi there, I’ve come across some cases where REAPER identifies GCIs during voiceless fricatives. Do you have any suggestions about what may be happening here? I can share the wav file. Thanks!

EpochTracker's destructor not freeing memory

I am trying to wrap reaper as a C extension for a Python program; however, I noticed that there is a memory leak when calling a modified version of epoch_tracker_main.cc.

I am not too familiar with C++, but the destructor for EpochTracker is empty,

so I've added a call to the CleanUp() method inside the destructor, and the leak seems to disappear.

EpochTracker::~EpochTracker(void) {
   CleanUp();
}

I was wondering if this makes sense, as I am unfamiliar with C++ conventions.

malloc() memory corruption error

Hi reaper,
I was running parallel reaper processes as worker processes spawned by Python's multiprocessing Pool to speed up generation of .f0 files from multiple single-channel .wav files. In each worker process, I call

os.system("reaper -i FILENAME.wav -f FILENAME.f0 -e 0.02 -a")

which executes the reaper command for that file in a subshell. (I made sure that no two reaper workers access the same FILENAME.)
All of a sudden, I got the unexpected malloc error below.

(FYI I am running Ubuntu 16.04 LTS, but have confirmed this error persists on Mac OSX as well...)

Residual symmetry: P:837.185303  N:803.405762  MEAN:-0.159036
Inverting signal
*** Error in `reaper': malloc(): memory corruption: 0x0000000006212a70 ***

======= Backtrace: =========

/lib/x86_64-linux-gnu/libc.so.6(+0x777e5)[0x7ff4097747e5]
/lib/x86_64-linux-gnu/libc.so.6(+0x8213e)[0x7ff40977f13e]
/lib/x86_64-linux-gnu/libc.so.6(__libc_malloc+0x54)[0x7ff409781184]
/usr/lib/x86_64-linux-gnu/libstdc++.so.6(_Znwm+0x18)[0x7ff40a073e78]
reaper(_Z12MakeF0OutputR12EpochTrackerfPP5Track+0x82)[0x415fcf]
reaper(_Z18ComputeEpochsAndF0R12EpochTrackerffPP5TrackS3_S3_+0x119)[0x416361]
reaper(main+0x4c1)[0x416879]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0)[0x7ff40971d830]
reaper(_start+0x29)[0x415ce9]
======= Memory map: ========
00400000-0043c000 r-xp 00000000 08:01 794191                             /usr/bin/reaper
0063b000-0063c000 r--p 0003b000 08:01 794191                             /usr/bin/reaper
0063c000-0063d000 rw-p 0003c000 08:01 794191                             /usr/bin/reaper
01686000-06230000 rw-p 00000000 00:00 0                                  [heap]
7ff404000000-7ff404021000 rw-p 00000000 00:00 0
7ff404021000-7ff408000000 ---p 00000000 00:00 0
7ff40945b000-7ff40961c000 rw-p 00000000 00:00 0
7ff4096fd000-7ff4098bd000 r-xp 00000000 08:01 394095                     /lib/x86_64-linux-gnu/libc-2.23.so
7ff4098bd000-7ff409abd000 ---p 001c0000 08:01 394095                     /lib/x86_64-linux-gnu/libc-2.23.so
7ff409abd000-7ff409ac1000 r--p 001c0000 08:01 394095                     /lib/x86_64-linux-gnu/libc-2.23.so
7ff409ac1000-7ff409ac3000 rw-p 001c4000 08:01 394095                     /lib/x86_64-linux-gnu/libc-2.23.so
7ff409ac3000-7ff409ac7000 rw-p 00000000 00:00 0
7ff409ac7000-7ff409add000 r-xp 00000000 08:01 394116                     /lib/x86_64-linux-gnu/libgcc_s.so.1
7ff409add000-7ff409cdc000 ---p 00016000 08:01 394116                     /lib/x86_64-linux-gnu/libgcc_s.so.1
7ff409cdc000-7ff409cdd000 rw-p 00015000 08:01 394116                     /lib/x86_64-linux-gnu/libgcc_s.so.1
7ff409cdd000-7ff409de5000 r-xp 00000000 08:01 394127                     /lib/x86_64-linux-gnu/libm-2.23.so
7ff409de5000-7ff409fe4000 ---p 00108000 08:01 394127                     /lib/x86_64-linux-gnu/libm-2.23.so
7ff409fe4000-7ff409fe5000 r--p 00107000 08:01 394127                     /lib/x86_64-linux-gnu/libm-2.23.so
7ff409fe5000-7ff409fe6000 rw-p 00108000 08:01 394127                     /lib/x86_64-linux-gnu/libm-2.23.so
7ff409fe6000-7ff40a158000 r-xp 00000000 08:01 394983                     /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21
7ff40a158000-7ff40a358000 ---p 00172000 08:01 394983                     /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21
7ff40a358000-7ff40a362000 r--p 00172000 08:01 394983                     /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21
7ff40a362000-7ff40a364000 rw-p 0017c000 08:01 394983                     /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21
7ff40a364000-7ff40a368000 rw-p 00000000 00:00 0
7ff40a368000-7ff40a38e000 r-xp 00000000 08:01 394075                     /lib/x86_64-linux-gnu/ld-2.23.so
7ff40a3e6000-7ff40a4b1000 rw-p 00000000 00:00 0
7ff40a517000-7ff40a582000 rw-p 00000000 00:00 0
7ff40a58a000-7ff40a58d000 rw-p 00000000 00:00 0
7ff40a58d000-7ff40a58e000 r--p 00025000 08:01 394075                     /lib/x86_64-linux-gnu/ld-2.23.so
7ff40a58e000-7ff40a58f000 rw-p 00026000 08:01 394075                     /lib/x86_64-linux-gnu/ld-2.23.so
7ff40a58f000-7ff40a590000 rw-p 00000000 00:00 0
7ffd8caad000-7ffd8cace000 rw-p 00000000 00:00 0                          [stack]
7ffd8cbee000-7ffd8cbf0000 r--p 00000000 00:00 0                          [vvar]
7ffd8cbf0000-7ffd8cbf2000 r-xp 00000000 00:00 0                          [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0                  [vsyscall]
Aborted

Could there be something I'm doing wrong? Or could Reaper be thread-unsafe? I just recently started using REAPER, and would really like to use this simple and powerful command-line tool frequently in the future.

Thank you for reading my issue :)
Best,

Wilson

std::vector "Out of Bounds" memory violation in ResampleAndReturnResults (Undefined behavior)

Hey there. Thanks for your great API.
We're in the middle of integrating REAPER into our software, and it seems awesome. However, we noticed a lot of seemingly random crashes, related to either allocation or deallocation, when running a lot of iterations of ResampleAndReturnResults. Looking through it, I stumbled across

  float last_time = (output_[0].resid_index / sample_rate_) + endpoint_padding_;
  int32_t n_frames = RoundUp(last_time / resample_interval);

[...]
for (int32_t i = limit; i >= 0; --i) {
  int32_t frame = RoundUp(output_[i].resid_index / (sample_rate_ * resample_interval));
[...]
}

Let's just do some math here.

Let's assume:
output[0].resid_index = 37976
sampleRate = 44100
resample_interval = 0.02

Yields:
lastTime = 0.8611
n_frames = 44

In the for loop, if i == 0,
frame = 37976 / 882 = 43.06 (round up ->) = 44

That's out of bounds for the f0 vector, which has only 44 elements (valid indices 0 to 43, counting from zero), while the loop tries to access index 44. I suspect this to be the reason for the "random" crashes later on. What would the expected behavior be? Should it rather just be:

int32_t frame = RoundUp(output_[i].resid_index / (sample_rate_ * resample_interval)) -1;

?
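
The arithmetic above can be re-run with a short sketch. It rests on the same assumptions as the calculation in this post: RoundUp is taken to round toward positive infinity, and endpoint_padding_ is taken to be zero. Neither has been checked against the REAPER source here, and the variable names are simplified.

```python
import math

def round_up(value):
    """Assumed behaviour of RoundUp for this check: round toward +infinity."""
    return math.ceil(value)

resid_index = 37976
sample_rate = 44100.0
resample_interval = 0.02
endpoint_padding = 0.0  # assumed zero, as in the arithmetic above

last_time = resid_index / sample_rate + endpoint_padding
n_frames = round_up(last_time / resample_interval)
frame = round_up(resid_index / (sample_rate * resample_interval))

# Valid indices into a vector of n_frames elements are 0 .. n_frames - 1.
out_of_bounds = frame >= n_frames
```

Under these assumptions both n_frames and frame come out as 44, so index 44 falls one past the end of the vector.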

All the best,
Sebastian

any sample input file for testing

I am getting this error message on a test wav file, and I am not sure what format the program expects. A working sample wav file would help.

Attempt to load multi channel audioFailed to load waveform

reference document?

Thank you for sharing this. Can you share a reference document for REAPER? I have read the code, but I cannot understand some parts of it or why they are done that way.

taking a long time

This algorithm sounds very interesting and I would like to compare it against some other methods for extracting vocal pitch. I am running reaper -i file.wav -f file_pitch.f0 -x 200 -m 50 -d debug_output.txt -e .01 and it is taking a very long time (20 min+) on a modern mac for a ~1 minute file. Is it supposed to take this long?
