Comments (7)
@ezavesky : Let me provide a bit more detail to see if it helps you.
As you mention, it's not that rare to encounter audio that is not at exactly the same timebase as the original. My first use-case was matching tracks between different masterings of the Beatles albums; it turns out the master tapes stretch by several tenths of a percent each time they are played, so the different masterings, though perceptually identical, have noticeably different durations and can't be perfectly aligned in time. I was actually trying to align chord annotations, and the errors could amount to several beats between start and end of the tracks. But it's also relatively common to deliberately change playback speed, either to squeeze in more time for commercials, or to do fine beat-matching in crossfades, etc.
The fingerprint matching code finds similar landmarks (frequency peak pairs) in query and reference tracks, then calculates the difference of the absolute times within each track for each match. The assumption is that if the recordings match, even if we only identify a few common landmarks every second, over a long stretch of common material we'll see that the relative timing is consistent, indicating a non-chance match.
But if the timebases have been stretched so that `t1`, the time of an event in track 1, corresponds to `t2 = offset + (1 + delta) * t1` in track 2, then the differences between each of the matching pairs will be `dt = (t2 - t1) = offset + delta * t1`. We make a histogram of these time skews (at 23 ms resolution or so, depending on the sampling rate) and look for the largest value, which is assumed to correspond to `offset`. But if `delta` is nonzero, those values may be smeared across multiple bins, possibly to the extent of diminishing the largest value until it is indistinguishable from noise (if the match is weak).
If we knew `delta`, we could instead make a histogram of `dt' = t2 - (1 + delta) * t1`, which would give us a sharp peak at the bin corresponding to `offset`. Normally `delta` is unknown, but if we're prepared to try a range of values, the one closest to the true `delta` ought to give the highest peak value across all the histograms, so we can use this to infer the underlying `delta` (to some resolution).
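A minimal sketch of that brute-force search (my own code, not the audfprint implementation; it assumes you already have arrays of matched landmark times `t1`/`t2` in seconds):

```python
import numpy as np

def best_stretch(t1, t2, deltas, bin_width=0.023):
    """Brute-force search over candidate stretch factors: for each delta,
    histogram the skews dt' = t2 - (1 + delta) * t1 and keep the delta whose
    histogram has the tallest peak.  Returns (delta, offset, peak_count)."""
    best_delta, best_offset, best_peak = 0.0, 0.0, 0
    for delta in deltas:
        skews = t2 - (1.0 + delta) * t1
        # quantize skews to histogram bins; shift so bincount sees >= 0
        bins = np.round(skews / bin_width).astype(int)
        lo = bins.min()
        counts = np.bincount(bins - lo)
        peak = int(counts.argmax())
        if counts[peak] > best_peak:
            best_peak = int(counts[peak])
            best_delta = float(delta)
            best_offset = (peak + lo) * bin_width
    return best_delta, best_offset, best_peak
```

With the correct delta, every matched pair collapses into one bin, so the peak count approaches the number of matches; a wrong delta smears the same matches across many bins.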
In the Matlab code referenced above, I tested 41 values of `delta`, ranging from -2% to 2% in steps of 0.1%. I found that 0.1% was a fine enough resolution to catch intermediate stretch values, and +/- 2% was enough to catch the worst case in my data. In practice, if the time stretching is done by resampling (i.e., "stretching the tape" or wrong playback speed, rather than more complex pitch-preserving time scaling), then stretches of more than ~1% will also tend to shift the frequency peak bins, so the landmarks won't match any more anyway. (The frequency bin range is 0-255, so a "typical" bin of ~50 will shift over to 49 or 51 some time before you get to 2% stretch/compress. Note also that the landmark depends on the timing between peaks, but this is only quantized to 0..63, so it has, on average, slightly broader bins that are more tolerant of the stretching.)
If you ignore this distortion of landmarks by time stretching (which was working for me), testing a range of deltas simply requires recalculating the set of `dt`s for each candidate `delta`, building the histogram, then finding the largest value in the histogram, so it's pretty quick. I think it would be easy to add to the existing audfprint code (optional, off by default, controlled by a couple of flags like `--max_time_stretch` and `--time_stretch_resolution`) and would be a nice addition. I'm tempted to dive in myself, but I don't have much time at the moment.
DAn.
from audfprint.
Gosh, I have zero memory of adding that function to the Matlab audfprint. No, I never added it to the python version. In principle, the same information is available, but it requires substantial additional code in the matcher. Sorry.
Heh ;) But the relevant bits of info are present in the hashtable? Just want to make sure before I try to do anything in there.
I guess the principle is relatively simple: currently we have the start time and end time (start time + length) of the query, along with the start time of the query in the source. I guess we also need the end time of the query in the source, from the last pair of matches, so we can calculate the time drift between the two. Something like that?
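That two-point idea can be sketched as follows (a hypothetical helper, not audfprint code; it assumes the model `source_time = offset + (1 + delta) * query_time`):

```python
def estimate_drift(query_start, query_end, source_start, source_end):
    """Two-point estimate of linear time drift: if
    source_t = offset + (1 + delta) * query_t, the ratio of the matched
    spans gives (1 + delta) directly, so delta = span_ratio - 1."""
    return (source_end - source_start) / (query_end - query_start) - 1.0
```

Note that a two-point estimate is sensitive to errors in the two endpoint matches; the histogram search over candidate deltas described earlier is more robust because it uses every matched pair.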
The relevant Matlab code is at:
https://labrosa.ee.columbia.edu/%7Edpwe/resources/matlab/audfprint/alignhashes.m.html
It actually performs a brute-force search for time warps in the range -2..2% in steps of 0.1% because I couldn't find a good closed-form way to estimate directly from the (time, time-offset) pairs.
The basic idea is that once you have all the skew times for the matching hashes at
https://github.com/dpwe/audfprint/blob/master/audfprint_match.py#L264
(right now it just looks for modes in the skew times, `sorted_hits[:, 1]`, using `np.bincount`), you would instead try adjusting each skew time by a linear function of the hash time (`sorted_hits[:, 3]`) to see if that adjustment increases the largest mode.
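Not the actual patch, but a sketch of how that adjustment might look, taking on faith the column layout described above (`sorted_hits[:, 1]` = skew, `sorted_hits[:, 3]` = query hash time, both in analysis frames):

```python
import numpy as np

def best_mode_with_stretch(sorted_hits, slopes):
    """For each candidate slope (i.e. delta), shift each skew by
    slope * hash_time, re-histogram with np.bincount, and keep the slope
    that yields the tallest mode.  Returns (slope, mode_count)."""
    skews = sorted_hits[:, 1].astype(float)
    times = sorted_hits[:, 3].astype(float)
    best_slope, best_count = 0.0, 0
    for slope in slopes:
        adj = np.round(skews - slope * times).astype(int)
        counts = np.bincount(adj - adj.min())  # shift so bincount sees >= 0
        if counts.max() > best_count:
            best_count = int(counts.max())
            best_slope = float(slope)
    return best_slope, best_count
```

A slope of zero reproduces the current behavior, so this could sit behind an optional flag without changing default results.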
Good luck if you dig into this!
Thanks a lot for the info, I'll be sure to make a pull request if I achieve anything useful ;)
@wegel were you able to make any progress here?
Trying to understand the request... Given that the current code bins matches into windows of likely offset, was the skew addition (above) meant to improve match timing granularity? It doesn't seem to add any robustness to actual timing skew (e.g., audio signal speed-up or slow-down, as sometimes employed by radio stations), right?
My actual use case was to detect the (pretty small, something like 0.1%) time skew so it could be corrected; I haven't worked on it, though.