
audfprint

Landmark-based audio fingerprinting.

Create a new fingerprint dbase with "new",
append new files to an existing database with "add",
or identify noisy query excerpts with "match".
"precompute" writes a *.fpt file under precompdir
with precomputed fingerprint for each input wav file.
"merge" combines previously-created databases into
an existing database; "newmerge" combines existing
databases to create a new one.

Usage: audfprint (new | add | match | precompute | merge | newmerge | list | remove) [options] [<file>]...

Options:
  -d <dbase>, --dbase <dbase>     Fingerprint database file
  -n <dens>, --density <dens>     Target hashes per second [default: 20.0]
  -h <bits>, --hashbits <bits>    How many bits in each hash [default: 20]
  -b <val>, --bucketsize <val>    Number of entries per bucket [default: 100]
  -t <val>, --maxtime <val>       Largest time value stored [default: 16384]
  -u <val>, --maxtimebits <val>   maxtime as a number of bits (16384 == 14 bits)
  -r <val>, --samplerate <val>    Resample input files to this [default: 11025]
  -p <dir>, --precompdir <dir>    Save precomputed files under this dir [default: .]
  -i <val>, --shifts <val>        Use this many subframe shifts building fp [default: 0]
  -w <val>, --match-win <val>     Maximum tolerable frame skew to count as a match [default: 2]
  -N <val>, --min-count <val>     Minimum number of matching landmarks to count as a match [default: 5]
  -x <val>, --max-matches <val>   Maximum number of matches to report for each query [default: 1]
  -X, --exact-count               Flag to use more precise (but slower) match counting
  -R, --find-time-range           Report the time support of each match
  -Q, --time-quantile <val>       Quantile at extremes of time support [default: 0.05]
  -S <val>, --freq-sd <val>       Frequency peak spreading SD in bins [default: 30.0]
  -F <val>, --fanout <val>        Max number of hash pairs per peak [default: 3]
  -P <val>, --pks-per-frame <val>  Maximum number of peaks per frame [default: 5]
  -D <val>, --search-depth <val>  How far down to search raw matching track list [default: 100]
  -H <val>, --ncores <val>        Number of processes to use [default: 1]
  -o <name>, --opfile <name>      Write output (matches) to this file, not stdout [default: ]
  -K, --precompute-peaks          Precompute just landmarks (else full hashes)
  -k, --skip-existing             On precompute, skip items if output file already exists
  -C, --continue-on-error         Keep processing despite errors reading input
  -l, --list                      Input files are lists, not audio
  -T, --sortbytime                Sort multiple hits per file by time (instead of score)
  -v <val>, --verbose <val>       Verbosity level [default: 1]
  -I, --illustrate                Make a plot showing the match
  -J, --illustrate-hpf            Plot the match, using onset enhancement
  -W <dir>, --wavdir <dir>        Find sound files under this dir [default: ]
  -V <ext>, --wavext <ext>        Extension to add to wav file names [default: ]
  --version                       Report version number
  --help                          Print this message

audfprint requires some Python packages; you can install them with: pip install -r requirements.txt

This version uses ffmpeg to read input files. You must have a working ffmpeg binary on your path (try ffmpeg -version at the command prompt).

Based on the Matlab prototype at http://www.ee.columbia.edu/~dpwe/resources/matlab/audfprint/ . This Python code can read and use databases created by the Matlab code (version 0.90 upwards).

Usage

Build a database of fingerprints from a set of reference audio files:

> python audfprint.py new --dbase fpdbase.pklz Nine_Lives/0*.mp3
Wed Sep 10 10:52:18 2014 ingesting #0:Nine_Lives/01-Nine_Lives.mp3 ...
Wed Sep 10 10:52:20 2014 ingesting #1:Nine_Lives/02-Falling_In_Love.mp3 ...
Wed Sep 10 10:52:22 2014 ingesting #2:Nine_Lives/03-Hole_In_My_Soul.mp3 ...
Wed Sep 10 10:52:25 2014 ingesting #3:Nine_Lives/04-Taste_Of_India.mp3 ...
Wed Sep 10 10:52:28 2014 ingesting #4:Nine_Lives/05-Full_Circle.mp3 ...
Wed Sep 10 10:52:31 2014 ingesting #5:Nine_Lives/06-Something_s_Gotta_Give.mp3 ...
Wed Sep 10 10:52:32 2014 ingesting #6:Nine_Lives/07-Ain_t_That_A_Bitch.mp3 ...
Wed Sep 10 10:52:35 2014 ingesting #7:Nine_Lives/08-The_Farm.mp3 ...
Wed Sep 10 10:52:37 2014 ingesting #8:Nine_Lives/09-Crash.mp3 ...
Added 63241 hashes (24.8 hashes/sec)
Processed 9 files (2547.3 s total dur) in 21.6 s sec = 0.008 x RT
Saved fprints for 9 files ( 63241 hashes) to fpdbase.pklz

Add more reference tracks to an existing database:

> python audfprint.py add --dbase fpdbase.pklz Nine_Lives/1*.mp3
Read fprints for 9 files ( 63241 hashes) from fpdbase.pklz
Wed Sep 10 10:53:14 2014 ingesting #0:Nine_Lives/10-Kiss_Your_Past_Good-bye.mp3 ...
Wed Sep 10 10:53:16 2014 ingesting #1:Nine_Lives/11-Pink.mp3 ...
Wed Sep 10 10:53:18 2014 ingesting #2:Nine_Lives/12-Attitude_Adjustment.mp3 ...
Wed Sep 10 10:53:20 2014 ingesting #3:Nine_Lives/13-Fallen_Angels.mp3 ...
Added 27067 hashes (22.0 hashes/sec)
Processed 4 files (1228.6 s total dur) in 13.0 s sec = 0.011 x RT
Saved fprints for 13 files ( 90308 hashes) to fpdbase.pklz

Match a fragment recorded of music playing in the background against the database:

> python audfprint.py match --dbase fpdbase.pklz query.mp3
Read fprints for 13 files ( 90308 hashes) from fpdbase.pklz
Analyzed query.mp3 of 5.573 s to 204 hashes
Matched query.mp3 5.573 sec 204 raw hashes as Nine_Lives/05-Full_Circle.mp3 at 50.085 s with 8 of 9 hashes
Processed 1 files (5.8 s total dur) in 2.6 s sec = 0.443 x RT

The query contained audio from Nine_Lives/05-Full_Circle.mp3 starting 50.085 sec into the track. There were a total of 9 landmark hashes in common between the query and that track, and 8 of them had a consistent time offset. Generally, anything more than 5 or 6 consistently-timed matching hashes indicates a true match; random chance will result in fewer than 1% of the raw common hashes being temporally consistent.
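The counting described above can be sketched in a few lines. This is an editorial illustration of the idea of temporally-consistent hash counting, not audfprint's actual implementation (which operates on packed NumPy arrays):

```python
from collections import Counter

def consistent_matches(query_hashes, ref_hashes):
    """Each argument is a list of (frame_time, hash) pairs.
    Returns (n_common, n_consistent): the raw number of common hashes,
    and how many of them share the single most popular
    query-to-reference time offset."""
    ref_times = {}
    for t, h in ref_hashes:
        ref_times.setdefault(h, []).append(t)
    offsets = Counter()
    n_common = 0
    for t, h in query_hashes:
        for rt in ref_times.get(h, []):
            offsets[rt - t] += 1
            n_common += 1
    n_consistent = max(offsets.values()) if offsets else 0
    return n_common, n_consistent
```

A query hash that matches the reference at the "right" offset votes for that offset; spurious matches scatter across many offsets and rarely pile up past 5 or 6 votes.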

Merge a previously-computed database into an existing one:

> python audfprint.py merge --dbase fpdbase.pklz fpdbase0.pklz
Wed Apr  8 18:31:29 2015 Reading hash table fpdbase.pklz
Read fprints for 4 files ( 126989 hashes) from fpdbase.pklz
Read fprints for 9 files ( 280424 hashes) from fpdbase0.pklz
Saved fprints for 13 files ( 407413 hashes) to fpdbase.pklz

Merge two existing databases to create a new, third one:

> python audfprint.py newmerge --dbase fpdbase_new.pklz fpdbase.pklz fpdbase0.pklz
Read fprints for 4 files ( 126989 hashes) from fpdbase.pklz
Read fprints for 9 files ( 280424 hashes) from fpdbase0.pklz
Saved fprints for 13 files ( 407363 hashes) to fpdbase_new.pklz

Locating Matches

To find out not just that two files match, and not just the relative timing between them that makes them line up, but the exact time ranges that match in both query and reference files, use --find-time-range:

> python audfprint.py match --dbase fpdbase.pklz query.mp3 --find-time-range
Sun Aug  9 18:13:54 2015 Reading hash table fpdbase.pklz
Read fprints for 9 files ( 158827 hashes) from fpdbase.pklz
Sun Aug  9 18:13:57 2015 Analyzed #0 query.mp3 of 5.619 s to 928 hashes
Matched    3.6 s starting at    0.8 s in query.mp3 to time   50.9 s in Nine_Lives/05-Full_Circle.mp3 with    12 of    39 common hashes at rank  0
Processed 1 files (5.8 s total dur) in 2.6 s sec = 0.451 x RT

Notice how the message includes the precise duration and time points spanning the match in both the query and the reference item. Because a single spurious match elsewhere in the file can produce misleading results, these times are calculated after discarding a small proportion of the earliest and latest matches; this proportion is set by --time-quantile, which defaults to 0.05 (5% of matches ignored at each end of the match region when calculating its time range).
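The quantile trimming can be sketched as follows; this is an illustrative fragment, not the matcher's actual code (which works on its internal hit arrays):

```python
import numpy as np

def match_time_range(match_times, quantile=0.05):
    """Start and end of the match support, after discarding the earliest
    and latest `quantile` fraction of matching-hash times."""
    t = np.sort(np.asarray(match_times))
    k = int(quantile * len(t))        # number of matches trimmed at each end
    return int(t[k]), int(t[len(t) - 1 - k])
```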

Scaling

The fingerprint database records 2^20 (~1M) distinct fingerprints, with (by default) 100 entries for each fingerprint bucket. When the bucket fills, track entries are dropped at random; since matching depends only on making a minimum number of matches, but no particular match, dropping some of the more popular ones does not prevent matching. The Matlab version has been successfully used for databases of 100k+ tracks. Reducing the hash density (--density) leads to smaller reference database size, and the capacity to record more reference items before buckets begin to fill; a density of 7.0 works well.
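The random dropping behaves like reservoir-style replacement. The sketch below illustrates that idea under stated assumptions; audfprint's actual replacement policy may differ in detail:

```python
import random

def bucket_add(bucket, seen, depth, entry):
    """Add entry to one hash's fixed-size bucket.
    While the bucket has room, append; once full, give the new entry a
    random slot among the `seen + 1` logical positions, so every entry
    (old or new) survives with roughly equal probability."""
    if seen < depth:
        bucket.append(entry)
    else:
        slot = random.randrange(seen + 1)
        if slot < depth:
            bucket[slot] = entry   # overwrites (drops) an earlier entry
    return seen + 1
```

Because matching only needs some minimum number of surviving hashes, losing a random subset of an over-popular fingerprint's entries is harmless.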

Times (in units of 256 samples, i.e., 23 ms at the default 11 kHz sampling rate) are stored in the bottom 14 bits of each database entry, meaning that times larger than 2^14 * 0.023 = 380 sec, or about 6 minutes, are aliased. To identify time offsets correctly in tracks longer than this, use a larger --maxtimebits; e.g., --maxtimebits 16 increases the time range to 2^16 = 65,536 frames, or about 25 minutes at 11 kHz. The trade-off is that the remaining bits of each 32-bit entry (18 bits for the default 14-bit times) store the track ID, so by default the database can only distinguish 2^18 = 262k tracks. Using a larger --maxtimebits reduces this limit; conversely, reducing --maxtimebits increases the number of distinct tracks. Fewer time bits do not prevent matching, but they progressively reduce discrimination as the number of distinct time slots shrinks (and can make the reported time offsets, and the time ranges from --find-time-range, completely wrong for longer tracks).
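The 32-bit entry layout can be pictured with a couple of helpers (these names are illustrative, not audfprint's API):

```python
MAXTIMEBITS = 14                     # default --maxtimebits
TIMEMASK = (1 << MAXTIMEBITS) - 1    # low 14 bits hold the frame time

def pack_entry(track_id, frame_time):
    """Track ID in the high bits, time modulo 2**14 frames in the low bits."""
    return (track_id << MAXTIMEBITS) | (frame_time & TIMEMASK)

def unpack_entry(entry):
    return entry >> MAXTIMEBITS, entry & TIMEMASK

# A frame time of 20000 aliases to 20000 - 16384 = 3616:
assert unpack_entry(pack_entry(7, 20000)) == (7, 3616)
```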

audfprint's People

Contributors

arthurtofani, cabalist, dpwe, hellowlol, moishy90


audfprint's Issues

how to get audio file duration

Hello,

How can I get or calculate a file's duration from the database? Can I store the file duration while adding it to the database?

I want to query a long mixed/combined audio file whose original files are in the database. Is that possible already?

PS: I noticed that when I re-add files to the database, the hash count doubles but the number of files stays the same.

Import Usage

Hi, if I were to import this library into a larger Python script, how would I go about running the equivalent of the following commands:
python audfprint.py match -x 10 --dbase test.pklz path/to/possible/match.wav
python audfprint.py add -x 10 --dbase test.pklz path/to/new/file.wav

I did not see anything in the documentation about how to use the program within a Python script.

By the way, thanks for writing such a useful program.

two little bugs?

1, In audfprint.py line 358, it should be
-v, --verbose Verbosity level
2, In audfprint_match.py line 183, it should be
results.resize((maxnresults, 7))

calculate time_ranges

Hi, professor Dan
Shouldn't the _calculate_time_ranges() function be modified as:

def _calculate_time_ranges(self, hits, id, mode, mintime):
    """Given the id and mode, return the actual time support."""
    match_times = sorted(hits[row, 3]
                         for row in np.nonzero(hits[:, 0] == id)[0]
                         if mode + mintime - self.window <= hits[row, 1]
                         and hits[row, 1] <= mode + mintime + self.window)

and execute: min_time, max_time = self._calculate_time_ranges(hits, id, mode, mintime) in _approx_match_counts().

since (mode + mintime) physically indicates the real time skew.

Matching fails with ValueError: The first argument of bincount must be non-negative

Dan Schultz reports:

I wanted to let you know I'm seeing an error from audfprint:

Traceback (most recent call last):
File "/usr/local/bin/audfprint/audfprint.py", line 482, in
main(sys.argv)
File "/usr/local/bin/audfprint/audfprint.py", line 465, in main
strip_prefix=args['--wavdir'])
File "/usr/local/bin/audfprint/audfprint.py", line 155, in do_cmd
msgs = matcher.file_match_to_msgs(analyzer, hash_tab, filename, num)
File "/usr/local/bin/audfprint/audfprint_match.py", line 326, in file_match_to_msgs
rslts, dur, nhash = self.match_file(analyzer, ht, qry, number)
File "/usr/local/bin/audfprint/audfprint_match.py", line 317, in match_file
rslts = self.match_hashes(ht, q_hashes)
File "/usr/local/bin/audfprint/audfprint_match.py", line 272, in match_hashes
results = self._approx_match_counts(hits, bestids, rawcounts)
File "/usr/local/bin/audfprint/audfprint_match.py", line 228, in _approx_match_counts
allbincounts = np.bincount((allids << timebits) + alltimes)
ValueError: The first argument of bincount must be non-negative
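One plausible trigger, offered as a hypothesis rather than a confirmed diagnosis: if (allids << timebits) + alltimes is evaluated in 32-bit integers, a track ID of 2^17 or more overflows past 2^31 and wraps negative, which produces exactly this ValueError:

```python
import numpy as np

# With 14 time bits, any id >= 2**17 pushes (id << 14) to 2**31 or beyond,
# which wraps negative in int32 arithmetic:
ids = np.array([131072], dtype=np.int32)   # 2**17
packed = np.left_shift(ids, 14)
assert packed[0] < 0                       # wrapped to -2147483648

try:
    np.bincount(packed)                    # rejects negative inputs
except ValueError as err:
    print("bincount raised:", err)
```

Casting to a 64-bit type before the shift (e.g. allids.astype(np.int64)) would avoid the wraparound, if this is indeed the cause.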

Windows vs. LINUX

audfprint appears to be Linux-focused.

  1. The resource module and the logging method in audfprint_match.py are geared towards *NIX.
    Here is a possible option for this issue (requires psutil; not sure why the psutil function names differ between platforms):
import os,platform,psutil

def process_info():
    rss=usrtime=0
    p=psutil.Process(os.getpid())
    if platform.system().lower()=='windows':
        rss=p.memory_info()[0]
        usrtime=p.cpu_times()[0]
    elif platform.system().lower()=='linux':
        rss=p.get_memory_info()[0]
        usrtime=p.get_cpu_times()[0]
    return rss,usrtime

if __name__ == "__main__":
    print(process_info())
  2. Initial review of the filelist built from the glob specification on the command line appears to expect a list of files returned.
  3. There appears to be a problem with full Windows path names (C:\fullpath...); using the --list mode helps.

self.table hash contradiction

Hi @dpwe,

I'm investigating on how the self.table in hash_table.py works when inserting and querying data. I found some confusion in this issue, please see my notes below:

# In hash_table.py, function get_hits():
# hashes = parameter query hash returned from wavfile2hashes (list of (time, hash) tuples)
# hash_ is taken straight from hashes, means the hash value is NOT masked
hash_ = hashes[ix][1]

# We check how many data contained in the self.counts for the given hash_
# hash_ is still NOT masked which is correct because self.counts stores unmasked hashes
nids = min(self.depth, self.counts[hash_])

# This is where it gets confusing, because we are trying to get the table values using the unmasked hash
tabvals = self.table[hash_, :nids]

# But the implementation of storing data into self.table in hash_table.py function store() is like this:
# sortedpairs = time-hash pairs ingested from sound file, returned from wavfile2hashes (list of (time, hash) tuples)
for time_, hash_ in sortedpairs:
    # This is still OK because we are getting the counts using the original hash (not masked)
    count = self.counts[hash_]
    time_ &= timemask
    # This is the point where the hash_ variable is replaced with the masked version of hash_
    hash_ &= hashmask
    val = (idval + time_) #.astype(np.uint32)
    if count < self.depth:
    # This is where we store into self.table USING the MASKED VERSION of hash_, which contradicts the previous retrieval from self.table
        self.table[hash_, count] = val

TL;DR: we store values into the hash table using the masked hash as the key, but then retrieve values from the hash table using the original, unmasked hash as the key. Can you kindly explain the situation, please? Thanks!
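An editorial note on a likely resolution, assuming the analyzer only ever emits hashes smaller than 2**hashbits: in that case the mask is a no-op at store time, so storing with the masked hash and querying with the raw hash index the same table rows:

```python
HASHBITS = 20                      # default --hashbits
HASHMASK = (1 << HASHBITS) - 1

# For any hash that already fits in HASHBITS bits, masking changes nothing,
# so the store key and the query key are identical:
for h in (0, 12345, HASHMASK):
    assert (h & HASHMASK) == h
```

The two paths would only diverge for hashes with bits above HASHBITS set, which under this assumption never occur.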

using .pklz against .pklz for match

Hey, can we use a .pklz file against another .pklz for querying (matching)?
For example: python3 audfprint.py match --dbase ads.pklz recs.pklz --find-time-range --exact-count --max-matches 200 --min-count 50 --opfile results.out

Precompdir / wavdir behavior

I may be using the parameters incorrectly, but the output isn't behaving the way I would expect

I have a folder for my media /var/audfprint/media_cache, and folder for fingerprints /var/audfprint/afpt_cache

I'm passing them in using --precompdir and --wavdir as appropriate. I would expect a fingerprint for a given media file in the media cache to appear directly in the afpt cache. However, the resulting file is nested inside a directory path in the afpt cache.

It would be very helpful to be able to specify that I don't want audfprint to recreate my directory structure in the fingerprint directory.

Example Input:

$> python audfprint.py precompute --precompdir /var/audfprint/afpt_cache --wavdir /var/audfprint/media_cache media-1.mp3

Output:

> Thu Dec 31 18:29:05 2015 precomputing hashes for /var/audfprint/media_cache/media-1.mp3 ...
> wrote /var/audfprint/afpt_cache/var/audfprint/media_cache/media-1.afpt ( 49183 hashes, 1859.936 sec)

Desired Output:

> Thu Dec 31 18:29:05 2015 precomputing hashes for /var/audfprint/media_cache/media-1.mp3 ...
> wrote /var/audfprint/afpt_cache/media-1.afpt ( 49183 hashes, 1859.936 sec)

Reducing dependencies

Hey, you have made a great algorithm. I need to eliminate the ffmpeg dependency. Is there a way I can avoid using that package, or an alternate library that does not use ffmpeg in the background? Thanks!

Precomputing makes the accuracy worse!

Hey Dan, hope you are doing great.

So I precomputed some 'recordings' in which I wanted to query a subclip that is contained in all of the said 'recordings'.
In parallel, I also saved the 'recordings' as a single .pkl database and queried the same subclip against it.
It turns out the first method fails to recognize the subclip in many of the recordings, whereas the second method works flawlessly.
Attached below is just one such instance:-

Results by 1st method: NOMATCH precomp/home/ubuntu/mm/audfprint-master/tests/data/ABC001 2018-09-08 16-00-00.afpt 3655.4 sec 377315 raw hashes

Results by 2nd method: Matched 46.6 s starting at 935.6 s in ./tests/data/ABC001 2018-09-08 16-00-00.mp3 to time 2.0 s in ./tests/data/adi/clip.mp3 with 1132 of 35696 common hashes at rank 0 count 8

Hope I make the issue clear enough,
Thanks!

Removing the last hash from a database

Apologies for the sparsely written issue, I can add more details after an impending deadline!

I noticed that when I removed the only hash from a database I got a division-by-zero error at the line where the percentage of dropped hashes is computed.

Steps to reproduce:

  1. Add a file to a new database
  2. Remove it from that database (without adding any others)

Any way to force a minimum # peaks/second?

I've been using audfprint for spectral peak analysis in order to understand how/what Shazam et al. see as "peaks" within a given track. So I've mostly been using the precompute and -K options to produce peak files, and then extracting the location/frequency pairs from the afpk file to use elsewhere.

I'm noticing that, especially at relatively low densities (say, 10 hashes/sec), I can end up with no peaks detected for long stretches. For example, on a test track I'm using I'll end up without any peaks in a stretch of 4 seconds or more. This is for sonic material with perceptible activity.

So I have two questions about this.

First, is there any way to force a higher sensitivity without requiring an overall higher density? In other words, in the areas with lots of peaks (at a density of 10 hashes/sec), I don't need any more in the dense areas, but I would appreciate more peak detection in the lulls. Can I ask for a minimum hashes/sec?

Second, does Shazam et al. ever let that long a stretch go without recorded peaks? My intuition from playing with it is no, but I was surprised to encounter such low peak detection.

Memory Error when writing out 50k tracks on a 1GB machine

When I upped the machine to 3GB of virtual memory, it worked fine. I tried using --addcheckpoint to see if I could write it out before it got too large, but it did not seem to be a recognized command-line option in the newest release.

The real time to search 50k tracks with a 10-second clip in a docker container (rajbot/audfprint) was 11 seconds.

-brewster

Max file in database

Hi!

I'm creating a database out of 10k files. My question is: is this possible? I've seen this message appear while processing the files:

Read fprints for 2023 files ( 8013111 hashes) from fpdbase.pklz (1.09% dropped)

The dropped percentage keeps getting higher, and the number of files "2023" doesn't increase.

Should I create smaller databases?

Thanks!

samplerate test

I tested the sample rate for ingest of content and query values as follows. Note Case 1 is correct. All source data is originally 8 kHz.
--exact-count
--min-count 1
--density 100
--max-matches 10
--match-win 5
--pks-per-frame 2

Case 1: --samplerate 11000. The first hit is the query matching itself in the content, the second instance is a match, and 62.1 seconds is another instance. All detections in Case 1 are correct.

at 46.476 s with 2 of 9 hashes at rank 1
at 47.686 s with 1 of 9 hashes at rank 1
at 62.183 s with 1 of 9 hashes at rank 1

Case 2: Here we have a match with itself; however, it misses the matches at 47.6 and 62 in the top 10 results.

With 8 kHz data, --samplerate 8000:

 at 46.048 s with 3 of 13 hashes at rank 1
 at 332.416 s with 2 of 13 hashes at rank 1
 at 12.672 s with 1 of 13 hashes at rank 1
 at 13.472 s with 1 of 13 hashes at rank 1
 at 97.408 s with 1 of 13 hashes at rank 1
 at 121.056 s with 1 of 13 hashes at rank 1
 at 146.304 s with 1 of 13 hashes at rank 1
 at 257.216 s with 1 of 13 hashes at rank 1
 at 323.008 s with 1 of 13 hashes at rank 1

Merging Tables With Different Hashbits

Hi,

I'm trying to merge tables with different numbers of hashbits. When I match against the original table it detects songs correctly, but when I match against the table that was merged with the original table, it doesn't detect songs.

I don't really understand your implementation of the hash table, so if this operation doesn't make sense, please let me know.

I'm doing this so that I can keep the size of the original hash table small. I'm storing many small pickled files in a database and merging those files into a hash table in memory when needed.

Package

Any chance we could turn this in to a package and publish it to pypi?

One audio file cannot be matched

Hi,

I ingested 5 audio files to build a pklz file; strangely, only 4 of them can be matched (with good results), and one audio file cannot be recognized. It seems the mode for this audio file cannot be found. How should I change the parameters to recognize it? And which parameters most influence the results?

Thanks.

Precomputed technique gives same start time for different snippets when using larger data sets

I am precomputing the pklz files for a mixture of wav and mp3 files that make up a collection of 68 large files. Then I check whether 1249 snippets appear in one of these 68 larger files.

Interestingly, I always find that the last value is wrong and the same as the penultimate value, as if there is an algorithmic error.

Example:

The penultimate and last values show the same start times:
Matched precomp/source/aud/20100216_LeadersQuestions.afpt 409.9 sec 14841 raw hashes as segmented/aud/Po01PtFG_20100216_st_0019.wav at -275.7 s with 590 of 662 common hashes at rank 8
Matched precomp/source/aud/20100216_LeadersQuestions.afpt 409.9 sec 14841 raw hashes as segmented/aud/Po01PtFG_20100216_st_0010.wav at -275.7 s with 585 of 662 common hashes at rank 9

This problem does not arise if I create a much smaller number of .afpt files and send all segmented/aud/Po01PtFG_20100216_st_*.wav into it.

I'm wondering, am I running the code with slightly wrong options, and is this the cause of the problem? If I run with density = 100 I get a few snippets of duration > 10 secs that are not recognised by the algorithm. If I set density = 50, only snippets < 2.5 secs are not recognised, which I would be throwing away anyway. With much smaller densities I get more and more NOMATCH results.

My script for running the software looks something like this:

ls source/aud/*.wav source/aud/*.mp3 > largeFiles.list
ls segmented/aud/$2 > allSnippets.list

rm -rf precomp

echo "Precompute peaks for all large files..."
./audfprint.py precompute --samplerate 11024 --density 50 --shifts 1 --precompdir precomp --ncores 4 --list largeFiles.list
find precomp/ -name "*.afpt" > precomp.list

echo "Take the snippets and build a database of their peak profiles..."
./audfprint.py new --dbase snippets.db --density 50 --samplerate 11025 --shifts 4 --list allSnippets.list

echo "Lastly, find matches and arrange on screen to make the information easier to read..."
./audfprint.py match --dbase snippets.db --match-win 2 --min-count 20 --max-matches 100 --sortbytime --opfile matches.txt --ncores 4 --list precomp.list

echo "Check against known results..."
grep Matched matches.txt | sed -e "s@precomp@@" -e "s@/2/data/@@g" -e "s/.mp3//" -e "s/.wav//" -e "s/.afpt//" | awk '{printf "%s\t%2d:%2.1f\t%s\n",$15,(-$11%3600/60),(-$11%60),$9}' > recognised.txt

Trying to use a database

Hi,

I'm looking into delegating some of the scalability issues to a known database, for now MySQL.

I can read fingerprints (with your code) and store them in MySQL (using some dejavu db code); and I can read hash matches back:
https://gist.github.com/Laurian/7869355a000c803f26bb434935a367cb#file-test-py

I'm struggling with how to feed those hash matches back into your further processing as I don't quite follow the magic around hashmask, timemask and some of the numpy operations you do.

How would you recommend approaching alternate hashtable implementations?

Calculate the fingerprint from a smartphone

Hi,
This tool works very well.
I have wrapped it in a node server and it works like a charm.
However, when a request is sent to my server, the server calculates the fingerprint from the file, and the CPU is used at 100%.

For my project, that's not viable. I was wondering whether it would be possible to calculate the fingerprint on a smartphone, in Java (Android) and Swift (iOS), because I guess that if the CPU is at 100%, it's mainly because of the fingerprint calculation?

Thanks

How search works?

Could you please explain how this landmark-based fingerprint search works in your implementation? Also, assuming I want to represent the audio in a different way (say, a spectrum or correlogram instead of a fingerprint), would the landmark-based approach still work?
Thank you.

new user question

Hi,

To see if I had installed correctly, I ran "new" and then "match" on the same sample mp3 file. However, match ran for a few minutes and then printed "Terminated".

I can't tell if that error matters.

Would appreciate any help. Apologies if I am doing something wrong.

Is there a post-install reference check I can do to see if my install is ok?

python audfprint.py new -d animals2 references/280.mp3
/usr/local/lib/python2.7/dist-packages/librosa/core/audio.py:33: UserWarning: Could not import scikits.samplerate. Falling back to scipy.signal
warnings.warn('Could not import scikits.samplerate. '
Thu Oct 15 07:42:13 2015 ingesting #0: references/280.mp3 ...
Added 162 hashes (16.7 hashes/sec)
Processed 1 files (9.7 s total dur) in 8.9 s sec = 0.913 x RT
Saved fprints for 1 files ( 162 hashes) to animals2
Dropped hashes= 0 (0.00%)
root@ubuntu:~/freetype-2.5.3/audfprint-master# python audfprint.py match -d animals2 references/280.mp3
/usr/local/lib/python2.7/dist-packages/librosa/core/audio.py:33: UserWarning: Could not import scikits.samplerate. Falling back to scipy.signal
warnings.warn('Could not import scikits.samplerate. '
Thu Oct 15 07:43:08 2015 Reading hash table animals2
Terminated

MemoryError

machine info : i3 2nd series 4 cores, 8gb ram, 256 gb ssd, xubuntu 16.04
query rec_170126_080002.afpt or orignal file is FM radio recording.
fps : 23348 files ( 2022959318 hashes) from /afp/songs.db
database created using : adsa = ['audfprint.py', 'new', '--dbase', songs_db_file, '--density', '100', '--skip-existing', '--shifts', '6', '--maxtime', '32768', '--ncores', '4', '--list', tmpFile]
-- args.

With --ncores 4 there is no error like the one below, but it freezes; without --ncores 4 (i.e. a single core), the following error occurs.
What configuration do you suggest? Or what can we do?

(Line numbers may not be the same as in the original file; I added custom lines such as a song duration table, etc.)

./bin/audfprint.py match --dbase /afp/songs.db --match-win 2 --min-count 200 --max-matches 100 --sortbytime --opfile /afp/runs/tmp/songs_0126_141018.rec --find-time-range --list /afp/runs/tmp/run_0126_141018.rec
Read fprints for 23348 files ( 2022959318 hashes) from /afp/songs.db
Thu Jan 26 14:55:58 2017 Analyzed #0 /afp/precomp/1701/26/rec/rec_170126_080002.afpt of 3401.259 s to 298482 hashes
Traceback (most recent call last):
  File "./bin/audfprint.py", line 490, in <module>
    main(sys.argv)
  File "./bin/audfprint.py", line 473, in main
    strip_prefix=args['--wavdir'])
  File "./bin/audfprint.py", line 161, in do_cmd
    msgs = matcher.file_match_to_msgs(analyzer, hash_tab, filename, num)
  File "/afp/bin/audfprint_match.py", line 335, in file_match_to_msgs
    rslts, dur, nhash = self.match_file(analyzer, ht, qry, number)
  File "/afp/bin/audfprint_match.py", line 326, in match_file
    rslts = self.match_hashes(ht, q_hashes)
  File "/afp/bin/audfprint_match.py", line 281, in match_hashes
    results = self._approx_match_counts(hits, bestids, rawcounts)
  File "/afp/bin/audfprint_match.py", line 234, in _approx_match_counts
    allbincounts = np.bincount((allids << timebits) + alltimes)
MemoryError

Ability to store different identifiers for audio

At the moment audfprint uses the filename/path as identifier for the songs.

It would be great to have an option to pass other values to audfprint. Maybe by an additional parameter.

I plan to use audfprint together with data from musicbrainz.org and would like to store those IDs for the ingested songs.

Make initial search depth a parameter

Currently, Matcher.match_hashes only examines the top 100 most promising ref items to calculate the modal time skews and time-filtered hash counts. For larger databases, this may not be deep enough to find true matches. In any case, hard-coding a parameter like this is very poor style.

Report time support

Add an option to calculate and report the time support within the query of the matching hashes i.e. the earliest and latest times of found matching hashes. (This also gives the time support within the reference when added to the skew time). Useful for organizing multiple matching excerpts in a single query, even if they come from arbitrary places within ref items.

Concatenating precomputed hashes of consecutive parts of a file

Is there a way to concatenate precomputed hashes so that the times align properly?

I have recorded consecutive mp3 files and I precompute them. I would like to concatenate the precomputed files without needing to attach the original mp3 files and precomputing the attached file.

I'm using your code as a library, so if you can share a code sample that would be great.

I've tried offsetting the hash times by 430 for each file I add, but it's not exact, even if I get the length of the original file using audio_read.
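A hedged sketch of the offsetting approach, assuming the default 11025 Hz sample rate and 256-sample hop (about 43.07 frames per second). Because each file's duration is a fractional number of frames, a fixed integer offset such as 430 accumulates drift; rounding each file's own duration keeps every boundary's error under half a frame, though small errors can still build up over many files:

```python
SR = 11025    # default sample rate
HOP = 256     # samples per analysis frame

def concat_hashes(per_file_hashes, durations_sec):
    """per_file_hashes: one list of (frame_time, hash) pairs per file;
    durations_sec: each file's duration in seconds.
    Returns a single merged list, with every file's frame times shifted
    by the cumulative (rounded) frame count of the files before it."""
    merged, offset = [], 0
    for hashes, dur in zip(per_file_hashes, durations_sec):
        merged.extend((t + offset, h) for t, h in hashes)
        offset += round(dur * SR / HOP)
    return merged
```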

Improve matches for pitch/tempo-shifted song samples

This may well be one of the inherent disadvantages of the algorithm used in audfprint, but is there a way (via the options?) to improve the match percentage for songs whose pitch or tempo was changed?

In my tests (using Ableton), once the pitch is shifted by more than ±1%, songs stop being matchable. The same goes for changing only the tempo (preserving the key) by more than ±5%.

-Lior

doesn't run under ubuntu 16.04

./audfprint.py

Traceback (most recent call last):
  File "./audfprint.py", line 24, in <module>
    import audfprint_analyze
  File "/audfprint/audfprint_analyze.py", line 26, in <module>
    import librosa
  File "/usr/local/lib/python2.7/dist-packages/librosa/__init__.py", line 12, in <module>
    from . import core
  File "/usr/local/lib/python2.7/dist-packages/librosa/core/__init__.py", line 108, in <module>
    from .time_frequency import *  # pylint: disable=wildcard-import
  File "/usr/local/lib/python2.7/dist-packages/librosa/core/time_frequency.py", line 10, in <module>
    from ..util.exceptions import ParameterError
  File "/usr/local/lib/python2.7/dist-packages/librosa/util/__init__.py", line 70, in <module>
    from . import decorators
  File "/usr/local/lib/python2.7/dist-packages/librosa/util/decorators.py", line 67, in <module>
    from numba.decorators import jit as optional_jit
  File "/usr/local/lib/python2.7/dist-packages/numba/__init__.py", line 9, in <module>
    from . import config, errors, runtests, types
  File "/usr/local/lib/python2.7/dist-packages/numba/config.py", line 11, in <module>
    import llvmlite.binding as ll
  File "/usr/local/lib/python2.7/dist-packages/llvmlite/binding/__init__.py", line 10, in <module>
    from .module import *
  File "/usr/local/lib/python2.7/dist-packages/llvmlite/binding/module.py", line 8, in <module>
    from .value import ValueRef
  File "/usr/local/lib/python2.7/dist-packages/llvmlite/binding/value.py", line 9, in <module>
    class Linkage(enum.IntEnum):
AttributeError: 'module' object has no attribute 'IntEnum'

Has anyone had the same issue?

Retrieving more than 1 response for a "match" request

On a database of 50k tracks, many of which are near-duplicates (the works of Elvis across many albums), I only got one response back for hounddog, and it was not even the one it was originally sampled from.

Is there a parameter to retrieve more hits?

-brewster

Choose landmarks randomly within "landing zone"

Currently, peaks are paired with peaks in the allowable time/frequency range on a first-come, first-served basis until the fanout is exhausted. To match more uniformly between samples with widely differing amounts of clutter (or even different densities), the fanout should instead be a uniform random sample of the peaks in the range.
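
The proposal could be sketched like this (not current audfprint behavior; names are illustrative):

```python
# Sketch of uniform random pairing within the landing zone: instead of taking
# the first `fanout` eligible peaks, draw a uniform sample from all of them.
import numpy as np

rng = np.random.default_rng(0)

def pick_pairs(candidate_peaks, fanout=3):
    """Uniformly sample up to `fanout` peaks from the landing zone."""
    n = len(candidate_peaks)
    if n <= fanout:
        return list(candidate_peaks)
    idx = rng.choice(n, size=fanout, replace=False)
    return [candidate_peaks[i] for i in sorted(idx)]

zone = [(10, 200), (11, 310), (12, 150), (13, 400), (14, 90)]
print(pick_pairs(zone, fanout=3))
```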

Shorter query question

I was doing some tests with the code.
The only difference is that the query is now 1 s long. To make it work, I increased the landmark density and changed the target zone to a smaller one (63 bins and 15 symbols). Even without noise or degradation, the recognition rate is only 92%. Is this expected, given that the code is designed mainly for queries of at least 5 s?

search by track id/name?

Hello,
Is it possible to search only among the hashes whose track_id or filename is known, rather than the whole database? If so, where should I begin? I am new to Python and don't understand the code very well, so I would appreciate some help.

I believe that if the database is huge (thousands of tracks), running many queries takes a lot of resources and time, right? Would this kind of filtering help much?

About max bucketsize.

Hello dpwe,
An error occurred when I set --bucketsize to 512 or more.
Why is the maximum bucket size limited to 511?
Sorry for my poor English.
Thanks.

Support larger hashes

Rather than hard-coding the storage of peak pairs, allow for more informative (but more miss-prone) hashes by combining three or more peaks.
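
One illustrative bit layout for a three-peak hash (field widths here are arbitrary assumptions, and audfprint's real two-peak hashes are packed differently):

```python
# Hypothetical three-peak hash: 8 bits for the first peak's frequency bin,
# then 6-bit frequency deltas and 5-bit time deltas for the second and third
# peaks, 30 bits total. More fields mean a more selective hash, but any one
# peak being lost to noise kills the whole triple.
def pack_triple(f1, df2, dt2, df3, dt3):
    h = f1 & 0xFF
    h = (h << 6) | (df2 & 0x3F)
    h = (h << 5) | (dt2 & 0x1F)
    h = (h << 6) | (df3 & 0x3F)
    h = (h << 5) | (dt3 & 0x1F)
    return h

def unpack_triple(h):
    dt3 = h & 0x1F; h >>= 5
    df3 = h & 0x3F; h >>= 6
    dt2 = h & 0x1F; h >>= 5
    df2 = h & 0x3F; h >>= 6
    return (h & 0xFF, df2, dt2, df3, dt3)

print(unpack_triple(pack_triple(120, 17, 9, 33, 21)))
```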

Large memory footprint and computational times

The software seems to take over my whole Debian Jessie dual quad-core machine (Intel i7) when performing a precompute, with ncores=1, on a 24-hour video obtained from http://oireachtasdebates.oireachtas.ie/. The files are about 4-6.4 GB in size, but do they need to be fully loaded at the time of processing? Or does the algorithm require the full video in memory so it can jump around within it? I'm guessing that the full RAM and swap are being depleted: free under Linux shows 6.4 GB used and 1.6 GB, so memory use seems proportional to the size of the file being processed. The load average on my machine is over 9! Can an option be added to reduce this problem? I know I can use avconv to split the videos into 1-hour segments, but that's not a great workaround.

Also, some files process in 2 minutes whilst others take 150 minutes.

I'm using density=50.

Scaling Question

Hi,

I have a few questions regarding scaling:

  1. Considering the current implementation of HashTable, which holds 1,048,576 unique hashes with 100 entries per hash, you've explained that this configuration can store more than 100k song tracks. If I have millions of song tracks, would that lessen the detection accuracy across all of them? As I understand it, once the bucket for a specific hash is filled (over 100 entries), a new song/track id replaces an existing entry at a random position among the 100.

  2. If number 1 is correct, and I increase both the number of unique hashes and the number of bucket entries per hash, will that allow better detection against millions of songs/tracks? I assume it will trade off against performance/speed.

Thanks.
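
Regarding point 1, a toy model of the replacement behavior (my simplification, not the real HashTable code): once a bucket is full, each new entry overwrites a uniformly random slot, so earlier entries are gradually thinned out as collisions accumulate.

```python
# Toy bucket-overflow model: append until full, then randomly evict.
import random

random.seed(0)

def insert(bucket, entry, bucketsize=100):
    """Append until the bucket is full, then overwrite a random slot."""
    if len(bucket) < bucketsize:
        bucket.append(entry)
    else:
        bucket[random.randrange(bucketsize)] = entry  # random eviction
    return bucket

bucket = []
for track_id in range(250):      # 2.5x more entries than the bucket holds
    insert(bucket, track_id)
print(len(bucket))               # stays at 100; early track ids thin out
```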

Gives varied results if computed vice versa

So I was dabbling with the code, searching for a small audio clip (abc) within a larger recording (xyz).
When I pickle abc and then run the match command with xyz as the query, I receive perfect results, but the detected time is total chaos when I do it vice versa: pickle xyz and match abc.
Any idea why?
Thanks a ton!

Couldn't match the query.mp3

Thanks for your code.

But I couldn't match the query file tests/data/query.mp3.

I test as follows:
python3 audfprint.py new --dbase fpdbase.pklz tests/data/Nine_Lives/0*.mp3
python3 audfprint.py add --dbase fpdbase.pklz tests/data/Nine_Lives/1*.mp3
python3 audfprint.py match --dbase fpdbase.pklz tests/data/query.mp3

Then I get the result:
NOMATCH tests/data/query.mp3 5.6 sec 267 raw hashes

So I don't know why the result is not the same as in the README. Thank you very much; I look forward to your reply.

Test Audio

Hello,

I am looking at converting this to Python 3. I've looked over the tests and profiling, but I can't seem to find any of the associated audio files. Are they available?

IndexError: too many indices for array

Hi, I'm having issues recognising a clip using this script.

This is the command I used:
python audfprint.py match --dbase db1.pklz SS9-18.wav

And this is the traceback I get:
Read fprints for 4864 files ( 25037164 hashes) from db1.pklz (2.75% dropped)
Traceback (most recent call last):
  File "audfprint.py", line 490, in <module>
    main(sys.argv)
  File "audfprint.py", line 473, in main
    strip_prefix=args['--wavdir'])
  File "audfprint.py", line 156, in do_cmd
    msgs = matcher.file_match_to_msgs(analyzer, hash_tab, filename, num)
  File "/home/ben/audio_recognition-master/bin/audfprint/audfprint_match.py", line 379, in file_match_to_msgs
    rslts, dur, nhash = self.match_file(analyzer, ht, qry, number)
  File "/home/ben/audio_recognition-master/bin/audfprint/audfprint_match.py", line 355, in match_file
    q_hashes = analyzer.wavfile2hashes(filename)
  File "/home/ben/audio_recognition-master/bin/audfprint/audfprint_analyze.py", line 407, in wavfile2hashes
    self.peaks2landmarks(peaklist)))
  File "/home/ben/audio_recognition-master/bin/audfprint/audfprint_analyze.py", line 89, in landmarks2hashes
    hashes[:, 0] = landmarks[:, 0]
IndexError: too many indices for array

Time is of the essence here unfortunately, and any help would be appreciated 😄

Accuracy against Noise

Hi DAn,

Currently I'm testing audfprint's matching accuracy on mp3 song samples, each 60 seconds long, against excerpts extracted from the same audio with randomized starting positions and lengths varying between 5 and 15 seconds, so the query audio is part of the ingested audio and identical in waveform. These tests matched perfectly (100%). But then I added noise to the extracted excerpts by mixing noise into the clean audio. There were three types of noise, all Brownian noise, at amplitudes of 0.2 (low), 0.5 (medium), and 0.8 (high) on a 0-1 scale. This test produced varied results, but the bottom line is: the higher the noise, the higher the probability of NOMATCH.

To increase accuracy against such tests (mixed with noise, even though the underlying audio is identical), which parameters would you suggest tuning? If I blindly change each of them, I may not know whether I've hit the right spot. I know there are a lot of them (--density, --pks-per-frame, etc.); I'm also trying to understand what each of them actually does. I'm getting there, but I still need to learn a lot :D. Thanks.
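
For reference, here is a sketch of the noise-mixing setup described above (my own construction of "Brownian noise at amplitude a", not necessarily the exact procedure used in these tests):

```python
# Brownian noise as an integrated white-noise walk, normalized to unit peak,
# then mixed into a clean excerpt at a chosen amplitude on a 0..1 scale.
import numpy as np

rng = np.random.default_rng(1)

def brownian_noise(n):
    """Unit-peak Brownian noise: integrate white noise, then normalize."""
    walk = np.cumsum(rng.standard_normal(n))
    walk -= walk.mean()
    return walk / np.max(np.abs(walk))

def add_noise(clean, amplitude):
    """Mix noise at `amplitude` (relative to full scale) into `clean`."""
    return clean + amplitude * brownian_noise(len(clean))

clean = np.sin(2 * np.pi * 440 * np.arange(11025) / 11025)  # 1 s of 440 Hz
noisy = add_noise(clean, 0.5)
print(noisy.shape)
```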
