Hi, I'm looking into delegating some of the scalability issues to a

Trying to use a database (audfprint issue, closed)

dpwe commented on July 17, 2024

Trying to use a database


Comments (2)

Laurian commented on July 17, 2024

More specifically, I can store in a database the data that def store(self, name, timehashpairs) is called with; what I'm not sure about is how to plug my database data into the def get_hits(self, hashes) function.

Should I skip the hashmask/timemask bits? I understand those were needed because of the limitations of the current hash table.


dpwe commented on July 17, 2024

In principle, the hash table builds an index that, for each of the 2^20 hash values, stores the tracks that include that hash (and the time frames within those tracks at which the hash occurs). hashtable.store(track_name, time_hash_pairs_in_track) records all the hashes for a particular track (specified by name, but internally represented by an index); hashtable.get_hits(time_hash_pairs_in_query) gathers all the tracks that include each of the hashes present in the query by merging the per-hash lists in the database.
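To make the index structure concrete, here is a minimal in-memory sketch of the same idea using a Python dict instead of a fixed 2^20-slot table. The class name InMemoryHashIndex is hypothetical, and it assumes the (time, hash) ordering within each pair; it is an illustration of the inverted index described above, not audfprint's actual hash table code.

```python
from collections import defaultdict


class InMemoryHashIndex:
    """Hypothetical dict-based stand-in for the hash table described above."""

    def __init__(self):
        # hash value -> list of (track_id, time_frame_in_track)
        self.index = defaultdict(list)
        # track_id -> track name (tracks are named externally, indexed internally)
        self.names = []

    def store(self, name, time_hash_pairs):
        """Record every (time, hash) pair of one track under a new track_id."""
        track_id = len(self.names)
        self.names.append(name)
        for time_frame, hash_value in time_hash_pairs:
            self.index[hash_value].append((track_id, time_frame))
        return track_id

    def get_hits(self, time_hash_pairs):
        """Merge the per-hash lists for every hash in the query.

        Returns rows of (track_id, time_in_track - time_in_query, hash, time_in_track).
        """
        hits = []
        for query_time, hash_value in time_hash_pairs:
            # .get() avoids creating empty entries for hashes never stored
            for track_id, track_time in self.index.get(hash_value, []):
                hits.append((track_id, track_time - query_time, hash_value, track_time))
        return hits
```

For example, after idx.store("track_a", [(10, 5), (12, 7)]) and idx.store("track_b", [(3, 5)]), a query idx.get_hits([(2, 5)]) returns one row per track containing hash 5: [(0, 8, 5, 10), (1, 1, 5, 3)].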

To convert this to a different database, you set up the database to store (track_id, time_frame_in_track) pairs indexed by each hash. Then, in get_hits, you go through each hash in the query, retrieve the list of associated (track_id, time_frame) pairs, and calculate a per-hash, per-track data row containing (track_id, time_frame_in_track - time_frame_in_query, hash, time_frame_in_track).
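As one possible way to follow that recipe, the sketch below uses SQLite from the Python standard library. The table names (tracks, hashes), column names and the class name SqliteHashIndex are all hypothetical choices for illustration; the point is only that each hash maps to plain (track_id, time_frame) rows and that get_hits produces the same four-column rows as before.

```python
import sqlite3


class SqliteHashIndex:
    """Sketch of a database-backed hash index (hypothetical schema)."""

    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.executescript(
            """
            CREATE TABLE IF NOT EXISTS tracks (
                track_id INTEGER PRIMARY KEY,
                name TEXT NOT NULL
            );
            CREATE TABLE IF NOT EXISTS hashes (
                hash INTEGER NOT NULL,
                track_id INTEGER NOT NULL REFERENCES tracks(track_id),
                time_frame INTEGER NOT NULL
            );
            -- lookups in get_hits are by hash, so index that column
            CREATE INDEX IF NOT EXISTS hashes_by_hash ON hashes(hash);
            """
        )

    def store(self, name, time_hash_pairs):
        """Insert a track row, then one (hash, track_id, time_frame) row per pair."""
        cur = self.conn.execute("INSERT INTO tracks (name) VALUES (?)", (name,))
        track_id = cur.lastrowid
        # Pairs are stored as-is: no hashmask/timemask bit packing.
        self.conn.executemany(
            "INSERT INTO hashes (hash, track_id, time_frame) VALUES (?, ?, ?)",
            [(h, track_id, t) for t, h in time_hash_pairs],
        )
        self.conn.commit()
        return track_id

    def get_hits(self, time_hash_pairs):
        """Return rows of (track_id, time_in_track - time_in_query, hash, time_in_track)."""
        hits = []
        for query_time, hash_value in time_hash_pairs:
            rows = self.conn.execute(
                "SELECT track_id, time_frame FROM hashes "
                "WHERE hash = ? ORDER BY track_id, time_frame",
                (hash_value,),
            )
            for track_id, track_time in rows:
                hits.append((track_id, track_time - query_time, hash_value, track_time))
        return hits
```

Issuing one query per hash keeps the sketch close to the description; with many query hashes, batching them into a single WHERE hash IN (...) query would cut round trips.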

The second value (the time frame difference) is computed for convenience: it is the value that will be approximately constant for multiple hashes matching from a common piece of audio. The match logic then looks, for each matched track_id, for the time difference with the greatest number of hashes.
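The scoring step can be sketched with collections.Counter: group the hit rows by track, count how often each time offset occurs, and keep the most common one. The function name best_matches and the min_count threshold are hypothetical, and audfprint's real matcher may add refinements this sketch omits.

```python
from collections import Counter


def best_matches(hits, min_count=2):
    """Rank tracks by the number of hashes sharing their most common time offset.

    hits: rows of (track_id, time_offset, hash, time_in_track) as returned by get_hits.
    min_count: hypothetical threshold for discarding chance matches.
    Returns a list of (track_id, best_offset, count), highest count first.
    """
    per_track = {}
    for track_id, offset, _hash, _time in hits:
        per_track.setdefault(track_id, Counter())[offset] += 1
    results = []
    for track_id, offsets in per_track.items():
        offset, count = offsets.most_common(1)[0]
        if count >= min_count:
            results.append((track_id, offset, count))
    results.sort(key=lambda r: r[2], reverse=True)
    return results
```

With hits [(0, 8, 5, 10), (0, 8, 7, 12), (0, 8, 9, 20), (0, 3, 11, 15), (1, 1, 5, 3)], track 0 has three hashes agreeing on offset 8 and is returned as (0, 8, 3), while track 1's single hit falls below the threshold.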

The hashmask/timemask stuff is just to allow me to store the (track_id, time_frame_in_track) pairs in a single int32. If you're going full-database, you probably don't want to mess with that level of optimization, just store the pairs as-is.
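For comparison, this is roughly what packing a (track_id, time_frame) pair into a single int32 looks like. The bit split (TIME_BITS = 14) and the function names pack/unpack are hypothetical and not taken from audfprint; the sketch only shows why such packing caps the number of tracks and the representable time range, limits that disappear when a database stores the pair as two columns.

```python
TIME_BITS = 14  # hypothetical: low bits reserved for the time frame
TIME_MASK = (1 << TIME_BITS) - 1
MAX_TRACKS = 1 << (31 - TIME_BITS)  # keeps the packed value a non-negative int32


def pack(track_id, time_frame):
    """Squeeze (track_id, time_frame) into one non-negative int32."""
    if not 0 <= track_id < MAX_TRACKS:
        raise ValueError("track_id does not fit in the reserved bits")
    # In this sketch, time frames wrap modulo 2**TIME_BITS.
    return (track_id << TIME_BITS) | (time_frame & TIME_MASK)


def unpack(packed):
    """Recover (track_id, time_frame) from a packed value."""
    return packed >> TIME_BITS, packed & TIME_MASK
```

For instance, pack(3, 100) gives 49252 and unpack(49252) gives back (3, 100), but a time frame of 16389 on track 1 comes back as 5, which is the kind of precision trade-off the full-database approach avoids.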
