Comments (6)
The Xubuntu task manager shows this process using ~1.5 GB of memory.
When running multi-core, each process was using ~1 GB.
from audfprint.
When I use a different database:
Read fprints for 641 files ( 58808989 hashes) from /afp/songs.db
the task manager shows the process using ~2.8 GB of RAM.
--> Is the database density too high?
from audfprint.
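For scale, here is a back-of-envelope estimate of what a database like that occupies in memory. It assumes audfprint's hash table is a dense 2**hashbits x depth array of uint32 entries, with hashbits=20 and depth=100; those are assumed library defaults, not values reported in this thread:

import numpy as np

# Assumed audfprint defaults (hashbits/depth are not confirmed in this thread).
hashbits, depth = 20, 100
n_hashes = 58808989  # reported above when loading /afp/songs.db

table_gb = (2 ** hashbits) * depth * np.dtype(np.uint32).itemsize / 2.0 ** 30
hashes_gb = n_hashes * 4 / 2.0 ** 30  # 4 bytes per stored hash, a lower bound

print('dense table: ~%.2f GB' % table_gb)   # ~0.39 GB for the table alone
print('raw hashes:  ~%.2f GB' % hashes_gb)  # ~0.22 GB of hash payload

The remaining gap up to the observed ~2.8 GB is plausibly load-time copies and interpreter overhead, though that is a guess.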
I just upgraded the RAM to 16 GB and still got the same error. What should I do?
from audfprint.
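One possible workaround until the fix mentioned below is picked up (my suggestion, not advice given in the thread): split long queries into shorter chunks and match them on a single core, which keeps the packed time range per query and the per-process memory down. File names here are placeholders:

import glob
import subprocess

# Split the long query into 10-minute chunks with ffmpeg's segment muxer.
subprocess.check_call(['ffmpeg', '-i', 'query.mp3', '-f', 'segment',
                       '-segment_time', '600', '-c', 'copy', 'chunk%03d.mp3'])

# Match each chunk separately; --ncores 1 caps per-process memory.
for chunk in sorted(glob.glob('chunk*.mp3')):
    subprocess.check_call(['python', 'audfprint.py', 'match',
                           '--dbase', '/afp/mongol_0127.db',
                           '--ncores', '1', chunk])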
- DB creation args:
adsa = ['audfprint.py', 'new', '--dbase', songs_db_file, '--density', '40', '--skip-existing', '--maxtime', '32768', '--ncores', '4', '--list', tmpFile]
- nshifts is not set (i.e. 0). DB: 8627 files (85002334 hashes), file size: 243M.
- The query file is 1 hour long; I still got the same error.
- Full error log:
...........................................................................
/afp/bin/audfprint.py in main(argv=['audfprint.py', 'match', '--dbase', '/afp/mongol_0127.db', '--match-win', '2', '--min-count', '200', '--max-matches', '100', '--sortbytime', '--opfile', '/afp/tmp/songs_.lst', '--ncores', '4', '--find-time-range', '--list', '/afp/tmp/run_.lst'])
463 do_cmd_multiproc(cmd, analyzer, hash_tab, filename_iter,
464 matcher, args['--precompdir'],
465 precomp_type, report,
466 skip_existing=args['--skip-existing'],
467 strip_prefix=args['--wavdir'],
--> 468 ncores=ncores)
ncores = 4
469 else:
470 do_cmd(cmd, analyzer, hash_tab, filename_iter,
471 matcher, args['--precompdir'], precomp_type, report,
472 skip_existing=args['--skip-existing'],
...........................................................................
/afp/bin/audfprint.py in do_cmd_multiproc(cmd='match', analyzer=<audfprint_analyze.Analyzer object>, hash_tab=<hash_table.HashTable object>, filename_iter=<generator object filename_list_iterator>, matcher=<audfprint_match.Matcher object>, outdir='.', type='hashes', report=<function report>, skip_existing=False, strip_prefix='', ncores=4)
250 msgslist = joblib.Parallel(n_jobs=ncores)(
251 # Would use matcher.file_match_to_msgs(), but you
252 # can't use joblib on an instance method
253 joblib.delayed(matcher_file_match_to_msgs)(matcher, analyzer,
254 hash_tab, filename)
--> 255 for filename in filename_iter
filename_iter = <generator object filename_list_iterator>
256 )
257 for msgs in msgslist:
258 report(msgs)
259
...........................................................................
/usr/local/lib/python2.7/dist-packages/joblib/parallel.py in __call__(self=Parallel(n_jobs=4), iterable=<generator object <genexpr>>)
763 if pre_dispatch == "all" or n_jobs == 1:
764 # The iterable was consumed all at once by the above for loop.
765 # No need to wait for async callbacks to trigger to
766 # consumption.
767 self._iterating = False
--> 768 self.retrieve()
self.retrieve = <bound method Parallel.retrieve of Parallel(n_jobs=4)>
769 # Make sure that we get a last message telling us we are done
770 elapsed_time = time.time() - self._start_time
771 self._print('Done %3i out of %3i | elapsed: %s finished',
772 (len(self._output), len(self._output),
---------------------------------------------------------------------------
Sub-process traceback:
---------------------------------------------------------------------------
MemoryError Fri Jan 27 11:12:56 2017
PID: 25956 Python 2.7.12: /usr/bin/python
...........................................................................
/usr/local/lib/python2.7/dist-packages/joblib/parallel.py in __call__(self=<joblib.parallel.BatchedCalls object>)
126 def __init__(self, iterator_slice):
127 self.items = list(iterator_slice)
128 self._size = len(self.items)
129
130 def __call__(self):
--> 131 return [func(*args, **kwargs) for func, args, kwargs in self.items]
func = <function matcher_file_match_to_msgs>
args = (<audfprint_match.Matcher object>, <audfprint_analyze.Analyzer object>, <hash_table.HashTable object>, '/afp/precomp/input_file.afpt')
kwargs = {}
self.items = [(<function matcher_file_match_to_msgs>, (<audfprint_match.Matcher object>, <audfprint_analyze.Analyzer object>, <hash_table.HashTable object>, '/afp/precomp/input_file.afpt'), {})]
132
133 def __len__(self):
134 return self._size
135
...........................................................................
/afp/bin/audfprint.py in matcher_file_match_to_msgs(matcher=<audfprint_match.Matcher object>, analyzer=<audfprint_analyze.Analyzer object>, hash_tab=<hash_table.HashTable object>, filename='/afp/precomp/input_file.afpt')
227 pr[core].join()
228
229
230 def matcher_file_match_to_msgs(matcher, analyzer, hash_tab, filename):
231 """Cover for matcher.file_match_to_msgs so it can be passed to joblib"""
--> 232 return matcher.file_match_to_msgs(analyzer, hash_tab, filename)
matcher.file_match_to_msgs = <bound method Matcher.file_match_to_msgs of <audfprint_match.Matcher object>>
analyzer = <audfprint_analyze.Analyzer object>
hash_tab = <hash_table.HashTable object>
filename = '/afp/precomp/input_file.afpt'
233
234 def do_cmd_multiproc(cmd, analyzer, hash_tab, filename_iter, matcher,
235 outdir, type, report, skip_existing=False,
236 strip_prefix=None, ncores=1):
...........................................................................
/afp/bin/audfprint_match.py in file_match_to_msgs(self=<audfprint_match.Matcher object>, analyzer=<audfprint_analyze.Analyzer object>, ht=<hash_table.HashTable object>, qry='/afp/precomp/input_file.afpt', number=None)
330 return (rslts[:self.max_returns, :], durd, len(q_hashes))
331
332 def file_match_to_msgs(self, analyzer, ht, qry, number=None):
333 """ Perform a match on a single input file, return list
334 of message strings """
--> 335 rslts, dur, nhash = self.match_file(analyzer, ht, qry, number)
rslts = undefined
dur = undefined
nhash = undefined
self.match_file = <bound method Matcher.match_file of <audfprint_match.Matcher object>>
analyzer = <audfprint_analyze.Analyzer object>
ht = <hash_table.HashTable object>
qry = '/afp/precomp/input_file.afpt'
number = None
336 t_hop = analyzer.n_hop/float(analyzer.target_sr)
337 if self.verbose:
338 qrymsg = qry + (' %.1f '%dur) + "sec " + str(nhash) + " raw hashes"
339 else:
...........................................................................
/afp/bin/audfprint_match.py in match_file(self=<audfprint_match.Matcher object>, analyzer=<audfprint_analyze.Analyzer object>, ht=<hash_table.HashTable object>, filename='/afp/precomp/input_file.afpt', number=None)
321 numberstring = ""
322 print time.ctime(), "Analyzed", numberstring, filename, "of", \
323 ('%.3f'%durd), "s " \
324 "to", len(q_hashes), "hashes"
325 # Run query
--> 326 rslts = self.match_hashes(ht, q_hashes)
rslts = undefined
self.match_hashes = <bound method Matcher.match_hashes of <audfprint_match.Matcher object>>
ht = <hash_table.HashTable object>
q_hashes = [(2, 41037), (2, 41104), (2, 41740), (2, 245764), (2, 247569), (2, 247688), (6, 247565), (6, 247631), (6, 247684), (9, 532109), (9, 532234), (9, 532427), (10, 372496), (10, 372617), (10, 372683), (10, 696590), (10, 697361), (10, 698774), (10, 983322), (10, 986129), ...]
327 # Post filtering
328 if self.sort_by_time:
329 rslts = rslts[(-rslts[:, 2]).argsort(), :]
330 return (rslts[:self.max_returns, :], durd, len(q_hashes))
...........................................................................
/afp/bin/audfprint_match.py in match_hashes(self=<audfprint_match.Matcher object>, ht=<hash_table.HashTable object>, hashes=[(2, 41037), (2, 41104), (2, 41740), (2, 245764), (2, 247569), (2, 247688), (6, 247565), (6, 247631), (6, 247684), (9, 532109), (9, 532234), (9, 532427), (10, 372496), (10, 372617), (10, 372683), (10, 696590), (10, 697361), (10, 698774), (10, 983322), (10, 986129), ...], hashesfor=None)
276 bestids, rawcounts = self._best_count_ids(hits, ht)
277
278 #log("len(rawcounts)=%d max(bestcountsixs)=%d" %
279 # (len(rawcounts), max(bestcountsixs)))
280 if not self.exact_count:
--> 281 results = self._approx_match_counts(hits, bestids, rawcounts)
results = undefined
self._approx_match_counts = <bound method Matcher._approx_match_counts of <audfprint_match.Matcher object>>
hits = array([[ 335, 3861, 41037, 2],
... 8163, -147252, 966850, 151424]], dtype=int32)
bestids = array([15350, 15226, 15289, 15116, 14836, 15300,... 15112, 15299, 15322, 15196, 15037], dtype=int32)
rawcounts = array([ 9946, 5445, 10457, 11158, 12669, 11451,... 7676, 12615, 7222, 8288, 6512, 9495, 9505])
282 else:
283 results = self._exact_match_counts(hits, bestids, rawcounts,
284 hashesfor)
285 # Sort results by filtered count, descending
...........................................................................
/afp/bin/audfprint_match.py in _approx_match_counts(self=<audfprint_match.Matcher object>, hits=array([[ 335, 3861, 41037, 2],
... 8163, -147252, 966850, 151424]], dtype=int32), ids=array([15350, 15226, 15289, 15116, 14836, 15300,... 15112, 15299, 15322, 15196, 15037], dtype=int32), rawcounts=array([ 9946, 5445, 10457, 11158, 12669, 11451,... 7676, 12615, 7222, 8288, 6512, 9495, 9505]))
229 mintime = np.amin(alltimes)
230 alltimes -= mintime
231 nresults = 0
232 # Hash IDs and times together, so only a single bincount
233 timebits = max(1, encpowerof2(np.amax(alltimes)))
--> 234 allbincounts = np.bincount((allids << timebits) + alltimes)
allbincounts = undefined
allids = array([ 335, 409, 673, ..., 14474, 13210, 8163])
timebits = 18
alltimes = array([155263, 151879, 153009, ..., 1895, 1433, 4150])
235 min_time = 0
236 max_time = 0
237 for urank, (id, rawcount) in enumerate(zip(ids, rawcounts)):
238 # Make sure id is an int64 before shifting it up.
MemoryError:
______________________
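For scale: plugging the values shown in this traceback into the failing call, the bincount output alone is enormous (a rough estimate, not output from the run):

import numpy as np

# Values from the traceback: timebits = 18, largest id in bestids ~ 15350.
timebits = 18
max_id = 15350

# np.bincount allocates an output array of length max(input) + 1, so packing
# ids and times as (id << timebits) + time forces a gigantic allocation:
out_len = (max_id + 1) << timebits                     # ~4.0e9 bins
gb = out_len * np.dtype(np.intp).itemsize / 2.0 ** 30  # 8-byte counts on 64-bit
print('bincount output needs ~%.0f GB' % gb)           # ~30 GB, hence MemoryError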
from audfprint.
Note that the np.bincount logic was changed in e64d933 to drastically reduce the peak memory usage, so this should now be fixed.
from audfprint.
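For intuition about how such a reduction can work (a generic illustration, not necessarily what e64d933 actually does): remapping the ids that occur in the hits to a dense 0..K-1 range before packing bounds the bincount length by the number of distinct candidate ids rather than by the largest raw id.

import numpy as np

# Toy hit data shaped like the arrays in the traceback above.
allids = np.array([335, 409, 673, 14474, 13210, 8163])
alltimes = np.array([155263, 151879, 153009, 1895, 1433, 4150])
timebits = 18

# Dense remap: np.unique's return_inverse gives each raw id a 0..K-1 index.
uniq_ids, dense = np.unique(allids, return_inverse=True)
counts = np.bincount((dense.astype(np.int64) << timebits) + alltimes)

# len(counts) is now ~1.3e6; with the raw ids it would be ~3.8e9.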
Related Issues (20)
- Incorrect time range
- Incorrect Time range.
- Convert .afpk files to mp3?
- Problem with "spreadpeaksinvector"
- Reduce memory usage
- Use audfprint as a module
- How to increase bits for storing IDs and timestamps?
- illustrate
- Scan every folder in every fingerprint base named as folder
- UNICODE characters ERROR
- Show more than one matched name in results.
- Can this algorithm load the historical features into memory first to improve matching speed? I don't know how to modify the base code.
- If there are 1 million songs (each about 3 minutes long), matching is very slow; how can it be optimized?
- The hashes audfprint generates are not continuous in time, so ranking candidates by matched-hash count does not put the original (climax) clip first.
- Some matches are inaccurate: a longer song can accumulate more matching hashes than a shorter one, skewing the count-based ranking; how can this be optimized?
- Output matching differs between Windows and Linux: with an identical ~340 MB database and an identical 100-file query, a Windows machine finds significantly more matches than a Raspberry Pi, both on Python 3.9.
- Question about concatenating afpk files
- How to avoid a big % Dropped
- Can someone take on audfprint-gui for audfprint, or create a new GUI for it?
- Ability to split pklzs into smaller sizes