Comments (3)

Powback commented on June 12, 2024

If I understand it correctly, the file will only be placed into memory if the file size is less than bufcount.

It seems rather strange that the -r parameter is only relevant when the buffer size is larger than the file size, since otherwise the file is never loaded into memory at all.

I would have assumed that instead of trying to load the whole file into memory, it would load a portion of it, process that, and move on to the next portion.

defuse commented on June 12, 2024

The sorting is done with quicksort which works by successively breaking the problem down into sorting problems of approximately half the size. If the file is bigger than the memory limit, it'll need to do the first iterations of quicksort on disk instead of in memory, and once it gets broken down into small enough pieces it'll begin to do the sorting in RAM (which is much faster).
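
Roughly, the approach looks like this (a sketch only, not the actual sortidx code; the record width, pivot choice, and temp-file layout here are all made up, with records assumed to be fixed-width byte strings):

```python
import os
import shutil

REC = 16  # assumed fixed record width in bytes (hypothetical)

def read_records(path):
    """Load a whole file of fixed-width records into RAM."""
    with open(path, "rb") as f:
        data = f.read()
    return [data[i:i + REC] for i in range(0, len(data), REC)]

def external_quicksort(path, ram_limit):
    """Quicksort-style partitioning on disk until a piece fits in RAM."""
    if os.path.getsize(path) <= ram_limit:
        recs = read_records(path)
        recs.sort()                       # the fast, in-memory phase
        with open(path, "wb") as f:
            f.writelines(recs)
        return
    # Too big for RAM: stream the file once, three-way partitioning
    # around a pivot record (the slow, on-disk phase).
    n = os.path.getsize(path) // REC
    with open(path, "rb") as f:
        f.seek((n // 2) * REC)
        pivot = f.read(REC)               # middle record as a crude pivot
    parts = [path + s for s in (".lo", ".eq", ".hi")]
    with open(path, "rb") as src, \
         open(parts[0], "wb") as lo, \
         open(parts[1], "wb") as eq, \
         open(parts[2], "wb") as hi:
        while True:
            rec = src.read(REC)
            if not rec:
                break
            (lo if rec < pivot else hi if rec > pivot else eq).write(rec)
    external_quicksort(parts[0], ram_limit)
    external_quicksort(parts[2], ram_limit)  # the ".eq" part is already sorted
    with open(path, "wb") as out:            # stitch the parts back together
        for part in parts:
            with open(part, "rb") as f:
                shutil.copyfileobj(f, out)
            os.remove(part)
```

The three-way partition guarantees progress even with many duplicate records, since the pivot itself always lands in the ".eq" file.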

When I built the CrackStation.net databases (which are about 15 billion entries, 1/42 the size of your database) it took about a week with 16GB of RAM... so based on that you'd probably have to improve the sorting code to get your database sorted in a reasonable amount of time.

One idea is to change the code to buffer the left and right halves in RAM and then write them out to disk all at once when the buffer gets full, so that the disk is being read and written to linearly instead of randomly. Another idea is to switch to mergesort, merging whole buffer-sized chunks at a time in RAM. Or maybe there's a fancy sorting algorithm that would do even better.
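
A hedged sketch of the mergesort idea, under the same fixed-width-record assumption (names hypothetical, not how sortidx is actually written): sort one buffer-sized chunk at a time in RAM, write each out as a sorted run, then k-way merge the runs so the disk only ever sees linear reads and writes.

```python
import heapq
import os
import tempfile

REC = 16  # assumed fixed record width in bytes (hypothetical)

def iter_records(f):
    """Yield fixed-width records from an open binary file."""
    while True:
        rec = f.read(REC)
        if not rec:
            return
        yield rec

def external_mergesort(path, ram_limit):
    """Sort RAM-sized chunks into temporary runs, then k-way merge them."""
    chunk_bytes = max(REC, ram_limit - ram_limit % REC)  # whole records only
    runs = []
    with open(path, "rb") as src:
        while True:
            chunk = src.read(chunk_bytes)
            if not chunk:
                break
            recs = sorted(chunk[i:i + REC] for i in range(0, len(chunk), REC))
            tmp = tempfile.NamedTemporaryFile(delete=False)
            tmp.writelines(recs)          # one sorted run, written linearly
            tmp.close()
            runs.append(tmp.name)
    # heapq.merge keeps just one record per run in memory and emits them
    # in sorted order, so the final pass is a single linear write.
    files = [open(r, "rb") for r in runs]
    with open(path, "wb") as out:
        for rec in heapq.merge(*(iter_records(f) for f in files)):
            out.write(rec)
    for f in files:
        f.close()
    for r in runs:
        os.remove(r)
```

With k runs, the merge holds only k records in the heap at once, so memory use stays tiny no matter how large the file is.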

Powback commented on June 12, 2024

Thank you for your reply.
Knowing that 15 billion entries took about a week with 16 GB of RAM puts my own estimate into perspective.

I'm not sure why sortidx stalled for two days; it might have been a slow HDD.

I got the sorting time down to about 8 hours for 630,000,000 entries once the entire file was loaded into memory.

Another solution would be to split the database into sections that each fit into RAM, and then search every section at lookup time. That raises the search time a bit, but brings the sorting time down by quite a lot.

This would also let me add to the combined database progressively instead of having to re-sort everything.
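
A rough sketch of that layout (hypothetical names again, with each section an independently sorted file of fixed-width records where the record is the lookup key itself):

```python
import os

REC = 16  # assumed fixed record width; here the record is the key itself

def section_contains(path, key):
    """Binary-search one sorted section file for a fixed-width key."""
    with open(path, "rb") as f:
        lo, hi = 0, os.path.getsize(path) // REC
        while lo < hi:                    # classic lower-bound search
            mid = (lo + hi) // 2
            f.seek(mid * REC)
            if f.read(REC) < key:
                lo = mid + 1
            else:
                hi = mid
        f.seek(lo * REC)
        return f.read(REC) == key

def lookup(section_paths, key):
    """Check every RAM-sized section; a hit in any of them is a hit."""
    return any(section_contains(p, key) for p in section_paths)
```

Each lookup costs one binary search per section (a handful of seeks), while new data only requires sorting the newest section before adding it to the list.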

I will look into the suggestions as well.
Thanks a lot!
