Comments (3)

Powback commented on June 12, 2024

If I understand it correctly, the file will only be placed into memory if the file size is less than bufcount.

It seems rather strange that the -r parameter is only relevant when the buffer size is larger than the file size, since otherwise the file is never loaded into memory at all.

I would have assumed that instead of trying to load the whole file into memory, it would load a portion of it, process that, and move on to the next portion.

defuse commented on June 12, 2024

The sorting is done with quicksort which works by successively breaking the problem down into sorting problems of approximately half the size. If the file is bigger than the memory limit, it'll need to do the first iterations of quicksort on disk instead of in memory, and once it gets broken down into small enough pieces it'll begin to do the sorting in RAM (which is much faster).
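
Roughly, the approach looks like this (a sketch only, not the actual sortidx code; the record width, pivot choice, and temp-file layout here are all made up, with records assumed to be fixed-width byte strings):

```python
import os
import shutil

REC = 16  # assumed fixed record width in bytes (hypothetical)

def read_records(path):
    """Load a whole file of fixed-width records into RAM."""
    with open(path, "rb") as f:
        data = f.read()
    return [data[i:i + REC] for i in range(0, len(data), REC)]

def external_quicksort(path, ram_limit):
    """Quicksort-style partitioning on disk until a piece fits in RAM."""
    if os.path.getsize(path) <= ram_limit:
        recs = read_records(path)
        recs.sort()                       # the fast, in-memory phase
        with open(path, "wb") as f:
            f.writelines(recs)
        return
    # Too big for RAM: stream the file once, three-way partitioning
    # around a pivot record (the slow, on-disk phase).
    n = os.path.getsize(path) // REC
    with open(path, "rb") as f:
        f.seek((n // 2) * REC)
        pivot = f.read(REC)               # middle record as a crude pivot
    parts = [path + s for s in (".lo", ".eq", ".hi")]
    with open(path, "rb") as src, \
         open(parts[0], "wb") as lo, \
         open(parts[1], "wb") as eq, \
         open(parts[2], "wb") as hi:
        while True:
            rec = src.read(REC)
            if not rec:
                break
            (lo if rec < pivot else hi if rec > pivot else eq).write(rec)
    external_quicksort(parts[0], ram_limit)
    external_quicksort(parts[2], ram_limit)  # the ".eq" part is already sorted
    with open(path, "wb") as out:            # stitch the parts back together
        for part in parts:
            with open(part, "rb") as f:
                shutil.copyfileobj(f, out)
            os.remove(part)
```

The three-way partition guarantees progress even with many duplicate records, since the pivot itself always lands in the ".eq" file.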

When I built the CrackStation.net databases (which are about 15 billion entries, 1/42 the size of your database) it took about a week with 16GB of RAM... so based on that you'd probably have to improve the sorting code to get your database sorted in a reasonable amount of time.

One idea is to change the code to buffer the left and right halves in RAM and then write them out to disk all at once when the buffer gets full, so that the disk is being read and written to linearly instead of randomly. Another idea is to switch to mergesort, merging whole buffer-sized chunks at a time in RAM. Or maybe there's a fancy sorting algorithm that would do even better.
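
A hedged sketch of the mergesort idea, under the same fixed-width-record assumption (names hypothetical, not how sortidx is actually written): sort one buffer-sized chunk at a time in RAM, write each out as a sorted run, then k-way merge the runs so the disk only ever sees linear reads and writes.

```python
import heapq
import os
import tempfile

REC = 16  # assumed fixed record width in bytes (hypothetical)

def iter_records(f):
    """Yield fixed-width records from an open binary file."""
    while True:
        rec = f.read(REC)
        if not rec:
            return
        yield rec

def external_mergesort(path, ram_limit):
    """Sort RAM-sized chunks into temporary runs, then k-way merge them."""
    chunk_bytes = max(REC, ram_limit - ram_limit % REC)  # whole records only
    runs = []
    with open(path, "rb") as src:
        while True:
            chunk = src.read(chunk_bytes)
            if not chunk:
                break
            recs = sorted(chunk[i:i + REC] for i in range(0, len(chunk), REC))
            tmp = tempfile.NamedTemporaryFile(delete=False)
            tmp.writelines(recs)          # one sorted run, written linearly
            tmp.close()
            runs.append(tmp.name)
    # heapq.merge keeps just one record per run in memory and emits them
    # in sorted order, so the final pass is a single linear write.
    files = [open(r, "rb") for r in runs]
    with open(path, "wb") as out:
        for rec in heapq.merge(*(iter_records(f) for f in files)):
            out.write(rec)
    for f in files:
        f.close()
    for r in runs:
        os.remove(r)
```

With k runs, the merge holds only k records in the heap at once, so memory use stays tiny no matter how large the file is.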

Powback commented on June 12, 2024

Thank you for your reply.
Knowing that 15 billion entries took about a week with 16 GB of RAM puts my own estimate into perspective.

I'm not sure why sortidx stalled for two days; it might have been a slow HDD.

I got the sorting time down to about 8 hours for 630,000,000 entries once the entire file was loaded into memory.

Another solution would be to split the database into sections that each fit into RAM, and then search every section at lookup time. That raises the search time a bit, but brings the sorting time down by quite a lot.

This would also let me add to the combined database progressively instead of having to re-sort everything.
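
A rough sketch of that layout (hypothetical names again, with each section an independently sorted file of fixed-width records where the record is the lookup key itself):

```python
import os

REC = 16  # assumed fixed record width; here the record is the key itself

def section_contains(path, key):
    """Binary-search one sorted section file for a fixed-width key."""
    with open(path, "rb") as f:
        lo, hi = 0, os.path.getsize(path) // REC
        while lo < hi:                    # classic lower-bound search
            mid = (lo + hi) // 2
            f.seek(mid * REC)
            if f.read(REC) < key:
                lo = mid + 1
            else:
                hi = mid
        f.seek(lo * REC)
        return f.read(REC) == key

def lookup(section_paths, key):
    """Check every RAM-sized section; a hit in any of them is a hit."""
    return any(section_contains(p, key) for p in section_paths)
```

Each lookup costs one binary search per section (a handful of seeks), while new data only requires sorting the newest section before adding it to the list.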

I will look into the suggestions as well.
Thanks a lot!
