Comments (6)

SamStudio8 commented on August 16, 2024

Some servers really seem to struggle to generate MD5 hashes quickly. We could set partial_limit and partial_sample on a per-node basis, as long as all clients are instructed to hash that way. This is only problematic when a file is copied to a different node, where it won't be a hash friend...

from chitin.

SamStudio8 commented on August 16, 2024

Perhaps copied files will have to keep the hashing instructions from their origin. As we move towards a system where records can be sync'd between machines, it's probably not a terrible idea to distribute a list of nodes along with their hashing setups?
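A minimal sketch of that idea, assuming each record stores the hashing setup of the node where the file was first hashed. Everything here is hypothetical and for illustration only: the `HashSetup` and `FileRecord` types, the `NODE_SETUPS` registry, the node names, and the byte-count meaning given to partial_limit and partial_sample are not taken from chitin.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class HashSetup:
    # Assumed meaning: files larger than this many bytes are only sampled.
    partial_limit: int
    # Assumed meaning: number of bytes sampled from a large file.
    partial_sample: int


@dataclass(frozen=True)
class FileRecord:
    path: str
    digest: str
    setup: HashSetup  # hashing instructions inherited from the origin node


# Hypothetical registry that would be distributed to every node.
NODE_SETUPS = {
    "node-a": HashSetup(partial_limit=1 << 30, partial_sample=256 << 20),
    "node-b": HashSetup(partial_limit=4 << 30, partial_sample=64 << 20),
}


def record_for_copy(src: FileRecord, dest_path: str) -> FileRecord:
    """A copy keeps the digest and the origin's hashing setup; only the path changes."""
    return replace(src, path=dest_path)


def same_content(a: FileRecord, b: FileRecord) -> Optional[bool]:
    """Compare digests only when both were produced with the same setup.

    Returns None when the setups differ, because the digests are not comparable.
    """
    if a.setup != b.setup:
        return None
    return a.digest == b.digest
```

With this shape, a file copied from node-a to node-b can still be re-hashed on node-b using node-a's setup, so the two records stay comparable.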

SamStudio8 commented on August 16, 2024

I really want to avoid baking these parameters in, or having users set the parameters themselves.

SamStudio8 commented on August 16, 2024

A year later, and this is indeed pretty terrible. The solution remains slow for large files on remote distributed file systems like CephFS:

Nov 20 00:11:00 sam-ganon chitind: Hashed [...] (134.14GB in 1:13:04.415635)

SamStudio8 commented on August 16, 2024

Alright, I've made this less garbage now. 5135ca8 changes the large-file hashing method from seeking through the file and taking blocksize-sized samples to taking much larger consecutive samples, to take better advantage of caching. This reduces the number of seeks but makes the gaps between samples much larger. The hash has never been about file security, just general integrity. Users who are worried about this could just set all files to be hashed in full, I suppose.
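The trade-off described above can be illustrated with a rough sketch. This is not the code from 5135ca8: the function name `sampled_md5`, its parameters, their defaults, and the choice to mix the file size into the digest are all assumptions made for the example.

```python
import hashlib
import os


def sampled_md5(path, partial_limit=1 << 30, n_chunks=4, chunk_size=64 << 20):
    """Hash small files fully; hash a few large consecutive chunks of big files.

    All names and defaults are illustrative assumptions, not chitin's settings.
    """
    if n_chunks < 2:
        raise ValueError("n_chunks must be at least 2")
    size = os.path.getsize(path)
    md5 = hashlib.md5()
    with open(path, "rb") as fh:
        # Small files (or files the chunks would cover anyway) are read in full.
        if size <= partial_limit or size <= n_chunks * chunk_size:
            for block in iter(lambda: fh.read(1 << 20), b""):
                md5.update(block)
            return md5.hexdigest()
        # Assumption: include the size so truncated files hash differently.
        md5.update(str(size).encode())
        # Spread the chunks evenly from the start of the file to its end.
        stride = (size - chunk_size) // (n_chunks - 1)
        for i in range(n_chunks):
            fh.seek(i * stride)  # one seek per chunk, then sequential reads
            remaining = chunk_size
            while remaining > 0:
                block = fh.read(min(1 << 20, remaining))
                if not block:
                    break
                md5.update(block)
                remaining -= len(block)
    return md5.hexdigest()
```

A change that falls in a gap between chunks goes undetected, which is why this only suits integrity checks and not security. Setting `partial_limit` above the file size forces a full hash.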

SamStudio8 commented on August 16, 2024

I'm gonna close this for now so it looks like we are making progress, but I fully expect to see another issue about the hashing system raised.
