Comments (6)
Some servers really seem to struggle with generating MD5 hashes quickly. We could set the partial_limit and partial_sample on a per-node basis, as long as all clients are instructed to hash in that way. This is only problematic when a file is copied to a different node, where it won't be a hash friend...
from chitin.
Perhaps copied files will have to keep the hashing instructions from their origin. As we move towards a system where records can be synced between machines, it's probably not a terrible idea to distribute a list of nodes along with their hashing setups?
from chitin.
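A rough sketch of what "copied files keep their origin's hashing instructions" could look like, assuming a small registry of per-node parameters that is serialised and shipped between machines. The names `HashParams`, `params_for`, the `hash_origin` record field and the node names are all hypothetical, and the units of `partial_sample` are a guess; none of this is taken from the chitin codebase.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class HashParams:
    """Hypothetical per-node hashing setup."""
    partial_limit: int   # assumed: file size (bytes) above which we sample
    partial_sample: int  # assumed: bytes read per sample


def export_registry(registry):
    """Serialise {node_name: HashParams} so it can be synced to other nodes."""
    return json.dumps({node: asdict(p) for node, p in registry.items()})


def import_registry(text):
    """Rebuild the registry from the JSON produced by export_registry."""
    return {node: HashParams(**p) for node, p in json.loads(text).items()}


def params_for(record, local_node, registry):
    """Pick the parameters a file must be hashed with.

    A copied file carries the name of the node that first hashed it
    (the hypothetical 'hash_origin' field), so every node can reproduce
    the same digest. Files without that field use the local setup.
    """
    origin = record.get("hash_origin", local_node)
    return registry[origin]


registry = {
    "fast-node": HashParams(partial_limit=8 << 30, partial_sample=256 << 20),
    "slow-node": HashParams(partial_limit=1 << 30, partial_sample=64 << 20),
}
```

With something like this, a file hashed on `slow-node` and later copied to `fast-node` would still be checked with the `slow-node` parameters, at the cost of every node needing an up-to-date copy of the registry.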
I really want to avoid baking these parameters in, or having users set the parameters themselves.
from chitin.
A year later and this is indeed pretty terrible. The solution remains slow for large files on remote distributed file systems like CephFS:
Nov 20 00:11:00 sam-ganon chitind: Hashed [...] (134.14GB in 1:13:04.415635)
from chitin.
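For scale, the log line above works out to roughly 30 MB/s. A quick way to check that, assuming the reported "GB" means 10^9 bytes (the log does not say whether it is GB or GiB) and using a hypothetical helper name:

```python
from datetime import timedelta


def throughput_mb_per_s(size_gb, elapsed):
    """Average rate in MB/s for size_gb gigabytes over an 'H:MM:SS.ffffff' duration."""
    hours, minutes, seconds = elapsed.split(":")
    duration = timedelta(hours=int(hours), minutes=int(minutes),
                         seconds=float(seconds))
    # Assumption: 1 GB = 1000 MB (decimal units).
    return size_gb * 1000 / duration.total_seconds()


rate = throughput_mb_per_s(134.14, "1:13:04.415635")
print(f"{rate:.1f} MB/s")
```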
Alright, I've made this less garbage now. 5135ca8 changes the large file hashing method from seeking through the file and taking blocksize-sized samples, to taking much larger consecutive samples to take better advantage of caching. This reduces the number of seeks but makes the gaps between samples much larger. The hash has never been about file security, just general integrity. Users who are worried about this could just set all files to be hashed in full, I suppose.
from chitin.
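A minimal sketch of the "few large consecutive samples" idea described above. This is not the code from 5135ca8; the function name, the parameter names (`n_samples`, `sample_size`, `read_size`), their defaults, and the choice to mix the file size into the digest are all assumptions made for illustration.

```python
import hashlib
import os


def sampled_md5(path, partial_limit=1 << 30, n_samples=4,
                sample_size=64 << 20, read_size=1 << 20):
    """MD5 hex digest of a file; files above partial_limit are only sampled."""
    size = os.path.getsize(path)
    md5 = hashlib.md5()
    with open(path, "rb") as fh:
        if size <= partial_limit or n_samples * sample_size >= size:
            # Small file, or the samples would cover it anyway: hash it all.
            for block in iter(lambda: fh.read(read_size), b""):
                md5.update(block)
            return md5.hexdigest()
        # Evenly spaced sample starts; the last sample ends at EOF.
        stride = (size - sample_size) // max(n_samples - 1, 1)
        for i in range(n_samples):
            fh.seek(i * stride)  # one seek per large sample
            remaining = sample_size
            while remaining > 0:
                block = fh.read(min(read_size, remaining))
                if not block:
                    break
                md5.update(block)
                remaining -= len(block)
    # Sampled path only: mix in the size so truncation or extension
    # changes the digest even if the sampled regions are untouched.
    md5.update(str(size).encode())
    return md5.hexdigest()
```

The trade-off is the one noted in the comment: with only a handful of seeks, sequential reads benefit from read-ahead and caching, but any change that falls in the gap between two samples (and leaves the size unchanged) goes undetected. Setting `partial_limit` high enough forces a full hash.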
I'm going to close this for now so it looks like we are making progress, but I fully expect to see another issue about the hashing system raised.
from chitin.