Comments (3)
Good questions.
In the HDFS variant, the whole key for performance, if any, was "move computation to the data", which was achieved by maintaining multiple parts of a big file across machines as multiple copies. How is this addressed in pfs? Does it also maintain multiple copies of the same file as multiple blocks / parts (and if yes, how)?
Pfs has a similar concept of bringing the computation to the data. Right now we don't do block storage for large files, but we'll be implementing that in the near future. I've opened #58 to track that issue, and it has a more in-depth description of what the implementation will look like.
In the MapReduce variant of Hadoop, for any given job, the same copy of the map-reduce method gets run on all machines. That makes it unsuitable for data-flow computations, and it is not the concept behind 'micro-services'. The core concept of micro-services is to build larger, complex computations out of small, disparate components / services, which gives more control because you can tune / control individual micro-services without affecting the whole network.
I'm not 100% sure I understand what's being asked here, so apologies if my answer doesn't address your concerns. Pachyderm organizes jobs into pipelines. This gives it an understanding of where the data's next destination is, which allows us to efficiently stream data between jobs.
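As a minimal sketch of that idea (not Pachyderm's actual API, just plain Python generators), a pipeline can be modelled as an ordered list of different jobs, where each job consumes the previous job's output as a stream instead of waiting for it to finish. The names `run_pipeline`, `parse` and `keep_errors` are hypothetical and exist only for this example.

```python
from typing import Callable, Iterable, Iterator

# A job takes a stream of records and yields a stream of records.
Job = Callable[[Iterator[str]], Iterator[str]]


def run_pipeline(source: Iterable[str], jobs: list[Job]) -> Iterator[str]:
    """Chain jobs so each record flows to the next job as soon as it is produced."""
    stream = iter(source)
    for job in jobs:
        stream = job(stream)  # the pipeline knows each job's next destination
    return stream


def parse(lines: Iterator[str]) -> Iterator[str]:
    """First job: normalise raw lines."""
    for line in lines:
        yield line.strip().lower()


def keep_errors(lines: Iterator[str]) -> Iterator[str]:
    """Second job: a different computation from the first, keeping only error lines."""
    for line in lines:
        if "error" in line:
            yield line


if __name__ == "__main__":
    raw = [" ERROR disk full\n", "ok\n", "Error net\n"]
    for record in run_pipeline(raw, [parse, keep_errors]):
        print(record)
```

Because each stage is its own function, one stage can be swapped or tuned without touching the others, which is close to the micro-services point raised in the question.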
from pachyderm.
Thanks @jdoliner. I will look into the referenced issues (such as #58) to get more info on this, and will get back in case of any questions.
Closing. @KrishnaPG, if you have other questions, feel free to ask on GH or email us [email protected].