

jdoliner commented on May 22, 2024

Good questions.

In the HDFS variant, the whole key to performance, if any, was "move computation to the data", which was achieved by splitting a big file into parts and keeping multiple copies of those parts across machines.

How is this addressed in pfs? Does it also maintain multiple copies of the same file as multiple blocks/parts (and if so, how)?

Pfs has a similar concept of bringing the computation to the data. Right now we don't do block storage for large files, but we'll be implementing that in the near future. I've opened #58 to track that issue, and there's a more in-depth description there of what the implementation will look like.
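To make the HDFS idea from the question concrete, here is a minimal, simplified sketch of splitting a file into blocks, keeping several copies of each block on different machines, and scheduling each block's task on a machine that already holds a copy. It is not how pfs works (block storage was still planned in #58) and not HDFS's real placement policy; the tiny block size, the node names, and the round-robin placement are illustrative assumptions.

```python
from collections import defaultdict

BLOCK_SIZE = 4    # bytes per block; tiny and hypothetical (real block sizes are tens to hundreds of MB)
REPLICATION = 3   # copies of each block; 3 mirrors HDFS's usual default

def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> list[bytes]:
    """Split a file's contents into fixed-size blocks (the last may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

def place_replicas(num_blocks: int, nodes: list[str],
                   replication: int = REPLICATION) -> dict[int, list[str]]:
    """Assign each block to `replication` distinct nodes, round-robin.

    Simplified assumption: real HDFS uses a rack-aware placement policy.
    """
    if replication > len(nodes):
        raise ValueError("replication factor exceeds number of nodes")
    return {
        b: [nodes[(b + r) % len(nodes)] for r in range(replication)]
        for b in range(num_blocks)
    }

def schedule_local_tasks(placement: dict[int, list[str]]) -> dict[int, str]:
    """'Move computation to the data': run each block's task on a node that
    already stores a replica, choosing the least-loaded such node (ties by name)."""
    load: dict[str, int] = defaultdict(int)
    assignment: dict[int, str] = {}
    for block, holders in sorted(placement.items()):
        node = min(holders, key=lambda n: (load[n], n))
        assignment[block] = node
        load[node] += 1
    return assignment

# Usage with hypothetical node names.
data = b"hello pachyderm world!"
blocks = split_into_blocks(data)
placement = place_replicas(len(blocks), ["node-a", "node-b", "node-c", "node-d"])
tasks = schedule_local_tasks(placement)
print(tasks)  # {0: 'node-a', 1: 'node-b', 2: 'node-c', 3: 'node-d', 4: 'node-a', 5: 'node-b'}
```

Because every block lives on several nodes, the scheduler has a choice of local machines for each task, which is what lets it spread work while still avoiding network reads; losing one node also leaves other copies of its blocks available.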

In the MapReduce variant of Hadoop, for any given job the same map and reduce functions run on every machine, which makes it unsuitable for data-flow computations; that is not the model behind "micro-services". The core concept of micro-services is to build a larger, complex computation out of small, disparate components/services. This gives more control, because you can tune each micro-service individually without affecting the whole network.

I'm not 100% sure I understand what's being asked here, so apologies if my answer doesn't address your concerns. Pachyderm organizes jobs into pipelines. This gives it an understanding of where the data's next destination is, which allows us to efficiently stream data between jobs.
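One rough way to picture a pipeline of distinct jobs streaming data to one another is a chain of different stage functions, where each record flows to the next stage as soon as it is produced. This is a hypothetical, in-process Python sketch of the data-flow idea only; it is not Pachyderm's pipeline API, and the stage names (`parse`, `keep_errors`, `tag`) are made up for illustration.

```python
from typing import Callable, Iterable, Iterator

# A stage takes a stream of records and yields a new stream of records.
Stage = Callable[[Iterable[str]], Iterator[str]]

def parse(lines: Iterable[str]) -> Iterator[str]:
    """Hypothetical stage 1: strip whitespace and drop blank records."""
    for line in lines:
        line = line.strip()
        if line:
            yield line

def keep_errors(records: Iterable[str]) -> Iterator[str]:
    """Hypothetical stage 2: pass through only records marked as errors."""
    for rec in records:
        if rec.startswith("ERROR"):
            yield rec

def tag(records: Iterable[str]) -> Iterator[str]:
    """Hypothetical stage 3: annotate each record for a downstream consumer."""
    for rec in records:
        yield f"[alert] {rec}"

def run_pipeline(source: Iterable[str], stages: list[Stage]) -> Iterator[str]:
    """Chain different stages so records stream through them lazily,
    rather than each stage materializing its whole output first."""
    stream: Iterable[str] = source
    for stage in stages:
        stream = stage(stream)
    return iter(stream)

log = ["INFO start\n", "\n", "ERROR disk full\n", "INFO ok\n", "ERROR timeout\n"]
alerts = list(run_pipeline(log, [parse, keep_errors, tag]))
print(alerts)  # ['[alert] ERROR disk full', '[alert] ERROR timeout']
```

Unlike a MapReduce job, where one map and one reduce function run everywhere, each stage here is a different piece of code that can be swapped or tuned on its own, and knowing the pipeline's shape is what lets output move straight to the next consumer.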


KrishnaPG commented on May 22, 2024

Thanks @jdoliner. I'll look into the referenced issues (such as #58) to get more info on this, and will get back to you if I have any questions.


JoeyZwicker commented on May 22, 2024

Closing. @KrishnaPG, if you have other questions, feel free to ask on GH or email us at [email protected].

