
Comments (10)

josevalim commented on August 26, 2024

I have started this. I decided to keep the _stages naming. We will have:

  • from_stages (producer, producer_consumer), through_stages (producer_consumer), and into_stages (producer_consumer, consumer)

  • from_specs (producer, producer_consumer), through_specs (producer_consumer), and into_specs (producer_consumer, consumer)

The former receive already-running processes and are already in master. The latter will receive supervisor child specs, so that we start those processes as part of the flow.
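For illustration, here is a minimal sketch of how the spec-based variants might be used. It is written against the API as documented in released versions of Flow, so details may differ from the code under discussion in this thread. `NumberProducer` and `Forwarder` are hypothetical modules invented for the example, and the `{child_spec, subscription_options}` tuple shape passed to `into_specs` is an assumption based on those docs.

```elixir
# Assumes network access so Mix.install/1 can fetch Flow (and GenStage with it).
Mix.install([{:flow, "~> 1.0"}])

# Hypothetical producer: emits consecutive integers on demand.
defmodule NumberProducer do
  use GenStage

  def start_link(start), do: GenStage.start_link(__MODULE__, start)

  @impl true
  def init(start), do: {:producer, start}

  @impl true
  def handle_demand(demand, counter) when demand > 0 do
    events = Enum.to_list(counter..(counter + demand - 1))
    {:noreply, events, counter + demand}
  end
end

# Hypothetical consumer: forwards every batch of events to a given pid.
defmodule Forwarder do
  use GenStage

  def start_link(pid), do: GenStage.start_link(__MODULE__, pid)

  @impl true
  def init(pid), do: {:consumer, pid}

  @impl true
  def handle_events(events, _from, pid) do
    send(pid, {:events, events})
    {:noreply, [], pid}
  end
end

# Child specs (not pids) are handed to Flow, which starts the processes itself.
{:ok, _flow} =
  [{NumberProducer, 0}]
  |> Flow.from_specs()
  |> Flow.map(&(&1 * 2))
  |> Flow.into_specs([{{Forwarder, self()}, []}])

# Wait for the first batch that reaches the consumer.
events =
  receive do
    {:events, events} -> events
  after
    5_000 -> raise "no events received"
  end
```

With the `_stages` variants, the same pipeline would instead receive the pids of producers and consumers that the caller had already started and supervised elsewhere.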

Thoughts?

/cc @lackac

from flow.

CptnKirk commented on August 26, 2024

Maybe we should look at the overall picture for naming and purpose. What is the end goal of the Flow library? Feature parity with Akka Streams? I.e., support for full graphs including cycles, multiple input/output components, and direct vs. async flow invocation.

Akka Streams has a dizzying array of functionality and they're probably on their 8th generation at this point (including many failed starts and iterations).

First, is GenStage Flow's (only) component model? Internally Flow isn't implemented purely as a concatenation of internal GenStages. Should GenStage be the means by which 3rd parties add additional pseudo-DSL support to Flow? Is it Flow's primary component architecture?

GenStage is nice and it brings reactive manifesto concepts to Elixir. But I'm not sure it's the best component model for Flow. Elixir gets a lot of mileage from Enumerable and Collectable protocols, allowing for a rich composition of functions and linear data flows around them.

Now we're looking to provide the next generation of data flow concepts supporting non-linear flows, concurrent parallel flows, all while supporting non-blocking backpressure. I think Flow's component model probably ought to be protocol-based, with the Flow library being primarily responsible for materializing flow resources and coordinating the asynchronous backpressure aspects. Akka is famous for its Actor System, however, the Akka Streams component model isn't actor based.

Flow needs to incorporate GenStages better, but how it does so should be driven by GenStage's desired place within Flow. This was a long way around, but how GenStages ought to fit into Flow will influence the API that supports them, since the semantics around the functions matter.

The calls could be:
Flow.add_stage/3 # (flow, stage, options)
Flow.add_and_start_stage/3
Flow.add_graph(stage_to_graph(stage))
Flow.add_producer_consumer/3
Flow.add_and_start_producer_consumer/3

My thinking is that while Flow may ultimately start stages, the public API wouldn't include any start_stage-like functions. Flow seems to be taking a lifted approach, where it builds up a blueprint and activates it at the end. Most starting should happen at the end, when Flow goes to execute that blueprint.
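A small sketch of that blueprint-then-execute behaviour, using only `Flow.from_enumerable/1` and `Flow.map/2`; the variable names are illustrative. The pipeline is just a struct until something enumerates it, at which point Flow starts the stages.

```elixir
# Assumes network access so Mix.install/1 can fetch Flow.
Mix.install([{:flow, "~> 1.0"}])

# Building the flow only describes the computation; no stages run yet.
flow =
  1..10
  |> Flow.from_enumerable()
  |> Flow.map(&(&1 * 2))

# The "blueprint" is a plain %Flow{} struct.
is_blueprint = is_struct(flow, Flow)

# Enumerating the flow materializes it: stages are started and events move.
# Sorting because Flow does not preserve ordering across stages.
result = Enum.sort(flow)
```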

If started GenStages need to be incorporated into a Flow, that's OK: a GenStage-aware proxy component can be used within the Flow blueprint and do the appropriate thing when that blueprint is materialized and executed. Flow might have helper functions that assist with this, but I wouldn't expect it as part of the primary DSL.


CptnKirk commented on August 26, 2024

tl;dr

Stitching multiple flows together via from_stage, into_stages doesn't provide the same behavior as having a GenStage producer_consumer acting as a function within the scope of an overall flow.

We're currently discussing possible naming and behavior in the thread.

Flow.into_producer_consumer/2
Flow.start_producer_consumer
Flow.start_stage?


josevalim commented on August 26, 2024

The goal of Flow is not parity with Akka Streams; it is closer to something like Apache Spark, but focused on concurrency on a single node (at least for now). What you describe should probably be a separate project with its own goals and ideas. I think the APIs proposed earlier (start_producers/from_producers) and similar fit well into Flow because at least you can keep the supervision tree in a single place instead of scattering it around. But we don't plan to go anywhere beyond that.


josevalim commented on August 26, 2024

Just to be super clear, I think all of this is outside of Flow's scope:

"Now we're looking to provide the next generation of data flow concepts supporting non-linear flows, concurrent parallel flows, all while supporting non-blocking backpressure. I think Flow's component model probably ought to be protocol-based, with the Flow library being primarily responsible for materializing flow resources and coordinating the asynchronous backpressure aspects. Akka is famous for its Actor System, however, the Akka Streams component model isn't actor based."

Flow is mostly about focusing on the data and not about the graph. You partition because the data requires it, not because of the graph or because of back-pressure.


CptnKirk commented on August 26, 2024

Ok. Makes sense. I look forward to start_stage/through_stage or whatever APIs then.

Is a more general stream processing library something the core team is interested in looking into? Seems that a Spark implementation should ultimately be built on top of that foundation.


josevalim commented on August 26, 2024


lackac commented on August 26, 2024

@josevalim I like the semantics of the naming. Simple, in line with the current scheme, yet expressive.

How are you planning to supervise the specs? If I'm not mistaken, there's currently a Coordinator process which is not a supervisor itself, but start_links one. That supervisor then supervises all GenStage processes. Would the specs given to from_specs, through_specs, and into_specs be on the same level? In what order are they started?

There might be simple answers to these, but when I did this manually I found that it wasn't that easy to figure out the right order. In any case it would be great to include this information in the docs for the new functions. I like the detail you put in the docs of through_stages and into_stages.

Good progress! I didn't expect you to have time for this while dealing with the Elixir 1.7 release. :)


josevalim commented on August 26, 2024

@lackac the specs are started under the same supervision tree as the flow (the one under the GenServer) and the stages are always started in order. So producers -> producer_consumers -> producer_consumers -> producer_consumers -> producer_consumers -> consumers.
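As a rough illustration of ordered startup only (this is not Flow's internal code), the sketch below uses a plain OTP `Supervisor` with `Agent` processes standing in for stages. `StartOrderDemo`, `stage_spec/2`, and the stage names are hypothetical; the only behaviour relied on is that a supervisor starts its children sequentially, in the order the specs are listed.

```elixir
defmodule StartOrderDemo do
  # Hypothetical stand-in for a stage: an Agent that reports its name
  # to `reporter` from inside its init function.
  def stage_spec(name, reporter) do
    init_fun = fn ->
      send(reporter, {:started, name})
      name
    end

    %{id: name, start: {Agent, :start_link, [init_fun]}}
  end

  # Starts the stand-in stages under one supervisor and returns the
  # names in the order in which the start notifications arrived.
  def run do
    names = [:producer, :producer_consumer, :consumer]
    children = Enum.map(names, &stage_spec(&1, self()))

    # Children are started one after another, in list order.
    {:ok, sup} = Supervisor.start_link(children, strategy: :one_for_one)

    order =
      for _ <- names do
        receive do
          {:started, name} -> name
        after
          1_000 -> :timeout
        end
      end

    Supervisor.stop(sup)
    order
  end
end

order = StartOrderDemo.run()
```

Starting upstream stages first matters because each downstream stage needs something to subscribe to when it comes up.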

I have just pushed the code and the docs, so reviews and pull requests for any area of improvement are welcome. :) I will cut a new release tomorrow. Thanks for the review so far!


lackac commented on August 26, 2024

@josevalim sorry, I was offline for most of today. For what it's worth, the changes up to through_stages looked good to me on paper, but I haven't had a chance to try them yet. I'm going to work on a project that uses Flow again tomorrow, so I will have a chance to test both sets of functions. I'm planning to settle on using from_specs and into_specs instead of the hand-rolled supervision we're doing now. I'll report back after that.
