Comments (9)

mmalohlava commented on May 28, 2024

Hi Raghav,

thank you for the comment!

First, let me clarify the motivation for Sparkling Water: we want to enable Spark and H2O users to
use both platforms together easily, which benefits both platforms. This matters mainly if you have
an existing Spark workflow and would like to use the advanced ML toolbox or other services provided
by H2O.

I can definitely imagine a much tighter integration using Spark execution primitives, but the
current integration (which can be considered really simple) allows us to provide all H2O services
(including the UI and the R/Python connectors) on top of Spark, evaluate user demand, and, mainly,
create non-trivial applications on top of both platforms.

I am not a Flink expert, but based on a discussion with the Flink guys Kostas and Stephan, we figured
out that integrating H2O with Flink (in the same way as it is done in Sparkling Water) should be
straightforward (if I remember correctly, they mentioned a few technical obstacles, which were
considered more like cosmetic details).

I can still see benefits for Flink and H2O from an integration - the same benefits that we stress
for Sparkling Water - even if the integration would not be technically perfect.

Please let me know if you would like to discuss the integration in more detail or do a code review!
Thank you!
michal

On 4/16/15 at 10:11 PM, raghavchalapathy wrote:

Hi Michal

I was working on integrating H2O with Flink, but I see that the Flink roadmap follows a Mahout DSL
integration between Flink and Mahout, along the same lines as the integration with Spark, rather
than the way it is done with H2O.

Please refer to the link below:
http://mail-archives.apache.org/mod_mbox/flink-dev/201501.mbox/%3CCANC1h_s=DtNjS+KQcU-Uxdb=i+_o4KPV-EOKQacr-KpPFX_OKw@mail.gmail.com%3E

Kindly advise whether there are any limitations on, or any necessity for, integrating Flink with H2O.

I believe Flink would benefit from the fact that:
- H2O provides deep learning out of the box
- R data frames can be used
Please correct me if I am wrong.

Raghav


from sparkling-water.

raghavchalapathy commented on May 28, 2024

Thank you so much for the elaborate insight!
I shall get in touch with you as appropriate.

with regards
Raghav

alexeyegorov commented on May 28, 2024

Just out of curiosity: what is the current status of "flinking water"? The last response on this topic was about a year ago.

mmalohlava commented on May 28, 2024

@alexeyegorov no updates on our side. No news from @raghavchalapathy so far.

Do you have a specific idea / use case for Flinking Water?
We can help with the design and guide development if you would like to participate.

btw: I love the name Flinking Water 👍 !

alexeyegorov commented on May 28, 2024

@mmalohlava I am writing a master's thesis in which I want to compare the performance of Storm, Spark and Flink using a further abstraction layer (the streams framework developed at my faculty in Dortmund). Since we cooperate with physicists working on gamma-ray astronomy, our use case is high-volume image data from a telescope. At the moment, an offline tool such as RapidMiner (also developed in Dortmund) or WEKA is used to train a Random Forest model that distinguishes gammas from hadrons.
As part of my thesis, I was thinking of running a pipeline in a Lambda or Kappa architecture style, using some framework to build a model and then apply it to the incoming stream, all with distributed computing. I found that H2O has a very wide range of ML algorithms and Spark's MLlib is also pretty fancy, while Flink ML is still rather weak. Depending on how good or bad Flink turns out to be (some people describe it as much faster than micro-batched Spark Streaming), it would be interesting to combine H2O with Flink.
I am not sure I would be able to start "flinking water" on my own, but I thought it would be a win-win situation for Flink. If other people were working on it too, it would become worthwhile for me to invest more time in it. I don't think I have time for the whole project on my own.

In case I need some support, I feel that I can get it from you and your team! ;)

P.S. After "sparkling water" it is rather straightforward to come up with "flinking water"... especially in German it sounds like a fancier keyword! I definitely share your excitement about both the word combination and the project itself.

P.P.S. By the way, our motivation is the current development of a large telescope array that will generate far more data than a single telescope. ;)

mmalohlava commented on May 28, 2024

Cool! Sounds great!

Are you planning to use online learning? Right now, H2O supports only offline learners.

However, you can build a model offline on a batch of data (you can use H2O directly), then export the model as code (a model POJO) and compile it into a Storm bolt, which you can plug directly into Flink. The model would then help you score incoming events.
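The "train offline, score online" idea above can be sketched roughly as follows. This is only an illustration under assumptions: `StandInModel`, its invented weights, `classify` and `scoreAll` are hypothetical names standing in for the class H2O would generate and for the bolt or operator wrapped around it. A real exported POJO would normally be scored through H2O's h2o-genmodel library rather than through a hand-written method.

```java
import java.util.ArrayList;
import java.util.List;

public class PojoScoringSketch {

    /**
     * Hypothetical stand-in for a model class exported from H2O as a POJO.
     * The weights are invented for illustration; a real export contains the
     * trained model (for example, the trees of a Random Forest).
     */
    public static final class StandInModel {
        private static final double[] WEIGHTS = {1.0, -1.0};
        private static final double BIAS = 0.0;

        /** Returns a pseudo-probability that the event is a gamma. */
        public double scoreGamma(double[] features) {
            if (features.length != WEIGHTS.length) {
                throw new IllegalArgumentException(
                        "expected " + WEIGHTS.length + " features, got " + features.length);
            }
            double z = BIAS;
            for (int i = 0; i < WEIGHTS.length; i++) {
                z += WEIGHTS[i] * features[i];
            }
            return 1.0 / (1.0 + Math.exp(-z)); // logistic function
        }
    }

    /** Per-event step: what a bolt or map operator would do for each incoming event. */
    public static String classify(StandInModel model, double[] event, double threshold) {
        return model.scoreGamma(event) >= threshold ? "gamma" : "hadron";
    }

    /** Applies the already-trained model to a sequence of events; no training happens here. */
    public static List<String> scoreAll(StandInModel model, Iterable<double[]> events,
                                        double threshold) {
        List<String> labels = new ArrayList<>();
        for (double[] event : events) {
            labels.add(classify(model, event, threshold));
        }
        return labels;
    }

    public static void main(String[] args) {
        StandInModel model = new StandInModel();
        List<double[]> events = List.of(new double[]{2.0, 0.5}, new double[]{0.5, 2.0});
        System.out.println(scoreAll(model, events, 0.5)); // prints [gamma, hadron]
    }
}
```

The point of the split is that the streaming side holds only a compiled, read-only model, so the per-event step stays a plain function call and can be wrapped by whichever engine runs the stream.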

Would it work for you?

Wow - what is the schedule for the large telescope array?

chobeat commented on May 28, 2024

Good to see the topic is still alive. We have a huge interest in replicating the work done with Spark on Flink because it's the processing engine we use here at Radicalbit. We haven't started yet because the effort required is not clear. So any news or any sign of interest from other companies or students toward this goal is welcome.

@alexeyegorov I agree that you don't really need Flinking Water for your result. Learning inside Flink is not necessary to do what you need. H2O is great for the portability of its models and it could be a good option. I have never tried that on Flink, though. I'm developing a library called Flink-JPMML that could help you once it is mature, but right now it is not something I consider good enough to share with others, and it's off-topic here anyway.

alexeyegorov commented on May 28, 2024

@mmalohlava the schedule is very complex, as it involves different pipelines. As far as I know, the first telescopes should go online around 2018.

@chobeat @mmalohlava H2O got my attention while I was comparing different distributed machine learning frameworks. My main goal is just to compare Flink, Spark and Storm on our data. Machine learning is only a second step and does not have to be performed inside H2O. But since it supports a number of algorithms, I thought it could be interesting for later experiments. In particular, it could be interesting to compare the execution time of the same algorithms inside Flink and Spark.

@chobeat Flink-JPMML does not seem to be needed in our case, as we use another abstraction layer (the streams framework) on top of Spark, Flink or Storm. This way we can write simple stream processing tasks in Java and then just run them in Spark, Flink or elsewhere. We have already implemented support for reading and applying models in PMML and simple JSON formats. Nevertheless, if you intend to share Flink-JPMML, it would be interesting to look at it. ;)

jakubhava commented on May 28, 2024

Closing this for now as it's not actually an issue. Thanks for the discussion!
