
Al-assad commented on May 21, 2024

@wolfboys What's more, I think we should further abstract the file system used by the StreamX workspace, which stores temporary files, compiled Flink job JARs, and other artifacts.

Currently the StreamX workspace must be stored on HDFS, but it should also support the local FS and other file systems. In the improved storage abstraction, the backend should be determined automatically from the workspace URI.

For example, "hdfs:///streamx/workspace" is automatically directed to HDFS, "/streamx/workspace/" or "local:///streamx/workspace" to the local FS, and "s3:///streamx/workspace/" to S3.
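A minimal Java sketch of this URI-based routing, assuming a hypothetical `WorkspaceStorage.resolve` helper (not an existing StreamX API). The scheme of the configured workspace URI picks the backend, and a bare path with no scheme falls back to the local FS. Accepting `file` as an alias for `local` is an extra assumption.

```java
import java.net.URI;
import java.util.Locale;

// Hypothetical sketch: WorkspaceStorage and Backend are illustrative names, not StreamX APIs.
final class WorkspaceStorage {

    enum Backend { LOCAL, HDFS, S3 }

    private WorkspaceStorage() {}

    /** Picks a storage backend from the scheme of the configured workspace URI. */
    static Backend resolve(String workspace) {
        String scheme = URI.create(workspace).getScheme();
        // A bare path such as "/streamx/workspace/" has no scheme: treat it as local.
        if (scheme == null) {
            return Backend.LOCAL;
        }
        switch (scheme.toLowerCase(Locale.ROOT)) {
            case "local":
            case "file": // assumed alias, not part of the original proposal
                return Backend.LOCAL;
            case "hdfs":
                return Backend.HDFS;
            case "s3":
                return Backend.S3;
            default:
                // Fail fast rather than silently defaulting to HDFS.
                throw new IllegalArgumentException("Unsupported workspace scheme: " + scheme);
        }
    }
}
```

With this, `resolve("s3:///streamx/workspace/")` returns `Backend.S3`, and an unknown scheme is rejected instead of being forced onto HDFS.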

from incubator-streampark.

Al-assad commented on May 21, 2024

Some StreamX users only need the Flink on K8s features (see issue #192). They use StreamX purely as an ETL platform, and their deployments include no Hadoop cluster (the downstream systems are OLAP stores such as ClickHouse, TiDB, etc.).

wolfboys commented on May 21, 2024

Yes, we need to remove the current mandatory dependency on a Hadoop environment and make it optional. Which environment dependencies are required can be decided by the submission mode the user selects (YARN application, K8s, local). On the Hadoop side, storage can be HDFS-based or S3-based, and there may be some subtle differences between the two. Local mode is a different implementation from the Hadoop-dependent one; the details need to be discussed.
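A minimal sketch of deciding the Hadoop dependency from the submission mode, assuming hypothetical `SubmitMode` and `EnvironmentCheck` types (not StreamX code). Only YARN application mode is marked as needing Hadoop here. A user who chooses an `hdfs://` workspace on K8s would still need Hadoop for storage, so this check only covers the submission side.

```java
// Hypothetical sketch: SubmitMode and EnvironmentCheck are illustrative names, not StreamX types.
enum SubmitMode {
    YARN_APPLICATION(true),
    KUBERNETES(false),
    LOCAL(false);

    private final boolean needsHadoop;

    SubmitMode(boolean needsHadoop) {
        this.needsHadoop = needsHadoop;
    }

    /** Whether a Hadoop client environment must be present to submit in this mode. */
    boolean requiresHadoop() {
        return needsHadoop;
    }
}

final class EnvironmentCheck {
    private EnvironmentCheck() {}

    /** Fails fast only when the selected mode actually needs Hadoop and none is configured. */
    static void validate(SubmitMode mode, String hadoopConfDir) {
        if (mode.requiresHadoop() && (hadoopConfDir == null || hadoopConfDir.isEmpty())) {
            throw new IllegalStateException(
                mode + " requires a Hadoop configuration directory (e.g. HADOOP_CONF_DIR)");
        }
    }
}
```

K8s and local submissions then start without any Hadoop configuration, while YARN keeps its existing requirement.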

wolfboys commented on May 21, 2024

In addition, cloud native is becoming more and more popular and brings more storage options with it, so we also need to discuss Hadoop storage on S3.

Al-assad commented on May 21, 2024

@wolfboys I think the focus is not on S3 specifically, but on letting users freely choose where StreamX stores its resources, based on their actual environment, as shown below:
[Diagram: Untitled-2021-06-29-1245 (image not preserved)]

datayangl commented on May 21, 2024

Flink's file system design matches our goal pretty well, so I suggest we first study Flink's abstraction and implementation, and then make ours configurable. Reference: org.apache.flink.core.fs.FileSystem
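For reference, Flink resolves an implementation through `FileSystem.get(URI)`, which looks up a `FileSystemFactory` (exposing `getScheme()` and `create(URI)`) registered for the URI's scheme, with bare paths falling back to a configurable default scheme (`fs.default-scheme`). Below is a minimal sketch of the same pattern for the StreamX workspace, assuming hypothetical `WorkspaceFs`, `WorkspaceFsFactory`, `WorkspaceFsRegistry`, and `InMemoryFactory` types. Flink discovers its factories through `ServiceLoader` and plugins; this sketch registers them by hand to stay self-contained.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical names throughout; loosely modeled on Flink's FileSystem / FileSystemFactory split.
interface WorkspaceFs {
    URI root();
}

interface WorkspaceFsFactory {
    String getScheme();            // e.g. "hdfs", "s3", "local"
    WorkspaceFs create(URI uri);
}

final class WorkspaceFsRegistry {
    private final Map<String, WorkspaceFsFactory> factories = new HashMap<>();
    private final String defaultScheme;

    /** defaultScheme is used for bare paths, similar in spirit to Flink's fs.default-scheme. */
    WorkspaceFsRegistry(String defaultScheme) {
        this.defaultScheme = defaultScheme.toLowerCase(Locale.ROOT);
    }

    void register(WorkspaceFsFactory factory) {
        factories.put(factory.getScheme().toLowerCase(Locale.ROOT), factory);
    }

    /** Looks up the factory for the URI's scheme and asks it to create the file system. */
    WorkspaceFs get(URI uri) {
        String scheme = uri.getScheme() == null
                ? defaultScheme
                : uri.getScheme().toLowerCase(Locale.ROOT);
        WorkspaceFsFactory factory = factories.get(scheme);
        if (factory == null) {
            throw new IllegalArgumentException("No file system registered for scheme: " + scheme);
        }
        return factory.create(uri);
    }
}

/** Trivial factory for illustration: the returned file system only remembers its root URI. */
final class InMemoryFactory implements WorkspaceFsFactory {
    private final String scheme;

    InMemoryFactory(String scheme) {
        this.scheme = scheme;
    }

    @Override
    public String getScheme() {
        return scheme;
    }

    @Override
    public WorkspaceFs create(URI uri) {
        return () -> uri;
    }
}
```

Adding S3 support then means registering one more factory, without changing any caller that only holds a workspace URI.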
