At present, StreamX only supports the "yarn application" mode and must run in a Hadoop environment.

[FEATURE] Remove mandatory dependency of Hadoop environment about incubator-streampark HOT 6 CLOSED

apache commented on May 21, 2024

from incubator-streampark.

Comments (6)

Al-assad commented on May 21, 2024

@wolfboys What's more, I think we should further abstract the file system used by the StreamX workspace, which stores temporary files, compiled Flink job jars, and other artifacts.

Currently the StreamX workspace is forced to be stored on HDFS; it should also support the local FS or other file systems. In the improved storage abstraction, the idea is to rely on the workspace URI to determine the file system automatically.

For example, "hdfs:///streamx/workspace" is automatically directed to HDFS, "/streamx/workspace/" or "local:///streamx/workspace" to the local FS, and "s3:///streamx/workspace/" to S3.
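The URI-based dispatch described above could be sketched roughly as follows. This is only an illustration, not StreamX code: the class `WorkspaceStorage`, the enum `StorageKind`, and the method `resolve` are hypothetical names, and rejecting unknown schemes with an exception is an assumption about the desired behavior.

```java
import java.net.URI;
import java.util.Locale;

// Hypothetical helper: maps a workspace URI to a storage backend by its scheme.
final class WorkspaceStorage {

    // Backends named in the discussion; the enum itself is an assumption.
    enum StorageKind { HDFS, LOCAL, S3 }

    static StorageKind resolve(String workspace) {
        String scheme = URI.create(workspace).getScheme();
        if (scheme == null) {
            // A bare path such as "/streamx/workspace/" has no scheme: treat it as local FS.
            return StorageKind.LOCAL;
        }
        switch (scheme.toLowerCase(Locale.ROOT)) {
            case "hdfs":
                return StorageKind.HDFS;
            case "local":
                return StorageKind.LOCAL;
            case "s3":
                return StorageKind.S3;
            default:
                // Assumption: unknown schemes are rejected rather than silently defaulted.
                throw new IllegalArgumentException("Unsupported workspace scheme: " + scheme);
        }
    }

    public static void main(String[] args) {
        String[] examples = {
            "hdfs:///streamx/workspace",
            "/streamx/workspace/",
            "local:///streamx/workspace",
            "s3:///streamx/workspace/"
        };
        for (String ws : examples) {
            System.out.println(ws + " -> " + resolve(ws));
        }
    }
}
```

Keeping the decision in one place means the rest of the workspace code only needs to ask which backend it got, rather than parsing the URI again.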

Al-assad commented on May 21, 2024

Some StreamX users will only need the Flink K8s mode features, as in issue #192. They use StreamX purely as an ETL platform, and their deployment includes no Hadoop cluster (the downstream systems are OLAP stores such as ClickHouse, TiDB, etc.).

wolfboys commented on May 21, 2024

Yes, we need to remove the current mandatory dependency on the Hadoop environment and make it optional. We can decide which environment dependency to use according to the submission mode (yarn application, k8s, local) selected by the user. As far as Hadoop is concerned, there are HDFS-based and S3-based setups, and there may be some subtle differences between them. Local mode differs from the Hadoop-dependent implementation; it is a separate implementation, and the details need to be discussed.
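One way to picture making the Hadoop check depend on the submission mode is the sketch below. Everything in it is an assumption for illustration: the names `EnvironmentCheck`, `SubmissionMode`, and `isSatisfied` are hypothetical, only the yarn application mode is assumed to need Hadoop, and a non-empty `HADOOP_HOME` variable is used as a stand-in for "a Hadoop environment is available".

```java
import java.util.Collections;
import java.util.Map;

// Hypothetical sketch: only demand a Hadoop environment when the chosen mode needs one.
final class EnvironmentCheck {

    // The three modes mentioned in the thread; which ones need Hadoop is an assumption.
    enum SubmissionMode {
        YARN_APPLICATION(true),
        KUBERNETES(false),
        LOCAL(false);

        final boolean requiresHadoop;

        SubmissionMode(boolean requiresHadoop) {
            this.requiresHadoop = requiresHadoop;
        }
    }

    // Returns true when the environment is sufficient for the given mode.
    // Assumption: a non-empty HADOOP_HOME entry stands in for a usable Hadoop setup.
    static boolean isSatisfied(SubmissionMode mode, Map<String, String> env) {
        if (!mode.requiresHadoop) {
            return true;
        }
        String hadoopHome = env.get("HADOOP_HOME");
        return hadoopHome != null && !hadoopHome.isEmpty();
    }

    public static void main(String[] args) {
        Map<String, String> noHadoop = Collections.emptyMap();
        for (SubmissionMode mode : SubmissionMode.values()) {
            System.out.println(mode + " without Hadoop: " + isSatisfied(mode, noHadoop));
        }
    }
}
```

Passing the environment in as a map, instead of reading `System.getenv()` directly, keeps the check easy to exercise without a real cluster.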

wolfboys commented on May 21, 2024

In addition, cloud native is becoming more and more popular, and it brings more storage options, so we need to discuss Hadoop storage on S3.

Al-assad commented on May 21, 2024

@wolfboys I think the focus is not on S3 specifically, but on letting users freely choose where StreamX stores its resources, based on their actual environment, as shown below:

datayangl commented on May 21, 2024

Flink's file system design matches our goal pretty well, so I suggest we first go through Flink's abstraction and implementation, and then make ours configurable. Reference: org.apache.flink.core.fs.FileSystem
