Comments (6)
@wolfboys What's more, I think we should further abstract the file system used by the StreamX workspace, which stores temporary files, compiled Flink job jars, and other artifacts.
Currently the StreamX workspace must be stored on HDFS, but it should also support the local FS and other file systems. In the improved storage abstraction, the file system should be determined automatically from the workspace URI.
For example, "hdfs:///streamx/workspace" would be directed to HDFS, "/streamx/workspace/" or "local:///streamx/workspace" to the local FS, and "s3:///streamx/workspace/" to S3.
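A minimal sketch of the scheme-based selection described above, in Java. The StorageType enum and WorkspaceStorageResolver class are hypothetical names, not existing StreamX APIs, and treating a "file" scheme the same as "local" is an added assumption:

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;

// Hypothetical storage kinds; illustrative only, not StreamX's actual API.
enum StorageType { LOCAL, HDFS, S3 }

final class WorkspaceStorageResolver {
    private WorkspaceStorageResolver() {}

    // Maps a workspace URI to a storage kind by looking only at its scheme.
    static StorageType resolve(String workspace) {
        final URI uri;
        try {
            uri = new URI(workspace);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Invalid workspace URI: " + workspace, e);
        }
        String scheme = uri.getScheme();
        if (scheme == null) {
            // A bare path such as "/streamx/workspace/" defaults to the local FS.
            return StorageType.LOCAL;
        }
        switch (scheme.toLowerCase(Locale.ROOT)) {
            case "local":
            case "file": // assumption: "file" is accepted as an alias of "local"
                return StorageType.LOCAL;
            case "hdfs":
                return StorageType.HDFS;
            case "s3":
                return StorageType.S3;
            default:
                throw new IllegalArgumentException("Unsupported workspace scheme: " + scheme);
        }
    }
}
```

Rejecting unknown schemes when the workspace is resolved surfaces a misconfiguration early, rather than when the first job jar is written.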
from incubator-streampark.
Some StreamX users need only the Flink K8s deployment mode, as in issue #192. They use StreamX purely as an ETL platform, and their deployment includes no Hadoop cluster (the downstream systems are OLAP engines such as ClickHouse and TiDB).
Yes, we need to remove the current mandatory dependency on a Hadoop environment and make it optional. Which environment dependencies to use can be decided by the submission mode the user selects (YARN application, K8s, local). On the Hadoop side, storage can be HDFS-based or S3-based, which may involve some subtle differences and changes. Local mode is a separate implementation from the Hadoop-dependent one, and its details need to be discussed.
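One way to make Hadoop optional is to derive the requirement from the submission mode and the workspace URI, and to fail fast when Hadoop is required but missing. This is a hypothetical sketch: SubmitMode, HadoopRequirement, and the rule that only YARN mode or an hdfs:// workspace needs Hadoop are assumptions for illustration (S3 accessed through Hadoop's connectors would need its own rule). The probe uses org.apache.hadoop.conf.Configuration only as a marker class:

```java
import java.net.URI;
import java.util.Locale;

// Hypothetical submission modes, named after the ones listed in the discussion.
enum SubmitMode { YARN_APPLICATION, KUBERNETES, LOCAL }

final class HadoopRequirement {
    private HadoopRequirement() {}

    // Assumed rule: YARN always needs Hadoop; other modes need it only
    // when the workspace lives on HDFS.
    static boolean needsHadoop(SubmitMode mode, String workspaceUri) {
        if (mode == SubmitMode.YARN_APPLICATION) {
            return true;
        }
        String scheme = URI.create(workspaceUri).getScheme();
        return scheme != null && scheme.toLowerCase(Locale.ROOT).equals("hdfs");
    }

    // Looks up a core Hadoop class without initializing it.
    static boolean hadoopOnClasspath() {
        try {
            Class.forName("org.apache.hadoop.conf.Configuration", false,
                    HadoopRequirement.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    // Fails at startup instead of at job submission time.
    static void validate(SubmitMode mode, String workspaceUri) {
        if (needsHadoop(mode, workspaceUri) && !hadoopOnClasspath()) {
            throw new IllegalStateException("Submit mode " + mode + " with workspace "
                    + workspaceUri + " requires Hadoop, but it is not on the classpath");
        }
    }
}
```

Passing false as the second argument of Class.forName skips static initialization, so the probe has no side effects when Hadoop is present.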
In addition, cloud native deployments are becoming more and more popular, and more storage is provisioned that way, so we also need to discuss running Hadoop storage on S3.
@wolfboys I think the focus is not on S3, but on letting users freely choose where StreamX stores its resources, based on their actual environment,
as shown below:
Flink's file system design matches our goal well, so I suggest we first go through Flink's abstraction and implementation, and then make the storage configurable. Reference: org.apache.flink.core.fs.FileSystem
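In Flink, org.apache.flink.core.fs.FileSystem picks an implementation by URI scheme: each FileSystemFactory reports its scheme through getScheme() and builds the file system in create(URI), and factories are discovered through Java's ServiceLoader and Flink's plugin mechanism. The dependency-free sketch below mirrors that shape for the StreamX workspace. Every type in it is hypothetical, the interface is far smaller than Flink's, and factories are registered explicitly rather than discovered:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical, minimal stand-in for Flink's FileSystem. Unchecked exceptions
// keep the sketch short; Flink's real methods throw IOException.
interface WorkspaceFileSystem {
    boolean exists(String path);
    void mkdirs(String path);
}

// Mirrors the shape of Flink's FileSystemFactory: one factory per URI scheme.
interface WorkspaceFileSystemFactory {
    String getScheme();
    WorkspaceFileSystem create(URI fsUri);
}

// Local implementation backed by java.nio.file.
final class LocalWorkspaceFileSystem implements WorkspaceFileSystem {
    @Override
    public boolean exists(String path) {
        return Files.exists(Paths.get(path));
    }

    @Override
    public void mkdirs(String path) {
        try {
            Files.createDirectories(Paths.get(path));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

final class LocalWorkspaceFileSystemFactory implements WorkspaceFileSystemFactory {
    @Override
    public String getScheme() { return "file"; }

    @Override
    public WorkspaceFileSystem create(URI fsUri) { return new LocalWorkspaceFileSystem(); }
}

// Picks a factory by scheme; a scheme-less URI falls back to a default scheme.
final class WorkspaceFileSystems {
    private final Map<String, WorkspaceFileSystemFactory> factories = new HashMap<>();
    private final String defaultScheme;

    WorkspaceFileSystems(String defaultScheme) {
        this.defaultScheme = defaultScheme.toLowerCase(Locale.ROOT);
    }

    void register(WorkspaceFileSystemFactory factory) {
        factories.put(factory.getScheme().toLowerCase(Locale.ROOT), factory);
    }

    WorkspaceFileSystem get(URI uri) {
        String scheme = uri.getScheme() == null
                ? defaultScheme
                : uri.getScheme().toLowerCase(Locale.ROOT);
        WorkspaceFileSystemFactory factory = factories.get(scheme);
        if (factory == null) {
            throw new IllegalArgumentException("No file system registered for scheme: " + scheme);
        }
        return factory.create(uri);
    }
}
```

The default scheme for scheme-less paths plays a role similar to Flink's fs.default-scheme option. An HDFS or S3 factory would be registered the same way, and only when its dependencies are actually present, which keeps Hadoop optional.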