Comments (3)
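The article says it would be better to generate the watermark at the source, and the latest version does now support this.
However, the watermark exists to handle late data in window operations. If watermarks are generated at the source but no window operation follows, or window operations are rare, most of the generated watermarks go unused and only cost performance. So why is generating watermarks at the source better? I don't understand.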
from coolplayspark.
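The cost question in this comment is easier to reason about with a concrete model. Below is a toy sketch in plain Python (no Spark dependency; the class `WatermarkTracker` is hypothetical, not a Spark API) of the watermark tracking the Structured Streaming programming guide describes: the engine keeps the maximum event time seen, subtracts the delay threshold given to `withWatermark`, and updates the watermark once per micro-batch without ever moving it backwards. The initial value of 0 is an assumption of the model.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class WatermarkTracker:
    """Toy model of Structured Streaming watermark tracking (not Spark code).

    Assumption: watermark = (max event time seen so far) - delay,
    recomputed once per micro-batch, never moving backwards.
    Times are plain integers (seconds).
    """

    delay: int                     # delay threshold passed to withWatermark
    max_event_time: int = -10**18  # sentinel: nothing seen yet
    watermark: int = 0             # assumed initial value (epoch 0)

    def end_of_batch(self, event_times: Iterable[int]) -> int:
        # Per-record work in this model is one comparison against the running max.
        for t in event_times:
            if t > self.max_event_time:
                self.max_event_time = t
        # Candidate watermark for the next micro-batch.
        candidate = self.max_event_time - self.delay
        # Monotonic: a batch of old (late) rows cannot pull the watermark back.
        self.watermark = max(self.watermark, candidate)
        return self.watermark


tracker = WatermarkTracker(delay=600)
print(tracker.end_of_batch([1000, 1300, 1200]))  # 700
print(tracker.end_of_batch([900]))               # 700 (late row, no regression)
print(tracker.end_of_batch([2000]))              # 1400
print(tracker.end_of_batch([]))                  # 1400 (empty batch)
```

In this model, tracking is cheap per record; whether the watermark actually frees any state depends on a downstream stateful operator consuming it.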
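Hi, I have a question. The article says: "To emphasize again, a watermark is needed only when (a+) doing window() + groupBy().aggregation() on event time, i.e., using state to aggregate across execution batches, and (b+) the output mode is Append or Update; it is not needed otherwise."
But in fact any event_time-based filtering, such as GroupStateTimeout.EventTimeTimeout in MapGroupsWithState, also needs a watermark.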
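Both cases in this comment come down to the watermark deciding that some state is final. The toy model below (plain Python, not Spark code; the function names and the 600-second window are hypothetical) shows a windowed count in Append mode emitting and dropping a window once the watermark reaches its end, and an event-time timeout firing once the watermark advances beyond the timestamp a group set, which is how the `GroupState` documentation describes `setTimeoutTimestamp` with `GroupStateTimeout.EventTimeTimeout`. It is simplified: for example, it ignores that a group receiving new data in a batch does not time out in that batch.

```python
from typing import Dict, List, Tuple

WINDOW = 600  # hypothetical tumbling-window length in seconds


def append_window_counts(state: Dict[int, int], event_times: List[int],
                         watermark: int) -> List[Tuple[int, int]]:
    """Toy window() + groupBy().count() in Append mode (simplified, not Spark code).

    state maps window start -> running count and persists across batches.
    Returns the (window_start, count) rows emitted in this batch.
    """
    for t in event_times:
        start = t - t % WINDOW
        if start + WINDOW <= watermark:
            continue  # window already closed by the watermark: late row dropped
        state[start] = state.get(start, 0) + 1
    # A window is final once the watermark reaches its end: emit it, free its state.
    closed = sorted(s for s in state if s + WINDOW <= watermark)
    return [(s, state.pop(s)) for s in closed]


def expire_event_time_timeouts(timeouts: Dict[str, int], watermark: int) -> List[str]:
    """Toy EventTimeTimeout: a key times out once the watermark advances
    beyond its timeout timestamp. timeouts maps key -> timestamp;
    expired keys are removed and returned."""
    expired = sorted(k for k, ts in timeouts.items() if watermark > ts)
    for k in expired:
        del timeouts[k]
    return expired
```

Without a watermark, neither function would ever have a reason to emit or drop anything, which is the commenter's point about timeouts.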
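Hi, if I need to run a groupBy + agg aggregation over all of the current day's data, without using window but with a watermark set, what happens? What I don't understand is whether, without a window, the state grows without bound.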
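A rough way to picture this question: the Structured Streaming programming guide lists, among the conditions for watermarking to clean aggregation state, that the aggregation keys must include the event-time column or a window on the event-time column (with `withWatermark` applied to that same column). The toy model below (plain Python, not Spark code; `remaining_state_keys` is hypothetical) contrasts grouping by `window(eventTime, "1 day")` with grouping by a day column derived from the event time, which the model assumes the watermark is not linked to.

```python
from typing import Dict, List

DAY = 86_400  # seconds


def remaining_state_keys(batches: List[List[int]], delay: int,
                         key_is_event_time_window: bool) -> int:
    """Toy streaming daily count; returns how many keys are still in state.

    key_is_event_time_window=True  ~ groupBy(window(eventTime, "1 day"))
    key_is_event_time_window=False ~ groupBy(a day column derived from
        eventTime); assumed not linked to the watermark, so never evicted.
    """
    state: Dict[int, int] = {}  # day start -> count
    max_seen = 0
    watermark = 0               # watermark from the previous micro-batch
    for batch in batches:
        for t in batch:
            day = t - t % DAY
            if key_is_event_time_window and day + DAY <= watermark:
                continue  # day already finalized: late row dropped
            state[day] = state.get(day, 0) + 1
            max_seen = max(max_seen, t)
        if key_is_event_time_window:
            # Evict every day whose end the watermark has passed.
            for d in [d for d in state if d + DAY <= watermark]:
                del state[d]
        # Watermark used by the next micro-batch.
        watermark = max(watermark, max_seen - delay)
    return len(state)


ten_days = [[k * DAY] for k in range(10)]  # one event per day
print(remaining_state_keys(ten_days, delay=3600, key_is_event_time_window=True))   # 3
print(remaining_state_keys(ten_days, delay=3600, key_is_event_time_window=False))  # 10
```

In this model the windowed variant keeps only the few most recent days, while the derived-day variant keeps one entry per day for as long as the query runs.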
Related Issues
- Join operations in Spark Streaming
- [SS] Discussion thread for "1.1 Structured Streaming: Implementation Approach and Overview"
- [SS] Discussion thread for "1.2 Structured Streaming: Output Modes Explained"
- [SS] Discussion thread for "2.1 Structured Streaming: Source Explained"
- [SS] Discussion thread for "2.2 Structured Streaming: Sink Explained"
- [SS] Discussion thread for "3.1 Structured Streaming: State Store Explained"
- [SS] Discussion thread for "4.1 Structured Streaming: Event Time Explained"
- [SS] Discussion thread for "[Q&A] Differences Between Structured Streaming and Spark Streaming"
- A question
- Spark tech group QR code has expired
- This document ("0.1 Spark Streaming 实现思路与模块概述.md") contains incorrect descriptions
- Does anyone have code for reading from Kafka with Spark Streaming?
- Compiled against kafka_client-0.10.jar, but spark-submit loaded CDH's bundled spark-assembly, causing class conflicts
- How is exactly-once semantics ensured when recovering from a driver failure?
- [question] How does Spark maintain Kafka offsets under a watermark?
- structured streaming java.io.EOFException
- StateStore implementation and exactly-once
- Efficiency of reading data from multiple topics
- Problem reading Redis from Spark Streaming