Comments (5)
In my mind, version_kv [left] join version_kv currently has two scenarios:
One is with aggregation: we need to output changelog results.
The other is without aggregation:
i. If the target table is version_kv, we only need the join to output the intermediate results (without changelog).
ii. If the target table is changelog_kv, we still output changelog results.
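To illustrate why the aggregation case needs changelog input, here is a minimal Python sketch. It is only a model of the idea, not Proton's implementation; the function name `max_over_changelog` and the `(value, delta)` event shape are assumptions made for this example.

```python
from collections import Counter


def max_over_changelog(events):
    """Track max(v) over a stream of (value, delta) events.

    delta is +1 for an inserted row and -1 for a retracted row.
    Yields the current max after each event (None when no rows remain).
    """
    counts = Counter()
    for value, delta in events:
        counts[value] += delta
        if counts[value] == 0:
            del counts[value]  # the last copy of this value was retracted
        yield max(counts) if counts else None


# A joined row is updated from v=10 to v=3.
# In changelog form the old row is retracted before the new one is added.
changelog = [(10, +1), (10, -1), (3, +1)]
# The same update as append-only rows carries no retraction.
append_only = [(10, +1), (3, +1)]

changelog_result = list(max_over_changelog(changelog))
append_only_result = list(max_over_changelog(append_only))

print(changelog_result)    # [10, None, 3]
print(append_only_result)  # [10, 10]
```

With the retraction the aggregate ends at 3; without it the stale value 10 is never removed, so the max stays wrong.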
from proton.
For syntax:
-- case 1: with aggregation; the join result is changelog, and the aggregation result is append-only
select id, max(v) from kv1 join kv2 using(id);
-- case 2: with aggregation; the join result is changelog, and the aggregation result is changelog
select id, max(v), _tp_delta from kv1 join kv2 using(id) emit changelog;
-- case 3: without aggregation; the join result is append-only
select id, kv1.v, kv2.v from kv1 [left] join kv2 using(id);
-- case 4: without aggregation; the join result is changelog
select id, kv1.v, kv2.v, _tp_delta from kv1 [left] join kv2 using(id) emit changelog;
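The difference between case 3 and case 4 can be sketched with a small Python model of an inner join between two versioned key-value streams. This is an illustration under assumptions, not how Proton executes the query: `join_versioned_kv`, its `emit_changelog` flag, and the `(side, id, v)` update tuples are hypothetical names chosen for the example, and the trailing +1/-1 stands in for `_tp_delta`.

```python
def join_versioned_kv(updates, emit_changelog=False):
    """Inner-join the latest version per id from two keyed streams.

    updates: list of (side, id, v), where side is "kv1" or "kv2".
    Append-only mode emits (id, kv1_v, kv2_v) for each new join result.
    Changelog mode appends a delta: -1 retracts the previous result,
    +1 adds the new one.
    """
    latest = {"kv1": {}, "kv2": {}}
    out = []
    for side, key, v in updates:
        other = "kv2" if side == "kv1" else "kv1"
        had_old = key in latest[side]
        old_v = latest[side].get(key)
        latest[side][key] = v
        if key not in latest[other]:
            continue  # no matching row on the other side yet
        other_v = latest[other][key]

        def make_row(mine):
            # Keep the column order (id, kv1.v, kv2.v) regardless of side.
            return (key, mine, other_v) if side == "kv1" else (key, other_v, mine)

        if emit_changelog:
            if had_old:
                out.append(make_row(old_v) + (-1,))
            out.append(make_row(v) + (+1,))
        else:
            out.append(make_row(v))
    return out


updates = [("kv1", 1, "a"), ("kv2", 1, "x"), ("kv1", 1, "b")]

append_only_rows = join_versioned_kv(updates)
changelog_rows = join_versioned_kv(updates, emit_changelog=True)

print(append_only_rows)  # [(1, 'a', 'x'), (1, 'b', 'x')]
print(changelog_rows)    # [(1, 'a', 'x', 1), (1, 'a', 'x', -1), (1, 'b', 'x', 1)]
```

In append-only mode a consumer only sees the newest joined row per update; in changelog mode it also sees the retraction of the row that was replaced.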
from proton.
> In my mind, version_kv [left] join version_kv currently has two scenarios:
> One is with aggregation: we need to output changelog results.
> The other is without aggregation:
> i. If the target table is version_kv, we only need the join to output the intermediate results (without changelog).
> ii. If the target table is changelog_kv, we still output changelog results.
Right, we just need to support another emit strategy like EMIT UPSERT; it's the user's responsibility to pick the right emit strategy. For aggregation, we pick changelog for them. For a plain join, we pick changelog by default as well, but the user can override it with EMIT UPSERT? We don't need to consider the target stream, since that would be very complex and is sometimes hard, for example with an external target table (we don't know what it is). Does this seem good enough?
from proton.
I don't like the idea of supporting another emit strategy, EMIT UPSERT, which is not a common strategy and is only used in versioned_kv join versioned_kv.
Introducing a special emit strategy will make it more complex and more difficult for users to get started.
I prefer that the default is append-only unless emit changelog is specified.
from proton.
Makes sense to me.
from proton.