risingwavelabs / risingwave

Cloud-native SQL stream processing, analytics, and management. KsqlDB and Apache Flink alternative. 🚀 10x more productive. 🚀 10x more cost-efficient.

Home Page: https://www.risingwave.com/slack

License: Apache License 2.0

Dockerfile 0.07% Shell 1.18% Rust 91.74% Python 2.38% JavaScript 0.05% CSS 0.01% TypeScript 0.70% Java 3.49% Go 0.32% Roff 0.01% PureBasic 0.01% PHP 0.03% Ruby 0.03% PLpgSQL 0.01%
database stream-processing cloud-native sql distributed-database rust serverless postgresql real-time postgres

risingwave's Introduction

🌊 Reimagine stream processing.

Documentation 📑 | Hands-on Tutorials 🎯 | RisingWave Cloud 🚀 | Get Instant Help

RisingWave is a Postgres-compatible streaming database engineered to provide the simplest and most cost-efficient approach for processing, analyzing, and managing real-time event streaming data.

RisingWave

Try it out in 60 seconds

Install RisingWave in standalone mode:

curl https://risingwave.com/sh | sh

Then follow the prompts to start and connect to RisingWave.

To learn about other installation options, such as using a Docker image, see Quick Start.

Production deployments

RisingWave Cloud offers the easiest way to run RisingWave in production, with a forever-free developer tier.

For Docker deployment, please refer to Docker Compose.

For Kubernetes deployment, please refer to Kubernetes with Helm or Kubernetes with Operator.

Why RisingWave for real-time materialized views?

RisingWave specializes in providing incrementally updated, consistent materialized views โ€” a persistent data structure that represents the results of event stream processing. Compared to materialized views, dynamic tables, and live tables in other database and data warehouse systems, RisingWave's materialized view stands out in several key aspects:

  • Highly cost-efficient - up to 95% cost savings compared to state-of-the-art solutions
  • Synchronous refresh without compromising consistency
  • Extensive SQL support including joins, deletes, and updates
  • High concurrency in query serving
  • Instant fault tolerance
  • Transparent dynamic scaling
  • Speedy bootstrapping and backfilling

RisingWave's extensive CDC support further enables users to seamlessly offload event-driven workloads such as materialized views and triggers from operational databases (e.g., PostgreSQL) to RisingWave.

Why RisingWave for stream processing?

RisingWave provides users with a comprehensive set of frequently used stream processing features, including exactly-once consistency, time window functions, watermarks, and more. RisingWave significantly reduces the complexity of building stream processing applications by allowing developers to express intricate stream processing logic through cascaded materialized views. Furthermore, it allows users to persist data directly within the system, eliminating the need to deliver results to external databases for storage and query serving.

Real-time data pipelines, with and without RisingWave

Compared to existing stream processing systems like Apache Flink, Apache Spark Streaming, and ksqlDB, RisingWave stands out in two primary dimensions: Ease-of-use and cost efficiency, thanks to its PostgreSQL-style interaction experience and Snowflake-like architectural design (i.e., decoupled storage and compute).

                              RisingWave 🌊 | Traditional stream processing systems
Learning curve 🎢: PostgreSQL-style experience | System-specific concepts
Integration 🔗: PostgreSQL ecosystem | System-specific ecosystem
Complex queries (e.g., joins) 💡: Highly efficient | Inefficient
Failure recovery 🚨: Instant | Minutes or even hours
Dynamic scaling 🚀: Transparent | Stop-the-world
Bootstrapping and backfilling ⏪: Accelerated via dynamic scaling | Slow

RisingWave as a database

RisingWave is fundamentally a database that extends beyond basic streaming data processing capabilities. It excels in the effective management of streaming data, making it a trusted choice for data persistence and powering online applications. RisingWave offers an extensive range of database capabilities, which include:

  • High availability
  • Serving highly concurrent queries
  • Role-based access control (RBAC)
  • Integration with data modeling tools, such as dbt
  • Integration with database management tools, such as DBeaver
  • Integration with BI tools, such as Grafana
  • Schema change
  • Processing of semi-structured data

In-production use cases

Within your data stack, RisingWave can assist with:

  • Processing and transforming event streaming data in real time
  • Offloading event-driven queries (e.g., materialized views, triggers) from operational databases
  • Performing real-time ETL (Extract, Transform, Load)
  • Supporting real-time feature stores

Read more at use cases. RisingWave is extensively used in real-time applications such as monitoring, alerting, dashboard reporting, and machine learning. It has already been adopted in fields such as financial trading, manufacturing, new media, logistics, and gaming. Check out customer stories.

Community

Looking for help, discussions, collaboration opportunities, or a casual afternoon chat with our fellow engineers and community members? Join our Slack workspace!

Notes on telemetry

RisingWave collects anonymous usage statistics to better understand how the community is using RisingWave. The sole intention of this exercise is to help improve the product. Users may opt out easily at any time. Please refer to the user documentation for more details.

License

RisingWave is distributed under the Apache License (Version 2.0). Please refer to LICENSE for more information.

Contributing

Thanks for your interest in contributing to the project! Please refer to contribution guidelines for more information.

risingwave's People

Contributors

bowenxiao1999, bugenzhao, chenzl25, dependabot[bot], fuyufjh, hzxa21, kwannoel, li0k, little-wallace, liurenjie1024, lmatz, mrcroxx, neverchanje, shanicky, skyzh, soundofdestiny, st1page, stdrc, strikew, sunt-ing, tabversion, tennyzhuang, wangrunji0408, wcy-fdu, wenym1, xiangjinwu, xxchan, yezizp2012, yuhao-su, zwang28


risingwave's Issues

streaming: report actor error

Currently, we simply print the error as a warning in the console and let the actor go down. In the future, we should report the error to the dashboard and restart the actor.

At the same time, some of our implementations also spawn extra futures apart from the actor itself. We should manage these as well. For example, BarrierAligner spawns two futures to poll data from each of the executors. #2229
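A minimal sketch of the reporting pattern described above: each actor (or extra future) sends its failure to a supervisor over a channel instead of merely logging it, so the supervisor can later surface it to the dashboard or restart the actor. The names (`ActorError`, `spawn_actor`) are illustrative, not RisingWave's actual types.

```rust
use std::sync::mpsc;
use std::thread;

#[derive(Debug)]
struct ActorError {
    actor_id: u32,
    message: String,
}

// Spawn an actor that reports its failure to the supervisor channel
// instead of just printing a warning and dying silently.
fn spawn_actor(actor_id: u32, errors: mpsc::Sender<ActorError>) -> thread::JoinHandle<()> {
    thread::spawn(move || {
        // ... actor main loop; here we simulate a failure.
        let result: Result<(), String> = Err("channel closed".to_string());
        if let Err(message) = result {
            let _ = errors.send(ActorError { actor_id, message });
        }
    })
}

fn main() {
    let (tx, rx) = mpsc::channel();
    spawn_actor(42, tx).join().unwrap();
    // The supervisor now observes the failure and could restart the actor.
    let err = rx.recv().unwrap();
    println!("actor {} failed: {}", err.actor_id, err.message);
}
```

The same channel can carry errors from any extra futures an executor spawns, which is what makes them manageable rather than fire-and-forget.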

Tracking: Implement transaction in hummock

We introduce the concept of transactions in Hummock to aid the checkpointing process, so that changes can be committed and rolled back atomically.

  • Implement Hummock service atop meta service and migrate version manager (#1744 singularity-data/risingwave-legacy#2156)
  • Introduce transaction in hummock service (transaction state, lifecycle, visibility management) (#2513)
  • Support transaction commit (#2513)
  • Support transaction rollback (#2513)
  • State store integration
  • Stream manager integration

Batch: TPC-H Q19

Query:

select
	sum(l_extendedprice* (1 - l_discount)) as revenue
from
	lineitem,
	part
where
	(
		p_partkey = l_partkey
		and p_brand = ':1'
		and p_container in ('SM CASE', 'SM BOX', 'SM PACK', 'SM PKG')
		and l_quantity >= :4 and l_quantity <= :4 + 10
		and p_size between 1 and 5
		and l_shipmode in ('AIR', 'AIR REG')
		and l_shipinstruct = 'DELIVER IN PERSON'
	)
	or
	(
		p_partkey = l_partkey
		and p_brand = ':2'
		and p_container in ('MED BAG', 'MED BOX', 'MED PKG', 'MED PACK')
		and l_quantity >= :5 and l_quantity <= :5 + 10
		and p_size between 1 and 10
		and l_shipmode in ('AIR', 'AIR REG')
		and l_shipinstruct = 'DELIVER IN PERSON'
	)
	or
	(
		p_partkey = l_partkey
		and p_brand = ':3'
		and p_container in ('LG CASE', 'LG BOX', 'LG PACK', 'LG PKG')
		and l_quantity >= :6 and l_quantity <= :6 + 10
		and p_size between 1 and 15
		and l_shipmode in ('AIR', 'AIR REG')
		and l_shipinstruct = 'DELIVER IN PERSON'
	);

ci: build and test `release` version

Currently, we only test the debug build of our program. In the future, we should use cargo build --release for our e2e tests. This is the convention followed by TiKV: run unit tests with the debug version, and integration tests with the release version.

Tracking: Support TPC-H Queries (for Java frontend)

Currently we've implemented almost all of the operators and expressions required by TPC-H. This issue tracks bug fixes and the implementation of the TPC-H queries.

A query is marked as supported iff it has been added to the end-to-end test.

SQL Features in Queries

Query Group-by (Aggregation) Order-by (Sort/TopN) Join Non-Correlated Subquery Correlated Subquery CTE / View
Q1 ✔️ ✔️
Q2 ✔️ ✔️ ✔️ (value in filter)
Q3 ✔️ ✔️ ✔️
Q4 ✔️ ✔️ ✔️ (exists)
Q5 ✔️ ✔️ ✔️
Q6 ✔️ (simple)
Q7 ✔️ ✔️ ✔️ ✔️ (table)
Q8 ✔️ ✔️ ✔️ ✔️ (table)
Q9 ✔️ ✔️ ✔️ ✔️ (table)
Q10 ✔️ ✔️ ✔️
Q11 ✔️ ✔️ ✔️ ✔️ (value in filter)
Q12 ✔️ ✔️ ✔️
Q13 ✔️ ✔️ ✔️ (outer) ✔️ (table)
Q14 ✔️ (simple) ✔️
Q15 ✔️ ✔️ ✔️ ✔️ (value in filter) ✔️
Q16 ✔️ ✔️ ✔️ ✔️ (not in)
Q17 ✔️ (simple) ✔️ ✔️ (table)
Q18 ✔️ ✔️ ✔️ ✔️ (in)
Q19 ✔️ (simple) ✔️
Q20 ✔️ ✔️ ✔️ ✔️ (in & table)
Q21 ✔️ ✔️ ✔️ ✔️ (exists)
Q22 ✔️ ✔️ ✔️ (table) ✔️ (not exists & value in filter)

Batch

  • Batch: TPC-H Q1
  • #177
  • #439
  • #296
  • #440
  • Batch: TPC-H Q6
  • #441
  • #443
  • singularity-data/risingwave-legacy#2392
  • singularity-data/risingwave-legacy#2393
  • singularity-data/risingwave-legacy#2394
  • #217
  • singularity-data/risingwave-legacy#2422
  • singularity-data/risingwave-legacy#2423
  • #272
  • singularity-data/risingwave-legacy#2426
  • singularity-data/risingwave-legacy#2427
  • singularity-data/risingwave-legacy#2428
  • #276
  • #277
  • #278
  • #279

Streaming

  • Streaming: TPC-H Q1
  • #178
  • singularity-data/risingwave-legacy#1907
  • #179
  • singularity-data/risingwave-legacy#1908
  • Streaming: TPC-H Q6
  • singularity-data/risingwave-legacy#2638
  • singularity-data/risingwave-legacy#2644
  • Streaming: TPC-H Q9
  • singularity-data/risingwave-legacy#2556
  • #490
  • singularity-data/risingwave-legacy#2839
  • singularity-data/risingwave-legacy#2840
  • singularity-data/risingwave-legacy#2890
  • #496
  • #491
  • singularity-data/risingwave-legacy#2897
  • singularity-data/risingwave-legacy#1911
  • #497
  • #492
  • #495
  • #494

riselab: add s3 support

Basically there are two ways to support it:

    - use: compute-node
      port: 5689
      exporter-port: 1224
      state-backend: s3://bucket

If using the above way, we will need to add state-backend parameter to every compute-node. In some cases, we need to write it 3 times.

Another way is to make s3 a mock service:

  default:
    - use: s3
       bucket: xxxx
    - use: meta-node
    - use: compute-node
    - use: prometheus
    - use: frontend

... and for compute-node:

  compute-node:
    address: "127.0.0.1"
    port: 5688
    exporter-address: "127.0.0.1"
    exporter-port: 1222
    id: compute-node-${port}
    provide-minio: "minio*"
    provide-meta-node: "meta-node*"
+   provide-s3: "s3*"
    user-managed: false

Batch: TPC-H Q22

Query:

select
	cntrycode,
	count(*) as numcust,
	sum(c_acctbal) as totacctbal
from
	(
		select
			substring(c_phone from 1 for 2) as cntrycode,
			c_acctbal
		from
			customer
		where
			substring(c_phone from 1 for 2) in
				(':1', ':2', ':3', ':4', ':5', ':6', ':7')
			and c_acctbal > (
				select
					avg(c_acctbal)
				from
					customer
				where
					c_acctbal > 0.00
					and substring(c_phone from 1 for 2) in
						(':1', ':2', ':3', ':4', ':5', ':6', ':7')
			)
			and not exists (
				select
					*
				from
					orders
				where
					o_custkey = c_custkey
			)
	) as custsale
group by
	cntrycode
order by
	cntrycode;

Tracking: state_store benchmark roadmap.

Reference:

Tracking:
Stage one (target 2022.1.5):

  • A command-line executable tool ss_bench with the following capabilities:
    • Configurable backends: (InMemoryStateStore, hummock+s3, hummock+minio, TikvStateStore, RocksDBStateStore)
    • Configurable hummock options: (SST size, block size, bloom filter, default bucket, checksum algo, etc..)
    • Configurable data sizes: (key length/value length).
    • Configurable mode: (simple get/ simple scan / simple write_batch / mixed).

Stage two (target 2022.1.10):

  • Launch a Prometheus endpoint to expose internal storage counters.
  • Configurable concurrencies: (multithreaded get/scan/write_batch operations).
  • Review performance with ss_bench.

Stage three: patterns more like our internal states:

  • Sequential writes within a single key space.
  • Random writes within a single key space.
  • (Sequential, random) x (writes, deletes, updates) within a single key space.
  • (Sequential, random) x (writes, deletes, updates) x (single key space, multiple key spaces)
  • (Tiered, leveled compaction) x (sequential, random ingestion) x (single, multiple key spaces)
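The key-pattern dimensions above can be sketched as a tiny workload generator: keys are a key-space prefix plus an index, produced either sequentially or pseudo-randomly (a small LCG keeps runs deterministic). This is a hypothetical illustration of the benchmark shapes, not the real ss_bench options.

```rust
// Sequential key: key-space prefix byte + big-endian index,
// so lexicographic order matches numeric order.
fn seq_key(space: u8, i: u64) -> Vec<u8> {
    let mut key = vec![space];
    key.extend_from_slice(&i.to_be_bytes());
    key
}

// Random key within the same key space, via a minimal LCG
// (constants from Numerical Recipes) for reproducible runs.
fn random_key(space: u8, seed: &mut u64, num_keys: u64) -> Vec<u8> {
    *seed = seed
        .wrapping_mul(6364136223846793005)
        .wrapping_add(1442695040888963407);
    seq_key(space, *seed % num_keys)
}

fn main() {
    let mut seed = 42;
    for i in 0..3 {
        println!("seq:    {:?}", seq_key(1, i));
        println!("random: {:?}", random_key(1, &mut seed, 1000));
    }
}
```

Multiple key spaces fall out of varying the prefix byte; deletes and updates reuse the same key generators with different operations.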

Batch: TPC-H Q15

Query:

select
	s_suppkey,
	s_name,
	s_address,
	s_phone,
	total_revenue
from
	supplier,
	(
		select
		l_suppkey,
		sum(l_extendedprice * (1 - l_discount))
	from
		lineitem
	where
		l_shipdate >= date '1993-01-01'
		and l_shipdate < date '1993-01-01' + interval '3' month
	group by
		l_suppkey
	) as revenue0 (supplier_no, total_revenue)
where
	s_suppkey = supplier_no
	and total_revenue = (
		select
			max(total_revenue)
		from
			(
				select
				l_suppkey,
				sum(l_extendedprice * (1 - l_discount))
			from
				lineitem
			where
				l_shipdate >= date '1993-01-01'
				and l_shipdate < date '1993-01-01' + interval '3' month
			group by
				l_suppkey
			) as revenue0 (supplier_no, total_revenue)
	)
order by
	s_suppkey;

Remove single mode

"Single mode" was designed to generate and execute a plan without Exchange, and was only used in early development.

Note that the so-called "single mode" is NOT a mode designed for single-node deployment. Exchange is necessary for parallel execution, no matter whether on a single node or a cluster.

Currently, there is more and more code handling "single mode". It's time to remove it.

batch: HashJoin: support non-equi condition

message HashJoinNode {
  JoinType join_type = 1;
  repeated int32 left_key = 2;
  repeated int32 left_output = 3;
  repeated int32 right_key = 4;
  repeated int32 right_output = 5;
+  expr.ExprNode other_condition = 6;
}
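What the proposed `other_condition` field adds can be sketched as follows: the join still probes a hash table built on the equi-join keys, but each matched pair must additionally pass a residual, non-equi predicate before being emitted. A hypothetical single-key, single-payload version:

```rust
use std::collections::HashMap;

// Hash join with a residual non-equi condition evaluated after the
// equi-key probe. Rows are (key, payload); illustrative only.
fn hash_join_with_condition(
    build: &[(i32, i32)],
    probe: &[(i32, i32)],
    other_condition: impl Fn(i32, i32) -> bool, // predicate on (build payload, probe payload)
) -> Vec<(i32, i32, i32)> {
    // Build phase: group build-side payloads by key.
    let mut table: HashMap<i32, Vec<i32>> = HashMap::new();
    for &(k, v) in build {
        table.entry(k).or_default().push(v);
    }
    // Probe phase: equi-match on key, then filter with the extra condition.
    let mut out = Vec::new();
    for &(k, pv) in probe {
        if let Some(vs) = table.get(&k) {
            for &bv in vs {
                if other_condition(bv, pv) {
                    out.push((k, bv, pv));
                }
            }
        }
    }
    out
}

fn main() {
    // e.g. SELECT ... WHERE a.k = b.k AND a.v < b.v
    let out = hash_join_with_condition(&[(1, 10), (1, 20)], &[(1, 15)], |b, p| b < p);
    println!("{:?}", out);
}
```

The design choice mirrors the proto change: the equi keys stay in `left_key`/`right_key`, while everything else becomes one residual expression.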

Tracking: unify `Table` and `MView` & use Hummock storage for `TABLE_V2`

There is a lot in common between a TableSource and an MView. In this tracking issue, we will unify their write paths and integrate the batch table with Hummock storage. For more information, check this design doc.

  • singularity-data/risingwave-legacy#2412
  • singularity-data/risingwave-legacy#2391
  • singularity-data/risingwave-legacy#2464
  • singularity-data/risingwave-legacy#2492
  • singularity-data/risingwave-legacy#2584
  • Wait for #87 to be ready to support using TableSourceV2 as a streaming source.
  • #212
  • #311
  • #387
  • #423
  • Bug fixes...

Tracking: Benchmarking RisingWave

This is to track the work related to benchmarking the state store of RisingWave. The final goal is to do the experiments in this doc: https://singularity-data.larksuite.com/docs/docusF6l8lV5BuOCWoq7Cu71fEe

Stage 1

  • Dimensions and metrics for state management of RisingWave
  • Experiments configuration
  • Implement state backends for RocksDB singularity-data/risingwave-legacy#2292
  • Implement state backends for TiKV singularity-data/risingwave-legacy#2202
  • Implement MetricsManager for RisingWave singularity-data/risingwave-legacy#2198 singularity-data/risingwave-legacy#2234

Stage 2

  • singularity-data/risingwave-legacy#2285 singularity-data/risingwave-legacy#2359
  • singularity-data/risingwave-legacy#2410
  • singularity-data/risingwave-legacy#2436
  • #195
  • #194
  • singularity-data/risingwave-legacy#2411
  • #45 (Implement ss_bench tool for state store layer)
  • #127

Stage 3

  • Conduct StateStore benchmark using ss_bench on memory, RocksDB, TiKV and Hummock backends
  • E2E Performance experiments(TPC-H)
  • E2E Performance experiments(Nexmark)
  • Overhead & cost experiments

Tracking: Basic Built-in Functions

Reference: PostgreSQL: Documentation: 14: Chapter 9. Functions and Operators

General for all types

  • TEXT format input / output (cast from / into string)
  • BINARY format input / output
  • singularity-data/risingwave-legacy#1916
  • singularity-data/risingwave-legacy#667
  • /issues/2284

Aggregate

  • count

Equality and Ordering for "most" types

  • = / <> / !=
  • #2684
  • /issues/2283
  • singularity-data/risingwave-legacy#666
  • < / <= / > / >=
  • Implement (NOT) BETWEEN AND functions
  • #12761

Aggregate

  • min / max

Boolean

  • explicit cast from / into int
  • Implement AND/OR/NOT operators
  • IS [NOT] [TRUE | FALSE]
  • #8933

Aggregate

Window functions

Misc

More

OLAP: TPC-H Q7

Query:

select
	supp_nation,
	cust_nation,
	l_year,
	sum(volume) as revenue
from
	(
		select
			n1.n_name as supp_nation,
			n2.n_name as cust_nation,
			extract(year from l_shipdate) as l_year,
			l_extendedprice * (1 - l_discount) as volume
		from
			supplier,
			lineitem,
			orders,
			customer,
			nation n1,
			nation n2
		where
			s_suppkey = l_suppkey
			and o_orderkey = l_orderkey
			and c_custkey = o_custkey
			and s_nationkey = n1.n_nationkey
			and c_nationkey = n2.n_nationkey
			and (
				(n1.n_name = 'ROMANIA' and n2.n_name = 'INDIA')
				or (n1.n_name = 'INDIA' and n2.n_name = 'ROMANIA')
			)
			and l_shipdate between date '1995-01-01' and date '1996-12-31'
	) as shipping
group by
	supp_nation,
	cust_nation,
	l_year
order by
	supp_nation,
	cust_nation,
	l_year;

ci: e2e coverage for Rust

Currently, e2e doesn't take Rust codebase into account when calculating coverage. In the future, we should do this.

java frontend spends 6s+ to insert 6000 rows

Benchmark TPC-H Q1: an insert statement with 6000+ rows is time-consuming.

We have already tried to suppress logging.
It seems that each inserted value is processed in multiple passes during planning/optimization.
(Thanks @zehaowei for the report and investigation.)


Batch: TPC-H Q12

Query:

select
	l_shipmode,
	sum(case
		when o_orderpriority = '1-URGENT'
			or o_orderpriority = '2-HIGH'
			then 1
		else 0
	end) as high_line_count,
	sum(case
		when o_orderpriority <> '1-URGENT'
			and o_orderpriority <> '2-HIGH'
			then 1
		else 0
	end) as low_line_count
from
	orders,
	lineitem
where
	o_orderkey = l_orderkey
	and l_shipmode in ('FOB', 'SHIP')
	and l_commitdate < l_receiptdate
	and l_shipdate < l_commitdate
	and l_receiptdate >= date '1994-01-01'
	and l_receiptdate < date '1994-01-01' + interval '1' year
group by
	l_shipmode
order by
	l_shipmode;

Blocking:

OLAP: TPC-H Q4

Query:

select
	o_orderpriority,
	count(*) as order_count
from
	orders
where
	o_orderdate >= date '1997-07-01'
	and o_orderdate < date '1997-07-01' + interval '3' month
	and exists (
		select
			*
		from
			lineitem
		where
			l_orderkey = o_orderkey
			and l_commitdate < l_receiptdate
	)
group by
	o_orderpriority
order by
	o_orderpriority;

Blocker:

  • #1916
  • #154
  • singularity-data/risingwave-legacy#2316

Batch: TPC-H Q20

Query:

select
	s_name,
	s_address
from
	supplier,
	nation
where
	s_suppkey in (
		select
			ps_suppkey
		from
			partsupp
		where
			ps_partkey in (
				select
					p_partkey
				from
					part
				where
					p_name like ':1%'
			)
			and ps_availqty > (
				select
					0.5 * sum(l_quantity)
				from
					lineitem
				where
					l_partkey = ps_partkey
					and l_suppkey = ps_suppkey
					and l_shipdate >= date ':2'
					and l_shipdate < date ':2' + interval '1' year
			)
	)
	and s_nationkey = n_nationkey
	and n_name = ':3'
order by
	s_name;

Hummock: bench S3

The granularity of cache population and eviction depends on the performance of S3 put / multipart upload / get / part get / byte-range get. We need to benchmark S3 first before deciding on some designs.

#198

Batch: TPC-H Q21

Query:

select
	s_name,
	count(*) as numwait
from
	supplier,
	lineitem l1,
	orders,
	nation
where
	s_suppkey = l1.l_suppkey
	and o_orderkey = l1.l_orderkey
	and o_orderstatus = 'F'
	and l1.l_receiptdate > l1.l_commitdate
	and exists (
		select
			*
		from
			lineitem l2
		where
			l2.l_orderkey = l1.l_orderkey
			and l2.l_suppkey <> l1.l_suppkey
	)
	and not exists (
		select
			*
		from
			lineitem l3
		where
			l3.l_orderkey = l1.l_orderkey
			and l3.l_suppkey <> l1.l_suppkey
			and l3.l_receiptdate > l3.l_commitdate
	)
	and s_nationkey = n_nationkey
	and n_name = ':1'
group by
	s_name
order by
	numwait desc,
	s_name;

Blocker:

ci: retire `start_cluster.sh`

RiseLAB is now capable of all tasks of start_cluster.sh, and the e2e-test-riselab job is running smoothly across the main branch and PRs. Now that source support has been added to the RiseLAB e2e tests, it seems to be a good time to retire start_cluster.sh in CI.

Bug: table row id distributed generation

Currently, when inserting tuples into distributed tables, each partition allocates implicit ids according to its local row number independently. This was done in singularity-data/risingwave-legacy#1613.

Duplicate ids may therefore occur in a distributed table when unioning all partitions together.

A quick fix is to assign the tuple id as a 16-bit partition id + 48-bit local row id.
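A minimal sketch of that packing, assuming the widths are bits: a 16-bit partition id and a 48-bit local row counter share one 64-bit row id, so ids from different partitions can never collide. Names are illustrative.

```rust
const LOCAL_BITS: u32 = 48;
const LOCAL_MASK: u64 = (1u64 << LOCAL_BITS) - 1;

// Pack partition id into the high 16 bits, local row id into the low 48.
fn make_row_id(partition_id: u16, local_row_id: u64) -> u64 {
    assert!(local_row_id <= LOCAL_MASK, "local row id overflows 48 bits");
    ((partition_id as u64) << LOCAL_BITS) | local_row_id
}

// Recover the two components from a packed row id.
fn split_row_id(row_id: u64) -> (u16, u64) {
    ((row_id >> LOCAL_BITS) as u16, row_id & LOCAL_MASK)
}

fn main() {
    let id = make_row_id(3, 5);
    println!("row id {} splits into {:?}", id, split_row_id(id));
}
```

Each partition then only needs to keep a monotonically increasing local counter; uniqueness across the union of partitions follows from the disjoint high bits.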

Batch: TPC-H Q8

Query:

select
	o_year,
	sum(case
		when nation = 'INDIA' then volume
		else 0
	end) / sum(volume) as mkt_share
from
	(
		select
			extract(year from o_orderdate) as o_year,
			l_extendedprice * (1 - l_discount) as volume,
			n2.n_name as nation
		from
			part,
			supplier,
			lineitem,
			orders,
			customer,
			nation n1,
			nation n2,
			region
		where
			p_partkey = l_partkey
			and s_suppkey = l_suppkey
			and l_orderkey = o_orderkey
			and o_custkey = c_custkey
			and c_nationkey = n1.n_nationkey
			and n1.n_regionkey = r_regionkey
			and r_name = 'ASIA'
			and s_nationkey = n2.n_nationkey
			and o_orderdate between date '1995-01-01' and date '1996-12-31'
			and p_type = 'PROMO BRUSHED COPPER'
	) as all_nations
group by
	o_year
order by
	o_year;

Tracking: Frontend Catalog

MVP Features

  • singularity-data/risingwave-legacy#2261
  • singularity-data/risingwave-legacy#2572
  • singularity-data/risingwave-legacy#2669
  • singularity-data/risingwave-legacy#2347
  • singularity-data/risingwave-legacy#2473
  • #958

Refactor

  • singularity-data/risingwave-legacy#2560
  • #415
  • singularity-data/risingwave-legacy#2592
  • singularity-data/risingwave-legacy#2917

Table V2

Tracking: Implement e2e benchmark tool for RisingWave

Mainly benchmarking the state management of RisingWave: e2e throughput, price/performance, data freshness, and e2e latency.

Functionality

  • Cmd tool
  • Configurable source & workload generator
  • Execution pipeline(pg sql, Kafka, CDC)
  • Metrics service
  • Support using pg SQL & table source to benchmark RisingWave
  • Support using Kafka to benchmark RisingWave
  • Support using CDC(MySQL->Debezium->Kafka) to benchmark RisingWave

Source

  • Implement TPC-H as benchmarking source

Support min/max aggregation on boolean types

Min/Max on boolean types is not supported yet.

CREATE TABLE supplier (
        s_suppkey  INTEGER,
        s_name VARCHAR(25),
        s_address VARCHAR(40),
        s_nationkey INTEGER,
        s_phone VARCHAR(15),
        s_acctbal NUMERIC,
        s_comment VARCHAR(101));

select min(s_suppkey > 1) from supplier;
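Semantically the missing aggregate is simple: booleans order as false < true in PostgreSQL, so min behaves like an AND-style fold and max like an OR-style fold, with NULLs skipped as aggregates normally do. A hypothetical sketch (not RisingWave's aggregate framework):

```rust
// min over nullable booleans: None (NULL) values are skipped;
// returns None only if every input is NULL. false < true, as in Postgres.
fn bool_min(values: &[Option<bool>]) -> Option<bool> {
    values.iter().flatten().copied().min()
}

// max over nullable booleans, same NULL handling.
fn bool_max(values: &[Option<bool>]) -> Option<bool> {
    values.iter().flatten().copied().max()
}

fn main() {
    let col = [Some(true), None, Some(false)];
    println!("min = {:?}, max = {:?}", bool_min(&col), bool_max(&col));
}
```

So `min(s_suppkey > 1)` is false as soon as any row fails the predicate, and `max` is true as soon as any row satisfies it.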

feat: streaming: supports multiple distribution key for HashDispatcher

Currently, Dispatcher only supports a single distribution key:

// A dispatcher redistributes messages.
// We encode both the type and other usage information in the proto.
message Dispatcher {
  enum DispatcherType {
    SIMPLE = 0;
    HASH = 1;
    BROADCAST = 2;
  }
  DispatcherType type = 1;
  int32 column_idx = 2; // <--- HERE!!
}
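Supporting multiple distribution keys essentially means replacing the single `column_idx` with a list of key columns that are all fed into one hasher, with the combined hash choosing the downstream actor. A hypothetical sketch, not RisingWave's actual hash function:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Route a row to one of `num_outputs` downstream actors by hashing
// all distribution-key columns together.
fn dispatch(row: &[i64], key_indices: &[usize], num_outputs: usize) -> usize {
    let mut hasher = DefaultHasher::new();
    for &idx in key_indices {
        row[idx].hash(&mut hasher);
    }
    (hasher.finish() % num_outputs as u64) as usize
}

fn main() {
    let row = [1i64, 2, 3];
    println!("routed to output {}", dispatch(&row, &[0, 2], 4));
}
```

Rows that agree on every key column always land on the same output, regardless of the non-key columns, which is the property hash dispatch needs.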

[Tracking] streaming: async flush and operator merge support

Currently, we can only get the output of HashAgg after flush is called. That is because states like Max require:

  1. apply_batch generates a sequence of change log (flush_status)
  2. merge the status to the state store by flushing them in write batch
  3. read the real max from the state store

This will cause several problems:

  • flushing a large incremental state will lead to latency spikes
  • in some of our fail-over designs, we can only add SST to the LSM tree when all nodes have processed the barrier

Therefore, managed states shouldn't require that all changes have been flushed before producing the correct output for this epoch.

To achieve this, we will need:

  • implement iterator on StateStore
  • implement state merge on get_output for ManagedExtremeState
  • implement state merge on ManagedStringAggState
  • ... when other ManagedStates are added, add new tasks here
  • support async flush in HashAgg
  • support async flush in SimpleAgg
  • ... when other executors are integrated into the state store, add new tasks here
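The "state merge on get_output" idea can be sketched for Max: rather than flushing first, the managed state combines the value already in the state store with the not-yet-flushed change log, so the correct output is available within the epoch. Names are illustrative, and only inserts are handled here.

```rust
// Merge the stored max with buffered (unflushed) inserts to produce
// this epoch's output without flushing the write batch first.
fn max_output(stored: Option<i64>, unflushed_inserts: &[i64]) -> Option<i64> {
    let buffered = unflushed_inserts.iter().copied().max();
    match (stored, buffered) {
        (Some(a), Some(b)) => Some(a.max(b)),
        (a, b) => a.or(b),
    }
}

fn main() {
    println!("{:?}", max_output(Some(5), &[3, 9]));
}
```

Deletions are what make this hard in practice: if the stored max is retracted, recomputing it requires scanning the state, which is why the task list starts with implementing an iterator on StateStore.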

riselab: support graceful stop

RiseLAB currently only kills the tmux session. In the future, we should support a graceful exit.

  • Call tmux list-windows -a -F "#{pane_pid} #{window_name}" to get pid of the components.
  • Send kill -SIGINT -pgid to stop the components.
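The two steps above can be sketched as follows: parse the `tmux list-windows` output of the form `<pane_pid> <window_name>` per line, then signal each process group with SIGINT (shown here only as the command string that would be run, to keep the sketch side-effect free).

```rust
// Parse lines like "123 compute-node" into (pid, window name) pairs,
// skipping anything malformed.
fn parse_tmux_windows(output: &str) -> Vec<(i32, String)> {
    output
        .lines()
        .filter_map(|line| {
            let (pid, name) = line.trim().split_once(' ')?;
            Some((pid.parse().ok()?, name.to_string()))
        })
        .collect()
}

// `kill -INT -<pgid>` sends SIGINT to the whole process group.
fn sigint_command(pgid: i32) -> String {
    format!("kill -INT -{}", pgid)
}

fn main() {
    for (pid, name) in parse_tmux_windows("123 compute-node\n456 meta-node\n") {
        println!("{} -> {}", name, sigint_command(pid));
    }
}
```

A real implementation would run `tmux list-windows -a -F "#{pane_pid} #{window_name}"` via `std::process::Command`, feed its stdout to the parser, and then issue the kill commands.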

Negative num mod error

In pg, the result is :

postgres=# SELECT (-32768)::int2 % (-1)::int2;
 ?column?
----------
        0
(1 row)

In our system, the result is:

dev=>SELECT (-32768)::int2 % (-1)::int2;
Error: Out of range
(1 row)
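The likely cause, sketched in Rust: `i16::MIN % -1` overflows in two's complement (it panics in debug builds), even though the mathematical result is 0, which matches the "Out of range" error above. PostgreSQL returns 0, so the remainder needs to special-case this. A hedged sketch, with a hypothetical function name:

```rust
// Postgres-compatible i16 remainder: i16::MIN % -1 must yield 0
// instead of overflowing.
fn pg_mod_i16(lhs: i16, rhs: i16) -> Result<i16, &'static str> {
    if rhs == 0 {
        return Err("division by zero");
    }
    // With rhs != 0, checked_rem returns None only for the
    // i16::MIN % -1 overflow case, where the correct answer is 0.
    Ok(lhs.checked_rem(rhs).unwrap_or(0))
}

fn main() {
    println!("{:?}", pg_mod_i16(-32768, -1));
}
```

The analogous fix applies to the other signed integer widths (`i32::MIN % -1`, `i64::MIN % -1`).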

Tracking: Minimal Frontend Framework in Rust

This issue tracks the work to rewrite a basic frontend in Rust.

This new frontend is expected to be a replacement for the current Java frontend, but the two will coexist for several months. During this period, we may need two end-to-end tests, one for each frontend, until the new frontend can cover all features.

Expected Features

In the first stage, we only support

  • CREATE TABLE
  • DROP TABLE
  • INSERT
  • SELECT
    • columns or constant values (no expressions allowed)
    • from one table (no joins allowed)
    • with very simple WHERE condition (column = constant)
  • CREATE MATERIALIZED VIEW
    • Same as SELECT

Components

  • Postgres wire protocol: singularity-data/risingwave-legacy#2019
  • Server: #293
  • Parser: singularity-data/risingwave-legacy#2170
  • Binder: #338
  • Optimizer: #109
  • Catalog: #959
  • Execution: #202
  • Query Manager: #1015
  • #838
  • #1219
