risingwavelabs / risingwave

Cloud-native SQL stream processing, analytics, and management. KsqlDB and Apache Flink alternative. 🚀 10x more productive. 🚀 10x more cost-efficient.

Home Page: https://www.risingwave.com/slack

License: Apache License 2.0

Dockerfile 0.07% Shell 1.18% Rust 91.74% Python 2.38% JavaScript 0.05% CSS 0.01% TypeScript 0.70% Java 3.49% Go 0.32% Roff 0.01% PureBasic 0.01% PHP 0.03% Ruby 0.03% PLpgSQL 0.01%
database stream-processing cloud-native sql distributed-database rust serverless postgresql real-time postgres

risingwave's Introduction

🌊 Reimagine stream processing.

Documentation 📑 | Hands-on Tutorials 🎯 | RisingWave Cloud 🚀 | Get Instant Help

RisingWave is a Postgres-compatible streaming database engineered to provide the simplest and most cost-efficient approach for processing, analyzing, and managing real-time event streaming data.

RisingWave

Try it out in 60 seconds

Install RisingWave in standalone mode:

curl https://risingwave.com/sh | sh

Then follow the prompts to start and connect to RisingWave.

To learn about other installation options, such as using a Docker image, see Quick Start.

Production deployments

RisingWave Cloud offers the easiest way to run RisingWave in production, with a forever-free developer tier.

For Docker deployment, please refer to Docker Compose.

For Kubernetes deployment, please refer to Kubernetes with Helm or Kubernetes with Operator.

Why RisingWave for real-time materialized views?

RisingWave specializes in providing incrementally updated, consistent materialized views โ€” a persistent data structure that represents the results of event stream processing. Compared to materialized views, dynamic tables, and live tables in other database and data warehouse systems, RisingWave's materialized view stands out in several key aspects:

  • Highly cost-efficient - up to 95% cost savings compared to state-of-the-art solutions
  • Synchronous refresh without compromising consistency
  • Extensive SQL support including joins, deletes, and updates
  • High concurrency in query serving
  • Instant fault tolerance
  • Transparent dynamic scaling
  • Speedy bootstrapping and backfilling

RisingWave's extensive CDC support further enables users to seamlessly offload event-driven workloads such as materialized views and triggers from operational databases (e.g., PostgreSQL) to RisingWave.

Why RisingWave for stream processing?

RisingWave provides users with a comprehensive set of frequently used stream processing features, including exactly-once consistency, time window functions, watermarks, and more. RisingWave significantly reduces the complexity of building stream processing applications by allowing developers to express intricate stream processing logic through cascaded materialized views. Furthermore, it allows users to persist data directly within the system, eliminating the need to deliver results to external databases for storage and query serving.

Real-time data pipelines, with and without RisingWave

Compared to existing stream processing systems like Apache Flink, Apache Spark Streaming, and ksqlDB, RisingWave stands out in two primary dimensions: Ease-of-use and cost efficiency, thanks to its PostgreSQL-style interaction experience and Snowflake-like architectural design (i.e., decoupled storage and compute).

                              RisingWave 🌊 | Traditional stream processing systems
Learning curve 🎢: PostgreSQL-style experience | System-specific concepts
Integration 🔗: PostgreSQL ecosystem | System-specific ecosystem
Complex queries (e.g., joins) 💡: Highly efficient | Inefficient
Failure recovery 🚨: Instant | Minutes or even hours
Dynamic scaling 🚀: Transparent | Stop-the-world
Bootstrapping and backfilling ⏪: Accelerated via dynamic scaling | Slow

RisingWave as a database

RisingWave is fundamentally a database that extends beyond basic streaming data processing capabilities. It excels in the effective management of streaming data, making it a trusted choice for data persistence and powering online applications. RisingWave offers an extensive range of database capabilities, which include:

  • High availability
  • Serving highly concurrent queries
  • Role-based access control (RBAC)
  • Integration with data modeling tools, such as dbt
  • Integration with database management tools, such as DBeaver
  • Integration with BI tools, such as Grafana
  • Schema change
  • Processing of semi-structured data

In-production use cases

Within your data stack, RisingWave can assist with:

  • Processing and transforming event streaming data in real time
  • Offloading event-driven queries (e.g., materialized views, triggers) from operational databases
  • Performing real-time ETL (Extract, Transform, Load)
  • Supporting real-time feature stores

Read more at use cases. RisingWave is extensively used in real-time applications such as monitoring, alerting, dashboard reporting, and machine learning. It has already been adopted in fields such as financial trading, manufacturing, new media, logistics, and gaming. Check out customer stories.

Community

Looking for help, discussions, collaboration opportunities, or a casual afternoon chat with our fellow engineers and community members? Join our Slack workspace!

Notes on telemetry

RisingWave collects anonymous usage statistics to better understand how the community is using RisingWave. The sole intention of this exercise is to help improve the product. Users may opt out easily at any time. Please refer to the user documentation for more details.

License

RisingWave is distributed under the Apache License (Version 2.0). Please refer to LICENSE for more information.

Contributing

Thanks for your interest in contributing to the project! Please refer to contribution guidelines for more information.

risingwave's People

Contributors

bowenxiao1999, bugenzhao, chenzl25, dependabot[bot], fuyufjh, hzxa21, kwannoel, li0k, little-wallace, liurenjie1024, lmatz, mrcroxx, neverchanje, shanicky, skyzh, soundofdestiny, st1page, stdrc, strikew, sunt-ing, tabversion, tennyzhuang, wangrunji0408, wcy-fdu, wenym1, xiangjinwu, xxchan, yezizp2012, yuhao-su, zwang28


risingwave's Issues

streaming: report actor error

Currently, we simply print the error as a warning in the console and let the actor go down. In the future, we should report the error to the dashboard and restart the actor.

At the same time, some of our implementations also spawn extra futures apart from the actor itself. We should manage these as well. For example, BarrierAligner spawns two futures to poll data from each of the executors. #2229
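A minimal sketch of the reporting pattern described above: each actor (or extra future) sends its failure to a supervisor over a channel instead of merely logging it, so the supervisor can later surface it to the dashboard or restart the actor. The names (`ActorError`, `spawn_actor`) are illustrative, not RisingWave's actual types.

```rust
use std::sync::mpsc;
use std::thread;

#[derive(Debug)]
struct ActorError {
    actor_id: u32,
    message: String,
}

// Spawn an actor that reports its failure to the supervisor channel
// instead of just printing a warning and dying silently.
fn spawn_actor(actor_id: u32, errors: mpsc::Sender<ActorError>) -> thread::JoinHandle<()> {
    thread::spawn(move || {
        // ... actor main loop; here we simulate a failure.
        let result: Result<(), String> = Err("channel closed".to_string());
        if let Err(message) = result {
            let _ = errors.send(ActorError { actor_id, message });
        }
    })
}

fn main() {
    let (tx, rx) = mpsc::channel();
    spawn_actor(42, tx).join().unwrap();
    // The supervisor now observes the failure and could restart the actor.
    let err = rx.recv().unwrap();
    println!("actor {} failed: {}", err.actor_id, err.message);
}
```

The same channel can carry errors from any extra futures an executor spawns, which is what makes them manageable rather than fire-and-forget.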

Tracking: Implement transaction in hummock

We introduce the concept of transactions in Hummock to aid the checkpointing process, so that changes can be committed and rolled back atomically.

  • Implement Hummock service atop meta service and migrate version manager (#1744 singularity-data/risingwave-legacy#2156)
  • Introduce transaction in hummock service (transaction state, lifecycle, visibility management) (#2513)
  • Support transaction commit (#2513)
  • Support transaction rollback (#2513)
  • State store integration
  • Stream manager integration

Batch: TPC-H Q19

Query:

select
	sum(l_extendedprice* (1 - l_discount)) as revenue
from
	lineitem,
	part
where
	(
		p_partkey = l_partkey
		and p_brand = ':1'
		and p_container in ('SM CASE', 'SM BOX', 'SM PACK', 'SM PKG')
		and l_quantity >= :4 and l_quantity <= :4 + 10
		and p_size between 1 and 5
		and l_shipmode in ('AIR', 'AIR REG')
		and l_shipinstruct = 'DELIVER IN PERSON'
	)
	or
	(
		p_partkey = l_partkey
		and p_brand = ':2'
		and p_container in ('MED BAG', 'MED BOX', 'MED PKG', 'MED PACK')
		and l_quantity >= :5 and l_quantity <= :5 + 10
		and p_size between 1 and 10
		and l_shipmode in ('AIR', 'AIR REG')
		and l_shipinstruct = 'DELIVER IN PERSON'
	)
	or
	(
		p_partkey = l_partkey
		and p_brand = ':3'
		and p_container in ('LG CASE', 'LG BOX', 'LG PACK', 'LG PKG')
		and l_quantity >= :6 and l_quantity <= :6 + 10
		and p_size between 1 and 15
		and l_shipmode in ('AIR', 'AIR REG')
		and l_shipinstruct = 'DELIVER IN PERSON'
	);

ci: build and test `release` version

Currently, we only test the debug build of our program. In the future, we should use cargo build --release for our e2e tests. This is the convention followed by TiKV: run unit tests with the debug version, and integration tests with the release version.

Tracking: Support TPC-H Queries (for Java frontend)

Currently we've implemented almost all of the operators and expressions required by TPC-H. This issue tracks bug fixes and the implementation of the TPC-H queries.

A query is marked as supported iff it has been added to the end-to-end test.

SQL Features in Queries

Query Group-by (Aggregation) Order-by (Sort/TopN) Join Non-Correlated Subquery Correlated Subquery CTE / View
Q1 ✔️ ✔️
Q2 ✔️ ✔️ ✔️ (value in filter)
Q3 ✔️ ✔️ ✔️
Q4 ✔️ ✔️ ✔️ (exists)
Q5 ✔️ ✔️ ✔️
Q6 ✔️ (simple)
Q7 ✔️ ✔️ ✔️ ✔️ (table)
Q8 ✔️ ✔️ ✔️ ✔️ (table)
Q9 ✔️ ✔️ ✔️ ✔️ (table)
Q10 ✔️ ✔️ ✔️
Q11 ✔️ ✔️ ✔️ ✔️ (value in filter)
Q12 ✔️ ✔️ ✔️
Q13 ✔️ ✔️ ✔️ (outer) ✔️ (table)
Q14 ✔️ (simple) ✔️
Q15 ✔️ ✔️ ✔️ ✔️ (value in filter) ✔️
Q16 ✔️ ✔️ ✔️ ✔️ (not in)
Q17 ✔️ (simple) ✔️ ✔️ (table)
Q18 ✔️ ✔️ ✔️ ✔️ (in)
Q19 ✔️ (simple) ✔️
Q20 ✔️ ✔️ ✔️ ✔️ (in & table)
Q21 ✔️ ✔️ ✔️ ✔️ (exists)
Q22 ✔️ ✔️ ✔️ (table) ✔️ (not exists & value in filter)

Batch

  • Batch: TPC-H Q1
  • #177
  • #439
  • #296
  • #440
  • Batch: TPC-H Q6
  • #441
  • #443
  • singularity-data/risingwave-legacy#2392
  • singularity-data/risingwave-legacy#2393
  • singularity-data/risingwave-legacy#2394
  • #217
  • singularity-data/risingwave-legacy#2422
  • singularity-data/risingwave-legacy#2423
  • #272
  • singularity-data/risingwave-legacy#2426
  • singularity-data/risingwave-legacy#2427
  • singularity-data/risingwave-legacy#2428
  • #276
  • #277
  • #278
  • #279

Streaming

  • Streaming: TPC-H Q1
  • #178
  • singularity-data/risingwave-legacy#1907
  • #179
  • singularity-data/risingwave-legacy#1908
  • Streaming: TPC-H Q6
  • singularity-data/risingwave-legacy#2638
  • singularity-data/risingwave-legacy#2644
  • Streaming: TPC-H Q9
  • singularity-data/risingwave-legacy#2556
  • #490
  • singularity-data/risingwave-legacy#2839
  • singularity-data/risingwave-legacy#2840
  • singularity-data/risingwave-legacy#2890
  • #496
  • #491
  • singularity-data/risingwave-legacy#2897
  • singularity-data/risingwave-legacy#1911
  • #497
  • #492
  • #495
  • #494

riselab: add s3 support

Basically there are two ways to support it:

    - use: compute-node
      port: 5689
      exporter-port: 1224
      state-backend: s3://bucket

If using the above way, we will need to add state-backend parameter to every compute-node. In some cases, we need to write it 3 times.

Another way is to make s3 a mock service:

  default:
    - use: s3
       bucket: xxxx
    - use: meta-node
    - use: compute-node
    - use: prometheus
    - use: frontend

... and for compute-node:

  compute-node:
    address: "127.0.0.1"
    port: 5688
    exporter-address: "127.0.0.1"
    exporter-port: 1222
    id: compute-node-${port}
    provide-minio: "minio*"
    provide-meta-node: "meta-node*"
+   provide-s3: "s3*"
    user-managed: false

Batch: TPC-H Q22

Query:

select
	cntrycode,
	count(*) as numcust,
	sum(c_acctbal) as totacctbal
from
	(
		select
			substring(c_phone from 1 for 2) as cntrycode,
			c_acctbal
		from
			customer
		where
			substring(c_phone from 1 for 2) in
				(':1', ':2', ':3', ':4', ':5', ':6', ':7')
			and c_acctbal > (
				select
					avg(c_acctbal)
				from
					customer
				where
					c_acctbal > 0.00
					and substring(c_phone from 1 for 2) in
						(':1', ':2', ':3', ':4', ':5', ':6', ':7')
			)
			and not exists (
				select
					*
				from
					orders
				where
					o_custkey = c_custkey
			)
	) as custsale
group by
	cntrycode
order by
	cntrycode;

Tracking: state_store benchmark roadmap.

Reference:

Tracking:
Stage one (target 2022.1.5):

  • A command-line executable tool ss_bench with the following capabilities:
    • Configurable backends: (InMemoryStateStore, hummock+s3, hummock+minio, TikvStateStore, RocksDBStateStore)
    • Configurable hummock options: (SST size, block size, bloom filter, default bucket, checksum algo, etc..)
    • Configurable data sizes: (key length/value length).
    • Configurable mode: (simple get/ simple scan / simple write_batch / mixed).

Stage two (target 2022.1.10):

  • Launch a Prometheus endpoint to expose internal storage counters.
  • Configurable concurrencies: (multithreaded get/scan/write_batch operations).
  • Review performance with ss_bench.

Stage three: patterns more like our internal states:

  • Sequential writes within a single key space.
  • Random writes within a single key space.
  • (Sequential, random) x (writes, deletes, updates) within a single key space.
  • (Sequential, random) x (writes, deletes, updates) x (single key space, multiple key spaces)
  • (Tiered, leveled compaction) x (sequential, random ingestion) x (single, multiple key spaces)
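The key-pattern dimensions above can be sketched as a tiny workload generator: keys are a key-space prefix plus an index, produced either sequentially or pseudo-randomly (a small LCG keeps runs deterministic). This is a hypothetical illustration of the benchmark shapes, not the real ss_bench options.

```rust
// Sequential key: key-space prefix byte + big-endian index,
// so lexicographic order matches numeric order.
fn seq_key(space: u8, i: u64) -> Vec<u8> {
    let mut key = vec![space];
    key.extend_from_slice(&i.to_be_bytes());
    key
}

// Random key within the same key space, via a minimal LCG
// (constants from Numerical Recipes) for reproducible runs.
fn random_key(space: u8, seed: &mut u64, num_keys: u64) -> Vec<u8> {
    *seed = seed
        .wrapping_mul(6364136223846793005)
        .wrapping_add(1442695040888963407);
    seq_key(space, *seed % num_keys)
}

fn main() {
    let mut seed = 42;
    for i in 0..3 {
        println!("seq:    {:?}", seq_key(1, i));
        println!("random: {:?}", random_key(1, &mut seed, 1000));
    }
}
```

Multiple key spaces fall out of varying the prefix byte; deletes and updates reuse the same key generators with different operations.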

Batch: TPC-H Q15

Query:

select
	s_suppkey,
	s_name,
	s_address,
	s_phone,
	total_revenue
from
	supplier,
	(
		select
		l_suppkey,
		sum(l_extendedprice * (1 - l_discount))
	from
		lineitem
	where
		l_shipdate >= date '1993-01-01'
		and l_shipdate < date '1993-01-01' + interval '3' month
	group by
		l_suppkey
	) as revenue0 (supplier_no, total_revenue)
where
	s_suppkey = supplier_no
	and total_revenue = (
		select
			max(total_revenue)
		from
			(
				select
				l_suppkey,
				sum(l_extendedprice * (1 - l_discount))
			from
				lineitem
			where
				l_shipdate >= date '1993-01-01'
				and l_shipdate < date '1993-01-01' + interval '3' month
			group by
				l_suppkey
			) as revenue0 (supplier_no, total_revenue)
	)
order by
	s_suppkey;

Remove single mode

"Single mode" was designed to generate and execute a plan without Exchange, and was only used in early development.

Note that the so-called "single mode" is NOT a mode designed for single-node deployment. Exchange is necessary for parallel execution, no matter whether on a single node or a cluster.

Currently, there is more and more code handling "single mode". It's time to remove it.

batch: HashJoin: support non-equi condition

message HashJoinNode {
  JoinType join_type = 1;
  repeated int32 left_key = 2;
  repeated int32 left_output = 3;
  repeated int32 right_key = 4;
  repeated int32 right_output = 5;
+  expr.ExprNode other_condition = 6;
}
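What the proposed `other_condition` field adds can be sketched as follows: the join still probes a hash table built on the equi-join keys, but each matched pair must additionally pass a residual, non-equi predicate before being emitted. A hypothetical single-key, single-payload version:

```rust
use std::collections::HashMap;

// Hash join with a residual non-equi condition evaluated after the
// equi-key probe. Rows are (key, payload); illustrative only.
fn hash_join_with_condition(
    build: &[(i32, i32)],
    probe: &[(i32, i32)],
    other_condition: impl Fn(i32, i32) -> bool, // predicate on (build payload, probe payload)
) -> Vec<(i32, i32, i32)> {
    // Build phase: group build-side payloads by key.
    let mut table: HashMap<i32, Vec<i32>> = HashMap::new();
    for &(k, v) in build {
        table.entry(k).or_default().push(v);
    }
    // Probe phase: equi-match on key, then filter with the extra condition.
    let mut out = Vec::new();
    for &(k, pv) in probe {
        if let Some(vs) = table.get(&k) {
            for &bv in vs {
                if other_condition(bv, pv) {
                    out.push((k, bv, pv));
                }
            }
        }
    }
    out
}

fn main() {
    // e.g. SELECT ... WHERE a.k = b.k AND a.v < b.v
    let out = hash_join_with_condition(&[(1, 10), (1, 20)], &[(1, 15)], |b, p| b < p);
    println!("{:?}", out);
}
```

The design choice mirrors the proto change: the equi keys stay in `left_key`/`right_key`, while everything else becomes one residual expression.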

Tracking: unify `Table` and `MView` & use Hummock storage for `TABLE_V2`

There is a lot in common between a TableSource and an MView. In this tracking issue, we will unify their write paths and integrate the batch table with Hummock storage. For more information, check this design doc.

  • singularity-data/risingwave-legacy#2412
  • singularity-data/risingwave-legacy#2391
  • singularity-data/risingwave-legacy#2464
  • singularity-data/risingwave-legacy#2492
  • singularity-data/risingwave-legacy#2584
  • Wait for #87 to be ready to support using TableSourceV2 as a streaming source.
  • #212
  • #311
  • #387
  • #423
  • Bug fixes...

Tracking: Benchmarking RisingWave

This is to track the work related to benchmarking the state store of RisingWave. The final goal is to do the experiments in this doc: https://singularity-data.larksuite.com/docs/docusF6l8lV5BuOCWoq7Cu71fEe

Stage 1

  • Dimensions and metrics for state management of RisingWave
  • Experiments configuration
  • Implement state backends for RocksDB singularity-data/risingwave-legacy#2292
  • Implement state backends for TiKV singularity-data/risingwave-legacy#2202
  • Implement MetricsManager for RisingWave singularity-data/risingwave-legacy#2198 singularity-data/risingwave-legacy#2234

Stage 2

  • singularity-data/risingwave-legacy#2285 singularity-data/risingwave-legacy#2359
  • singularity-data/risingwave-legacy#2410
  • singularity-data/risingwave-legacy#2436
  • #195
  • #194
  • singularity-data/risingwave-legacy#2411
  • #45 (Implement ss_bench tool for state store layer)
  • #127

Stage 3

  • Conduct StateStore benchmark using ss_bench on memory, RocksDB, TiKV and Hummock backends
  • E2E Performance experiments(TPC-H)
  • E2E Performance experiments(Nexmark)
  • Overhead & cost experiments

Tracking: Basic Built-in Functions

Reference: PostgreSQL: Documentation: 14: Chapter 9. Functions and Operators

General for all types

  • TEXT format input / output (cast from / into string)
  • BINARY format input / output
  • singularity-data/risingwave-legacy#1916
  • singularity-data/risingwave-legacy#667
  • /issues/2284

Aggregate

  • count

Equality and Ordering for "most" types

  • = / <> / !=
  • #2684
  • /issues/2283
  • singularity-data/risingwave-legacy#666
  • < / <= / > / >=
  • Implement (NOT) BETWEEN AND functions
  • #12761

Aggregate

  • min / max

Boolean

  • explicit cast from / into int
  • Implement AND/OR/NOT operators
  • IS [NOT] [TRUE | FALSE]
  • #8933

Aggregate

Window functions

Misc

More

OLAP: TPC-H Q7

Query:

select
	supp_nation,
	cust_nation,
	l_year,
	sum(volume) as revenue
from
	(
		select
			n1.n_name as supp_nation,
			n2.n_name as cust_nation,
			extract(year from l_shipdate) as l_year,
			l_extendedprice * (1 - l_discount) as volume
		from
			supplier,
			lineitem,
			orders,
			customer,
			nation n1,
			nation n2
		where
			s_suppkey = l_suppkey
			and o_orderkey = l_orderkey
			and c_custkey = o_custkey
			and s_nationkey = n1.n_nationkey
			and c_nationkey = n2.n_nationkey
			and (
				(n1.n_name = 'ROMANIA' and n2.n_name = 'INDIA')
				or (n1.n_name = 'INDIA' and n2.n_name = 'ROMANIA')
			)
			and l_shipdate between date '1995-01-01' and date '1996-12-31'
	) as shipping
group by
	supp_nation,
	cust_nation,
	l_year
order by
	supp_nation,
	cust_nation,
	l_year;

ci: e2e coverage for Rust

Currently, e2e doesn't take Rust codebase into account when calculating coverage. In the future, we should do this.

java frontend spends 6s+ to insert 6000 rows

Benchmark TPC-H Q1: an insert statement with 6000+ rows is time-consuming.

We have already tried to suppress logging.
It seems that each inserted value is processed in multiple passes during planning/optimization.
(Thanks @zehaowei for the report and investigation.)


Batch: TPC-H Q12

Query:

select
	l_shipmode,
	sum(case
		when o_orderpriority = '1-URGENT'
			or o_orderpriority = '2-HIGH'
			then 1
		else 0
	end) as high_line_count,
	sum(case
		when o_orderpriority <> '1-URGENT'
			and o_orderpriority <> '2-HIGH'
			then 1
		else 0
	end) as low_line_count
from
	orders,
	lineitem
where
	o_orderkey = l_orderkey
	and l_shipmode in ('FOB', 'SHIP')
	and l_commitdate < l_receiptdate
	and l_shipdate < l_commitdate
	and l_receiptdate >= date '1994-01-01'
	and l_receiptdate < date '1994-01-01' + interval '1' year
group by
	l_shipmode
order by
	l_shipmode;

Blocking:

OLAP: TPC-H Q4

Query:

select
	o_orderpriority,
	count(*) as order_count
from
	orders
where
	o_orderdate >= date '1997-07-01'
	and o_orderdate < date '1997-07-01' + interval '3' month
	and exists (
		select
			*
		from
			lineitem
		where
			l_orderkey = o_orderkey
			and l_commitdate < l_receiptdate
	)
group by
	o_orderpriority
order by
	o_orderpriority;

Blocker:

  • #1916
  • #154
  • singularity-data/risingwave-legacy#2316

Batch: TPC-H Q20

Query:

select
	s_name,
	s_address
from
	supplier,
	nation
where
	s_suppkey in (
		select
			ps_suppkey
		from
			partsupp
		where
			ps_partkey in (
				select
					p_partkey
				from
					part
				where
					p_name like ':1%'
			)
			and ps_availqty > (
				select
					0.5 * sum(l_quantity)
				from
					lineitem
				where
					l_partkey = ps_partkey
					and l_suppkey = ps_suppkey
					and l_shipdate >= date ':2'
					and l_shipdate < date ':2' + interval '1' year
			)
	)
	and s_nationkey = n_nationkey
	and n_name = ':3'
order by
	s_name;

Hummock: bench S3

The granularity of cache population and eviction depends on the performance of S3 put / multipart upload / get / part get / byte-range get. We need to benchmark S3 first before deciding on some designs.

#198

Batch: TPC-H Q21

Query:

select
	s_name,
	count(*) as numwait
from
	supplier,
	lineitem l1,
	orders,
	nation
where
	s_suppkey = l1.l_suppkey
	and o_orderkey = l1.l_orderkey
	and o_orderstatus = 'F'
	and l1.l_receiptdate > l1.l_commitdate
	and exists (
		select
			*
		from
			lineitem l2
		where
			l2.l_orderkey = l1.l_orderkey
			and l2.l_suppkey <> l1.l_suppkey
	)
	and not exists (
		select
			*
		from
			lineitem l3
		where
			l3.l_orderkey = l1.l_orderkey
			and l3.l_suppkey <> l1.l_suppkey
			and l3.l_receiptdate > l3.l_commitdate
	)
	and s_nationkey = n_nationkey
	and n_name = ':1'
group by
	s_name
order by
	numwait desc,
	s_name;

Blocker:

ci: retire `start_cluster.sh`

RiseLAB is now capable of all tasks of start_cluster.sh, and the e2e-test-riselab job is running smoothly across the main branch and PRs. Now that source support has been added to the RiseLAB e2e tests, it seems to be a good time to retire start_cluster.sh in CI.

Bug: table row id distributed generation

Currently, when inserting tuples into distributed tables, each partition allocates implicit ids according to its local row number independently. This was done in singularity-data/risingwave-legacy#1613.

Duplicate ids may therefore occur in a distributed table when unioning all partitions together.

A quick fix is to assign the tuple id as a 16-bit partition id + 48-bit local row id.
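A minimal sketch of that packing, assuming the widths are bits: a 16-bit partition id and a 48-bit local row counter share one 64-bit row id, so ids from different partitions can never collide. Names are illustrative.

```rust
const LOCAL_BITS: u32 = 48;
const LOCAL_MASK: u64 = (1u64 << LOCAL_BITS) - 1;

// Pack partition id into the high 16 bits, local row id into the low 48.
fn make_row_id(partition_id: u16, local_row_id: u64) -> u64 {
    assert!(local_row_id <= LOCAL_MASK, "local row id overflows 48 bits");
    ((partition_id as u64) << LOCAL_BITS) | local_row_id
}

// Recover the two components from a packed row id.
fn split_row_id(row_id: u64) -> (u16, u64) {
    ((row_id >> LOCAL_BITS) as u16, row_id & LOCAL_MASK)
}

fn main() {
    let id = make_row_id(3, 5);
    println!("row id {} splits into {:?}", id, split_row_id(id));
}
```

Each partition then only needs to keep a monotonically increasing local counter; uniqueness across the union of partitions follows from the disjoint high bits.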

Batch: TPC-H Q8

Query:

select
	o_year,
	sum(case
		when nation = 'INDIA' then volume
		else 0
	end) / sum(volume) as mkt_share
from
	(
		select
			extract(year from o_orderdate) as o_year,
			l_extendedprice * (1 - l_discount) as volume,
			n2.n_name as nation
		from
			part,
			supplier,
			lineitem,
			orders,
			customer,
			nation n1,
			nation n2,
			region
		where
			p_partkey = l_partkey
			and s_suppkey = l_suppkey
			and l_orderkey = o_orderkey
			and o_custkey = c_custkey
			and c_nationkey = n1.n_nationkey
			and n1.n_regionkey = r_regionkey
			and r_name = 'ASIA'
			and s_nationkey = n2.n_nationkey
			and o_orderdate between date '1995-01-01' and date '1996-12-31'
			and p_type = 'PROMO BRUSHED COPPER'
	) as all_nations
group by
	o_year
order by
	o_year;

Tracking: Frontend Catalog

MVP Features

  • singularity-data/risingwave-legacy#2261
  • singularity-data/risingwave-legacy#2572
  • singularity-data/risingwave-legacy#2669
  • singularity-data/risingwave-legacy#2347
  • singularity-data/risingwave-legacy#2473
  • #958

Refactor

  • singularity-data/risingwave-legacy#2560
  • #415
  • singularity-data/risingwave-legacy#2592
  • singularity-data/risingwave-legacy#2917

Table V2

Tracking: Implement e2e benchmark tool for RisingWave

Mainly benchmarking the state management of RisingWave: e2e throughput, price/performance, data freshness, and e2e latency.

Functionality

  • Cmd tool
  • Configurable source & workload generator
  • Execution pipeline(pg sql, Kafka, CDC)
  • Metrics service
  • Support using pg SQL & table source to benchmark RisingWave
  • Support using Kafka to benchmark RisingWave
  • Support using CDC(MySQL->Debezium->Kafka) to benchmark RisingWave

Source

  • Implement TPC-H as benchmarking source

Support min/max aggregation on boolean types

Min/Max on boolean types is not supported yet.

CREATE TABLE supplier (
        s_suppkey  INTEGER,
        s_name VARCHAR(25),
        s_address VARCHAR(40),
        s_nationkey INTEGER,
        s_phone VARCHAR(15),
        s_acctbal NUMERIC,
        s_comment VARCHAR(101));

select min(s_suppkey > 1) from supplier;
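Semantically the missing aggregate is simple: booleans order as false < true in PostgreSQL, so min behaves like an AND-style fold and max like an OR-style fold, with NULLs skipped as aggregates normally do. A hypothetical sketch (not RisingWave's aggregate framework):

```rust
// min over nullable booleans: None (NULL) values are skipped;
// returns None only if every input is NULL. false < true, as in Postgres.
fn bool_min(values: &[Option<bool>]) -> Option<bool> {
    values.iter().flatten().copied().min()
}

// max over nullable booleans, same NULL handling.
fn bool_max(values: &[Option<bool>]) -> Option<bool> {
    values.iter().flatten().copied().max()
}

fn main() {
    let col = [Some(true), None, Some(false)];
    println!("min = {:?}, max = {:?}", bool_min(&col), bool_max(&col));
}
```

So `min(s_suppkey > 1)` is false as soon as any row fails the predicate, and `max` is true as soon as any row satisfies it.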

feat: streaming: supports multiple distribution key for HashDispatcher

Currently, Dispatcher only supports a single distribution key:

// A dispatcher redistributes messages.
// We encode both the type and other usage information in the proto.
message Dispatcher {
  enum DispatcherType {
    SIMPLE = 0;
    HASH = 1;
    BROADCAST = 2;
  }
  DispatcherType type = 1;
  int32 column_idx = 2; // <--- HERE!!
}
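Supporting multiple distribution keys essentially means replacing the single `column_idx` with a list of key columns that are all fed into one hasher, with the combined hash choosing the downstream actor. A hypothetical sketch, not RisingWave's actual hash function:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Route a row to one of `num_outputs` downstream actors by hashing
// all distribution-key columns together.
fn dispatch(row: &[i64], key_indices: &[usize], num_outputs: usize) -> usize {
    let mut hasher = DefaultHasher::new();
    for &idx in key_indices {
        row[idx].hash(&mut hasher);
    }
    (hasher.finish() % num_outputs as u64) as usize
}

fn main() {
    let row = [1i64, 2, 3];
    println!("routed to output {}", dispatch(&row, &[0, 2], 4));
}
```

Rows that agree on every key column always land on the same output, regardless of the non-key columns, which is the property hash dispatch needs.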

[Tracking] streaming: async flush and operator merge support

Currently, we can only get the output of HashAgg after flush is called. That is because states like Max require:

  1. apply_batch generates a sequence of change log (flush_status)
  2. merge the status to the state store by flushing them in write batch
  3. read the real max from the state store

This will cause several problems:

  • flushing a large incremental state will lead to latency spikes
  • in some of our fail-over designs, we can only add SST to the LSM tree when all nodes have processed the barrier

Therefore, managed states shouldn't require that all changes have been flushed before producing the correct output for this epoch.

To achieve this, we will need:

  • implement iterator on StateStore
  • implement state merge on get_output for ManagedExtremeState
  • implement state merge on ManagedStringAggState
  • ... when other ManagedStates are added, add new tasks here
  • support async flush in HashAgg
  • support async flush in SimpleAgg
  • ... when other executors are integrated into the state store, add new tasks here
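The "state merge on get_output" idea can be sketched for Max: rather than flushing first, the managed state combines the value already in the state store with the not-yet-flushed change log, so the correct output is available within the epoch. Names are illustrative, and only inserts are handled here.

```rust
// Merge the stored max with buffered (unflushed) inserts to produce
// this epoch's output without flushing the write batch first.
fn max_output(stored: Option<i64>, unflushed_inserts: &[i64]) -> Option<i64> {
    let buffered = unflushed_inserts.iter().copied().max();
    match (stored, buffered) {
        (Some(a), Some(b)) => Some(a.max(b)),
        (a, b) => a.or(b),
    }
}

fn main() {
    println!("{:?}", max_output(Some(5), &[3, 9]));
}
```

Deletions are what make this hard in practice: if the stored max is retracted, recomputing it requires scanning the state, which is why the task list starts with implementing an iterator on StateStore.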

riselab: support graceful stop

RiseLAB currently only kills the tmux session. In the future, we should support a graceful exit.

  • Call tmux list-windows -a -F "#{pane_pid} #{window_name}" to get pid of the components.
  • Send kill -SIGINT -pgid to stop the components.
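The two steps above can be sketched as follows: parse the `tmux list-windows` output of the form `<pane_pid> <window_name>` per line, then signal each process group with SIGINT (shown here only as the command string that would be run, to keep the sketch side-effect free).

```rust
// Parse lines like "123 compute-node" into (pid, window name) pairs,
// skipping anything malformed.
fn parse_tmux_windows(output: &str) -> Vec<(i32, String)> {
    output
        .lines()
        .filter_map(|line| {
            let (pid, name) = line.trim().split_once(' ')?;
            Some((pid.parse().ok()?, name.to_string()))
        })
        .collect()
}

// `kill -INT -<pgid>` sends SIGINT to the whole process group.
fn sigint_command(pgid: i32) -> String {
    format!("kill -INT -{}", pgid)
}

fn main() {
    for (pid, name) in parse_tmux_windows("123 compute-node\n456 meta-node\n") {
        println!("{} -> {}", name, sigint_command(pid));
    }
}
```

A real implementation would run `tmux list-windows -a -F "#{pane_pid} #{window_name}"` via `std::process::Command`, feed its stdout to the parser, and then issue the kill commands.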

Negative num mod error

In pg, the result is :

postgres=# SELECT (-32768)::int2 % (-1)::int2;
 ?column?
----------
        0
(1 row)

In our system, the result is:

dev=>SELECT (-32768)::int2 % (-1)::int2;
Error: Out of range
(1 row)
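The likely cause, sketched in Rust: `i16::MIN % -1` overflows in two's complement (it panics in debug builds), even though the mathematical result is 0, which matches the "Out of range" error above. PostgreSQL returns 0, so the remainder needs to special-case this. A hedged sketch, with a hypothetical function name:

```rust
// Postgres-compatible i16 remainder: i16::MIN % -1 must yield 0
// instead of overflowing.
fn pg_mod_i16(lhs: i16, rhs: i16) -> Result<i16, &'static str> {
    if rhs == 0 {
        return Err("division by zero");
    }
    // With rhs != 0, checked_rem returns None only for the
    // i16::MIN % -1 overflow case, where the correct answer is 0.
    Ok(lhs.checked_rem(rhs).unwrap_or(0))
}

fn main() {
    println!("{:?}", pg_mod_i16(-32768, -1));
}
```

The analogous fix applies to the other signed integer widths (`i32::MIN % -1`, `i64::MIN % -1`).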

Tracking: Minimal Frontend Framework in Rust

This issue tracks the work to rewrite a basic frontend in Rust.

This new frontend is expected to be a replacement for the current Java frontend, but the two will coexist for several months. During this period, we may need two end-to-end tests, one for each frontend, until the new frontend can cover all features.

Expected Features

In the first stage, we only support

  • CREATE TABLE
  • DROP TABLE
  • INSERT
  • SELECT
    • columns or constant values (no expressions allowed)
    • from one table (no joins allowed)
    • with very simple WHERE condition (column = constant)
  • CREATE MATERIALIZED VIEW
    • Same as SELECT

Components

  • Postgres wire protocol: singularity-data/risingwave-legacy#2019
  • Server: #293
  • Parser: singularity-data/risingwave-legacy#2170
  • Binder: #338
  • Optimizer: #109
  • Catalog: #959
  • Execution: #202
  • Query Manager: #1015
  • #838
  • #1219
