The cc-for-apache-flink-demos from martijnvisser

This repository contains Flink SQL demo queries that can be used on Confluent Cloud for Apache Flink®.

Running these queries requires 4 topics to be set up using the Datagen connector, using AVRO and Schema Registry.

Topic clickstream -> Datagen template SHOE_CLICKSTREAM
Topic customers -> Datagen template SHOE_ORDERS
Topic orders -> Datagen template SHOE_CUSTOMERS
Topic shoes -> Datagen template SHOES

Explore your data

SELECT * FROM orders;

Count number of unique orders per hour

SELECT window_start, window_end, COUNT(DISTINCT order_id) as nr_of_orders
  FROM TABLE(
    TUMBLE(TABLE orders, DESCRIPTOR($rowtime), INTERVAL '1' HOUR))
  GROUP BY window_start, window_end;

Create table including underlying Kafka topic

CREATE TABLE sales_per_hour (
 window_start TIMESTAMP(3),
 window_end   TIMESTAMP(3),
 nr_of_orders BIGINT
);

Materialize number of unique orders per hour in newly created topic

INSERT INTO sales_per_hour
 SELECT window_start, window_end, COUNT(DISTINCT order_id) as nr_of_orders
    FROM TABLE(
      TUMBLE(TABLE `demo`.`webshop_team`.`orders`, DESCRIPTOR($rowtime), INTERVAL '1' HOUR))
    GROUP BY window_start, window_end;

Deduplicate table

SELECT id, name, brand
FROM (
  SELECT *,
    ROW_NUMBER() OVER (PARTITION BY id ORDER BY $rowtime DESC) AS row_num
  FROM `shoes`)
WHERE row_num = 1

Enrich clickstream data by joining with deduplicated table

WITH uniqueShoes AS (
SELECT id, name, brand
FROM (
  SELECT *,
    ROW_NUMBER() OVER (PARTITION BY id ORDER BY $rowtime DESC) AS row_num
  FROM `demo`.`lkc-6wk6y2`.`shoes`)
WHERE row_num = 1
)
SELECT 
  c.`$rowtime`,
  c.product_id,
  s.name, 
  s.brand
FROM 
  clickstream c
  INNER JOIN uniqueShoes s ON c.product_id = s.id;

NOTE: Make sure that you select the correct full qualified names, because they are specific for your setup

Measure average view time of less then 30

SELECT *
FROM clickstream
    MATCH_RECOGNIZE (
        PARTITION BY user_id
        ORDER BY `$rowtime`
        MEASURES
            FIRST(A.`$rowtime`) AS start_tstamp,
            LAST(A.`$rowtime`) AS end_tstamp,
            AVG(A.view_time) AS avgViewTime
        ONE ROW PER MATCH
        AFTER MATCH SKIP PAST LAST ROW
        PATTERN (A+ B)
        DEFINE
            A AS AVG(A.view_time) < 30
    ) MR;

martijnvisser / cc-for-apache-flink-demos Goto Github PK

cc-for-apache-flink-demos's Introduction

Explore your data

Count number of unique orders per hour

Create table including underlying Kafka topic

Materialize number of unique orders per hour in newly created topic

Deduplicate table

Enrich clickstream data by joining with deduplicated table

Measure average view time of less then 30

cc-for-apache-flink-demos's People

Contributors

Watchers

Forkers

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent