Git Product home page Git Product logo

cc-for-apache-flink-demos's Introduction

This repository contains Flink SQL demo queries that can be used on Confluent Cloud for Apache Flink®.

Running these queries requires 4 topics to be set up using the Datagen connector, using AVRO and Schema Registry.

  • Topic clickstream -> Datagen template SHOE_CLICKSTREAM
  • Topic customers -> Datagen template SHOE_ORDERS
  • Topic orders -> Datagen template SHOE_CUSTOMERS
  • Topic shoes -> Datagen template SHOES

Explore your data

SELECT * FROM orders;

Count number of unique orders per hour

SELECT window_start, window_end, COUNT(DISTINCT order_id) as nr_of_orders
  FROM TABLE(
    TUMBLE(TABLE orders, DESCRIPTOR($rowtime), INTERVAL '1' HOUR))
  GROUP BY window_start, window_end;

Create table including underlying Kafka topic

CREATE TABLE sales_per_hour (
 window_start TIMESTAMP(3),
 window_end   TIMESTAMP(3),
 nr_of_orders BIGINT
);

Materialize number of unique orders per hour in newly created topic

INSERT INTO sales_per_hour
 SELECT window_start, window_end, COUNT(DISTINCT order_id) as nr_of_orders
    FROM TABLE(
      TUMBLE(TABLE `demo`.`webshop_team`.`orders`, DESCRIPTOR($rowtime), INTERVAL '1' HOUR))
    GROUP BY window_start, window_end;

Deduplicate table

SELECT id, name, brand
FROM (
  SELECT *,
    ROW_NUMBER() OVER (PARTITION BY id ORDER BY $rowtime DESC) AS row_num
  FROM `shoes`)
WHERE row_num = 1

Enrich clickstream data by joining with deduplicated table

WITH uniqueShoes AS (
SELECT id, name, brand
FROM (
  SELECT *,
    ROW_NUMBER() OVER (PARTITION BY id ORDER BY $rowtime DESC) AS row_num
  FROM `demo`.`lkc-6wk6y2`.`shoes`)
WHERE row_num = 1
)
SELECT 
  c.`$rowtime`,
  c.product_id,
  s.name, 
  s.brand
FROM 
  clickstream c
  INNER JOIN uniqueShoes s ON c.product_id = s.id;

NOTE: Make sure that you select the correct full qualified names, because they are specific for your setup

Measure average view time of less then 30

SELECT *
FROM clickstream
    MATCH_RECOGNIZE (
        PARTITION BY user_id
        ORDER BY `$rowtime`
        MEASURES
            FIRST(A.`$rowtime`) AS start_tstamp,
            LAST(A.`$rowtime`) AS end_tstamp,
            AVG(A.view_time) AS avgViewTime
        ONE ROW PER MATCH
        AFTER MATCH SKIP PAST LAST ROW
        PATTERN (A+ B)
        DEFINE
            A AS AVG(A.view_time) < 30
    ) MR;

cc-for-apache-flink-demos's People

Contributors

martijnvisser avatar

Watchers

 avatar

Forkers

jakebogie

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.