flink-faker

flink-faker is an Apache Flink table source that generates fake data based on the Java Faker expression provided for each column.

Check out this demo web application for some example Java Faker expressions.

This project is inspired by voluble.

Package

mvn clean package

Adding flink-faker to Flink SQL Client

  1. Download Flink from the Apache Flink website.
  2. Download the flink-faker JAR from the Releases page (or build it yourself).
  3. Put the downloaded jars under FLINK_HOME/lib/.
  4. (Re)Start a Flink cluster.
  5. (Re)Start the Flink CLI.

Adding flink-faker to Ververica Platform

  1. Set up Ververica Platform.
  2. Get the link to the flink-faker JAR from the Releases page.
  3. In Ververica Platform, go to SQL > Connectors > Create Connector, provide the external URL from step 2, and finish the setup.

Usage

As ScanTableSource

CREATE TEMPORARY TABLE heros (
  `name` STRING,
  `power` STRING, 
  `age` INT
) WITH (
  'connector' = 'faker', 
  'fields.name.expression' = '#{superhero.name}',
  'fields.power.expression' = '#{superhero.power}',
  'fields.power.null-rate' = '0.05',
  'fields.age.expression' = '#{number.numberBetween ''0'',''1000''}'
);

SELECT * FROM heros;

As LookupTableSource

CREATE TEMPORARY TABLE location_updates (
  `character_id` INT,
  `location` STRING,
  `proctime` AS PROCTIME()
)
WITH (
  'connector' = 'faker', 
  'fields.character_id.expression' = '#{number.numberBetween ''0'',''100''}',
  'fields.location.expression' = '#{harry_potter.location}'
);

CREATE TEMPORARY TABLE characters (
  `character_id` INT,
  `name` STRING
)
WITH (
  'connector' = 'faker', 
  'fields.character_id.expression' = '#{number.numberBetween ''0'',''100''}',
  'fields.name.expression' = '#{harry_potter.characters}'
);

SELECT 
  c.character_id,
  l.location,
  c.name
FROM location_updates AS l
JOIN characters FOR SYSTEM_TIME AS OF proctime AS c
ON l.character_id = c.character_id;

Currently, the faker source supports the following data types:

  • CHAR
  • VARCHAR
  • STRING
  • TINYINT
  • SMALLINT
  • INTEGER
  • BIGINT
  • FLOAT
  • DOUBLE
  • DECIMAL
  • BOOLEAN
  • TIMESTAMP
  • ARRAY
  • MAP
  • MULTISET
  • ROW
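
As a rough sketch of how a few of the numeric and boolean types above might be populated: the table and column names below are made up for illustration, and both the `number.randomDouble` expression and the `regexify` approach for booleans are assumptions about what Java Faker and the connector accept, not something stated in this README.

```sql
-- Hypothetical table; all names are chosen only for this example.
CREATE TEMPORARY TABLE type_examples (
  `small_count` TINYINT,
  `big_count` BIGINT,
  `score` DOUBLE,
  `active` BOOLEAN
) WITH (
  'connector' = 'faker',
  -- Keep the range within what a TINYINT can hold (-128 to 127).
  'fields.small_count.expression' = '#{number.numberBetween ''0'',''100''}',
  'fields.big_count.expression' = '#{number.numberBetween ''0'',''1000000''}',
  -- Assumed expression: at most 2 decimal places, between 0 and 100.
  'fields.score.expression' = '#{number.randomDouble ''2'',''0'',''100''}',
  -- Assumes the connector parses the generated string into a BOOLEAN.
  'fields.active.expression' = '#{regexify ''(true|false){1}''}'
);

SELECT * FROM type_examples;
```

If an expression yields a value that does not fit the declared column type, the query will most likely fail, so it is worth checking each expression against its column type first.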

Connector Options

Connector Option          | Default | Description
number-of-rows            | None    | The number of rows to produce. If this option is set, the source is bounded; otherwise it is unbounded and runs indefinitely.
rows-per-second           | 10000   | The maximum rate at which the source produces records.
fields.<field>.expression | None    | The Java Faker expression to generate the values for this field.
fields.<field>.null-rate  | 0.0     | Fraction of rows for which this field is null.
fields.<field>.length     | 1       | Size of array, map, or multiset.
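
To illustrate the table-level options, here is a minimal sketch of a bounded, rate-limited source. The table name and the chosen values are only for illustration.

```sql
CREATE TEMPORARY TABLE bounded_heros (
  `name` STRING
) WITH (
  'connector' = 'faker',
  -- Stop after 100 rows, which makes the source bounded.
  'number-of-rows' = '100',
  -- Emit at most 10 rows per second.
  'rows-per-second' = '10',
  'fields.name.expression' = '#{superhero.name}'
);

SELECT * FROM bounded_heros;
```

With `number-of-rows` set, the query should finish on its own once 100 rows have been produced instead of running indefinitely.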

On Timestamps

For columns of type TIMESTAMP, the corresponding Java Faker expression needs to return a timestamp formatted as EEE MMM dd HH:mm:ss zzz yyyy (the format produced by java.util.Date#toString). Typically, you would use one of the following expressions:

CREATE TEMPORARY TABLE timestamp_example (
  `timestamp1` TIMESTAMP(3),
  `timestamp2` TIMESTAMP(3)
)
WITH (
  'connector' = 'faker', 
  'fields.timestamp1.expression' = '#{date.past ''15'',''SECONDS''}',
  'fields.timestamp2.expression' = '#{date.past ''15'',''5'',''SECONDS''}'
);

SELECT * FROM timestamp_example;

For timestamp1, Java Faker generates a random timestamp that lies at most 15 seconds in the past. For timestamp2, it generates a random timestamp that lies at least 5 and at most 15 seconds in the past.
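
Because `date.past` bounds how far a generated timestamp can lag behind the current time, such a column can plausibly serve as an event-time attribute. The following is a sketch under that assumption; the table, its columns, and the 15-second watermark delay are illustrative choices rather than something this README prescribes.

```sql
CREATE TEMPORARY TABLE events (
  `event_id` INT,
  `event_time` TIMESTAMP(3),
  -- Timestamps are at most 15 seconds old, so allow 15 seconds of lateness.
  WATERMARK FOR `event_time` AS `event_time` - INTERVAL '15' SECOND
) WITH (
  'connector' = 'faker',
  'fields.event_id.expression' = '#{number.numberBetween ''0'',''100''}',
  'fields.event_time.expression' = '#{date.past ''15'',''SECONDS''}'
);

-- Count events per 10-second tumbling window.
SELECT
  TUMBLE_START(`event_time`, INTERVAL '10' SECOND) AS window_start,
  COUNT(*) AS event_count
FROM events
GROUP BY TUMBLE(`event_time`, INTERVAL '10' SECOND);
```

Matching the watermark delay to the upper bound passed to `date.past` is meant to keep generated rows from arriving behind the watermark and being dropped as late.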

On Collection Data Types

The usage of ARRAY, MULTISET, MAP and ROW types is shown in the following example.

CREATE TEMPORARY TABLE hp (
  `character-with-age` MAP<STRING,INT>,
  `spells` MULTISET<STRING>,
  `locations` ARRAY<STRING>,
  `house-points` ROW<`house` STRING, `points` INT>
) WITH (
  'connector' = 'faker',
  'fields.character-with-age.key.expression' = '#{harry_potter.character}',
  'fields.character-with-age.value.expression' = '#{number.numberBetween ''10'',''100''}',
  'fields.character-with-age.length' = '2',
  'fields.spells.expression' = '#{harry_potter.spell}',
  'fields.spells.length' = '5',
  'fields.locations.expression' = '#{harry_potter.location}',
  'fields.locations.length' = '3',
  'fields.house-points.house.expression' = '#{harry_potter.house}',
  'fields.house-points.points.expression' = '#{number.numberBetween ''10'',''100''}'
);

SELECT * FROM hp;

"One Of" Columns

The Java Faker expression to pick a random value from a list of options is not straightforward to get right. Actually, I did not manage to get Options.option to work at all. As a workaround, I recommend using regexify for this use case.

CREATE TEMPORARY TABLE orders (
  `order_id` INT,
  `order_status` STRING
)
WITH (
  'connector' = 'faker', 
  'fields.order_id.expression' = '#{number.numberBetween ''0'',''100''}',
  'fields.order_status.expression' = '#{regexify ''(RECEIVED|SHIPPED|CANCELLED){1}''}'
);

SELECT * FROM orders;

License

Copyright © 2020-2021 Konstantin Knauf

Distributed under the Apache License, Version 2.0.
