barbeque's People

Contributors

rendybjunior

barbeque's Issues

As a bbq engineer, I want a script that generates sample data, so that I can test the functionality easily

Goal

Automate sample data generation to a CSV file.

Acceptance

When generate_data.py is executed, it produces barbeque_sales.csv with the schema shown in #1:

  • id : integer, auto-increment transaction id
  • brand_id : string, brand of the barbeque
  • amount : integer, sales amount
  • _PARTITIONTIME : timestamp, when the sale happened

For now, loading into BQ is done manually via the UI.
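
A minimal sketch of what generate_data.py could look like, using only the standard library. The brand names, the amount range, the date window, and the function names `generate_rows` and `write_csv` are assumptions for illustration; the issue only fixes the output file name and the four columns.

```python
import csv
import random
from datetime import datetime, timedelta

# Hypothetical brand names; the issue does not specify which brands to generate.
BRANDS = ["super_bbq", "just_ok_bbq"]


def generate_rows(n, start, days, seed=0):
    """Return n sample sales rows as dicts keyed by the schema's column names."""
    rng = random.Random(seed)  # fixed seed keeps the sample data reproducible
    rows = []
    for i in range(1, n + 1):
        # Pick a random second inside the [start, start + days) window.
        ts = start + timedelta(seconds=rng.randrange(days * 24 * 3600))
        rows.append({
            "id": i,  # auto-increment transaction id
            "brand_id": rng.choice(BRANDS),
            "amount": rng.randint(1, 1000),  # assumed range
            "_PARTITIONTIME": ts.strftime("%Y-%m-%d %H:%M:%S"),
        })
    return rows


def write_csv(rows, path="barbeque_sales.csv"):
    """Write the rows to a CSV file with a header line."""
    fields = ["id", "brand_id", "amount", "_PARTITIONTIME"]
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fields)
        writer.writeheader()
        writer.writerows(rows)
```

Calling `write_csv(generate_rows(100, datetime(2018, 6, 17), days=13))` would write 100 rows spread over 2018-06-17 up to, but not including, 2018-06-30.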

Out of scope

Automatically loading data into BQ using a script

As a bbq engineer, I want to create a SQL builder, so that I can generate SQL based on config

Goal

Create a BigQuery SQL builder driven by the job config and command-line parameters

Acceptance

Given an in-memory job config object and command-line parameters, the builder generates the required SQL.
Example:
See the command and YAML in #1.
It produces BigQuery Standard SQL like the following:

SELECT
  DATE(_PARTITIONTIME) AS day,
  brand_id,
  SUM(amount) AS amount_sum,
  COUNT(*) AS cnt
FROM `barbeque.sales`
WHERE _PARTITIONTIME >= "2018-06-17 00:00:00" AND _PARTITIONTIME < "2018-06-30 00:00:00"
GROUP BY 1, 2
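
A rough sketch of how such a builder might look in Python, assuming the job config arrives as a plain dict that mirrors the YAML in #1 and the dates arrive as YYYY-MM-DD strings. The function name `build_sql` and the constant `SALES_CONFIG` are hypothetical. Values are interpolated into the SQL text as-is, so this sketch assumes trusted config.

```python
# In-memory equivalent of the sales.yml job config from #1.
SALES_CONFIG = {
    "name": "sales_count_by_brand_id_day",
    "type": "day_partition_preserving",
    "table": "barbeque.sales",
    "keys": ["brand_id"],
    "aggr": [{"field": "amount", "func": "sum"}],
}


def build_sql(config, start_dt, end_dt):
    """Build BigQuery Standard SQL for a day-partition-preserving summary."""
    keys = list(config["keys"])
    select = ["DATE(_PARTITIONTIME) AS day"] + keys
    for aggr in config.get("aggr", []):
        # {"field": "amount", "func": "sum"} becomes SUM(amount) AS amount_sum
        select.append("{}({}) AS {}_{}".format(
            aggr["func"].upper(), aggr["field"], aggr["field"], aggr["func"]))
    select.append("COUNT(*) AS cnt")  # count is always included
    # Group by ordinal position: the day column plus every key column.
    group_by = ", ".join(str(i) for i in range(1, len(keys) + 2))
    lines = [
        "SELECT",
        "  " + ",\n  ".join(select),
        "FROM `{}`".format(config["table"]),
        'WHERE _PARTITIONTIME >= "{} 00:00:00" AND _PARTITIONTIME < "{} 00:00:00"'.format(
            start_dt, end_dt),
        "GROUP BY " + group_by,
    ]
    return "\n".join(lines)
```

With these inputs, `build_sql(SALES_CONFIG, "2018-06-17", "2018-06-30")` returns the query text shown above.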

Out of Scope

  • Reading the config from a file is out of scope; the config and date parameters are in-memory objects.
  • Running the SQL against BQ is out of scope.

As a sad data engineer, I want a simple way to do day-partition-preserving summarization, so that I can focus more on data quality and be happier

Goal

MVP of barbeque: create a partition-preserving summary, under these assumptions:

  • Only day-partitioned tables are supported.
  • Partition time is assumed equal to event time. Any out-of-sync timestamp is discarded. Custom time fields are not supported yet.
  • Only count and sum aggregations are supported. Count is always included by default.

Acceptance

Given this command:

bbq sales.yml --start_dt="2018-06-17" --end_dt="2018-06-30"

barbeque reads the source table and writes the summary into a target table named after the job (in replace mode), with the day partitions preserved.

sales.yml :

name: sales_count_by_brand_id_day
type: day_partition_preserving
table: barbeque.sales
keys:
- brand_id
aggr:
- field: amount
  func: sum

Sample source data (timestamp is the partition time):

id  brand_id     amount  timestamp
1   super_bbq    123     2018-06-17 08:00:00
2   super_bbq    456     2018-06-17 14:00:00
3   super_bbq    142     2018-06-18 13:00:00
4   just_ok_bbq  542     2018-06-20 09:00:00

Sample data result:

day         brand_id     cnt  amount_sum  _PARTITIONTIME
2018-06-17  super_bbq    2    579         2018-06-17 00:00:00
2018-06-18  super_bbq    1    142         2018-06-18 00:00:00
2018-06-20  just_ok_bbq  1    542         2018-06-20 00:00:00
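
To make the expected result concrete, the same summarization can be reproduced in plain Python on the four sample rows. This is an illustrative sketch, not how barbeque itself computes the summary (that happens in BigQuery); the function name `summarize` is hypothetical, and row 4 is assumed to fall on 2018-06-20, as the result table implies.

```python
from collections import defaultdict

# Sample source rows: (id, brand_id, amount, partition timestamp).
ROWS = [
    (1, "super_bbq", 123, "2018-06-17 08:00:00"),
    (2, "super_bbq", 456, "2018-06-17 14:00:00"),
    (3, "super_bbq", 142, "2018-06-18 13:00:00"),
    (4, "just_ok_bbq", 542, "2018-06-20 09:00:00"),  # assumed year, see above
]


def summarize(rows, start_dt, end_dt):
    """Return sorted (day, brand_id, cnt, amount_sum, partition_time) tuples."""
    totals = defaultdict(lambda: [0, 0])  # (day, brand_id) -> [cnt, amount_sum]
    for _id, brand_id, amount, ts in rows:
        day = ts[:10]  # the day partition is the date part of the timestamp
        if start_dt <= day < end_dt:  # start inclusive, end exclusive
            totals[(day, brand_id)][0] += 1
            totals[(day, brand_id)][1] += amount
    return [
        (day, brand_id, cnt, amount_sum, day + " 00:00:00")
        for (day, brand_id), (cnt, amount_sum) in sorted(totals.items())
    ]
```

Each output row keeps the day of its source rows as its own partition time, which is what "partition preserved" means here.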

Out of Scope

Additional features planned (noted here for future reference):

  • condition_sql: "entity_id IS NOT NULL AND entity_id <> 0"
  • partition_padding: day_1 // assume the partition time and the timestamp field are in sync, or allow ± n days
