ActivitySchema

The Activity Schema data model is a standard designed for modern data warehouses. It is designed to make data modeling and analysis substantially simpler, faster, and more reliable than existing methodologies (e.g. a star schema)

The core approach is to separate modeling from querying. Models should be fully independent: asking new data questions should not require creating new models. Instead, queries can combine any number of existing models to build anything they need.

This is done by modeling all data in the warehouse as a single time series table.

For more info also see the home page or read the spec.

How it Works

At its core an activity schema consists of transforming raw tables into a single, time series table called an activity stream. All downstream plots, tables, materialized views, etc used for BI are built directly from that single table, with no other dependencies.

Traditional Data Modeling - many custom-structured tables

Activity Schema - one standard table

Business concepts are represented as entity doing an activity ('a customer completed an order') instead of facts or nouns (orders, products). Activities are built directly from source tables, store only their own data, and are the single source of truth for each concept.

All queries run against an activity stream table to assemble data for analysis, BI, and reporting. Instead of traditional foreign key joins, queries combine activities using relationships in time (e.g. all customers who completed an order and submitted a support ticket before their next completed order).

Advantages

Maintainable data models

fewer models — one (and only one) business concept per activity means fewer models to manage, understand, and maintain
easier to build - no joins between models means no need to tie disparate source systems together.
easily accommodate changes to source data — only need to update a single activity
simple data lineage — a single data layer makes tracing data provenance and debugging far easier
faster updates — time-series modeling means incremental updates (rather than full rebuilds) by default
lower data latency - no dependencies means data is available upon insertion, rather than waiting for cascading rebuilds
no data dictionaries — fewer models, with one concept each, makes them vastly easier to document

Faster analysis and querying

single source of truth — because activity represents a single concept (like a 'page view' or 'completed order'), it's always clear which activity to use
query across many source systems — time-based joins means any activity can be queried and combined with another without defining foreign keys
reusable analyses — a standard data model means that any analysis can be reused across companies. This means a customer acquisition cost calculation for one company can be shared with another
autogenerated queries — a standard data model means that queries don't have to be written by hand
true ad-hoc querying — because all activities are related in time, swapping one activity for another requires no structural changes to queries.
high performance — queries run substantially faster against an activity stream table, which has fewer columns, requires fewer joins, and can be easily partitioned / indexed by time.

Spec

Learn more by reading the full activity schema specification.

Resources

The activity schema home page has more info on the approach and its benefits

Building an activity schema

The implementation page covers how to build an activity schema in production.

A new activity schema dbt package has some helpful macros for building activity stream tables.

Known Implementations

Narrator provides a full implementation of the activity schema as a service. The Narrator team directly supports the growth of the activity schema, and is always happy to provide suggestions on how to build one yourself.

Community

The dbt Slack has a #modeling-activity-schema channel for discussion about the activity schema.

activityschema Goto Github PK

ActivitySchema

How it Works

Advantages

Spec

Resources

Building an activity schema

Known Implementations

Community

activityschema's Projects

activityschema

activityschema.github.io

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent