Git Product home page Git Product logo

The Activity Schema data model is a standard designed for modern data warehouses. It is designed to make data modeling and analysis substantially simpler, faster, and more reliable than existing methodologies (e.g. a star schema)

The core approach is to separate modeling from querying. Models should be fully independent: asking new data questions should not require creating new models. Instead, queries can combine any number of existing models to build anything they need.

This is done by modeling all data in the warehouse as a single time series table.

For more info also see the home page or read the spec.


How it Works

At its core an activity schema consists of transforming raw tables into a single, time series table called an activity stream. All downstream plots, tables, materialized views, etc used for BI are built directly from that single table, with no other dependencies.


Traditional Data Modeling - many custom-structured tables

image



Activity Schema - one standard table

image

Business concepts are represented as entity doing an activity ('a customer completed an order') instead of facts or nouns (orders, products). Activities are built directly from source tables, store only their own data, and are the single source of truth for each concept.

All queries run against an activity stream table to assemble data for analysis, BI, and reporting. Instead of traditional foreign key joins, queries combine activities using relationships in time (e.g. all customers who completed an order and submitted a support ticket before their next completed order).



Advantages

Maintainable data models

  • fewer models — one (and only one) business concept per activity means fewer models to manage, understand, and maintain
  • easier to build - no joins between models means no need to tie disparate source systems together.
  • easily accommodate changes to source data — only need to update a single activity
  • simple data lineage — a single data layer makes tracing data provenance and debugging far easier
  • faster updates — time-series modeling means incremental updates (rather than full rebuilds) by default
  • lower data latency - no dependencies means data is available upon insertion, rather than waiting for cascading rebuilds
  • no data dictionaries — fewer models, with one concept each, makes them vastly easier to document

Faster analysis and querying

  • single source of truth — because activity represents a single concept (like a 'page view' or 'completed order'), it's always clear which activity to use
  • query across many source systems — time-based joins means any activity can be queried and combined with another without defining foreign keys
  • reusable analyses — a standard data model means that any analysis can be reused across companies. This means a customer acquisition cost calculation for one company can be shared with another
  • autogenerated queries — a standard data model means that queries don't have to be written by hand
  • true ad-hoc querying — because all activities are related in time, swapping one activity for another requires no structural changes to queries.
  • high performance — queries run substantially faster against an activity stream table, which has fewer columns, requires fewer joins, and can be easily partitioned / indexed by time.


Spec

Learn more by reading the full activity schema specification.


Resources

The activity schema home page has more info on the approach and its benefits


Building an activity schema

The implementation page covers how to build an activity schema in production.

A new activity schema dbt package has some helpful macros for building activity stream tables.


Known Implementations

Narrator provides a full implementation of the activity schema as a service. The Narrator team directly supports the growth of the activity schema, and is always happy to provide suggestions on how to build one yourself.


Community

The dbt Slack has a #modeling-activity-schema channel for discussion about the activity schema.

activityschema's Projects

activityschema icon activityschema

Repository for the ActivitySchema spec and supporting materials

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.