This is DataMade's guide to extracting, transforming, and loading (ETL) data using Make, a standard command-line build tool.
ETL refers to the general process of:
- taking raw source data ("Extract")
- cleaning and reshaping the data, possibly creating intermediate derived files along the way ("Transform")
- producing final output in a more usable form, for "Loading" into something that consumes the data - be it an app, a system, a visualization, etc.
Having a standard ETL workflow helps us make sure that our work is clean, consistent, and easy to reproduce. By following these guidelines you'll be able to keep your work up to date and share it with the world in a standard format - all with as few headaches as possible.
These five principles inform all of our data work:
- Never destroy data - treat source data as immutable, and show your work when you modify it
- Be able to deterministically produce the final data with one command
- Write as little custom code as possible
- Use standard tools whenever possible
- Keep source data under version control
Unsure how to follow these principles? Read on!
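To make the principles concrete, here is a minimal sketch of an ETL Makefile. The file names, URL, and transformation step are hypothetical; the point is the shape: raw source data is downloaded once and never edited, and a single `make` command deterministically rebuilds the final output using standard tools.

```make
# Hypothetical sketch - names and URL are placeholders, not a real project.
.PHONY: all clean

all: finished/output.csv

# Extract: download the raw source data once. Treat it as immutable -
# never edit this file by hand.
raw/source.csv:
	mkdir -p raw
	curl -o $@ "https://example.com/source.csv"

# Transform: derive a cleaned file from the raw copy using standard
# tools (here, sed strips a junk header row). The recipe shows the work.
finished/output.csv: raw/source.csv
	mkdir -p finished
	sed '1d' $< > $@

# Never destroy data: clean only removes derived files, not raw/.
clean:
	rm -rf finished
```

Running `make` produces the final data with one command; running `make clean && make` reproduces it from scratch, leaving the source data untouched.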
- Make & Makefile Overview
  - Why Use Make/Makefiles?
  - Makefile 101
  - Makefile 201 - Some Fancy Things Built Into Make
- ETL Styleguide
  - Makefile Best Practices
  - Variables
  - Processors
  - Standard Toolkit
  - ETL Workflow Directory Structure
- Some Annotated ETL Code Examples with Make
  - Chicago Lead - data work with a clear README and Makefile
  - EITC Works - adding data attributes to Illinois House and Senate district shapefiles and outputting as GeoJSON