DataBrewer

The missing datasets manager.

DataBrewer lets you search for and discover datasets. Inspired by Homebrew, it creates an index of known datasets that you can download with a single command. It will also provide an API so you can do the same from, for example, an IPython notebook, and no longer have to download datasets manually.

Quickstart

Install databrewer:

pip install databrewer

Update the recipes index:

databrewer update

Search for some keywords:

databrewer search nyc taxi

Example output:

andresmh-nyc-taxi-trips - NYC Taxi Trips. Data obtained through a FOIA request
nyc-tlc-taxi            - This dataset includes trip records from all trips
                          completed in yellow and green taxis in NYC in 2014 and
                          select months of 2015.
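
To illustrate the idea of searching an index by keywords, here is a minimal sketch in Python. It is not DataBrewer's implementation: the `search` function, the in-memory `index` dictionary, and the rule that every keyword must appear in the name or description are all assumptions made for this example. The two entries are taken from the example output above.

```python
def search(index, keywords):
    """Return names of entries that mention every keyword (case-insensitive)."""
    wanted = [keyword.lower() for keyword in keywords]
    hits = []
    for name, description in index.items():
        # Assumption: a keyword may match either the name or the description.
        haystack = f"{name} {description}".lower()
        if all(keyword in haystack for keyword in wanted):
            hits.append(name)
    return sorted(hits)


# Hypothetical in-memory index built from the two entries shown above.
index = {
    "andresmh-nyc-taxi-trips": "NYC Taxi Trips. Data obtained through a FOIA request",
    "nyc-tlc-taxi": (
        "This dataset includes trip records from all trips completed in "
        "yellow and green taxis in NYC in 2014 and select months of 2015."
    ),
}

for name in search(index, ["nyc", "taxi"]):
    print(name)
```

Under these assumptions both entries match "nyc taxi", while a keyword such as "green" narrows the result to nyc-tlc-taxi.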

Let's check the nyc-tlc-taxi dataset:

databrewer info nyc-tlc-taxi

We can either download the entire dataset (which is huge!):

databrewer download nyc-tlc-taxi

Or download just a subset of its files:

databrewer download "nyc-tlc-taxi[green][2014-*]"

Note

Note that * is the standard glob operator and [green] acts as a selector. The available selectors depend on how the recipe is defined. When using selectors, you must enclose the name in quotes in most shells.
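
The following Python sketch shows one way a name with selectors could be split and matched. It is an illustration only, not DataBrewer's actual parser: `parse_spec`, `select`, the idea of tagging each file with a tuple of keys, and the file names other than green_tripdata_2014-01.csv are hypothetical. Glob matching uses `fnmatchcase` from the standard library.

```python
import re
from fnmatch import fnmatchcase


def parse_spec(spec):
    """Split 'name[sel1][sel2]' into ('name', ['sel1', 'sel2'])."""
    match = re.fullmatch(r"([^\[\]]+)((?:\[[^\[\]]*\])*)", spec)
    if match is None:
        raise ValueError(f"malformed dataset spec: {spec!r}")
    name, brackets = match.groups()
    return name, re.findall(r"\[([^\[\]]*)\]", brackets)


def select(files, selectors):
    """Keep files whose keys match the selectors position by position."""
    return [
        path
        for keys, path in files
        if len(keys) >= len(selectors)
        and all(fnmatchcase(key, pattern) for key, pattern in zip(keys, selectors))
    ]


# Hypothetical recipe contents: each file is tagged with (kind, month) keys.
files = [
    (("green", "2014-01"), "green_tripdata_2014-01.csv"),
    (("green", "2015-01"), "green_tripdata_2015-01.csv"),
    (("yellow", "2014-01"), "yellow_tripdata_2014-01.csv"),
]

name, selectors = parse_spec("nyc-tlc-taxi[green][2014-*]")
print(name, selectors, select(files, selectors))
```

The quoting requirement follows from the syntax: square brackets and * are glob characters in most shells, so an unquoted name may be expanded by the shell before databrewer ever sees it.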

Finally, you need to know where the files are located for further processing:

databrewer download "nyc-tlc-taxi[green][2014-*]"

Example output:

/Users/rolando/.databrewer/datasets/nyc-tlc-taxi/green_tripdata_2014-01.csv
/Users/rolando/.databrewer/datasets/nyc-tlc-taxi/green_tripdata_2014-02.csv
/Users/rolando/.databrewer/datasets/nyc-tlc-taxi/green_tripdata_2014-03.csv
/Users/rolando/.databrewer/datasets/nyc-tlc-taxi/green_tripdata_2014-04.csv
/Users/rolando/.databrewer/datasets/nyc-tlc-taxi/green_tripdata_2014-05.csv
/Users/rolando/.databrewer/datasets/nyc-tlc-taxi/green_tripdata_2014-06.csv
/Users/rolando/.databrewer/datasets/nyc-tlc-taxi/green_tripdata_2014-07.csv
/Users/rolando/.databrewer/datasets/nyc-tlc-taxi/green_tripdata_2014-08.csv
/Users/rolando/.databrewer/datasets/nyc-tlc-taxi/green_tripdata_2014-09.csv
/Users/rolando/.databrewer/datasets/nyc-tlc-taxi/green_tripdata_2014-10.csv
/Users/rolando/.databrewer/datasets/nyc-tlc-taxi/green_tripdata_2014-11.csv
/Users/rolando/.databrewer/datasets/nyc-tlc-taxi/green_tripdata_2014-12.csv
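
As a sketch of the "further processing" step, the printed paths can be picked up from Python. This assumes the command prints one path per line, as in the example output above; the helper names `parse_paths` and `month_of` are made up for this example and are not part of DataBrewer.

```python
from pathlib import PurePosixPath


def parse_paths(output):
    """Turn the printed output (one path per line) into path objects."""
    return [PurePosixPath(line.strip()) for line in output.splitlines() if line.strip()]


def month_of(path):
    """Extract the 'YYYY-MM' part of a name like green_tripdata_2014-01.csv."""
    return path.stem.rsplit("_", 1)[-1]


# The first two lines of the example output above.
example_output = """\
/Users/rolando/.databrewer/datasets/nyc-tlc-taxi/green_tripdata_2014-01.csv
/Users/rolando/.databrewer/datasets/nyc-tlc-taxi/green_tripdata_2014-02.csv
"""

paths = parse_paths(example_output)
by_month = {month_of(path): path for path in paths}
for month, path in sorted(by_month.items()):
    print(month, path.name)
```

From here each path can be handed to whatever CSV reader you prefer.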

Datasets

The aim is to index known and not-so-known datasets. There are no plans to standardize the dataset format, as we want to keep each dataset as published by its authors.

Recipes

Datasets are defined in recipes, which contain information about the dataset and where to find it.

These recipes are community-maintained and hosted in the databrewer-recipes repository.

Roadmap

  • Include an API. For now, DataBrewer only provides a CLI, but in the near future it will include an API so you can search, download, and load datasets directly from your Python code.

Contributing

See CONTRIBUTING.rst for information on how you can help.
