
tubular's Introduction

Tubular pre-processing for machine learning!



tubular implements pre-processing steps for tabular data commonly used in machine learning pipelines.

The transformers are compatible with scikit-learn Pipelines. Each has a transform method to apply the pre-processing step to data and a fit method to learn the relevant information from the data, if applicable.
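As a rough illustration of the fit/transform interface described above, the sketch below builds a small transformer and runs it inside a scikit-learn Pipeline. The `QuantileCapper` class, its parameters and the toy data are hypothetical and written only for this example; they are not part of tubular, whose own transformers may be structured differently.

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline


class QuantileCapper(TransformerMixin, BaseEstimator):
    """Hypothetical transformer: caps columns at a quantile learnt in fit."""

    def __init__(self, columns, quantile=0.5):
        self.columns = columns
        self.quantile = quantile

    def fit(self, X, y=None):
        # learn one upper cap per column from the data
        self.caps_ = {col: X[col].quantile(self.quantile) for col in self.columns}
        return self

    def transform(self, X):
        # work on a copy so the input DataFrame is left unchanged
        X = X.copy()
        for col, cap in self.caps_.items():
            X[col] = X[col].clip(upper=cap)
        return X


X = pd.DataFrame(
    {
        "a": [1.0, 2.0, 3.0, 4.0, 100.0],
        "b": [10.0, 20.0, 30.0, 40.0, 50.0],
    }
)

# a single-step pipeline; further transformers or a model could follow
pipe = Pipeline([("capper", QuantileCapper(columns=["a"], quantile=0.5))])
capped = pipe.fit_transform(X)

print(capped)
```

Here `fit` learns the median of column `a` and `transform` applies it as an upper cap, while column `b` passes through untouched.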

The transformers in tubular work with data in pandas DataFrames.

There are a variety of transformers to assist with:

  • capping
  • dates
  • imputation
  • mapping
  • categorical encoding
  • numeric operations

Here is a simple example of applying capping to two columns:

from tubular.capping import CappingTransformer
import pandas as pd
from sklearn.datasets import fetch_california_housing

# load the california housing dataset
cali = fetch_california_housing()
X = pd.DataFrame(cali['data'], columns=cali['feature_names'])

# initialise a capping transformer for 2 columns
capper = CappingTransformer(capping_values={'AveOccup': [0, 10], 'HouseAge': [0, 50]})

# transform the data
X_capped = capper.transform(X)
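To make the effect of capping concrete, the following sketch reproduces it in plain pandas on a tiny hand-made DataFrame. It assumes that capping means clipping each column to the [lower, upper] pair given for it; the data values are invented for illustration and the code does not use tubular itself.

```python
import pandas as pd

# toy data, made up for illustration only
X = pd.DataFrame(
    {
        "AveOccup": [2.5, 12.0, -1.0],
        "HouseAge": [41.0, 52.0, 8.0],
    }
)

# same structure as the capping_values argument in the example above
capping_values = {"AveOccup": [0, 10], "HouseAge": [0, 50]}

# clip each column to its [lower, upper] range on a copy of the data
X_capped = X.copy()
for column, (lower, upper) in capping_values.items():
    X_capped[column] = X_capped[column].clip(lower=lower, upper=upper)

print(X_capped)
```

Values inside the range are left as they are; only those outside it are moved to the nearest bound.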

Installation

The easiest way to get tubular is to install it from PyPI:

pip install tubular

Documentation

The documentation for tubular can be found on Read the Docs.

Instructions for building the docs locally can be found in docs/README.

Examples

To help you get started, the examples folder in the repo contains notebooks that show how to use each transformer.

The example notebooks can also be opened in Binder. Once the session has started, click the directory button in the left sidebar to navigate to a specific notebook.

Issues

For bugs and feature requests, please open an issue.

Build and test

This project uses pytest as its test framework. To build the package locally and run the tests, follow the steps below.

First, clone the repo and move to the root directory:

git clone https://github.com/lvgig/tubular.git
cd tubular

Next, install tubular and the development dependencies:

pip install . -r requirements-dev.txt

Finally, run the test suite with pytest:

pytest

Contribute

tubular is under active development, and we're super excited if you're interested in contributing!

See the CONTRIBUTING file for the full details of our working practices.

tubular's People

Contributors

richardangell, munichpavel, clairef57, nedwebster, bissoligiulia, lsumption, merve-alanyali, shreenapatel
