categorical-encoding

categorical-encoding is a Python library for encoding categorical data, intended for use with Featuretools. It encodes categorical columns and plugs into a Featuretools pipeline, so encoded features can be produced as part of automated feature engineering within a machine learning workflow.

Install

python -m pip install "featuretools[categorical_encoding]"

Description

Install Demo Guide Requirements

python -m pip install -r demo-requirements.txt

For more general questions about using categorical encoding in a machine learning pipeline, consult the guides in the categorical-encoding GitHub repository.

>>> feature_matrix
    product_id  purchased  value countrycode
id
0    coke zero       True    0.0          US
1    coke zero       True    5.0          US
2    coke zero       True   10.0          US
3          car       True   15.0          US
4          car       True   20.0          US
5   toothpaste       True    0.0          AL

The encoder fits into the standard train/test split used in applied machine learning.

>>> train_data = feature_matrix.iloc[[0, 1, 4, 5]]
>>> train_data
    product_id  purchased  value countrycode
id
0    coke zero       True    0.0          US
1    coke zero       True    5.0          US
4          car       True   20.0          US
5   toothpaste       True    0.0          AL
>>> test_data = feature_matrix.iloc[[2, 3]]
>>> test_data
   product_id  purchased  value countrycode
id
2   coke zero       True   10.0          US
3         car       True   15.0          US
>>> import categorical_encoding as ce
>>> encoder = ce.Encoder(method='leave_one_out')
>>> train_enc = encoder.fit_transform(train_data, features, train_data['value'])
>>> test_enc = encoder.transform(test_data)

The encoder is fit on the training data and transforms it, then transforms the test data using the encoding it learned during fitting.

>>> train_enc
    PRODUCT_ID_leave_one_out  purchased  value  COUNTRYCODE_leave_one_out
id
0                       5.00       True    0.0                      12.50
1                       0.00       True    5.0                      10.00
4                       6.25       True   20.0                       2.50
5                       6.25       True    0.0                       6.25
>>> test_enc
    PRODUCT_ID_leave_one_out  purchased  value  COUNTRYCODE_leave_one_out
id
2                       2.50       True   10.0                   8.333333
3                       6.25       True   15.0                   8.333333
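The numbers above can be checked by hand. The following is a minimal sketch of leave-one-out target encoding in plain Python. It is not the library's implementation. The function names (`fit_leave_one_out`, `encode_train`, `encode_test`) are hypothetical, and the rule that categories seen only once (or never) in training fall back to the global target mean is an assumption inferred from the outputs shown above.

```python
from collections import defaultdict


def fit_leave_one_out(categories, targets):
    """Collect per-category target sums and counts, plus the global target mean."""
    sums, counts = defaultdict(float), defaultdict(int)
    for cat, y in zip(categories, targets):
        sums[cat] += y
        counts[cat] += 1
    global_mean = sum(targets) / len(targets)
    return dict(sums), dict(counts), global_mean


def encode_train(categories, targets, sums, counts, global_mean):
    """Encode each training row with the mean target of the other rows in its category."""
    encoded = []
    for cat, y in zip(categories, targets):
        if counts[cat] > 1:
            # Leave the row's own target out of its category mean.
            encoded.append((sums[cat] - y) / (counts[cat] - 1))
        else:
            # Assumption: a category with a single row falls back to the global mean.
            encoded.append(global_mean)
    return encoded


def encode_test(categories, sums, counts, global_mean):
    """Encode unseen rows with the full training mean of their category."""
    encoded = []
    for cat in categories:
        if counts.get(cat, 0) > 1:
            encoded.append(sums[cat] / counts[cat])
        else:
            # Assumption: singleton or unseen categories use the global mean.
            encoded.append(global_mean)
    return encoded


# Training rows (ids 0, 1, 4, 5) and test rows (ids 2, 3) from the example above.
train_products = ["coke zero", "coke zero", "car", "toothpaste"]
train_countries = ["US", "US", "US", "AL"]
train_values = [0.0, 5.0, 20.0, 0.0]
test_products = ["coke zero", "car"]
test_countries = ["US", "US"]

product_stats = fit_leave_one_out(train_products, train_values)
country_stats = fit_leave_one_out(train_countries, train_values)

train_product = encode_train(train_products, train_values, *product_stats)
train_country = encode_train(train_countries, train_values, *country_stats)
test_product = encode_test(test_products, *product_stats)
test_country = encode_test(test_countries, *country_stats)
```

Under these assumptions the sketch reproduces the `PRODUCT_ID_leave_one_out` and `COUNTRYCODE_leave_one_out` columns of `train_enc` and `test_enc`. Excluding a row's own target during training keeps the encoded value from leaking that row's label into its own feature.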

The encoder integrates with Featuretools by exposing its learned encodings as features. First, learn the features by fitting an encoder to your data. Then, when new data arrives, use those features to generate a new table of encoded data that is ready for your trained machine learning model.

>>> features = encoder.get_features()
>>> features
[<Feature: PRODUCT_ID_leave_one_out>,
 <Feature: purchased>,
 <Feature: value>,
 <Feature: COUNTRYCODE_leave_one_out>]
>>> import featuretools as ft
>>> features_encoded = encoder.get_features()
>>> # es is the EntitySet that holds the new rows (ids 6 and 7)
>>> fm2_encoded = ft.calculate_feature_matrix(features_encoded, es, instance_ids=[6, 7])
>>> fm2_encoded
    PRODUCT_ID_leave_one_out  purchased  value  COUNTRYCODE_leave_one_out
id
6                       6.25       True    1.0                       6.25
7                       6.25       True    2.0                       6.25

Built at Alteryx Innovation Labs


Contributors

alexjwang, gsheni, jeff-hernandez, kmax12, rwedge
