dbt-ml-preprocessing

A package for dbt which enables standardization of data sets. You can use it to build a feature store in your data warehouse, without using external libraries like Spark MLlib or Python's scikit-learn.

The package contains a set of macros that mirror the functionality of the scikit-learn preprocessing module. Originally they were developed as part of the 2019 Medium article Feature Engineering in Snowflake.

They have been tested on Snowflake, Redshift, BigQuery, SQL Server and PostgreSQL 13.2. The test case expectations were built using scikit-learn (see *.py in integration_tests/data/sql), so you can expect behavioural parity with it.

โš ๏ธ There are now several better alternatives to this package. If you're using Snowflake, they now offer the snowflake-ml-python package which is fully supported and much more comprehensive. Within dbt, the Python models feature allows Snowflake, BigQuery and Databricks users to use scikit-learn directly

The macros are:

scikit-learn function   macro name            Snowflake  BigQuery  Redshift  MSSQL  PostgreSQL
KBinsDiscretizer        k_bins_discretizer    Y          Y         Y         Y      Y
LabelEncoder            label_encoder         Y          Y         Y         Y      Y
MaxAbsScaler            max_abs_scaler        Y          Y         Y         Y      Y
MinMaxScaler            min_max_scaler        Y          Y         Y         Y      Y
Normalizer              normalizer            Y          Y         Y         Y      Y
OneHotEncoder           one_hot_encoder       Y          Y         Y         Y      Y
QuantileTransformer     quantile_transformer  Y          Y         N         N      Y
RobustScaler            robust_scaler         Y          Y         Y         Y      Y
StandardScaler          standard_scaler       Y          Y         Y         N      Y

* 2D charts taken from scikit-learn.org, GIFs are my own
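To illustrate the parity claim, here is a minimal pure-Python sketch of the transformation that min_max_scaler mirrors, the same formula scikit-learn's MinMaxScaler applies and the macro expresses in SQL. This is my own illustration, not the package's code:

```python
def min_max_scale(values):
    """Rescale values to [0, 1]: (x - min) / (max - min),
    the formula behind both sklearn's MinMaxScaler and the macro's SQL."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

print(min_max_scale([1.0, 3.0, 5.0]))  # [0.0, 0.5, 1.0]
```

In the warehouse, the macro computes the min and max with window or aggregate functions over the source table rather than in Python.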

Installation

To use this in your dbt project, create or modify packages.yml to include:

packages:
  - package: "omnata-labs/dbt_ml_preprocessing"
    version: [">=1.0.2"]

(replace the version number with the latest release)

Then run: dbt deps to import the package.

dbt 1.0.0 compatibility

dbt-ml-preprocessing version 1.2.0 is the first version to support (and require) dbt 1.0.0.

If you are not ready to upgrade to dbt 1.0.0, please use dbt-ml-preprocessing version 1.0.2.

Usage

To read the macro documentation and see examples, simply generate your docs, and you'll see macro documentation in the Projects tree under dbt_ml_preprocessing:

docs screenshot

dbt-ml-preprocessing's People

Contributors

comaradotcom, dataders, jamesweakley

dbt-ml-preprocessing's Issues

OHE error handling: casting string as boolean?

I'm working to make the one_hot_encoder macro compatible with Azure SQL and Azure Synapse. The compiler takes issue with line 93 because there is no boolean datatype in T-SQL. My question is: what does this cast resolve to in other data warehouses, NULL?

{%- if handle_unknown=='error' %}
case
when {{ source_column }} = '{{ category }}' then true
when {{ source_column }} in ('{{ category_values | join("','") }}') then false
else cast('Error: unknown value found and handle_unknown parameter was "error"' as boolean)
end as is_{{ source_column }}_{{ no_whitespace_column_name }}
{% endif %}
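As I read the snippet, the intended semantics of handle_unknown='error' are: the matched category yields true, any other known category yields false, and an unknown value deliberately fails at query time via an invalid cast. A Python sketch of that logic (hypothetical function name, raising where the SQL forces a cast error):

```python
def one_hot(value, category, known_categories):
    """Mimic the CASE expression: flag for `category`, fail on unknown input."""
    if value == category:
        return True
    if value in known_categories:
        return False
    # The SQL provokes a runtime error via an invalid string-to-boolean cast;
    # in Python we raise instead.
    raise ValueError('unknown value found and handle_unknown parameter was "error"')

print(one_hot("orange", "orange", ["orange", "yellow"]))  # True
print(one_hot("yellow", "orange", ["orange", "yellow"]))  # False
```

On engines without a boolean type, the same "fail loudly" effect could presumably be achieved with a different invalid cast (e.g. casting the error string to a numeric type), but that is an assumption, not something the macro currently does.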

cc: @comaraDOTcom

Support ml-driven preprocessing (e.g. PCA) with SQL

I'd be interested to see an extension of this project to incorporate more ML-driven preprocessing into the pipelines. Thinking things such as PCA, K-means, Bayesian classifiers. Is this a direction you've already considered? Where do you see this fitting into the spectrum from "preprocessing" to "inference"?

What are the prospects for this repo?

Hi,
I just tried to use this package and noticed a couple of bugs, issues that have been open unresolved for several years, unaddressed open pull requests, and a last commit (as of today) 18 months ago.
Do you plan on maintaining this repo or should the community fork it to keep it going?
Thank you!

Narrow the scope of one_hot_encoder (OHE)

In reviewing #4, I echo @comaraDOTcom's sentiment when they say they'd be "more inclined to use the output of the macro as a CTE and select a subset of cols in a subsequent CTE." To that end, I'd like to propose two macros (names debatable):

  • one_hot_encoder_wrapper: the existing functionality with include_columns and exclude_columns, and
  • one_hot_encoder which takes only the source_table, source_column, category_values, and handle_unknown params.

To me, the benefits would be:

  • more direct alignment with sklearn.preprocessing.OneHotEncoder and pandas.get_dummies()
  • enable smaller code footprint for adapters that require dispatching
  • better complement existing package functionality such as dbt_utils.star(), and
  • dbt models that leverage one_hot_encoder will look like other dbt models, instead of a single macro call with no SELECT statement.

Example usage

Suppose a table, fruits that:

  • has 3 columns: id, species, and color; and,
  • the color column has two values: orange and yellow

goal compiled SQL

SELECT
id,
species,
is_color_orange,
is_color_yellow
FROM database.fruits

possible uses

SELECT
id,
species,
{{ dbt_ml_preprocessing.one_hot_encoder(ref('fruits'), 'color') }}
FROM {{ ref('fruits') }}

alternatively if one would like to include or exclude certain columns from the source table, they could do so like this

SELECT
{{ dbt_utils.star(from=ref('fruits'), except=['color']) }},
{{ dbt_ml_preprocessing.one_hot_encoder(ref('fruits'), 'color') }}
FROM {{ ref('fruits') }}

include_columns argument breaks one_hot_encoder macro

I have been working with the one_hot_encoder macro on a feature store and I came across a bug when using it.

Bug

The bug I encountered: when the include_columns argument was not '*', one_hot_encoder inserted blanks for the column names in the select statement, and the code would not compile because of the stray comma left behind.

For example, if the code was this:

gender as ( {{ dbt_ml_preprocessing.one_hot_encoder( source_table = ref('my_table'), source_column = 'GENDER', include_columns = ['ID']) }} )

The compiled SQL file would provide an "unexpected ',' error" because the line would look like:

select , case WHEN GENDER = 'MALE' then true... (and the rest of the compiled sql)

When it should be:

select GENDER, case WHEN GENDER = 'MALE' then true... (and the rest of the compiled sql)

I looked at the code for the one_hot_encoder and found the problem in lines 102-104 (in the default__one_hot_encoder macro).

When include_columns is set to a list of columns, each "column" in "for column in col_list" is already a plain string taken from that list. For example, if include_columns = ['ID'], column is the string 'ID', so column.name evaluates 'ID'.name; strings have no name attribute, and Jinja renders the undefined attribute as an empty string instead of raising an error.

It works fine when include_columns = "*" because then "column" is a Column object such as SnowflakeColumn(column='name', dtype='VARCHAR', char_size=16777216, numeric_precision=None, numeric_scale=None), so column.name returns the actual name.
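The mismatch can be reproduced outside dbt. Using SimpleNamespace as a rough stand-in for dbt's Column objects (an illustration, not dbt's actual classes):

```python
from types import SimpleNamespace

# Stand-in for the Column objects col_list holds when include_columns='*'
col_objects = [SimpleNamespace(name="ID"), SimpleNamespace(name="GENDER")]
# What col_list holds when include_columns=['ID']: plain strings
col_strings = ["ID"]

# `column.name` works on the objects...
print([c.name for c in col_objects])  # ['ID', 'GENDER']

# ...but a string has no `.name`; Jinja renders the missing attribute as
# an empty string (mimicked here with getattr's default), yielding `select , ...`
print([getattr(c, "name", "") for c in col_strings])  # ['']
```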

Quick Solution

I have come up with a quick solution to this bug, and it is probably not the most elegant or robust. In the original code, under line 55, I added a new variable called 'flag' and set it to 'not_inc', like this: {% set flag = 'not_inc' %}. This variable tells the macro whether the include_columns argument is "*" or not. Then under line 60 of the original code, I set flag = 'inc', because reaching that branch means include_columns was set by the programmer: {% set flag = 'inc' %}.

Then on line 70 of the original code, I passed that variable through the adapter.dispatch call, so the new line was:

{{ adapter.dispatch('one_hot_encoder',packages=['dbt_ml_preprocessing'])(source_table, source_column, category_values, handle_unknown, col_list, flag) }}

as opposed to

{{ adapter.dispatch('one_hot_encoder',packages=['dbt_ml_preprocessing'])(source_table, source_column, category_values, handle_unknown, col_list) }}

Finally, for the default__one_hot_encoder macro on line 73 of the original code, I changed the logic up a bit to catch the instances where the include_columns argument was not "*". I replaced lines 76-78 of the original code with:

{%- if flag == 'not_inc' -%}
    {% for column in col_list %}
        {{ column.name }},
    {%- endfor -%}
{%- elif flag == 'inc' -%}
    {% for column in col_list %}
        {{ column }},
    {%- endfor -%}
{%- endif -%}

This logic makes sure that if include_columns is something like ['ID'], the macro does not call column.name on the string 'ID', but takes the column name directly from column. Otherwise, when include_columns="*", it proceeds as before.

This solution may not be robust, so I was hoping to find out why this bug occurs and maybe find a better fix for it.
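One simpler alternative to the flag approach would be to normalize col_list to plain names once, before dispatching, so the inner macro only ever sees one shape. In Python terms (the Jinja equivalent would branch on whether each entry is a string; SimpleNamespace again stands in for dbt's Column objects):

```python
from types import SimpleNamespace

def to_names(col_list):
    """Normalize a mixed list of Column-like objects and strings to plain names."""
    return [c if isinstance(c, str) else c.name for c in col_list]

print(to_names(["ID"]))                              # ['ID']
print(to_names([SimpleNamespace(name="ID"),
                SimpleNamespace(name="GENDER")]))    # ['ID', 'GENDER']
```

This avoids threading an extra flag parameter through every adapter's implementation of the macro, which matters for a package that dispatches per warehouse.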
