ETL-workflow

ETL workflow and data analysis. The ETL workflow uses Prefect and pygrametl (SCD, slowly changing dimension). Products are classified based on the product name.

How-to guide:

Run the SQL script to create the database, or create a database named product_sales manually:

CREATE DATABASE IF NOT EXISTS product_sales ;

The code connects with user root and password root123. Update the credentials in the code to match your own setup.
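One way to avoid editing hard-coded credentials is to read them from environment variables. The following is a minimal sketch, not the project's actual code: the function name `get_db_config`, the `ETL_DB_*` variable names, and the `localhost` host are all assumptions made for illustration, while the other defaults mirror the values mentioned above.

```python
import os


def get_db_config(env=None):
    """Return connection settings, falling back to the README defaults.

    `env` is any mapping of environment variables; it defaults to os.environ.
    All ETL_DB_* names are hypothetical, chosen for this sketch only.
    """
    env = os.environ if env is None else env
    return {
        "host": env.get("ETL_DB_HOST", "localhost"),  # assumed host
        "user": env.get("ETL_DB_USER", "root"),
        "password": env.get("ETL_DB_PASSWORD", "root123"),
        "database": env.get("ETL_DB_NAME", "product_sales"),
    }


if __name__ == "__main__":
    config = get_db_config()
    # Show everything except the password
    print({key: value for key, value in config.items() if key != "password"})
```

The resulting dictionary could then be passed to whichever database driver the scripts use.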

All the tables are created from the input CSV file. The CSV file can be found in the data directory.
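To illustrate how tables of a star schema can be derived from a single flat CSV file, here is a library-free sketch. It is not the project's implementation: the column names (`product`, `category`, `amount`), the sample rows, and the helper names `build_dimension` and `build_fact` are hypothetical.

```python
import csv
import io


def build_dimension(rows, column):
    """Map each distinct value in `column` to a surrogate key (1, 2, 3, ...)."""
    keys = {}
    for row in rows:
        value = row[column]
        if value not in keys:
            keys[value] = len(keys) + 1
    return keys


def build_fact(rows, column, dimension, key_name):
    """Replace the text value in `column` with its surrogate key."""
    fact = []
    for row in rows:
        new_row = {k: v for k, v in row.items() if k != column}
        new_row[key_name] = dimension[row[column]]
        fact.append(new_row)
    return fact


# Hypothetical sample data; the real input is the CSV file in ./data
SAMPLE = """product,category,amount
Blue Jeans,Trousers,40
Red Shirt,Shirts,25
Black Jeans,Trousers,45
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
category_dim = build_dimension(rows, "category")
fact_rows = build_fact(rows, "category", category_dim, "category_id")

print(category_dim)
print(fact_rows[0])
```

Each distinct category becomes one dimension row, and the fact rows keep only the numeric key, which is the basic shape of a star schema.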

Run this Program

  1. ETL-workflow.py

    Dependency: the input file ./data/Clothing_Sales_Data_Unique_category.csv and the SQL database must exist.

    Output: four distinct CSV files used to stage the star schema, plus one CSV file with the transformed data, written after all transformations have run. The files can be found in ./data/stage/*

  2. Slowly-changing-dimension.py

    Dependency: a SQL connection; the database and table must already exist before you run this script.

    Output: updates one column.

  3. Knn.py

    Dependency: dataframe-utils.py and vectorize.py.

    Output: one CSV file with the result comparison, ./data/results/TFIDF_distinctProduct_result.csv
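The result file name suggests that product names are vectorized with TF-IDF before the k-nearest-neighbours step. Assuming so, the idea can be sketched without any third-party library as follows. This is an illustration only, not the code in Knn.py or vectorize.py: every function name, the training names, and the labels are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    return text.lower().split()


def fit_idf(documents):
    """Compute a smoothed inverse document frequency for each token."""
    n = len(documents)
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(tokenize(doc)))
    return {tok: math.log((1 + n) / (1 + df)) + 1 for tok, df in doc_freq.items()}


def tfidf(text, idf):
    """Return a sparse, L2-normalised TF-IDF vector as a dict."""
    counts = Counter(tok for tok in tokenize(text) if tok in idf)
    vec = {tok: count * idf[tok] for tok, count in counts.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {tok: w / norm for tok, w in vec.items()} if norm else {}


def cosine(a, b):
    # Both vectors are normalised, so the dot product is the cosine similarity
    return sum(w * b.get(tok, 0.0) for tok, w in a.items())


def knn_predict(name, train_names, train_labels, idf, k=3):
    """Predict a category by majority vote among the k most similar names."""
    query = tfidf(name, idf)
    scored = sorted(
        ((cosine(query, tfidf(t, idf)), label)
         for t, label in zip(train_names, train_labels)),
        key=lambda pair: pair[0],
        reverse=True,
    )
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]


# Hypothetical training data
names = ["blue denim jeans", "black slim jeans", "red cotton shirt",
         "white linen shirt", "grey wool socks"]
labels = ["Trousers", "Trousers", "Shirts", "Shirts", "Socks"]
idf = fit_idf(names)

print(knn_predict("green denim jeans", names, labels, idf))
print(knn_predict("striped cotton shirt", names, labels, idf))
```

Tokens that never appear in the training names (such as "green") are ignored, so a prediction relies only on words the model has seen.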

References:

1. Dataset Related information https://demos.componentone.com/aspnet/adventureworks/Products.aspx

2. Prefect Library https://docs.prefect.io/core/tutorials/etl.html

3. Pygrametl  https://chrthomsen.github.io/pygrametl/doc/examples/dimensions.html

4. Classification https://github.com/gallib2/product-categorization

etl-workflow's People

Contributors

rajrohan
