
adityakonda / spark-app-dailyproductrevenue


This application calculates daily product revenue in Spark and MySQL, listing dates in ascending order and, within each date, revenue in descending order. It also demonstrates how to reduce the number of stages and tasks in a Spark job by using broadcast variables.

Scala 20.35% CSS 12.01% XSLT 57.68% Java 9.96%
spark scala mysql java multithreading


Daily Product Revenue Application

DailyProductRevenue in MySQL & Spark


Retail Database Schema

MySQL - DailyProductRevenue


SELECT o.order_date,
       SUM(oi.order_item_subtotal) AS daily_revenue,
       p.product_name
FROM retail_db.order_items oi
  JOIN retail_db.orders o ON oi.order_item_order_id = o.order_id
  JOIN retail_db.products p ON p.product_id = oi.order_item_product_id
GROUP BY o.order_date, p.product_name
ORDER BY o.order_date, daily_revenue DESC;
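To make the query's semantics concrete (inner joins, grouping by date and product name, then date ascending and revenue descending), here is a minimal plain-Scala sketch over in-memory rows. It does not use MySQL or Spark. The row case classes (OrderRow, OrderItemRow, ProductRow) and the DailyRevenueQuery object are hypothetical names for illustration, and modelling amounts as BigDecimal is an assumption.

```scala
// Hypothetical row types standing in for the retail_db tables (illustration only).
final case class OrderRow(orderId: Int, orderDate: String)
final case class OrderItemRow(orderId: Int, productId: Int, subtotal: BigDecimal)
final case class ProductRow(productId: Int, productName: String)

object DailyRevenueQuery {
  /** Returns (order_date, daily_revenue, product_name) rows, mirroring the SQL query. */
  def dailyRevenue(
      orders: Seq[OrderRow],
      items: Seq[OrderItemRow],
      products: Seq[ProductRow]
  ): Seq[(String, BigDecimal, String)] = {
    // order_id and product_id are primary keys, so a Map lookup is enough for each JOIN.
    val dateByOrderId = orders.map(o => o.orderId -> o.orderDate).toMap
    val nameByProductId = products.map(p => p.productId -> p.productName).toMap

    // Inner joins: an item survives only if both its order and its product exist.
    val joined = for {
      item <- items
      date <- dateByOrderId.get(item.orderId)
      name <- nameByProductId.get(item.productId)
    } yield ((date, name), item.subtotal)

    // GROUP BY order_date, product_name with SUM(order_item_subtotal).
    val grouped = joined
      .groupBy { case (key, _) => key }
      .toSeq
      .map { case ((date, name), rows) => (date, rows.map(_._2).sum, name) }

    // ORDER BY order_date ASC, daily_revenue DESC (ISO-style date strings sort correctly as text).
    grouped.sortBy { case (date, revenue, _) => (date, -revenue) }
  }
}
```

Sorting on the composite key (date, -revenue) reproduces the two-column ORDER BY in a single pass.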

Spark - DailyProductRevenue Scala Application Link:

Spark - DailyProductRevenue - Joining Data without using Broadcast Variables

  • Stage 0: Reading orders data from HDFS and converting it into (K,V) --> orderMap --> (orderID, orderDate)
  • Stage 1: Reading order_items data from HDFS and converting it into (K,V) --> orderItemMap --> (orderID, (productID, order_itemSubTotal))
  • Stage 2: Joining orderMap & orderItemMap --> ordersJoin(K,V) --> (orderID, (orderDate, (productID, order_itemSubTotal))) and converting it into (K,V) --> orderJoinMap(K,V) --> ((orderDate, productID), order_itemSubTotal)
  • Stage 3: Grouping orderJoinMap(K,V) and aggregating the product revenue --> dailyRevenuePerProductID(K,V) --> ((orderDate, productID), sum(order_itemSubTotal)) and converting it into (K,V) --> dailyRevenuePerProductIDMap(K,V) --> (productID, (orderDate, sum(order_itemSubTotal)))
  • Stage 4: Reading product data from the local file system and converting it into (K,V) --> productRDDMap(K,V) --> (productID, productName)
  • Stage 5: Joining dailyRevenuePerProductIDMap(K,V) & productRDDMap(K,V) --> dailyRevenuePerProductNameLocal(K,V) --> ((orderDate, productName), sum(order_itemSubTotal))
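The data flow of these six stages can be modelled with plain Scala collections, where each Seq[(K, V)] stands in for a pair RDD. The join and reduceByKey helpers below are hypothetical single-machine stand-ins that mimic the results of Spark's PairRDDFunctions.join and reduceByKey; they do not model partitions or shuffles. Variable names follow the stage descriptions, while the ShuffleJoinModel object and the input types are assumptions.

```scala
object ShuffleJoinModel {
  // Hypothetical stand-in for PairRDDFunctions.join: an inner join that emits
  // every (left, right) value pair sharing a key.
  def join[K, A, B](left: Seq[(K, A)], right: Seq[(K, B)]): Seq[(K, (A, B))] = {
    val rightByKey = right.groupBy(_._1)
    for {
      (k, a) <- left
      (_, b) <- rightByKey.getOrElse(k, Seq.empty[(K, B)])
    } yield (k, (a, b))
  }

  // Hypothetical stand-in for reduceByKey: combines all values sharing a key with f.
  def reduceByKey[K, V](pairs: Seq[(K, V)])(f: (V, V) => V): Seq[(K, V)] =
    pairs.groupBy(_._1).toSeq.map { case (k, kvs) => (k, kvs.map(_._2).reduce(f)) }

  def dailyRevenuePerProductNameLocal(
      orderMap: Seq[(Int, String)],                 // (orderID, orderDate)
      orderItemMap: Seq[(Int, (Int, BigDecimal))], // (orderID, (productID, order_itemSubTotal))
      productRDDMap: Seq[(Int, String)]             // (productID, productName)
  ): Seq[((String, String), BigDecimal)] = {
    // Stage 2: first join, then re-key to ((orderDate, productID), order_itemSubTotal).
    val ordersJoin = join(orderMap, orderItemMap)
    val orderJoinMap = ordersJoin.map {
      case (_, (date, (productId, subTotal))) => ((date, productId), subTotal)
    }

    // Stage 3: aggregate revenue, then re-key by productID for the second join.
    val dailyRevenuePerProductID = reduceByKey(orderJoinMap)(_ + _)
    val dailyRevenuePerProductIDMap = dailyRevenuePerProductID.map {
      case ((date, productId), revenue) => (productId, (date, revenue))
    }

    // Stages 4-5: second join against the products data.
    join(dailyRevenuePerProductIDMap, productRDDMap).map {
      case (_, ((date, revenue), productName)) => ((date, productName), revenue)
    }
  }
}
```

The second join is the part that requires re-keying by productID and joining against a separately read products dataset, which corresponds to the extra stages 4 and 5 listed above. The output order is unspecified, as with an unsorted RDD.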

Spark - DailyProductRevenue - Joining Data using Broadcast Variables

  • Stage 0: Reading orders data from HDFS and converting it into (K,V) --> orderMap --> (orderID, orderDate)
  • Stage 1: Reading order_items data from HDFS and converting it into (K,V) --> orderItemMap --> (orderID, (productID, order_itemSubTotal))
  • Stage 2: Joining orderMap & orderItemMap --> ordersJoin(K,V) --> (orderID, (orderDate, (productID, order_itemSubTotal))) and converting it into (K,V) --> orderJoinMap(K,V) --> ((orderDate, productID), order_itemSubTotal)
  • Stage 3: Grouping orderJoinMap(K,V) and aggregating the product revenue --> dailyRevenuePerProductID(K,V) --> ((orderDate, productID), sum(order_itemSubTotal)) and converting it into (K,V) by looking up each productID in the broadcast product variable --> dailyRevenuePerProductName(K,V) --> ((orderDate, sum(order_itemSubTotal)), productName)
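With a broadcast variable, the second join disappears. The small products table is shipped to every executor once and consulted inside a map. In Spark this would typically take the form of `val products = sc.broadcast(productMap)` on the driver and `products.value.get(productId)` inside the closure (SparkContext.broadcast and Broadcast.value are Spark APIs). The sketch below keeps that logic but uses a plain Map in place of the broadcast value so it runs without Spark. The BroadcastJoinModel object and the productLookup parameter are hypothetical, and the final sort on (date, -revenue) is one assumed way to get date ascending and revenue descending, not code taken from the repository.

```scala
object BroadcastJoinModel {
  // Hypothetical stand-in for PairRDDFunctions.join: an inner join that emits
  // every (left, right) value pair sharing a key.
  def join[K, A, B](left: Seq[(K, A)], right: Seq[(K, B)]): Seq[(K, (A, B))] = {
    val rightByKey = right.groupBy(_._1)
    for {
      (k, a) <- left
      (_, b) <- rightByKey.getOrElse(k, Seq.empty[(K, B)])
    } yield (k, (a, b))
  }

  def dailyRevenuePerProductName(
      orderMap: Seq[(Int, String)],                 // (orderID, orderDate)
      orderItemMap: Seq[(Int, (Int, BigDecimal))], // (orderID, (productID, order_itemSubTotal))
      productLookup: Map[Int, String]               // stands in for the broadcast value (hypothetical)
  ): Seq[((String, BigDecimal), String)] = {
    // Stage 2: the orders/order_items join is unchanged.
    val orderJoinMap = join(orderMap, orderItemMap).map {
      case (_, (date, (productId, subTotal))) => ((date, productId), subTotal)
    }

    // Stage 3: aggregate revenue per (orderDate, productID) ...
    val dailyRevenuePerProductID = orderJoinMap.groupBy(_._1).toSeq.map {
      case (key, kvs) => (key, kvs.map(_._2).sum)
    }

    // ... and resolve product names with a local lookup instead of a second join.
    // Unknown productIDs are dropped, matching inner-join semantics.
    val dailyRevenuePerProductName = dailyRevenuePerProductID.flatMap {
      case ((date, productId), revenue) =>
        productLookup.get(productId).map(name => ((date, revenue), name))
    }

    // Assumed final ordering: orderDate ascending, revenue descending.
    dailyRevenuePerProductName.sortBy { case ((date, revenue), _) => (date, -revenue) }
  }
}
```

This only pays off when the lookup table is small enough to hold in memory on every executor, which is the situation a products table is in here.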
