

Just (Ad)D: Adding Advertisements on the Fly

A Distributed Streaming Data Pipeline


Table of Contents

  1. Introduction
  2. Problem Statement
  3. Data Pipeline
  4. Workflow
  5. Data Source
  6. Repo Structure
  7. Slides

Introduction

Just (Ad)D is a distributed streaming data pipeline that analyzes ad performance in real time, so that well-performing ads can potentially be placed on websites with high user traffic.

Problem Statement

Advertising is all about gaining user attention: targeting the right consumers, at the right times, through the right channels. Optimizing for user attention in order to improve conversion rates is therefore crucial for advertisers.

Data Pipeline


For ease of deployment, and to avoid tedious network configuration across individual EC2 instances, this pipeline uses managed services: Confluent Cloud for the Kafka cluster and AWS EMR for the Spark cluster.
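Because Kafka is managed by Confluent Cloud, the producers and Spark jobs only need client connection settings rather than broker networking. The repository's `ccloud_lib.py` presumably loads these settings, but its contents are not shown here, so the following is a hypothetical sketch: it parses a `key=value` client configuration file (the format used in Confluent's Python examples) into a dict. The property names are standard librdkafka settings; the values are placeholders.

```python
from typing import Dict


def parse_client_config(text: str) -> Dict[str, str]:
    """Parse key=value lines into a dict, skipping blank lines and '#' comments.

    Only the first '=' splits key from value, so values may contain '='.
    """
    conf: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, value = line.split("=", 1)
        conf[key.strip()] = value.strip()
    return conf


# Hypothetical Confluent Cloud client settings (placeholder values).
SAMPLE = """
# Confluent Cloud cluster
bootstrap.servers=<BOOTSTRAP_SERVER>:9092
security.protocol=SASL_SSL
sasl.mechanisms=PLAIN
sasl.username=<API_KEY>
sasl.password=<API_SECRET>
"""

if __name__ == "__main__":
    conf = parse_client_config(SAMPLE)
    print(conf["security.protocol"])  # SASL_SSL
```

With the `confluent-kafka` package installed, a dict like this can be passed directly to `confluent_kafka.Producer(conf)`; keeping credentials in a file rather than in code also makes it easy to reuse the same settings for the page-view and click producers.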

Workflow

  • The sample dataset is stored on an EC2 instance.
  • On the EC2 instance, simulated messages are produced to the page-view and click-event topics in Confluent Cloud, which provides a Kafka cluster as a service.
  • Spark consumes these messages and processes them as a stream.
  • Two main calculations:
    Counting the number of clicks/views for each advertisement within an event-time window
    Finding the top 3 websites with a user surge, along with the platform (Mobile/Tablet/Desktop) and traffic source (Internal/Social/Search) that traffic comes from
  • Windowing and watermarking are used to handle late or out-of-order data.
  • The output of each stream is stored in a MySQL database along with a timestamp.
  • The continuously updated database is queried and visualized on a live dashboard built with Plotly Dash.
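The two calculations and the watermark behaviour described above can be sketched in plain Python. This is not the repository's Spark code (`kafspar2.py` and `pvkafspar2.py` are not shown here); it is a simplified single-process model under these assumptions: events are handled one at a time in arrival order, windows are 60-second tumbling windows, the watermark trails the latest event time seen by 30 seconds, and the `AdEvent` and `PageView` record layouts are hypothetical. In Spark Structured Streaming the same idea is usually expressed with `withWatermark` and the `window` function on an event-time column, where the watermark advances per micro-batch rather than per record.

```python
from collections import Counter
from typing import Dict, Iterable, List, NamedTuple, Tuple


class AdEvent(NamedTuple):
    ad_id: int
    event_time: int  # hypothetical: event time in seconds


class PageView(NamedTuple):
    site: str      # hypothetical website identifier
    platform: str  # Mobile / Tablet / Desktop
    source: str    # Internal / Social / Search


def windowed_counts(events: Iterable[AdEvent], window: int = 60, delay: int = 30
                    ) -> Tuple[Dict[Tuple[int, int], int], List[AdEvent]]:
    """Count events per (ad_id, window_start) in tumbling event-time windows.

    The watermark is the largest event time seen so far minus `delay`.
    An event whose window ends at or before the watermark is too late and is
    dropped; out-of-order events still within the tolerance are counted.
    """
    counts: Counter = Counter()
    dropped: List[AdEvent] = []
    max_seen = None
    for ev in events:  # processed in arrival order
        start = ev.event_time - ev.event_time % window
        if max_seen is not None and start + window <= max_seen - delay:
            dropped.append(ev)
            continue
        counts[(ev.ad_id, start)] += 1
        max_seen = ev.event_time if max_seen is None else max(max_seen, ev.event_time)
    return dict(counts), dropped


def top_sites(views: Iterable[PageView], k: int = 3
              ) -> List[Tuple[Tuple[str, str, str], int]]:
    """Return the k most-viewed (site, platform, source) combinations."""
    return Counter((v.site, v.platform, v.source) for v in views).most_common(k)


# Made-up arrivals: (2, 100) is out of order but within tolerance; (1, 20) is too late.
ARRIVALS = [AdEvent(1, 0), AdEvent(1, 10), AdEvent(2, 65),
            AdEvent(1, 130), AdEvent(2, 100), AdEvent(1, 20)]

VIEWS = ([PageView("a", "Mobile", "Social")] * 3
         + [PageView("b", "Desktop", "Search")] * 2
         + [PageView("c", "Tablet", "Internal")] * 1
         + [PageView("d", "Mobile", "Search")] * 4)

if __name__ == "__main__":
    counts, late = windowed_counts(ARRIVALS)
    print(counts)  # {(1, 0): 2, (2, 60): 2, (1, 120): 1}
    print(late)    # [AdEvent(ad_id=1, event_time=20)]
    print(top_sites(VIEWS))
```

The watermark trades completeness for bounded state: a larger `delay` accepts more late data but keeps old windows open longer before they can be finalized and written to the database.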

Data Source

A subset of the Outbrain Click Prediction dataset from Kaggle.
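To produce the Mobile/Tablet/Desktop and Internal/Social/Search labels used above, the numeric codes in the page-view data have to be decoded. The sketch below assumes the column layout and code mapping given in the Kaggle data description (`platform`: 1 = desktop, 2 = mobile, 3 = tablet; `traffic_source`: 1 = internal, 2 = search, 3 = social); the sample rows are made up, and unknown codes map to `"Unknown"`.

```python
import csv
import io
from typing import Dict, List

# Assumed code mappings from the Kaggle Outbrain data description.
PLATFORM = {"1": "Desktop", "2": "Mobile", "3": "Tablet"}
TRAFFIC_SOURCE = {"1": "Internal", "2": "Search", "3": "Social"}


def parse_page_views(text: str) -> List[Dict[str, str]]:
    """Read page-view CSV text and replace numeric codes with readable labels."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        rows.append({
            "uuid": row["uuid"],
            "document_id": row["document_id"],
            "timestamp": row["timestamp"],
            "platform": PLATFORM.get(row["platform"], "Unknown"),
            "traffic_source": TRAFFIC_SOURCE.get(row["traffic_source"], "Unknown"),
        })
    return rows


# Made-up sample rows in the assumed column layout.
SAMPLE_CSV = """uuid,document_id,timestamp,platform,geo_location,traffic_source
u1,120,1000,1,US,2
u2,120,2000,2,GB,1
u3,121,3000,3,DE,3
"""

if __name__ == "__main__":
    for r in parse_page_views(SAMPLE_CSV):
        print(r["uuid"], r["platform"], r["traffic_source"])
```

Decoding at ingestion time keeps the downstream aggregation and the dashboard working with human-readable categories.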

Repo Structure

Just-Ad-D/
├── dash_frontend
│   └── app.py
├── kafka_clicks.sh
├── kafka_ingestion
│   ├── ccloud_lib.py
│   ├── producer.py
│   └── pvproducer.py
├── kafka_pv.sh
├── LICENSE
├── README.md
├── sparkjob_clicks.sh
├── sparkjob_pv.sh
├── spark_processing
│   ├── kafspar2.py
│   └── pvkafspar2.py


Contributors

chaitanyaa
