
kafka_wikipedia_spark_processing's Introduction

Kafka + Spark Structured Streaming + ElasticSearch (Kibana)

This project modifies Confluent's Wikipedia demo repository, replacing the stream processing engine (KSQL and Kafka Streams) with Spark Structured Streaming.

Old Architecture
Old architecture, using KStreams and K-SQL to process data. Image from the Confluent repo.
New Architecture
New architecture, using PySpark to process data (ElasticSearch and Kibana are still there if you want to use them).

Overview

I was learning Kafka from Confluent's videos and tutorials, and while digging through them I came across their Wikipedia repository.

The Wikipedia repository brings up a Dockerized Confluent Kafka infrastructure (shown in the old architecture image). Kafka Connect sends Wikipedia data to topics, K-SQL and a KStreams app process it in real time, and the transformed data is written to other topics, which are consumed by the ElasticSearch Sink Connector.

When I saw this I thought: "What if I transformed all this data with PySpark Structured Streaming instead of KStreams and K-SQL? That looks like a nice home project!" And that's what this repository is about.
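
To give a feel for the idea, here is a minimal sketch of how a bot / non-bot split (as the BotApp and NoBotApp names suggest) might look in PySpark Structured Streaming. It is not the code from this repository. The broker address, the topic name `wikipedia.parsed`, the JSON encoding, and the `domain`, `user` and `bot` fields are all assumptions made for illustration; the real topics may use a different serialization (for example Avro) and security settings that need extra options.

```python
def kafka_source_options(bootstrap_servers, topic):
    """Build the options for Spark's built-in Kafka source."""
    return {
        "kafka.bootstrap.servers": bootstrap_servers,
        "subscribe": topic,
        "startingOffsets": "earliest",
    }


def split_bot_edits(spark, bootstrap_servers="localhost:9092",
                    topic="wikipedia.parsed"):
    """Return (bot_edits, human_edits) as two streaming DataFrames.

    `spark` is an existing SparkSession started with the
    spark-sql-kafka package on its classpath.
    """
    # Imported here so the helper above works without Spark installed.
    from pyspark.sql.functions import col, from_json
    from pyspark.sql.types import (BooleanType, StringType,
                                   StructField, StructType)

    # Hypothetical schema: only the fields this sketch needs.
    schema = StructType([
        StructField("domain", StringType()),
        StructField("user", StringType()),
        StructField("bot", BooleanType()),
    ])

    raw = (spark.readStream
           .format("kafka")
           .options(**kafka_source_options(bootstrap_servers, topic))
           .load())

    # Kafka values arrive as bytes; assume they hold JSON text.
    edits = (raw
             .select(from_json(col("value").cast("string"), schema)
                     .alias("edit"))
             .select("edit.*"))

    return edits.filter(col("bot")), edits.filter(~col("bot"))
```

Each returned DataFrame could then be written back to its own Kafka topic with `writeStream.format("kafka")`, which needs a `value` column, a `topic` option and a `checkpointLocation`.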

UIs:

After you start the Docker containers with ./scripts/start.sh, some UIs will be available:

  • localhost:9091 : Confluent Control Center (user: superUser, pwd: superUser);
  • localhost:5601 : Kibana Dashboards;
  • localhost:4040 : Spark Server Web UI (BotApp);
  • localhost:4041 : Spark Server Web UI (NoBotApp);
  • localhost:4042 : Spark Server Web UI (DomainCountApp);
  • localhost:4043 : Spark Server Web UI (CountGT1App).
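
If you want a quick way to see which of these UIs have come up, a small helper like the one below can poll the ports. It is only a sketch: the ports are copied from the list above, while the plain `http` scheme, the `localhost` host and the function names are assumptions for illustration.

```python
import urllib.error
import urllib.request

# Ports copied from the list above; the keys are just display labels.
UI_PORTS = {
    "Confluent": 9091,
    "Kibana": 5601,
    "Spark UI (BotApp)": 4040,
    "Spark UI (NoBotApp)": 4041,
    "Spark UI (DomainCountApp)": 4042,
    "Spark UI (CountGT1App)": 4043,
}


def ui_url(port, host="localhost"):
    """Build the URL for a UI, assuming plain HTTP."""
    return f"http://{host}:{port}"


def is_up(url, timeout=2.0):
    """Return True if something answers HTTP requests at `url`."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server replied, even if with an error status.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, and so on.
        return False


def report():
    """Print one line per UI saying whether it is reachable."""
    for name, port in UI_PORTS.items():
        url = ui_url(port)
        status = "up" if is_up(url) else "down"
        print(f"{name:28s} {url:24s} {status}")
```

Calling `report()` a minute or two after `./scripts/start.sh` finishes should show which containers are already serving their UI.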

