
openmrs-etl's Introduction

Openmrs Real-time Streaming Topology

[DEPRECATED] Note that this repo has been moved to https://github.com/kimaina/openmrs-elt

  • The goal of this project is to process data in real time from various sources such as OpenMRS, EID, etc.

Requirements

Make sure you have the latest versions of Docker and Docker Compose:

  1. Install Docker.
  2. Install Docker Compose.
  3. Clone this repository.

Getting started

You only need to run 3 commands to get the entire cluster running. Open your terminal and run:

# this will start 7 containers (mysql, kafka, connect (dbz), openmrs, zookeeper, portainer and cAdvisor)
# cd /media/sf_akimaina/openmrs-etl
export DEBEZIUM_VERSION=0.8
docker-compose -f docker-compose.yaml up

# Start MySQL connector (VERY IMPORTANT)
curl -i -X POST -H "Accept:application/json" -H  "Content-Type:application/json" http://localhost:8083/connectors/ -d @register-mysql.json
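
The curl command above posts register-mysql.json to the Kafka Connect REST API. The contents of that file are not shown in this README, so the following Python sketch only illustrates what such a registration could look like. The connector name, the database host, the credentials and the server id are assumptions; the server name dbserver1, the openmrs database and the schema-changes.openmrs topic are taken from the commands further down in this document.

```python
import json
import urllib.request

# Endpoint used by the curl command above.
CONNECT_URL = "http://localhost:8083/connectors/"


def build_connector_payload(name="openmrs-connector",  # hypothetical connector name
                            db_host="mysql",           # assumed compose service name
                            db_user="debezium",        # assumed credentials
                            db_password="dbz"):
    """Return a Debezium MySQL connector registration payload as a dict."""
    return {
        "name": name,
        "config": {
            "connector.class": "io.debezium.connector.mysql.MySqlConnector",
            "tasks.max": "1",
            "database.hostname": db_host,
            "database.port": "3306",
            "database.user": db_user,
            "database.password": db_password,
            "database.server.id": "184054",           # assumed; any unique id works
            "database.server.name": "dbserver1",      # becomes the topic prefix
            "database.whitelist": "openmrs",
            "database.history.kafka.bootstrap.servers": "kafka:9092",
            "database.history.kafka.topic": "schema-changes.openmrs",
        },
    }


def register_connector(payload, url=CONNECT_URL):
    """POST the payload to Kafka Connect and return the HTTP status code."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Accept": "application/json", "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status
```

Calling register_connector(build_connector_payload()) once the connect container is up would be roughly equivalent to the curl call, provided the assumed values match your register-mysql.json.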


# Realtime streaming and processing
Use Spark (Scala), PySpark or KSQL for stream processing. This project demonstrates KSQL.

To avoid container crashes (exit code 137), increase the memory and CPU of your Docker VM to more than 8 GB and more than 4 cores.
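
For context on the number 137: exit codes above 128 conventionally mean the process was terminated by signal number (code - 128), so 137 corresponds to signal 9 (SIGKILL). Under Docker this is commonly the result of the container running out of memory. The helper below is a small illustrative sketch (the function name is hypothetical) and assumes a Linux or macOS host.

```python
import signal


def describe_exit_code(code):
    """Translate a container exit code into a short human-readable reason."""
    if code == 0:
        return "exited normally"
    if code > 128:
        # Exit codes above 128 conventionally mean "terminated by signal (code - 128)".
        number = code - 128
        try:
            return "killed by " + signal.Signals(number).name
        except ValueError:
            return "killed by unknown signal %d" % number
    return "exited with error code %d" % code


print(describe_exit_code(137))
```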

If everything runs as expected, all of the containers listed above should be running. You can view them in Portainer at http://localhost:9000

Openmrs

The OpenMRS application will eventually be accessible at http://localhost:8080/openmrs. Credentials for the shipped demo data:

  • Username: admin
  • Password: Admin123
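
Because OpenMRS only becomes reachable some time after the containers start, a script that depends on it may want to wait. The following is a minimal polling sketch, not part of this repository; the function names, the timeout and the interval are illustrative assumptions.

```python
import time
import urllib.error
import urllib.request


def wait_until(check, timeout=600.0, interval=5.0, sleep=time.sleep, clock=time.monotonic):
    """Call check() until it returns True or `timeout` seconds have elapsed."""
    deadline = clock() + timeout
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


def openmrs_is_up(url="http://localhost:8080/openmrs"):
    """True once the OpenMRS URL answers with a successful HTTP response."""
    try:
        with urllib.request.urlopen(url, timeout=5) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, OSError):
        # Connection refused, timeouts and HTTP errors all count as "not up yet".
        return False
```

A caller could run wait_until(openmrs_is_up) before opening the login page or starting a job that reads from OpenMRS.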

Example Batch using Jupyter Notebook (Spark Standalone Mode)

conda install pyspark=2.4.5

jupyter notebook encounter_job.ipynb 

Spark Master and Worker Nodes


Based on: https://github.com/big-data-europe/docker-spark/blob/master/README.md

For Spark on Kubernetes deployment: https://github.com/big-data-europe/docker-spark/blob/master/README.md

Docker Container Manager: Portainer

http://localhost:9000

MySQL client

docker-compose -f docker-compose.yaml exec mysql bash -c 'mysql -u $MYSQL_USER -p$MYSQL_PASSWORD inventory'

Schema Changes Topic

docker-compose -f docker-compose.yaml exec kafka /kafka/bin/kafka-console-consumer.sh \
    --bootstrap-server kafka:9092 \
    --from-beginning \
    --property print.key=true \
    --topic schema-changes.openmrs

How to Verify MySQL connector (Debezium)

curl -H "Accept:application/json" localhost:8083/connectors/
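
The command above only lists the registered connectors. Kafka Connect also exposes a per-connector status endpoint, /connectors/<name>/status, which reports the state of the connector and of each task. The sketch below checks that everything is RUNNING; the connector name you pass in must be the one returned by the list call, and the function names are illustrative.

```python
import json
import urllib.request


def connector_is_running(status):
    """True when the connector and every one of its tasks report RUNNING."""
    connector_ok = status.get("connector", {}).get("state") == "RUNNING"
    tasks = status.get("tasks", [])
    # A connector without any task is not streaming changes yet.
    return connector_ok and bool(tasks) and all(t.get("state") == "RUNNING" for t in tasks)


def fetch_status(name, base_url="http://localhost:8083"):
    """GET /connectors/<name>/status from the Kafka Connect REST API."""
    url = "%s/connectors/%s/status" % (base_url, name)
    with urllib.request.urlopen(url) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, connector_is_running(fetch_status(name)) returns False if a task has failed, which is a common reason for topics staying empty.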

Shut down the cluster

docker-compose -f docker-compose.yaml down

Debezium Topics

[Figure: Debezium topics]

Consume messages from a Debezium topic [obs, encounter, person, etc.]

  • All you have to do is change the topic to --topic dbserver1.openmrs. followed by the table name (for example obs, encounter or person):
   docker-compose -f docker-compose.yaml exec kafka /kafka/bin/kafka-console-consumer.sh \
    --bootstrap-server kafka:9092 \
    --from-beginning \
    --property print.key=true \
    --topic dbserver1.openmrs.obs
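
Each message on these topics is a Debezium change event whose envelope carries the row state before and after the change plus an op code. The sketch below shows one way to interpret such a message in Python. The sample event is hand-written for illustration (the column values are made up), and the helper names are hypothetical.

```python
import json

# <server name>.<database>, as used in the command above.
TOPIC_PREFIX = "dbserver1.openmrs"

OPERATIONS = {"c": "insert", "u": "update", "d": "delete"}


def topic_for(table):
    """Return the Debezium topic name for an OpenMRS table."""
    return "%s.%s" % (TOPIC_PREFIX, table)


def summarize_change(raw_value):
    """Return (operation, row) for one message value read from a Debezium topic."""
    event = json.loads(raw_value)
    if event is None:
        # A null value is a tombstone record, which can follow a delete.
        return "tombstone", None
    # When the JSON converter includes schemas, the envelope sits under "payload".
    envelope = event.get("payload", event)
    op = envelope.get("op")
    # Deletes carry the old row in "before"; inserts and updates carry the new row in "after".
    row = envelope.get("before") if op == "d" else envelope.get("after")
    return OPERATIONS.get(op, op), row


# Illustrative event only; real events also contain "source" metadata and a schema.
sample = json.dumps({
    "payload": {
        "before": None,
        "after": {"obs_id": 1, "person_id": 7, "concept_id": 5089, "value_numeric": 61.5},
        "op": "c",
        "ts_ms": 1500000000000,
    }
})
print(topic_for("obs"), summarize_change(sample))
```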

Consume messages using KSQL


Start KSQL CLI

  docker run --network openmrs-etl_default --rm --interactive --tty \
      confluentinc/cp-ksql-cli:5.2.2 \
      http://ksql-server:8088

After running the above command, you will be presented with an interactive KSQL CLI prompt.

Run KSQL Streaming SQL

You can run any KSQL streaming SQL statement described in the tutorials (https://docs.confluent.io/current/ksql/docs/tutorials/index.html). For example:

  SHOW TOPICS;

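
Statements such as SHOW TOPICS; can also be sent to the KSQL server's REST endpoint (/ksql) instead of the interactive CLI. The sketch below builds such a request in Python. It assumes the ksql-server port 8088 is published to the host; inside the compose network the address would be http://ksql-server:8088 as in the CLI command above. The function names are illustrative.

```python
import json
import urllib.request

# Assumption: port 8088 of the ksql-server container is reachable from the host.
KSQL_URL = "http://localhost:8088"


def build_ksql_request(statement, base_url=KSQL_URL):
    """Build a POST request for the KSQL server's /ksql endpoint."""
    statement = statement.rstrip()
    if not statement.endswith(";"):
        statement += ";"  # KSQL statements must be terminated with a semicolon
    body = json.dumps({"ksql": statement, "streamsProperties": {}}).encode("utf-8")
    return urllib.request.Request(
        base_url + "/ksql",
        data=body,
        headers={"Content-Type": "application/vnd.ksql.v1+json; charset=utf-8"},
        method="POST",
    )


def run_statement(statement, base_url=KSQL_URL):
    """Send the statement and return the decoded JSON response."""
    with urllib.request.urlopen(build_ksql_request(statement, base_url)) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server reachable, run_statement("SHOW TOPICS") would return the same information the CLI prints, as a JSON structure.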

For more KSQL streaming commands, please visit https://docs.confluent.io/current/ksql

Cluster Design Architecture

  • This section explains how the cluster works by breaking it down into its components

  • Everything here has been dockerized, so you do not need to perform these steps yourself

Directory Structure

project
│   README.md
│   kafka.md
│   debezium.md
│   spark.md
│   docker-compose.yaml
│
└───template
    │   java
    │   python
    │   scala
    └───subfolder1
        │   file111.txt
        │   file112.txt
        │   ...

Writing batch/streaming jobs

Based on: https://github.com/big-data-europe/docker-spark/blob/master/README.md

Contributors

kimaina
