Team30 Project LeCloud

Run the cluster (play around with Docker Compose)

Note: Run the commands below from the directory containing the docker-compose.yml file.

Bring up the cluster in detached mode:

docker-compose up -d

Stop the cluster:

docker-compose stop

Restart the stopped cluster:

docker-compose start

Remove the containers:

docker-compose rm -f
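
The commands above assume a docker-compose.yml describing the cluster. A hedged sketch of what such a file might contain — service names and ports are inferred from this README (spark-master:7077, Neo4j on 7474), while the images and tags are placeholders, not the project's actual choices:

```
# Illustrative fragment only; check the project's real docker-compose.yml.
version: "3"
services:
  zookeeper:
    image: zookeeper            # hypothetical image
  kafka:
    image: wurstmeister/kafka   # hypothetical image
    depends_on: [zookeeper]
    ports: ["9092:9092"]
  spark-master:                 # name matches the `docker exec spark-master` commands below
    image: some/spark-image     # hypothetical image
    ports: ["7077:7077"]
  neo4j:
    image: neo4j
    ports: ["7474:7474"]
```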

Running Instructions

There are two modes to run the Producer and Consumer routines:

  • Single topic mode
  • Two-topic batch mode

Single-topic mode is the simple case: the producer pushes all data into one Kafka topic ("aminer1"). In two-topic mode, the producer alternates between two topics ("aminer0" and "aminer1"), switching after each batch of the configured size.
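
The alternation can be sketched in plain Python. This is illustrative only — the real logic lives in producer_batch.py; the batch size, the starting topic, and the `send` callable (standing in for a Kafka producer's send call) are assumptions:

```python
# Sketch of two-topic batch alternation (stand-in for producer_batch.py).
BATCH_SIZE = 100                    # assumed parameter; the real value is set in the script
TOPICS = ["aminer0", "aminer1"]     # starting topic assumed to be aminer0

def topic_for_record(record_index):
    """Return the topic for a record: after every BATCH_SIZE records,
    switch to the other topic."""
    batch_number = record_index // BATCH_SIZE
    return TOPICS[batch_number % len(TOPICS)]

def publish(records, send):
    """Push each record to its topic via send(topic, record); in the
    real producer this would be a Kafka client's send call."""
    for i, record in enumerate(records):
        send(topic_for_record(i), record)
```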

Regardless of the run mode, you must first spin up the containers.

A. Bring up the containers from the docker-compose file

docker-compose up
or 
docker-compose up -d

Single Topic Mode

B.i Producer Code (make sure the file project\kafka\data\aminer_papers_0.txt exists):

cd project\kafka
python producer.py

C.i Consumer Code: Run the producer just before the consumer, so that messages are published to the Kafka queue.

  1. Simple Consumer Test: connect to the spark-master container and run
python /opt/spark/code/consumer.py
  2. Spark Streaming Consumer:
docker exec spark-master bin/spark-submit --verbose --packages org.apache.spark:spark-streaming-kafka-0-8_2.11:2.3.1 --master spark://spark-master:7077 /opt/spark/code/consumerSpark.py

Two-Topic Mode

For two-topic mode, copy producer_batch.py from the batch_mode folder to the kafka folder, and copy consumerSpark.py and consumerSpark2.py from the batch_mode folder to the spark/code folder.

B.ii Producer Code (make sure the file project\kafka\data\aminer_papers_0.txt exists):

cd project\kafka
python producer_batch.py

C.ii Consumer Code: Run the producer just before the consumer, so that messages are published to the Kafka queue.

  1. Open two separate terminal shells and, in each, go to the /spark/code folder.

  2. Run Spark Streaming Consumer 1 in one of the terminals:

docker exec spark-master bin/spark-submit --verbose --packages org.apache.spark:spark-streaming-kafka-0-8_2.11:2.3.1 --master spark://spark-master:7077  --executor-memory 1g --num-executors 2 --executor-cores 1 --total-executor-cores 2  /opt/spark/code/consumerSpark.py
  3. Run Spark Streaming Consumer 2 in the other terminal:
docker exec spark-master bin/spark-submit --verbose --packages org.apache.spark:spark-streaming-kafka-0-8_2.11:2.3.1 --master spark://spark-master:7077  --executor-memory 1g --num-executors 2 --executor-cores 1 --total-executor-cores 2  /opt/spark/code/consumerSpark2.py

D. Visualization:

1. Run the local HTTP server:
```python
   cd project\guide
   python http-server.py
```
This serves the guide folder over HTTP on the port configured in project\guide\http-server.py (the URLs below assume port 18001).
(Check) Navigate to http://localhost:18001/AMiner.html
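
http-server.py is not shown here, but a minimal equivalent using only the Python standard library would look roughly like this — the port is assumed from the URLs in this README, and the real script may differ:

```python
# Minimal static file server for the guide folder (illustrative
# stand-in for http-server.py).
import functools
import http.server

PORT = 18001  # assumed from this README; change it if the port is taken

def make_server(port=PORT, directory="."):
    # Serve files from `directory` (the guide folder in the real setup).
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory)
    return http.server.ThreadingHTTPServer(("", port), handler)

# To run it: make_server().serve_forever()
```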



2. Connect to the Neo4j browser at http://localhost:7474/browser with username: neo4j and password: password
    This will load the AMiner.html tutorial page above by default after connecting
    OR
    run this command in the query window:
    ```
        :play http://localhost:18001/AMiner.html
    ```

Notes: If the port above is already in use and the URL will not open, change the port in project\guide\http-server.py and launch the guide from the Neo4j browser with the command above ( :play http://localhost:<new-port>/AMiner.html ). If you want it to launch automatically, update docker\db\config\neo4j.conf and restart the container.
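
For the automatic launch mentioned above, the neo4j.conf setting involved is most likely the browser post-connect command — this is a hedged sketch, so verify against the config reference for your Neo4j version:

```
# Run the guide automatically after the browser connects
# (port 18001 assumed from this README):
browser.post_connect_cmd=play http://localhost:18001/AMiner.html
# Depending on the Neo4j version, the guide's host may also need to be
# whitelisted via browser.remote_content_hostname_whitelist.
```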

Happy learning! Kafka (producer, consumer), Spark Streaming, and Neo4j, bound together as Docker images, enable scaling for distributed processing.
