Git Product home page Git Product logo

bigdataapplication's Introduction

About this APP

This is a big data application that stream Covid data from Chicago area
and join it with weather data by manually submit from web form. And the weather data is different from what we used in the class. It includes:
Temperature:Int,Humidity:Int,Rain:Int


In the check-covid.html, you could check the data by entering the date.
In the submit-weather.html, you could submit the latest weather data by date.

Notice that the data will only update once you both submit the weather and also the Covid website has updated their data.

How to run this BigDataApplication:

1. Open the website that has been deployed to LoadBalancer on Amazon:

	submit weather: http://mpcs53014-loadbalancer-217964685.us-east-2.elb.amazonaws.com:3181/submit-weather.html
	query covid data: http://mpcs53014-loadbalancer-217964685.us-east-2.elb.amazonaws.com:3181/check-covid.html

2. Run the .jar file for streaming weather data from form:
	spark-submit --master local[2] --class StreamWeather nian/1_WeatherSpeedLayer/target/uber-SpeedLayer-1.0-SNAPSHOT.jar b-2.mpcs53014-kafka.fwx2ly.c4.kafka.us-east-2.amazonaws.com:9092,b-1.mpcs53014-kafka.fwx2ly.c4.kafka.us-east-2.amazonaws.com:9092

3. Run the .jar file for streaming covid data from website:
	java -cp nian/2_StreamCovid/target/uber-Speedlayercovid-1.0-SNAPSHOT.jar org.example.RealTimeCovid b-2.mpcs53014-kafka.fwx2ly.c4.kafka.us-east-2.amazonaws.com:9092,b-1.mpcs53014-kafka.fwx2ly.c4.kafka.us-east-2.amazonaws.com:9092

4. Run the .jar file for reading data from kafka and writing to Hbase:
	spark-submit --master local[2] --class IngestCovid nian/3_IngestCovidWeather/target/uber-IngestCovid-1.0-SNAPSHOT.jar b-2.mpcs53014-kafka.fwx2ly.c4.kafka.us-east-2.amazonaws.com:9092,b-1.mpcs53014-kafka.fwx2ly.c4.kafka.us-east-2.amazonaws.com:9092

Step by Step instruction(already done):

1.Download CSV data from website and change the name to weather.csv
and COVID.csv:

Weather data:
	https://data.cityofchicago.org/Parks-Recreation/Beach-Weather-Stations-Automated-Sensors/k7hf-8y75

Covid data:
	https://data.cityofchicago.org/Health-Human-Services/COVID-19-Daily-Testing-By-Person/t4hh-4ku9

Weather and covid data are available in this folder.


2. Preprocess csv file using python and pandas(need to put .py in the same directory with .csv file):

Weather data:
	preprocess_weather.py

Covid data:
	preprocess_covid.py

processed data are available in this folder.

3.Copy data from local to cluster(already in cluster: /nian)

	scp -i nian.pem ~/Desktop/BIGDATAFINAL/weather+covid/COVID_processed.csv [email protected]:~/nian

	scp -i nian.pem ~/Desktop/BIGDATAFINAL/weather+covid/weather_processed.csv [email protected]:~/nian


4.Put data to hdfs

	hdfs dfs -put ./nian/weather_processed.csv /tmp/nian/weather_nian
	hdfs dfs -put ./nian/COVID_processed.csv /tmp/nian/covid_nian


5.Go to beeline, create weather and covid table by running hql file: 	 	 create_weather.hql
	create_COVID.hql
	create_covid_weather.hql

6.Go to Hbase, create Hbase by command:

	create 'processed_weather_covid_nian','weather'

7.Create topic for Kafka:

	1.	Go to Kafka executable

		cd kafka_2.12-2.2.1/bin

	2.	Create a topic named  nian_final for stream weather

	./kafka-topics.sh --create --zookeeper z-2.mpcs53014-kafka.fwx2ly.c4.kafka.us-east-2.amazonaws.com:2181,z-3.mpcs53014-kafka.fwx2ly.c4.kafka.us-east-2.amazonaws.com:2181,z-1.mpcs53014-kafka.fwx2ly.c4.kafka.us-east-2.amazonaws.com:2181 --replication-factor 2 --partitions 1 --topic nian_final

	3.  Create a topic named nian_final_covid

	./kafka-topics.sh --create --zookeeper z-2.mpcs53014-kafka.fwx2ly.c4.kafka.us-east-2.amazonaws.com:2181,z-3.mpcs53014-kafka.fwx2ly.c4.kafka.us-east-2.amazonaws.com:2181,z-1.mpcs53014-kafka.fwx2ly.c4.kafka.us-east-2.amazonaws.com:2181 --replication-factor 2 --partitions 1 --topic nian_final_covid

8.Creating Hbase table for streaming weather

	create 'latest_weather_nian','weather'

9.Deploy the webserver folder to LoadBalancer with port 3181

10.Maven install the SpeedLayerWeather and deploy to cluster.
Run it by:

	spark-submit --master local[2] --class StreamWeather nian/1_WeatherSpeedLayer/target/uber-SpeedLayer-1.0-SNAPSHOT.jar b-2.mpcs53014-kafka.fwx2ly.c4.kafka.us-east-2.amazonaws.com:9092,b-1.mpcs53014-kafka.fwx2ly.c4.kafka.us-east-2.amazonaws.com:9092

11. We can check the speedlayer of weather data by checking the Hbase:

	get 'latest_weather_nian','12/05/2020'

12. Maven install the SpeedLayerCovid and deploy to cluster.
Run it by:

	java -cp nian/2_StreamCovid/target/uber-Speedlayercovid-1.0-SNAPSHOT.jar org.example.RealTimeCovid b-2.mpcs53014-kafka.fwx2ly.c4.kafka.us-east-2.amazonaws.com:9092,b-1.mpcs53014-kafka.fwx2ly.c4.kafka.us-east-2.amazonaws.com:9092

13. Maven install the IngestCovid and deploy to cluster.
Run it by:

	spark-submit --master local[2] --class IngestCovid nian/3_IngestCovidWeather/target/uber-IngestCovid-1.0-SNAPSHOT.jar b-2.mpcs53014-kafka.fwx2ly.c4.kafka.us-east-2.amazonaws.com:9092,b-1.mpcs53014-kafka.fwx2ly.c4.kafka.us-east-2.amazonaws.com:9092

bigdataapplication's People

Contributors

langzippkk avatar

Stargazers

 avatar

Watchers

James Cloos avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.