Twitter has become an important communication channel in times of emergency. With the help of smartphones, people can announce an emergency they're observing in real time, so more and more agencies are interested in monitoring Twitter. However, it's not always clear whether a tweet is actually announcing a disaster. In this project we build a machine learning model that predicts which tweets announce a real disaster and which ones don't.
List of things we tackled in this project:
presentation ✅
airflow (prediction job and ingestion job) ✅
great expectations (implemented in the ingestion DAG) ✅
prediction job (uses the API to predict) ✅
GitHub branches (each member has their own branch) ✅
documentation ✅
model as a service ✅
user interface (filling a form + uploading a file) ✅
predictions saved in the DB ✅
ingestion job (gets a file, checks its content, then moves it to the prediction_folder to be used by the prediction job) ✅
monitoring dashboard ✅
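The ingestion job's check-then-move step can be sketched with the standard library. This is a minimal sketch: the folder name comes from the list above, but the "has a text column" check is an assumption standing in for the Great Expectations validation used in the real DAG.

```python
import csv
import shutil
from pathlib import Path

def ingest_file(path: str, prediction_folder: str = "prediction_folder") -> bool:
    """Check a CSV file's content, then move it to the prediction folder."""
    src = Path(path)
    with src.open(newline="") as f:
        header = next(csv.reader(f), [])
    # minimal content check (assumption): the file must have a "text" column
    if "text" not in header:
        return False  # leave invalid files where they are
    dest = Path(prediction_folder)
    dest.mkdir(parents=True, exist_ok=True)
    shutil.move(str(src), str(dest / src.name))
    return True
```

Files that pass the check land in `prediction_folder`, where the prediction job picks them up.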
- We host our Postgres database on AWS RDS.
- We used AWS CloudWatch to monitor the database performance (one of the Grafana dashboards is related to that).
- We packaged the whole backend to have more flexibility in importing services and classes (such as the database class, the Tweet class, and our model).
To run the project, first install the requirements:
pip install -r requirements.txt
then run the following to download our backend package:
pip install -i https://test.pypi.org/simple/ back-package-dsp2
We used Streamlit to create a form to be filled in by users, in addition to the option of uploading a file to make several predictions at once.
We also have a History page to see all the tweets in our database.
To run the Streamlit server, go to the directory src/FrontEnd/streamlit/apps/
then run the following:
streamlit run Streamlit.py
We used Flask to build our APIs and to have the model as a service. We have three APIs, explained in detail below. To run the Flask server, first set the FLASK_APP variable:
export FLASK_APP='src/Backend_APIs/app.py'
then start the server with:
flask run
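A stripped-down sketch of what app.py might look like with the three routes described below. Only the route names come from this README: the in-memory list stands in for the Postgres database, the model call is stubbed, and the CSV handling is simplified to one tweet per line.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)
TWEETS = []  # stand-in for the Postgres database

def predict(text: str) -> int:
    """Stub for the real model: 1 = in danger, 0 = not in danger."""
    return int("danger" in text.lower())

@app.route("/Submit", methods=["POST"])
def submit():
    # form submission: one tweet as JSON, predicted and stored
    tweet = request.get_json()
    tweet["prediction"] = predict(tweet["text"])
    TWEETS.append(tweet)
    return jsonify(tweet)

@app.route("/SubmitFile", methods=["POST"])
def submit_file():
    # bulk path: one prediction per line of the uploaded data_file
    lines = request.files["data_file"].read().decode().splitlines()
    rows = [{"text": t, "prediction": predict(t)} for t in lines]
    TWEETS.extend(rows)  # bulk store
    return jsonify(rows)

@app.route("/getAllTweets", methods=["GET"])
def get_all_tweets():
    return jsonify(TWEETS)
```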
the three APIs we have are:
This API takes a CSV file called data_file and sends it to the model; the model predicts whether each user who tweeted is in danger or not. The data is then stored in the database in bulk.
You can reach this API on the route /SubmitFile.
This API handles the form part: it takes the user input, which is sent to the backend as JSON, passes this object to the model for prediction, then stores the data with its prediction in the database.
You can reach this API on the route /Submit.
In case you would like to use Postman or another API testing tool, you can use the following JSON object:
{
"YourEmail": "an email",
"EmeEmail": "an email",
"Location": "New York",
"text": "HELPPPP, I am in DANGER!"
}
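With the Flask server running locally, the same object can be sent from Python. The localhost URL assumes Flask's default port 5000, and the actual POST is left commented out so the snippet runs without a server.

```python
import json

payload = {
    "YourEmail": "an email",
    "EmeEmail": "an email",
    "Location": "New York",
    "text": "HELPPPP, I am in DANGER!",
}
body = json.dumps(payload)

# with the server up:
# import requests
# response = requests.post("http://127.0.0.1:5000/Submit", json=payload)
# print(response.json())
print(body)
```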
We use this API to get all the tweets we have in the database; we created it mainly to retrieve this data for the History page in the frontend.
You can reach this API on the route /getAllTweets.
- Data setup (load the data, train/test split)
- Main feature preprocessing (text normalization and vectorization)
- Model training and saving the model in the model.joblib file
- Model evaluation (accuracy, precision, recall and F-score)
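The steps above can be sketched with scikit-learn. The TF-IDF + logistic-regression choice and the toy dataset are assumptions about what preprocess.py and train.py do; only the model.joblib artifact name comes from this README.

```python
from joblib import dump
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# toy stand-in for the real tweet dataset
texts = ["HELPPPP, I am in DANGER!", "fire spreading fast",
         "lovely day today", "great coffee this morning"] * 5
labels = [1, 1, 0, 0] * 5

# text vectorization + model in one pipeline
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(texts, labels)

dump(model, "model.joblib")  # saved artifact loaded later by the prediction job
preds = model.predict(["massive flood downtown"])
```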
- For text normalization and vectorization - preprocess.py
- For model building and training - train.py
- For making a prediction - inference.py