This repository contains the Docker image for Cloud Assignment 2. You can pull the image from Docker Hub with the command shown below.
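The exact image name depends on the Docker Hub account it was pushed under; using the placeholder names from the build-and-push steps later in this README:

docker pull your-dockerhub-username/cloud-assignment-2:version-tag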
Setup AWS Cluster for Training the Model

Setup of EMR on AWS
1. Go to EMR -> Create Cluster
2. Give the cluster a name
3. Select Amazon EMR release -> emr-6.15.0
4. Click Cluster Configuration -> Add an instance group to add one more instance, creating a 4-node cluster
5. Security configuration and EC2 key pair -> Add an Amazon EC2 key pair for SSH access to the cluster
6. Select Role
   - Amazon EMR service role: EMR_DefaultRole
   - EC2 instance profile for Amazon EMR: EMR_EC2_DefaultRole
7. Create Cluster

Connect to EMR Instance
Once the cluster is created, go to the security group of the cluster's EC2 instances and open port 22 to your IP address.
ssh -i "CS643-Cloud.pem" [email protected]
This guide will walk you through the steps to install Apache Spark on Ubuntu using the standard package manager, apt.
- Ubuntu operating system
- sudo privileges
Ensure that your package list is up-to-date:
sudo apt update
Apache Spark requires Java. Install OpenJDK 8 or later:
sudo apt install openjdk-8-jdk
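You can confirm the Java installation before continuing:

java -version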
Visit the Apache Spark Downloads page and copy the link to the latest pre-built version. Replace <version> in the commands below with the actual version number.
wget https://archive.apache.org/dist/spark/spark-<version>/spark-<version>-bin-hadoop2.7.tgz
Extract the downloaded archive:
tar -xvzf spark-<version>-bin-hadoop2.7.tgz
Move the extracted Spark directory to the /opt directory (you may need sudo):
sudo mv spark-<version>-bin-hadoop2.7 /opt/spark
Add Spark's binaries to the PATH and set the SPARK_HOME variable. Open your shell configuration file (e.g., ~/.bashrc or ~/.zshrc) and add the following lines:
export SPARK_HOME=/opt/spark
export PATH=$PATH:$SPARK_HOME/bin
export PYSPARK_PYTHON=python3
Source the updated configuration:
source ~/.bashrc
Run the following command to check if Spark is installed successfully:
spark-shell
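spark-shell opens a Scala REPL and prints the Spark version on startup. Since this project's jobs are written in PySpark, you can also verify from Python; a minimal sketch, assuming the pyspark package is importable (e.g., via pip install pyspark, or with $SPARK_HOME/python on your PYTHONPATH):

```python
# verify_spark.py - start a local Spark session and print its version
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("verify").master("local[*]").getOrCreate()
print("Spark version:", spark.version)
spark.stop()
```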
To properly configure your environment for Apache Spark and Hadoop, add the following lines to your shell configuration file. Depending on your shell, this file may be .bash_profile, .zshrc, or another relevant file; adjust the paths below to match where Spark and Hadoop are installed on your system. Open the file in a text editor and add the following lines:
export SPARK_HOME=/usr/local/opt/apache-spark/libexec
export HADOOP_HOME=/usr/local/opt/hadoop
This guide will walk you through the steps to install the AWS Command Line Interface (AWS CLI) on Ubuntu.
- Ubuntu operating system
- sudo privileges
Ensure that your package list is up-to-date:
sudo apt update
Install the AWS CLI using the package manager:
sudo apt install awscli
After the installation is complete, you can verify it by checking the AWS CLI version:
aws --version
To use AWS CLI, you need to configure it with your AWS credentials. Run the following command and follow the prompts:
aws configure
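aws configure prompts for four values; the access keys come from your AWS account, and the region and output format shown here are only examples:

AWS Access Key ID [None]: <your-access-key-id>
AWS Secret Access Key [None]: <your-secret-access-key>
Default region name [None]: <your-region>
Default output format [None]: json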
Exit from the EMR instance and, from your local machine, copy the training script to the cluster with the command below:
scp -i CS643-Cloud.pem ~/Desktop/ProgrammingAssignment2-main/training.py [email protected]:~/trainingModel
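Note that scp copies into ~/trainingModel only if that directory already exists on the instance; otherwise it creates a regular file named trainingModel. If needed, create the directory first from an SSH session:

mkdir -p ~/trainingModel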
Reconnect to the server using the SSH command.
Navigate to your project folder and create a virtual environment (replace "venv" with your preferred name):
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
spark-submit --packages org.apache.hadoop:hadoop-aws:3.2.2 training.py
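The --packages org.apache.hadoop:hadoop-aws:3.2.2 flag pulls in the S3A connector so Spark can read s3a:// paths. As a rough, hypothetical sketch of the kind of read this enables inside training.py (the bucket and file names are placeholders, not the assignment's actual paths):

```python
# Sketch only: reading a training dataset from S3 via the s3a connector.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("training").getOrCreate()

# Placeholder S3 location; credentials come from the instance role or aws configure.
df = spark.read.csv("s3a://your-bucket/TrainingDataset.csv", header=True, inferSchema=True)
df.printSchema()
```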
Setup of Prediction Application on EC2 Instance on AWS

- Go to EC2
- Click on Launch Instance
- Follow the steps in the launch wizard to configure and launch the instance
Once the instance is running, connect to it over SSH:
ssh -i your-key.pem ec2-user@your-instance-ip
Python Environment Setup

This document provides instructions on setting up the Python environment for this project.

Install Python

Download and install the latest version of Python from python.org.

Install virtualenv

If you don't have virtualenv installed, run the following command:

pip install virtualenv

Create a Virtual Environment

Navigate to your project folder and create a virtual environment (replace "venv" with your preferred name):

python -m venv venv

Activate the Virtual Environment

source venv/bin/activate

Install Project Dependencies

pip install -r requirements.txt
Execute the following command to run the prediction application:
spark-submit --packages org.apache.hadoop:hadoop-aws:3.2.2 predict.py
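predict.py is expected to load whatever model training.py saved and score the test data. A minimal, hypothetical sketch of that pattern with Spark ML, assuming the model was saved as a fitted Pipeline (all paths and the prediction column are placeholders):

```python
# Sketch only: load a saved Spark ML pipeline and score a test dataset.
from pyspark.sql import SparkSession
from pyspark.ml import PipelineModel

spark = SparkSession.builder.appName("predict").getOrCreate()

model = PipelineModel.load("s3a://your-bucket/model")  # placeholder model path
test_df = spark.read.csv("s3a://your-bucket/TestDataset.csv",
                         header=True, inferSchema=True)  # placeholder test data
model.transform(test_df).select("prediction").show()
```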
Docker Image for Cloud Assignment 2

This Dockerfile sets up an environment for Cloud Assignment 2, including Python, Java, Spark, and Hadoop.
To build the Docker image, navigate to the directory containing the Dockerfile and run:
docker build -t cloud-assignment-2 .
After building the image, you can run a Docker container with the following command:
docker run -it cloud-assignment-2
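If the code inside the container needs files from your machine (for example, a test dataset), one option is a bind mount; the host and container paths here are only examples:

docker run -it -v "$(pwd)/data:/data" cloud-assignment-2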
To build the Docker image, use the following command in the directory containing your Dockerfile:
docker build -t your-dockerhub-username/cloud-assignment-2:latest .
Log in to Docker Hub so that you can push the image:
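docker login

You will be prompted for your Docker Hub username and password (or an access token).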
docker tag your-dockerhub-username/cloud-assignment-2:latest your-dockerhub-username/cloud-assignment-2:version-tag
Replace version-tag with the desired version or tag for your Docker image.
docker push your-dockerhub-username/cloud-assignment-2:version-tag