This repository contains a simple Apache Airflow DAG for fetching weather data from the OpenWeather API and storing it in AWS S3.
- Enable Windows Subsystem for Linux (WSL) and Virtualization:
  - Go to Control Panel > Programs and Features and enable these features.
- Install Ubuntu from the Microsoft Store:
  - Set a username and password during installation.
- Set Up Ubuntu:
  - Open Ubuntu using the `wsl` command or directly from the Start menu.
  - Run the following commands:

    ```bash
    sudo apt update
    sudo apt install python3-pip
    sudo apt install python3.10-venv
    python3 -m venv airflow-venv
    ```

  - Activate the virtual environment:
    - On Linux: `source airflow-venv/bin/activate`
    - On Windows: `.\airflow-venv\Scripts\activate`
  - Install the required Python packages:

    ```bash
    pip install pandas s3fs apache-airflow
    ```
- Start Airflow:

  ```bash
  airflow standalone
  ```

  - Open a new browser tab and go to `Public IPv4 DNS:8080`.
  - Navigate to Admin > Connections and add a new connection with the following details:
    - Connection Id: `http_conn_id`
    - Connection Type: `HTTP`
    - Host: your OpenWeather API endpoint
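The connection's host points at the OpenWeather API. As an illustration of the kind of endpoint URL the DAG ends up calling, a small helper can assemble the request (the helper name and `YOUR_API_KEY` placeholder are hypothetical, not part of the repository):

```python
from urllib.parse import urlencode

def build_openweather_url(city: str, api_key: str) -> str:
    # Assemble the current-weather endpoint URL that the Airflow
    # HTTP connection would point at.
    base = "https://api.openweathermap.org/data/2.5/weather"
    return f"{base}?{urlencode({'q': city, 'appid': api_key})}"

print(build_openweather_url("Portland", "YOUR_API_KEY"))
```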
- Configure Airflow:
  - Create a `dags` directory in Airflow.
  - Create or import the `weather_dag.py` file provided in this repository.
  - Edit `airflow.cfg` to set the DAG directory.
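Assuming Airflow's home directory is `~/airflow`, the `airflow.cfg` edit is a single setting; the exact path depends on where you created the `dags` directory:

```ini
[core]
# Point Airflow at the directory containing weather_dag.py
dags_folder = /home/<your-user>/airflow/dags
```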
- AWS Setup:
  - Install the AWS CLI:

    ```bash
    sudo apt install awscli
    ```

  - Set up security credentials:

    ```bash
    aws configure
    ```

    Follow the prompts and enter your AWS access key ID, secret access key, and region.
  - Run the following command to get a session token:

    ```bash
    aws sts get-session-token
    ```
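Once credentials are configured, the load step needs no extra wiring: with `s3fs` installed, pandas accepts `s3://` URIs directly in `to_csv`. A minimal sketch (the bucket name and columns below are placeholders, not taken from the repository):

```python
import pandas as pd

# Toy weather record; the real DAG builds this from the API response.
df = pd.DataFrame([{"city": "Portland", "temp_f": 61.2}])

# With s3fs installed and credentials set via `aws configure`, pandas
# can write straight to a bucket (hypothetical bucket name):
# df.to_csv("s3://my-weather-bucket/current_weather.csv", index=False)

# The same call works locally against a filesystem path:
df.to_csv("current_weather.csv", index=False)
```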
The DAG code in `weather_dag.py`, included in this repository, fetches weather data from the OpenWeather API, transforms it, and loads it into AWS S3.
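The transform step can be illustrated with a unit conversion: the OpenWeather API reports temperatures in Kelvin by default, so a typical transform converts them before loading. This is only a sketch of that idea; the exact transform in `weather_dag.py` may differ:

```python
def kelvin_to_fahrenheit(k: float) -> float:
    # OpenWeather returns temperatures in Kelvin by default;
    # convert to Fahrenheit before writing the record to S3.
    return (k - 273.15) * 9 / 5 + 32

print(round(kelvin_to_fahrenheit(300.0), 2))
```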
For any questions or concerns, please contact [email protected].