Environmental Sensor Data Management System

Introduction

The analytics department of a municipality in my country asked me to design and implement a data system capable of storing data from the environmental sensors they installed throughout the city. The objective is to provide urban planners with insights that can help improve the city's environmental conditions. The collected data will also feed an application that promptly alerts citizens when measurements exceed recommended thresholds. While the specifics of the implementation cannot be disclosed, this overview outlines the fundamental structure of the system.

Project Description

1. Sample Data Source

To initiate the project, I chose an open dataset from Kaggle as the sample data source. This dataset contains environmental sensor data, encompassing metrics such as temperature, humidity, and air quality, and its structured format mirrors the initial data gathered by the municipality's sensors. The dataset originates from custom-built sensor arrays connected to Raspberry Pi devices. Each device recorded readings from multiple sensors: temperature, humidity, carbon monoxide (CO), liquefied petroleum gas (LPG), smoke, light, and motion. The data spans the period from 07/12/2020 to 07/19/2020 and comprises a total of 405,184 records. The sensor readings were transmitted using the MQTT network protocol.
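To illustrate the shape of the raw data, here is a minimal sketch of how one record might be parsed. The column names and the example row are assumptions based on the dataset description above (a timestamp, a device identifier, and one column per sensor reading); the actual files may differ.

```python
import csv
import io

# Hypothetical sample illustrating the assumed CSV layout of the dataset:
# one timestamp, one device ID, and one column per sensor reading.
SAMPLE_CSV = """ts,device,co,humidity,light,lpg,motion,smoke,temp
1594512094.385,b8:27:eb:bf:9d:51,0.00495,51.0,false,0.00765,false,0.02041,22.7
"""

def parse_row(row: dict) -> dict:
    """Convert the raw CSV strings of one record into typed values."""
    return {
        "ts": float(row["ts"]),          # Unix timestamp (seconds)
        "device": row["device"],         # device identifier
        "co": float(row["co"]),          # carbon monoxide
        "humidity": float(row["humidity"]),
        "light": row["light"] == "true",   # boolean sensor
        "lpg": float(row["lpg"]),
        "motion": row["motion"] == "true",  # boolean sensor
        "smoke": float(row["smoke"]),
        "temp": float(row["temp"]),      # temperature
    }

rows = [parse_row(r) for r in csv.DictReader(io.StringIO(SAMPLE_CSV))]
```

Typing the values up front keeps the downstream load step simple, since Cassandra columns are strongly typed.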

2. Overall Goal

The primary objective of the data system is to effectively manage and store the environmental sensor data acquired from diverse sensors dispersed across the city. The system's architecture must accommodate substantial data volumes, diverse environmental metrics, and guarantee reliability and scalability. Additionally, the system should be adaptable for migration to larger setups, including cloud-based solutions.

3. Database Solution

Conception Phase

For this project, Apache Cassandra was selected as the preferred database solution. Apache Cassandra is a distributed NoSQL database renowned for handling extensive time-series data efficiently, and it offers strong scalability, fault tolerance, and performance. Its flexible, wide-column data model makes it straightforward to evolve table schemas over time, enabling seamless integration of future sensor data with varying structures.
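As a concrete sketch, the keyspace and a readings table might be defined with CQL statements like the ones below, kept as Python string constants in the style of sql_queries.py. The keyspace name `iot` matches the project description; the table name, columns, and partitioning scheme (partition by device, cluster by timestamp) are assumptions for illustration, not the project's actual schema.

```python
# Hypothetical CQL statements for the iot keyspace. SimpleStrategy with a
# replication factor of 1 is suitable only for a single-node development setup.
CREATE_KEYSPACE = """
CREATE KEYSPACE IF NOT EXISTS iot
WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};
"""

# Partitioning by device keeps each sensor's readings together on disk;
# clustering by ts DESC makes "latest readings per device" queries cheap.
CREATE_READINGS_TABLE = """
CREATE TABLE IF NOT EXISTS iot.sensor_readings (
    device text,
    ts timestamp,
    co double,
    humidity double,
    light boolean,
    lpg double,
    motion boolean,
    smoke double,
    temp double,
    PRIMARY KEY ((device), ts)
) WITH CLUSTERING ORDER BY (ts DESC);
"""
```

Choosing the partition key around the dominant query pattern (per-device time ranges) is the central data-modeling decision in Cassandra.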

4. Justification and Alternatives

The selection of Apache Cassandra is well-justified, given its ability to handle large distributed deployments, ensuring scalability and reliability. Traditional relational databases, such as MySQL or PostgreSQL, may not be as suitable due to challenges associated with scalability and high write loads. Furthermore, Cassandra's dynamic data model alleviates complexities related to schema management, a crucial aspect when dealing with diverse and evolving sensor data.

5. Implementation Plan

  • Develop scripts for database schema setup and configuration.
  • Craft a script to establish a connection with the Cassandra database and load sample data into relevant tables.
  • Set up a Docker container housing Apache Cassandra from Docker Hub.
  • Create a Dockerfile incorporating all necessary steps for automated setup, including Cassandra installation, script execution, and data loading.
  • Establish a GitHub repository for storing code and associated files.
  • Upload Dockerfile and pertinent files to the repository, ensuring straightforward container building and execution across different environments.

Files

This project comprises a variety of files and scripts that collectively lay the foundation for the Environmental Sensor Data Management System. These files are essential for various stages, from setup and data population to smooth system operation. Together, they offer a comprehensive solution for efficiently managing the municipality's environmental data.

  • create_tables.py: This script is responsible for dropping and creating tables. It's intended to reset the tables each time before executing the ETL scripts.
  • etl.py: This script reads and processes data from the telemetry_source_data folder, subsequently loading it into the designated tables.
  • sql_queries.py: Within this file, you'll find all the CQL queries necessary for the project's database interactions. It is imported into create_tables.py and etl.py to ensure consistency.
  • requirements.txt: Listed within this file are all the essential dependencies and libraries required for the project.
  • Dockerfile: Contained in this file are the instructions for constructing a Docker image for the project, simplifying the process of containerization and deployment.
  • README.md: You are reading it!

Extracting and Transforming the Data

The ETL pipeline extracts data from the files in the telemetry_source_data directory, then transforms and loads it into the four tables of the iot keyspace. This is handled by the Python scripts described above:

  • Running create_tables.py creates and initializes the tables for the iot keyspace.
  • sql_queries.py contains all the CQL queries and is imported into create_tables.py and etl.py.
  • etl.py processes the source files and loads the data into the tables.
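The ETL flow above can be sketched as follows. The INSERT statement, the column names, and the transform logic are assumptions carried over from the earlier examples, not the project's actual code; the loader takes any session-like object with an `execute` method, which is also how the real cassandra-driver session is used.

```python
import csv
from pathlib import Path

# Hypothetical insert statement matching the assumed sensor_readings table.
INSERT_READING = """
INSERT INTO iot.sensor_readings
    (device, ts, co, humidity, light, lpg, motion, smoke, temp)
VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s)
"""

def transform(row: dict) -> tuple:
    """Convert one raw CSV row into a typed parameter tuple for the insert."""
    return (
        row["device"],
        float(row["ts"]),
        float(row["co"]),
        float(row["humidity"]),
        row["light"] == "true",
        float(row["lpg"]),
        row["motion"] == "true",
        float(row["smoke"]),
        float(row["temp"]),
    )

def process_file(path: Path, session) -> int:
    """Load one telemetry CSV into Cassandra; returns the number of rows loaded."""
    count = 0
    with path.open(newline="") as f:
        for row in csv.DictReader(f):
            session.execute(INSERT_READING, transform(row))
            count += 1
    return count
```

In a real run, `session` would come from `cassandra.cluster.Cluster(...).connect()`; keeping the loader decoupled from the driver also makes it easy to unit-test with a stub session.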

Want to use the system?

Take the following steps:

  • Clone the repository: git clone repo-link
  • Build the Docker image: docker build . -t sensor-cassandra-image
  • Create and start the Docker container: docker run -v ./cassandra_data:/usr/src/app -it --name sensor-cassandra-container sensor-cassandra-image:latest /bin/bash
  • You will be taken to the container's terminal, from which you can run the pipeline.

Run the pipeline as follows:

  • In the container terminal, run python create_tables.py to reset the tables in the iot keyspace.
  • In the container terminal, run python etl.py to process all the datasets.

