Ventilation for Hospitality

This is part of the Sheffield City Council Ventilation for Hospitality project. It provides a data pipeline from Datacake to a Research Storage area.

See also the Vent page on the ITS Wiki.

Data pipeline

This code is an executable Python module that runs a data pipeline (defined in vent/workflow.py) on a regular schedule (defined by systemd/vent.timer). The pipeline has the following steps:

  • Retrieve sensor metadata from Datacake in JSON format
  • Download raw data (historical sensor data) from Datacake in JSON format
  • Transform (clean) the raw data

The data and metadata are saved to a Standard Research Storage area.
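As an illustration of the transform step, the FREQ option described under Configuration rounds timestamps down to a fixed time resolution. The following is a minimal sketch of that idea using only the standard library. It assumes the resolution is given as a whole number of minutes and that timestamps are timezone-aware; the function name floor_timestamp is hypothetical and is not taken from vent/workflow.py.

```python
from datetime import datetime, timedelta, timezone

# Reference point for the time grid (assumption: the grid is aligned to the Unix epoch)
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def floor_timestamp(ts: datetime, minutes: int) -> datetime:
    """Round a timezone-aware timestamp down to a multiple of `minutes`."""
    if minutes <= 0:
        raise ValueError("minutes must be a positive integer")
    step = timedelta(minutes=minutes)
    # The remainder is how far the timestamp sits past the previous grid point
    remainder = (ts - EPOCH) % step
    return ts - remainder


reading_time = datetime(2024, 1, 1, 12, 37, 45, tzinfo=timezone.utc)
print(floor_timestamp(reading_time, 15))  # 2024-01-01 12:30:00+00:00
```

The actual pipeline may implement this differently (for example with a data-frame library), so treat this only as a description of the rounding behaviour.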

The data source is a GraphQL API. Queries in the GraphQL language are defined in vent/templates/*.j2 as Jinja templates, which allow for variables to be inserted.
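To show how a templated query might be filled in, here is a minimal sketch using the Jinja library. The template text, its file name (devices.j2), the GraphQL field names and the workspace identifier are all illustrative assumptions; the real templates live in vent/templates/*.j2 and the real schema is defined by Datacake. The sketch loads the template from an in-memory dictionary so that it is self-contained.

```python
import json

from jinja2 import DictLoader, Environment

# Hypothetical template: the field names below are placeholders,
# not the real Datacake GraphQL schema.
TEMPLATES = {
    "devices.j2": """\
query {
  devices(workspace: "{{ workspace_id }}") {
    id
    name
  }
}
""",
}


def render_query(name: str, **variables: str) -> str:
    """Render a named Jinja template into a GraphQL query string."""
    env = Environment(loader=DictLoader(TEMPLATES))
    return env.get_template(name).render(**variables)


def build_payload(query: str) -> str:
    """Wrap the query in the JSON body conventionally POSTed to a GraphQL endpoint."""
    return json.dumps({"query": query})


query = render_query("devices.j2", workspace_id="00000000-0000-0000-0000-000000000000")
payload = build_payload(query)
print(query)
```

In a deployment the templates would more likely be loaded from TEMPLATE_DIR with jinja2.FileSystemLoader, and the payload sent to GRAPHQL_URL with the access token.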

Installation

This code is designed to be installed on an IT Services Virtual Server.

Make sure the operating system (OS) packages are up-to-date.

sudo apt update
sudo apt upgrade

Clone the code repository.

Run the installation script as a superuser:

sudo sh install.sh

Install the environment file, which contains options specific to each deployment. There is an example file in this repository called example.env. It should have strict file permissions to prevent unauthorised access.

sudo vi /home/vent/.env
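One way to confirm that the environment file has strict permissions is to inspect its mode bits. The sketch below is a hedged example, not part of this repository: the helper name is_private is hypothetical, and it assumes a POSIX file system where "strict" means no access for group or others (for example mode 600).

```python
import os
import stat


def is_private(path: str) -> bool:
    """Return True if neither group nor others have any access to the file."""
    mode = os.stat(path).st_mode
    # S_IRWXG and S_IRWXO cover read, write and execute for group and others
    return (mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0


if __name__ == "__main__":
    env_file = "/home/vent/.env"
    if os.path.exists(env_file):
        print(env_file, "is private:", is_private(env_file))
```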

Enable the systemd timer (this runs the data pipeline on a regular schedule):

sudo systemctl enable vent.timer

Do not enable the service unit vent.service, because that would start the service at boot time, independently of any timer settings.

To test that it's installed correctly, use the monitoring commands in the Usage section and run these commands:

# Check installed Python version
/opt/vent/venv/bin/python --version

# Check the pipeline is installed
/opt/vent/venv/bin/python -m vent --help

Configuration

The main way to configure the pipeline is to set the environment variables in the file /home/vent/.env.

The following options are available:

  • WORKSPACE_ID is the Datacake workspace identifier (it looks like a UUID)
  • FIELDS is a JSON array of strings for the names of the fields on Datacake to be downloaded
  • ROOT_DIR is the directory where data should be saved
  • FREQ is the time resolution for the clean data. Timestamps are rounded down to the nearest interval, in minutes, set by this variable.
  • TEMPLATE_DIR is the directory containing Jinja templates for the queries to run against the GraphQL API.
  • The Datacake API access token may be specified in one of the following ways:
    • DATACAKE_TOKEN is the access token (keep this secure)
    • DATACAKE_TOKEN_FILE is the path of a text file containing the secret (again, keep it secret, keep it safe)
  • LOGLEVEL determines the verbosity of the messages sent to the standard output and may have one of the Python logging levels such as WARNING or INFO.
  • GRAPHQL_URL is the URL of the web-based GraphQL API.
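The options above could be read in Python along the following lines. This is a minimal sketch under stated assumptions, not the module's actual configuration code: the function names read_fields and read_token are hypothetical, the example field names are made up, and the precedence of DATACAKE_TOKEN over DATACAKE_TOKEN_FILE is an assumption.

```python
import json
from pathlib import Path
from typing import List, Mapping


def read_fields(env: Mapping[str, str]) -> List[str]:
    """Parse FIELDS, which is expected to be a JSON array of strings."""
    fields = json.loads(env["FIELDS"])
    if not isinstance(fields, list) or not all(isinstance(f, str) for f in fields):
        raise ValueError("FIELDS must be a JSON array of strings")
    return fields


def read_token(env: Mapping[str, str]) -> str:
    """Return the access token from DATACAKE_TOKEN or DATACAKE_TOKEN_FILE."""
    # Assumption: the inline token wins if both variables are set
    token = env.get("DATACAKE_TOKEN")
    if token:
        return token
    token_file = env.get("DATACAKE_TOKEN_FILE")
    if token_file:
        # Strip the trailing newline that text editors usually add
        return Path(token_file).read_text().strip()
    raise KeyError("Set DATACAKE_TOKEN or DATACAKE_TOKEN_FILE")


# Hypothetical values for illustration only
example_env = {"FIELDS": '["co2", "temperature"]', "DATACAKE_TOKEN": "not-a-real-token"}
print(read_fields(example_env))
```

Passing os.environ instead of example_env would read the values loaded from /home/vent/.env by the service.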

Usage

View the service status:

sudo systemctl status vent.timer
sudo systemctl status vent.service

View the logs:

sudo journalctl -u vent.service --since "1 hour ago"

To log in as the service user (vent) in a new shell:

sudo su - vent --shell /bin/bash

Development

Run the pipeline module (the --help flag lists the available options):

python -m vent --help

Maintenance

The following steps should be performed on a regular schedule to keep the system up-to-date and secure.

  • Ensure OS packages are up to date using the APT package manager:
    • sudo apt update
    • sudo apt upgrade
  • Update Python packages:
    • Run security scan using Safety: /opt/vent/venv/bin/safety check
    • Install any minor version upgrades: sudo /opt/vent/venv/bin/pip install --upgrade -r ./requirements.txt
    • Check for out-of-date Python packages using the pip list command: /opt/vent/venv/bin/pip list --outdated
      • Upgrade these packages with sudo /opt/vent/venv/bin/pip install <package> --upgrade (test any major version updates in a development environment before installing them in production).
  • Check storage space using df --human-readable and ncdu
    • Some ways to clear storage space:
      • Delete old system logs: sudo journalctl --vacuum-size=500M
      • Clean up APT caches:
        • sudo apt autoclean
        • sudo apt autoremove
