AbqAq_Parser

Parser for Albuquerque, N.M., Air Quality data format

Release

abqaq.py version 0.1.0

Abstract

The abqaq.py script parses a file downloaded from the City of Albuquerque, N.M., web site for air quality readings.

If the file parses correctly, the script writes a YAML-formatted document to stdout. If there is a problem or the input does not parse, it prints a message to stderr and exits with a non-zero exit code.
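
Because results go to stdout and diagnostics go to stderr, a calling script can tell success from failure by the exit code alone. The following is a minimal sketch of such a caller, assuming the parser is run as a subprocess. The helper name `run_parser` is hypothetical and is not part of abqaq.py.

```python
import subprocess


def run_parser(command):
    """Run a command and return its stdout, raising if it exits non-zero.

    `command` is an argument list, e.g. ["python", "abqaq.py", "data.dat"].
    """
    result = subprocess.run(command, capture_output=True, text=True)
    if result.returncode != 0:
        # On failure the diagnostic message is on stderr, not stdout.
        raise RuntimeError(
            f"parser failed with exit code {result.returncode}: "
            f"{result.stderr.strip()}"
        )
    return result.stdout
```

A caller would then use something like `run_parser(["python", "abqaq.py", "data.dat"])` and treat the returned text as YAML.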

Setup

Note: If running on a Debian-derived system, replace python and pip with python3 and pip3.

This program works best with Python version 3.10. It may work with older versions, but they have not been tested.

Dependencies

  • parsy : Parser combinator library
  • pyyaml : YAML package for Python

Parsy

This program relies on the 'parsy' Python package.

That package requires Python 3.6 or greater.

PyYaml

YAML support is provided by pyyaml.

Note: Creating a virtual environment (step 1 below) is optional. If you do not use a virtual environment, 'parsy' and 'pyyaml' will be installed into your global Python packages.

  1. Create a virtual environment.
python -m venv .venv
source ./.venv/bin/activate

Note: You can deactivate the virtual environment at any time by running:

deactivate
  2. Install the dependencies
pip install -r requirements.txt

Running the parser

  1. Download a data file
curl http://data.cabq.gov/airquality/aqindex/history/042222.0017 > data.dat
  2. Run the program
python abqaq.py data.dat

Note: The data can also be supplied on stdin by passing '-' as the file path. You can redirect data.dat to stdin, or pipe directly from curl:

curl http://data.cabq.gov/airquality/aqindex/history/042222.0017 | python abqaq.py -
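
The '-' convention can be handled with a small helper that returns either an opened file or standard input. This is only a sketch of the general technique. The name `open_source` is hypothetical, and abqaq.py may do this differently.

```python
import sys
from contextlib import contextmanager


@contextmanager
def open_source(path):
    """Yield a text stream for `path`, treating "-" as standard input."""
    if path == "-":
        # Do not close stdin; the caller does not own it.
        yield sys.stdin
    else:
        # newline="" keeps "\r\n" intact so line endings can be inspected.
        with open(path, newline="") as handle:
            yield handle
```

Used as `with open_source(path) as stream: text = stream.read()`, the rest of the program does not need to know where the data came from.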

Note: The output is in YAML format, produced by the pyyaml Python package:

https://pyyaml.org/wiki/PyYAMLDocumentation

The flow_style is set to always use nested block syntax.
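
To illustrate the difference between the two styles, here is a small sketch using pyyaml's `yaml.dump` and its `default_flow_style` parameter. The sample dictionary is made up for illustration and is not real air quality data. Whether abqaq.py calls `yaml.dump` in exactly this way is an assumption.

```python
import yaml  # provided by the pyyaml package

# Made-up sample data, not taken from a real reading.
sample = {"group": {"values": [1, 2]}}

# Block style: nesting is shown with indentation and "- " list items.
block_text = yaml.dump(sample, default_flow_style=False)

# Flow style: nesting is shown inline with braces and brackets.
flow_text = yaml.dump(sample, default_flow_style=True)

print(block_text)
print(flow_text)
```

The block form spreads the structure over several indented lines, while the flow form puts the whole structure on one line.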

Here is a page that describes the 2 styles (block vs. flow):

Sample output data

IR (Intermediate Representation) from the first-stage parser

data.ir

Transformed dictionary after second phase

data.dct

Final YAML output

data.yml

JSON format

If you pass the '-j' or '--json' flag, the output is written to stdout in JSON format instead of YAML.

data.json

Usage

The program abqaq.py takes some optional flags and a possible path to a data file. If you want to have abqaq.py read from a pipe or stdin, pass '-' as the file path.

Flags

  • '-q', '--quiet' : Suppresses informational output on stderr, such as the note about CRLF line endings.
  • '-j', '--json' : Outputs JSON instead of YAML (the default).
  • '-p', '--pretty' : Formats the output to be more human-friendly. By default the output is meant to be consumed by other programs.
  • '-c', '--config' : Writes the given command line options to ./.abqaq.yml and exits.
  • '-h', '--help' : Prints the usage and help message and exits without doing anything else.
  • '-V', '--version' : Prints the version number and exits.
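
A command line with these flags could be declared with the standard library's argparse module, as sketched below. This is an assumption about one possible structure, not the actual code in abqaq.py. The function name `build_arg_parser` and the help strings are hypothetical.

```python
import argparse


def build_arg_parser():
    """Build a parser for the flags listed above (argparse adds -h/--help)."""
    parser = argparse.ArgumentParser(prog="abqaq.py")
    parser.add_argument("-q", "--quiet", action="store_true",
                        help="suppress informational output on stderr")
    parser.add_argument("-j", "--json", action="store_true",
                        help="output JSON instead of YAML")
    parser.add_argument("-p", "--pretty", action="store_true",
                        help="make the output more human-friendly")
    parser.add_argument("-c", "--config", action="store_true",
                        help="write the options to ./.abqaq.yml and exit")
    parser.add_argument("-V", "--version", action="version",
                        version="abqaq.py 0.1.0")
    # The path is optional; "-" means read from stdin.
    parser.add_argument("path", nargs="?",
                        help='data file to parse, or "-" for stdin')
    return parser
```

For example, `build_arg_parser().parse_args(["-q", "--json", "-"])` yields a namespace with `quiet` and `json` set to True and `path` set to "-".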

Configuration file

If the current directory contains a file named '.abqaq.yml', that file is read and processed before any other action takes place. The file is in YAML format and can be edited with any text editor. It controls the following options:

  • '-q', '--quiet'
  • '-j', '--json'
  • '-p', '--pretty'

Creating the configuration file

The configuration file './.abqaq.yml' can be generated with the '--config' option.

# Say we want to always suppress info messages and write JSON output:
python abqaq.py  --quiet --json --config

Configuration written to .abqaq.yml

Priority of configuration file and supplied options

The following cases describe which source of options takes effect:

  • No configuration file and no command line options:
    • All options are false and the default serialization format is YAML.
  • No configuration file and command line options given:
    • Command line options override the defaults.
  • Configuration file exists and no command line options:
    • Options are read from the configuration file.
  • Configuration file and command line options given:
    • Command line options override the corresponding options in the configuration file.
    • Command line options not supplied default to the values read from the configuration file.
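
The four cases above amount to a three-layer merge: defaults, then the configuration file, then the command line. The following is a minimal sketch of that precedence. The function name `resolve_options` and the dictionary keys are hypothetical, and the real key names used in .abqaq.yml may differ.

```python
# Hypothetical option names; all default to False (YAML output).
DEFAULTS = {"quiet": False, "json": False, "pretty": False}


def resolve_options(config, cli):
    """Merge option sources: defaults < config file < command line.

    `config` holds values read from the config file (empty dict if absent).
    `cli` maps each option to True if given on the command line, else None.
    """
    resolved = dict(DEFAULTS)
    # Config file values override the defaults.
    resolved.update({k: v for k, v in config.items() if k in DEFAULTS})
    # Only options actually supplied on the command line override the rest.
    resolved.update(
        {k: v for k, v in cli.items() if k in DEFAULTS and v is not None}
    )
    return resolved
```

With argparse, a flag declared with `action="store_true", default=None` produces None when it is absent, which fits the `cli` argument expected here.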

Data Sources

City of Albuquerque, NM, website

Historical air quality data directory

Note: Only 7 files from this directory have been checked. Several variations were found and have been addressed by making the parser more forgiving. Most were due to differences in the format of the data section value lists or to garbage data.

E.g. Instead of '0.9923', a single field might have '.9923'.
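
One way to tolerate this kind of variation is to make the digits before the decimal point optional when matching a number. The sketch below uses the standard re module rather than parsy, and the names `NUMBER` and `to_float` are hypothetical. It does not handle signs or exponents, and it is not taken from abqaq.py.

```python
import re

# Digits before the decimal point are optional, so ".9923" matches
# as well as "0.9923". Plain integers such as "12" are also accepted.
NUMBER = re.compile(r"\d*\.\d+|\d+")


def to_float(field):
    """Convert a numeric field to float, rejecting anything else."""
    text = field.strip()
    if NUMBER.fullmatch(text) is None:
        raise ValueError(f"not a number: {field!r}")
    return float(text)
```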

Grammar

The grammar was reverse engineered from the sample files. This is not ideal, because other files may contain unforeseen variations in the actual data. See the note above.

This grammar is at least the third attempt. It is written in Extended Backus-Naur Form (EBNF). The variant used borrows operators from regular expressions, e.g. '*', '+', '?', and parentheses for grouping.

This grammar is extracted from the file 'abqaq.py'. It employs the strategy of capturing all the terminals (other than commas and line endings) as elements of a list, and capturing the nonterminals as lists of lists of lists.

Terminals

  • comma ","
  • el ("\r\n" | "\n")
  • value /[A-Za-z0-9._-/ ]+/

<CSVLine> ::= value (comma value)+

<DataSection> ::= "BEGIN_DATA" (el <CSVLine>)+ el "END_DATA"

<GroupSection> ::= "BEGIN_GROUP" (el <CSVLine>)+ <DataSection> el "END_GROUP"

<FileSection> ::= "BEGIN_FILE" (el <CSVLine>)+ (el <GroupSection>)+ el "END_FILE" el
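
As an illustration of how the smallest rules fit together, the sketch below recognizes a `<DataSection>` and returns its CSV lines as a list of lists. It uses the standard re module instead of parsy, so it is a simplified stand-in for the real parser, not a copy of it. The function name `parse_data_section` and the sample text are hypothetical. The hyphen in the value character class is moved to the end so that Python treats it as a literal character.

```python
import re

# Terminals from the grammar, written as regular expression fragments.
VALUE = r"[A-Za-z0-9._/ -]+"
EL = r"(?:\r\n|\n)"

# <CSVLine> ::= value (comma value)+
CSV_LINE = rf"{VALUE}(?:,{VALUE})+"

# <DataSection> ::= "BEGIN_DATA" (el <CSVLine>)+ el "END_DATA"
DATA_SECTION = re.compile(rf"BEGIN_DATA((?:{EL}{CSV_LINE})+){EL}END_DATA")


def parse_data_section(text):
    """Return the CSV lines of a data section as a list of lists of strings."""
    match = DATA_SECTION.fullmatch(text)
    if match is None:
        raise ValueError("input is not a valid DataSection")
    # Commas and line endings are dropped; only the values are kept.
    return [line.split(",") for line in match.group(1).splitlines() if line]


# Made-up sample text, not copied from a real data file.
sample = "BEGIN_DATA\nDATE,HOUR,VALUE\n20220422,00,.9923\nEND_DATA"
rows = parse_data_section(sample)
```

Each inner list holds the values of one CSV line, which mirrors the list-of-lists capture strategy described above.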

BNF Playground

This site was used to develop the EBNF used above.

Contributors

edhowland, rwcitek
