Parser for Albuquerque, N.M., Air Quality data format
abqaq.py version 0.1.0
This file, abqaq.py, parses a file downloaded from Albuquerque, N.M.'s web site for air quality readings.
If the file parses correctly, it writes a YAML formatted file to stdout. If there are any problems or the input source does not parse, it prints a message to stderr and exits with a non-zero exit code.
Note: If running on a Debian-derived system, replace python and pip with python3 and pip3.
This program works best with Python version 3.10. It may work with older versions, but they have not been tested.
- parsy : Parser combinator library
- pyyaml : YAML package for Python
This program relies on the 'parsy' Python package.
That package requires Python 3.6 or greater.
YAML support is provided by pyyaml:
Note: This step is optional. If you do not use a virtual environment, 'parsy' and 'pyyaml' will be installed in your current Python global packages.
- Create a virtual environment.
python -m venv .venv
source ./.venv/bin/activate
Note: You can deactivate the virtual environment anytime by:
deactivate
- Install the dependencies
pip install -r requirements.txt
- Download a data file
curl http://data.cabq.gov/airquality/aqindex/history/042222.0017 > data.dat
- Run the program
python abqaq.py data.dat
Note: data.dat can also be redirected to stdin. Or you can use curl:
curl http://data.cabq.gov/airquality/aqindex/history/042222.0017 | python abqaq.py -
Note: the format of the output is in YAML. The Python package used is:
- pyyaml
https://pyyaml.org/wiki/PyYAMLDocumentation
The flow_style is set to always use nested block syntax.
Here is a page that describes the 2 styles (block vs. flow):
If you pass the '-j' or '--json' flag, you will get the output in JSON format written to stdout.
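To make the difference between the output styles concrete, here is a minimal sketch using PyYAML's yaml.dump and the standard json module. The record below is made-up sample data, not a real reading, and the exact arguments abqaq.py passes to the serializers are an assumption.

```python
import json

import yaml  # PyYAML, installed from requirements.txt

# Made-up sample record, used only to show the three output shapes.
record = {"station": "demo", "readings": [1, 2]}

# Block style: nested structure is shown with line breaks and "- " items.
block_text = yaml.dump(record, default_flow_style=False)

# Flow style: the whole structure is written inline with braces and brackets.
flow_text = yaml.dump(record, default_flow_style=True)

# JSON, comparable to what the '--json' flag is described as producing.
json_text = json.dumps(record)

print(block_text)
print(flow_text)
print(json_text)
```

Block style is easier to read and to diff. Flow style looks closer to JSON.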
The program abqaq.py takes some optional flags and a possible path to a data file. If you want to have abqaq.py read from a pipe or stdin, pass '-' as the file path.
- '-q', '--quiet' : Suppresses informational output on stderr, such as the note regarding CrLf line endings.
- '-j', '--json' : Outputs JSON instead of YAML (the default).
- '-p', '--pretty' : Makes the output more human friendly. The default output is meant to be consumed by other programs.
- '-c', '--config' : Writes the command line options to ./.abqaq.yml and exits.
- '-h', '--help' : Prints the usage and help message and exits without actually doing anything.
- '-V', '--version' : Prints the version number and exits.
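The flags above could be declared with the standard argparse module roughly as follows. This is only a sketch: whether abqaq.py actually uses argparse is not stated here, and the function names build_arg_parser and read_source are hypothetical. Making the path optional with a default of '-' is also an assumption.

```python
import argparse
import sys


def build_arg_parser():
    """Declare the documented flags; '-h'/'--help' is added by argparse itself."""
    parser = argparse.ArgumentParser(prog="abqaq.py")
    parser.add_argument("-q", "--quiet", action="store_true",
                        help="suppress informational output on stderr")
    parser.add_argument("-j", "--json", action="store_true",
                        help="output JSON instead of YAML")
    parser.add_argument("-p", "--pretty", action="store_true",
                        help="make the output more human friendly")
    parser.add_argument("-c", "--config", action="store_true",
                        help="write the options to ./.abqaq.yml and exit")
    parser.add_argument("-V", "--version", action="version",
                        version="%(prog)s 0.1.0")
    # Assumption: the path is optional and '-' (stdin) is the default.
    parser.add_argument("path", nargs="?", default="-",
                        help="data file, or '-' to read from stdin")
    return parser


def read_source(path):
    """Return the input text from stdin when path is '-', else from the file."""
    if path == "-":
        return sys.stdin.read()
    # newline="" leaves "\r\n" untouched so the original line endings survive.
    with open(path, newline="") as handle:
        return handle.read()


# Parse an explicit argument list instead of sys.argv for demonstration.
args = build_arg_parser().parse_args(["--quiet", "--json", "data.dat"])
print(args.quiet, args.json, args.pretty, args.path)
```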
If the current directory contains a file named '.abqaq.yml', that file is read and processed before any other action takes place. This file is in the YAML format and can be edited with any text editor. It controls the following options:
- '-q', '--quiet'
- '-j', '--json'
- '-p', '--pretty'
The configuration file: './.abqaq.yml' can be generated by the '--config' option.
# Say we want to always suppress info messages and write JSON output:
python abqaq.py --quiet --json --config
Configuration written to .abqaq.yml
The following matrix shows which source of options takes effect in each situation.
- No configuration file and no command line options
  - All options are false and the default serialization format is YAML.
- No configuration file and command line options given
  - Command line options override the defaults.
- Configuration file exists and no command line options
  - Options are read from the configuration file.
- Configuration file and command line options given
  - Command line options override the corresponding options in the configuration file.
  - Command line options not supplied default to the values read from the configuration file.
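The matrix boils down to a three-layer merge: defaults first, then the configuration file, then the command line. The sketch below illustrates that ordering. The names DEFAULTS and resolve_options are hypothetical, and it assumes the caller passes only the flags that were actually given on the command line.

```python
# Built-in defaults: every switch is off, so the output is YAML.
DEFAULTS = {"quiet": False, "json": False, "pretty": False}


def resolve_options(config, cli):
    """Merge the option sources in increasing order of priority.

    config: dict loaded from .abqaq.yml, or None when the file is absent.
    cli:    dict holding only the flags supplied on the command line.
    """
    options = dict(DEFAULTS)
    if config:
        # Keys other than the three known options are ignored.
        options.update({k: v for k, v in config.items() if k in DEFAULTS})
    options.update({k: v for k, v in cli.items() if k in DEFAULTS})
    return options


# Configuration file sets quiet; the command line adds json.
print(resolve_options({"quiet": True}, {"json": True}))
```

Because the flags are simple on/off switches, a flag left off the command line cannot be told apart from an explicit "false". That is why only supplied flags are treated as overrides here.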
Note: Only 7 files from this directory have been checked. Various changes were noted and addressed by making the parser more forgiving. Most of them were due either to differences in the format of the data section value lists or to garbage data.
E.g. Instead of '0.9923', a single field might have '.9923'.
The grammar was reverse engineered from the sample files. This is not ideal, because other files may contain unforeseen changes in the actual data. See the note above.
This grammar is the 3rd attempt, at least. The grammar is represented in Extended Backus-Naur Form, or EBNF. The variant used is the type that borrows operators from RegExpLand, e.g. '*', '+', '?' and parens for grouping.
This grammar is extracted from the file 'abqaq.py'. It employs the strategy of capturing all the terminals (that are not commas or line endings) as elements of a list and capturing the nonterminals as lists of lists of lists.
- comma ","
- el ("\r\n" | "\n")
- value /[A-Za-z0-9._\-/ ]+/
<CSVLine> ::= value (comma value)+
<DataSection> ::= "BEGIN_DATA" (el <CSVLine>)+ el "END_DATA"
<GroupSection> ::= "BEGIN_GROUP" (el <CSVLine>)+ el <DataSection> el "END_GROUP"
<FileSection> ::= "BEGIN_FILE" (el <CSVLine>)+ (el <GroupSection>)+ el "END_FILE" el
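To show how the grammar turns into nested lists, here is a minimal recursive descent sketch that uses only the standard library. It is not the parsy implementation in abqaq.py, and all function names are hypothetical. It treats every line break as one 'el', assumes each BEGIN_/END_ keyword sits on its own line, and escapes the hyphen in the value pattern so it is read as a literal. The sample text is made up and does not reflect real field names.

```python
import re

# The "value" terminal, with the hyphen escaped so it is a literal character.
VALUE = re.compile(r"[A-Za-z0-9._\-/ ]+")


def csv_line(line):
    """<CSVLine> ::= value (comma value)+  -> list of field strings."""
    fields = line.split(",")
    if len(fields) < 2 or not all(VALUE.fullmatch(f) for f in fields):
        raise ValueError(f"not a CSV line: {line!r}")
    return fields


def expect(lines, pos, keyword):
    """Consume one line that must equal keyword; return the next position."""
    if pos >= len(lines) or lines[pos] != keyword:
        found = lines[pos] if pos < len(lines) else "end of input"
        raise ValueError(f"line {pos + 1}: expected {keyword!r}, found {found!r}")
    return pos + 1


def csv_lines(lines, pos):
    """(el <CSVLine>)+  -> list of rows; a row is any line containing a comma."""
    rows = []
    while pos < len(lines) and "," in lines[pos]:
        rows.append(csv_line(lines[pos]))
        pos += 1
    if not rows:
        raise ValueError(f"line {pos + 1}: expected at least one CSV line")
    return rows, pos


def data_section(lines, pos):
    pos = expect(lines, pos, "BEGIN_DATA")
    rows, pos = csv_lines(lines, pos)
    return rows, expect(lines, pos, "END_DATA")


def group_section(lines, pos):
    pos = expect(lines, pos, "BEGIN_GROUP")
    header, pos = csv_lines(lines, pos)
    data, pos = data_section(lines, pos)
    return [header, data], expect(lines, pos, "END_GROUP")


def file_section(text):
    """Parse a whole file into [file header rows, list of groups]."""
    lines = re.split(r"\r\n|\n", text)  # el ::= "\r\n" | "\n"
    if lines[-1] != "":
        raise ValueError("input must end with a line ending")
    lines.pop()  # drop the empty piece after the final line ending
    pos = expect(lines, 0, "BEGIN_FILE")
    header, pos = csv_lines(lines, pos)
    groups = []
    while pos < len(lines) and lines[pos] == "BEGIN_GROUP":
        group, pos = group_section(lines, pos)
        groups.append(group)
    if not groups:
        raise ValueError(f"line {pos + 1}: expected at least one group")
    pos = expect(lines, pos, "END_FILE")
    if pos != len(lines):
        raise ValueError(f"line {pos + 1}: unexpected text after END_FILE")
    return [header, groups]


# Made-up sample input; the field names are placeholders.
SAMPLE = (
    "BEGIN_FILE\nsource,example\n"
    "BEGIN_GROUP\nname,demo\n"
    "BEGIN_DATA\na,0.9923\nb,.9923\nEND_DATA\n"
    "END_GROUP\nEND_FILE\n"
)
tree = file_section(SAMPLE)
print(tree)
```

Each CSV line becomes a list of strings, each section becomes a list of those lists, and the file becomes a list of sections. This matches the "lists of lists of lists" shape described above.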
This site was used to develop the EBNF used above.