peopleforbikes / brokenspoke



A collection of tools for the BNA.

Home Page: https://peopleforbikes.github.io/brokenspoke/

License: MIT License

Shell 2.37% Rust 93.39% Python 0.37% JavaScript 2.78% CSS 0.03% Just 1.06%

docs etl python rust

brokenspoke's People

Contributors

rgreinho lalver1

brokenspoke's Issues

PFB favicon missing

Bug Report

Current Behavior

The documentation site does not use the PFB favicon.

Expected Behavior

As a user I would like to see the PFB favicon rather than the default one.

[pipelines][brochure] Pipeline fails to complete with City Ratings 2022 data

Bug Report

Current Behavior

The pipeline fails while generating the PDFs.

Expected Behavior

The pipeline should complete successfully.
In case of a crash, the logs should provide enough information for a developer to diagnose and attempt to fix the problem.

Consider adding brokenspoke to the SeaORM showcase!

Hey @rgreinho, feel free to submit a PR to showcase your project!

SeaQL/sea-orm#403

[bnacore] Make the bundler module less BNA specific

Feature Request

Current Behavior

The current bundling function assumes that all the files follow the BNA naming convention:
<country>-<state>-<city>[-<filename>].<extension>. See the group_files() function for reference.

Expected Behavior

The bundler spoke has proved useful for 2 pipelines so far, and other ideas could potentially reuse it for other use cases. However, this is only possible if the regex used to look up the files is not hard-coded inside the function.

If this change affects the pipelines, we must make sure that the pipelines are updated accordingly.
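One way to lift the hard-coded assumption is to pass the grouping rule in as a parameter. The sketch below is a minimal, std-only illustration under stated assumptions: group_files_by and bna_key are hypothetical names, not brokenspoke's actual group_files() API, and the BNA key extractor uses plain string splitting instead of a regex so the example needs no external crates. A regex-based extractor could be plugged in the same way. It assumes the country, state, and city components never contain a '-'.

```rust
use std::collections::BTreeMap;

/// Hypothetical generic bundler helper: groups file names by the key
/// returned by `key_fn`. Files for which `key_fn` returns `None` are skipped.
fn group_files_by<F>(files: &[&str], key_fn: F) -> BTreeMap<String, Vec<String>>
where
    F: Fn(&str) -> Option<String>,
{
    let mut groups: BTreeMap<String, Vec<String>> = BTreeMap::new();
    for &file in files {
        if let Some(key) = key_fn(file) {
            groups.entry(key).or_default().push(file.to_string());
        }
    }
    groups
}

/// Hypothetical BNA-specific extractor for
/// `<country>-<state>-<city>[-<filename>].<extension>`.
/// Returns `<country>-<state>-<city>`.
/// Assumption: the first three components contain no '-'.
fn bna_key(file: &str) -> Option<String> {
    // Drop the extension; files without one are ignored.
    let (stem, _extension) = file.rsplit_once('.')?;
    // At most 4 parts, so a '-' inside <filename> stays in the last part.
    let parts: Vec<&str> = stem.splitn(4, '-').collect();
    if parts.len() < 3 || parts[..3].iter().any(|p| p.is_empty()) {
        return None;
    }
    Some(parts[..3].join("-"))
}

fn main() {
    let files = [
        "usa-tx-austin.csv",
        "usa-tx-austin-census_blocks.zip",
        "usa-or-portland.csv",
        "README.md",
    ];
    for (key, group) in group_files_by(&files, bna_key) {
        println!("{key}: {group:?}");
    }
}
```

Pipelines that do not follow the BNA naming convention would only need to supply a different extractor closure, while the grouping logic stays unchanged.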

Update to clap 4

Feature Request

Current Behavior

Clap 3 is being used.

Expected Behavior

As clap is central to the spokes, it makes sense to keep this library up to date. The migration from 3 to 4 contains breaking changes.

Create a BNA Mechanics glossary

Feature Request

Current Behavior

Some of the terms used within the BNA Mechanics projects are confusing, as they are used inconsistently: sometimes several terms define the same thing, and sometimes they define different things.

For instance:

Should a "city rating" or a "scorecard" be used to define the information collected about a specific city?
Does a "scorecard" or a "brochure" define the SVG/PDF file which is generated from the city information?
etc.

Expected Behavior

There should not be any confusion when using BNA vocabulary.

Possible Solution

Identify the confusing words.
Reconcile them with their meaning in the BNA program. Since the vocabulary cannot be easily changed there, it should be our source of truth and we should always try to match it.
Adjust the BNA Mechanics vocabulary accordingly.
Add a new "Glossary" page in the documentation site.
Update the code and the comments once the new vocabulary has been determined.

City Rating entries should include the year

Feature Request

Current Behavior

The city rating entries do not currently contain the year. The year is only known because it is part of the CSV file name.

Expected Behavior

There should be an extra column which contains the year, at least in the shortcode city rating file.
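A minimal sketch of how the year could be carried into the data itself: extract it from the file name (using the city_rating_2021_v15.csv naming that appears elsewhere in these issues) and append it as an extra column. The function names and the city,score columns are hypothetical, and the CSV handling assumes simple files with no quoted fields; a real implementation would more likely rely on a proper CSV library.

```rust
/// Hypothetical helper: extracts the year from a file name such as
/// `city_rating_2021_v15.csv`. Assumes the year is the first
/// '_'-separated token made of exactly four ASCII digits.
fn year_from_file_name(name: &str) -> Option<u16> {
    let stem = name.split('.').next()?;
    stem.split('_')
        .find(|tok| tok.len() == 4 && tok.chars().all(|c| c.is_ascii_digit()))
        .and_then(|tok| tok.parse().ok())
}

/// Hypothetical helper: appends a `year` column to a simple CSV.
/// The first line is treated as the header. Assumes no quoted fields.
fn add_year_column(csv: &str, year: u16) -> String {
    let mut out = String::new();
    for (i, line) in csv.lines().enumerate() {
        out.push_str(line);
        out.push(',');
        if i == 0 {
            out.push_str("year");
        } else {
            out.push_str(&year.to_string());
        }
        out.push('\n');
    }
    out
}

fn main() {
    let file_name = "city_rating_2021_v15.csv";
    // Hypothetical content; the real city rating columns differ.
    let csv = "city,score\naustin,42\n";
    if let Some(year) = year_from_file_name(file_name) {
        print!("{}", add_year_column(csv, year));
    }
}
```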

[spokes] Improve error handling

Feature Request

Current Behavior

The current error handling in the spokes is fairly basic and mostly bubbles the errors up to the caller.

Expected Behavior

While bubbling up the errors already helps, it is often not sufficient to provide the information required to debug a problem. See issue #34 for instance.

Possible Solution

The errors should be more detailed and provide more specific context. In the case of issue #34, we should be able to identify which file caused the problem.
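A minimal, std-only sketch of attaching the offending file path to an error while keeping the underlying cause available through std::error::Error::source(). SpokeError and read_input are hypothetical names, not existing brokenspoke types; crates such as thiserror or anyhow are common ways to reduce this boilerplate, but they are not assumed here.

```rust
use std::error::Error;
use std::fmt;
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical spoke error: wraps an I/O error with the path being processed.
#[derive(Debug)]
enum SpokeError {
    Io { path: PathBuf, source: io::Error },
}

impl fmt::Display for SpokeError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            SpokeError::Io { path, .. } => write!(f, "failed to read {}", path.display()),
        }
    }
}

impl Error for SpokeError {
    // Expose the underlying I/O error as the cause.
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        match self {
            SpokeError::Io { source, .. } => Some(source as &(dyn Error + 'static)),
        }
    }
}

/// Reads a file, attaching its path to any error.
fn read_input(path: &Path) -> Result<String, SpokeError> {
    fs::read_to_string(path).map_err(|source| SpokeError::Io {
        path: path.to_path_buf(),
        source,
    })
}

fn main() {
    if let Err(err) = read_input(Path::new("does-not-exist.csv")) {
        // Print the top-level message, then walk the chain of causes.
        eprintln!("error: {err}");
        let mut cause = err.source();
        while let Some(c) = cause {
            eprintln!("  caused by: {c}");
            cause = c.source();
        }
    }
}
```

With this approach, the message printed on failure names the file that could not be read, in addition to the low-level I/O error.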

[incubator] Implement missing GitHub workflow

Bug Report

Current Behavior

The incubator projects don't have a CI, which prevents the dependabot PRs from getting merged.

Expected Behavior

The dependabot PRs should get merged automatically.

Possible Solution

While the incubator projects don't necessarily need a workflow, one should be implemented with placeholder tasks to make sure the CI does not get stale.

The required tasks are:

build
lint
test

[spokes][bundler] Improve compression logic

Feature Request

Current Behavior

When running the retrieve pipeline, the bundling step takes the longest. It takes 100-120 min to complete with the City Ratings 2021 data.

Expected Behavior

While we understand that archiving such an amount of data takes time, it should not take close to 2 hours.

For reference, it takes 45s to compress a single 11GB file:

$ cd /tmp 
$ dd if=/dev/urandom of=11GB.img bs=1 count=0 seek=11G
0+0 records in
0+0 records out
0 bytes transferred in 0.000014 secs (0 bytes/sec)
$ zip 11GB-zipfile 11GB.img
  adding: 11GB.img (deflated 100%)

/tmp ⌛ 45s
$

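The comparison is not entirely fair, since we are creating 700+ zip archives and writing 5 datasets into each one. In addition, with count=0 the dd command above writes no data and only extends the file to 11GB, so the file is sparse and filled with zeros, which is why it deflates 100%. Still, it reinforces the idea that the operation should not take as much time as it currently does.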
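Spreading the reported 100-120 min over roughly 700 archives gives a rough time budget per archive. Since there are 700+ archives, these figures are upper bounds, and the per-dataset estimate assumes the time is split evenly across the 5 datasets:

```latex
\frac{100 \times 60\,\text{s}}{700} \approx 8.6\,\text{s}
\quad\text{to}\quad
\frac{120 \times 60\,\text{s}}{700} \approx 10.3\,\text{s per archive}
\;\Rightarrow\;
\approx 1.7\text{ to }2.1\,\text{s per dataset}
```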

Possible Solution

Try other compression libraries:

[pipelines][retrieve] Implement missing GitHub workflow

Bug Report

Current Behavior

There is no workflow implemented for this pipeline.

Expected Behavior

There should be a workflow implemented for this pipeline.

Possible Solution

Copy and adjust pipelines-brochures.yml.

[pipelines] Pipelines should be parametrized

Feature Request

Current Behavior

The parameters are hard-coded in the pipeline file itself.

Expected Behavior

The same pipeline should be able to run with different input parameters without having to recompile it.

Possible Solution

Create a JSON or TOML file containing the pipeline context:

{
   "output_directory": "./output",
   "city_rating_file": "city_rating_2021_v15.csv"
}
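A minimal sketch of loading such a context at runtime instead of hard-coding it, using the two keys from the example above in a TOML-style key = "value" form. To stay dependency-free, it parses only a tiny flat subset (no tables, arrays, or escape sequences); PipelineContext and parse_context are hypothetical names. In practice, deriving serde::Deserialize and using the toml or serde_json crate would be the more idiomatic choice.

```rust
use std::collections::HashMap;

/// Hypothetical pipeline context; field names taken from the example above.
#[derive(Debug, PartialEq)]
struct PipelineContext {
    output_directory: String,
    city_rating_file: String,
}

/// Parses a flat `key = "value"` file (tiny TOML subset: no tables,
/// arrays, or escapes). Blank lines and `#` comments are ignored.
fn parse_context(text: &str) -> Result<PipelineContext, String> {
    let mut values = HashMap::new();
    for (lineno, raw) in text.lines().enumerate() {
        let line = raw.trim();
        if line.is_empty() || line.starts_with('#') {
            continue;
        }
        let (key, value) = line
            .split_once('=')
            .ok_or_else(|| format!("line {}: expected key = \"value\"", lineno + 1))?;
        // Values must be double-quoted strings.
        let value = value
            .trim()
            .strip_prefix('"')
            .and_then(|v| v.strip_suffix('"'))
            .ok_or_else(|| format!("line {}: value must be a quoted string", lineno + 1))?;
        values.insert(key.trim().to_string(), value.to_string());
    }
    // Every field is required.
    let mut take = |key: &str| {
        values
            .remove(key)
            .ok_or_else(|| format!("missing required key {key}"))
    };
    Ok(PipelineContext {
        output_directory: take("output_directory")?,
        city_rating_file: take("city_rating_file")?,
    })
}

fn main() {
    let text = r#"
output_directory = "./output"
city_rating_file = "city_rating_2021_v15.csv"
"#;
    match parse_context(text) {
        Ok(ctx) => println!("{ctx:?}"),
        Err(e) => eprintln!("invalid context: {e}"),
    }
}
```

The pipeline would then read the file path from its command line, so a new run with different inputs only needs a different context file.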
