peopleforbikes / brokenspoke
A collection of tools for the BNA.
Home Page: https://peopleforbikes.github.io/brokenspoke/
License: MIT License
The documentation site does not use the PFB favicon.
As a user I would like to see the PFB favicon rather than the default one.
The current bundling function assumes that we only deal with files following the BNA naming convention: <country>-<state>-<city>[-<filename>].<extension>. See the group_files() function for reference.
The bundler spoke has proved useful for 2 pipelines so far, and other ideas could potentially leverage it for other use cases. However, this would only be possible if the regex used to look up the files was not hard-coded inside the function.
If this change affects the pipelines, we must make sure that the pipelines are updated accordingly.
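To make the idea concrete, here is a minimal sketch, written in Python for illustration rather than in the language of the spokes. It assumes the lookup pattern becomes a parameter with the BNA convention as its default. The regex, the named group `key`, and this `group_files` signature are assumptions made for the example and are not taken from the actual bundler.

```python
import re
from collections import defaultdict

# Assumed regex for <country>-<state>-<city>[-<filename>].<extension>.
# Illustrative only; this is not the pattern used by the actual bundler.
BNA_PATTERN = re.compile(
    r"^(?P<key>[^-.]+-[^-.]+-[^-.]+)(?:-(?P<name>[^.]+))?\.(?P<ext>.+)$"
)


def group_files(filenames, pattern=BNA_PATTERN):
    """Group file names by the `key` named group of a caller-supplied pattern.

    File names that do not match the pattern are skipped.
    """
    groups = defaultdict(list)
    for filename in filenames:
        match = pattern.match(filename)
        if match is None:
            continue
        groups[match.group("key")].append(filename)
    return dict(groups)


# Default behavior: group by <country>-<state>-<city>.
bna_groups = group_files(
    ["usa-tx-austin.csv", "usa-tx-austin-ways.shp", "usa-ma-boston-ways.shp", "README.md"]
)

# Another use case: the same function with a different (hypothetical) convention.
by_year = group_files(
    ["2021_report.pdf", "2021_data.csv", "2022_data.csv"],
    pattern=re.compile(r"^(?P<key>\d{4})_.*$"),
)
```

With this shape, existing pipelines keep working without changes as long as they rely on the default pattern, and only new use cases need to pass their own.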
Clap 3 is being used.
As clap is central to the spokes, it makes sense to keep this library current. The migration from 3 to 4 contains breaking changes.
Some of the terms used within the BNA Mechanics projects are confusing. Several terms are used to define the same thing, and some terms are used for different things.
For instance:
There should not be any confusion when using BNA vocabulary.
The city rating entries currently do not contain the year. The year is only known because it is part of the name of the CSV file.
There should be an extra column which contains the year, at least in the shortcode city rating file.
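A minimal sketch of how the year could be carried into the entries, written in Python for illustration. It assumes the year is the only standalone 4-digit number in the file name, as in city_rating_2021_v15.csv. The helper names and the sample columns (`city`, `score`) are hypothetical and do not reflect the real city rating schema.

```python
import csv
import io
import re


def year_from_filename(filename):
    """Extract a standalone 4-digit year from a file name."""
    match = re.search(r"(?<!\d)(\d{4})(?!\d)", filename)
    if match is None:
        raise ValueError(f"no year found in file name: {filename}")
    return int(match.group(1))


def add_year_column(rows, year):
    """Yield a copy of each row with an extra `year` column."""
    for row in rows:
        yield {**row, "year": year}


# Hypothetical sample data; the real file has different columns.
sample = io.StringIO("city,score\nAustin,42\nBoston,57\n")
year = year_from_filename("city_rating_2021_v15.csv")
rows_with_year = list(add_year_column(csv.DictReader(sample), year))
```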
The current error handling implementation in the spokes is not very advanced, and mostly bubbles up the errors to the caller.
While bubbling up the errors already helps, it is often not sufficient to provide the information required to debug a problem. See issue #34 for instance.
The errors should be more detailed and provide more specific context. In the case of issue #34, we should be able to identify which file caused the problem.
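The general technique, attaching context to an error before it bubbles up, can be sketched as follows. This is a Python illustration and not the spokes' implementation. `SpokeError` and `read_dataset` are hypothetical names invented for the example.

```python
from pathlib import Path


class SpokeError(Exception):
    """Hypothetical error type that records which file caused the failure."""

    def __init__(self, message, path):
        super().__init__(f"{message}: {path}")
        self.path = Path(path)


def read_dataset(path):
    """Read a file, wrapping low-level I/O errors with the offending path."""
    try:
        return Path(path).read_bytes()
    except OSError as err:
        # Keep the original error as the cause so no detail is lost.
        raise SpokeError("cannot read dataset", path) from err
```

The caller still receives a single error, but the message now names the file, and the original I/O error remains available as the cause.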
The incubator projects don't have a CI, which prevents the dependabot PRs from being merged.
The dependabot PRs should get merged automatically.
While the incubator projects don't necessarily need a workflow, one should be implemented with placeholder tasks to make sure the CI does not get stale.
The required tasks are:
When running the retrieve pipeline, the bundling part is the longest. It takes 100-120 min to complete with the City Ratings 2021 data.
While we understand that archiving such an amount of data takes time, it should not take close to 2 hours.
For reference, it takes 45s to compress a single 11GB file:
$ cd /tmp
$ dd if=/dev/urandom of=11GB.img bs=1 count=0 seek=11G
0+0 records in
0+0 records out
0 bytes transferred in 0.000014 secs (0 bytes/sec)
$ zip 11GB-zipfile 11GB.img
adding: 11GB.img (deflated 100%)
$ # the shell prompt reported an elapsed time of 45s
The comparison is not really fair in the sense that we are creating 700+ zip archives and writing 5 datasets inside each one, but it definitely reinforces the idea that the operation should not take as much time as it currently does.
Try other compression libraries:
There is no workflow implemented for this pipeline.
There should be a workflow implemented for this pipeline.
Copy and adjust pipelines-brochures.yml.
The parameters are hard-coded in the pipeline file itself.
The same pipeline should be able to run with different input parameters without having to be recompiled.
{
"output_directory": "./output",
"city_rating_file": "city_rating_2021_v15.csv"
}
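A minimal sketch of reading such a parameter file at run time, written in Python for illustration. The two keys come from the example above. The `PipelineParameters` type and the `load_parameters` helper are hypothetical names, and the real pipeline may load its parameters differently.

```python
import json
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class PipelineParameters:
    """Hypothetical container for the two parameters shown above."""

    output_directory: Path
    city_rating_file: Path


def load_parameters(text):
    """Parse the JSON parameters; a missing key raises KeyError."""
    raw = json.loads(text)
    return PipelineParameters(
        output_directory=Path(raw["output_directory"]),
        city_rating_file=Path(raw["city_rating_file"]),
    )


params = load_parameters(
    '{"output_directory": "./output", "city_rating_file": "city_rating_2021_v15.csv"}'
)
```

Running the pipeline against another dataset would then only require a different JSON file.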