theturnout / court-data-pipeline
License: Other
Reminder that the licensor is not The Turnout
Currently, the scraped JSON-LD files are saved locally under the URL of the site on which they originated. For example, a JSON-LD document hosted on https://www.courts.ca.gov/los-angeles-county.html would be saved by the scraper as https://www.courts.ca.gov/los-angeles-county.json. This file name is used during the validation process. Validated files are then renamed, using string manipulation to extract just the jurisdiction name (los-angeles-county.json). The renamed file is then used to import new data into the DB.
For the moment this is fine, but I am using dev data with a standardized naming scheme. Given the lack of standardization among court sites, it's unlikely that URLs encountered in the wild will be so easily parsed. At the moment, I can think of two solutions:
{url}.json.
I think (1) is probably the way to go. There was a benefit to using simpler names earlier in development that is lost now that everything is automated. While more information is always good, (2) puts an additional burden on the administrators running the script, and I think a goal is to make this process as easy as possible.
Regardless of approach, there is one other issue: URLs make bad filenames because of the / character. The file system parses each / as a directory separator, and the script throws errors when it tries to access the file. Is there a standard replacement character, or can we choose one? I'm currently replacing / with . but that is also a character with meaning to the file system and might have unintended consequences.
@jungshadow and @JDziurlaj, I'd appreciate your input.
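On the filename question above, one option that avoids picking a replacement character by hand is percent-encoding the whole URL, which is reversible and leaves no / in the result. The following is only a minimal sketch: the helper names url_to_filename and filename_to_url are hypothetical and not part of the pipeline, and it appends .json to the full encoded URL rather than swapping the .html extension as the current scraper does.

```python
from urllib.parse import quote, unquote


def url_to_filename(url: str, suffix: str = ".json") -> str:
    """Percent-encode every reserved character (including '/' and ':')
    so the URL can be used as a single flat file name."""
    # safe="" means '/' is encoded too; by default quote() leaves it alone.
    return quote(url, safe="") + suffix


def filename_to_url(filename: str, suffix: str = ".json") -> str:
    """Recover the original URL from a name made by url_to_filename."""
    if filename.endswith(suffix):
        filename = filename[: -len(suffix)]
    return unquote(filename)


if __name__ == "__main__":
    name = url_to_filename("https://www.courts.ca.gov/los-angeles-county.html")
    print(name)
    print(filename_to_url(name))
```

Because the encoding round-trips, the import step could recover the source URL from the file name instead of parsing out a jurisdiction name. One caveat: very long URLs could exceed the file name length limit of some file systems, so a hash of the URL may be a safer key in that case.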
Do you mind clarifying a bit on how to implement this in this context? Would using a .config or .env file in which the connection string is defined and referencing it in that section of code accomplish this?
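As a rough illustration of the .env idea in the question above, the sketch below reads KEY=VALUE pairs from a file into the process environment and then looks up the connection string by name. The variable name COURT_DB_URL, the file name .env, and both helper functions are assumptions made for this example; none of them come from the repository.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> None:
    """Copy KEY=VALUE lines from a file into os.environ.

    Variables that are already set in the environment are not overwritten.
    """
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments, and malformed lines
        key, _, value = line.partition("=")
        os.environ.setdefault(key.strip(), value.strip().strip("\"'"))


def get_connection_string(env_var: str = "COURT_DB_URL") -> str:
    """Return the connection string, failing loudly if it is missing."""
    value = os.environ.get(env_var)
    if not value:
        raise RuntimeError(f"{env_var} is not set; define it in the environment or .env")
    return value
```

The code that opens the DB connection would then call get_connection_string() instead of containing the string itself, and the .env file would be kept out of version control. An existing package such as python-dotenv could replace the hand-written loader.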
As the definitions and the SHACL file have to be kept in parity, and because the definitions will be hosted remotely, I need to add logic to download the SHACL file from the same remote source, save it locally, then use it with pyshacl in validator.py.
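The download-then-validate flow described above could look roughly like this. It is a sketch under several assumptions: the function names are hypothetical, the SHACL file is assumed to be Turtle and the data JSON-LD, and the remote URL is left to the caller because the thread does not name one.

```python
import shutil
from pathlib import Path
from urllib.request import urlopen


def download_shapes(url: str, dest) -> Path:
    """Fetch the SHACL file from `url` and save it at `dest`."""
    dest = Path(dest)
    dest.parent.mkdir(parents=True, exist_ok=True)
    with urlopen(url, timeout=30) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)
    return dest


def validate_file(data_path, shapes_path):
    """Validate one scraped file against the locally saved shapes."""
    # Imported here so the download helper works without pyshacl installed.
    from pyshacl import validate

    conforms, _report_graph, report_text = validate(
        str(data_path),
        shacl_graph=str(shapes_path),
        data_graph_format="json-ld",  # assumption: scraped files are JSON-LD
        shacl_graph_format="turtle",  # assumption: shapes are served as Turtle
    )
    return conforms, report_text
```

Downloading the shapes once at the start of a run, rather than once per file, would keep every file in that run validated against the same version of the SHACL file.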