Comments (36)
@Subzidion It's certainly possible.
Is there anyone else interested in this type of programmatically-readable schema definition for GTFS?
from transit.
This Data Package Specification is exactly what I was looking for. Is there any way we can make this specification a part of the main GTFS package? I would think defining GTFS in terms of the JSON schema would help clarify ambiguity instead of attempting to define the JSON schema from markdown.
from transit.
I am new to this community, so there are likely many nuances that I am missing. However, I have some questions and comments about this issue. For some background, I am working, in collaboration with CALTRANS, on an application to implement software to support V1 of the GTFS "Grading Standard."
A canonical, machine-readable version of the standard would be most helpful. I would like to be able to abstract away as much of the standard as possible from this application, so I don't need to make updates to this application each time the standard changes. Having a machine-readable version of the standard, with at least an agreed-upon format/structure, defined types, and enumerations, would be central to this goal.
So I have a couple of questions:
- Generally, is the version of the schema in the feature/json-schema branch suitable to begin development from? Or are there potential issues with it I should be aware of?
- If I were to start developing from the JSON-Schema file, would it be maintained in the future?
- How much might this file change in the future?
I also wanted to voice my support for developing a canonical JSON-Schema for the following reasons: it has already been developed (assuming the draft is up to date and there aren't any significant issues with it), it is the most widely used of the proposed standards, and to my knowledge it supports most of the objectives that have been mentioned so far. However, as I said, there are likely many nuances that I am missing.
More importantly, I wanted to express that not having a machine-readable standard creates a significant issue right at the beginning of any new development effort: how do I model the standard, and how do I keep that model up to date as it evolves? Providing a schema file would help alleviate many of these issues and free up developer time for other work.
from transit.
As a new community member, I have no context or background to weigh the pros or cons of JSON-Schema vs Frictionless.
My concern with the JSON-Schema is that we'd be introducing an entirely new encoding-specific concept to GTFS that doesn't currently exist there. I think it would also tempt some producers and consumers to "JSON-ize" GTFS data, and I see that further complicating an already complex ecosystem.
The Frictionless Table Schema format was designed to represent tabular data, which is the current representation/encoding of static GTFS data (CSV files in a ZIP file). IMHO it seems a better fit to the existing GTFS spec, unless there is a limitation that I don't know of.
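For illustration, a minimal sketch of what a Table Schema-style descriptor for agency.txt could look like, written as a Python dict and checked with standard-library code only. The field list is abbreviated, and `check_rows` is a hypothetical helper, not part of any Frictionless library; only the descriptor keys (`fields`, `name`, `type`, `format`, `constraints`, `required`, `primaryKey`) are taken from the Table Schema specification.

```python
import csv
import io

# Abbreviated, illustrative descriptor for agency.txt in Table Schema shape.
AGENCY_SCHEMA = {
    "fields": [
        {"name": "agency_id", "type": "string"},
        {"name": "agency_name", "type": "string", "constraints": {"required": True}},
        {"name": "agency_url", "type": "string", "format": "uri", "constraints": {"required": True}},
        {"name": "agency_timezone", "type": "string", "constraints": {"required": True}},
    ],
    "primaryKey": "agency_id",
}


def check_rows(schema, csv_text):
    """Hypothetical helper: return a list of (row_number, message) problems."""
    required = [
        f["name"] for f in schema["fields"]
        if f.get("constraints", {}).get("required")
    ]
    key = schema.get("primaryKey")
    seen = set()
    problems = []
    # Row numbers start at 2 because line 1 of the file is the header.
    for row_number, row in enumerate(csv.DictReader(io.StringIO(csv_text)), start=2):
        for name in required:
            if not (row.get(name) or "").strip():
                problems.append((row_number, f"missing required field {name}"))
        if key:
            value = row.get(key)
            if value in seen:
                problems.append((row_number, f"duplicate {key} {value!r}"))
            seen.add(value)
    return problems


SAMPLE = (
    "agency_id,agency_name,agency_url,agency_timezone\n"
    "A1,Example Transit,https://example.com,America/Los_Angeles\n"
    "A1,,https://example.org,America/Los_Angeles\n"
)
PROBLEMS = check_rows(AGENCY_SCHEMA, SAMPLE)
```

The descriptor stays plain data, so the same dict could be serialized to JSON and diffed in a pull request, while the checking logic lives entirely outside it.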
Why not just use a SQL database definition file? This can include unique constraints, foreign key constraints, enums, and so on.
@jamespfennell Could you give me an example for a table in GTFS?
As @skinkie says, there are some situations that won't be easy to model, like service_id in calendar_dates.txt, which in some cases is a primary key but in others is a foreign key (potentially within the same GTFS dataset, as evidenced by MobilityData/gtfs-validator#397):
https://github.com/google/transit/blob/master/gtfs/spec/en/reference.md#calendar_datestxt
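To make the SQL suggestion concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The column lists are abbreviated and the table layout is an assumption for illustration, not an official GTFS DDL. It shows a foreign key and enum-style `CHECK` constraints, and it leaves `calendar_dates.service_id` without a foreign key because, as noted above, it may or may not reference calendar.txt.

```python
import sqlite3

DDL = """
CREATE TABLE agency (
    agency_id   TEXT PRIMARY KEY,
    agency_name TEXT NOT NULL
);
CREATE TABLE routes (
    route_id   TEXT PRIMARY KEY,
    agency_id  TEXT REFERENCES agency (agency_id),
    -- Abbreviated enum: only the basic route_type values 0-7 are listed here.
    route_type INTEGER NOT NULL CHECK (route_type IN (0, 1, 2, 3, 4, 5, 6, 7))
);
CREATE TABLE calendar (
    service_id TEXT PRIMARY KEY
);
CREATE TABLE calendar_dates (
    -- No REFERENCES clause: service_id may appear only in this file.
    service_id     TEXT NOT NULL,
    date           TEXT NOT NULL,
    exception_type INTEGER NOT NULL CHECK (exception_type IN (1, 2)),
    UNIQUE (service_id, date)
);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce foreign keys by default
conn.executescript(DDL)
conn.execute("INSERT INTO agency VALUES ('A1', 'Example Transit')")
conn.execute("INSERT INTO routes VALUES ('R1', 'A1', 3)")


def rejected(sql):
    """Return True if the statement violates a constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


BAD_ENUM = rejected("INSERT INTO routes VALUES ('R2', 'A1', 99)")
BAD_FK = rejected("INSERT INTO routes VALUES ('R3', 'missing', 3)")
STANDALONE_OK = not rejected("INSERT INTO calendar_dates VALUES ('S1', '20240101', 1)")
```

The "either a primary key or a foreign key" case is exactly what the plain DDL cannot state: the sketch accepts a standalone `service_id`, but it also cannot flag one that was meant to match calendar.txt and does not.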
from transit.
After talking with folks, I am starting work to update and expand the Frictionless schema that Stephen created for Queensland. I have forked @barbeau's branch and will be working on it here: https://github.com/wesleyi23/GTFS-Frictionless.
from transit.
Thank you to @wesleyi23 for creating a fairly complete definition of GTFS here: https://github.com/wesleyi23/GTFS-Frictionless
It would be great if all who are interested could add issues, contribute to, and improve this definition.
I'm also interested in whether the community would be amenable to using this type of definition as the canonical GTFS definition, such that we can generate Markdown/HTML from the programmatic definition in JSON rather than vice versa.
from transit.
Using GTFSdb for this would still require updating the Python code to reflect any changes to the spec, running it to create your database schema, then using some other tool to map from the schema to classes in whatever language you want to use. Feels a bit cumbersome. I understand there's probably some class representation for most languages already created, but needing to check and hope they get updated if the spec gets changed seems annoying. If there were some definition, similar to the Hibernate one, that could be used in any language, it would make any of those language-specific GTFS-Static ORMs a lot easier.
from transit.
Similar to the discussion in #244, I strongly bias towards achieving consensus on the problem statement before proposing a standard.
It sounds like the core objective is to further codify GTFS to meet these criteria (broken into bullets for easier visual parsing):
- Machine-readable instructions that specify
- in a language-agnostic, storage-agnostic manner
- the correct structure, syntax, and relationships
- of GTFS static data
- such that GTFS data can be processed and stored
- in a backwards and forwards compatible manner
- that minimizes the need for implementation updates by individuals.
It would be helpful to know if that's an accurate characterization, or if criteria like human-readable or CSV-specific need to be appended.
from transit.
Really interesting discussion!
Does anyone have any additions/edits to the following problem statement #127 (comment)
I’d like to add something that was in the original issue as well: that it should be easy, once the spec was processed, for a system using the spec to keep in sync with the latest additions to the spec. If an optional field was added for example, I’d want my codebase to create that new class on the next run, or I’d want a JSON schema I defined to add that property.
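One way to picture this is generating the class from the definition at start-up, so a field added to the definition shows up without code changes. A minimal sketch using Python's `dataclasses.make_dataclass`; the `SPEC_FIELDS` list is a made-up stand-in for a parsed machine-readable spec, and `build_class` is a hypothetical helper.

```python
from dataclasses import field, fields, make_dataclass
from typing import Optional

# Made-up stand-in for a parsed spec: (field name, required?) for agency.txt.
SPEC_FIELDS = [
    ("agency_name", True),
    ("agency_url", True),
    ("agency_timezone", True),
    ("agency_lang", False),
]


def build_class(name, spec_fields):
    """Hypothetical helper: build a dataclass from (name, required) pairs."""
    # Required fields go first because dataclasses reject a non-default
    # field that follows a field with a default.
    required = [(n, str) for n, req in spec_fields if req]
    optional = [
        (n, Optional[str], field(default=None))
        for n, req in spec_fields
        if not req
    ]
    return make_dataclass(name, required + optional)


Agency = build_class("Agency", SPEC_FIELDS)
agency = Agency("Example Transit", "https://example.com", "America/Los_Angeles")

# Simulate a spec update that adds an optional field; no class code changes.
AgencyV2 = build_class("Agency", SPEC_FIELDS + [("agency_phone", False)])
NEW_FIELDS = [f.name for f in fields(AgencyV2)]
```

Because optional fields default to `None`, code written against the older definition keeps constructing objects the same way after the new field appears.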
My own case: I created an RDF/Linked Data vocabulary for GTFS back in 2015. Today it’s horribly out of date, but we just started updating it to the latest spec: OpenTransport/linked-gtfs#20
I wonder whether we should reframe the problem scope as: what programmatic description should we use so that everyone can keep their own technology-specific schema, the one they use to import or validate GTFS static files, up to date? If this problem were solved, we could also have automatic translations to commonly used schema languages like JSON and XML Schema, SQL, protobuf, RDF/SHACL/ShEx, etc.
The problem is thus not choosing the one schema language to rule them all, but it is choosing the one that will best express the things decided in the GTFS specification, so that it can be automatically translated to all others.
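A minimal sketch of that idea, assuming a made-up neutral field list (`FIELDS`) and two hypothetical generator functions; a real definition would need far richer types and conditional requirements. It derives both a JSON Schema object and a SQL `CREATE TABLE` statement from the same source.

```python
import json

# Hypothetical neutral definition of a few stops.txt fields (abbreviated).
FIELDS = [
    {"name": "stop_id", "type": "string", "required": True},
    {"name": "stop_name", "type": "string", "required": False},
    {"name": "stop_lat", "type": "number", "required": False},
]

SQL_TYPES = {"string": "TEXT", "number": "REAL", "integer": "INTEGER"}


def to_json_schema(field_defs):
    """Emit a JSON Schema object describing one row."""
    return {
        "type": "object",
        "properties": {f["name"]: {"type": f["type"]} for f in field_defs},
        "required": [f["name"] for f in field_defs if f["required"]],
    }


def to_sql(table, field_defs):
    """Emit a CREATE TABLE statement for the same fields."""
    columns = [
        f'{f["name"]} {SQL_TYPES[f["type"]]}' + (" NOT NULL" if f["required"] else "")
        for f in field_defs
    ]
    return f"CREATE TABLE {table} ({', '.join(columns)});"


JSON_SCHEMA = to_json_schema(FIELDS)
SQL = to_sql("stops", FIELDS)
print(json.dumps(JSON_SCHEMA, indent=2))
print(SQL)
```

Each target language gets its own small generator, and none of them is the source of truth; only `FIELDS` would need to change when the spec does.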
from transit.
@Subzidion There isn't anything official, but the closest thing I'm aware of in concept to what you're looking for is this Data Package specification:
- GTFS Data Package Specification - A Data Package specification with validation accomplished with Good Tables. Includes a data package, schemas, tests, and uses South East Queensland GTFS data as an example.
I started generalizing this to any GTFS:
https://github.com/CUTR-at-USF/GTFS
I think I have some work stashed somewhere beyond what's currently in the above branch...
from transit.
Does anyone have any additions/edits to the following problem statement?
- Machine-readable instructions that specify
- in a language-agnostic, storage-agnostic manner
- that is relatively standardized itself (such that there are existing tools and a potential ecosystem for testing as well as rendering in a "front-end" form)
- is human legible in its native form (to allow for easy git-diffs + increase likelihood of catching errors)
- which articulates the correct structure, syntax, bounds, and relationships
- of GTFS static data
- as well as the file and field descriptions
- such that GTFS data can be processed and stored
- and validated
- in a backwards and forwards compatible manner
from transit.
@e-lo My understanding is that https://github.com/Stephen-Gates/GTFS was created specifically for validating the South East Queensland GTFS data and was never intended to be a canonical schema for the general GTFS spec. For example, some of the location constraints defined for stop location lat/longs are specific to Queensland.
I started expanding Stephen's work in this branch to represent the entire spec a while back, but other priorities pulled my attention away:
https://github.com/CUTR-at-USF/GTFS/tree/full-spec
You can see my changes in these two commits:
Here were the remaining TODOs I noted in 2016:
- Review TODOs and FIXMEs - some constraints will break extensibility
- Add missing files
If someone would want to pick up this work I'd certainly welcome the contribution.
from transit.
@jamespfennell: To my knowledge SQL definitions aren't designed to be 'read in' as data other than for SQL, but I would be curious if somebody more familiar with the various options could evaluate this option vis-a-vis the 10 points above.
from transit.
@MuckT - @LeoFrachet developed a full JSON Schema for GTFS, which is in the PR linked to this issue. See the discussion in that PR and the above problem statement for why Frictionless seemed to fit the bill better.
Note that validators are fairly easy to create once the schema is in a parsable format. You can also use goodtables.io to do data validation "as a service" in frictionless' format.
from transit.
Ack! Still relevant.
Wonder if we could use https://linkml.io for this. Seems to do what I described above
from transit.
It is called SQL ;-) and I think it does exist via GTFSdb.
from transit.
GTFSdb is all Python, nothing that's just SQL. I was thinking something more along the lines of what OneBusAway does with Hibernate, but usable in any language.
from transit.
GTFSdb produces SQL tables in different flavors. You could use that in your agnostic definition.
from transit.
Note that there is a proposal and discussion related to GTFS schemas happening at #244.
from transit.
criteria like human-readable
I think human readability is an important consideration for transparency and maintainability since changes to the spec will be represented and discussed within the context of a pull-request and vote.
such that GTFS data can be processed and stored
AND
- validated...adding in the potentiality for conditions beyond Type.
- documented...reducing errors and friction.
from transit.
@wesleyi23 do you have any additions/mods to @devadvance's summary of the problem statement?
from transit.
@e-lo and @devadvance the only thing I would add is that it would be nice if the solution included not only the correct structure, syntax, and relationships of GTFS static data, but also the file and field descriptions. I would make a pitch that these be represented in an HTML format, because there are some ordered lists, paragraphs, and other similar items. I think this would provide a more or less complete reproduction of the current standard documents.
from transit.
BTW - I saw that @Stephen-Gates started developing a Frictionless data package for GTFS and would be curious why it seems to have been abandoned?
from transit.
@barbeau Awesome, and thanks for the background. In your opinion, is Frictionless "the right" spec for achieving the objectives above? My main hesitation is the lack of progress/movement recently in the organization.
@wesleyi23 It seems like Sean's repo is a good place to start.
from transit.
@e-lo It looked very promising to me, and the above work was mainly an experiment to see if it panned out. Unfortunately I don't have any experience with frictionless outside of the above so I can't say for sure.
from transit.
@e-lo @barbeau At the moment I have a need for a schema document, so I am happy to put time in to developing one further.
Sean, I reviewed your repo and I agree it could be a good place to start. There is also @LeoFrachet's JSON-Schema file referenced in #244, which would also make a good starting place. As a new community member, I have no context or background to weigh the pros or cons of JSON-Schema vs Frictionless.
From a technical perspective, they both appear to satisfy the identified problem statement, unless there is something I am missing.
Any guidance on which path to follow would be greatly appreciated.
from transit.
@jamespfennell one reason could be that some constraints are "OR".
from transit.
I found that the old version of the schemas did not work well with popular schema validators, so I've started converting them to JSON schema v7. I've created a simple Nx app that converts .txt or .csv files into JSON objects and then validates them in the browser. My UI abilities are a bit lacking and the results currently go to console logs, but you can see my progress here: MuckT/gtfs-tools
As of writing this I have only rewritten the agency.txt schema; any help in the UI or schema development would be appreciated.
from transit.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
from transit.
@github-actions Don't close.
from transit.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
from transit.
This issue has been closed due to inactivity. Issues can always be reopened after they have been closed.
from transit.
This is still relevant.
from transit.
Re-opening :)
from transit.
📢 The participants in this conversation might want to look at issue #391 to discuss adding the GeoJSON format in GTFS as part of the GTFS-Flex extension proposal.
from transit.