
Comments (36)

barbeau avatar barbeau commented on May 3, 2024 5

@Subzidion It's certainly possible.

Is there anyone else interested in this type of programmatically-readable schema definition for GTFS?

from transit.

Subzidion avatar Subzidion commented on May 3, 2024 3

This Data Package Specification is exactly what I was looking for. Is there any way we can make this specification a part of the main GTFS package? I would think defining GTFS in terms of the JSON schema would help clarify ambiguity instead of attempting to define the JSON schema from markdown.

from transit.

wesleyi23 avatar wesleyi23 commented on May 3, 2024 3

I am new to this community, so there are likely many nuances that I am missing. However, I have some questions and comments about this issue. For some background, I am working, in collaboration with CALTRANS, on an application to implement software to support V1 of the GTFS "Grading Standard."

A canonical, machine-readable version of the standard would be most helpful. I would like to be able to abstract away as much of the standard as possible from this application, so I don't need to make updates to this application each time the standard changes. Having a machine-readable version of the standard, with at least an agreed-upon format/structure, defined types, and enumerations, would be central to this goal.

So I have a couple of questions:

  • Generally, is the version of the schema in the feature/json-schema branch suitable to begin development from? Or are there potential issues with it I should be aware of?
  • If I were to start developing from the JSON-Schema file, would it be maintained in the future?
  • How much might this file change in the future?

I also wanted to voice my support for developing a canonical JSON-Schema for the following reasons: it has already been developed (assuming the draft is up to date and there aren't any significant issues with it), it is the most widely used of the proposed standards, and to my knowledge it supports most of the objectives that have been mentioned so far. However, as I said, there are likely many nuances that I am missing.

More importantly, I wanted to express that not having a machine-readable version of the standard creates a significant issue right at the beginning of any new development effort: How do I model the standard and how do I keep that model up to date as it evolves? Providing a schema file would help alleviate many of these issues and free up developer time for other work.

from transit.

barbeau avatar barbeau commented on May 3, 2024 3

> As a new community member, I have no context or background to weigh the pros or cons of JSON-Schema vs Frictionless.

My concern with the JSON-Schema is that we'd be introducing an entirely new encoding-specific concept to GTFS that doesn't currently exist there. I think it would also tempt some producers and consumers to "JSON-ize" GTFS data, and I see that further complicating an already complex ecosystem.

The Frictionless Table Schema format was designed to represent tabular data, which is the current representation/encoding of static GTFS data (CSV files in a ZIP file). IMHO it seems a better fit to the existing GTFS spec, unless there is a limitation that I don't know of.

> Why not just use a SQL database definition file? This can include unique constraints, foreign key constraints, enums, and so on.
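
For illustration, here is a minimal sketch of what such a SQL definition could look like, run through Python's built-in `sqlite3` module. The table subset, the trimmed column lists, and the helper names `open_db` and `try_insert` are assumptions made for this example; it is not an official GTFS schema. SQLite has no native enum type, so a `CHECK` constraint stands in for one.

```python
import sqlite3

# Hypothetical, heavily trimmed DDL for two GTFS files; not an official schema.
DDL = """
CREATE TABLE agency (
    agency_id   TEXT PRIMARY KEY,
    agency_name TEXT NOT NULL
);
CREATE TABLE routes (
    route_id   TEXT PRIMARY KEY,
    agency_id  TEXT REFERENCES agency(agency_id),
    -- SQLite has no ENUM type, so a CHECK constraint plays that role here.
    route_type INTEGER NOT NULL
        CHECK (route_type IN (0, 1, 2, 3, 4, 5, 6, 7, 11, 12))
);
"""


def open_db():
    """Create an in-memory database with the sketch schema loaded."""
    conn = sqlite3.connect(":memory:")
    # SQLite does not enforce foreign keys unless this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(DDL)
    return conn


def try_insert(conn, sql, params):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False
```

With this definition, a route whose `route_type` is outside the listed values, or whose `agency_id` does not exist in `agency`, is rejected by the database itself rather than by application code.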

@jamespfennell Could you give me an example for a table in GTFS?

As @skinkie says there are some situations that won't be easy to model, like service_id in calendar_dates.txt, which in some cases is a primary key but in others is a foreign key (potentially within the same GTFS dataset, as evidenced by MobilityData/gtfs-validator#397):
https://github.com/google/transit/blob/master/gtfs/spec/en/reference.md#calendar_datestxt
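
To make the dual role concrete, the sketch below classifies each `service_id` found in calendar_dates.txt by whether it also appears in calendar.txt. The function name `classify_service_ids` and the role labels are hypothetical, chosen only for this example. A plain foreign key constraint from calendar_dates.txt to calendar.txt would reject the second kind of ID, even though such a feed can be valid.

```python
import csv
import io


def classify_service_ids(calendar_csv: str, calendar_dates_csv: str) -> dict:
    """Map each service_id in calendar_dates.txt to the role it plays."""
    calendar_ids = {
        row["service_id"] for row in csv.DictReader(io.StringIO(calendar_csv))
    }
    roles = {}
    for row in csv.DictReader(io.StringIO(calendar_dates_csv)):
        sid = row["service_id"]
        # Present in calendar.txt: the ID behaves like a foreign key, and the
        # row is an exception to a regular schedule.
        # Absent: calendar_dates.txt is the only place the service is defined,
        # so the ID acts as an identifier in its own right.
        if sid in calendar_ids:
            roles[sid] = "references calendar.txt"
        else:
            roles[sid] = "defined here"
    return roles
```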

from transit.

wesleyi23 avatar wesleyi23 commented on May 3, 2024 3

After talking with folks, I am starting work to update and expand the Frictionless schema that Stephen created for Queensland. I have forked @barbeau's branch and will be working on it here: https://github.com/wesleyi23/GTFS-Frictionless.

from transit.

e-lo avatar e-lo commented on May 3, 2024 3

Thank you to @wesleyi23 for creating a fairly complete definition of GTFS here: https://github.com/wesleyi23/GTFS-Frictionless

It would be great if all who are interested could add issues, contribute to, and improve this definition.

I'm also interested in whether the community would be amenable to using this type of definition as the canonical GTFS definition, such that we can generate Markdown/HTML from the programmatic definition in JSON rather than vice versa.
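
As a rough sketch of what generating Markdown from the definition might involve, the function below renders a Table Schema style descriptor as a Markdown table. The descriptor `AGENCY_SCHEMA` is a trimmed, hypothetical example with placeholder descriptions and simplified presence rules; it is not copied from any of the repositories mentioned here, and `fields_to_markdown` is a name invented for this example.

```python
# Hypothetical, trimmed descriptor in the style of a Frictionless Table Schema.
# Descriptions are placeholders, not the wording of the GTFS reference.
AGENCY_SCHEMA = {
    "fields": [
        {"name": "agency_id", "type": "string",
         "description": "Identifies a transit agency."},
        {"name": "agency_name", "type": "string",
         "description": "Full name of the agency.",
         "constraints": {"required": True}},
        {"name": "agency_url", "type": "string",
         "description": "URL of the agency.",
         "constraints": {"required": True}},
    ]
}


def fields_to_markdown(file_name: str, schema: dict) -> str:
    """Render the fields of one file definition as a Markdown table."""
    lines = [
        f"### {file_name}",
        "",
        "| Field Name | Type | Required | Description |",
        "| --- | --- | --- | --- |",
    ]
    for field in schema["fields"]:
        required = field.get("constraints", {}).get("required", False)
        lines.append(
            f"| `{field['name']}` | {field['type']} | "
            f"{'Yes' if required else 'No'} | {field.get('description', '')} |"
        )
    return "\n".join(lines)
```

Calling `fields_to_markdown("agency.txt", AGENCY_SCHEMA)` yields a heading followed by one table row per field, so a change to the JSON definition shows up in the rendered documentation on the next build.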

from transit.

Subzidion avatar Subzidion commented on May 3, 2024 2

Using GTFSdb for this would still require updating the Python code to reflect any changes to the spec, running it to create your database schema, then using some other tool to map from the schema to classes in whatever language you want to use. Feels a bit cumbersome. I understand there's probably some class representation for most languages already created, but needing to check and hope they get updated if the spec gets changed seems annoying. If there was some definition, similar to the Hibernate one, that could be used in any language, it would make any of those language-specific GTFS-Static ORMs a lot easier.

from transit.

devadvance avatar devadvance commented on May 3, 2024 2

Similar to the discussion in #244, I strongly bias towards achieving consensus on the problem statement before proposing a standard.

It sounds like the core objective is to further codify GTFS to meet these criteria (broken into bullets for easier visual parsing):

  • Machine-readable instructions that specify
  • in a language-agnostic, storage-agnostic manner
  • the correct structure, syntax, and relationships
  • of GTFS static data
  • such that GTFS data can be processed and stored
  • in a backwards and forwards compatible manner
  • that minimizes the need for implementation updates by individuals.

It would be helpful to know if that's an accurate characterization, or if criteria like human-readable or CSV-specific need to be appended.

from transit.

pietercolpaert avatar pietercolpaert commented on May 3, 2024 2

Really interesting discussion!

> Does anyone have any additions/edits to the following problem statement #127 (comment)

I’d like to add something that was in the original issue as well: that it should be easy, once the spec was processed, for a system using the spec to keep in sync with the latest additions to the spec. If an optional field was added for example, I’d want my codebase to create that new class on the next run, or I’d want a JSON schema I defined to add that property.
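
A minimal sketch of that idea in Python, assuming the spec is available as a Table Schema style descriptor: the record class is rebuilt from the descriptor on every run, so a newly added optional field appears without hand-editing code. The names `build_record_class` and `field_names` and the two descriptor versions are hypothetical, and every value is typed as a string for simplicity.

```python
from dataclasses import field, fields, make_dataclass
from typing import Optional


def build_record_class(class_name: str, schema: dict):
    """Create a dataclass whose attributes mirror the descriptor's fields."""
    required, optional = [], []
    for f in schema["fields"]:
        if f.get("constraints", {}).get("required", False):
            required.append((f["name"], str))
        else:
            # Optional fields default to None so older data still loads.
            optional.append((f["name"], Optional[str], field(default=None)))
    # Dataclasses need fields without defaults to come before those with them.
    return make_dataclass(class_name, required + optional)


def field_names(cls) -> list:
    """List the attribute names of a generated record class."""
    return [f.name for f in fields(cls)]


# Two hypothetical versions of a descriptor: V2 adds one optional field.
SCHEMA_V1 = {"fields": [
    {"name": "agency_name", "type": "string", "constraints": {"required": True}},
]}
SCHEMA_V2 = {"fields": SCHEMA_V1["fields"] + [
    {"name": "agency_email", "type": "string"},
]}
```

Building the class from `SCHEMA_V2` instead of `SCHEMA_V1` adds an `agency_email` attribute that defaults to `None`, with no change to the calling code.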

My own case: I created an RDF/Linked Data vocabulary for GTFS back in 2015. Today it’s horribly out of date, but we just started updating it to the latest spec: OpenTransport/linked-gtfs#20

I wonder whether we should reframe the problem scope as: what programmatic description should we use so that everyone can keep their own technology-specific schema, the one they use to import or validate GTFS static files, up to date? If this problem were solved, we could also have automatic translations towards commonly used schema languages like JSON and XML Schema, SQL, protobuf, RDF/SHACL/ShEx, etc.

The problem is thus not choosing the one schema language to rule them all, but it is choosing the one that will best express the things decided in the GTFS specification, so that it can be automatically translated to all others.
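
As one hedged illustration of such a translation, the sketch below turns a Table Schema style descriptor into a SQL `CREATE TABLE` statement. The type mapping `TYPE_MAP` and the function `schema_to_sql` are assumptions for this example and cover only a few types and constraints; enum values are embedded with `repr`, which is adequate for integers and simple strings but is not general SQL quoting.

```python
# Hypothetical mapping from descriptor types to SQL column types.
TYPE_MAP = {"string": "TEXT", "integer": "INTEGER", "number": "REAL", "date": "TEXT"}


def schema_to_sql(table: str, schema: dict) -> str:
    """Translate a small subset of a descriptor into a CREATE TABLE statement."""
    columns = []
    for f in schema["fields"]:
        name = f["name"]
        column = f"{name} {TYPE_MAP.get(f['type'], 'TEXT')}"
        constraints = f.get("constraints", {})
        if constraints.get("required"):
            column += " NOT NULL"
        if "enum" in constraints:
            allowed = ", ".join(repr(v) for v in constraints["enum"])
            column += f" CHECK ({name} IN ({allowed}))"
        columns.append(column)
    primary_key = schema.get("primaryKey")
    if primary_key:
        # The descriptor may give the key as one name or as a list of names.
        keys = [primary_key] if isinstance(primary_key, str) else primary_key
        columns.append(f"PRIMARY KEY ({', '.join(keys)})")
    return f"CREATE TABLE {table} (\n    " + ",\n    ".join(columns) + "\n);"
```

The same descriptor could feed other generators (JSON Schema, protobuf, SHACL) in the same way, which is the point of picking one expressive source format.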

from transit.

barbeau avatar barbeau commented on May 3, 2024 1

@Subzidion There isn't anything official, but the closest thing I'm aware of in concept to what you're looking for is this Data Package specification:

I started generalizing this to any GTFS:
https://github.com/CUTR-at-USF/GTFS

I think I have some work stashed somewhere beyond what's currently in the above branch...

from transit.

e-lo avatar e-lo commented on May 3, 2024 1

Does anyone have any additions/edits to the following problem statement?

  1. Machine-readable instructions that specify
  2. in a language-agnostic, storage-agnostic manner
  3. that is relatively standardized itself (such that there are existing tools and a potential ecosystem for testing as well as rendering in a "front-end" form)
  4. is human legible in its native form (to allow for easy git-diffs + increase likelihood of catching errors)
  5. which articulates the correct structure, syntax, bounds, and relationships
  6. of GTFS static data
  7. as well as the file and field descriptions
  8. such that GTFS data can be processed and stored
  9. and validated
  10. in a backwards and forwards compatible manner

from transit.

barbeau avatar barbeau commented on May 3, 2024 1

@e-lo My understanding is that https://github.com/Stephen-Gates/GTFS was created specifically for validating the South East Queensland GTFS data and was never intended to be a canonical schema for the general GTFS spec. For example, some of the location constraints defined for stop location lat/longs are specific to Queensland.

I started expanding Stephen's work in this branch to represent the entire spec a while back, but other priorities pulled my attention away:
https://github.com/CUTR-at-USF/GTFS/tree/full-spec

You can see my changes in these two commits:

Here were the remaining TODOs I noted in 2016:

  • Review TODOs and FIXMEs - some constraints will break extensibility
  • Add missing files

If someone would want to pick up this work I'd certainly welcome the contribution.

from transit.

jamespfennell avatar jamespfennell commented on May 3, 2024 1

from transit.

e-lo avatar e-lo commented on May 3, 2024 1

@jamespfennell : To my knowledge SQL definitions aren't designed to be 'read in' as data other than for SQL – but I would be curious if somebody more familiar with various options could evaluate this option vis-a-vis the 10 points above.

from transit.

e-lo avatar e-lo commented on May 3, 2024 1

@MuckT - @LeoFrachet developed a full JSON Schema for GTFS which is in the PR linked to this issue. See the discussion in that PR and the above problem statement for why frictionless seemed to fit the bill better.

Note that validators are fairly easy to create once the schema is in a parsable format. You can also use goodtables.io to do data validation "as a service" in frictionless' format.
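
To give a sense of scale, here is a small hand-rolled validator sketch that reads a Table Schema style descriptor and checks required values, integer types, and enumerations. It deliberately does not use the Frictionless tooling; the function name `validate_rows` and the error message format are invented for this example, and line numbers assume a file with no blank lines or multi-line quoted values.

```python
import csv
import io


def validate_rows(csv_text: str, schema: dict) -> list:
    """Return a list of human-readable errors for one CSV file."""
    errors = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        for f in schema["fields"]:
            name = f["name"]
            constraints = f.get("constraints", {})
            value = (row.get(name) or "").strip()
            if not value:
                if constraints.get("required"):
                    errors.append(f"line {line_no}: {name} is required")
                continue
            if f["type"] == "integer":
                try:
                    value = int(value)
                except ValueError:
                    errors.append(f"line {line_no}: {name} must be an integer")
                    continue
            if "enum" in constraints and value not in constraints["enum"]:
                errors.append(
                    f"line {line_no}: {name} has unexpected value {value!r}"
                )
    return errors
```

Because the rules live in the descriptor rather than in the code, adding a field or an enum value to the schema changes what is checked without touching the validator.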

from transit.

pietercolpaert avatar pietercolpaert commented on May 3, 2024 1

Ack! Still relevant.

Wonder if we could use https://linkml.io for this. Seems to do what I described above

from transit.

skinkie avatar skinkie commented on May 3, 2024

It is called SQL ;-) and I think it does exist via GTFSdb.

from transit.

Subzidion avatar Subzidion commented on May 3, 2024

GTFSdb is all Python, nothing that's just SQL. I was thinking something more along the lines of what OneBusAway does with Hibernate, but usable in any language.

from transit.

skinkie avatar skinkie commented on May 3, 2024

GTFSdb produces SQL tables in different flavors. You could use that in your agnostic definition.

from transit.

barbeau avatar barbeau commented on May 3, 2024

Note that there is a proposal and discussion related to GTFS schemas happening at #244.

from transit.

e-lo avatar e-lo commented on May 3, 2024

@devadvance:

> criteria like human-readable

I think human readability is an important consideration for transparency and maintainability since changes to the spec will be represented and discussed within the context of a pull-request and vote.

> such that GTFS data can be processed and stored

AND

  • validated...adding in the potentiality for conditions beyond Type.
  • documented...reducing errors and friction.

from transit.

e-lo avatar e-lo commented on May 3, 2024

@wesleyi23 do you have any additions/mods to @devadvance 's summary of the problem statement ?

from transit.

wesleyi23 avatar wesleyi23 commented on May 3, 2024

@e-lo and @devadvance the only thing I would add is that it would be nice if the solution included not only the correct structure, syntax, and relationships of GTFS static data, but also the file and field descriptions. I would make a pitch that these be represented in an HTML format, because there are some ordered lists, paragraphs, and other similar items. I think this would provide a more or less complete reproduction of the current standard documents.

from transit.

e-lo avatar e-lo commented on May 3, 2024

BTW - I saw that @Stephen-Gates started developing a Frictionless data package for GTFS and would be curious why it seems to have been abandoned?

from transit.

e-lo avatar e-lo commented on May 3, 2024

@barbeau Awesome, and thanks for the background. In your opinion, is frictionless "the right" spec for achieving the objectives above? My main hesitation is the lack of recent progress/movement in the organization.

@wesleyi23 It seems like Sean's repo is a good place to start.

from transit.

barbeau avatar barbeau commented on May 3, 2024

@e-lo It looked very promising to me, and the above work was mainly an experiment to see if it panned out. Unfortunately I don't have any experience with frictionless outside of the above so I can't say for sure.

from transit.

wesleyi23 avatar wesleyi23 commented on May 3, 2024

@e-lo @barbeau At the moment I have a need for a schema document, so I am happy to put time in to developing one further.

Sean, I reviewed your repo and I agree it could be a good place to start. There is also @LeoFrachet's JSON-Schema file referenced in #244, which would also make a good starting place. As a new community member, I have no context or background to weigh the pros or cons of JSON-Schema vs Frictionless.

From a technical perspective they both appear to satisfy the identified problem statement, unless there is something I am missing.

Any guidance on which path to follow would be greatly appreciated.

from transit.

skinkie avatar skinkie commented on May 3, 2024

@jamespfennell one reason could be that some constraints are "OR".

from transit.

MuckT avatar MuckT commented on May 3, 2024

I found that the old version of the schemas did not work well with popular schema validators, so I've started converting them to JSON schema v7. I've created a simple Nx app that converts .txt or .csv files into JSON objects and then validates them in the browser. My UI abilities are a bit lacking, so currently the results are in console logs, but you can see my progress here: MuckT/gtfs-tools

As of writing this I have only rewritten the agency.txt schema; any help in the UI or schema development would be appreciated.

from transit.

github-actions avatar github-actions commented on May 3, 2024

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

from transit.

derhuerst avatar derhuerst commented on May 3, 2024

@github-actions Don't close.

from transit.

github-actions avatar github-actions commented on May 3, 2024

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

from transit.

github-actions avatar github-actions commented on May 3, 2024

This issue has been closed due to inactivity. Issues can always be reopened after they have been closed.

from transit.

derhuerst avatar derhuerst commented on May 3, 2024

This is still relevant.

from transit.

isabelle-dr avatar isabelle-dr commented on May 3, 2024

Re-opening :)

from transit.

eliasmbd avatar eliasmbd commented on May 3, 2024

📢 The participants in this conversation might want to look at issue #391 to discuss adding the GeoJSON format in GTFS as part of the GTFS-Flex extension proposal.

from transit.
