Comments (5)

derhuerst commented on September 23, 2024

> I'm currently working on a really hacky POC to inject GTFS data into the DB-Rest response so that we might be able to combine multiple data sources without having to drastically change our internal project's structure. The repo will be made publicly available around the start of the GPN next week.
>
> Currently, it's forwarding the departure request directly to db-rest v5 while simultaneously searching for departures on that IBNR. The departures provided via GTFS are then injected into the JSON. To determine what endpoint to call when we're getting a journey request, I simply took inspiration from the current HAFAS-Trip-IDs and added a "GTFS|{gtfs-id}" prefix to the trip IDs. This might be extended to combine multiple APIs from multiple (overlapping) data sources, but the first step might be to add ÖBB, SNCF, SBB, etc., and restrict them to regular public transport like buses and trams, which are not covered by DB's HAFAS system.

This is very similar to what I've been doing with match-gtfs-rt-to-gtfs: It tries to match data from a HAFAS API (e.g. the DB one) to a GTFS dataset by matching their stop/trip/route IDs/names/locations.

Over time, I've invested quite a lot of effort to make the matching logic fast and flexible enough. For example, it can match a HAFAS stop with a GTFS stop even when they don't share an ID (IBNR), have slightly different names, and slightly different geolocations.
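To make the idea of matching stops that don't share an ID a bit more concrete, here is a minimal Python sketch. It is not the actual logic of match-gtfs-rt-to-gtfs; the `Stop` record, the hypothetical list of filler tokens (`hbf`, `bahnhof`, ...), the 200 m search radius and the 0.6 name-similarity threshold are all illustrative assumptions.

```python
import math
import re
import unicodedata
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Iterable, Optional


@dataclass(frozen=True)
class Stop:
    id: str
    name: str
    lat: float
    lon: float


# Hypothetical filler tokens that often differ between data sources.
FILLER_TOKENS = {"hbf", "hauptbahnhof", "bahnhof", "bf"}


def normalize_name(name: str) -> str:
    """Lower-case, drop diacritics and punctuation, remove filler tokens."""
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    tokens = re.findall(r"[a-z0-9]+", stripped.lower())
    return " ".join(t for t in tokens if t not in FILLER_TOKENS)


def distance_m(a: Stop, b: Stop) -> float:
    """Great-circle distance in metres (haversine formula)."""
    r = 6_371_000.0  # mean Earth radius in metres
    phi1, phi2 = math.radians(a.lat), math.radians(b.lat)
    dphi = phi2 - phi1
    dlmb = math.radians(b.lon - a.lon)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(h))


def match_stop(
    hafas_stop: Stop,
    gtfs_stops: Iterable[Stop],
    max_distance_m: float = 200.0,
    min_similarity: float = 0.6,
) -> Optional[Stop]:
    """Return the best GTFS candidate for a HAFAS stop, or None."""
    candidates = list(gtfs_stops)
    # 1. Cheapest case: both sources share the ID (e.g. an IBNR).
    for stop in candidates:
        if stop.id == hafas_stop.id:
            return stop
    # 2. Otherwise, only consider nearby stops and rank them by name similarity.
    wanted = normalize_name(hafas_stop.name)
    best, best_score = None, min_similarity
    for stop in candidates:
        if distance_m(hafas_stop, stop) > max_distance_m:
            continue
        score = SequenceMatcher(None, wanted, normalize_name(stop.name)).ratio()
        if score >= best_score:
            best, best_score = stop, score
    return best
```

With real data, the linear scan would have to be replaced by a spatial index, and the thresholds would need tuning per data source.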

Unfortunately, the code has many indirections and isn't well-documented. Also, it's been a while since I've tested it with the DB HAFAS endpoint. But if you're interested, take a look!

> do we form a new "proprietary" ID that "masks" the underlying DB/SNCF IDs?

> This will be done w/ a proprietary combination of some proprietary prefixes and the API's original ID.

You might also want to look into Multiformats as a generalized and future-proof mechanism for "combining IDs".
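A minimal Python sketch of what a Multiformats-inspired combined ID could look like. It borrows two real Multiformats building blocks, the unsigned-varint encoding and the multibase `b` prefix (lowercase base32 without padding), but the layout (source code, length, original ID, repeated per source) and the `SOURCES` codes are hypothetical; they are not registered multicodec entries or an official Multiformats format.

```python
import base64
from typing import List, Tuple

# Hypothetical source codes. These are NOT registered multicodec entries;
# a real scheme would need its own (private-use or registered) code table.
SOURCES = {"db-hafas": 0x01, "sncf": 0x02, "gtfs": 0x03}
SOURCE_NAMES = {code: name for name, code in SOURCES.items()}


def encode_varint(n: int) -> bytes:
    """Unsigned varint: 7 bits per byte, least significant group first."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(data: bytes, pos: int) -> Tuple[int, int]:
    """Decode a varint starting at `pos`; return (value, next position)."""
    value, shift = 0, 0
    while True:
        byte = data[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, pos
        shift += 7


def combine_ids(parts: List[Tuple[str, str]]) -> str:
    """Pack (source, original ID) pairs into one self-describing string."""
    buf = bytearray()
    for source, original_id in parts:
        raw = original_id.encode("utf-8")
        buf += encode_varint(SOURCES[source]) + encode_varint(len(raw)) + raw
    b32 = base64.b32encode(bytes(buf)).decode("ascii").lower().rstrip("=")
    return "b" + b32  # "b" = multibase prefix for unpadded lowercase base32


def split_ids(combined: str) -> List[Tuple[str, str]]:
    """Inverse of combine_ids()."""
    if not combined.startswith("b"):
        raise ValueError("expected multibase base32 ('b') prefix")
    b32 = combined[1:].upper()
    b32 += "=" * (-len(b32) % 8)  # restore the stripped padding
    data = base64.b32decode(b32)
    parts, pos = [], 0
    while pos < len(data):
        code, pos = decode_varint(data, pos)
        length, pos = decode_varint(data, pos)
        parts.append((SOURCE_NAMES[code], data[pos:pos + length].decode("utf-8")))
        pos += length
    return parts
```

Because every part is length-prefixed, the original IDs may contain any character (including the `|` used in HAFAS trip IDs), and the combined ID can always be split back into the underlying DB/SNCF IDs without keeping a long-lived mapping table.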

> […] how do we make sure the UX is not confusing? […] How do we make sure users can find the train/trip they're looking for if they're used to a very specific naming scheme (e.g. "RE 1" vs "RE 73793", "TGV INOUI 123" vs "TGV 123")?

> We need to keep track of which APIs should be used for which station. […] A general primary identifier could be IFOPT as the parent station with the API's internal station ID and a reference to the station as children. […]

The Trainline stations database might be very helpful with this.
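The station-registry idea quoted above (a modified GTFS stops table with an IFOPT parent station and API-specific children) could be sketched in Python like this. The `api` and `products` columns are hypothetical extensions; GTFS `stops.txt` itself defines fields such as `stop_id`, `location_type` and `parent_station`. All IDs are made up.

```python
import csv
import io
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Tuple

# Made-up IDs. Parent rows (location_type=1) use an IFOPT-style ID; child rows
# use each API's own station ID. `api` and `products` are hypothetical extra
# columns, not part of the GTFS stops.txt specification.
STOPS_CSV = """\
stop_id,stop_name,location_type,parent_station,api,products
de:12345:1,Example Hbf,1,,,
db-example-1,Example Hbf,0,de:12345:1,db-hafas,long-distance;regional
example-90,Example Hauptbahnhof,0,de:12345:1,local-gtfs,tram;bus
"""


@dataclass(frozen=True)
class ApiStation:
    api: str         # which backend to query, e.g. "db-hafas"
    station_id: str  # that backend's own station ID
    products: FrozenSet[str]


def load_registry(text: str) -> Dict[str, List[ApiStation]]:
    """Group the API-specific child rows by their parent (IFOPT) station."""
    registry: Dict[str, List[ApiStation]] = defaultdict(list)
    for row in csv.DictReader(io.StringIO(text)):
        if not row["parent_station"]:
            continue  # parent rows only anchor the IFOPT ID
        products = frozenset(p for p in row["products"].split(";") if p)
        registry[row["parent_station"]].append(
            ApiStation(row["api"], row["stop_id"], products)
        )
    return dict(registry)


def apis_for(
    registry: Dict[str, List[ApiStation]], ifopt_id: str, product: str
) -> List[Tuple[str, str]]:
    """Return (api, station_id) pairs that should be queried for `product`."""
    return [
        (s.api, s.station_id)
        for s in registry.get(ifopt_id, [])
        if product in s.products
    ]
```

For example, `apis_for(load_registry(STOPS_CSV), "de:12345:1", "tram")` would select only the local GTFS source, while a long-distance query would go to DB HAFAS.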

derhuerst commented on September 23, 2024

The transport-apis project has many transit APIs listed; it intends to be the "source of truth" for basic information about these APIs (their endpoints, authentication mechanisms, licensing scheme, etc.), so that projects don't need to keep track of these changes each individually. If there is anything missing over there, please create an Issue or submit a PR!

derhuerst commented on September 23, 2024

Regarding the actual idea being discussed here:
I think that many tricky technical and UX questions arise once one starts having more than one underlying data source:

  • Shall the data sources be completely separate? E.g. when I check into a train/trip as represented by the DB HAFAS, and another person checks into that (same real-world) train/trip as represented by an SNCF data source, will we see each other as being on the same train/trip?
  • If we have built a mechanism to identify two data items as being about the same (one real-world) train/trip, do we form a new "proprietary" ID that "masks" the underlying DB/SNCF IDs? If we do this, then we need to either a) keep a mapping between them for a long time, or b) make the new ID contain the underlying data source IDs somehow.
  • If we have tackled the above items, how do we make sure the UX is not confusing? Let's assume we have decided to either a) merge the properties from both data sources about one real-world item, or b) show only one set of properties. How do we make sure users can find the train/trip they're looking for if they're used to a very specific naming scheme (e.g. "RE 1" vs "RE 73793", "TGV INOUI 123" vs "TGV 123")?
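One conceivable mechanism for identifying two data items as the same real-world train/trip, sketched in Python: treat two records as the same trip if they share enough stations with (nearly) identical planned departures. The `TripRecord` type, the source-independent station keys (e.g. IFOPT IDs) and both thresholds are assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict


@dataclass
class TripRecord:
    source: str     # e.g. "db-hafas" or "sncf"
    trip_id: str    # the source's own trip ID
    line_name: str  # as the source displays it, e.g. "TGV INOUI 123"
    # Planned departures keyed by a source-independent station key (e.g. IFOPT).
    departures: Dict[str, datetime]


def same_real_world_trip(
    a: TripRecord,
    b: TripRecord,
    min_shared_stations: int = 2,
    tolerance: timedelta = timedelta(minutes=2),
) -> bool:
    """Heuristic: same trip if enough shared stations have ~equal planned departures.

    Line names are deliberately ignored, because sources disagree on them
    ("RE 1" vs "RE 73793", "TGV INOUI 123" vs "TGV 123").
    """
    shared = a.departures.keys() & b.departures.keys()
    if len(shared) < min_shared_stations:
        return False
    return all(abs(a.departures[s] - b.departures[s]) <= tolerance for s in shared)
```

A heuristic like this would still need extra checks, e.g. for train portions that run coupled and therefore share stations and departure times.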

I have brainstormed more about some technical aspects of this topic in Why linked open transit data?, stable-public-transport-ids, and have experimented with fusing more than one (HAFAS-like) data source in pan-european-public-transport.

TLDR: Adding another data source is technically feasible, but how do we create a usable UX from that?

HerrLevin commented on September 23, 2024

I'm currently working on a really hacky POC to inject GTFS data into the DB-Rest response so that we might be able to combine multiple data sources without having to drastically change our internal project's structure. The repo will be made publicly available around the start of the GPN next week.

Currently, it's forwarding the departure request directly to db-rest v5 while simultaneously searching for departures on that IBNR. The departures provided via GTFS are then injected into the JSON. To determine what endpoint to call when we're getting a journey request, I simply took inspiration from the current HAFAS-Trip-IDs and added a "GTFS|{gtfs-id}" prefix to the trip IDs. This might be extended to combine multiple APIs from multiple (overlapping) data sources, but the first step might be to add ÖBB, SNCF, SBB, etc., and restrict them to regular public transport like buses and trams, which are not covered by DB's HAFAS system.
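The POC's flow described above could look roughly like the following Python sketch. It is not the POC's actual code: it assumes departures are lists of dicts with `tripId` and `plannedWhen` (ISO 8601) fields, similar to what db-rest returns, and `fetch_hafas`, `fetch_gtfs`, `fetch_hafas_trip` and `fetch_gtfs_trip` are hypothetical callables wrapping the two backends.

```python
from concurrent.futures import ThreadPoolExecutor
from datetime import datetime
from typing import Any, Callable, Dict, List

Departure = Dict[str, Any]
GTFS_PREFIX = "GTFS|"


def departures_for(
    ibnr: str,
    fetch_hafas: Callable[[str], List[Departure]],
    fetch_gtfs: Callable[[str], List[Departure]],
) -> List[Departure]:
    """Query both backends in parallel and merge their departures by planned time."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        hafas_future = pool.submit(fetch_hafas, ibnr)
        gtfs_future = pool.submit(fetch_gtfs, ibnr)
        hafas_deps, gtfs_deps = hafas_future.result(), gtfs_future.result()
    # Prefix GTFS trip IDs so later trip requests can be routed back to GTFS.
    tagged = [{**d, "tripId": GTFS_PREFIX + d["tripId"]} for d in gtfs_deps]
    return sorted(
        hafas_deps + tagged,
        key=lambda d: datetime.fromisoformat(d["plannedWhen"]),
    )


def fetch_trip(
    trip_id: str,
    fetch_hafas_trip: Callable[[str], Any],
    fetch_gtfs_trip: Callable[[str], Any],
) -> Any:
    """Dispatch a trip request based on the ID prefix."""
    if trip_id.startswith(GTFS_PREFIX):
        return fetch_gtfs_trip(trip_id[len(GTFS_PREFIX):])
    return fetch_hafas_trip(trip_id)
```

No de-duplication is attempted, since the GTFS sources would be restricted to buses and trams that DB's HAFAS doesn't cover. `datetime.fromisoformat` only accepts a trailing `Z` from Python 3.11 on.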

I might have a few ideas to combat your above-mentioned problems:

  • In our case: (mostly) yes. We want to use the "official" data endpoint for each vehicle, e.g. Karlsruhe public transport via its open data endpoint, ICEs via DB HAFAS, TGVs via SNCF's, and so on. (This raises a bigger question: what do we do with trains crossing borders? Is the TGV data provided by SNCF more or less accurate than DB's? Just guessing from DB's polylines, everything beyond the German border is "bad data".)

  • This will be done w/ a proprietary combination of some proprietary prefixes and the API's original ID.

  • This is the biggest question in my opinion b/c it just opens even more questions. My current ideas are the following:

    • We need to keep track of which APIs should be used for which station. This could be done by using a modified version of the GTFS stops table. A general primary identifier could be IFOPT as the parent station with the API's internal station ID and a reference to the station as children. Maybe even additional information such as "only long-distance trains" could be added.

    • In my opinion, the "correct way" of displaying the line name, etc. is using what the "correct" API is providing. However, this could be extended by providing additional information in some sort of translation schema since it will indeed be confusing to end users in some situations. I'm not completely happy with this approach but it's the best I came up with until now.
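A possible shape for such a translation schema, as a Python sketch: keep the name from the "correct" API as the primary display name, and index user-known aliases so that searches for either spelling succeed. The alias table and all function names are hypothetical, and the entries are illustrative only.

```python
import re
from typing import Dict, List, Tuple

Key = Tuple[str, str]  # (data source, name as provided by that source)

# Illustrative entries only, modelled on the examples in this thread;
# they are not verified real-world mappings.
ALIASES: Dict[Key, List[str]] = {
    ("sncf", "TGV INOUI 123"): ["TGV 123"],
    ("db-hafas", "RE 73793"): ["RE 1"],
}


def normalize(name: str) -> str:
    """Collapse whitespace and ignore case for matching."""
    return re.sub(r"\s+", " ", name).strip().lower()


def build_search_index(aliases: Dict[Key, List[str]]) -> Dict[str, List[Key]]:
    """Map every known spelling (original name or alias) to the services using it."""
    index: Dict[str, List[Key]] = {}
    for key, alternatives in aliases.items():
        for spelling in [key[1], *alternatives]:
            index.setdefault(normalize(spelling), []).append(key)
    return index


def find_services(index: Dict[str, List[Key]], query: str) -> List[Key]:
    return index.get(normalize(query), [])


def display_name(key: Key, aliases: Dict[Key, List[str]]) -> str:
    """Show the name from the 'correct' API, with known aliases as a hint."""
    alternatives = aliases.get(key, [])
    return f"{key[1]} ({', '.join(alternatives)})" if alternatives else key[1]
```

This keeps the "correct" API's naming authoritative while still letting a user who only knows "TGV 123" find "TGV INOUI 123".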

This is all in its infancy at the moment but already describes the rough direction I'd like to go.


P.S.: speaking of GPN - will we see you there? 👀

vainamov commented on September 23, 2024

It's unfortunately limited to trains within Finland, but the Fintraffic API is awesome: https://www.digitraffic.fi/en/railway-traffic/
