The gtfs-realtime-validator from mobilitydata

Speed up gtfs-validator component build

Issue by barbeau
Tuesday Apr 25, 2017 at 16:43 GMT
Originally opened as CUTR-at-USF#152

Summary:

mvn package can take a while to execute - most of this time is due to pulling the components needed to build gtfs-validator-webapp (https://github.com/conveyal/gtfs-validator/tree/master/gtfs-validator-webapp) so we can display the results in our webapp.

We should try to speed this up. We are using https://github.com/eirslett/frontend-maven-plugin to build the webapp via npm and gulp + browserify.

Possible options for speedup:

Examine caching - does frontend-maven-plugin/npm/gulp/browserify allow caching so it doesn't do a clean build every time?
Examine execution phase for frontend-maven-plugin - in pom.xml, we have the <phase> set to compile, but according to https://github.com/eirslett/frontend-maven-plugin#usage the default <phase> is generate-resources. Would changing to generate-resources speed up the build?
Include an option to allow people to turn off building the gtfs-validator component if they don't need static GTFS validation.

Steps to reproduce:

Execute mvn package

Expected behavior:

Build fairly quickly

Observed behavior:

Builds take a while to complete - around 3 min 16 sec for mvn clean package, and about 2 min 9 sec for mvn package after that initial build on my dual CPU Xeon @ 2.5 GHz and 16 GB of RAM.

VehiclePositions - Check that there is at most one vehicle assigned to each trip

Issue by barbeau
Wednesday Nov 30, 2016 at 18:54 GMT
Originally opened as CUTR-at-USF#38

For VehiclePosition - there should be at most one Vehicle assigned to each trip_id in TripDescriptor. If there is more than one VehiclePosition with the same trip_id, this would generate an error. See https://groups.google.com/d/msg/gtfs-realtime/qNc7ButQbW8/tDwanvClBQAJ.

Note that this is different from vehicle.id being unique, which is ticketed at CUTR-at-USF#254.
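
A minimal sketch of how this check could work, written in Python for brevity rather than in the validator's own code. It assumes VehiclePosition entities have already been decoded into plain dictionaries; the function name and sample data are hypothetical.

```python
from collections import defaultdict


def trips_with_multiple_vehicles(vehicle_positions):
    """Return {trip_id: [vehicle ids]} for every trip_id used by more than one VehiclePosition."""
    vehicles_by_trip = defaultdict(list)
    for vp in vehicle_positions:
        trip_id = vp.get("trip", {}).get("trip_id")
        if trip_id is None:
            continue  # nothing to compare without a trip_id
        vehicles_by_trip[trip_id].append(vp.get("vehicle", {}).get("id"))
    return {t: v for t, v in vehicles_by_trip.items() if len(v) > 1}


# Hypothetical feed contents: two vehicles claim trip T1.
sample_positions = [
    {"trip": {"trip_id": "T1"}, "vehicle": {"id": "bus-1"}},
    {"trip": {"trip_id": "T1"}, "vehicle": {"id": "bus-2"}},
    {"trip": {"trip_id": "T2"}, "vehicle": {"id": "bus-3"}},
]
multi_vehicle_trips = trips_with_multiple_vehicles(sample_positions)
print(multi_vehicle_trips)
```

Each key in the result would correspond to one occurrence of the proposed error.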

Check that there is at most one TripUpdate per scheduled trip_id

Issue by barbeau
Wednesday Nov 30, 2016 at 18:51 GMT
Originally opened as CUTR-at-USF#33

For TripUpdate. More than one TripUpdate for the same scheduled trip_id would generate an error. See https://groups.google.com/d/msg/gtfs-realtime/qNc7ButQbW8/tDwanvClBQAJ.

Note that this only applies to normally scheduled trips, not trips appearing in GTFS frequencies.txt.
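
One way this could be checked, sketched in Python under the assumption that TripUpdates are available as dictionaries and that the set of trip_ids listed in frequencies.txt has been loaded separately (both names below are hypothetical):

```python
from collections import Counter


def duplicate_trip_updates(trip_updates, frequency_trip_ids):
    """Return the sorted trip_ids of scheduled trips that have more than one TripUpdate."""
    counts = Counter()
    for tu in trip_updates:
        trip_id = tu.get("trip", {}).get("trip_id")
        # Trips defined in frequencies.txt are exempt from this rule.
        if trip_id is None or trip_id in frequency_trip_ids:
            continue
        counts[trip_id] += 1
    return sorted(t for t, n in counts.items() if n > 1)


# Hypothetical data: T1 is a scheduled trip, F1 is frequency-based.
sample_updates = [
    {"trip": {"trip_id": "T1"}},
    {"trip": {"trip_id": "T1"}},
    {"trip": {"trip_id": "F1"}},
    {"trip": {"trip_id": "F1"}},
]
duplicates = duplicate_trip_updates(sample_updates, frequency_trip_ids={"F1"})
print(duplicates)
```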

Check that there is at most one StopTimeUpdate for each stop in a trip

Issue by barbeau
Wednesday Nov 30, 2016 at 18:52 GMT
Originally opened as CUTR-at-USF#35

For TripUpdate. More than one StopTimeUpdate for the same stop within a trip would generate an error. See https://groups.google.com/d/msg/gtfs-realtime/qNc7ButQbW8/tDwanvClBQAJ.

start_time for frequency-based trips exact_times=0 is immutable

Issue by barbeau
Wednesday Apr 26, 2017 at 00:17 GMT
Originally opened as CUTR-at-USF#154

Summary:

From the spec (https://developers.google.com/transit/gtfs-realtime/reference/TripDescriptor):

If the frequency-based trip corresponds to exact_times=0, then its start_time may be arbitrary and it is expected to be the first departure of the trip.

Once established, the start_time of a frequency-based trip (exact_times=0) should be considered immutable, even if the first departure time changes; that time change may instead be reflected in a StopTimeUpdate.

If start_time changes for a trip instance that's exact_times=0 after it's created (for same start_date and vehicle_id), this is an error.

This might be tricky to detect, as you need to look at the current and past updates to detect changes. We currently have access to both the current and previous feed message in the validator classes, and I'd like to try to stick to examining only these two messages if possible.

We may not be able to detect this problem in feeds with 100% confidence, so in that case we need to decide if we're comfortable still calling this an error or if we should drop it down to a warning.
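
A rough sketch of the two-message comparison, in Python. It assumes each feed message has been reduced to a list of TripUpdate dictionaries and that the exact_times=0 trip_ids from frequencies.txt are known; the function and variable names are hypothetical. Trip instances are keyed by trip_id, start_date, and vehicle id, as described above.

```python
def changed_start_times(previous_updates, current_updates, exact_times_zero_trip_ids):
    """Return {(trip_id, start_date, vehicle_id): (old, new)} where start_time changed."""

    def index(updates):
        start_times = {}
        for tu in updates:
            trip = tu.get("trip", {})
            if trip.get("trip_id") not in exact_times_zero_trip_ids:
                continue  # rule only applies to exact_times=0 trips
            key = (
                trip.get("trip_id"),
                trip.get("start_date"),
                tu.get("vehicle", {}).get("id"),
            )
            start_times[key] = trip.get("start_time")
        return start_times

    before, after = index(previous_updates), index(current_updates)
    # Only trip instances present in both messages can be compared.
    return {
        key: (before[key], after[key])
        for key in before.keys() & after.keys()
        if before[key] != after[key]
    }


# Hypothetical consecutive feed messages for one trip instance.
previous = [{"trip": {"trip_id": "F1", "start_date": "20170426", "start_time": "08:00:00"},
             "vehicle": {"id": "bus-1"}}]
current = [{"trip": {"trip_id": "F1", "start_date": "20170426", "start_time": "08:05:00"},
            "vehicle": {"id": "bus-1"}}]
changes = changed_start_times(previous, current, {"F1"})
print(changes)
```

Because only two consecutive messages are compared, a change that happens while the trip is briefly absent from the feed would go unnoticed, which is the confidence limitation mentioned above.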

GTFS-rt feed should meet a minimum HTTP request success rate

Issue by barbeau
Friday Apr 07, 2017 at 20:59 GMT
Originally opened as CUTR-at-USF#113

Summary:

A few GTFS-rt feeds I've seen (MBTA being one) occasionally fail to return a result to the consumer when an HTTP request is sent to the GTFS-rt feed endpoint. In this tool, there should be a warning generated on each failure, and an error generated when the success rate falls below a certain threshold. For the summary section of each feed in the UI, and the total summary, we should also include a count of failures, and the current success rate.

For "Overview", UI should look like:

Total HTTP requests: 50
Total unique responses: 25
Total request failures: 5
Total request success rate: 90%

For "Feed - XYZ", the UI should look like:

HTTP requests: 50
Unique responses: 25
Request failures: 5
Request success rate: 90%

One open question for the GTFS-rt community is "what should this threshold be"?

For this tool, let's start with an error threshold of 5% - so if more than 5% of requests are failures, the tool will generate an error in the log, and will turn the text for "HTTP Request success rate:" red.
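
The arithmetic behind the threshold can be sketched as follows (Python; the function name and the returned keys are hypothetical, and the 5% default mirrors the starting value proposed above):

```python
def request_summary(total_requests, failures, max_failure_rate=0.05):
    """Summarize HTTP request results and flag an error when too many requests fail."""
    if total_requests == 0:
        # No requests yet, so there is no rate to report.
        return {"success_rate": None, "is_error": False}
    success_rate = (total_requests - failures) / total_requests
    failure_rate = failures / total_requests
    return {
        "success_rate": success_rate,
        "is_error": failure_rate > max_failure_rate,
    }


# Numbers from the UI mock-up above: 50 requests, 5 failures.
summary = request_summary(50, 5)
print(f"Request success rate: {summary['success_rate']:.0%}")
print("Error" if summary["is_error"] else "OK")
```

With the mock-up numbers the failure rate is 10%, so the error condition would be triggered.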

Steps to reproduce:

Start the tool

Expected behavior:

Show me a summary of HTTP errors, and log warnings/errors as appropriate for HTTP failures

Observed behavior:

HTTP failures are output to console, but not counted or shown in UI, and they do not generate warnings or errors.

Should StopTimeUpdates be propagated across trips in the same block?

Issue by barbeau
Monday Mar 27, 2017 at 17:14 GMT
Originally opened as CUTR-at-USF#90

Summary:

To my knowledge, currently it's not clear whether or not consumers should propagate StopTimeUpdates across trips in the same block. It IS clear that updates should be propagated within the same trip.

See my comment on item 4 here:
https://groups.google.com/d/msg/gtfs-realtime/Ua8f2AFQ9U4/EDTPDuEcAgAJ

Excerpt:

In my opinion the best practice is to propagate delays down the block, but being sure to honor any layovers (i.e., a layover might be able to absorb small delays, which would result in the vehicle departing the layover stop on time). IMHO early arrivals also shouldn't be propagated past stops with timepoint=1, although neither of these behaviors is currently specified in GTFS-rt, and as a result consumers will have different behavior.

For example, OneBusAway (https://onebusaway.org/) propagates delays across trips in the same block, but OpenTripPlanner (http://www.opentripplanner.org/) does not. We're using OTP for trip planning within OneBusAway, and we'd like to change OTP to match the behavior of OBA and propagate delays across trips in the same block. We're finding that often we know a bus is running really late (e.g., 20 minutes), and a user tries to plan a journey for a trip further down the block, but OTP prevents that real-time delay from showing up in the user's trip plan until the vehicle is actually running that trip. So we show no real-time info until only a few minutes before the bus will actually arrive, at which point we suddenly show a huge delay.

I don't think this affects our validator at all, but I wanted to capture this here with the "GTFS-rt spec clarification" label so I can revisit this in the GTFS-rt community.

Frequency-based exact_times = 1 trips must contain trip_id, start_time, and start_date

Issue by barbeau
Wednesday Apr 26, 2017 at 14:41 GMT
Originally opened as CUTR-at-USF#163

Summary:

TripDescriptor field must contain trip_id, start_time, and start_date if the trip is defined in GTFS frequencies.txt with exact_times=1. If the TripDescriptor doesn't contain trip_id, start_time, and start_date, this is an error. See https://developers.google.com/transit/gtfs-realtime/reference/TripDescriptor.

Note that E006, implemented in FrequencyTypeZeroValidator, covers the same concept for exact_times=0 trips, but I'd prefer to keep these as two separate errors to draw out the distinctions between the two types of trips. This new rule would therefore be implemented in FrequencyTypeOneValidator, with the unit tests in FrequencyTypeOneValidatorTest.
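
The core of the check is small. A sketch in Python, assuming the TripDescriptor is a dictionary and the caller has already determined that the trip is an exact_times=1 trip from frequencies.txt (the function name is hypothetical, and this is not the FrequencyTypeOneValidator implementation):

```python
REQUIRED_FIELDS = ("trip_id", "start_time", "start_date")


def missing_required_fields(trip_descriptor):
    """Return the required TripDescriptor fields that are absent or empty."""
    return [field for field in REQUIRED_FIELDS if not trip_descriptor.get(field)]


# Hypothetical descriptor for an exact_times=1 trip that omits start_date.
incomplete = {"trip_id": "F2", "start_time": "09:00:00"}
missing = missing_required_fields(incomplete)
print(missing)
```

Any non-empty result would be reported as the new error.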

Per stop predictions - providers shouldn't drop arrivals until after the bus passes the stop

Issue by barbeau
Tuesday Jan 12, 2016 at 21:45 GMT
Originally opened as CUTR-at-USF#16

In the "Stop Time Updates" section, the GTFS-realtime spec (https://developers.google.com/transit/gtfs-realtime/trip-updates#stop-time-updates) says:

A trip update consists of one or more updates to vehicle stop times, which are referred to as StopTimeUpdates. These can be supplied for past and future stop times. You are allowed, but not required, to drop past stop times. When doing this, be aware that you shouldn't drop a past update if it refers to a trip that isn't yet scheduled to have finished (i.e. it finished ahead of schedule) as otherwise it will be concluded that there is no update on this trip.

My interpretation of the highlighted portion is that when multiple StopTimeUpdates exist in a trip (i.e., per-stop predictions), individual StopTimeUpdates shouldn't be dropped from the GTFS-rt feed if the vehicle is running ahead of schedule until after the scheduled arrival time for that stop (from GTFS stop_times.txt).

For example, if the following data appears in the GTFS-rt feed:

Stop 4 – Predicted at 10:18am (scheduled at 10:20am – 2 min early)
Stop 5 – Predicted at 10:30am (scheduled at 10:30am – on time)

...the prediction for Stop 4 cannot be dropped from the feed until 10:21am, even if the bus actually passes the stop at 10:18am. If the StopTimeUpdate for Stop 4 was dropped from the feed at 10:18am or 10:19am, and the scheduled arrival time is 10:20am, then the consumer should assume that no real-time information exists for Stop 4 at that time.

A vendor is arguing that this text applies only to the TripUpdate, not to the StopTimeUpdates, and they are allowed to drop updates for vehicles running early as soon as the vehicle passes the stop (no matter what the scheduled arrival time is). If the vendor's interpretation is correct, it results in a very poor end user experience for consumers - for example, if a vehicle was running 5 minutes early, and the user checks their app 4 minutes before the scheduled arrival time, the app would only show scheduled information, and would show that the vehicle was expected to arrive in 4 minutes (even though at a system level we know that the vehicle already passed the stop).

Also, in OneBusAway, riders like to see negative ETAs that indicate that a bus just left, so they know if they just missed a bus (vs. facing the unknown and wondering if they actually just missed the bus, or if the system doesn't have information about that bus). However, we can't show these negative arrivals (early, on time, or late) if the producer drops the update as soon as the vehicle passes, as we would only show scheduled negative arrival times. Technically dropping on time or late arrivals after the vehicle passes the stop is allowed by the GTFS-rt spec, but in terms of best practices I would recommend that they remain in the feed for at least another few minutes and/or stops.

Related proposal to clarify the GTFS-rt spec here - google/transit#16.

The problems this created in OneBusAway are outlined here - OneBusAway/onebusaway-application-modules#162.

Discussion on the GTFS-realtime group - https://groups.google.com/forum/#!topic/gtfs-realtime/3rAf6UIhAsQ.

So, new rule that's required behavior:

Producers should not drop a past StopTimeUpdate if it refers to a stop with a scheduled arrival time in the future for the given trip (i.e. the vehicle has passed the stop ahead of schedule)

New rule that's optional behavior (best practice):

Providers should not drop late or on time arrivals for a stop until several minutes and/or stops after the bus passes the stop
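
A sketch of how the required rule might be checked for a single trip, in Python. It assumes the stop_ids present in the previous and current TripUpdate are known, along with the scheduled arrival for each stop from stop_times.txt; all names are hypothetical, and times are expressed as minutes after midnight only to keep the example readable.

```python
def prematurely_dropped_stops(previous_stop_ids, current_stop_ids, scheduled_arrival, now):
    """Return stops dropped from the feed whose scheduled arrival is still in the future."""
    dropped = set(previous_stop_ids) - set(current_stop_ids)
    return sorted(
        stop_id
        for stop_id in dropped
        if stop_id in scheduled_arrival and scheduled_arrival[stop_id] > now
    )


# Example from above: Stop 4 is scheduled at 10:20am, Stop 5 at 10:30am.
scheduled = {"4": 10 * 60 + 20, "5": 10 * 60 + 30}

# Stop 4 disappears from the feed at 10:19am, before its scheduled arrival.
too_early = prematurely_dropped_stops(["4", "5"], ["5"], scheduled, now=10 * 60 + 19)
# Stop 4 disappears at 10:21am, after its scheduled arrival.
allowed = prematurely_dropped_stops(["4", "5"], ["5"], scheduled, now=10 * 60 + 21)
print(too_early, allowed)
```

The optional best-practice rule would need an extra grace period (in minutes and/or stops) on top of this comparison.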

trip_id not provided for blocks with reoccurring stop_ids

Issue by barbeau
Wednesday Apr 26, 2017 at 13:55 GMT
Originally opened as CUTR-at-USF#157

Summary:

If a GTFS block contains multiple references to the same stopId (i.e., the bus visits the same stopId more than once in the same block), but in different trips, then in the GTFS-rt data the tripId for each TripUpdate.TripDescriptor must be provided. If not, this is an error. In this case, the bus wouldn't visit the same stopId more than once in the same trip.

This is currently defined as E008 in ValidationRules but is not implemented.
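
As a starting point for an implementation, the GTFS side of the rule (finding blocks in which a stop_id is visited by more than one trip) could look like the Python sketch below. The input shapes and names are assumptions: a trip_id to block_id mapping from trips.txt and (trip_id, stop_id) pairs from stop_times.txt.

```python
from collections import defaultdict


def blocks_with_recurring_stops(block_by_trip, stop_times):
    """Return sorted block_ids in which some stop_id is served by more than one trip."""
    trips_by_block_stop = defaultdict(set)
    for trip_id, stop_id in stop_times:
        block_id = block_by_trip.get(trip_id)
        if block_id is None:
            continue  # trip is not part of any block
        trips_by_block_stop[(block_id, stop_id)].add(trip_id)
    return sorted(
        {block_id for (block_id, _), trips in trips_by_block_stop.items() if len(trips) > 1}
    )


# Hypothetical GTFS data: stop S1 is visited by both trips of block B1.
block_by_trip = {"T1": "B1", "T2": "B1", "T3": "B2"}
stop_times = [("T1", "S1"), ("T1", "S2"), ("T2", "S1"), ("T3", "S1")]
recurring_blocks = blocks_with_recurring_stops(block_by_trip, stop_times)
print(recurring_blocks)
```

TripUpdates for vehicles on the returned blocks would then need to be checked for a missing trip_id.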

Should header timestamps always be the most recent timestamp in the feed?

Issue by barbeau
October 11, 2017
Originally opened as CUTR-at-USF#288

We currently throw an error E012 (https://github.com/MobilityData/gtfs-realtime-validator/blob/master/RULES.md#e012---header-timestamp-should-be-greater-than-or-equal-to-all-other-timestamps) if any entity timestamps are greater than the header timestamp, as I believe this is the most common interpretation of the spec.

However, I tried to formalize this in the spec via google/transit#55 and wasn't able to get any agencies or consumers to comment on it. This would be a good candidate for a GTFS-realtime best practices discussion.
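
For reference, the comparison that E012 performs can be sketched in Python as below. This is an illustration of the rule as described here, not the validator's actual code; it assumes the feed has been decoded into a dictionary and only looks at the timestamp fields of TripUpdate and VehiclePosition entities.

```python
def entities_newer_than_header(feed):
    """Return (entity id, entity kind, timestamp) for timestamps greater than the header's."""
    header_timestamp = feed["header"]["timestamp"]
    offenders = []
    for entity in feed.get("entity", []):
        for kind in ("trip_update", "vehicle"):
            timestamp = entity.get(kind, {}).get("timestamp")
            if timestamp is not None and timestamp > header_timestamp:
                offenders.append((entity.get("id"), kind, timestamp))
    return offenders


# Hypothetical feed: entity "2" carries a timestamp 10 seconds after the header.
sample_feed = {
    "header": {"timestamp": 1507757580},
    "entity": [
        {"id": "1", "trip_update": {"timestamp": 1507757570}},
        {"id": "2", "vehicle": {"timestamp": 1507757590}},
    ],
}
offenders = entities_newer_than_header(sample_feed)
print(offenders)
```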

How long should TripUpdates appear before start of trip?

Issue by barbeau
Saturday Feb 18, 2017 at 20:52 GMT
Originally opened as CUTR-at-USF#67

Summary:

From an OBA deployer:

a gtfs-realtime question: is there guidance somewhere around how long before a trip is scheduled to start that a trip update for it should appear in the feed? any idea what's normal for this? is it normal for a trip update to not show up until the trip has actually started (regardless of whether it was on schedule or not)?

As of now there isn't any official guidance. It's better for consumers if the TripUpdate is published before the bus starts rolling so riders have advance notice of the trip status.

New rule - Check Alert EntitySelector field integrity

Issue by barbeau
Thursday Oct 06, 2016 at 14:12 GMT
Originally opened as CUTR-at-USF#19

See discussion at https://groups.google.com/forum/#!topic/gtfs-realtime/jamsDygrcSk.

Protocol Buffer currently allows you to specify route_id and route_type, and .proto indicates that they should be joined as an AND. However, the documentation suggests they are treated as an OR.

Fill in "Help" and "Contact Us" menu items

Issue by barbeau
October 13, 2017
Originally opened as CUTR-at-USF#293

Summary:

In the upper-right corner of the landing page, we have "About", "Help", and "Contact Us" menu options that currently don't do anything when clicked.

EDIT - I'm moving "About" to a new issue as it's a different implementation - see #52.

We need to fill these in, or link them somewhere, TBD.

Steps to reproduce:

Go to landing page

Expected behavior:

Be able to find out more in ~~"About"~~, "Help", and "Contact Us" menu items by clicking on them.

Observed behavior:

Nothing happens when you click on them.

Add "About" menu in webapp

Issue by barbeau
Oct 25, 2017
Originally opened as CUTR-at-USF#299

Summary:

In the upper-right corner of the landing page, we have an "About" menu option that currently doesn't do anything when clicked.

We need to fill this in. For "About", I'd like it to be a small drop-down that includes the Maven version number of the build and the Git commit ID for the head commit on which it was built.

Steps to reproduce:

Go to landing website page

Expected behavior:

Be able to find out more in "About" menu item by clicking on it

Observed behavior:

Nothing happens when you click on it

Should early arrivals/departures be propagated across timepoints?

Issue by barbeau
Monday Mar 27, 2017 at 17:18 GMT
Originally opened as CUTR-at-USF#91

Summary:

Doesn't affect our GTFS-rt validator tool, but I wanted to capture this here so I can revisit with GTFS-rt community.

IMHO, early arrivals/departures should not be propagated past stops labeled as timepoints, as the vehicle would be expected to wait at the timepoint until it catches up with the schedule.

I previously posted about this to the GTFS-rt list, but only one operator responded (he did agree with me):
https://groups.google.com/forum/#!searchin/gtfs-realtime/timepoint|sort:relevance/gtfs-realtime/IjLaWmLnvbk/F65uYPzjCwAJ

In GTFS, we now have a timepoint field in stop_times.txt, which has the definition:

The timepoint field can be used to indicate if the specified arrival and departure times for a stop are strictly adhered to by the transit vehicle or if they are instead approximate and/or interpolated times.

If an agency specifies "timepoint=1" for a stop in stop_times.txt, should a GTFS-rt consumer propagate negative delays (i.e., buses running ahead of schedule) downstream through these timepoints*?

In theory, if the bus operator is adhering to the timepoint, they should hold the bus until they are back on schedule (i.e., a 0 delay). In this situation, it would make sense to change to a 0 delay value at the timepoint stop, and propagate this 0 delay down the line from the timepoint on. It's likely that this would be a more reasonable estimate of when the bus would arrive for stops downstream of the timepoint, rather than a negative delay propagated from further upstream down through the timepoint.

I'm also interested to hear from any GTFS producers providing timepoint values if their vehicles do indeed follow this behavior for stops marked with "timepoint=1".

* Assuming that a delay or time value is provided by the GTFS-rt producer for a stop upstream of the timepoint, but not at the timepoint or downstream of the timepoint.
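
The consumer behavior proposed above can be sketched in a few lines of Python. This only illustrates the suggestion; it is not behavior defined by GTFS-rt. The function name and input shape are hypothetical: the last known delay in seconds (negative means early) and the downstream stops in order, each flagged with its timepoint value.

```python
def propagate_delay(upstream_delay, downstream_stops):
    """Propagate a delay downstream, resetting early running to 0 at timepoint stops."""
    delay = upstream_delay
    propagated = []
    for stop_id, is_timepoint in downstream_stops:
        if is_timepoint and delay < 0:
            # The vehicle is expected to hold at the timepoint until it is on schedule.
            delay = 0
        propagated.append((stop_id, delay))
    return propagated


# Hypothetical trip: stop B is a timepoint, and the bus is running 2 minutes early.
stops = [("A", False), ("B", True), ("C", False)]
early = propagate_delay(-120, stops)
late = propagate_delay(300, stops)
print(early)
print(late)
```

Late running passes through the timepoint unchanged, while early running is replaced by a 0 delay from the timepoint onward.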

StopTimeUpdate stop/stop_sequence pairing must match GTFS?

Issue by barbeau
October 20, 2017
Originally opened as CUTR-at-USF#297

Summary:

We currently have rule "E045 - GTFS-rt stop_time_update stop_sequence and stop_id do not match GTFS" for this, but after this discussion it doesn't look like this is explicitly mentioned in the spec - see:
https://groups.google.com/forum/#!topic/gtfs-realtime/BZOfsVeI2Cc

And StopTimeUpdate docs:
https://github.com/google/transit/blob/master/gtfs-realtime/spec/en/reference.md#message-stoptimeupdate

A point to clarify in the current spec, and potential part of a proposal to add trips with new geometries dynamically, or a proposal to identify child stops of stations.

Note that changes in stop_sequence would also affect VehiclePosition.current_stop_sequence:
https://github.com/google/transit/blob/master/gtfs-realtime/spec/en/reference.md#message-vehicleposition
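
For reference, the pairing check that E045 describes can be sketched as below. This is only an illustration, assuming the GTFS stop_times.txt rows for one trip have already been loaded into a map from stop_sequence to stop_id; the class and method names are hypothetical.

```java
import java.util.Map;

/** Hypothetical helper for illustration; not the validator's E045 code. */
final class StopSequenceCheck {

    /**
     * Returns true if the stop_sequence/stop_id pair from a StopTimeUpdate
     * matches the pair defined for the same trip in GTFS stop_times.txt.
     *
     * @param gtfsStopIdBySequence stop_sequence -> stop_id for one trip
     */
    static boolean pairingMatches(Map<Integer, String> gtfsStopIdBySequence,
                                  int stopSequence, String stopId) {
        String expected = gtfsStopIdBySequence.get(stopSequence);
        // An unknown stop_sequence is treated as a mismatch in this sketch.
        return expected != null && expected.equals(stopId);
    }
}
```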

Can routes be ADDED in GTFS-realtime?

Issue by barbeau
October 11, 2017
Originally opened as CUTR-at-USF#290

Summary:

Right now we have rule E004, which flags route_ids that appear in the GTFS-realtime feed but not the GTFS data.

I saw the following ADDED trip in STIF's feeds (caveat - it's actually a 3rd party, not their official feed - see https://groups.google.com/d/msg/onebusaway-developers/3lDMoMeF2zQ/u5UXvx5TAgAJ), which has a route_id that doesn't appear in RATP's GTFS data:

GTFS - http://ratp.spiralo.net/stif_gtfs_enhanced_rer_latest.zip
GTFS-realtime Trip Updates - http://stif.spiralo.net/STIF

"id": "3",
"trip_update": {
  "trip": {
    "trip_id": "SNCF-ACCES:VehicleJourney::11102017-24-TLN-106-7-233000:LOC",
    "start_time": "1507757580",
    "start_date": "20171011",
    "schedule_relationship": "ADDED",
    "route_id": "800850012:T11"
  },
  "stop_time_update": [
    {
      "arrival": {
        "time": 1507757560
      },
      "departure": {
        "time": 1507757580
      },
      "stop_id": "StopPoint:8769734:800:T11"
    }
  ],
  "vehicle": {
    "id": "Unmatched_SNCF-ACCES:VehicleJourney::11102017-24-TLN-106-7-233000:LOC"
  },
  "timestamp": 1507695818
}

From a quick scan, it looks like the only reference in GTFS-realtime for ADDED is for trips:
https://developers.google.com/transit/gtfs-realtime/reference/#enum_schedulerelationship_1

...says:

ADDED - An extra trip that was added in addition to a running schedule, for example, to replace a broken vehicle or to respond to sudden passenger load.

Is adding a new route via GTFS-realtime allowed?

I think we need community clarification on this. If it is allowed, we need to modify E004 to check if the ScheduleRelationship is ADDED, and if so no error should be logged.
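
If the community decides that ADDED trips may carry new route_ids, the E004 change could look roughly like the sketch below. It is an assumption-laden illustration: the enum is a local stand-in for the TripDescriptor ScheduleRelationship values, and the class, method, and allowAddedRoutes flag are hypothetical names, not the validator's API.

```java
import java.util.Set;

/** Hypothetical helper for illustration; not the validator's E004 code. */
final class RouteIdCheck {

    /** Local stand-in for the GTFS-realtime TripDescriptor enum. */
    enum ScheduleRelationship { SCHEDULED, ADDED, UNSCHEDULED, CANCELED }

    /**
     * Returns true if E004 should be logged for this trip's route_id.
     *
     * @param gtfsRouteIds     route_ids from GTFS routes.txt
     * @param allowAddedRoutes assumed policy switch: skip E004 for ADDED trips
     */
    static boolean isE004(Set<String> gtfsRouteIds, String routeId,
                          ScheduleRelationship relationship,
                          boolean allowAddedRoutes) {
        if (routeId == null || gtfsRouteIds.contains(routeId)) {
            return false; // nothing to check, or the route is known
        }
        return !(allowAddedRoutes && relationship == ScheduleRelationship.ADDED);
    }
}
```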

Steps to reproduce:

Run the normal server validator on RATP's data:

GTFS - http://ratp.spiralo.net/stif_gtfs_enhanced_rer_latest.zip
GTFS-realtime Trip Updates - http://stif.spiralo.net/STIF

Expected behavior:

All route_ids that appear in the GTFS-realtime feed should appear in the GTFS data

Observed behavior:

There is an ADDED trip that includes a route_id that doesn't appear in the GTFS data - is this ok?

When two feeds are being monitored, show both in Iteration Details UI

Issue by barbeau
Jul 6, 2017
Originally opened as CUTR-at-USF#244

Summary:

We currently have rules "W003 - ID in one feed missing from the other" and "E047 - VehiclePosition and TripUpdate ID pairing mismatch" in CrossFeedDescriptorValidator that compare a TripUpdates feed against a VehiclePositions feed. However, we don't allow the user to see more than one protocol buffer message in the Iteration Details page - this means you can see what the validator logged for the VehiclePositions feed that generated this error, or the TripUpdate that generated this error, but not both.

We should support viewing both the VehiclePositions message and TripUpdate message side-by-side if both types of feed are being monitored.

For example, below is an example of W003 that was logged for the MBTA feeds - it says that:

vehicle_id 10139 is in VehiclePositions but not in TripUpdates feed

I can see the VehiclePositions feed and confirm that vehicle_id 10139 is in the VehiclePositions feed, but I have no way of pulling up the TripUpdates feed to view the message and confirm that it wasn't included.

IMPORTANT - it's possible, and likely, that in the future VehiclePositions and TripUpdates entities may be mixed within the same message - in other words, a "combined feed" at the same URL, instead of a separate URL for TripUpdates and a different URL for VehiclePositions (see #85). Therefore, we should try to implement a solution that works both for feeds with two separated URLs, as well as combined feeds.

Steps to reproduce:

Start the validator
Enter both MBTA VehiclePositions and TripUpdates feeds (and GTFS):
GTFS - http://www.mbta.com/uploadedfiles/MBTA_GTFS.zip (~13.4MB)
TripUpdates - http://developer.mbta.com/lib/GTRTFS/Alerts/TripUpdates.pb (~8.6MB)
VehiclePositions - http://developer.mbta.com/lib/GTRTFS/Alerts/VehiclePositions.pb (~44KB)
Look for a W003 or E047 error, and try to pull up the details for the TripUpdates and VehiclePositions feeds
See #85 for possible combined feeds to test with.

Expected behavior:

If I'm monitoring two feeds (or a combined feed with both TripUpdates and VehiclePositions entities), I should be able to see both messages that were compared for rules like W003 and E047 side-by-side.

Observed behavior:

I can't view both the TripUpdate and VehiclePosition messages.

Check that the delay field is consistent with difference between the scheduled and predicted times

Issue by barbeau
Wednesday Nov 30, 2016 at 18:56 GMT
Originally opened as CUTR-at-USF#41

If the delay field is not consistent with the difference between the scheduled and predicted times, this would generate an error. See https://groups.google.com/d/msg/gtfs-realtime/qNc7ButQbW8/tDwanvClBQAJ.

Note that this applies to stop_time_update.arrival/departure.delay, as well as trip_update.delay. I noticed that SDMTS is providing stop_time_update.departure.time, as well as trip_update.delay:

"entity": [
{
  "id": "1",
  "trip_update": {
    "trip": {
      "trip_id": "12341185",
      "route_id": "30"
    },
    "stop_time_update": [
      {
        "departure": {
          "time": 1498664460
        },
        "stop_id": "95034"
      }
    ],
    "vehicle": {
      "id": "911"
    },
    "timestamp": 1498664286,
    "delay": -120
  }
},
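
A minimal sketch of the consistency check, assuming delay is interpreted as predicted time minus scheduled time in seconds (positive means late, negative means early) and that the scheduled time has already been converted to epoch seconds from stop_times.txt and the service date. The class name, method name, and tolerance parameter are hypothetical.

```java
/** Hypothetical helper for illustration; not part of the validator. */
final class DelayCheck {

    /**
     * Returns true if the published delay agrees with the difference between
     * the predicted and scheduled times, within a tolerance in seconds.
     */
    static boolean isConsistent(long scheduledEpochSec, long predictedEpochSec,
                                int delaySec, int toleranceSec) {
        long computedDelay = predictedEpochSec - scheduledEpochSec;
        return Math.abs(computedDelay - delaySec) <= toleranceSec;
    }
}
```

For the SDMTS sample above, a departure time of 1498664460 with a delay of -120 would only be consistent if the scheduled departure were 1498664580.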

New rule - Check Alert EntitySelector field integrity

Issue by barbeau
Thursday Oct 06, 2016 at 14:12 GMT
Originally opened as CUTR-at-USF#19

See discussion at https://groups.google.com/forum/#!topic/gtfs-realtime/jamsDygrcSk.

Protocol Buffer currently allows you to specify route_id and route_type, and .proto indicates that they should be joined as an AND. However, the documentation suggests they are treated as an OR.
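
The difference between the two readings can be sketched as follows. This is only an illustration of the AND and OR interpretations for a selector that sets both route_id and route_type; the class and method names are hypothetical, and a null selector field stands for "not set".

```java
/** Hypothetical helper for illustration; not part of the validator. */
final class SelectorMatch {

    /** AND reading: every field that is set on the selector must match. */
    static boolean matchesAnd(String selRouteId, Integer selRouteType,
                              String routeId, int routeType) {
        boolean routeOk = selRouteId == null || selRouteId.equals(routeId);
        boolean typeOk = selRouteType == null || selRouteType == routeType;
        return routeOk && typeOk;
    }

    /** OR reading: any single field that is set and matches is enough. */
    static boolean matchesOr(String selRouteId, Integer selRouteType,
                             String routeId, int routeType) {
        boolean routeHit = selRouteId != null && selRouteId.equals(routeId);
        boolean typeHit = selRouteType != null && selRouteType == routeType;
        return routeHit || typeHit;
    }
}
```

For a selector with route_id "30" and route_type 3, a route "31" of type 3 is not selected under the AND reading but is selected under the OR reading, which is why the ambiguity matters for consumers.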

Allow multiple *Validator classes to generate occurrences for the same rule

Issue by barbeau
Jan 14, 2017
Originally opened as CUTR-at-USF#226

Summary:

In CUTR-at-USF#225 we found that currently all occurrences for the same rule need to be generated from the same *Validator class.

For optimization purposes, some of the W009 checks for stop_time_updates were implemented in StopTimeUpdateValidator, while the checks specific to trips were implemented in TripDescriptorValidator. However, this currently causes problems when the *Validator classes are executed in BackgroundTask, as each class inserts a set of occurrences for the rules into the database, which caused a duplicate entry for occurrences of W009 (instead of all W009 occurrences properly being inserted in a single entry). Moving all checks for the same rule into the same class was the workaround solution implemented for W009 in 3bca917.

In the future we should examine allowing more than one *Validator class to check for the same rule, which would require modifications to BackgroundTask to combine occurrence lists from multiple *Validator classes before they are inserted into the database.
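
One way the combining step could work is sketched below, assuming each *Validator class returns its results as a map from rule ID to a list of occurrence messages. The types and names here are simplified stand-ins, not the actual BackgroundTask or database model.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper for illustration; not the BackgroundTask code. */
final class OccurrenceMerger {

    /**
     * Merges per-validator results so that each rule ID ends up with a single
     * combined list, which could then be inserted as one database entry.
     */
    static Map<String, List<String>> merge(
            List<Map<String, List<String>>> perValidatorResults) {
        Map<String, List<String>> merged = new TreeMap<>();
        for (Map<String, List<String>> results : perValidatorResults) {
            results.forEach((ruleId, occurrences) ->
                    merged.computeIfAbsent(ruleId, k -> new ArrayList<>())
                          .addAll(occurrences));
        }
        return merged;
    }
}
```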

Steps to reproduce:

Have more than one Validator class generate occurrences for the same rule (for example W009 - revert this commit 3bca917).

Expected behavior:

The tool should log all occurrences correctly no matter what *Validator class generates them

Observed behavior:

If more than one *Validator class generates occurrences for the same rule, it results in more than one database record for that rule for a given iteration, which results in duplicate iterations being shown in the Log and IterationDetails page, and an invalid timestamp value of 1970... - see CUTR-at-USF#225 for details.

Platform:

Windows 7 Enterprise w/ jdk1.8.0_73 and Chrome Version 58.0.3029.110 (64-bit)

Trips with the same vehicle_id should belong to the same block

Issue by barbeau
Wednesday Apr 26, 2017 at 13:50 GMT
Originally opened as CUTR-at-USF#156

Summary:

If several trips have the same vehicle_id, the trips should belong to the same GTFS block_id (defined in trips.txt). If not, it is a warning.

This is currently defined as E007 in ValidationRules but not implemented.
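
A rough sketch of the check, assuming the trip_ids per vehicle_id have been collected from the TripUpdates feed and block_id per trip_id from trips.txt. The names are hypothetical, and trips without a block_id are simply ignored in this sketch.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper for illustration; not part of the validator. */
final class VehicleBlockCheck {

    /** Returns the vehicle_ids whose trips span more than one block_id. */
    static Set<String> vehiclesSpanningBlocks(
            Map<String, List<String>> tripIdsByVehicle,
            Map<String, String> blockIdByTrip) {
        Set<String> flagged = new TreeSet<>();
        for (Map.Entry<String, List<String>> entry : tripIdsByVehicle.entrySet()) {
            Set<String> blocks = new HashSet<>();
            for (String tripId : entry.getValue()) {
                String blockId = blockIdByTrip.get(tripId);
                if (blockId != null) {
                    blocks.add(blockId);
                }
            }
            if (blocks.size() > 1) {
                flagged.add(entry.getKey());
            }
        }
        return flagged;
    }
}
```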

Log connection failures as errors

Issue by barbeau
May 10, 2017
Originally opened as CUTR-at-USF#278

Summary:

Currently, if the GTFS-rt server does not respond with a valid message, we don't show any indication in the GUI and we log something like this to output:

[pool-4-thread-1] ERROR edu.usf.cutr.gtfsrtvalidator.background.BackgroundTask - The URL 'https://gtfsrt.api.translink.com.au/Feed/SEQ' does not contain valid Gtfs-Rt data
java.net.SocketTimeoutException: Read timed out
	at java.net.SocketInputStream.socketRead0(Native Method)
	at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
	at java.net.SocketInputStream.read(SocketInputStream.java:170)
	at java.net.SocketInputStream.read(SocketInputStream.java:141)
	at sun.security.ssl.InputRecord.readFully(InputRecord.java:465)
	at sun.security.ssl.InputRecord.read(InputRecord.java:503)
	at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:973)
	at sun.security.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:930)
	at sun.security.ssl.AppInputStream.read(AppInputStream.java:105)
	at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
	at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
	at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
	at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:704)
	at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:647)
	at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1569)
	at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1474)
	at sun.net.www.protocol.https.HttpsURLConnectionImpl.getInputStream(HttpsURLConnectionImpl.java:254)
	at java.net.URL.openStream(URL.java:1045)
	at edu.usf.cutr.gtfsrtvalidator.background.BackgroundTask.run(BackgroundTask.java:114)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)

or:

[pool-4-thread-1] ERROR edu.usf.cutr.gtfsrtvalidator.background.BackgroundTask - The URL 'https://gtfsrt.api.translink.com.au/Feed/SEQ' does not contain valid Gtfs-Rt data
java.net.UnknownHostException: gtfsrt.api.translink.com.au
	at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:184)
	at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:172)
	at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
	at java.net.Socket.connect(Socket.java:589)
	at sun.security.ssl.SSLSocketImpl.connect(SSLSocketImpl.java:668)
	at sun.security.ssl.BaseSSLSocketImpl.connect(BaseSSLSocketImpl.java:173)
	at sun.net.NetworkClient.doConnect(NetworkClient.java:180)
	at sun.net.www.http.HttpClient.openServer(HttpClient.java:432)
	at sun.net.www.http.HttpClient.openServer(HttpClient.java:527)
	at sun.net.www.protocol.https.HttpsClient.<init>(HttpsClient.java:264)
	at sun.net.www.protocol.https.HttpsClient.New(HttpsClient.java:367)
	at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.getNewHttpClient(AbstractDelegateHttpsURLConnection.java:191)
	at sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1138)
	at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:1032)
	at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.connect(AbstractDelegateHttpsURLConnection.java:177)
	at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1546)
	at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1474)
	at sun.net.www.protocol.https.HttpsURLConnectionImpl.getInputStream(HttpsURLConnectionImpl.java:254)
	at java.net.URL.openStream(URL.java:1045)
	at edu.usf.cutr.gtfsrtvalidator.background.BackgroundTask.run(BackgroundTask.java:114)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)

We should log this as an error so it's visible in the log. We may also want to surface this in the GUI in the HTTP metrics section as "HTTP failures".
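
As a starting point, the two exceptions shown above could be turned into short, user-visible failure messages along these lines. This is a sketch only; the class name and message wording are hypothetical, and how the message is stored or counted for the HTTP metrics section is left open.

```java
import java.io.IOException;
import java.net.SocketTimeoutException;
import java.net.UnknownHostException;

/** Hypothetical helper for illustration; not part of BackgroundTask. */
final class FetchFailure {

    /** Maps a fetch exception to a short message suitable for the log or GUI. */
    static String describe(IOException e) {
        if (e instanceof SocketTimeoutException) {
            return "HTTP failure: timed out (" + e.getMessage() + ")";
        }
        if (e instanceof UnknownHostException) {
            return "HTTP failure: unknown host (" + e.getMessage() + ")";
        }
        return "HTTP failure: " + e.getClass().getSimpleName();
    }
}
```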

Steps to reproduce:

Monitor a feed that results in HTTP failures

Expected behavior:

Log failures as errors, perhaps show them in HTTP metrics section

Observed behavior:

No indication of failures is shown in the log or HTTP metrics in GUI

Remove rule E014 related to TripUpdates being in block order

Issue by barbeau
Wednesday Apr 26, 2017 at 13:58 GMT
Originally opened as CUTR-at-USF#158

Summary:

trip_updates for each trip in the feed must match the sequential order for the trips in the block. For example, if we have trip_ids 1, 2, and 3 that all belong to the same block, and the vehicle travels trip 1, then trip 2, and then trip 3, the trip_updates should occur in the GTFS-rt feed in the order trips 1, 2, and 3. For example, trip 3 predictions shouldn't occur in the feed prior to trip 2 predictions. If they are out-of-order, this is an error.

This is currently defined as E014 in ValidationRules but is not implemented.

Show map-based visuals for rules with geographic information

Issue by barbeau
June 26, 2017
Originally opened as CUTR-at-USF#238

Summary:

Following CUTR-at-USF#236 we now have the ability to include html like <a href="http://geojson.io/#map=15/27.995876/-82.44294">(27.995876,-82.44294)</a> in the prefix of a rule, and it will appear clickable in the iterations details page.

I'd like to hyperlink E029 and other rules with geographic data to show that data on a map using URLs like http://geojson.io/#map=15/27.995876/-82.44294 - we could even do buffers using GeoJSON like the below:

Steps to reproduce:

See an occurrence of E029

Expected behavior:

Give me some visual representation of the info

Observed behavior:

No visual (map) representation of the information
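
The clickable geojson.io link described in the summary could be generated with something like the sketch below. The class and method names are hypothetical; only the URL pattern and the anchor markup come from the example in this issue.

```java
import java.util.Locale;

/** Hypothetical helper for illustration; not part of the validator. */
final class MapLink {

    /** Builds an HTML anchor that opens geojson.io centered on the point. */
    static String geojsonIoLink(double lat, double lon, int zoom) {
        return String.format(Locale.US,
                "<a href=\"http://geojson.io/#map=%d/%s/%s\">(%s,%s)</a>",
                zoom, lat, lon, lat, lon);
    }
}
```

Calling geojsonIoLink(27.995876, -82.44294, 15) produces the same anchor shown in the summary.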

Frequency type 1 GTFS-rt trip start_date should match GTFS data

Issue by barbeau
Wednesday Apr 26, 2017
Originally opened as CUTR-at-USF#165

Summary:

Frequency type 1 (trips in frequencies.txt with exact_times=1) GTFS-rt start_date should match the service date in the GTFS data (from calendar.txt and calendar_dates.txt). If not, this is an error.

Optimize fetching feeds - compression and HTTP header "if modified"

Issue by barbeau
Wednesday Apr 12, 2017 at 15:58 GMT
Originally opened as CUTR-at-USF#124

Summary:

We should be sure to look at the following, which will optimize the fetching of feeds (both GTFS and GTFS-realtime):

If-Modified-Since header
- https://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.25
Last-Modified header - allows us to see if the data has changed since our last request
Accept-Encoding: gzip header - allows us to download compressed content
- http://www.rgagnon.com/javadetails/java-HttpUrlConnection-with-GZIP-encoding.html

I believe we'd want to first look at If-Modified-Since header, then Last-Modified header.
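
A minimal sketch of preparing such a request with the standard HttpURLConnection, assuming the timestamp of the last successful fetch is kept by the caller. The class and method names are hypothetical. The caller would still need to check for a 304 (HttpURLConnection.HTTP_NOT_MODIFIED) response and wrap the body in a GZIPInputStream when the response Content-Encoding is gzip.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;

/** Hypothetical helper for illustration; not part of the validator. */
final class FeedRequest {

    /**
     * Opens (but does not yet send) a request that asks for gzip content and,
     * if a previous fetch time is known, sets the If-Modified-Since header.
     */
    static HttpURLConnection prepare(String url, long lastFetchMillis) {
        try {
            HttpURLConnection conn =
                    (HttpURLConnection) URI.create(url).toURL().openConnection();
            conn.setRequestProperty("Accept-Encoding", "gzip");
            if (lastFetchMillis > 0) {
                conn.setIfModifiedSince(lastFetchMillis);
            }
            return conn;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```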

Change static GTFS validator

Issue by barbeau
May 10, 2017
Originally opened as CUTR-at-USF#193

Summary:

For static GTFS validation, the Conveyal team has moved away from using gtfs-validator and focused on a newer project gtfs-lib.

We should look at integrating this into our project, in addition to, and perhaps eventually in place of, gtfs-validator.

My comments from conveyal/gtfs-validator#40 (comment):

I'm definitely open to integrating gtfs-lib in addition to (or eventually in place of) gtfs-validator into our gtfs-rt-validator - I just opened CUTR-at-USF#193 for this. Given that gtfs-lib doesn't have a web UI for initial integration and our current focus, we'd probably just spit out the JSON file and build a UI for the output later. I believe Transitland is now showing gtfs-lib in their web UI - for example, see https://transit.land/dispatcher/feed-versions/eb0cbe5ab41c9cfde0ebae42471ab5b3f712b008.

Should Service Alert IDs change when alert content changes?

Issue by barbeau
Tuesday Dec 06, 2016 at 21:05 GMT
Originally opened as CUTR-at-USF#47

It's not clear to me whether there is currently a best practice surrounding identifying changes and maintaining IDs within published GTFS-rt service alerts.

For example, it's common for an agency to publish an alert, and then make changes or updates to that same alert. In OneBusAway, we show new alerts to riders, mark them as "read" after the user sees them, and completely hide them from view when the user requests that. This is all based on the situation_id from the OBA REST API response, which looks like this:

"id": "Hillsborough Area Regional Transit_ee2415bb-8c2b-4eb7-b9a0-b268110254ea"

API request:
http://api.tampa.onebusaway.org/api/api/where/arrivals-and-departures-for-stop/Hillsborough%20Area%20Regional%20Transit_6497.json?minutesAfter=35&version=2&key=TEST

API response including situation (alert):

"references": {
  "agencies": [],
  "routes": [],
  "situations": [
    {
      "activeWindows": [],
      "allAffects": [
        {
          "agencyId": "Hillsborough Area Regional Transit",
          "applicationId": "",
          "directionId": "",
          "routeId": "",
          "stopId": "",
          "tripId": ""
        }
      ],
      "consequences": [],
      "creationTime": 1478195507024,
      "description": {
        "lang": "en",
        "value": "Travel between HART and PSTA with one fare. Download the Flamingo Fares app for mobile device and pay your fare via smartphone. 3-Day unlimited regional pass for $11. Check out www.gohart.org or www.psta.net for more information."
      },
      "id": "Hillsborough Area Regional Transit_ee2415bb-8c2b-4eb7-b9a0-b268110254ea",
      "publicationWindows": [],
      "reason": "OTHER_CAUSE",
      "severity": "noImpact",
      "summary": {
        "lang": "en",
        "value": "Flamingo Fares"
      },
      "url": null
    },
    ...

I need to look into how we're generating the UUID of this alert for OBA - I'm guessing if any of the content changes then this UUID changes. From what I can tell there isn't any guidance in GTFS-rt for maintaining the same ID in the GTFS-rt feed for the same message. For example, if HART made a spelling error in the above alert and fixed it, I believe it would show up as a new UUID from OBA and users would need to acknowledge it as being read again.

Should we recommend that the same GTFS entity ID be maintained if there are no significant changes to the message, so users wouldn't need to acknowledge a new alert that has minor differences from a past alert?

HART GTFS-rt service alerts examples:
http://api.tampa.onebusaway.org/api/api/gtfs_realtime/alerts-for-agency/Hillsborough%20Area%20Regional%20Transit.pbtext?key=TEST

Service Alert reference:
https://developers.google.com/transit/gtfs-realtime/reference/Alert

Add tests for big feeds

Issue by barbeau
Wednesday Apr 12, 2017 at 15:21 GMT
Originally opened as CUTR-at-USF#123

Summary:

We need to make sure that as we add new rules, the validator can continue to run in real-time on production-sized feeds for major cities.

I posted a question on the GTFS-rt list asking for examples of very large feeds:
https://groups.google.com/forum/#!topic/gtfs-realtime/mM8cQIIV_-Y

These have been suggested to me so far, with largest coming first:

Dutch feed (http://gtfs.openov.nl/ - apparently OpenTripPlanner instances with 24-32GB of memory are used for this)
- GTFS - http://gtfs.openov.nl/gtfs-rt/gtfs-openov-nl.zip (~261MB)
- TripUpdates - http://gtfs.openov.nl/gtfs-rt/tripUpdates.pb (~8.4MB)
- VehiclePositions - http://gtfs.openov.nl/gtfs-rt/vehiclePositions.pb (~617K)
MBTA
- GTFS - https://cdn.mbta.com/MBTA_GTFS.zip (~13.4MB)
- TripUpdates - https://cdn.mbta.com/realtime/TripUpdates.pb (~8.6MB)
- VehiclePositions - https://cdn.mbta.com/realtime/VehiclePositions.pb (~44KB)
SEQ (Translink)
- GTFS - https://gtfsrt.api.translink.com.au/GTFS/SEQ_GTFS.zip (~28MB)
- Combined (TripUpdates + VehiclePositions) feed - https://gtfsrt.api.translink.com.au/Feed/SEQ (~2.2MB)
BART (http://www.bart.gov/schedules/developers)
- GTFS - http://www.bart.gov/sites/default/files/docs/google_transit_20170325_v3.zip (427KB)
- TripUpdates - http://api.bart.gov/gtfsrt/tripupdate.aspx (3.1KB - it's small because only 1 stop_time_update per trip)
- VehiclePositions - BART doesn't have this
NYC (but they are likely split by borough)
LA Metro (not publicly shared)
MTC for SF Bay Area (http://511.org/developers/list/apis/) (According to http://assets.511.org/pdf/nextgen/developers/Open_511_Data_Exchange_Specification_v1.0_Transit.pdf, it doesn't seem that you can pull out more than one agency at a time, so no feed that includes all bay area transit agencies exists)
CTA (Doesn't seem to be public? http://www.transitchicago.com/developers/)
HART
- GTFS - http://gohart.org/google/google_transit.zip (~2KB)
- TripUpdates - http://api.tampa.onebusaway.org:8088/trip-updates (~9KB)
- VehiclePositions - http://api.tampa.onebusaway.org:8088/vehicle-positions (~9KB)

We should add some unit tests that do basic benchmarking to ensure we're not exceeding a given duration when processing feeds. I think 2 seconds may be reasonable, but we'll need to test. We'll also need to figure out how this works for CI, as Travis is significantly underpowered when compared to a typical desktop.
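
A basic timing helper for such a test might look like the sketch below. The names are hypothetical, the task passed in would be whatever runs one validation iteration on a saved feed, and the 2 second budget is the untested guess from above, not a measured figure.

```java
import java.time.Duration;

/** Hypothetical helper for illustration; not part of the validator tests. */
final class Benchmark {

    /** Runs the task once and returns the elapsed wall-clock time. */
    static Duration time(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return Duration.ofNanos(System.nanoTime() - start);
    }

    /** Returns true if the task finished within the given budget. */
    static boolean withinBudget(Runnable task, Duration budget) {
        return time(task).compareTo(budget) <= 0;
    }
}
```

A single run is noisy, especially on shared CI machines, so a real test would probably need warm-up runs and a more generous budget on CI.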

Move StopLocationTypeValidator to gtfs-validator library

Issue by barbeau
Thursday Apr 13, 2017 at 16:27 GMT
Originally opened as CUTR-at-USF#126

Summary:

We currently have a rule E010 implemented in edu.usf.cutr.gtfsrtvalidator.validation.gtfs.StopLocationTypeValidator that is specific to GTFS, not GTFS-rt.

Description:

If location_type is used in stops.txt, all stops referenced in stop_times.txt must have location_type of 0

Now that we have the gtfs-validator GTFS validation project integrated into our workflow, we should move the StopLocationTypeValidator rule to that library.

From a quick look it appears that this is implemented in StopLocationTypeValidator, but StopLocationTypeValidator actually isn't referenced anywhere (other than unit tests).

Check that there is at most one StopTimeUpdate for each stop in a trip

Issue by barbeau
Wednesday Nov 30, 2016 at 18:52 GMT
Originally opened as CUTR-at-USF#35

For a TripUpdate, more than one StopTimeUpdate for the same stop in a trip would generate an error. See https://groups.google.com/d/msg/gtfs-realtime/qNc7ButQbW8/tDwanvClBQAJ.
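
A minimal sketch of the duplicate check, assuming the stop_ids of a TripUpdate's StopTimeUpdates have been collected into a list in feed order. The names are hypothetical. Note that this sketch keys on stop_id only, so a loop trip that legitimately visits the same stop twice would need to be distinguished by stop_sequence instead.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper for illustration; not part of the validator. */
final class DuplicateStopCheck {

    /** Returns the stop_ids that appear more than once in one TripUpdate. */
    static Set<String> duplicatedStopIds(List<String> stopIdsInTripUpdate) {
        Set<String> seen = new HashSet<>();
        Set<String> duplicates = new TreeSet<>();
        for (String stopId : stopIdsInTripUpdate) {
            if (!seen.add(stopId)) {
                duplicates.add(stopId); // add() returned false: already seen
            }
        }
        return duplicates;
    }
}
```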

Check that start_date is consistent with GTFS

Issue by barbeau
Wednesday Apr 27, 2017
Originally opened as CUTR-at-USF#170

Summary:

For normal scheduled trips (i.e., not defined in frequencies.txt), make sure that the TripDescriptor start_date matches the GTFS trip start_date - in other words, it must be a valid start date given calendar.txt and calendar_dates.txt. If not, it's an error.

Moved here from #37 (comment).
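
The service-date test could be sketched as below, assuming the calendar.txt row (date range and active weekdays) and the calendar_dates.txt exceptions for the trip's service_id have already been loaded. The class, method, and parameter names are hypothetical.

```java
import java.time.DayOfWeek;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Set;

/** Hypothetical helper for illustration; not part of the validator. */
final class ServiceDateCheck {

    /**
     * Returns true if the TripDescriptor start_date (YYYYMMDD) is a date on
     * which the trip's service runs.
     *
     * @param added   calendar_dates.txt dates with exception_type=1
     * @param removed calendar_dates.txt dates with exception_type=2
     */
    static boolean isActive(String startDate, LocalDate calendarStart,
                            LocalDate calendarEnd, Set<DayOfWeek> activeDays,
                            Set<LocalDate> added, Set<LocalDate> removed) {
        LocalDate date = LocalDate.parse(startDate, DateTimeFormatter.BASIC_ISO_DATE);
        if (removed.contains(date)) {
            return false;
        }
        if (added.contains(date)) {
            return true;
        }
        return !date.isBefore(calendarStart)
                && !date.isAfter(calendarEnd)
                && activeDays.contains(date.getDayOfWeek());
    }
}
```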

Check whether trips that are predicted to be over are included in the feed

Issue by barbeau
Wednesday Nov 30, 2016 at 18:56 GMT
Originally opened as CUTR-at-USF#42

This would generate a warning, but only if the current time is after the scheduled arrival/departure time (see CUTR-at-USF#16).

Originally mentioned at https://groups.google.com/d/msg/gtfs-realtime/qNc7ButQbW8/tDwanvClBQAJ.
