
Comments (25)

abyrd commented on May 3, 2024

Thanks for all the responses. To clarify, I am neither for nor against adding these fields at this point. I just have an interest in the quality and sustainability of GTFS-related specs, so I'm trying to ask relevant questions to make sure that we think through all the implications and alternatives before making changes that will affect many people into the future.

In cases like this one, a particular solution is presented, rather than starting from a problem statement for which different solutions can be compared. @ibi-group-team you said that "there is a need for these timestamp fields", but I see these fields as one proposed solution to the stated underlying need: allowing existing notifications to be revised and controlling whether end users are notified of these changes.

The other need, that some apps / websites simply want to display when an alert was created or changed to the end user, is a more convincing argument for timestamp fields (although persistent alert IDs would also allow showing timestamps with accuracy equal to the polling interval, or complete accuracy in streaming RT).

About emailing or alerting end users, @ibi-group-team said:

> Consumers could trigger similar functionality when an alert is removed from a feed, but this relies on an implicit assumption that the alert was removed because it was closed rather than due to a data processing error that caused the producer to temporarily leave out the alert from the feed.

Can you give a concrete example of the kind of "data processing error"? In theory any component of a software system could fail and lose arbitrary data, but in a production system that should be a rare exception. Of course a format may include redundant information, checksums, etc. to detect such errors, but I'm not sure the timestamp achieves this goal. If an alert is lost or deleted in one component, some component downstream could still apply a "closed" timestamp when it disappears from the feed. And in any case, does seeing that an alert was "closed" in the past imply that a notification should be sent? Does something prevent a producer from adding a "closed" timestamp to an alert that was deleted or replaced because it was erroneous, rather than a real disruption or situation that was resolved? Should such deleted erroneous alerts not have a closed timestamp? If they should have timestamps, how do we prevent "situation resolved" emails being sent? And when should the notification be sent if the closed date is in the future? Should it be a boolean or enum flag rather than a timestamp to avoid such incoherencies?

To meet the stated needs, it seems that more so than timestamps, there is a need for declaring qualitatively why a new alert has appeared (is it a typo correction, or a correction of factual information on an existing situation, or a completely new situation) or why an existing alert has disappeared (it was created by mistake, the disruption is resolved, it was replaced by another alert with corrected information).

@ibi-group-team also said:

> While maintaining consistent ids may address some of these needs, this is a fairly data-intensive process that requires applications to compare every feed with all previous feeds to determine if any alert has changed.

I disagree with both statements. Even for a feed with a large number of alerts, I would expect processing to detect alert changes to be trivial. A reasonable implementation might involve one hash table lookup and a couple of string comparisons per alert. This should scale well and I would not expect it to be a major contribution to load even on embedded platforms. This comparison should not be significantly more complex or data-intensive than comparing timestamps. I also would not expect any system to retain and compare to all previous feeds. A reasonable implementation would compare new incoming data with a single buffer of the full system state, accumulated over all previously received data sets.
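A minimal sketch of the comparison described above, assuming alert IDs stay stable across fetches. The `Alert` class is a hypothetical, simplified stand-in for a GTFS-realtime alert entity, and `diff_alerts` is an illustrative name, not part of any library:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    # Hypothetical, simplified stand-in for a GTFS-realtime Alert entity.
    id: str
    header: str
    description: str


def diff_alerts(state: dict[str, Alert], incoming: list[Alert]):
    """Compare one freshly fetched feed against the single retained state.

    Returns the added, changed, and removed alert IDs plus the new state.
    """
    new_state = {alert.id: alert for alert in incoming}
    added = [i for i in new_state if i not in state]
    # One dict lookup plus a few string comparisons per alert.
    changed = [i for i in new_state if i in state and new_state[i] != state[i]]
    removed = [i for i in state if i not in new_state]
    return added, changed, removed, new_state
```

Only the previous state is kept, so skipping intermediate fetches does not affect the result; the consumer simply sees the accumulated difference.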

> This would also require consumers to assign created/last modified/closed timestamps based on the time that a consumer processed a feed, which will lead to inconsistency of these times between different consumers rather than all consumers using the timestamps for these values that are provided by the producer.

It doesn't seem like an inherent problem for a consumer to apply its own timestamps to reflect when it received new information, or for different consumers to have slightly different timestamps reflecting when they became aware of changes.
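As a rough illustration of consumer-applied timestamps, the sketch below records when a consumer first saw each alert ID and when its content last changed. It assumes persistent alert IDs and a simplified feed shape (ID mapped to text); `AlertTracker` is a hypothetical name. The recorded times are only as accurate as the polling interval:

```python
import time


class AlertTracker:
    """Keeps consumer-side first-seen / last-changed times per alert ID."""

    def __init__(self, clock=time.time):
        self._clock = clock
        self._seen = {}  # alert id -> (text, first_seen, last_changed)

    def ingest(self, feed: dict[str, str]) -> None:
        """feed maps alert id -> alert text (a simplified, assumed shape)."""
        now = self._clock()
        for alert_id, text in feed.items():
            if alert_id not in self._seen:
                self._seen[alert_id] = (text, now, now)
            elif self._seen[alert_id][0] != text:
                _, first_seen, _ = self._seen[alert_id]
                self._seen[alert_id] = (text, first_seen, now)
        # Alerts that vanished from the feed are simply forgotten here.
        for alert_id in list(self._seen):
            if alert_id not in feed:
                del self._seen[alert_id]

    def times(self, alert_id: str):
        _, first_seen, last_changed = self._seen[alert_id]
        return first_seen, last_changed
```

Two consumers polling at different moments would record slightly different values, which is exactly the inconsistency being debated here.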

There seem to be a lot of unstated assumptions or rules about how the timestamp information is supposed to affect consumers' behavior. If these fields are adopted they would need to be accompanied by a fair amount of guidance on what they mean and what effects they should have, as well as some invariants to be enforced. Can the closed_timestamp be before the active_period.end of the alert? For that matter, can it be after the active_period.end of the alert? As @skinkie mentioned, which combinations of active_period, timestamps, and clock time are expected to suppress display of an alert? Are these consumer actions and producer invariants requirements or just suggestions?
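To make the open questions concrete, here is a small sketch that only classifies where a proposed closed_timestamp falls relative to a single active_period. It does not decide which combinations are valid, since that is exactly what has not been specified. The function name is hypothetical, and a missing start or end is treated as open-ended:

```python
from typing import Optional


def closed_vs_active_period(closed: int, start: Optional[int], end: Optional[int]) -> str:
    """Classify a (proposed) closed_timestamp against one active_period.

    A missing start or end is treated as an open-ended interval.
    """
    if start is not None and closed < start:
        return "closed before active_period.start"
    if end is not None and closed > end:
        return "closed after active_period.end"
    return "closed within active_period"
```

Each of the three outcomes would need a defined meaning (allowed, forbidden, or ignored) before producers and consumers could interpret it consistently.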

@gcamp, you say that we "usually don't want to send notification for updates on the same alerts. Often it's typo fixes, or really small changes" (emphasis mine). But on some occasions it will be a notification-worthy change. How do you know when the change is a typo versus a correction in meaning? I suppose in one case you could delete and recreate the alert, while in the other you just update an existing alert, but these supplemental rules and semantics have not been spelled out.

Is it sufficient to know that a particular alert has been updated (in the reusable-IDs variant) or that a new alert updates an old one (in the no-ID-reuse variant) to decide whether to push a notification? Correcting a typo is considered an update, but isn't a substantive change to the information also considered an update?

@barbeau I would disagree that agencies producing experimental fields and other people consuming them is sufficient grounds to include something in a specification. The change must be acceptable to all the other people and organizations who will be required to produce and consume the new data elements. And the changes must be shown to be coherent with a bigger picture, lest GTFS/RT become an ever larger collection of ad-hoc patches.

Finally, we definitely need to resolve the question of whether or not alert IDs may (or must) be reused for corrected alerts, otherwise interpretations of closed/replaced/fixed alerts will differ wildly.

from transit.

abyrd commented on May 3, 2024

I added a comment over on the PR #134 - I don't think voting on these fields one by one will be the solution. I think we just need full definitions and semantics before adding something to the spec. I realize some people are already producing and consuming these fields, but official adoption should require more clarity.

skinkie commented on May 3, 2024

We materialise in GTFS-RT an entire valid situation. Can you refer to any version management on any current structure?

ibi-group-team commented on May 3, 2024

@skinkie can you please clarify what you mean by "any version management on any current structure"?

skinkie commented on May 3, 2024

You would like to introduce three attributes for reasons I do not understand. When an alert is removed from the feed it can be removed from the consumer; there is no ambiguity about that. So I would like to know: is there any location in GTFS-RT at this moment that would retain 'historic information' after the data has expired, or that announces something in the future?

abyrd commented on May 3, 2024

I think I get where @skinkie is coming from. You are proposing the addition of three fields and explain their characteristics, but it's not clear why they are needed. I would expect a GTFS-RT feed to simply include all currently valid alerts, and no others. I would not expect an alert that is "closed" to continue to appear in the feed. If an alert is modified, I would simply expect its contents to change in the feed.

You say that this allows "feed consumers to monitor the lifecycle of an alert", but why would a consumer want to perform such monitoring?

You say that "consumers must monitor every incremental file to infer these values, which may cause errors". However as far as I know most consumers would not use these values, so they would not need to infer them. Even if they did want to infer these values, I do not see why a consumer would need to monitor every incremental file - the most recently fetched one should contain a fully valid dataset even if intermediate files were skipped.

There appears to be some relationship between the "active period" of an alert and the concept of created/closed timestamps, but this relationship is difficult to understand.

If a feed producer wants to remove an alert before the active period end time, then they could just remove that alert from the feed. The next time the consumer polls, the alert is no longer in the feed. It no longer exists and should not be displayed.

gcamp commented on May 3, 2024

We have the same values internally, so I get where this is coming from. Currently, when an alert changes, it's impossible to know whether an existing alert was modified or a new one was created.

However, a much simpler way to get the same result is to require producers to keep the same ID for an alert that gets modified, and to use a different ID for a newly created alert.

skinkie commented on May 3, 2024

@gcamp in the Dutch standardisation body we have just forbidden recycling alert IDs when changing alerts, to make it clear that a new number means a changed message. What I don't get is the use case that is being solved: do consumers have problems applying GTFS-RT to their internal data models?

LeoFrachet commented on May 3, 2024

If you think of the set of alerts as an isolated state at time t, I agree that those values are not useful. But some consuming apps, like Transit, do more than just show the current state to their users, and for those features the difference between a new alert and an updated one is needed:

  • new alerts are pushed as notifications
  • new alerts are flagged as unread (updated ones could be flagged as "updated")

Some alerts are even just "service is back to normal", and you have to read the other remaining alerts to work out what is back to normal (Is there no delay anymore? Are the moved stops still moved?). Knowing that this is an update of a specific alert can be useful, as can simply being able to say, via closed_timestamp, that the alert has now ended.

So I see the use cases and the value of such fields. But as Guillaume (@gcamp) said, keeping the same ID across the different iterations would mostly do the same job. The Best Practices around stable alert IDs are currently unclear (see CUTR-at-USF/gtfs-realtime-validator#47). But I understand the argument of Stefan (@skinkie), which is that an alert ID should identify a unique (i.e. immutable) object.

So I see currently 2+1 options on the table:

Option A (IBI's): An "Alert" can be updated through time, with description of its updates in created_timestamp, last_modified_timestamp, closed_timestamp.

Option B (Transit's): An "Alert" can be updated through time, the updates will be trackable by comparing the different versions of alerts with the same ID.

But if we want to have unique IDs, neither of them works. To have unique IDs and to keep track of which alerts replace which, we need a way to link them together. One way to do it would be something like:

Option C: Alerts are immutable, but if an alert A with A.id=0001 is replaced by an alert B with B.id=0002, the replacing alert should have a field called replaces, set as B.replaces=0001.

gcamp commented on May 3, 2024

One small detail: Option C would work for us, but replaces needs to be a list of the IDs being replaced. You can't assume the refresh rate of consumers, and a list avoids losing the link when a consumer refreshes too slowly and misses intermediate IDs.
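A rough sketch of how a consumer might use such a list, assuming each feed entry carries an "id" and an optional "replaces" list. Both field names follow Option C and are hypothetical, as is the function name:

```python
def superseded_ids(feed: list[dict], known_ids: set[str]) -> dict[str, str]:
    """Map each alert ID the consumer already holds to the ID that replaces it.

    Each feed entry is a dict with "id" and an optional "replaces" list
    (hypothetical field names following Option C).
    """
    mapping = {}
    for alert in feed:
        for old_id in alert.get("replaces", []):
            if old_id in known_ids:
                mapping[old_id] = alert["id"]
    return mapping
```

For example, a consumer that holds alert 0001 and never fetched 0002 can still link 0001 to 0003 if 0003 lists both predecessors. If 0003 listed only 0002, the link would be lost.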

abyrd commented on May 3, 2024

I'm still having some difficulty understanding how this would be used, so additional concrete examples would be appreciated where it is important for alerts to be linked to other deleted alerts. Can the authors and commenters @ibi-group-team and @gcamp confirm that the intended use case is triggering notifications to subscribed end users? Wouldn't it already be possible to push notifications any time a new alert appears (or existing one disappears) and is attached to an entity (e.g. route) the user is subscribed to?

Are there cases where it matters that the new alert replaces a specific older alert, where it's not just sufficient to know that the new alert affects an entity the end user is interested in?

If people really want to see expired alerts, it seems to me that a backend component could just retain all the recent alerts affecting each entity, including ones that were deleted, and show them if requested. But even in a case where there are "back to normal" alerts after disruptions, would this really be a common feature expected to exist across many consumer systems? It seems sufficient to just see the most recent alert, I'm not sure someone who hadn't already seen and acted upon a past deleted alert would want to see it.
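The backend retention idea could look roughly like the sketch below. It assumes a simplified feed shape (alert ID mapped to text) and leaves out the per-entity grouping for brevity; `AlertHistory` and its retention window are hypothetical:

```python
class AlertHistory:
    """Retains alerts for a while after they vanish from the feed."""

    def __init__(self, retention_seconds: float):
        self.retention_seconds = retention_seconds
        self.active = {}   # alert id -> alert text
        self.removed = {}  # alert id -> (alert text, time it disappeared)

    def ingest(self, feed: dict[str, str], now: float) -> None:
        # Remember alerts that were active last time but are now gone.
        for alert_id, text in self.active.items():
            if alert_id not in feed:
                self.removed[alert_id] = (text, now)
        self.active = dict(feed)
        # Drop entries past the retention window, and any alert that reappeared.
        self.removed = {
            alert_id: (text, gone_at)
            for alert_id, (text, gone_at) in self.removed.items()
            if now - gone_at <= self.retention_seconds and alert_id not in feed
        }

    def recently_removed(self) -> list[str]:
        return sorted(self.removed)
```

Because a reappearing alert is dropped from the removed set, a temporary omission by the producer corrects itself on the next fetch.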

gcamp commented on May 3, 2024

For updated notifications, you usually don't want to send notification for updates on the same alerts. Often it's typo fixes, or really small changes. Instead you want to update the text of the notification (replace the notification without buzzing the phone).

For created at, we display that value in the UI of the alerts. It often helps the users get context on the alert.

That's our use case, @ibi-group-team can add more.

ibi-group-team commented on May 3, 2024

Through our work with agencies, we have seen that there is a need for these timestamp fields: certain events are triggered when an alert changes or certain functionality depends on knowing when an alert is created. For example, some agency alert websites show the time when an alert was updated, which can be provided by the last_modified_timestamp, or their websites categorize alerts by their recency (“New”, “Ongoing”, etc.) which can be inferred by the created_timestamp. Some agency alert websites show closed alerts for a period of time so customers can explicitly see that a previously active alert is now closed, and email push notification systems trigger “resolved” emails to be sent to subscribers once an alert is closed. Consumers could trigger similar functionality when an alert is removed from a feed, but this relies on an implicit assumption that the alert was removed because it was closed rather than due to a data processing error that caused the producer to temporarily leave out the alert from the feed.
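As an illustration of the recency categories mentioned above, a website could derive a label from created_timestamp along these lines. The function name and the one-hour window are arbitrary placeholders, not values from any agency:

```python
def recency_label(created_timestamp: int, now: int, new_window_seconds: int = 3600) -> str:
    """Label an alert "New" or "Ongoing" from its creation time.

    The one-hour default window is an arbitrary placeholder.
    """
    age = now - created_timestamp
    return "New" if age < new_window_seconds else "Ongoing"
```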

While maintaining consistent ids may address some of these needs, this is a fairly data-intensive process that requires applications to compare every feed with all previous feeds to determine if any alert has changed. This would also require consumers to assign created/last modified/closed timestamps based on the time that a consumer processed a feed, which will lead to inconsistency of these times between different consumers rather than all consumers using the timestamps for these values that are provided by the producer.

To address the need for these three timestamp fields, we have developed a separate feed (called the Enhanced JSON feed), which contains in JSON format all fields from the PB feed, as well as additional fields beyond those in the GTFS-realtime specification. And this feed has proven very useful for agencies. Agencies have now asked for these same additional fields to be part of the PB feed.

We understand that these timestamp fields may not be useful to all consumers and producers (even when proposed as optional fields), and so we understand if this proposal is not approved. In that case, we are ok with requesting an extension for these fields.

tsherlockcraig commented on May 3, 2024

Trillium is in a similar position to @ibi-group-team , with responsibilities for agencies that include both alerts feeds and website management. I agree with all the above comments in support of adding these three fields.

I'm also open to the other options itemized by @LeoFrachet .

barbeau commented on May 3, 2024

@ibi-group-team If agencies are already producing these fields and they are being consumed by applications, I think it's a strong argument for including them in the spec. Could you please open a pull request with a full proposal, and also add more details about who is producing and consuming the fields?

ibi-group-team commented on May 3, 2024

@barbeau we are producing these fields through our enhanced JSON feed for our clients of the TRANSIT-alerts system. These fields are currently being used by the MBTA (Boston) and TransLink (Vancouver) for their websites. See the example from TransLink’s website below. @paulswartz can you add some more detail about how the MBTA uses these fields?

[Image: example from TransLink's website]

We will open a pull request with a full proposal and call for a vote soon.

skinkie commented on May 3, 2024

I would like to see one answer before a vote. Are these fields only informational, or do they carry validity semantics, such as "do not show after closed_timestamp"?

barbeau commented on May 3, 2024

> I would disagree that agencies producing experimental fields and other people consuming them is sufficient grounds to include something in a specification.

@abyrd I agree that having existing producers and consumers certainly shouldn't be the only metric by which we judge proposals.

gcamp commented on May 3, 2024

> @gcamp, you say that we "usually don't want to send notification for updates on the same alerts. Often it's typo fixes, or really small changes" (emphasis mine). But on some occasions it will be a notification-worthy change. How do you know when the change is a typo versus a correction in meaning? I suppose in one case you could delete and recreate the alert, while in the other you just update an existing alert, but these supplemental rules and semantics have not been spelled out.

I agree with this, there's no way to differentiate between a major or minor change, and it would be useful to know.

paulswartz commented on May 3, 2024

For our notification system, here's another value that we use for distinguishing between minor and major changes: last_push_notification_timestamp. When an alert is updated, there's a checkbox for "Sent updates to subscribers", and if the box is unchecked, the last_push_notification_timestamp is not updated. The notification system only sends an updated notification when that value changes.

abyrd commented on May 3, 2024

@gcamp said:

> I agree with this, there's no way to differentiate between a major or minor change, and it would be useful to know.

Thanks for the comment. So, I believe we can agree the proposed fields do not supply this information. This is not just a detail that's useful to know: differentiating between major and minor changes was presented as one of the main use cases or justifications for these new fields. If timestamps do not allow this differentiation, then they do not solve the underlying problem.

I would recommend that we start over (perhaps in a new ticket) by stating the problems that we want to solve, and proposing solutions to those problems, one of which may be timestamp fields.

Underlying goals / problems to solve:

  • Need to know whether new alerts represent new situations in the field, or are updates or corrections to existing or removed alerts
  • Need to distinguish between updates that justify alerting end users and updates that are just minor corrections like typos

abyrd commented on May 3, 2024

> For our notification system, here's another value that we use for distinguishing between minor and major changes: last_push_notification_timestamp. When an alert is updated, there's a checkbox for "Sent updates to subscribers", and if the box is unchecked, the last_push_notification_timestamp is not updated. The notification system only sends an updated notification when that value changes.

It's great to have another commentary on how push notifications are handled. @paulswartz I don't fully understand the system you're describing though. If there's a checkbox, this must be a producer-side tool for editing alerts. If the person editing the alerts checks "sent updates to subscribers", that implies they already sent the updates by some other means - is that done automatically by another component? You say that the notification system only sends notifications when the timestamp value changes, but this seems backward to me, wouldn't the timestamp be updated when the notification system sends notifications, rather than the other way around?

paulswartz commented on May 3, 2024

@abyrd

> You say that the notification system only sends notifications when the timestamp value changes, but this seems backward to me, wouldn't the timestamp be updated when the notification system sends notifications, rather than the other way around?

That was previously the case, as the producer-side and the notification-side were the same system. Now that the notification-side has been pulled into a separate service, it uses that field to determine whether the producer-side wants a notification to be sent.

I realize that there's a typo in my original post, which might have driven the confusion. The checkbox is "Send notifications to subscribers", not sent.
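A minimal sketch of the gating behaviour described here, assuming the notification service sees each alert's last_push_notification_timestamp on every fetch. `NotificationGate` is a hypothetical name, and treating the first sighting of an alert as notification-worthy is an assumption rather than a description of the MBTA system:

```python
class NotificationGate:
    """Decides whether to notify, based on a producer-set timestamp."""

    def __init__(self):
        self._last_seen = {}  # alert id -> last_push_notification_timestamp

    def should_notify(self, alert_id: str, last_push_notification_timestamp: int) -> bool:
        previous = self._last_seen.get(alert_id)
        self._last_seen[alert_id] = last_push_notification_timestamp
        # Notify only when the producer has moved the timestamp forward or
        # the alert is new to us; edits that leave the value untouched are silent.
        return previous != last_push_notification_timestamp
```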

ibi-group-team commented on May 3, 2024

@skinkie created_timestamp and last_modified_timestamp would be informational fields. It would be up to consumers to determine the functionality they want to put in place in their applications based on these fields.

As for closed_timestamp, there are a lot of good points being raised here about its validity. We do see some issues with backwards-compatibility for closed_timestamp, but we have a short-term need to include it in the feed for agencies we are working with.

To summarize, we still see value in providing these three fields in the feed, even if they aren’t used explicitly for sending push notifications or differentiating between major and minor changes. We believe that customers are looking for consistent information across all channels. While values for these three timestamps may be inferred by the agency or downstream applications, we believe that there is a benefit in providing explicit values to reduce inconsistencies. For example, providing explicit values will make it so a customer looking at an alert on an app will see the same last modified timestamp shown on a screen at a station.

At this point, we will be calling for a vote on this proposal in a couple of days. If the vote fails, we suggest opening issues for each of these timestamps separately for future discussions.

LeoFrachet commented on May 3, 2024

Proposal has been rejected. IMHO, if somebody one day wants to continue this conversation, we should separate the different needs and use cases, and not discuss the three fields at once. So I'm closing this issue.
