-
Notifications
You must be signed in to change notification settings - Fork 9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
doc: create ADR 0004 for Skate MQTT Hierarchy #2686
Conversation
Coverage of commit
|
documentation/adr/0004-trip-modifications-mqtt-topic-structure.md
Outdated
Show resolved
Hide resolved
documentation/adr/0004-trip-modifications-mqtt-topic-structure.md
Outdated
Show resolved
Hide resolved
documentation/adr/0004-trip-modifications-mqtt-topic-structure.md
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar to #2685 (review), I think it'd be worth discussing some alternatives, and why we're choosing this approach.
Alternatives that we should at least mention:
- A single topic with no retained messages where we put explicit "diff" messages (Detour A exists now, Detour B now applies to trip xyz, Detour C is deactivated, etc)
- Topics split out by detour ID with retained messages, but instead of deactivating with a 0-byte message, deactivating with an explicit "deactivated" message
- No deactivating at all - we just stop adding trip ID's to a particular detour (or remove them if they've already been added) once a detour doesn't apply to future or current trips anymore.
3f21acc
to
5a94454
Compare
This comment was marked as off-topic.
This comment was marked as off-topic.
5a94454
to
70c3dd0
Compare
52211ca
to
c539214
Compare
70c3dd0
to
a373637
Compare
This comment was marked as off-topic.
This comment was marked as off-topic.
This comment was marked as off-topic.
This comment was marked as off-topic.
This comment was marked as off-topic.
This comment was marked as off-topic.
This comment was marked as off-topic.
This comment was marked as off-topic.
a373637
to
3d9b684
Compare
c539214
to
aa8ed20
Compare
This comment was marked as off-topic.
This comment was marked as off-topic.
This comment was marked as off-topic.
This comment was marked as off-topic.
3d9b684
to
124dfdc
Compare
This comment was marked as off-topic.
This comment was marked as off-topic.
124dfdc
to
00ca2f3
Compare
@joshlarson I copied your thoughts and added some responses, let me know what you think? |
So the one thing I'm still uncertain about is what happens if we publish a zero-byte deletion while Transit Data is down. My fear is something like the following sequence of events:
I see a few ways around this, some of which depend on precisely what MQTT does with zero-byte messages. MQTT could delete the topic. If that happens, I'm not sure what we do, TBH. MQTT could keep the topic around with no retained message (I think this is the most likely outcome), and when new clients come online, it treats the topic as empty. Transit Data could iterate through all of the topics, and treat any that have no retained message as "deactivated" and do its Transit Data thing to deactivate them downstream. This seems fine, but does run the risk of Transit Data having to iterate over an increasingly large collection of topics on startup. MQTT could deliver the zero-byte message on startup, which means this isn't even a concern (I'd be surprised if that's how this works, but I like surprises 🙂) |
As I understand it, Concentrate is ephemeral and does not have a DB, it'll re-read open topics when it comes online, and publish a new GTFS-RT file without the removed trip modification in it. |
Topics don't really "exist", you only know about them when a message is sent on it.
I don't understand? But see above, I think this is an incorrect assumption?
This is what's expected, except it's that when it first connects it'll receive all retained messages for topics it subscribes too. (which should include the wildcard to subscribe to all
There's not much to do aside from publish a new GTFS-RT file with the current data, because it's stateless, there's nothing to "remove", only to "omit" (and omission is a side-effect of the data disappearing from the feed via the zero-byte message)
We should be deleting any topics that aren't live anymore, if the collection is growing, either all those detours should be live, or there's a bug IMO. |
dc0d1bb
to
785ac3e
Compare
This comment was marked as off-topic.
This comment was marked as off-topic.
This is helpful context, thank you! I guess I wasn't (maybe am still not?) totally clear on who TD's downstream consumers are, and what data they need - for instance, I thought that TD was going to publish something directly to Swiftly, which might mean that they need to publish a deactivation to Swiftly as well? Anyway, assuming that Walton/TransitData approves, then I think this looks great! |
My understanding is that Swiftly consumes our GTFS-RT feed, so they'd just implicitly get these once we started publishing them to the feed, and they'd continue to make predictions for us based on this info. I'd defer to @Whoops to make sure, but I'm fairly certain we won't be tying into Swiftly via any specific API's at this point. |
@fsaid90 This is the question I mentioned in our meeting. Can you put what you said in writing? :-) |
Yep! Right now the flow is: Busloc -> Swiftly -> Consumers Busloc assembles GTFS-RT VehiclePositions + TripUpdates sans predictions and delivers them to Swiftly. Swiftly send adjusted feeds with predictions. Concentrate consumes + applies block waivers to cancel trips then publishes to consumers. We'll have a new flow as well: Skate -> Somewhere? S3? Concentrate? -> Swiftly -> Concentrate -> consumers Skate would send TripModifications to us, we'd have to forward them on to Swiftly (to adjust predictions), then we'd receive updated GTFS-RT from Swiftly that factors in the TripModifications, and we'd publish that GTFS-RT (alongside our TripModifications). I don't expect that Swiftly will adjust TripModifications, they just need to know about them to update predictions accordingly (i.e. we can publish the TripModifications as is from Skate). @Whoops you mentioned this might be a circular dependency, but I'm not sure that it is! Let me know what you think. |
Seems a little circular to me 😅 graph TD
Skate
-- MQTT --> Concentrate
-- without predictions --> Swiftly
-- with predictions --> Concentrate
--> Consumers
But also, Skate could become a consumer at some point because we'd potentially want the Alert information that Alerts would add eventually. graph TD
Skate
-- MQTT --> Concentrate
-- without predictions --> Swiftly
-- with predictions --> Concentrate
--> Consumers -.-> Skate
|
Haha definitely looks circular 😅 I'm really thinking about it as two separate flows: GTFS-RT TripUpdates + VehiclePositions flow continues to be the same: Busloc -> S3 -> Swiftly -> Concentrate -> public TripModifications: Skate -> Concentrate -> Public AND Skate -> Concentrate -> S3 -> Swiftly <- it ends there since we're not waiting on the resource to come back to us? But obviously I'm not a developer so I'll defer to y'all 😃 |
I think the big question is what exactly is Switly expecting to consume? Right now Concentrate takes in a bunch of RT (including Swiftly) feeds, and soon these Skate messages and outputs them into an RT feed. If Swiftly is going to then consume that feed then we have a situation where Swiftly is both consuming that RT feed to generate it's output, and Switfly's output goes into that RT feed. I'm curious what you mean by "without predictions" in the diagram above. The only Concentrate output that doesn't have predictions in my mind is VehiclePositions. |
Oh that was probably unneeded and just adds confusion, but I'm thinking that Swiftly will need the VehiclePositions without predictions AND the Trip Modifications, so that it can make predictions on the detoured trips. Where/How does Swiftly get our VehiclePosistions and TripUpdates without predictions currently? |
Here's where I might be missing something - in my thinking:
Swiftly retrieves VehiclePositions and TripUpdates that Busloc produces and dumps into S3 - basically polling |
I'm wondering about the Our current code doesn't do anything to compute that field, and I'd been thinking about that field as effectively a prediction, which puts it into Swiftly's wheelhouse. So I had envisioned a TripModification going through a pipeline like
How do you all envision (I think this field is what Kayla was referring to by "with/without predictions".) |
I think that because Swiftly is responsible for providing the TripUpdates data, that their predictions will just include the info computed from the Trip Modification data we give them. We should assert that this assumption is correct, but this seems most correct to me. So I don't think we need to provide the Edit!!: |
On `Shape` Messages, the | ||
[`Shape.shape_id`](https://gtfs.org/realtime/reference/#message-shape) | ||
field will also have a unique ID generated by Skate, which will be | ||
referenced by the corresponding Trip Modification's |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question: if the shape also has an ID, why not:
<prefix>/trip_modifications/#{modification id}
<prefix>/shapes/#{shape id}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given this feedback, this does give me pause!
cc: @mbta/skate-developers
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a little stuck here.
One issue with separating them out, is that you have to predict what id's you want to subscribe to if you want to subscribe to only a single modification and all of it's dependencies.
I know that this is attempting to predict the future, and we don't have a use case for subscribing to individual topics, but following the "Best Practices" from HiveMQ I wonder if having a shared ID for identifying all related resources to a trip modification is more extensible?
OTOH, IMO it's possible (but unlikely, given ordering requirements if for example it applied to an inbound and outbound trip) that we'd have a single shape referenced by multiple trip modifications, in which the proposed structure falls apart.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I cannot find the relationship between shapes and trip modifications to decide whether the risk you note in the last paragraph ("... in which the proposed structure falls apart") is likely to be a problem. TripModifications
contains a list of SelectedTrips
which have Shape
s of the existing route/trip, but the Modifications
field doesn't have any nested references to Shape
s, it seems. If that's the case, then the logic in paragraph 2 especially makes sense to me
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that if two trips had different lists of missed stops but the same detour shape, then we'd need to publish two trip modifications. I don't think we're at the point where we'd do that sort of optimization(share a single shape between "disjointed" trips/trips on different routes), but the current structure would not cope with that as well as the format @paulswartz proposed.
It's still doable and not that much of an issue if the subscriber subscribes to the wildcards.
<prefix>/trip_modifications/+/trip_modification
<prefix>/trip_modifications/+/shape
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh oh ohhhhh I was somehow missing that selected_trips
contain a reference to the new detour shape, not the existing trip shape. Ok, so I can totally see a situation where different trip mods can reference the same shape. And I can see where trips on different routes would get sent down the same detour path, right? But I agree that the reuse of shapes for different trip_mods would be infrequent, and so nesting shapes under trip_mods seems logical (for your subscription reasoning)
documentation/adr/0004-trip-modifications-mqtt-topic-structure.md
Outdated
Show resolved
Hide resolved
documentation/adr/0004-trip-modifications-mqtt-topic-structure.md
Outdated
Show resolved
Hide resolved
785ac3e
to
3483134
Compare
Coverage of commit
|
d33e08e
to
49c24b6
Compare
Coverage of commit
|
Coverage of commit
|
Co-authored-by: Hannah Purcell <69368883+hannahpurcell@users.noreply.github.com> Co-authored-by: Josh Larson <jlarson@mbta.com>
49c24b6
to
61465b8
Compare
Coverage of commit
|
This proposes the initial schema and topic hierarchy.
Rendered: https://github.com/mbta/skate/blob/kf/doc/mqtt-hierarchy-adr/documentation/adr/0004-trip-modifications-mqtt-topic-structure.md
Asana Ticket: https://app.asana.com/0/1205385723132845/1207716595296863/f