-
Notifications
You must be signed in to change notification settings - Fork 38
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Modify which events get returned when requesting initial timeline events #366
Conversation
If we returned multiple distinct ranges, we always assumed that the history visibility was "joined", so we would never return events in the invite/shared state. This would be fine if the client had a way to fetch those events sent before they were joined, but they did not have a way as the prev_batch token would not be set correctly. We now only return a single range of events and the prev batch for /that/ range only, and defer to the upstream HS for history visibility calculations. Add end-to-end test to assert this new behaviour works.
// we only care about the LAST nid range, otherwise we can end up with gaps being returned in the | ||
// timeline. See https://github.com/matrix-org/sliding-sync/issues/365 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this wording is going to confuse my future self. (Ditto for the other comment citing this issue). It makes it sound like "gaps in the timeline" is fundamentally bad---but we do want there to be events not sent to users in certain situations (e.g. joined
history vis).
"Gaps in the timeline" means presumably means(?): the proxy returns a timeline [A, ...., B, C, ... D] such that
- B is not a direct ancestor of C,
- there is some event between B and C that the user has permission to see, and
- there is nothing in the final sync response which tells clients about points (1) and (2)?
Maybe we can say "otherwise we can end up with incorrect gaps in the timeline"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I dunno, for me this seems clear. A "gap" fundamentally is in between two things. In this case, there is a gap between two timeline events e.g:
[A,B,C,D,E] true timeline
[A,B,D,E] calculated timeline with a gap
Hence, "gaps in the timeline" are fundamentally bad, as it means we have omitted an event.
Technically, we could be pedantic and say that a "gap" is also present when you have:
[A,B,Leave, .... , Join, D, E]
and that this is a "correct" gap (assuming joined history visibility), but this has a ton of caveats and ultimately isn't something the proxy should ever do anymore as of this PR. The moment we start differentiating between correct (which won't be producible by the proxy) and incorrect gaps, it just adds more ultimately unnecessary lexicon imo.
This is further supported by how end-users describe it in bug reports:
- "I have gap in my timeline"
- "There are gaps in my timeline"
- "Sent a message from my matrix.org account. Got notification on EIX, opened it from there. When the timeline loaded, there is a gap in the timeline"
Let's use the same term to mean the same thing, a bad thing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it also doesn't help that we speak of "gappy" v2 syncs etc.
As outlined in #365, we were trying to be too smart before by calculating all the ranges for the room and returning them as a slice of from/to positions. This neglected to consider when history visibility was set to world_readable/shared/invited. In those scenarios, if the invite and join were in close proximity in the timeline (which is fairly typical), such that the
timeline_limit
would encompass both the invite and the join, the calculated timeline would incorrectly omit the events between the invite and the join. This is very bad, because it means the client has no way to fetch those messages using/messages
, as theprev_batch
token will relate to the invite (timeline[0]
).This PR fixes this by only returning a single range, the last joined range for that user. This ensure the prev_batch token is always correct, so the client can hit
/messages
with it and the upstream HS can calculate history visibility.It's worth noting that there isn't much of a tradeoff here. The old behaviour was worse in almost all circumstances. It was only "better" in the rare case of:
joined
history visibilityIn that one case, the proxy would be able to return more timeline events correctly, thus potentially cutting the need for a
/messages
hit.It looks like this bug has sat in the codebase since the beginning, and whilst we do have membership transition tests, and we did know we were conservative on history visibility checks, we didn't combine the two in any tests to check that clients can get the data they need. This PR also adds a regression test which explicitly hits
/messages
with the providedprev_batch
token to ensure that clients see the right data.