This repository has been archived by the owner on Apr 26, 2024. It is now read-only.
Fix historical messages backfilling in random order on remote homeservers (MSC2716) #11114
@erikjohnston After experimenting with bigger datasets using the Gitter import script and writing bigger Complement tests, I'm seeing problems with our whole `stream_ordering` approach to sorting historical messages within the same `depth`. `depth` is the same for all homeservers, but `stream_ordering` is different across homeservers and, in Synapse land, just reflects whenever the event was processed.

With all of the massaging I'm trying to do in this PR to make it work for the basic scrollback case, it won't solve all of the problems. Technically, it shouldn't really matter how `/backfill` returns things, but I'm just trying to make the `stream_ordering` a little more consistent from the origin to the remote homeservers in order to get the order of messages from `/messages` consistent (sorted by `(topological_ordering, stream_ordering)`).

Even if we can backfill messages in order, it still doesn't guarantee the same `stream_ordering` (and, more importantly, the `/messages` order) on the other server. For example, if a room has a bunch of history imported and someone visits a permalink to a historical message back in time, their homeserver will skip over the historical messages in between, insert the permalinked event as the next message in the `stream_ordering`, and totally throw off the sort. This will be even more of an exact use case when we add the MSC3030 jump-to-date API endpoint so the static archives can navigate and jump to a certain date.

The real solution around event ordering looks related to these issues where we don't rely on `depth` or `stream_ordering` and it's just up to the shape of the DAG itself. Any thoughts for making this work?

Can we sort the events returned by `/backfill` based on the DAG before persisting them? We have a `sorted_topologically` function somewhere that can be used to sort a set of events. Or is that not enough?

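The `sorted_topologically` helper mentioned above lives elsewhere in the codebase; as a minimal sketch of the idea, a Kahn's-algorithm pass over a backfilled batch (keyed by `prev_events`, with edges pointing outside the batch ignored) would look something like this. The event IDs and dict shape here are invented for illustration, not Synapse's actual event type:

```python
from collections import deque

def sorted_topologically(events):
    """Yield event IDs so every event comes after the events it lists
    in prev_events, considering only prev_events inside this batch.

    `events` maps event_id -> list of prev_event_ids.
    """
    # Count in-batch parents of each event (Kahn's algorithm).
    indegree = {
        eid: sum(1 for p in prevs if p in events)
        for eid, prevs in events.items()
    }
    # children[parent] -> events that list `parent` in prev_events.
    children = {eid: [] for eid in events}
    for eid, prevs in events.items():
        for p in prevs:
            if p in events:
                children[p].append(eid)

    # Start from events whose parents all lie outside the batch.
    queue = deque(sorted(eid for eid, d in indegree.items() if d == 0))
    while queue:
        eid = queue.popleft()
        yield eid
        for child in children[eid]:
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)

# Example batch: B and C both point at A; D points at B and C.
batch = {"D": ["B", "C"], "B": ["A"], "C": ["A"], "A": ["$older"]}
order = list(sorted_topologically(batch))
# A is yielded first and D last, regardless of input order.
```

Persisting in this order guarantees each event's in-batch parents are already stored, whatever order the remote server returned them in.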
That may help for the specific backfill case I'm sort of optimizing for in this PR, where the user is scrolling back in time sequentially. But it doesn't help the case where you jump back in time via a permalink or the jump-to-date endpoint and start the `stream_ordering` from the middle of the historical messages. It's currently important that `stream_ordering` starts from our newest historical message at `-1` and decrements to our oldest historical message (`-939394`, etc.). If the middle historical messages end up at `-1`, then the sorting is all off.

That's the most obvious case of it going wrong, but it could also go wrong in `/backfill` if some other homeserver backfills in its own non-Synapse way while we try to provide events in perfect order (like an older Synapse or another homeserver implementation).

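To make that permalink failure mode concrete, here is a toy illustration (invented event IDs and orderings, not Synapse code) of how `/messages` order, which falls back to `stream_ordering` within the same `depth`, breaks when the middle event is persisted first and grabs `-1`:

```python
# Three historical events at the same depth. Each tuple is
# (event_id, topological_ordering, stream_ordering).
# Intended assignment (sequential scrollback, newest first):
intended = [
    ("$hist1", 1, -3),  # oldest
    ("$hist2", 1, -2),
    ("$hist3", 1, -1),  # newest
]
# A permalink to $hist2 makes the remote server persist it first,
# so it takes -1 before its neighbours are backfilled:
permalink_first = [
    ("$hist1", 1, -3),
    ("$hist2", 1, -1),  # persisted first, out of order
    ("$hist3", 1, -2),
]

def messages_order(evs):
    # /messages sorts by (topological_ordering, stream_ordering).
    return [eid for eid, *_ in sorted(evs, key=lambda e: (e[1], e[2]))]

print(messages_order(intended))         # ['$hist1', '$hist2', '$hist3']
print(messages_order(permalink_first))  # ['$hist1', '$hist3', '$hist2']
```

Same events, same `depth`, but the second server shows `$hist2` after `$hist3`: sorting the batch topologically on receipt cannot fix this, because the wrong `stream_ordering` was assigned at persist time.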
I was thinking of doing the `sorted_topologically` on the receiving side? That way it doesn't matter what order the other server returns them in.

Ideally ideally, we wouldn't use depths and instead would use an "online topological sorting" algorithm (e.g. the Katriel–Bodlaender algorithm) to ensure that we correctly handle disjoint chunks of the graph becoming connected. But that is fiddly to do, and I'm not quite sure how things like pagination tokens would work if the ordering could change.

I think sorting on both the sending and receiving sides would be good, to give it the best chance of working (just for server differences between sending and receiving). But it's still not bullet-proof in the permalink case where `stream_ordering` becomes out of order.

Does the chunking you worked on achieve this? #3785 Worth reviving?

Any good references for the Katriel–Bodlaender algorithm? Something more grokkable than a dense PDF 😵 -- maybe how that algorithm works doesn't matter, and the aspect I'm curious about is how we store this topological order and how it's updated if we insert an event in the middle. I assume it's more than just `depth: 9`, where we'd have to update every single row before or after whenever we insert something in the middle? I see matrix-org/gomatrixserverlib#187, which discusses the "online topological sorting" and additionally the chunking stuff.

For reference: current pagination tokens, https://github.com/matrix-org/matrix-doc/issues/574 -- a token like `t8-8_0_0_0_0_0_0_0_0` seems to represent a single event already, i.e. the combo of `depth` and `stream_ordering` only points to a single event. Could pagination tokens just become the event ID itself, and then you just use the direction to paginate from there?

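As a sketch of that event-ID-token idea (a hypothetical scheme with an invented `paginate` helper, not Synapse's actual `t...` stream tokens), the server would locate the token event in the room's ordering and page from there in the requested direction, handing back the last event ID as the next token:

```python
def paginate(ordered_events, from_token, direction, limit):
    """Page through an already-ordered timeline using an event ID as
    the pagination token. direction "b" pages backwards (older first
    in the chunk), "f" pages forwards (newer)."""
    idx = ordered_events.index(from_token)
    if direction == "b":
        # Take up to `limit` events before the token, newest first.
        chunk = ordered_events[max(0, idx - limit):idx][::-1]
    else:
        chunk = ordered_events[idx + 1:idx + 1 + limit]
    # The last event of the chunk becomes the next token.
    next_token = chunk[-1] if chunk else from_token
    return chunk, next_token

timeline = ["$e1", "$e2", "$e3", "$e4", "$e5"]
chunk, token = paginate(timeline, "$e4", "b", 2)
# chunk == ["$e3", "$e2"], token == "$e2"
```

The catch, as noted below for forked DAGs, is that a single event ID may not capture enough state to resume a traversal unambiguously.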
I have not found any helpful references to that algorithm beyond compsci papers. The actual algorithm is relatively fine; the major problem is picking a data structure for the labels that satisfies the necessary properties and is efficient to store in the DB.

While they suggest just using integers, that is a bit problematic for us, as it would require potentially rewriting a lot of rows in the DB. There's no reason that the labels need to be integers: the previous work used fractions to allow inserting nodes in between existing nodes without renumbering everything (and it also operated on chunks rather than individual events, iirc).

Annoyingly, a pagination token would need to be a set of events to handle forks in the graph, as you basically need to serialise the state of a breadth-first search algorithm, iyswim. I.e. if you have a DAG `A -> B -> D, A -> C -> D`, then with a token at `B` you don't know whether you still need to send down `C` or not.

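The fraction trick mentioned above can be sketched in a few lines (toy labels on invented event IDs, ignoring forks and chunking): inserting between two neighbours takes the midpoint of their labels, so no other row's label ever needs rewriting.

```python
from fractions import Fraction

# Each event carries a Fraction "position" label; ordering events
# means sorting by label.
labels = {"$a": Fraction(1), "$b": Fraction(2)}

def insert_between(labels, new_id, left_id, right_id):
    # Midpoint of the neighbours' labels; exact rational arithmetic
    # means repeated insertions never collide or lose precision.
    labels[new_id] = (labels[left_id] + labels[right_id]) / 2

insert_between(labels, "$c", "$a", "$b")  # $c -> 3/2
insert_between(labels, "$d", "$a", "$c")  # $d -> 5/4

order = sorted(labels, key=labels.get)
# order == ['$a', '$d', '$c', '$b']
```

With integer labels, either of those insertions would instead force renumbering every later row, which is exactly the DB-rewrite cost described above; a single scalar label still cannot express the BFS-frontier state needed for forks.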
I think it would be valuable to resurrect the chunking project, fwiw; it's just a faff without concrete benefits to end users, so it's just never been prioritised.

The conversation about topologically sorting backfill events as we receive them continues in #11114 (comment).

For chunking and online topological sorting, we discussed this in an out-of-band call: