This repository has been archived by the owner on Apr 26, 2024. It is now read-only.
Fix historical messages backfilling in random order on remote homeservers (MSC2716) #11114
Merged: MadLittleMods merged 55 commits into develop from madlittlemods/return-historical-events-in-order-from-backfill on Feb 7, 2022 (+342 −149).
Commits (55):
- f30302d Scratch debugging why events appear out of order on remote homeservers
- 438e222 Use OrderedDict to guarantee order returned is the same as we were bui…
- 4983739 Avoid constant missing prev_event fetching while backfilling
- a64bb2e Add changelog
- 260ca06 Some more trials of trying to get many many events to backfill in ord…
- 886071b Fix backfill not picking up batch events connected to non-base insert…
- 477c15d Some more debug logging
- 4191f56 Remove fake prev events from historical state chain
- f39c1da Remove debug logging
- 7da8012 Remove extra event info
- 69dfa16 Move to sorting the backfill events in the existing sorted
- 83474d9 Put MSC2716 backfill logic behind experimental feature flag
- 1263c7e Remove unused import
- ee47878 Fix mypy lints
- 5bfde7b Merge branch 'master' into madlittlemods/return-historical-events-in-…
- 2fbe3f1 Merge branch 'develop' into madlittlemods/return-historical-events-in…
- 1d3f417 Revert back to string interpolation for SQL boolean value
- 4a12304 Put empty prev_events behind new room version
- 9a6d8fa WIP: Don't include the event we branch from
- 3e09d49 Revert "WIP: Don't include the event we branch from"
- 5afc264 WIP: Sort events topologically when we receive them over backfill
- 6ea263b Revert "WIP: Sort events topologically when we receive them over back…
- 3d387f9 WIP: Sort events topologically when we receive them over backfill
- fb8e281 Fix direction of fake edges
- c772b35 Implement backfill in handler so we can do fetching later
- e0ff66d Fix backfill being able to cleanly branch into history and back to "l…
- 76d454f Some backfill receive sorting fixes but not using it yet
- 3529449 Fix lints
- 321f9ea Move back to the old get_backfill_events and simplify backfill.
- 15c3282 Remove the new backfill implementation and pull some good parts of th…
- 5db717a Always process marker events regardless if backfilled
- e96fd5c Add comment docs
- f3b7b3e Add better explanatory comment
- 7f2105a Remove topological sort when receiving backfill events
- 246278e Fix lints
- ec35be5 Merge branch 'develop' into madlittlemods/return-historical-events-in…
- bc0ba8c Protect from no auth events for non-existent provided prev_event
- 363aed6 Revert unused refactor to get PDU raw
- d771fbd Only run the tests package to get streaming Complement output
- b559e23 Merge branch 'develop' into madlittlemods/return-historical-events-in…
- 6b64184 Merge branch 'develop' into madlittlemods/return-historical-events-in…
- 1d00043 Merge branch 'develop' into madlittlemods/return-historical-events-in…
- b071426 Plumb allow_no_prev_events through for MSC2716
- ec33a40 Make the historical events float separately from the state chain
- b99efa8 Plumb allow_no_prev_events through create_and_send_nonmember_event
- 3810ae1 Clarify comments
- df2a152 Fix NPE when trying to grab event from wrong roomId (fix sytest)
- cc4eb72 Merge branch 'develop' into madlittlemods/return-historical-events-in…
- 47590bb Merge branch 'develop' into madlittlemods/return-historical-events-in…
- a38befa Some review optimizations
- 033360a Fix lints
- 3f22e42 Fix unused lint
- e5670ff Fix lints
- 023bd3e Don't run MSC2716 complement tests for everyone
- b3fcffb Use same txn iteration optimization
New changelog entry:
Fix [MSC2716](https://github.com/matrix-org/matrix-doc/pull/2716) historical messages backfilling in random order on remote homeservers. |
Review discussion:
@erikjohnston After experimenting with bigger datasets using the Gitter import script and writing bigger Complement tests, I'm seeing problems with our whole `stream_ordering` approach to sorting historical messages between `depth`s. `depth` is the same for all homeservers, but `stream_ordering` differs across homeservers and, in Synapse land, just reflects whenever the event was processed.

With all of the massaging I'm trying to do in this PR to make it work for the basic scrollback case, it won't solve all of the problems. Technically, it shouldn't really matter how `/backfill` returns things, but I'm just trying to make the `stream_ordering` a little more consistent from the origin to the remote homeservers, in order to get the order of messages from `/messages` consistent (sorted by `(topological_ordering, stream_ordering)`).

Even if we can backfill messages in order, it still doesn't guarantee the same `stream_ordering` (and more importantly the `/messages` order) on the other server. For example, if a room has a bunch of history imported and someone visits a permalink to a historical message back in time, their homeserver will skip over the historical messages in between and insert the permalink as the next message in the `stream_ordering`, totally throwing off the sort.

This will be even more of an exact use case when we add the MSC3030 jump-to-date API endpoint, so the static archives can navigate and jump to a certain date.

The real solution around event ordering looks related to these issues where we don't rely on `depth` or `stream_ordering` and it's just up to the shape of the DAG itself. Any thoughts for making this work?
Can we sort the events returned by `/backfill` before persisting them, based on the DAG? We have a `sorted_topologically` function somewhere that can be used to sort a set of events. Or is that not enough?
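A sketch of the kind of DAG sort being suggested, using Kahn's algorithm (Synapse's actual `sorted_topologically` helper may differ; the event dicts here are simplified stand-ins for real Matrix events): order the batch so every event appears after the events it lists in `prev_events`.

```python
from collections import deque

def sort_events_topologically(events):
    """Order events so each event comes after its prev_events
    (Kahn's algorithm). prev_events that aren't in this batch are
    treated as already-known, so their children act as roots."""
    by_id = {e["event_id"]: e for e in events}
    # in-degree = number of prev_events that are in this batch
    indegree = {
        eid: sum(1 for p in e["prev_events"] if p in by_id)
        for eid, e in by_id.items()
    }
    # children[p] = events in the batch that list p as a prev_event
    children = {eid: [] for eid in by_id}
    for eid, e in by_id.items():
        for p in e["prev_events"]:
            if p in by_id:
                children[p].append(eid)

    queue = deque(sorted(eid for eid, d in indegree.items() if d == 0))
    ordered = []
    while queue:
        eid = queue.popleft()
        ordered.append(by_id[eid])
        for child in children[eid]:
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)
    return ordered
```

Events left out of `ordered` would indicate a cycle, which a valid event DAG cannot contain.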
That may help for the specific backfill case I'm sorta optimizing here in this PR, where the user is scrolling back in time sequentially.

But it doesn't help the case where you jump back in time via a permalink or the jump-to-date endpoint and start the `stream_ordering` from the middle of the historical messages. It's currently important that `stream_ordering` starts from our newest historical messages at `-1` and decrements to our oldest historical messages at `-939394`, etc. If we have the middle historical messages at `-1`, then the sorting is all off.

That's the most obvious case of it going wrong, but it could also go wrong in `/backfill` if some other homeserver backfills in its own non-Synapse way, even where we try to provide it in perfect order (like an older Synapse or another homeserver implementation).
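To make that failure mode concrete, here is a toy illustration (invented depths and stream orderings, not Synapse's real assignment logic): `/messages` effectively sorts by `(topological_ordering, stream_ordering)`, so a permalinked historical event that gets persisted "next" receives a `stream_ordering` that no longer matches its place in history.

```python
# Three imported historical events at the same depth; true history
# order is $h1 (oldest), $h2, $h3 (newest).
key = lambda evs: (lambda e: (evs[e]["depth"], evs[e]["stream"]))

# Normal backfill persists newest-first, so older events get more
# negative stream orderings and the combined sort is correct:
events = {
    "$h1": {"depth": 5, "stream": -3},  # persisted last
    "$h2": {"depth": 5, "stream": -2},
    "$h3": {"depth": 5, "stream": -1},  # persisted first
}
in_order = sorted(events, key=key(events))
assert in_order == ["$h1", "$h2", "$h3"]

# A permalink jump fetches $h1 *first*, so it is persisted first and
# gets the least-negative stream ordering; the sort now breaks:
events_permalink = {
    "$h1": {"depth": 5, "stream": -1},  # fetched first via permalink
    "$h2": {"depth": 5, "stream": -3},
    "$h3": {"depth": 5, "stream": -2},
}
broken = sorted(events_permalink, key=key(events_permalink))
assert broken == ["$h2", "$h3", "$h1"]  # history renders out of order
```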
I was thinking of doing the `sorted_topologically` on the receiving side? That way it doesn't matter what order the other server returns them in.

Ideally, we wouldn't use depths and would instead use an "online topological sorting" algorithm (e.g. the Katriel–Bodlaender algorithm) to ensure that we correctly handle disjoint chunks of the graph becoming connected. But that is fiddly to do, and I'm not quite sure how things like pagination tokens would work if the ordering could change.
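One concrete flavour of the ordering-label problem such online algorithms face (mentioned later in this thread: earlier chunking work used fractions): labels must let you insert an event between two existing neighbours without renumbering every other row. A hypothetical sketch with Python's `fractions` module:

```python
from fractions import Fraction

def label_between(before, after):
    """Pick an order label strictly between two existing labels, so
    inserting into the middle never forces renumbering other rows."""
    return (before + after) / 2

# Existing order labels for three events:
labels = [Fraction(1), Fraction(2), Fraction(3)]

# A newly-discovered historical event belongs between the first two:
new = label_between(labels[0], labels[1])
assert labels[0] < new < labels[1]  # Fraction(3, 2)
```

With integer labels the same insert would require shifting every later label; with rationals only the new row is written, at the cost of labels that grow in denominator over time.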
I think sorting on both the sending and receiving sides would be good, to give it the best chance of working (covering server differences between sending and receiving). But it's still not bullet-proof in the permalink case, where `stream_ordering` becomes out of order.

Does the chunking you worked on achieve this? #3785 Worth reviving?

Any good references for the Katriel–Bodlaender algorithm? Something more grokkable than a text PDF 😵 Maybe how that algorithm works doesn't matter; the aspect I'm curious about is how we store this topological order and how it's updated if we insert an event in the middle. I assume it's more than just `depth: 9`, where we would have to update every single row before or after when we insert something in the middle?

I see matrix-org/gomatrixserverlib#187 which discusses the "online topological sorting" and additionally the chunking stuff.

For reference on current pagination tokens, see https://github.com/matrix-org/matrix-doc/issues/574: `t8-8_0_0_0_0_0_0_0_0` seems to represent a single event already, i.e. the combo of `depth` and `stream_ordering` only points to a single event.

Could pagination tokens just become the event ID itself, and then you just use the direction to paginate from there?
I have not found any helpful references to that algorithm beyond compsci papers. The actual algorithm is relatively fine; the major problem is picking a data structure for the labels that satisfies the necessary properties and is efficient to store in the DB.

While they suggest just using integers, that is a bit problematic for us as it would require potentially rewriting a lot of rows in the DB. There's no reason that they need to be integers; the previous work used fractions to allow inserting nodes in between existing nodes without renumbering everything (and also operated on chunks rather than individual events, iirc).

It would need to be a set of events to handle forks in the graph, annoyingly, as you basically need to serialise the state of a breadth-first search algorithm, iyswim. I.e. if you have a DAG `A -> B -> D, A -> C -> D`, then if you have a token at `B` you don't know whether you still need to send down `C` or not.
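That fork problem can be sketched directly (a hypothetical illustration, not Synapse's pagination code): paginating a DAG in topological order needs the token to carry the whole traversal frontier plus what has already been sent, because a single event ID is ambiguous at a fork.

```python
from collections import deque

# The DAG from the example, edges parent -> child:
# A -> B -> D and A -> C -> D
CHILDREN = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
PARENTS = {"A": [], "B": ["A"], "C": ["A"], "D": ["B", "C"]}

def next_page(token, limit):
    """token = (frontier, sent): the serialised BFS state.
    A bare event ID would be ambiguous: a token 'at B' can't tell
    us whether the C branch was already sent."""
    queue = deque(token[0])
    sent = set(token[1])
    page = []
    while queue and len(page) < limit:
        node = queue.popleft()
        page.append(node)
        sent.add(node)
        for child in CHILDREN[node]:
            # only enqueue a child once all of its parents were sent
            if (child not in sent and child not in queue
                    and all(p in sent for p in PARENTS[child])):
                queue.append(child)
    return page, (list(queue), sorted(sent))
```

Paginating with `limit=2` stops the first page at `["A", "B"]`, and the token's frontier `["C"]` records that the `C` branch is still owed, which an event-ID-only token could not express.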
I think it would be valuable to resurrect the chunking project, fwiw; it's just a faff without concrete benefits to end users, so it's just never been prioritised.
The conversation for topological sorting when we receive backfill events continues in #11114 (comment)
For chunking and online topological sorting, we discussed this in an out-of-band call.