Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Draft: Optimize MSC2716 /batch_send endpoint v1 #11778

Closed
wants to merge 63 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
63 commits
Select commit Hold shift + click to select a range
f30302d
Scratch debugging why events appear out of order on remote homeservers
MadLittleMods Oct 18, 2021
438e222
Use OrderedDict to gurantee order returned is the same as we were bui…
MadLittleMods Oct 18, 2021
4983739
Avoid constant missing prev_event fetching while backfilling
MadLittleMods Oct 19, 2021
a64bb2e
Add changelog
MadLittleMods Oct 19, 2021
260ca06
Some more trials of trying to get many many events to backfill in ord…
MadLittleMods Oct 19, 2021
886071b
Fix backfill not picking up batch events connected to non-base insert…
MadLittleMods Oct 20, 2021
477c15d
Some more debug logging
MadLittleMods Oct 21, 2021
4191f56
Remove fake prev events from historical state chain
MadLittleMods Oct 21, 2021
f39c1da
Remove debug logging
MadLittleMods Oct 21, 2021
7da8012
Remove extra event info
MadLittleMods Oct 21, 2021
69dfa16
Move to sorting the backfill events in the existing sorted
MadLittleMods Oct 21, 2021
83474d9
Put MSC2716 backfill logic behind experimental feature flag
MadLittleMods Oct 21, 2021
1263c7e
Remove unused import
MadLittleMods Oct 21, 2021
ee47878
Fix mypy lints
MadLittleMods Oct 21, 2021
5bfde7b
Merge branch 'master' into madlittlemods/return-historical-events-in-…
MadLittleMods Oct 21, 2021
2fbe3f1
Merge branch 'develop' into madlittlemods/return-historical-events-in…
MadLittleMods Oct 21, 2021
1d3f417
Revert back to string interpolation for SQL boolean value
MadLittleMods Oct 21, 2021
4a12304
Put empty prev_events behind new room version
MadLittleMods Oct 28, 2021
9a6d8fa
WIP: Don't include the event we branch from
MadLittleMods Oct 29, 2021
3e09d49
Revert "WIP: Don't include the event we branch from"
MadLittleMods Oct 29, 2021
5afc264
WIP: Sort events topologically when we receive them over backfill
MadLittleMods Oct 29, 2021
6ea263b
Revert "WIP: Sort events topologically when we receive them over back…
MadLittleMods Oct 29, 2021
3d387f9
WIP: Sort events topologically when we receive them over backfill
MadLittleMods Oct 29, 2021
fb8e281
Fix direction of fake edges
MadLittleMods Oct 29, 2021
c772b35
Implement backfill in handler so we can do fetching later
MadLittleMods Oct 29, 2021
e0ff66d
Fix backfill being able to cleanly branch into history and back to "l…
MadLittleMods Oct 29, 2021
76d454f
Some backfill receive sorting fixes but not using it yet
MadLittleMods Oct 30, 2021
3529449
Fix lints
MadLittleMods Oct 30, 2021
321f9ea
Move back to the old get_backfill_events and simplify backfill.
MadLittleMods Nov 2, 2021
15c3282
Remove the new backfill implementation and pull some good parts of th…
MadLittleMods Nov 2, 2021
5db717a
Always process marker events regardless if backfilled
MadLittleMods Nov 3, 2021
e96fd5c
Add comment docs
MadLittleMods Nov 3, 2021
f3b7b3e
Add better explanatory comment
MadLittleMods Nov 3, 2021
7f2105a
Remove topological sort when receiving backfill events
MadLittleMods Nov 3, 2021
246278e
Fix lints
MadLittleMods Nov 3, 2021
ec35be5
Merge branch 'develop' into madlittlemods/return-historical-events-in…
MadLittleMods Nov 3, 2021
bc0ba8c
Protect from no auth events for non-existent provided prev_event
MadLittleMods Nov 3, 2021
363aed6
Revert unused refactor to get PDU raw
MadLittleMods Nov 3, 2021
d771fbd
Only run the tests package to get streaming Complement output
MadLittleMods Nov 11, 2021
b559e23
Merge branch 'develop' into madlittlemods/return-historical-events-in…
MadLittleMods Nov 11, 2021
6b64184
Merge branch 'develop' into madlittlemods/return-historical-events-in…
MadLittleMods Dec 9, 2021
1d00043
Merge branch 'develop' into madlittlemods/return-historical-events-in…
MadLittleMods Dec 16, 2021
b071426
Plumb allow_no_prev_events through for MSC2716
MadLittleMods Dec 16, 2021
ec33a40
Make the historical events float separately from the state chain
MadLittleMods Dec 16, 2021
b99efa8
Plumb allow_no_prev_events through create_and_send_nonmember_event
MadLittleMods Dec 16, 2021
3810ae1
Clarify comments
MadLittleMods Dec 16, 2021
df2a152
Fix NPE when trying to grab event from wrong roomId (fix sytest)
MadLittleMods Dec 16, 2021
cc4eb72
Merge branch 'develop' into madlittlemods/return-historical-events-in…
MadLittleMods Jan 14, 2022
491e91d
Cache event creation info and context
MadLittleMods Jan 19, 2022
e0b7186
We only have to store the state_group for events created by create_ev…
MadLittleMods Jan 20, 2022
a032104
Only share auth_event_ids when sender and type matches
MadLittleMods Jan 20, 2022
780d358
Remove auth_event_id caching is it doesn't make a noticeable difference
MadLittleMods Jan 20, 2022
7396f7c
Restore auth_event_id caching as it does something looking closer at …
MadLittleMods Jan 20, 2022
55ef667
Add extra potential check
MadLittleMods Jan 25, 2022
47590bb
Merge branch 'develop' into madlittlemods/return-historical-events-in…
MadLittleMods Feb 4, 2022
a38befa
Some review optimizations
MadLittleMods Feb 4, 2022
033360a
Fix lints
MadLittleMods Feb 4, 2022
3f22e42
Fix unused lint
MadLittleMods Feb 4, 2022
e5670ff
Fix lints
MadLittleMods Feb 4, 2022
023bd3e
Don't run MSC2716 complement tests for everyone
MadLittleMods Feb 7, 2022
b3fcffb
Use same txn iteration optimization
MadLittleMods Feb 7, 2022
cb0ff0b
Merge branch 'madlittlemods/return-historical-events-in-order-from-ba…
MadLittleMods Feb 8, 2022
4d9e904
Merge branch 'develop' into madlittlemods/optimize-msc2716-v1
MadLittleMods Feb 8, 2022
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion scripts-dev/complement.sh
Original file line number Diff line number Diff line change
Expand Up @@ -71,4 +71,4 @@ fi

# Run the tests!
echo "Images built; running complement"
go test -v -tags synapse_blacklist,msc2403 -count=1 $EXTRA_COMPLEMENT_ARGS ./tests/...
go test -v -tags synapse_blacklist,msc2403,msc2716 -count=1 $EXTRA_COMPLEMENT_ARGS ./tests/
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO: Revert before merge

42 changes: 29 additions & 13 deletions synapse/handlers/message.py
Original file line number Diff line number Diff line change
Expand Up @@ -884,6 +884,30 @@ async def create_and_send_nonmember_event(
assert ev.internal_metadata.stream_ordering
return ev, ev.internal_metadata.stream_ordering

async def strip_auth_event_ids_for_given_event_builder(
self,
builder: EventBuilder,
prev_event_ids: List[str],
auth_event_ids: List[str],
depth: Optional[int] = None,
) -> List[str]:
temp_event = await builder.build(
prev_event_ids=prev_event_ids,
auth_event_ids=auth_event_ids,
depth=depth,
)
auth_events = await self.store.get_events_as_list(auth_event_ids)
# Create a StateMap[str]
auth_event_state_map = {(e.type, e.state_key): e.event_id for e in auth_events}
# Actually strip down and use the necessary auth events
stripped_auth_event_ids = self._event_auth_handler.compute_auth_events(
event=temp_event,
current_state_ids=auth_event_state_map,
for_verification=False,
)

return stripped_auth_event_ids

@measure_func("create_new_client_event")
async def create_new_client_event(
self,
Expand Down Expand Up @@ -925,29 +949,19 @@ async def create_new_client_event(
# For example, we don't need extra m.room.member that don't match event.sender
full_state_ids_at_event = None
if auth_event_ids is not None:
# If auth events are provided, prev events must be also.
# If auth events are provided, prev events must also be provided.
# prev_event_ids could be an empty array though.
assert prev_event_ids is not None

# Copy the full auth state before it stripped down
full_state_ids_at_event = auth_event_ids.copy()

temp_event = await builder.build(
auth_event_ids = await self.strip_auth_event_ids_for_given_event_builder(
builder=builder,
prev_event_ids=prev_event_ids,
auth_event_ids=auth_event_ids,
depth=depth,
)
auth_events = await self.store.get_events_as_list(auth_event_ids)
# Create a StateMap[str]
auth_event_state_map = {
(e.type, e.state_key): e.event_id for e in auth_events
}
# Actually strip down and use the necessary auth events
auth_event_ids = self._event_auth_handler.compute_auth_events(
event=temp_event,
current_state_ids=auth_event_state_map,
for_verification=False,
)

if prev_event_ids is not None:
assert (
Expand Down Expand Up @@ -991,6 +1005,8 @@ async def create_new_client_event(
and full_state_ids_at_event
and builder.internal_metadata.is_historical()
):
# Add explicit state to the insertion event so the rest of the batch
# can inherit the same state and `state_group`
old_state = await self.store.get_events_as_list(full_state_ids_at_event)
context = await self.state.compute_event_context(event, old_state=old_state)
else:
Expand Down
123 changes: 98 additions & 25 deletions synapse/handlers/room_batch.py
Original file line number Diff line number Diff line change
@@ -1,8 +1,11 @@
import logging
from typing import TYPE_CHECKING, List, Tuple
from typing import TYPE_CHECKING, Dict, List, NamedTuple, Tuple

from synapse.api.constants import EventContentFields, EventTypes
from synapse.appservice import ApplicationService
from synapse.events import EventBase
from synapse.events.snapshot import EventContext
from synapse.events.validator import EventValidator
from synapse.http.servlet import assert_params_in_dict
from synapse.types import JsonDict, Requester, UserID, create_requester
from synapse.util.stringutils import random_string
Expand All @@ -16,10 +19,13 @@
class RoomBatchHandler:
def __init__(self, hs: "HomeServer"):
self.hs = hs
self.config = hs.config
self.store = hs.get_datastore()
self.state_store = hs.get_storage().state
self.event_creation_handler = hs.get_event_creation_handler()
self.room_member_handler = hs.get_room_member_handler()
self.validator = EventValidator()
self.event_builder_factory = hs.get_event_builder_factory()
self.auth = hs.get_auth()

async def inherit_depth_from_prev_ids(self, prev_event_ids: List[str]) -> int:
Expand Down Expand Up @@ -290,6 +296,18 @@ async def persist_historical_events(
"""
assert app_service_requester.app_service

room_version_obj = await self.store.get_room_version(room_id)

# Map from event type to EventContext that we can re-use and create
# another event in the batch with the same type
event_type_to_context_cache: Dict[str, EventContext] = {}

# Map from (event.sender, event.type) to auth_event_ids that we can re-use and create
# another event in the batch with the same sender and type
event_sender_and_type_to_auth_event_ids_cache: Dict[
Tuple(str, str), List[str]
] = {}

# Make the historical event chain float off on its own by specifying no
# prev_events for the first event in the chain which causes the HS to
# ask for the state at the start of the batch later.
Expand All @@ -309,38 +327,93 @@ async def persist_historical_events(
"origin_server_ts": ev["origin_server_ts"],
"content": ev["content"],
"room_id": room_id,
"sender": ev["sender"], # requester.user.to_string(),
"sender": ev["sender"],
"prev_events": prev_event_ids.copy(),
}

# Mark all events as historical
event_dict["content"][EventContentFields.MSC2716_HISTORICAL] = True

event, context = await self.event_creation_handler.create_event(
await self.create_requester_for_user_id_from_app_service(
ev["sender"], app_service_requester.app_service
),
event_dict,
# Only the first event in the chain should be floating.
# The rest should hang off each other in a chain.
allow_no_prev_events=index == 0,
prev_event_ids=event_dict.get("prev_events"),
auth_event_ids=auth_event_ids,
historical=True,
depth=inherited_depth,
)
# We can skip a bunch of context and state calculations if we
# already have an event with the same type to base off of.
cached_context = event_type_to_context_cache.get(ev["type"])

assert context._state_group
if cached_context is None:
event, context = await self.event_creation_handler.create_event(
await self.create_requester_for_user_id_from_app_service(
ev["sender"], app_service_requester.app_service
),
event_dict,
# Only the first event in the chain should be floating.
# The rest should hang off each other in a chain.
allow_no_prev_events=index == 0,
prev_event_ids=event_dict.get("prev_events"),
auth_event_ids=auth_event_ids,
historical=True,
depth=inherited_depth,
)

# Normally this is done when persisting the event but we have to
# pre-emptively do it here because we create all the events first,
# then persist them in another pass below. And we want to share
# state_groups across the whole batch so this lookup needs to work
# for the next event in the batch in this loop.
await self.store.store_state_group_id_for_event_id(
event_id=event.event_id,
state_group_id=context._state_group,
)
# Cache the context so we can re-use it for events in
# the batch that have the same type.
event_type_to_context_cache[event.type] = context
# Cache the auth_event_ids so we can re-use it for events in
# the batch that have the same sender and type.
event_sender_and_type_to_auth_event_ids_cache[
(event.sender, event.type)
] = event.auth_event_ids()

# Normally this is done when persisting the event but we have to
# pre-emptively do it here because we create all the events first,
# then persist them in another pass below. And we want to share
# state_groups across the whole batch so this lookup needs to work
# for the next event in the batch in this loop.
await self.store.store_state_group_id_for_event_id(
event_id=event.event_id,
state_group_id=context._state_group,
)
else:
# Create an event with a lot less overhead than create_event
builder = self.event_builder_factory.for_room_version(
room_version_obj, event_dict
)
builder.internal_metadata.historical = True

# TODO: Can we get away without this? Can't we just rely on validate_new below?
# self.validator.validate_builder(builder)

# TODO: Do we need to sanity check the number of prev_events?

resultant_auth_event_ids = (
event_sender_and_type_to_auth_event_ids_cache.get(
(ev["sender"], ev["type"])
)
)
if resultant_auth_event_ids is None:
resultant_auth_event_ids = await self.event_creation_handler.strip_auth_event_ids_for_given_event_builder(
builder=builder,
prev_event_ids=prev_event_ids,
auth_event_ids=auth_event_ids,
depth=inherited_depth,
)
# Cache the auth_event_ids so we can re-use it for events in
# the batch that have the same sender and type.
event_sender_and_type_to_auth_event_ids_cache[
(ev["sender"], ev["type"])
] = resultant_auth_event_ids

event = await builder.build(
prev_event_ids=event_dict.get("prev_events"),
auth_event_ids=resultant_auth_event_ids.copy(),
depth=inherited_depth,
)
# We can re-use the context per-event type because it will
# calculate out to be the same for all events in the batch. We
# also get the benefit of sharing the same state_group.
context = cached_context

# TODO: Do we need to check `third_party_event_rules.check_event_allowed(...)`?

self.validator.validate_new(event, self.config)

logger.debug(
"RoomBatchSendEventRestServlet inserting event=%s, prev_event_ids=%s, auth_event_ids=%s",
Expand Down
1 change: 1 addition & 0 deletions synapse/storage/databases/main/room_batch.py
Original file line number Diff line number Diff line change
Expand Up @@ -14,6 +14,7 @@

from typing import Optional

from typing import List
from synapse.storage._base import SQLBaseStore


Expand Down