For a sufficiently large account, it's possible that we will never be able to serialise the response before the client times out. This has recently been made a lot worse by #11478 which makes the serialisation process much slower, but it was a potential problem even before that.
The intention is that it shouldn't matter too much if it takes a long time to generate a sync response, because once we finally manage it, we will cache the response, which we can then serve up quickly the next time the client makes a request. However, this oversimplifies the situation.
We can consider there to be four steps in serving a /sync response (a toy sketch in code follows the list):
1. Figure out which rooms and events should be in the response. The end result of this step is a `SyncResult` object, which is cached.
2. "Serialise" each event in the response. This is where we turn the events from the internal Python objects (`FrozenEvent`) into plain JSON dicts which match the format specified by the C-S API; in particular we add the `unsigned.age` field (which is the main reason this is done after caching); as of #11478 it is also the point at which we add bundled aggregations. The result of this step is a (potentially huge) `JsonDict` (ie, a `dict` containing other simple json-serializable values).
3. JSON-serialise the response dict to `bytes`.
4. Finally, we start streaming the response to the client.
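To make the four steps concrete, here is a self-contained toy sketch of the pipeline. All of the names and structures here (`FrozenEvent`, `SyncResult`, `serialize_event`, `encode_response`) are illustrative stand-ins, not Synapse's actual classes or API:

```python
import json
import time
from dataclasses import dataclass, field


@dataclass
class FrozenEvent:
    """Toy stand-in for Synapse's internal event object."""
    event_id: str
    content: dict
    origin_server_ts: int  # milliseconds


@dataclass
class SyncResult:
    """Step 1's output: this is the object which gets cached."""
    next_batch: str
    rooms: dict = field(default_factory=dict)  # room_id -> list[FrozenEvent]


def serialize_event(event: FrozenEvent) -> dict:
    """Step 2, per event: FrozenEvent -> plain JSON dict.

    unsigned.age depends on when the request is *served*, which is the
    main reason this step happens after (and outside) the step-1 cache.
    """
    now_ms = int(time.time() * 1000)
    return {
        "event_id": event.event_id,
        "content": event.content,
        "unsigned": {"age": now_ms - event.origin_server_ts},
    }


def encode_response(sync_result: SyncResult) -> bytes:
    # Step 2 for every event in the response, accumulating one
    # (potentially huge) JsonDict...
    response_dict = {
        "next_batch": sync_result.next_batch,
        "rooms": {
            room_id: [serialize_event(e) for e in events]
            for room_id, events in sync_result.rooms.items()
        },
    }
    # ...then step 3: a single, synchronous json.dumps over the whole thing.
    return json.dumps(response_dict).encode("utf-8")


# Step 4 would then stream these bytes back to the client.
```

Note that in this shape, nothing is written to the client until every event has been serialised and the whole dict has been encoded; that is the root of the timeouts below.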
Since the result of step 1 is cached (at least on matrix.org), we can effectively neglect it here (though it is important to remember that it is only cached for 2 minutes... more on that later); so the steps of concern to us are steps 2 and 3, which must complete in less than the client's request timeout.
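As a rough illustration of that 2-minute window, here is a minimal TTL cache of the kind described, assuming a simple monotonic-clock scheme; Synapse's real response cache is more involved than this:

```python
import time
from typing import Any, Optional


class SyncResultCache:
    def __init__(self, ttl_seconds: float = 120.0) -> None:  # the 2-minute window
        self._ttl = ttl_seconds
        self._entries: dict = {}  # cache key -> (expiry, SyncResult)

    def get(self, key: Any) -> Optional[Any]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        expiry, result = entry
        if time.monotonic() > expiry:
            # Expired: the next request must redo the expensive step 1.
            del self._entries[key]
            return None
        return result

    def set(self, key: Any, result: Any) -> None:
        self._entries[key] = (time.monotonic() + self._ttl, result)
```

The important consequence: if a client can't complete steps 2-4 within 2 minutes of the `SyncResult` landing in the cache, the next request pays the full cost of step 1 all over again.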
Some notes on timeouts:
- Cloudflare times out requests if the server doesn't start sending its response within 100 seconds of the request.
- However, haproxy doesn't notice this until we start trying to send back the response - so Synapse still sees the request as active until our haproxy times out the request, which happens after 180 seconds.
- Before we start each of steps 2 and 3, we check that the client connection is still up, and skip that step if not. (Step 4 is aborted as soon as the connection drops. A sketch of this check follows these notes.)
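A hedged sketch of that liveness check, using Twisted's `Request.notifyFinish()` (whose Deferred errbacks if the client disconnects before the response is finished). The structure is illustrative rather than Synapse's actual code, and `encode_to_dicts` is a hypothetical stand-in for step 2:

```python
import json

from twisted.web.server import Request


async def respond(request: Request, sync_result) -> None:
    disconnected = False

    def _mark_disconnected(failure):
        nonlocal disconnected
        disconnected = True

    # notifyFinish()'s Deferred errbacks if the client goes away early.
    request.notifyFinish().addErrback(_mark_disconnected)

    if disconnected:  # check before step 2
        return
    response_dict = encode_to_dicts(sync_result)  # step 2 (hypothetical)

    if disconnected:  # check again before step 3
        return
    response_bytes = json.dumps(response_dict).encode("utf-8")  # step 3

    # Step 4: Twisted aborts the write as soon as the connection drops.
    request.write(response_bytes)
    request.finish()
```

Note that the flag can only flip if the reactor gets a chance to run between the checks - which, as described below, is exactly what stops happening when serialisation blocks the reactor.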
In short: we need to serialise the `FrozenEvent`s to `dict`s (step 2) and JSON-serialise the `dict`s to `bytes` (step 3) within 100 seconds.
That should be plenty though, right? Wrong.
The first problem is that #11478 makes step 2 take 224 seconds (assuming all the relevant aggregations are already cached from the database; the first time through, when we have to do database activity, it takes over 800 seconds):
https://jaeger.int.matrix.org/trace/1a723cd9877a1cc9
(Note that in the above trace, the client connection drops at 180 seconds, so we skip the JSON-serialisation step.)
Still, even with that reverted, we have problems. Consider this example:
https://jaeger.int.matrix.org/trace/2600e970dda6bc65
This request reached haproxy at 12:37:09, but we didn't start processing it until 12:37:56 (a delay of 47 seconds). We then manage to encode the response in 58 seconds, and start streaming the response at 12:38:52, but this doesn't reach haproxy until 12:39:37 (a further 45 seconds). By that time, a total of 148 seconds has elapsed, and Cloudflare has long since given up.
It's hard to trace those 47s and 45s delays precisely, but event serialisation is a synchronous process which blocks the reactor for up to a minute (eg, the preceding request has an `encode_response` call which blocks the reactor between 12:36:33 and 12:37:29), after which we have a bunch of other work to catch up on. Prior experience suggests that all bets are off once the reactor starts ticking that slowly.
Ideas on things which might make this better:
- Counter-intuitively, having a long timeout in haproxy might be making things worse, because it means we're doing work serving responses that will be thrown away at Cloudflare. Possibly we should decrease the timeout in haproxy to (say) 120 seconds.
- We certainly shouldn't be blocking the reactor for 45 seconds at a time; at the very least, we should stick some `sleep(0)` calls in `sync.encode_response`.
- It would be great to find a way to put most of the event serialisation work inside the `SyncResponse` cache.
- Alternatively, maybe we can do it more lazily, so that we can start JSON-encoding without having to event-serialise the entire response. (A sketch combining this with the `sleep(0)` idea follows the list.)
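A rough sketch combining the last two ideas: serialise and JSON-encode the response one room at a time, yielding control between rooms so the reactor keeps ticking. This uses asyncio for brevity where Synapse actually runs on Twisted; `serialize_event` is the toy helper from the sketch earlier in this issue, and `write_chunk` stands in for whatever streams bytes to the client. None of this is Synapse's actual API:

```python
import asyncio
import json


async def encode_response_lazily(sync_result, write_chunk) -> None:
    """Stream the /sync response without first building one huge dict."""
    header = '{"next_batch": %s, "rooms": {' % json.dumps(sync_result.next_batch)
    await write_chunk(header.encode("utf-8"))

    for i, (room_id, events) in enumerate(sync_result.rooms.items()):
        # Steps 2 and 3, but per room: each chunk of JSON is written out
        # as soon as it is ready, so streaming (step 4) starts long before
        # the last event has been serialised.
        room_json = json.dumps([serialize_event(e) for e in events])
        sep = ", " if i else ""
        await write_chunk(f"{sep}{json.dumps(room_id)}: {room_json}".encode("utf-8"))

        # The moral equivalent of the sleep(0) suggestion above: give the
        # reactor a chance to service other requests between chunks.
        await asyncio.sleep(0)

    await write_chunk(b"}}")
```

Besides unblocking the reactor, this gets the first bytes to the client (and hence through haproxy and Cloudflare) quickly, which would sidestep the 100-second limit on starting the response.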
This issue has been migrated from #11611.