
Synapse is sending events to appservices multiple times #11447

Open
andir opened this issue Nov 29, 2021 · 5 comments
Labels
A-Application-Service Related to AS support T-Defect Bugs, crashes, hangs, security vulnerabilities, or other reported issues.

Comments

@andir
Contributor

andir commented Nov 29, 2021

Description

As described in matrix-org/matrix-appservice-irc#1499, Synapse sends events to appservices multiple times when, for example, federation is temporarily broken. This ends up creating duplicate messages: we've seen the same conversation snippet posted as many as four times.

During a discussion in https://matrix.to/#/#irc:matrix.org, it was pointed out (message) that this is a spec violation.

Quote of the relevant part (https://spec.matrix.org/v1.1/application-service-api/#pushing-events):

The events sent to the application service should be linearised, as if they were from the event stream. The homeserver MUST maintain a queue of transactions to send to the application service. If the application service cannot be reached, the homeserver SHOULD backoff exponentially until the application service is reachable again. As application services cannot modify the events in any way, these requests can be made without blocking other aspects of the homeserver. Homeservers MUST NOT alter (e.g. add more) events they were going to send within that transaction ID on retries, as the application service may have already processed the events.

The last clause in particular ("as the application service may have already processed the events") is the relevant one in this context.
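
To make the spec's implication concrete: because retries must reuse the same transaction ID with identical contents, an appservice can make itself idempotent against retries by remembering which transaction IDs it has already processed. A minimal sketch (illustrative Python, not the IRC bridge's actual code; `process_event` stands in for whatever the bridge does per event):

```python
# Minimal sketch: idempotent transaction handling on the appservice side.
# The AS API delivers events via PUT /_matrix/app/v1/transactions/{txnId};
# since a retry must reuse the txnId with identical contents, remembering
# processed transaction IDs is enough to drop retried batches.

processed_txn_ids = set()  # in practice this should be persisted to disk

def process_event(event: dict) -> None:
    # Hypothetical per-event handler; a real bridge would relay to IRC here.
    print("processing", event.get("event_id"))

def handle_transaction(txn_id: str, events: list) -> dict:
    if txn_id in processed_txn_ids:
        # The homeserver is retrying a transaction we already handled:
        # acknowledge it again without reprocessing any events.
        return {}
    for event in events:
        process_event(event)
    processed_txn_ids.add(txn_id)
    return {}
```

Note that this only protects against retries of the same transaction. If the homeserver re-sends the same events under fresh transaction IDs, which is what appears to be happening here, deduplication by transaction ID cannot catch it; that is why this looks like a homeserver-side bug.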

I can provide logs from both the appservice and Synapse for the relevant time frames. Contact me on Matrix: @andi:kack.it.

Steps to reproduce

  • See steps in the linked IRC appservice issue

Version information

  • Homeserver:

  • Version: 1.47.1

  • Install method:

  • Platform: NixOS
@squahtx squahtx added the T-Defect Bugs, crashes, hangs, security vulnerabilities, or other reported issues. label Dec 1, 2021
@reivilibre
Contributor

reivilibre commented Dec 9, 2021

Is this consistently reproducible (i.e., does it happen every time you follow the steps)?

I'll contact you for logs on Matrix. EDIT: logs acquired.

@reivilibre reivilibre added the X-Needs-Info This issue is blocked awaiting information from the reporter label Dec 9, 2021
@reivilibre reivilibre removed the X-Needs-Info This issue is blocked awaiting information from the reporter label Jan 7, 2022
@reivilibre
Contributor

(I have received the logs: this is just waiting on me to get time to trawl through them, sorry!)

@reivilibre
Contributor

I'm struggling to reproduce this issue.

I've set up three homeservers: syn1, syn2 and syn3.
Each one has a user, all joined to a common room. In addition, syn3 has an application service with one AS user joined to that room.

Using iptables, I've cut off traffic from syn2 to syn3. I then send messages between syn1 and syn2, and, as you remark, only by sending messages on syn1 do messages from syn2 appear on syn3.

I then remove the iptables rule that cuts off traffic from syn2 to syn3. If syn3 is missing any messages, they come in after a few moments.

Looking at the requests made to the application service, only one copy of each message is received.

N.B. I also attempted to reproduce this using the federation blacklist, blacklisting syn3 from syn2 (restarting each time), but then tried the iptables approach in case the homeserver restart was the reason I couldn't reproduce.
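
For anyone wanting to repeat this: one way to watch for redeliveries is to point the appservice registration at a stub that counts how often each event ID arrives across all transactions. A rough sketch, assuming a stub on an arbitrary local port; hs_token verification and the registration file are omitted for brevity:

```python
# Stub appservice endpoint that counts deliveries of each event ID across
# all transactions, flagging anything delivered more than once.
import json
from collections import Counter
from http.server import BaseHTTPRequestHandler, HTTPServer

seen = Counter()

class TransactionHandler(BaseHTTPRequestHandler):
    def do_PUT(self):
        # AS API: PUT /_matrix/app/v1/transactions/{txnId}
        txn_id = self.path.split("?")[0].rsplit("/", 1)[-1]
        length = int(self.headers.get("Content-Length", 0))
        body = json.loads(self.rfile.read(length) or b"{}")
        for ev in body.get("events", []):
            seen[ev["event_id"]] += 1
            if seen[ev["event_id"]] > 1:
                print(f"DUPLICATE {ev['event_id']} (txn {txn_id}, "
                      f"delivered {seen[ev['event_id']]} times)")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(b"{}")

if __name__ == "__main__":
    HTTPServer(("127.0.0.1", 9000), TransactionHandler).serve_forever()
```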


In your case, did the network outage between the two servers last more than an hour?

Do you reproduce this consistently — does it happen every time?
Any idea which of the messages sent during the network issue are being duplicated? (e.g. are they all from the same server? are they only old ones?)

Finally, looking at your logs, I couldn't see which messages were duplicated; if you could point them out, I would be grateful. Sorry if I'm just being stupid!

@reivilibre reivilibre added T-Defect Bugs, crashes, hangs, security vulnerabilities, or other reported issues. X-Needs-Info This issue is blocked awaiting information from the reporter and removed T-Defect Bugs, crashes, hangs, security vulnerabilities, or other reported issues. labels Jan 10, 2022
@andir
Contributor Author

andir commented Feb 24, 2022

One important piece of information might be that we fairly frequently purge events older than a couple of minutes from our homeserver. Perhaps it's a combination of the network partition and the purging of already-seen events?

@reivilibre
Contributor

reivilibre commented Feb 24, 2022

One important piece of information might be that we fairly frequently purge events older than a couple of minutes from our homeserver. Perhaps it's a combination of the network partition and the purging of already-seen events?

Aha, that's interesting!

The first thing I'm led to wonder is: what happens if we purge an event and then receive it again (e.g. from its origin homeserver, which was unable to send it to us before)?
Is it possible that the event is persisted again and the appservice is notified again, hence the event being sent out to appservices multiple times?
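
To make the hypothesis concrete, here is an illustrative model of the suspected failure mode (made-up names, not Synapse's real code paths): if deduplication of incoming federation events keys off the stored events, purging an event erases the only record that it was already delivered to appservices.

```python
# Toy model of the suspected failure mode: purge deletes the dedup record,
# so a re-sent event is treated as new and pushed to appservices again.
import time

events_db = {}              # event_id -> receive timestamp ("seen" record)
appservice_deliveries = []  # stand-in for transactions sent to the AS

def on_incoming_federation_event(event_id: str) -> None:
    if event_id in events_db:
        return  # already seen: drop the redelivery
    events_db[event_id] = time.time()
    appservice_deliveries.append(event_id)  # notify the appservice

def purge_history(before_ts: float) -> None:
    # Purging old events also deletes the record used for deduplication.
    for eid in [e for e, ts in events_db.items() if ts < before_ts]:
        del events_db[eid]

# Hypothesised sequence: receive, purge, then the origin server re-sends
# the event once the network partition heals.
on_incoming_federation_event("$abc")
purge_history(time.time() + 1)        # purge everything received so far
on_incoming_federation_event("$abc")  # redelivered after partition heals
assert appservice_deliveries == ["$abc", "$abc"]  # delivered twice
```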

This sounds like it would be worth trying to reproduce. I'll have a try soon.

(edit: I had a try and have most of a test jig set up, but I didn't manage to get history purging working correctly; likely a PEBKAC, so I need to return to this at some stage. I'll assign this to myself for reproduction: since I already have the jig set up, it may as well be me who tries to reproduce it.)
