This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Synapse flooding with DNS SRV queries for one single address (even though it has a .well-known with a port and shouldn't need to make any SRV queries) #11703

Closed
M-Stenzel opened this issue Jan 6, 2022 · 10 comments
Labels
X-Needs-Info This issue is blocked awaiting information from the reporter

Comments

@M-Stenzel

Hi team,

I am running matrix-synapse 1.49.2 on an openSUSE Linux system (not dockerized), installed from the official SUSE RPM with the "default" configuration.

While doing some housekeeping I checked my top 10 DNS queries and found that my Synapse server makes 414,268 (four hundred fourteen thousand two hundred sixty-eight) queries per 24 hours, which is about 4.79 per second, for one single name ("_matrix._tcp.sfunk1x.com").
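As a quick sanity check of the rate quoted above, the arithmetic can be sketched as follows. The two figures come from the report; the function name is only illustrative.

```python
# Figures taken from the report above.
QUERIES_PER_DAY = 414_268
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def queries_per_second(total_queries: int, window_seconds: int) -> float:
    """Average query rate over the given window."""
    return total_queries / window_seconds


rate = queries_per_second(QUERIES_PER_DAY, SECONDS_PER_DAY)
print(f"{rate:.2f} queries/second")  # prints "4.79 queries/second"
```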

Now I wonder how come?

After restarting the matrix-synapse process the queries are gone (temporarily?).

Is this a bug or an attack or ...

Martin.

P.S. I run AdGuard Home (https://github.com/AdguardTeam/AdGuardHome) as my local DNS resolver but did not find anything unusual with this installation.

P. P. S. In homeserver.log I can find this... (only one entry of the following).

...
2022-01-06 19:00:01,066 - synapse.federation.sender.per_destination_queue - 356 - WARNING - federation_transaction_transmission_loop-65284 - TX [sfunk1x.com] Failed to send transaction: Failed to send request: ConnectError: An error occurred while connecting: [Failure instance: Traceback (failure with no frames): <class 'twisted.internet.error.ConnectionDone'>: Connection was closed cleanly.
2022-01-06 19:35:12,822 - synapse.federation.sender.per_destination_queue - 356 - WARNING - federation_transaction_transmission_loop-67301 - TX [sfunk1x.com] Failed to send transaction: Failed to send request: TimeoutError: Timed out after 60s
...

I do not know if this points to anything meaningful.

@M-Stenzel
Author

M-Stenzel commented Jan 7, 2022

[Image: graph of DNS queries]

Some additional information regarding the issue:
As I feared, the massive query volume reappeared, this time at 10 p.m. (peaking at 11 p.m.). After midnight the spook was gone.
N.B. The graph shows a 24-hour time frame.

What triggers these DNS query events? I am lost...

@reivilibre
Contributor

I think you'd want to use a caching DNS resolver locally (for example, systemd-resolved is such a caching resolver that is included on Ubuntu and some other distributions).

I have no experience with openSUSE but a quick search suggests nscd (Name Service Caching Daemon) is typically in use on openSUSE.

/etc/resolv.conf will usually tell you which DNS server you're using; I would expect it to be a loopback address if you're using a caching daemon.
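For illustration, the loopback check described above could be sketched like this. It is a minimal sketch, not a full resolv.conf parser: the helper names are made up for this example, and the sample text stands in for the real file (which could be read with `pathlib.Path("/etc/resolv.conf").read_text()`).

```python
from __future__ import annotations

import ipaddress


def nameservers(resolv_conf_text: str) -> list[str]:
    """Return the addresses listed on `nameserver` lines."""
    servers = []
    for line in resolv_conf_text.splitlines():
        line = line.strip()
        # resolv.conf comments start with '#' or ';'.
        if not line or line[0] in "#;":
            continue
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "nameserver":
            servers.append(parts[1])
    return servers


def is_loopback(address: str) -> bool:
    """True if the address is a loopback address (127.0.0.0/8 or ::1)."""
    try:
        # Drop an IPv6 zone suffix such as "%eth0" before parsing.
        return ipaddress.ip_address(address.split("%", 1)[0]).is_loopback
    except ValueError:
        return False


# Illustrative sample, not taken from the reporter's machine.
sample = "# example only\nnameserver 127.0.0.53\nnameserver 192.168.1.1\n"
for server in nameservers(sample):
    kind = "loopback, likely a local cache" if is_loopback(server) else "not loopback"
    print(f"{server}: {kind}")
```

A loopback entry only suggests that a local daemon is answering; whether that daemon actually caches depends on its configuration.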


However there might be a problem here in Synapse. _matrix._tcp.sfunk1x.com doesn't have any SRV records and I notice that we have a TODO saying we should cache DNS name errors...

except DNSNameError:
    # TODO: cache this. We can get the SOA out of the exception, and use
    # the negative-TTL value.
    return []

That sounds like something we should consider addressing.
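To make the idea in that TODO concrete, here is a rough sketch of negative caching. It is not Synapse's code: `NameNotFound`, `NegativeCache`, and `lookup_srv` are hypothetical names invented for this example, and the negative TTL is passed in directly rather than extracted from an SOA record.

```python
import time
from typing import Callable, Dict, List


class NameNotFound(Exception):
    """Hypothetical stand-in for a resolver's name-error exception."""

    def __init__(self, name: str, negative_ttl: float) -> None:
        super().__init__(name)
        self.negative_ttl = negative_ttl  # seconds; assumed to come from the SOA


class NegativeCache:
    """Remembers names that recently failed to resolve, until a TTL expires."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._expiry: Dict[str, float] = {}

    def remember_failure(self, name: str, negative_ttl: float) -> None:
        self._expiry[name] = self._clock() + negative_ttl

    def is_known_missing(self, name: str) -> bool:
        expiry = self._expiry.get(name)
        if expiry is None:
            return False
        if self._clock() >= expiry:
            del self._expiry[name]  # entry expired; allow a fresh lookup
            return False
        return True


def lookup_srv(
    name: str, resolve: Callable[[str], List[str]], cache: NegativeCache
) -> List[str]:
    """Resolve `name`, skipping the network while a failure is still cached."""
    if cache.is_known_missing(name):
        return []
    try:
        return resolve(name)
    except NameNotFound as e:
        cache.remember_failure(name, e.negative_ttl)
        return []


# Demonstration with a fake clock and a resolver that always fails.
now = [0.0]
calls: List[str] = []


def failing_resolver(name: str) -> List[str]:
    calls.append(name)
    raise NameNotFound(name, negative_ttl=300.0)


cache = NegativeCache(clock=lambda: now[0])
lookup_srv("_matrix._tcp.example.org", failing_resolver, cache)
lookup_srv("_matrix._tcp.example.org", failing_resolver, cache)  # served from cache
first_burst = len(calls)
now[0] = 301.0  # move past the negative TTL
lookup_srv("_matrix._tcp.example.org", failing_resolver, cache)
after_expiry = len(calls)
```

With a cache like this, repeated lookups of a name that has no SRV record would reach the resolver once per TTL window instead of once per attempt.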

However, the server in question does have a well-known file ({"m.server": "matrix.sfunk1x.com:443"} for future reference).
As it includes a port literal, this is not meant to undergo an SRV lookup according to my reading of the spec (https://spec.matrix.org/latest/server-server-api/#resolving-server-names).
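As a sketch of the rule referred to here: when the delegated server name from .well-known carries an explicit port, no SRV lookup should be needed. The function below is only an illustration with a made-up name; it checks for an explicit port and deliberately ignores the spec's separate handling of IP literals without a port.

```python
def has_explicit_port(server: str) -> bool:
    """True if `server` looks like host:port or [ipv6]:port."""
    if server.startswith("["):
        # Bracketed IPv6 literal, e.g. "[::1]:8448".
        _, _, rest = server.partition("]")
        port = rest[1:]
        return rest.startswith(":") and port.isascii() and port.isdigit()
    host, sep, port = server.rpartition(":")
    return bool(sep) and bool(host) and port.isascii() and port.isdigit()


# The m.server value quoted above carries a port, so SRV should be skipped.
print(has_explicit_port("matrix.sfunk1x.com:443"))  # prints True
print(has_explicit_port("matrix.sfunk1x.com"))      # prints False
```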

Looking into this.

@DMRobertson
Contributor

> That sounds like something we should consider addressing.

Sydent could make use of this too; cf. matrix-org/matrix-python-common#2

@reivilibre reivilibre changed the title matrix flooding with dns srv queries for one single address Synapse flooding with DNS SRV queries for one single address (even though it has a .well-known with a port and shouldn't need to make any SRV queries) Jan 7, 2022
@reivilibre
Contributor

Another thing that might help us see what's going on: would you be able to put your logging level to INFO (or even DEBUG) at a time whilst this is going on, and report back with some of the logs?
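Synapse's log config file follows the schema of Python's standard `logging` configuration, so the request above can be sketched with `logging.config.dictConfig`. This is a minimal illustration only: the logger name `synapse.http.federation` is an assumption about where the federation HTTP and resolver code logs, and the handler and format are placeholders rather than Synapse's shipped defaults.

```python
import logging
import logging.config

LOG_CONFIG = {
    "version": 1,
    "disable_existing_loggers": False,
    "formatters": {
        "precise": {
            "format": "%(asctime)s - %(name)s - %(lineno)d - %(levelname)s - %(message)s"
        }
    },
    "handlers": {
        "console": {"class": "logging.StreamHandler", "formatter": "precise"}
    },
    "loggers": {
        # Assumed logger name: raise verbosity only for the federation HTTP code.
        "synapse.http.federation": {"level": "DEBUG"}
    },
    # Everything else stays at INFO.
    "root": {"level": "INFO", "handlers": ["console"]},
}

logging.config.dictConfig(LOG_CONFIG)

# Child loggers inherit the DEBUG level from their dotted-name parent.
resolver_level = logging.getLogger(
    "synapse.http.federation.srv_resolver"
).getEffectiveLevel()
other_level = logging.getLogger("synapse.storage").getEffectiveLevel()
```

Raising the level for a single logger subtree keeps the DEBUG volume manageable compared with switching the root logger to DEBUG.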

@reivilibre reivilibre added the X-Needs-Info This issue is blocked awaiting information from the reporter label Jan 7, 2022
@M-Stenzel
Author

> Another thing that might help us see what's going on: would you be able to put your logging level to INFO (or even DEBUG) at a time whilst this is going on, and report back with some of the logs?

Well, I checked the queries from the last 24 hours again, and this specific query did not occur in that time frame. I will keep checking and will increase the log verbosity level.

What I wonder is: why is it this one specific record that is being queried?

@M-Stenzel
Author

> I think you'd want to use a caching DNS resolver locally (for example, systemd-resolved is such a caching resolver that is included on Ubuntu and some other distributions).
>
> I have no experience with openSUSE but a quick search suggests nscd (Name Service Caching Daemon) is typically in use on openSUSE.
>
> /etc/resolv.conf will usually tell you which DNS server you're using; I would expect it to be a loopback address if you're using a caching daemon.

Well, I chose AdGuard because it not only resolves names but also filters addresses.
And if I understand correctly, this is like treating the symptoms but not the underlying disease. Lookups of this magnitude should not happen, even though a caching server would absorb them. Am I right? Thanks for brainstorming!

@reivilibre
Contributor

> And... if I understand correctly this is something similar to treating symptoms but not the underlying disease.... Lookups in this magnitude should not happen

Yeah, I think you're probably right here: seeing 5 requests a second for the same name sounds unusual.
However, a caching resolver is still sensible even at lower query rates.

The main thing that might help here is having some more detailed logs so we can get an idea of what's happening.

@M-Stenzel
Author

I checked my latest entries and found that now

_matrix._tcp.matrix.ryouko.eu

was queried 11,714 times within the last 24 hours.

@G2G2G2G

G2G2G2G commented Jan 14, 2022

Man, I get like 80k DNS queries per day. It's a major Synapse issue that goes right along with the notification/presence issues like #9478, imo, because it just spams constantly.

You should definitely run some kind of DNS cache; even Pi-hole would solve it.

@M-Stenzel
Author

> I think you'd want to use a caching DNS resolver locally (for example, systemd-resolved is such a caching resolver that is included on Ubuntu and some other distributions).
>
> I have no experience with openSUSE but a quick search suggests nscd (Name Service Caching Daemon) is typically in use on openSUSE.
>
> /etc/resolv.conf will usually tell you which DNS server you're using; I would expect it to be a loopback address if you're using a caching daemon.
>
> However there might be a problem here in Synapse. _matrix._tcp.sfunk1x.com doesn't have any SRV records and I notice that we have a TODO saying we should cache DNS name errors...
>
> except DNSNameError:
>     # TODO: cache this. We can get the SOA out of the exception, and use
>     # the negative-TTL value.
>     return []
>
> That sounds like something we should consider addressing.
>
> However, the server in question does have a well-known file ({"m.server": "matrix.sfunk1x.com:443"} for future reference). As it includes a port literal, this is not meant to undergo an SRV lookup according to my reading of the spec (https://spec.matrix.org/latest/server-server-api/#resolving-server-names).
>
> Looking into this.

Update: I followed your advice, removed AdGuardHome, and installed Unbound (a validating, recursive, and caching DNS(SEC) resolver).
Maybe I should not care too much...

From my side, the issue may be closed.

@richvdh richvdh closed this as completed Feb 14, 2022
5 participants