synapse explodes with a stack overflow and logging errors #4240
Comments
Part of the problem here is of course that we loop stderr back into the logger - for which see #1539. However, I think there is another problem to be resolved here.
A workaround for the stack overflows is to run synapse in the foreground rather than daemonised.
#4202 also contains a stacktrace that looks related, from python3.
This happens (on python 2) when we log a unicode string and a non-ascii byte object in the same line:
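The original snippet isn't reproduced here, but as a hedged illustration, this is a minimal Python 2 sketch of the kind of call that trips it (the message and arguments are invented, not taken from synapse's code):

```python
# -*- coding: utf-8 -*-
# Python 2 only. Combining a unicode format string with a non-ascii byte string
# forces an implicit decode with the default 'ascii' codec, which raises
# UnicodeDecodeError inside the logging machinery rather than in our own code.
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

unicode_msg = u"received event %s in room %s"   # unicode format string (invented)
nonascii_bytes = "caf\xc3\xa9"                  # utf-8 encoded byte string, not ascii

# LogRecord.getMessage() does `msg % args`; the implicit ascii decode of the
# byte string fails there, and the handler reports it via Handler.handleError(),
# which writes a traceback to stderr.
logger.info(unicode_msg, u"$someevent:example.com", nonascii_bytes)
```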
However, this doesn't explain why we end up in an infinite loop. It also doesn't explain why it affects python3 (if indeed #4202 is the same problem), because there is no implicit encode/decode on py3.
I feel like this might be related to non-ascii event ids, which could end up in the logcontext; however I still can't quite figure out how that can make everything explode.
After I start matrix-synapse as a service (not in screen with the '-n' option), it crashes after ~23 hours...
@progserega when you say crash you mean... stops responding? uses 100% cpu? segfaults?
But my script checks the service by executing a command every 10 minutes: if the command fails, the script saves homeserver.log and restarts the matrix-synapse service.
So it is "stops responding".
ok so maybe it's not actually an infinite recursion. Can you send me (
I start synapse as:
I will send synapse_n_stdout_file.log to you after two days.
I sent the log to you in private chat.
OK so sadly nothing particularly enlightening in those logs. I have figured out where @progserega 's UnicodeDecode errors are coming from (#4252), but still don't understand why his server stops responding. @progserega: it might be worth disabling the StreamHandler in your logging config and leaving it a bit longer.
Now I start synapse as a service with the '-n' option (I think this may help the server not crash). Does '-n' do the same thing as disabling 'console' in the log config?
How should I change the log config? My current /etc/matrix-synapse/log.yaml is:
I also experienced a stack overflow error caused by this logging, and it does not recover on restart. The stack overflow happened at some point for reasons unknown. Below are some entries from the log:
I tried to make a workaround by preventing the recursive call of the log function in
At least now the Synapse server is running again, but I cannot identify the root cause of the error. The log from the original crash has already been overwritten after several restart attempts, so I am not sure what caused it.
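The actual patch isn't quoted above, but purely to illustrate the "prevent the recursive call" idea, here is a hedged sketch of a handler that drops records logged while it is already handling one (the class name and approach are my own, not the workaround actually applied):

```python
import logging
import threading

class NonReentrantStreamHandler(logging.StreamHandler):
    """StreamHandler that ignores records logged while it is already emitting
    one on the same thread, breaking logger -> stderr -> logger loops.

    Illustrative sketch only; not the workaround described above.
    """

    def __init__(self, *args, **kwargs):
        super(NonReentrantStreamHandler, self).__init__(*args, **kwargs)
        self._local = threading.local()

    def emit(self, record):
        if getattr(self._local, "emitting", False):
            return  # already inside emit() on this thread: drop to avoid recursion
        self._local.emitting = True
        try:
            super(NonReentrantStreamHandler, self).emit(record)
        finally:
            self._local.emitting = False

# Example: attach it to the root logger in place of a plain StreamHandler.
logging.getLogger().addHandler(NonReentrantStreamHandler())
```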
I can also reproduce the fatal python error:
It takes about 10-20 minutes to trigger this again.
@symphorien it looks like you've cut out the important part of that error log?
I've raised a PR which I hope will stop synapse aborting when the error happens, but it won't solve the underlying problem. The most likely cause of the error is that you've configured a rotating log handler, but the synapse process doesn't have permission to write to the directory containing the log files.
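To make that failure mode concrete, here is a hedged reproduction sketch (the path and sizes are invented; it assumes the log file itself already exists and is writable, but its parent directory is not writable by the synapse user, so the rename performed during rotation fails):

```python
import logging
import logging.handlers

# Invented path: file writable, containing directory not writable by this process.
handler = logging.handlers.RotatingFileHandler(
    "/var/log/matrix-synapse/homeserver.log",
    maxBytes=1024,      # tiny limit so a rollover is attempted almost immediately
    backupCount=3,
)

root = logging.getLogger()
root.setLevel(logging.INFO)
root.addHandler(handler)

# When doRollover() tries to rename the file to homeserver.log.1 it raises
# OSError; the logging module catches it in emit() and Handler.handleError()
# prints the traceback to sys.stderr instead of raising. With stderr looped
# back into the logger, that traceback is itself logged, feeding the cycle.
for i in range(1000):
    root.info("filler line %d to push the file past maxBytes", i)
```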
fixed by #8268
Thanks, it works, but it seems that with my logging configuration
@symphorien: that log config will send logging to redirected stderr (ie, back to the logging system). This will be the reason for your earlier stack overflows. You need to configure a handler:

version: 1
handlers:
  console:
    class: logging.StreamHandler
root:
  level: INFO
  handlers: [console]
disable_existing_loggers: false
We've had a bunch of reports that synapse goes into a meltdown with logging errors. For example (from #4086):
And from #4191:
(Note that both of the above reports are from python 2.7 instances. I don't know if it also applies to python 3.)
I think what's happening here is that we are attempting to log a non-ascii character, which then causes a logging error; the error is written to stderr, which is directed back into the logger, so we end up with a stack-overflow and high CPU usage.
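As a rough sketch of that feedback loop (a simplification, not Synapse's actual stdio-redirection code): stderr is replaced by a shim that forwards writes to the logger, so any handler output, or any logging-error traceback written to "stderr", re-enters the logger:

```python
import logging
import sys

class LoggerWriter(object):
    """File-like shim that forwards writes to a logger.

    A simplified stand-in for the stderr redirection done when synapse is
    daemonised; not the real implementation.
    """
    def __init__(self, logger):
        self._logger = logger

    def write(self, message):
        if message.strip():
            self._logger.error(message.rstrip())

    def flush(self):
        pass

logger = logging.getLogger("homeserver")

# 1. stderr is redirected into the logger.
sys.stderr = LoggerWriter(logger)

# 2. A StreamHandler created afterwards defaults to the *replaced* sys.stderr,
#    so everything it emits goes straight back into the logger. Tracebacks
#    reported via Handler.handleError() also land on the shim.
logger.addHandler(logging.StreamHandler())

# 3. A single log line now recurses until the interpreter's stack limit:
#    emit() -> LoggerWriter.write() -> logger.error() -> emit() -> ...
# logger.error("boom")   # left commented out: running this hits the recursion limit
```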
However, I've been unable to reproduce the problem, and certainly non-ascii characters are normally logged without any problem. I don't really understand why
self._fmt % record.__dict__
throws a UnicodeDecodeError.