Can't log in anymore (Initial Sync) #9216
Comments
@yz55 Thanks for your report. Looks like generating sync responses is choking on an event which has a string value for its content. The following SQL query should reveal any events that contain a str-based content object (though unfortunately it will take some time to run):
I would be surprised if this event were created by redacting an event. In any case, we should find which room it is in in order to:
And then work out a mitigation to prevent this in the future. |
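As a rough illustration of the check described above, here is a minimal Python sketch that flags events whose top-level content is not a JSON object. It is not the SQL query from this comment. The function name `has_non_object_content` and the sample rows are hypothetical, and the rows merely stand in for (event_id, json) pairs read from the event_json table.

```python
import json


def has_non_object_content(event_json: str) -> bool:
    """Return True if the event's top-level "content" is not a JSON object."""
    event = json.loads(event_json)
    # A missing "content" key yields None, which is reported as well.
    return not isinstance(event.get("content"), dict)


# Hypothetical sample rows standing in for (event_id, json) pairs.
sample_rows = [
    ("$good:example.org", '{"type": "m.room.message", "content": {"body": "hi"}}'),
    ("$bad:example.org", '{"type": "m.room.message", "content": ""}'),
]

# Collect the IDs of events that would need a closer look.
suspect_ids = [
    event_id for event_id, raw in sample_rows if has_non_object_content(raw)
]
print(suspect_ids)
```

Running a check like this against an export from a test copy of the database, rather than the live one, keeps the investigation read-only.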
@anoadragon453 Thank you. I got 10596 results with the SQL query, which is a bit too many. Now that you mention the event content, I remember that I executed this script #4206 for redacting events. A closer look at the script shows that the content is indeed cleared with One of the 10596 results: I don't understand why the syncing problem is not affecting every user from the SQL result. |
I did some testing with my testing server and uploaded the live database. The user was still not able to login. To see if it's because of some event, I tried to delete the whole "event_json" table. Log in still crashing. Log in was only possible after truncating the "events" table brutally (CASCADE). I think the script broke some event. The problem is now, how can I find the broken event and how could I fix this? |
Even during an initial sync, only the most recent events for each room are returned. If the affected events are old enough (i.e. if you only purged old events), then they won't be returned. Users that are simply syncing (instead of initial syncing) will also only get new events.
Looking at the script, which I've copied to a gist for linkability, it seems as though the troubling lines are here:
In general I'd recommend not running random scripts from the internet on your synapse database - at least not without taking a backup first.
This was done on the testing server, right? :) To rectify this, you should just be able to replace any empty event content with an empty JSON object ({}). First try this query, which sets the top-level content field for a single event:

```sql
update event_json
set json = jsonb_set(
    json::jsonb,
    '{content}',
    jsonb '{}',
    false
) #>> '{}'
where json::jsonb->'content' = ''
  and event_id = '$xxx.yyy';
```

Check that the json of the event is what you expect (look for
If it looks good, then we can expand the query to all affected events:
|
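To preview what the repair does to a single event before touching the database, the same transformation can be sketched in Python. This is only an illustration under assumptions: `repair_event_json` is a hypothetical helper, not part of Synapse, and like the `jsonb_set(..., false)` call it leaves events without a content key alone.

```python
import json


def repair_event_json(event_json: str) -> str:
    """Replace a non-object top-level "content" with an empty object."""
    event = json.loads(event_json)
    # Only touch events that already have a "content" key, mirroring
    # jsonb_set with create_missing set to false.
    if "content" in event and not isinstance(event["content"], dict):
        event["content"] = {}
    return json.dumps(event)


# Hypothetical broken event, as it might look after the redaction script ran.
broken = '{"type": "m.room.message", "content": ""}'
print(repair_event_json(broken))
```

Comparing the output for a handful of affected events against the result of the single-event query is a cheap sanity check before expanding the update to all rows.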
@anoadragon453 Thank you for investigating, appreciate this! I tried the first query on the test database:
results in: |
Whoops, of course an empty string is not valid JSON. Please replace:
with
|
That fixed it, awesome. Thank you! Never run scripts without testing them on a test server - lesson learned. |
Glad to hear it worked 🙂 |
Description
The server was running fine for about 6 months; today one user could not start the application anymore. We reinstalled the application and tried to log in again, but still no luck. We also tried with element-web, but it gets stuck on "Syncing.." or "Unable to connect to Homeserver. Retrying.."
Found a similar issue that is "solved", but the solution was not working on my side. #7349
Steps to reproduce
Logs
Because I could not find any event or room id, I changed the log level to "DEBUG". This happens every time before the error above:
Version information
My Homeserver was running on 1.21.1, I tried to upgrade to 1.25, but still the same problem.
Install method:
Debian package manager
Platform:
Ubuntu 18.04.5 LTS (Linux 19997 4.15.0-132-generic #136-Ubuntu SMP Tue Jan 12 14:58:42 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux)
Python 3.6.9
psql (PostgreSQL) 10.15