Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

TypeError in synapse.notifier #7774

Closed
auscompgeek opened this issue Jul 2, 2020 · 13 comments · Fixed by #7880
Closed

TypeError in synapse.notifier #7774

auscompgeek opened this issue Jul 2, 2020 · 13 comments · Fixed by #7880
Assignees
Labels
z-bug (Deprecated Label) z-p2 (Deprecated Label)

Comments

@auscompgeek
Copy link
Contributor

Description

I'm seeing this every so often on my Sentry dashboard/in my homeserver.log. Not sure what triggers it.

2020-07-02 00:08:37,707 - synapse.notifier - 306 - ERROR - None- Failed to notify listener
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 1418, in _inlineCallbacks
    result = g.send(result)
StopIteration

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 1418, in _inlineCallbacks
    result = g.send(result)
StopIteration

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/synapse/notifier.py", line 304, in on_new_event
    user_stream.notify(stream_key, new_token, time_now_ms)
  File "/usr/lib/python3/dist-packages/synapse/notifier.py", line 104, in notify
    self.current_token = self.current_token.copy_and_advance(stream_key, stream_id)
  File "/usr/lib/python3/dist-packages/synapse/types.py", line 401, in copy_and_advance
    new_id = int(getattr(new_token, key))
TypeError: int() argument must be a string, a bytes-like object or a number, not 'NoneType'

Version information

  • Homeserver: vovo.id.au
  • Version: 1.15.0
  • Install method: apt, from Debian unstable
  • Platform: KVM
@clokep clokep added the z-bug (Deprecated Label) label Jul 2, 2020
@clokep
Copy link
Member

clokep commented Jul 2, 2020

We saw a burst of this on matrix.org from ~Jun 17 - Jun 22 as well.

@erikjohnston
Copy link
Member

@babolivier
Copy link
Contributor

@auscompgeek Having a quick look at this, are you running Synapse with workers or as a monolith?

@auscompgeek
Copy link
Contributor Author

No workers here.

@babolivier
Copy link
Contributor

I had a quick skim over what could be causing this issue, it looks like this happens when the notifier gets notified via on_new_event on the device_list_key key with a None position - which shouldn't happen. Given there's no workers involved in this specific case, the two main suspects are:

def notify_user_signature_update(self, from_user_id, user_ids):
"""Notify a user that they have made new signatures of other users.
Args:
from_user_id (str): the user who made the signature
user_ids (list[str]): the users IDs that have new signatures
"""
position = yield self.store.add_user_signature_change_to_streams(
from_user_id, user_ids
)
self.notifier.on_new_event("device_list_key", position, users=[from_user_id])

and

position = yield self.store.add_device_change_to_streams(
user_id, device_ids, list(hosts)
)
for device_id in device_ids:
logger.debug(
"Notifying about update %r/%r, ID: %r", user_id, device_id, position
)
room_ids = yield self.store.get_rooms_for_user(user_id)
# specify the user ID too since the user should always get their own device list
# updates, even if they aren't in any rooms.
yield self.notifier.on_new_event(
"device_list_key", position, users=[user_id], rooms=room_ids
)

In both cases, position is populated from a StreamIdGenerator (either using get_next_mult or get_next), so whoever picks this issue up should start by looking for a reason this generator would return None.

@babolivier babolivier added the z-p2 (Deprecated Label) label Jul 7, 2020
@auscompgeek
Copy link
Contributor Author

I turned up synapse.handlers.device to DEBUG overnight. It doesn't look like line 467 is getting hit in this scenario.

The last log message I get now before the exception is "Attempting to resync the device list for [mxid]".

@auscompgeek
Copy link
Contributor Author

Right, so I think add_device_change_to_streams is getting called with an empty device_ids list from notify_device_update from user_device_resync.

@babolivier
Copy link
Contributor

I did some poking around and it doesn't look related to user_device_resync - I ran some tests to make sure it was behaving well with an empty list and/or null cross-signing keys and couldn't reproduce.

Is the Attempting to resync... line appearing before every exception? If so, is it always the same MXID? If not, it's worth investigating the other suspect as per my comment above, which happens to be a code path hit client-side, which client (and version) are you running?

@richvdh
Copy link
Member

richvdh commented Jul 10, 2020

@babolivier it looks to me like this will happen whenever notify_device_update is called with an empty device list, so I don't quite see how user_device_resync can work when the device list is empty.

It might be easier just to chuck a condition into notify_device_update so that it does nothing when the device list is empty.

@auscompgeek
Copy link
Contributor Author

Looking through a random sample, a lot of these seem to involve (different) matrix.org users, and the others involve other homeservers which I know/seem to recall are running a federation sender worker. I don't even share an encrypted room with any of these MXIDs. I wonder if synapse is doing something wrong on the sending side here as well.

The last exception I got was when I was offline (i.e. no E2EE clients calling /sync).

I got another batch of these from matrix.org yesterday (9 July) between 02:26:43 and 02:30:02 UTC (the MXIDs in question appear bot-like in nature in this batch).

@auscompgeek
Copy link
Contributor Author

Taking a closer look I managed to find one originating from a Synapse without workers. Sorry for the red herring.

Still doesn't explain why I would receive device list updates if my homeserver never cared about their device lists in the first place though. IIUC, device list updates are only sent to homeservers that have previously fetched device lists, right?

@babolivier
Copy link
Contributor

babolivier commented Jul 11, 2020

@babolivier it looks to me like this will happen whenever notify_device_update is called with an empty device list, so I don't quite see how user_device_resync can work when the device list is empty.

But my point is that I've been calling user_device_resync with MXIDs for which the device list is empty (both in unit tests and on my live HS using the manhole) and I absolutely can't reproduce this bug. I've also tried calling notify_device_update directly with an empty device list through the manhole and still can't reproduce on my live Synapse.

>>> device_list_updater.user_device_resync("@notfoundthisdoesntexist:labs.abolivier.bzh")
<Deferred #3>
Deferred #3 called back: {'user_id': '@notfoundthisdoesntexist:labs.abolivier.bzh', 'stream_id': 406, 'devices': [], 'master_key': None, 'self_signing_key': None}
>>> device_handler.notify_device_update("@notfoundthisdoesntexqsdfqsdfist:labs.abolivier.bzh", [])
<Deferred at 0x7f1156fbe518 current result: None>
>>> 

And I've got no error in my Synapse logs.

It might be easier just to chuck a condition into notify_device_update so that it does nothing when the device list is empty.

It's probably something we should do regardless of whether this is the real cause of the issue yeah.

@richvdh
Copy link
Member

richvdh commented Jul 13, 2020

I've also tried calling notify_device_update directly with an empty device list through the manhole and still can't reproduce on my live Synapse.

to repro the bug, you'll have to do it with a real user which actually has a request waiting for a notification via on_new_event.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
z-bug (Deprecated Label) z-p2 (Deprecated Label)
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants