Improve API/event validation in synapse #8445
Comments
Small note about 3: Additionally, should we use (
I will not create a dedicated issue, but for instance the Synapse CS API will accept a state event with an unspecified value from a client, for example "foo" for the "join_rule" field of https://matrix.org/docs/spec/client_server/r0.6.1#m-room-join-rules. This breaks compatibility with some clients (e.g. Element Android ignores this event and does not display it in the timeline), and I'm not sure what happens server-side regarding the room properties, or what fallback we should display to the user in this case. At least on the CS API, I would expect to get a 400 error when trying to send such a malformed event.
I'd also like to mention pydantic here as an interesting option, as it provides speedy (C-extension-backed) validation of objects.
I had done some testing with this and found it to be quite a bit slower (from memory about 100x slower). I had put together a benchmark script at https://gist.github.com/clokep/17064b16a9b471d09feb3e61851886af Results from running this on my 2016 MBP with Python 3.9.6 (times in nanoseconds):
Raw results
@clokep I think your benchmark just does the validation and nothing else? I'd be interested to see how those numbers compare when they're part of request handling as a whole.
Yes, that's what I was attempting to benchmark. 😄 (It is also completely artificial: the schemas of most of our endpoints are much more complicated.) As I said in chat, I didn't spend much time on this; I was trying to get a gut feeling for whether it would make sense to invest a lot more time in it.
Working it
One case I ran into - Synapse accepts an object into the
Background
We've recently encountered a number of bugs in which malformed (or at least, unexpected) data has caused Synapse or clients to misbehave in some way. These bugs stem from the fact that, faced with a given data structure, you cannot rely on it having the expected format. For example:

- What if the `displayname` in an `m.room.member` event is not a string (User directory gets stuck when encountering non-string display name #8220, Sqlite Error: Error binding parameter 1 - probably unsupported type (when joining a room) #8340)?
- What if an event has no `origin` field (Failing to join / send_join rooms: FrozenEventV3 has no 'origin' property #8319)?

Such bugs are disruptive, and in extreme cases could escalate from "denial of service" to "security threat". It might be possible to avoid them by validating data at the point of entry to Synapse. We've recently discussed this in some depth within the core Synapse team; this issue serves to record some of our thoughts on the topic, including promising areas for further development.
Introduction
There are actually multiple reasons why it would be useful to improve validation of data within Synapse. These include:
`isinstance` checks everywhere

There are multiple related, but different, things we mean when we talk about validation. At a high level, these can be broken down into:
Let's consider these in turn.
API validation
Given that we already have JSON schema specifications for our APIs, this is in theory relatively straightforward (see, for example, c39941c, which is a proof-of-concept applying this to a single endpoint), though it's certain to uncover a large number of places where clients rely on non-spec-compliant behaviour. We also have to be careful, especially on the SS API, to preserve backwards compatibility.
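As an illustration of what entry-point validation buys us, here is a minimal sketch (not Synapse's actual code) that checks an `m.room.join_rules` body from a client before it becomes an event, rejecting the `"foo"` case mentioned in the comments with a 400. The allowed values are taken from the r0.6.1 client-server spec; `BadRequestError` and `validate_join_rules_content` are hypothetical names, and a real implementation would more likely be driven by the JSON schemas themselves.

```python
from typing import Any, Mapping

# join_rule values listed for m.room.join_rules in the r0.6.1 client-server spec.
# (Later spec versions add more values; treat this set as an assumption.)
ALLOWED_JOIN_RULES = frozenset({"public", "knock", "invite", "private"})


class BadRequestError(Exception):
    """Hypothetical error type that the HTTP layer would turn into a 400 response."""

    def __init__(self, errcode: str, msg: str) -> None:
        super().__init__(msg)
        self.code = 400
        self.errcode = errcode
        self.msg = msg


def validate_join_rules_content(content: Any) -> Mapping[str, Any]:
    """Reject a malformed m.room.join_rules body at the CS API boundary."""
    if not isinstance(content, dict):
        raise BadRequestError("M_BAD_JSON", "Event content must be a JSON object")
    join_rule = content.get("join_rule")
    if not isinstance(join_rule, str):
        raise BadRequestError("M_BAD_JSON", "'join_rule' must be a string")
    if join_rule not in ALLOWED_JOIN_RULES:
        raise BadRequestError("M_BAD_JSON", f"Unknown join_rule: {join_rule!r}")
    return content


validate_join_rules_content({"join_rule": "invite"})  # accepted unchanged
try:
    validate_join_rules_content({"join_rule": "foo"})
except BadRequestError as e:
    print(e.code, e.errcode)  # 400 M_BAD_JSON
```

The point of rejecting at this boundary is that the client gets a clear 4xx instead of the event being stored and later confusing other clients or server-side code.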
Doing this would certainly reduce the number of spurious 500 errors (requests that should have been rejected with a 4xx), which as above brings a number of advantages. It might also reduce the occurrence of bugs caused by bad events (e.g. non-string display names), since updated Synapse servers will correctly reject them on the CS API; however, it will absolutely not fix those bugs, since such malformed data could still be received from buggy or malicious servers over federation.
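Because federated events bypass CS API validation, code that consumes event content still has to cope with unexpected types. A minimal sketch of the defensive pattern, using a hypothetical helper `displayname_for_directory`, treats a non-string `displayname` as unset instead of passing it on to code (or a database driver) that expects a string:

```python
from typing import Any, Mapping, Optional


def displayname_for_directory(content: Mapping[str, Any]) -> Optional[str]:
    """Hypothetical helper: return displayname only if it is a string, else None."""
    displayname = content.get("displayname")
    # Federated events may carry arbitrary JSON here (numbers, objects, lists...).
    return displayname if isinstance(displayname, str) else None


# A well-formed member event, e.g. one that passed CS API validation.
print(displayname_for_directory({"membership": "join", "displayname": "Alice"}))  # Alice
# The same field as sent by a buggy or malicious remote server.
print(displayname_for_directory({"membership": "join", "displayname": {"x": 1}}))  # None
```

The weakness of this approach is exactly the one described above: every call site has to remember to do it.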
Event validation
Validating events, particularly those received over federation, is quite hard as:
For example, an event with malformed `m.relates_to` data can't just be dropped: according to the authentication rules of all room versions to date it is a valid event, even though its payload (which Synapse still interprets) is invalid, because annotations were added after the current room versions were specced.

The main problem we currently have is that events contain a mix of properties which are validated on receipt (`auth_events`, `prev_events`, etc.) alongside a lot of untrusted data whose types we can neither assume nor assert. When we later access event data, we have to remember to add checks for any untrusted data. While room versions allow us to add stricter schemas, relying on that approach will always mean playing catch-up, as we'll want to use new features in older room versions where possible. (See also MSC2801 on this subject.)

Ideally, therefore, we'd add some tooling to make it easier to statically assert (or at least to tell easily from the code) whether the fields you're accessing on events (and other data types) have been validated. Hopefully it is possible to do something with mypy here, though it will require some experimentation.
Conclusions
In summary, there are a few paths going forward:
Both 1 and 2 would be good things to do, and may reduce the number of occurrences of bugs, but won't actually fix the class of bugs we're seeing due to unexpected formats of various keys in events. Paths 1 and 2 are things we already know how to do, whereas path 3 has a lot more unknowns attached to it; however, path 3 is ultimately the only option that will fully prevent the class of bugs we're seeing.