Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MSC3862: event_match (almost) anything #3862

Closed
wants to merge 10 commits into from
91 changes: 91 additions & 0 deletions proposals/3862-event-match-almost-anything.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,91 @@
# MSC3862: event_match (almost) anything
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From a client perspective, I don't think either this MSC or #3758 is particularly troublesome, though personally I do lean more towards this MSC as it shows what push rules "should" be: matches on events irrespective of data type (there's no reason to punish someone for "42" when they meant 42, at least not in this subsystem). However, the appeal of a clean slate to "fix" push rules with 3758 is equally as high.

Both are equally complex to implement and would be considered a breaking change: here, clients won't agree with servers on what the notification counts are (not that they do already...[1]), and with 3758 it's a whole new condition type. In essence, both MSCs would be writing a new condition type into the client code-wise.

[1]Element Android for instance doesn't handle keyword notifications, leading to nasty surprises when the user moves from mobile to desktop.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

matches on events irrespective of data type (there's no reason to punish someone for "42" when they meant 42, at least not in this subsystem)

Do you really think so? It seems to me that this sort of "sloppy" matching could lead to things like #2223, where the push rules end up matching something different to the rest of the system. (That was the cause of security vulnerabilities in synapse and matrix-js-sdk.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think my primary concern with #3758 is that a new push rule condition type might be more likely to break clients than extending the existing one. If @turt2live's opinion, as a client developer, is that there's not much to choose, my inclination is to go with the more complete, and more elegant, #3758.

Copy link
Member

@ara4n ara4n Sep 1, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also prefer #3758, as it avoids casting everything to strings, and the dotted notation is more powerful.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

matches on events irrespective of data type (there's no reason to punish someone for "42" when they meant 42, at least not in this subsystem)

Do you really think so? It seems to me that this sort of "sloppy" matching could lead to things like #2223, where the push rules end up matching something different to the rest of the system. (That was the cause of security vulnerabilities in synapse and matrix-js-sdk.)

It's not a strong opinion, tbh. For things like notifications it'd be a nice-to-have to support typeless matches, leaving the type validation for elsewhere (whether an event can be rendered, etc). It has been the cause of some security vulnerabilities, but push rules in general are an open avenue for multiple different kinds of implementation-specific attack to begin with. It would be a challenge to keep the "sloppiness" to the push rules though: maintaining a higher standard for other subsystems to counteract the typelessness is not traditionally something that's gone well.

richvdh marked this conversation as resolved.
Show resolved Hide resolved

The [`event_match`] condition kind on push rules currently only
matches string values. This is limiting because some event types have
important information in boolean or integer fields that currently cannot
be matched against.

[MSC3758] tries to address this by introducing a new condition kind
`exact_event_match`. The current proposal takes a different route by
generalising the existing `event_match` condition kind to work on any
primitive value.

## Proposal

In order to apply an `event_match` condition, homeservers transform all
primitive values in the event body into a lowercase string
representation. In particular this means

- Strings are converted into case-insensitive strings (this already happens today)
- `true` / `false` become `"true"` / `"false"`
- Integers such as `123` become `"123"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should clarify that it's a requirement to encode "as written" or whatever terminology is required to say that encoding to 123e^7 is invalid (as some int-to-string functions like to do on large numbers).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An added concern is I don't think any amount of documentation will save us from folks using language-specific features to encode numbers, which can easily lead to improper notation. The implementation likely wouldn't notice until it's too late, and generally cause the same sorts of problems we saw with Canonical JSON: implementations differing in how strict they are to the spec, leading to mismatched notifications.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should clarify that it's a requirement to encode "as written" or whatever terminology is required to say that encoding to 123e^7 is invalid (as some int-to-string functions like to do on large numbers).

This made me think, what if somebody used the exponential form in the JSON? E.g. should "pattern": 123e7 be converted into "123e7"?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't believe it's allowable by Canonical JSON, but if it is then ugh/yay.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, you're right. It isn't. I've added a corresponding sub-bullet to explain the conversion.

- Implementations should take care to perform the conversion without adding
formatting (such as thousands separators) and without shortening the result
(e.g. through exponential representation). The final string should match the
pattern `^-?(0|[1-9][0-9]*)$`. This aligns with the number format enforced by
[canonical JSON] itself.
- `null` becomes `"null"`

This transformation is applied regardless of how deeply the value is
nested but not if the value is part of an array. Arrays – which are not
primitive values – and their elements are not converted. This means
that, as before, it's not possible to match anything inside an array.

For the avoidance of doubt, the exemplary event body

{
"a": "string",
"b": true,
"c": {
"d": 1,
"e": null
},
"f": [1]
}

is transformed into

{
"a": "string",
"b": "true",
"c": {
"d": "1",
"e": "null"
},
"f": [1]
}

After the transformation the condition is evaluated on the string
representations.

## Potential issues

It is possible for JSON values of different data types to map to the
same string representation (e.g. `0` and `"0"` or `true` and `"true"`).
These values are not possible to differentiate with the generalised
`event_match` condition kind as described in this proposal. However, in
practice it is rare for two events to differ only in the data type of a
particular field. If that happens, it’s most probably due to the
evolution of an unstable event type in which case it’s likely desirable
to treat both data types identically anyway. The same applies to
situations where clients accidentally sent the wrong type.

## Alternatives

As mentioned earlier, [MSC3758] provides an alternative by introducing
a new condition kind `exact_event_match`. While being more powerful,
turt2live marked this conversation as resolved.
Show resolved Hide resolved
this option is also more invasive, complicating the push rule API. The
existing specification of `event_match` already allows for exact matches
on strings and the proposal at hand offers a lightweight generalisation
to other primitive types that is both easy to implement (see
e.g. [matrix-org/synapse#13466]) and use.

## Security considerations

None.

[`event_match`]: https://spec.matrix.org/v1.3/client-server-api/#conditions-1
[canonical JSON]: https://spec.matrix.org/v1.3/appendices/#grammar
[spec]: https://spec.matrix.org/v1.3/client-server-api/#conditions-1
[MSC3758]: https://github.com/matrix-org/matrix-spec-proposals/pull/3758
[matrix-org/synapse#13466]: https://github.com/matrix-org/synapse/pull/13466