Add validation of format for 3pid and add validation of 3pid in admin api #7022
This looks good overall! I believe adding additional failure modes to the API endpoints will need a proposal for changing the spec.
I believe the referenced issue also talks about trimming whitespace on the input data. That seems like it would be pretty straightforward to add here while modifying this code?
changelog.d/6398.bugfix
Outdated
@@ -0,0 +1 @@
Add validation of format for 3pid.
Contrary to how towncrier is normally used, we've been putting in newsfragments just for the pull request number, so this file can be removed.
Ok.
Please delete this file.
# For emails, transform the address to lowercase.
# We store all email addresses as lowercase in the DB.
# (See add_threepid in synapse/handlers/auth.py)
address = threepid["address"].lower()
Just noting that this seems related to #7021.
Yes, it is.
This change needs to wait on the result of that conversation then.
synapse/util/threepids.py
Outdated
""" | ||
if medium == "email":
    regex = r"(^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$)"
At a glance this looks like it'll hit most valid e-mails, but usually things like e-mails and domains are relatively hard to validate -- where did this regular expression come from?
The source of the regex is https://emailregex.com/ (the Python variant):
r"(^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$)"
We can change the regex. That is why the validation is a new function.
I did not find a good regex for MSISDNs.
The main issue I see here is that this doesn't support international domain names (or Unicode in general, although I'm not sure if Unicode is valid in the local part of an email address).
Taking a look at that site, it seems the Python regular expression is much simpler than some of the others, which I find concerning.
Poking around a bit, some other solutions suggest being generous, since it is hard to know the form of an email without actually trying to validate it, and using something like: r"^[^@]+@[^@]+\.[^@]+$".
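As a rough illustration of the difference (a sketch only; the constant and helper names below are hypothetical and not part of this PR), the two patterns can be compared side by side:

```python
import re

# Pattern quoted from this PR: ASCII-only local part and domain.
STRICT_EMAIL = re.compile(r"(^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$)")

# Generous pattern suggested above: exactly one "@", with a dot somewhere after it.
GENEROUS_EMAIL = re.compile(r"^[^@]+@[^@]+\.[^@]+$")


def matches(pattern, address):
    """Return True if the compiled pattern accepts the address (hypothetical helper)."""
    return pattern.match(address) is not None


if __name__ == "__main__":
    for candidate in ["user@example.com", "user@bücher.example", "user@@example.com"]:
        print(candidate, matches(STRICT_EMAIL, candidate), matches(GENEROUS_EMAIL, candidate))
```

The generous pattern accepts the internationalized domain that the strict one rejects, but it also lets through addresses with surrounding whitespace, so it would probably want to be paired with trimming.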
Sorry for the delay here. I left a few more comments. I'm not sure what we want to do about the lower-casing of the email addresses, as that's really covered in #7021. I think the other comments are straightforward though.
This refers to issue #6398. In order to fix that more nicely, would it make sense to strip whitespace on the 3pids before validating?
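Something along these lines might be enough (a minimal sketch; `strip_threepid_address` is a hypothetical name, and it assumes the 3pid is the same `{"medium": ..., "address": ...}` dict used elsewhere in this diff):

```python
def strip_threepid_address(threepid):
    """Return a copy of the 3pid dict with surrounding whitespace removed
    from the address. The input dict is left untouched.

    Hypothetical helper, not part of the PR.
    """
    cleaned = dict(threepid)
    cleaned["address"] = threepid["address"].strip()
    return cleaned


if __name__ == "__main__":
    raw = {"medium": "email", "address": "  user@example.com \n"}
    print(strip_threepid_address(raw))
```

Running this before the format check would mean the validation only ever sees the trimmed value, which is also the value that would be stored.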
elif medium == "msisdn":
    return True
else:
    return False
Can you add a comment above this saying that any other medium is not understood and is thus invalid? Thanks!
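For example, something like the following (a sketch only; the function name `is_valid_threepid` is assumed for illustration, and the email pattern is simply the one currently in this PR, which is still under discussion above):

```python
import re

# Pattern currently used in this PR; it may be replaced by a more generous one.
EMAIL_REGEX = re.compile(r"(^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$)")


def is_valid_threepid(medium, address):
    """Check the format of a 3pid address for the given medium (hypothetical name)."""
    if medium == "email":
        return EMAIL_REGEX.match(address) is not None
    elif medium == "msisdn":
        # No format check is applied to phone numbers yet.
        return True
    else:
        # Any other medium is not understood and is thus invalid.
        return False
```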
@@ -55,6 +55,7 @@ class Codes(object):
    THREEPID_IN_USE = "M_THREEPID_IN_USE"
    THREEPID_NOT_FOUND = "M_THREEPID_NOT_FOUND"
    THREEPID_DENIED = "M_THREEPID_DENIED"
    INVALID_THREEPID = "M_INVALID_THREEPID"
It is unclear to me if this needs a spec change or not. @matrix-org/synapse-core any opinion?
I did not find a spec for these errors, only the documentation at https://github.com/matrix-org/matrix-doc/blob/master/specification/client_server_api.rst
I can also remove this error.
You can see the list of error codes at https://matrix.org/docs/spec/client_server/r0.6.0#api-standards. This one would need to be added as a separate code, following the process at https://matrix.org/docs/spec/proposals
@dklimpel Thanks for the contribution! After reviewing it again I think this probably needs to get split up into a few different PRs:
I'm going to close this for now since it needs some substantial changes. Please do open new PRs with smaller changes! :)
Fix #6398 (Threepid whitespace is not trimmed before inserting to database)
and also add the check to admin api
PUT /_synapse/admin/v2/users/<user_id>
Signed-off-by: Dirk Klimpel dirk@klimpel.org