Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Add validation of format for 3pid and add validation of 3pid in admin api #7022

Closed
wants to merge 8 commits into from

Conversation

dklimpel
Copy link
Contributor

@dklimpel dklimpel commented Mar 2, 2020

Fix #6398 (Threepid whitespace is not trimmed before inserting to database)
and also add the check to admin api PUT /_synapse/admin/v2/users/<user_id>

Pull Request Checklist

  • Pull request is based on the develop branch
  • Pull request includes a changelog file. The entry should:
    • Be a short description of your change which makes sense to users. "Fixed a bug that prevented receiving messages from other servers." instead of "Moved X method from EventStore to EventWorkerStore.".
    • Use markdown where necessary, mostly for code blocks.
    • End with either a period (.) or an exclamation mark (!).
    • Start with a capital letter.
  • Pull request includes a sign off
  • Code style is correct (run the linters)

Signed-off-by: Dirk Klimpel dirk@klimpel.org

@dklimpel dklimpel changed the title Add validation of format for 3pid Add validation of format for 3pid and add validation of 3pid in admin api Mar 3, 2020
@dklimpel dklimpel marked this pull request as ready for review March 3, 2020 19:19
@clokep clokep self-assigned this Mar 9, 2020
@clokep clokep self-requested a review March 9, 2020 19:17
@clokep clokep removed their assignment Mar 9, 2020
Copy link
Member

@clokep clokep left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good overall! I believe the adding additional failure modes to the API endpoints will need a proposal for changing the spec.

I believe the referenced issue also talks about trimming whitespace on the input data, seems like that would be pretty straightforward to add here while modifying this code?

@@ -0,0 +1 @@
Add validation of format for 3pid.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Contrary to how towncrier is normally used, we've been putting in newsfragments just for the pull request number, so this file can be removed.

Copy link
Contributor Author

@dklimpel dklimpel Mar 9, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please delete this file.

# For emails, transform the address to lowercase.
# We store all email addreses as lowercase in the DB.
# (See add_threepid in synapse/handlers/auth.py)
address = threepid["address"].lower()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just noting that this seems related to #7021.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it is.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change needs to wait on the result of that conversation then.

"""

if medium == "email":
regex = r"(^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$)"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At a glance this looks like it'll hit most valid e-mails, but usually things like e-mails and domains are relatively hard to validate -- where did this regular expression come from?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Source of regex is: https://emailregex.com/

Python
r"(^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+.[a-zA-Z0-9-.]+$)"

We can change the regex. That is the reason why the validation is a new function.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did not found a good regex for msidns.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main issue I see here is that this doesn't support international domain names (or Unicode in general, although I'm not sure if Unicode is valid in the local part of an email address).

Taking a look at that site, it seems the Python regular expression is much simpler than some of the others, which I find concerning.

Poking around a bit, some other solutions seem to suggest being generous since it is hard to know the form of an email without trying to actually validate it and use something like: r"^[^@]+@[^@]+\.[^@]+$".

Copy link
Member

@clokep clokep left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delay here. I left a few more comments, I'm not sure what we want to do about the lower-casing of the email addresses as that's really covered in #7021. I think the other comments are straightforward though.

This refers to issue #6398, in order to fix that more nicely would it make sense to strip whitespace on the 3pids before validating?

"""

if medium == "email":
regex = r"(^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$)"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main issue I see here is that this doesn't support international domain names (or Unicode in general, although I'm not sure if Unicode is valid in the local part of an email address).

Taking a look at that site, it seems the Python regular expression is much simpler than some of the others, which I find concerning.

Poking around a bit, some other solutions seem to suggest being generous since it is hard to know the form of an email without trying to actually validate it and use something like: r"^[^@]+@[^@]+\.[^@]+$".

@@ -0,0 +1 @@
Add validation of format for 3pid.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please delete this file.

# For emails, transform the address to lowercase.
# We store all email addreses as lowercase in the DB.
# (See add_threepid in synapse/handlers/auth.py)
address = threepid["address"].lower()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change needs to wait on the result of that conversation then.

synapse/util/threepids.py Outdated Show resolved Hide resolved
synapse/util/threepids.py Show resolved Hide resolved
elif medium == "msisdn":
return True
else:
return False
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a comment above this saying that any other medium is not understood and is thus invalid? Thanks!

@@ -55,6 +55,7 @@ class Codes(object):
THREEPID_IN_USE = "M_THREEPID_IN_USE"
THREEPID_NOT_FOUND = "M_THREEPID_NOT_FOUND"
THREEPID_DENIED = "M_THREEPID_DENIED"
INVALID_THREEPID = "M_INVALID_THREEPID"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is unclear to me if this needs a spec change or not. @matrix-org/synapse-core any opinion?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did not find an spec for these errors. Only a documenttion https://github.com/matrix-org/matrix-doc/blob/master/specification/client_server_api.rst
I can also remove this error.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can see the list of error codes at https://matrix.org/docs/spec/client_server/r0.6.0#api-standards, this would need to get added as a separate one following the process at https://matrix.org/docs/spec/proposals

synapse/rest/admin/users.py Outdated Show resolved Hide resolved
@clokep
Copy link
Member

clokep commented Mar 31, 2020

@dklimpel Thanks for the contribution! After reviewing it again I think this probably needs to get split up into a few different PRs:

  1. A PR which takes the current check_3pid_allowed and adds them to the admin API. (This should be straightforward and not controversial at all!)
  2. A PR which adds the additional check_3pid_valid_format to both the client-server API and the admin API. This will need a spec change to add the additional error code, let me know if I can help with that.
  3. The changes to lower-casing 3PIDs seems like it should be part of Fix inconsistent handling of upper and lower cases of email addresses. #7021, not this PR.

@dklimpel dklimpel marked this pull request as draft May 2, 2020 18:03
@richvdh
Copy link
Member

richvdh commented Jun 5, 2020

I'm going to close this for now since it needs some substantial changes. please do open new PRs with smaller changes! :)

@richvdh richvdh closed this Jun 5, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Threepid whitespace is not trimmed before inserting to database
3 participants