Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fail-fast behavior on blocked matrix.org/vector.im connections #9828

Closed
jrick opened this issue May 23, 2019 · 16 comments · Fixed by matrix-org/matrix-react-sdk#3067
Closed
Assignees
Labels
S-Major Severely degrades major functionality or product features, with no satisfactory workaround T-Defect X-Regression X-Release-Blocker

Comments

@jrick
Copy link

jrick commented May 23, 2019

Description

I believe since afdaca0, riot is failing fast on connection errors to matrix.org. I use umatrix, and had previously had matrix.org blocked (because I self host synapse and it wasn't necessary), but upon reloading /develop I got the message "Homeserver URL does not appear to be a valid Matrix homeserver", and according to umatrix, no other third party requests were made.

This fail-fast behavior could also potentially serve an issue for local clients that are unable to reach matrix.org's synapse, even when the configured synapse server is reachable.

Steps to reproduce

  • Use riot.im (any web release) with a non-matrix.org server
  • Install umatrix plugin
  • Load riot.im/develop

Version information

  • Platform: web

For the web app:

  • Browser: Chromium
  • OS: OpenBSD
  • URL: riot.im/develop
@jrick jrick added the T-Defect label May 23, 2019
@turt2live
Copy link
Member

The way this is worded I'm not sure if it's a bug or feature request, but I assume a bug report. The behaviour is caused by Riot validating the config.json at launch prior to even knowing what it should be doing (loading user information, etc).

I'd love for us to not validate the homeserver only when needed, however that raises some problems. For instance, if a user launches the app and is logged out due to external factors, the app won't have a homeserver configured for them to log back into. We could validate the config when someone visits the login page again, however we were doing this before with massive inconsistencies.

We've currently opted to fail-fast, preferring that people configure their Riot config.json correctly. The desktop app is an interesting case though - it'll almost always be pointing at matrix.org and might not work for people due to closed environments, etc. Although the desktop app is configurable in this regard, people are unlikely to do that.

Running with the assumption that this is a bug report, I think this needs a @lampholder to make a product decision on how we can improve the fail-fast configuration check to not be so invasive.

@turt2live turt2live added the X-Needs-Product More input needed from the Product team label May 23, 2019
@jrick
Copy link
Author

jrick commented May 23, 2019

It was a bug report. I can imagine this ending poorly in cases where matrix.org is blocked by a firewall, or you have a major security incident...

@turt2live
Copy link
Member

see also #9185

@turt2live turt2live changed the title Fail-fast behavior on blocked matrix.org connections Fail-fast behavior on blocked matrix.org/vector.im connections May 23, 2019
@jryans
Copy link
Collaborator

jryans commented May 24, 2019

Flagging @lampholder to take a look when he's back.

@turt2live
Copy link
Member

in summary for @lampholder:

Problem: /develop (and similar instances) default to matrix.org however people use custom homeservers. When people are already logged in using a custom homeserver, and matrix.org is down/inaccessible, it doesn't make sense to spend our time checking for matrix.org. However, if the user were to be logged out due to external factors, we'd want a guest account/login page that defaults to matrix.org.

Possible solutions:

  1. Move configuration check to after the session loads but before we do anything with that user. This is complicated because some things are immediately out of our control (sudden logout, the app starts hitting the homeserver no matter what we say, etc). We would also lose the opportunity for fast-failure on configuration problems, or risk duplicating validation logic. This assumes we have fixed Syntax errors in the config shouldn't give a white screen of welcome #9519 using a similar "Your riot is borked, contact your sysadmin" red box.
  2. Run away from our problems. This is basically saying "this isn't a bug" and moving on.
  3. Reworking how we do validation entirely. Maybe it makes sense for us to duplicate homeserver checks on the login, welcome, etc pages? Maybe there's a third way of config validation which magically solves problems?
  4. Waiting to see how much of an issue this becomes in a release. This has the downside of potentially very bad PR, but it is an option at this time.

It's worth noting that this issue is further amplified by the nature of our application. We are privacy focused, and quite a few people don't see matrix.org (the domain) as embracing this for various reasons we are not going to discuss here. This leads to people (like the OP, presumably) blocking matrix.org intentionally.

There's also the part of being decentralized, so the app makes requests to popular domains often enough to trip the "could be harvesting user information" alarms of many adblock/privacy browser extensions.

@lampholder
Copy link
Member

@turt2live is anything about the above specific to matrix.org, or could it be said for any riot instance with a configured default HS that is different to the HS the user actually signed in on?

@turt2live
Copy link
Member

It'd be the same problem for configured-but-using-different-homeserver instances too. matrix.org just happens to be the largest victim/instigator.

@lampholder
Copy link
Member

Also, I wonder if the 'right' behaviour might be that if the default HS can't be reached, we show the regular login/register UX but with a big red rectangle on top saying which the default HS is and that it can't be reached, giving people the option to connect to a different HS if they want to.

@jrick
Copy link
Author

jrick commented May 28, 2019

Not to sidetrack this, but for reference, I only wished to block matrix.org because it was not necessary, not because I distrust it.

@jrick
Copy link
Author

jrick commented May 28, 2019

Also, I wonder if the 'right' behaviour might be that if the default HS can't be reached, we show the regular login/register UX but with a big red rectangle on top saying which the default HS is and that it can't be reached, giving people the option to connect to a different HS if they want to.

Would this DTRT if the client was previously logged in to a different homeserver than the riot configuration specified, by still logging them in seamlessly, or would the same error be shown and the user must manually re-authenticate to their other homeserver?

turt2live added a commit that referenced this issue May 29, 2019
Only applies if the user appears to be logged in. If the user is not logged in, we scream loudly.

This is a temporary measure for #9828

Fixes #9835
@dbkr
Copy link
Member

dbkr commented Jun 4, 2019

I just ran into this on my dev setup and it took me ages to work out it was because I just hadn't started my dev HS yet. My suggestions for fixing this are:

  1. Do only validation of the config file itself at startup: don't make any network requests. Do the .well-known lookup from whatever info we have whenever we do a request (we could cache it for a few minutes, but we can't just do the lookup at app startup as it might change anyway).
  2. Change 'Error loading Riot' to 'Invalid config file' and remove the first paragraph, ("This installation of Riot"). I think the important info is getting lost in the text, and I think this would convey all the same info much more succinctly.
  3. Make the error message visually different from the boilerplate text in some way: just putting it in <pre> tags would be fine.
  4. Make the error message more informative, eg. 'https://example.org' could not be contacted is much more helpful than, Homeserver URL does not appear to be a valid Matrix homeserver: it tells you more specifically why it doesn't like it and what URL it's trying.

@dbkr
Copy link
Member

dbkr commented Jun 4, 2019

Conclusions from IRL discussion with @turt2live and @lampholder:

  • There are 2 problems: the well-known check and the liveness checks.
  • The .well-known check is necessary only if the config uses default_server_name. This is essentially deprecated in favour of default_server_config, so if the config specifies this, we'll still have to do the .well-known lookup at app start time, but in no other situation. (Maybe we can also log an error to the console telling you to use default_server_config instead?)
  • We split the validation of the server config into two parts, one where we checks it looks sensible and another where it does liveness checks. We do the former at app start but then do the liveness checks when we try to use the server (so probably when we start a login/registration/forgot-password flow or when we change servers).
  • There's a bug where if you try to change the server and the versions request fails on the URL you gave, it just silently ignores you and stays as what it was before, so fix this.
  • Fix Room Directory and Chat with Riot Biot from welcome broken when HS disables guest access #9546 by hiding the buttons if we can't register a guest account, which will cover us for the case where we can't contact the default HS URL.

@lampholder
Copy link
Member

Sorry I had to dash - do we need design for the feedback when the server's not responding? Something like (but with better wording and better everything):

hs_not_available

@turt2live
Copy link
Member

Oh yes. I meant to ping Nad for that...

@turt2live
Copy link
Member

So I don't forget: plan wrt to design is to iterate on it tomorrow, but for now I'll be focusing on just getting the error to the right page.

@turt2live turt2live added S-Major Severely degrades major functionality or product features, with no satisfactory workaround X-Regression and removed X-Needs-Product More input needed from the Product team labels Jun 5, 2019
turt2live added a commit to matrix-org/matrix-js-sdk that referenced this issue Jun 5, 2019
turt2live added a commit to matrix-org/matrix-react-sdk that referenced this issue Jun 5, 2019
This performs liveliness checks on the auth pages to try and show a friendlier error. Earlier checks in the app startup are expected to not block the app from loading on such failures.

See element-hq/element-web#9828
turt2live added a commit that referenced this issue Jun 5, 2019
Also warn about deprecated config option usage.

See #9828
@nadonomy
Copy link
Contributor

nadonomy commented Jun 5, 2019

From some internal chat, we settled on:

Your Riot is misconfigured
Ask your Riot admin to check your config for incorrect or duplicate entries.

Cannot reach homeserver
Ensure you have a stable internet connection, or get in touch with a server admin.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-Major Severely degrades major functionality or product features, with no satisfactory workaround T-Defect X-Regression X-Release-Blocker
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants