Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Add support for no_proxy and case insensitive env variables #9372

Merged
merged 9 commits into from
Feb 26, 2021

Conversation

tzyl
Copy link
Contributor

@tzyl tzyl commented Feb 10, 2021

Changes proposed in this PR

  • Add support for the no_proxy and NO_PROXY environment variables
  • Extract env variables using urllib's getproxies/getproxies_environment which supports lowercase + uppercase, preferring lowercase, except for HTTP_PROXY in a CGI environment

This does contain behaviour changes for consumers so making sure these are called out:

  • no_proxy/NO_PROXY is now respected
  • lowercase https_proxy is now allowed and taken over HTTPS_PROXY

Related to #9306 which also uses ProxyAgent

Signed-off-by: Timothy Leung tim95@hotmail.co.uk

Pull Request Checklist

  • Pull request is based on the develop branch
  • Pull request includes a changelog file. The entry should:
    • Be a short description of your change which makes sense to users. "Fixed a bug that prevented receiving messages from other servers." instead of "Moved X method from EventStore to EventWorkerStore.".
    • Use markdown where necessary, mostly for code blocks.
    • End with either a period (.) or an exclamation mark (!).
    • Start with a capital letter.
  • Pull request includes a sign off
  • Code style is correct (run the linters)

@clokep
Copy link
Member

clokep commented Feb 10, 2021

@tzyl Overall looks good, give a shout when you think this is ready for a review!

@tzyl
Copy link
Contributor Author

tzyl commented Feb 10, 2021

@clokep I think this is ready for review! I have one question that I'm not sure how to solve though. I changed from using getproxies -> getproxies_environment in the last commit because it does some platform specific fallbacks which I don't think proxy_bypass_environment works with. However, in this case mypy doesn't have the types for it do you have any suggestions here? (assuming continuing to use *_environment from urllib)

synapse/rest/media/v1/preview_url_resource.py:28: error: Module 'urllib.request' has no attribute 'getproxies_environment'  [attr-defined]
--
  | synapse/server.py:38: error: Module 'urllib.request' has no attribute 'getproxies_environment'  [attr-defined]
  | Found 2 errors in 2 files (checked 379 source files)
  | ERROR: InvocationError for command /workdir/.tox/mypy/bin/mypy (exited with code 1)
  | ___________________________________ summary ____________________________________
  | ERROR:   mypy: commands failed
  | 🚨 Error: The command exited with status 1

@clokep
Copy link
Member

clokep commented Feb 10, 2021

@clokep I think this is ready for review! I have one question that I'm not sure how to solve though. I changed from using getproxies -> getproxies_environment

Is that a thing? It isn't documented: https://docs.python.org/3/library/urllib.request.html#urllib.request.getproxies

Can you point to documentation on this?

For mypy you can either ignore the types or manually specify them.

@tzyl
Copy link
Contributor Author

tzyl commented Feb 11, 2021

The relevant source code for all the proxy functions is this block:
https://github.com/python/cpython/blob/bdb941be423bde8b02a5695ccf51c303d6204bed/Lib/urllib/request.py#L2487-L2776

Default: getproxies -> getproxies_environment
Darwin: getproxies -> getproxies_environment or getproxies_macosx_sysconf
NT: getproxies -> getproxies_environment or getproxies_registry

I was initially worried about those fallbacks and whether they would be incompatible with proxy_bypass_environment but looking a bit deeper they do return the same format as in the documented getproxies function so I think it should be ok to continue to use getproxies which is better as it is in the documentation!

For what it's worth getproxies_environment git blames to 13 years ago and is used in requests so I think although undocumented it would probably be worthy of a mypy type as I think it also types (lint doesn't complain?) some other undocumented functions like proxy_bypass_environment which is used in this PR:
https://github.com/python/cpython/blame/bdb941be423bde8b02a5695ccf51c303d6204bed/Lib/urllib/request.py#L2488
https://github.com/psf/requests/blob/8c211a96cdbe9fe320d63d9e1ae15c5c07e179f8/requests/compat.py#L59

@clokep
Copy link
Member

clokep commented Feb 11, 2021

I don't think we should be using undocumented methods from CPython, we have some users using PyPy for example so we really need to be sticking to what is available in the standard library.

@tzyl
Copy link
Contributor Author

tzyl commented Feb 11, 2021

Totally reasonable and I defer to your expertise! So my main priorities I would like here are:

  • Consistent behaviour with standard conventions at minimum within the python ecosystem (urllib, requests) and ideally with all software
  • Doesn't reinvent the wheel and rewrite tried and tested code to risk unexpected inconsistencies in behaviour

In this case from the two functions being used in this PR getproxies and proxy_bypass_environment:

  • getproxies is documented
  • proxy_bypass* is not documented but mypy does not complain about this
    • mypy has a comment in their repo about undocumented proxy_bypass although I don't specifically see proxy_bypass_environment in that file
    • requests assumes these exist

If we're not comfortable using the undocumented proxy_bypass functions from urllib do you have any suggestions on what you would prefer? (e.g. reimplementing that logic in synapse)

@clokep
Copy link
Member

clokep commented Feb 11, 2021

  • mypy has a comment in their repo about undocumented proxy_bypass although I don't specifically see proxy_bypass_environment in that file

Should we use proxy_bypass instead of proxy_bypass_environment then? This seems to match what you found above that it is probably just a passthru.

@tzyl
Copy link
Contributor Author

tzyl commented Feb 11, 2021

Good question, unfortunately proxy_bypass_environment is the only one that accepts an optional input of proxies. The others take the hostname only and default to discovering the proxies internally. In that case we could use proxy_bypass only in ProxyAgent but it would mean we can't specify a custom no_proxy which is inconsistent with the behaviour of http_proxy and https_proxy being passed in as parameters to ProxyAgent.

@tzyl
Copy link
Contributor Author

tzyl commented Feb 15, 2021

Hey @clokep, any more thoughts here? It feels like to me the two most obvious paths forward currently are:

  • Continue using urllib's proxy_bypass_environment which mypy accepts but is not documented
  • Reimplement this function in synapse

@clokep
Copy link
Member

clokep commented Feb 16, 2021

I haven't looked at this deeply enough to have an opinion unfortunately. I'm going to put this into the team's review queue for someone to look deeper into!

@clokep clokep requested a review from a team February 16, 2021 13:33
@anoadragon453 anoadragon453 self-requested a review February 17, 2021 15:54
@anoadragon453
Copy link
Member

anoadragon453 commented Feb 19, 2021

While undocumented, PyPy does seem to implement urllib.request.proxy_bypass_environment, which is probably good enough?

(env) ➜  test_pypy python       
Python 3.7.9 (7e6e2bb30ac5, Nov 18 2020, 10:55:52)
[PyPy 7.3.3-beta0 with GCC 7.3.1 20180303 (Red Hat 7.3.1-5)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>> from urllib.request import proxy_bypass_environment
>>>> proxy_bypass_environment
<function proxy_bypass_environment at 0x00007f839d8aa7a0>
>>>> proxy_bypass_environment("test.com", proxies={"no": "test.com,unused.com"})
True
>>>> proxy_bypass_environment("test.com")
False

@ShadowJonathan
Copy link
Contributor

Isn't urllib.request pure python? If so, then i think support can be pretty much guaranteed, documented or not.

Copy link
Member

@anoadragon453 anoadragon453 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After discussing this a bit in #synapse-dev, we're happy to use the undocumented functions, given they should work fine in PyPy.

As for which function to use, things become a little clearer after looking at the source for these functions. proxy_bypass/getproxies_environment will pull known proxies from environment variables, whereas proxy_bypass/getproxies are platform-specific functions which will first search the environment for proxies, and then if none are found, it will try to pull known active proxies from the system.

So I'm happy about using *_environment methods here. The latter sounds useful for a future PR, but for this PR we should remain backwards compatible and just pull proxy information from environment variables. I could see us creating a config option for Synapse to attempt to use the system proxy, but that can be done in a separate PR.

synapse/rest/media/v1/preview_url_resource.py Outdated Show resolved Hide resolved
synapse/rest/media/v1/preview_url_resource.py Outdated Show resolved Hide resolved
tests/http/test_proxyagent.py Outdated Show resolved Hide resolved
@tzyl
Copy link
Contributor Author

tzyl commented Feb 22, 2021

Thanks for the review and clear direction @anoadragon453! I'll work on making those changes when I find time this week :)

@tzyl tzyl requested a review from anoadragon453 February 25, 2021 12:30
@tzyl
Copy link
Contributor Author

tzyl commented Feb 25, 2021

@anoadragon453 ready for another pass! I've made the following changes as requested:

  • Use getproxies_environment directly to avoid platform specific fallbacks for further breaks in behaviour
  • Change to a single use_proxy flag rather than specify all proxy variables separately
  • Change tests to mock patch env variables

Interestingly, mypy no longer complains about getproxies_environment which it did when I first opened this PR, I wonder what changed?

@@ -58,6 +59,9 @@ class ProxyAgent(_AgentBase):

pool (HTTPConnectionPool|None): connection pool to be used. If None, a
non-persistent pool instance will be created.

use_proxy (bool): Whether proxy settings should be discovered and used
from conventional environment variables. Defaults to false.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any point in using ProxyAgent without use_proxy=True? I kept this default here to match the same behaviour as before if you didn't specify either of http_proxy or https_proxy but seems like it may be redundant now that this is combined into a single parameter

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you're right. Outside of proxying (and a single debug log) ProxyAgent doesn't provide much on top of twisted's default Agent (as it shouldn't!).

I think we could try just using an Agent if use_proxy=False is passed to SimpleHttpClient.

Similarly in tests.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'd rather leave this out of this PR for now to avoid making more significant changes to the code but sounds like it could be worth a follow up!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds sensible, OK 🙂

Copy link
Member

@anoadragon453 anoadragon453 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for making the changes. This is looking great! A few more things to clean up and I think this will be ready to merge.

synapse/http/client.py Outdated Show resolved Hide resolved
@@ -58,6 +59,9 @@ class ProxyAgent(_AgentBase):

pool (HTTPConnectionPool|None): connection pool to be used. If None, a
non-persistent pool instance will be created.

use_proxy (bool): Whether proxy settings should be discovered and used
from conventional environment variables. Defaults to false.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you're right. Outside of proxying (and a single debug log) ProxyAgent doesn't provide much on top of twisted's default Agent (as it shouldn't!).

I think we could try just using an Agent if use_proxy=False is passed to SimpleHttpClient.

Similarly in tests.

synapse/http/proxyagent.py Outdated Show resolved Hide resolved
tests/http/test_proxyagent.py Outdated Show resolved Hide resolved
tests/http/test_proxyagent.py Outdated Show resolved Hide resolved
tests/http/test_proxyagent.py Outdated Show resolved Hide resolved
@tzyl tzyl requested a review from anoadragon453 February 26, 2021 13:25
Copy link
Member

@anoadragon453 anoadragon453 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. Thanks for iterating on it with us!

@anoadragon453 anoadragon453 merged commit ddb2402 into matrix-org:develop Feb 26, 2021
netbsd-srcmastr pushed a commit to NetBSD/pkgsrc that referenced this pull request Mar 14, 2021
Synapse 1.29.0 (2021-03-08)
===========================

Note that synapse now expects an `X-Forwarded-Proto` header when used with a reverse proxy. Please see [UPGRADE.rst](UPGRADE.rst#upgrading-to-v1290) for more details on this change.


No significant changes.


Synapse 1.29.0rc1 (2021-03-04)
==============================

Features
--------

- Add rate limiters to cross-user key sharing requests. ([\#8957](matrix-org/synapse#8957))
- Add `order_by` to the admin API `GET /_synapse/admin/v1/users/<user_id>/media`. Contributed by @dklimpel. ([\#8978](matrix-org/synapse#8978))
- Add some configuration settings to make users' profile data more private. ([\#9203](matrix-org/synapse#9203))
- The `no_proxy` and `NO_PROXY` environment variables are now respected in proxied HTTP clients with the lowercase form taking precedence if both are present. Additionally, the lowercase `https_proxy` environment variable is now respected in proxied HTTP clients on top of existing support for the uppercase `HTTPS_PROXY` form and takes precedence if both are present. Contributed by Timothy Leung. ([\#9372](matrix-org/synapse#9372))
- Add a configuration option, `user_directory.prefer_local_users`, which when enabled will make it more likely for users on the same server as you to appear above other users. ([\#9383](matrix-org/synapse#9383), [\#9385](matrix-org/synapse#9385))
- Add support for regenerating thumbnails if they have been deleted but the original image is still stored. ([\#9438](matrix-org/synapse#9438))
- Add support for `X-Forwarded-Proto` header when using a reverse proxy. ([\#9472](matrix-org/synapse#9472), [\#9501](matrix-org/synapse#9501), [\#9512](matrix-org/synapse#9512), [\#9539](matrix-org/synapse#9539))


Bugfixes
--------

- Fix a bug where users' pushers were not all deleted when they deactivated their account. ([\#9285](matrix-org/synapse#9285), [\#9516](matrix-org/synapse#9516))
- Fix a bug where a lot of unnecessary presence updates were sent when joining a room. ([\#9402](matrix-org/synapse#9402))
- Fix a bug that caused multiple calls to the experimental `shared_rooms` endpoint to return stale results. ([\#9416](matrix-org/synapse#9416))
- Fix a bug in single sign-on which could cause a "No session cookie found" error. ([\#9436](matrix-org/synapse#9436))
- Fix bug introduced in v1.27.0 where allowing a user to choose their own username when logging in via single sign-on did not work unless an `idp_icon` was defined. ([\#9440](matrix-org/synapse#9440))
- Fix a bug introduced in v1.26.0 where some sequences were not properly configured when running `synapse_port_db`. ([\#9449](matrix-org/synapse#9449))
- Fix deleting pushers when using sharded pushers. ([\#9465](matrix-org/synapse#9465), [\#9466](matrix-org/synapse#9466), [\#9479](matrix-org/synapse#9479), [\#9536](matrix-org/synapse#9536))
- Fix missing startup checks for the consistency of certain PostgreSQL sequences. ([\#9470](matrix-org/synapse#9470))
- Fix a long-standing bug where the media repository could leak file descriptors while previewing media. ([\#9497](matrix-org/synapse#9497))
- Properly purge the event chain cover index when purging history. ([\#9498](matrix-org/synapse#9498))
- Fix missing chain cover index due to a schema delta not being applied correctly. Only affected servers that ran development versions. ([\#9503](matrix-org/synapse#9503))
- Fix a bug introduced in v1.25.0 where `/_synapse/admin/join/` would fail when given a room alias. ([\#9506](matrix-org/synapse#9506))
- Prevent presence background jobs from running when presence is disabled. ([\#9530](matrix-org/synapse#9530))
- Fix rare edge case that caused a background update to fail if the server had rejected an event that had duplicate auth events. ([\#9537](matrix-org/synapse#9537))


Improved Documentation
----------------------

- Update the example systemd config to propagate reloads to individual units. ([\#9463](matrix-org/synapse#9463))


Internal Changes
----------------

- Add documentation and type hints to `parse_duration`. ([\#9432](matrix-org/synapse#9432))
- Remove vestiges of `uploads_path` configuration setting. ([\#9462](matrix-org/synapse#9462))
- Add a comment about systemd-python. ([\#9464](matrix-org/synapse#9464))
- Test that we require validated email for email pushers. ([\#9496](matrix-org/synapse#9496))
- Allow python to generate bytecode for synapse. ([\#9502](matrix-org/synapse#9502))
- Fix incorrect type hints. ([\#9515](matrix-org/synapse#9515), [\#9518](matrix-org/synapse#9518))
- Add type hints to device and event report admin API. ([\#9519](matrix-org/synapse#9519))
- Add type hints to user admin API. ([\#9521](matrix-org/synapse#9521))
- Bump the versions of mypy and mypy-zope used for static type checking. ([\#9529](matrix-org/synapse#9529))
anoadragon453 added a commit to matrix-org/synapse-dinsic that referenced this pull request Mar 22, 2021
…om mainline to dinsic (#93)

This PR is simply porting matrix-org/synapse#9372 to dinsic.

I also had to bring in matrix-org/synapse#8821 and matrix-org/synapse#9084 for this code to work properly - a sign that we should merge mainline into dinsic again soon.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants