Enable SSL on forwarded requests #169

Open · wants to merge 4 commits into base: main
Conversation

rcthomas
Contributor

This is my initial pass at #89. Happy to start iterating.

The main source of uncertainty for me is check_hostname on the SSL context. I tested without it set, fretted about it for three weeks, and then added it as something you can configure. I looked at what both JupyterHub and Dask Distributed do, and they both have actual opinions. I made the default look like what Dask does because that's closest to our set of assumptions. I'm not sure the "True" case has everything else needed to work.

ssl_options = None
if serverproxy.https:
ssl_context = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=serverproxy.cafile)
ssl_context.load_cert_chain(serverproxy.certfile, serverproxy.keyfile)
Member

Is this a client certificate and key? If so could it be optional?

Contributor Author

I was intending that the serverproxy.https configurable would make it optional but perhaps you mean something else? Thanks for taking a look!

Member

It's probably me misunderstanding how this works.

Is this so that jupyter-server-proxy can proxy a notebook URL such as https://example.org/notebook/proxied-service/ through to a backend service that's only accessible over https? Or something else?

Contributor Author

The concrete case we are thinking of is where a Dask cluster is running on a HPC system and we turn on TLS to encrypt the traffic. The Bokeh dashboard can also be configured to use TLS, and we'd like to proxy that dashboard through jupyter-server-proxy.

Member

@manics manics Feb 24, 2020

Sorry for not following this up earlier.

For example, you'd use it to proxy a site such as https://grafana.mybinder.org/ (though an internal one) via Jupyter? This means:

  1. jupyter-server-proxy connects to https://grafana.mybinder.org/. For SSL validation this requires the root CA public certificates (either bundled with Python if you're using an externally provided cert, or if it's an internal CA then configured in jupyter-server-proxy).
  2. This is then passed through to the user's browser via jupyter notebook. If the notebook is protected by HTTPS (e.g. by Nginx) anything proxied by jupyter-server-proxy should also be protected since it all goes through the notebook server.

What I don't understand is where the requirement for a private key serverproxy.keyfile arises.

@jhgoebbert

This is IMHO an important pull-request. Any chance it can be merged to master?

@betatim
Member

betatim commented Feb 24, 2020

There are questions about how this should work/be implemented and what the exact goal/use-case is. I think to get the review process of this moved forward we need an example that people can try out to help understand the context. At least I think that would help sort out some of the questions that have come up. If you can prepare one that we can try at home that would be a great contribution.

# Configure SSL support
ssl_options = None
if serverproxy.https:
ssl_context = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=serverproxy.cafile)
Member

If we set up the context like this, do we also get all the "normal" CAs, or only the ones in the cafile?

"Normal" CAs in this case would be those that your OS trusts.

@rcthomas
Contributor Author

We have a situation where compute nodes on our supercomputer are normally insulated from inbound connections from the outside world. The inside just has 10.x.x.x interfaces. Notebooks normally run outside those batch-scheduled nodes and users can keep notebook servers around persistently on those "outside" nodes, which are really just repurposed login nodes. These nodes do have an interface on the same network as the compute nodes. We use jupyter-server-proxy to forward things through to/from the computes, it's pretty neat.

Our users want to run Dask on the compute nodes in job allocations and talk to them from their persistent JupyterLab sessions, and we'd like to help them run SSL on the cluster and the dashboard. Yes, it's an option to set up the scheduler and dashboard alongside the notebook server on the front end, but network-wise this is less optimal. We are in the process of implementing JupyterHub end-to-end SSL as well. The Dask dashboard is the main focus for us with this pull request. There is a way to run the dashboard with SSL, but the requests that jupyter-server-proxy needs to forward also need SSL support, hence the SSL context here (it is not for the scheduler and workers themselves). The main goal is for each user to be able to be sure the entire path is running under SSL.

@manics
Member

manics commented Feb 25, 2020

Thanks for expanding your use case. I think a diagram of the requests between each of the components would be helpful. In particular I still don't understand why jupyter-server-proxy requires a private key. Typically the key would only be required for SSL if you're running a server. jupyter-server-proxy isn't a server, it's an add-on to jupyter-notebook which is the server. If the aim is to encrypt traffic between the dashboard and jupyter wouldn't that need to be configured on the dashboard side?

@rcthomas
Contributor Author

I'll try to get a diagram going.

Indeed the aim is to encrypt traffic between the dashboard and Jupyter, and it is configured on the dashboard side. But the client (jupyter-server-proxy) needs an SSL context to talk to it. That's how I envisioned this working at least...
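To make the client role concrete, here is a minimal sketch of the kind of context being discussed. The helper name is mine, and the cafile would come from the serverproxy configurables; passing None falls back to the system CAs.

```python
import ssl

def make_client_ssl_context(cafile=None):
    """Sketch of a client-side SSL context (helper name is an assumption).

    ssl.Purpose.SERVER_AUTH means "I am a client verifying a server's
    certificate", which is the role jupyter-server-proxy plays toward
    the TLS-enabled dashboard.
    """
    # cafile=None makes create_default_context trust the system CAs.
    return ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=cafile)

ctx = make_client_ssl_context()
# create_default_context enables certificate and hostname verification
# by default, so no further hardening is needed for the simple case.
```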

@rcthomas
Contributor Author

rcthomas commented Mar 3, 2020

[diagram attached]

The hub and hub proxy are running on some servers managed by Rancher (Docker containers), and a load balancer not shown does the front-end termination. The hub starts Jupyter notebooks on a "login" node using a custom spawner. Now suppose a user submits a job with Slurm and starts up a Dask cluster inside the job allocation, including a scheduler, the dashboard, and a bunch of workers. These nodes running Dask are compute nodes and are not publicly accessible, so to see the dashboard running there we need Jupyter server proxy, in fact we needed #154 for that to work.

We'd like to be able to talk to the dashboard if it's using TLS, and I think that means we need the SSL context to be set up as a client in jupyter-server-proxy.

@manics
Member

manics commented Mar 14, 2020

Thanks for the diagram. I agree you only need a SSL client. I'm not convinced it requires a key though. For example suppose you wanted to proxy https://grafana.mybinder.org/ (there's no reason why this would be different from proxying an internal URL), which key and certificate would you use?

@rcthomas
Contributor Author

I wasn't thinking about that? My specific use case was for internal-ssl style self-signed certs (similar to how the Hub does it, e.g. via certipy or possibly even trying to re-use some of the infrastructure). At this point I suppose there are a huge number of other use cases we might want to worry about but I didn't realize I'd signed up to handle all those...

If it's just a simple change of making an argument optional or something I don't see an issue at all but if we need a more fully-featured PR than that perhaps someone with more experience with SSL needs to do it?

@manics
Member

manics commented Mar 14, 2020

A self-signed cert would require the internal CA or the public self-signed certificate to verify it, but it still shouldn't require the private key.

@manics
Member

manics commented Mar 14, 2020

See for example this Stack Overflow when using the requests module:
https://stackoverflow.com/questions/30405867/how-to-get-python-requests-to-trust-a-self-signed-ssl-certificate

Only the server's public certificate should be required.

@rcthomas
Contributor Author

OK so you mean

ssl_context.load_cert_chain(serverproxy.certfile, serverproxy.keyfile or None)

If the keyfile is left empty, it falls back to the default for load_cert_chain: https://docs.python.org/3/library/ssl.html#ssl.SSLContext.load_cert_chain

@manics
Member

manics commented Mar 28, 2020

According to the httpclient.HTTPRequest documentation, you can pass the server certificate(s) via ca_certs directly to the tornado.httpclient.HTTPRequest() call. I think this would be clearer than using the SSL context object, if only because the documentation is clearer.
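For comparison, a sketch of that alternative; the URL and certificate path here are hypothetical.

```python
from tornado import httpclient

# Pass the CA bundle straight to the request instead of building an
# ssl.SSLContext. The path and URL below are placeholders.
req = httpclient.HTTPRequest(
    url="https://compute-node:8787/status",
    ca_certs="/etc/pki/internal-ca.pem",  # server cert(s) to trust
    validate_cert=True,
)
```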

What do you think?

@rcthomas
Contributor Author

When originally drafting this PR I investigated how JupyterHub does this. There's a function there called make_ssl_context(). It's more general-purpose than what I'm doing here because it handles both server and client auth. This is called, for instance, to create the default for hub_http_client.

During that read-up I also saw the documentation you're referencing but when I saw that JupyterHub was already doing this one way, I figured if it was good enough for JupyterHub to use the SSL context then maybe it's good enough for jupyter-server-proxy!

@manics
Member

manics commented Mar 31, 2020

Fair enough! Let's stick with the ssl-context for now then.

In addition to the load_cert_chain method you mentioned there's also a load_default_certs method, so how does the following sound:

  • Internal SSL certificate (your use case): ssl_context.load_cert_chain(serverproxy.certfile, serverproxy.keyfile or None)
  • Public SSL certificate (e.g. proxying a public webserver, or an internal service that uses a certificate from a recognised CA): ssl_context.load_default_certs()
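As a sketch, the two cases above could be combined in one helper. The parameter names mirror the serverproxy configurables, but the helper itself is my assumption, not code from the PR.

```python
import ssl

def make_proxy_ssl_context(cafile="", certfile="", keyfile=""):
    # Empty strings are the traitlets defaults; coerce them to None so
    # the ssl module applies its own fallbacks.
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH,
                                     cafile=cafile or None)
    if certfile:
        # Internal SSL (the certipy-style use case): present our own
        # certificate chain; keyfile may be bundled into the certfile.
        ctx.load_cert_chain(certfile, keyfile or None)
    else:
        # Public SSL: trust whatever certificates the OS already trusts.
        ctx.load_default_certs()
    return ctx
```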

@yuvipanda
Contributor

Just wanted to bump this to see where this is :) There's a merge conflict already, would be great to get this in before there are too many of those!

Thanks for the thorough review, @manics.

@rcthomas
Contributor Author

I can add an internal_ssl configuration option that would decide between load_cert_chain and load_default_certs, unless there is a better idea. Before pushing further changes to the PR I'd like to know if an explicit configuration option is what @manics has in mind. I'll try to deal with the merge conflict and whatever guidance there is by the end of the week.

@manics
Member

manics commented Apr 21, 2020

Instead of another property internal_ssl, how about only using the existing properties:

  • https: True, other properties unset: use load_default_certs
  • https: True, one or more other properties set: use load_cert_chain?

@rcthomas
Contributor Author

Thanks @manics I'll try that, probably Wed or Thu for the update.

@rcthomas
Contributor Author

I think there might be an issue with keyfile being set but not certfile. Man I hate this SSL stuff.

@@ -330,7 +334,11 @@ def proxy_request_headers(self):
def proxy_request_options(self):
'''A dictionary of options to be used when constructing
a tornado.httpclient.HTTPRequest instance for the proxy request.'''
<<<<<<< HEAD


@rcthomas this might be your culprit, pesky git 😄

Contributor Author

Oh bother, thought I got them all...

Member

@manics manics left a comment

Thanks for persevering with this! I've made a couple of minor suggestions, but it generally looks good 👍

I'm having trouble testing this though, could you give me an example config? I tried to add an integration test in a PR against your fork:
rcthomas#1
but it's failing. I think I've traced it down to this line:

url = 'http://localhost:{}'.format(self.port)
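One hedged guess at a fix for that line: derive the scheme from the per-server https flag instead of hard-coding http. The function name and signature are mine, for illustration only.

```python
def backend_url(port, https=False):
    # Pick the scheme from the (hypothetical) per-server https flag
    # rather than assuming plain http.
    scheme = 'https' if https else 'http'
    return '{}://localhost:{}'.format(scheme, port)
```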

# Configure SSL support
ssl_options = None
if serverproxy.https:
ssl_context = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=serverproxy.cafile)
Member

Suggested change
ssl_context = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=serverproxy.cafile)
ssl_context = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=serverproxy.cafile or None)

The default value of serverproxy.cafile is "". If this is handled as None then it should use the system CA:

cafile, capath, cadata represent optional CA certificates to trust for certificate verification, as in SSLContext.load_verify_locations(). If all three are None, this function can choose to trust the system’s default CA certificates instead.
https://docs.python.org/3.6/library/ssl.html#ssl.create_default_context

help="""
Whether to use SSL for forwarded client requests.

If this is set to `True` then you should provide a path to an SSL key,
Member

Suggested change
If this is set to `True` then you should provide a path to an SSL key,
If this is set to `True` then you can optionally provide a path to an SSL key,

@manics
Member

manics commented Apr 26, 2020

I'll see if I can push some changes so that my test will pass.

@manics
Member

manics commented Apr 28, 2020

I think SSL is cursed!

I managed to get my test passing:
https://github.com/jupyterhub/jupyter-server-proxy/compare/master...manics:pr169?expand=1

To do this I moved the SSL configuration from the top-level global config to a per-server config (so https can be enabled for individual proxied servers).

I then (finally!?) understood @rcthomas's use case, which is to enable SSL proxying using something like /proxy/REMOTE_HOST:port/. However setting the SSL config globally would force https on all ports not just individual ones.

Suggestion: use https for ports that aren't served by a managed process by extending the remote-host format to optionally allow a protocol:

  • /proxy/<PROTOCOL>:<REMOTE_HOST>:<PORT>/
  • /proxy/absolute/<PROTOCOL>:<REMOTE_HOST>:<PORT>/

Alternatively, instead of <PROTOCOL>:<REMOTE_HOST>:<PORT> this could be <PROTOCOL>://<REMOTE_HOST>:<PORT>, as this more naturally represents a server, but this will depend on whether a double // in a URL path is handled correctly.

Current URL handlers:

(url_path_join(web_app.settings['base_url'], r'/proxy/(.*):(\d+)(.*)'),
RemoteProxyHandler, {'absolute_url': False, 'host_whitelist': host_whitelist}),
(url_path_join(web_app.settings['base_url'], r'/proxy/absolute/(.*):(\d+)(.*)'),
RemoteProxyHandler, {'absolute_url': True, 'host_whitelist': host_whitelist}),
(url_path_join(web_app.settings['base_url'], r'/proxy/(\d+)(.*)'),
LocalProxyHandler, {'absolute_url': False}),
(url_path_join(web_app.settings['base_url'], r'/proxy/absolute/(\d+)(.*)'),
LocalProxyHandler, {'absolute_url': True}),
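For illustration, the optional-protocol form could be matched with a pattern like the one below. The exact regex is my assumption, not part of the PR; making the protocol group optional keeps existing /proxy/host:port/ URLs working.

```python
import re

# Matches /proxy/<PROTOCOL>:<HOST>:<PORT><PATH> with <PROTOCOL> optional.
pattern = re.compile(r'^/proxy/(?:(https?):)?([^/:]+):(\d+)(/.*)?$')

m = pattern.match('/proxy/https:compute-node:8787/status')
# m.groups() -> ('https', 'compute-node', '8787', '/status')

m2 = pattern.match('/proxy/compute-node:8787/')
# m2.groups() -> (None, 'compute-node', '8787', '/')
```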

@yuvipanda what do you think?

@rcthomas
Contributor Author

Hey @manics thanks for the help during my absence here, I am back. Sigh. SSL. Sigh.

You make a good point about how I would be forcing SSL on every remote_host:port combo. In my case I'm mostly worried (that's too strong a word, concerned maybe?) about one very dominant use case for me (Dask). That said, there are other things out there and I don't want to make things horrible for those users just because.

I have no opinion on :// (except that as an emoji it summarizes my feelings about SSL); I do remember that before I encountered the {remote_host}:{port} PR I had just implemented it with a / in between. That is, I don't think we have to make it look like a URL in there though I see how it makes it clearer what's going on. I'd be happy with just /'s between everything.

@yuvipanda
Contributor

To perform some thread necromancy, I think forcing SSL on a per-server basis would mean we don't have to mangle URLs, right? I want folks to be able to use the dask dashboard without having to know if it's over https or http - especially without us redirecting properly. So we can give admins control over this on a per-server basis, and force https whenever it is set.

I think that's what I see in https://github.com/jupyterhub/jupyter-server-proxy/compare/master...manics:pr169?expand=1? If so, I think we can just merge this and iterate.

Thank you for persevering with this, @rcthomas and @manics!

@jhgoebbert

jhgoebbert commented Oct 25, 2020

We would have a similar need for SSL support in jupyter-server-proxy.
Here it is not the Dask dashboard but a user's noVNC process on a multi-user system, similar to @yuvipanda's jupyter-desktop-service, which provides the user with a remote desktop in the browser.
The user's noVNC process runs on one of our nodes and provides access to the user's vncserver through http/https. Without https between noVNC and jupyter-server-proxy it would be a pretty insecure setup on a multi-user cluster, wouldn't it?

@davidbrochart

Hi,
I want to use jupyter-server-proxy to forward some requests to a web server that runs in the kernel. As I understand it, because mybinder uses https, my web server must also use it; otherwise there is a mixed-content issue.
I tried to set up SSL in my web server, with a self-signed certificate, but it is not accepted. Would this PR solve my issue? Do you plan to merge it soon?
Thanks!

@manics
Member

manics commented Nov 1, 2020

@davidbrochart https is only required for external connections between your browser and mybinder. If your webserver is running inside your Docker container then plain http should work since communication will be fully within the container (between the notebook process and your webserver). External connections will automatically be https.

@davidbrochart

@davidbrochart https is only required for external connections between your browser and mybinder.

But I think this is the case, my server is running in the kernel which is inside the Docker container, but the requests are made from the browser. So I understand that my server must support SSL, and that it should get the certificate from mybinder, but I'm not sure if this PR will solve my issue.

Comment on lines +207 to +247
keyfile = Unicode(
"",
help="""
Path to optional SSL key.

Use with `https=True` and `certfile`.
""",
config=True
)

certfile = Unicode(
"",
help="""
Path to optional SSL cert.

Use with `https=True` and `keyfile`.
""",
config=True
)

cafile = Unicode(
"",
help="""
Path to optional CA file.

Use with `https=True`.
""",
config=True
)

https = Bool(
False,
help="""
Whether to use SSL for forwarded client requests.

If this is set to `True` then you should provide a path to an SSL key,
cert, and CA. Use this if the proxied service expects to service
requests over SSL.
""",
config=True
)


Is it possible to get all this SSL information from mybinder (which uses SSL)?

@rcthomas
Contributor Author

rcthomas commented Nov 4, 2020

@davidbrochart If the server you are talking about is running in the container alongside the notebook server, you shouldn't need to worry about SSL; you just proxy that server's port normally. You only care if the server is off somewhere else and you want to encrypt traffic between your notebook and the service. That's what this was originally about, in particular the case where the network path between jupyter-server-proxy and the back-end service is not going over the internet.

@davidbrochart

Yes, the server is running in the container, and I would be fine with no SSL, but because the browser is going to make requests from an https URL, I am forced to serve over https as well; otherwise there is a mixed-content issue. Does that make sense?

@manics
Member

manics commented Nov 4, 2020

@davidbrochart
I think we'll need to see a diagram of all your components and the communication between them.

In order to keep this discussion focused on the PR would you mind re-asking your question and adding the additional information on the Jupyter Community Forum https://discourse.jupyter.org/ instead and I'll follow up there?

In doing so you'll also be helping others working with jupyter-server-proxy, I'm sure others have run into the same problem so it'd be valuable to have this as a community post. Thanks 😀.

@davidbrochart

Sounds good @manics, I'll do that. Thanks!

@goekce

goekce commented Oct 28, 2023

Are there any plans to merge this PR?

I use this plugin to start a Theia backend for each user. It turned out that Theia's webview recently started enforcing https on webviews and jupyter-server-proxy only supports http requests. I believe this PR could solve this problem.

Related: #332

8 participants