Supporting arbitrary web applications without Jupyter server present #258

yuvipanda · 2017-11-10T04:13:38Z

One of our goals is to have first-class support for non-Jupyter applications on Binder (such as RStudio, arbitrary command execution à la ReproServer, etc.).

Currently, the only thing that BinderHub expects the process started inside the container to do is:

Accept a token and enforce its use for authentication (via a cookie, header, or query param)
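A minimal sketch of what "accept a token via a cookie, header or query param" could look like. The `token` query parameter, the `Authorization: token <value>` header form, and the `app-token` cookie name are assumptions chosen for illustration, not a stable BinderHub API, and the header dictionary is assumed to use canonical key casing.

```python
import hmac
from http.cookies import SimpleCookie
from urllib.parse import parse_qs, urlsplit


def extract_token(url, headers):
    """Return the first token found in the query string, header, or cookie."""
    # 1. Query parameter: ?token=<value>
    query = parse_qs(urlsplit(url).query)
    if query.get("token"):
        return query["token"][0]
    # 2. Header: "Authorization: token <value>"
    scheme, _, value = headers.get("Authorization", "").partition(" ")
    if scheme.lower() == "token" and value.strip():
        return value.strip()
    # 3. Cookie: "app-token=<value>" (the cookie name is hypothetical)
    cookie = SimpleCookie(headers.get("Cookie", ""))
    if "app-token" in cookie:
        return cookie["app-token"].value
    return None


def is_authorized(url, headers, expected_token):
    """Check the presented token against the expected one in constant time."""
    presented = extract_token(url, headers)
    if presented is None:
        return False
    return hmac.compare_digest(presented.encode(), expected_token.encode())
```

Any process that implements a check of this shape in front of its HTTP and websocket handlers would satisfy the requirement, regardless of the language it is written in.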

From prior discussion, the simplest way to allow arbitrary execution seems to be to build a simple, standalone application with no dependencies that does the following:

Authentication with token + proxying (including websockets) to any other process
Process supervision of the backend process, so it gets restarted as appropriate
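The supervision half of this can be sketched as a restart loop. This is an illustration only: it is written in Python for readability even though a statically linked binary would call for Go or Rust, and the `supervise` function, its restart limit, and its backoff are hypothetical choices. A real supervisor would more likely restart indefinitely and forward signals.

```python
import subprocess
import sys
import time


def supervise(cmd, max_restarts=3, backoff_seconds=0.1):
    """Run cmd, restarting it each time it exits, up to max_restarts times.

    Returns the list of exit codes observed, one per run.
    """
    exit_codes = []
    for attempt in range(max_restarts + 1):
        proc = subprocess.Popen(cmd)
        exit_codes.append(proc.wait())  # block until the backend exits
        if attempt < max_restarts:
            time.sleep(backoff_seconds)  # avoid a tight crash loop
    return exit_codes


if __name__ == "__main__":
    # Hypothetical backend: a one-liner that exits immediately with status 1.
    print(supervise([sys.executable, "-c", "raise SystemExit(1)"], max_restarts=2))
```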

This program should be completely statically linked, and then dropped into the image as a final step. This unfortunately probably means we can't use Python (unless cx_freeze exceeds all my expectations), and need to write this in Go or Rust.

Once this program exists, we can use it for supporting arbitrary web backends without having to impose too many requirements on the environment that is present inside the container image.

yuvipanda · 2017-11-10T04:14:53Z

We sort of already do this with a combination of nbserverproxy + nbrsessionproxy for RStudio, and that mostly works when you have Jupyter already installed. However, I think to really be inclusive of non-Jupyter & non-Python users, we need to not have Jupyter present inside as a requirement.

yuvipanda · 2017-11-10T04:22:12Z

From the binderhub side, this would mean that we support the following:

Customizing what 'cmd' is run inside the container by the API
Formalizing how we pass in the token to the container as a stable API
Formalizing how we expect the container process to use the token as a stable API

Currently, all three of these are Jupyter-specific.

betatim · 2017-11-10T08:54:43Z

> Supporting arbitrary web applications

I picked that instead of the less broad first line of your first comment because it is more radical ;) My question is: is this really our goal? To me this sounds like Heroku and friends, which already exist. My first reaction is that BinderHub is about "telling stories with data", "data narratives", and "encouraging reuse and exploration". For me this implies a notebook-like interface: something that encourages me to poke around inside, and encourages the creator not to add so many layers that it becomes a nice, polished product that is harder to explore and to reuse in a different project.

From a technical point of view, a solid and robust way to launch things like RStudio and co. that doesn't rely on too many "hacks" (because we started with Jupyter notebooks) is something I welcome.

Maybe this is just a question of wording but I felt the urge to tell you :)

minrk · 2017-11-10T15:01:15Z

It also doesn't have to be in go/rust because we could use a multi-container pod where the auth proxy is in a different container, right? Is there a reason the proxy must be in the container with the target application?

I'm all for using go/rust, as long as there is already a proxy that properly supports http + websockets without being told where those will be. I don't think developing and supporting our own proxy in go/rust ought to be within scope for this project, but adding simple auth to an existing one should be quite doable. So the first step, for me, is identifying the landscape of extensible proxy libraries in go and/or rust.

yuvipanda · 2017-11-10T17:15:42Z

@betatim perhaps 'arbitrary interactive web applications that are traditionally single-user'? I don't think we'd be heroku, since our model is:

Autobuild some container image I
Launch image I in pod P
Allow authenticated access to pod P for the individual user who initiated the process

While heroku's model is more like:

Autobuild some container image I
Launch image I in one or many pods P
Allow arbitrary access to everyone for the things in pods P at a stable URL

IMO we should make sure that 'our model' does not require Jupyter to exist in the image I, and that's in scope if we want to consider RStudio and future similar projects (Eclipse Che maybe?! plain Terminado?) as first class citizens.

yuvipanda · 2017-11-10T17:23:36Z

@minrk after listening to you and @remram44 talk about this I'm now convinced that having a sidecar container with a proxy is the right thing to do.

I definitely do not want us to write any new code if possible, and splitting the proxy and process supervision parts lets us do that. It also means we are not actually restricted to single-binary proxies, which brings nginx back into play...

minrk · 2017-11-10T20:42:40Z

Given that we're talking about simple token auth, an nginx proxy seems like it would be a pretty good fit as a side-car, and not too complicated since it's not conferring with an external service.

We may want to consider whether activity-tracking should be part of this API, as well (either in this proxy pod, or in the user application), as we are moving away from tracking at the CHP level.

Step one is multi-container pod spec support in KubeSpawner, I think?
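To make the sidecar idea concrete, here is a rough sketch of a two-container Pod manifest built as a plain Python dictionary. The container names, image names, ports, and the `UPSTREAM` environment variable are all hypothetical; this shows only the general Kubernetes shape, not how KubeSpawner would expose it.

```python
def sidecar_pod_manifest(name, app_image, proxy_image, app_port=8888, proxy_port=8080):
    """Build a Pod manifest with an application container and an auth-proxy sidecar."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [
                {
                    "name": "app",
                    "image": app_image,
                    "ports": [{"containerPort": app_port}],
                },
                {
                    "name": "auth-proxy",
                    "image": proxy_image,
                    "ports": [{"containerPort": proxy_port}],
                    # Containers in one pod share a network namespace, so the
                    # sidecar can reach the app over localhost.
                    "env": [
                        {"name": "UPSTREAM", "value": f"http://127.0.0.1:{app_port}"}
                    ],
                },
            ]
        },
    }
```

Only the proxy port would be exposed outside the pod in this arrangement; the application itself stays reachable only through the sidecar or from inside the pod.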

saulshanabrook · 2019-02-18T14:08:05Z

> Currently, the only things that binderhub expects the process started inside to do are:
>
> Accept a token and enforce using that (via a cookie, header or query param) for authentication

Could this be set as an environment variable instead of passed as an arg? I assume at least the port would also have to be passed in.

Setting environment variables to dictate how the container should behave seems to be the most explicit and flexible solution, since it doesn't assume anything about the image.
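As a sketch of the environment-variable approach, a container entrypoint could read its settings like this. The variable names `APP_TOKEN` and `APP_PORT` and the default port are assumptions made up for the example; nothing in this thread defines them.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class LaunchConfig:
    token: str
    port: int


def load_launch_config(environ=os.environ):
    """Read the token and port from environment variables (names are hypothetical)."""
    token = environ.get("APP_TOKEN", "")
    if not token:
        # Fail closed: never start an unauthenticated service by accident.
        raise RuntimeError("APP_TOKEN is not set; refusing to start without auth")
    port = int(environ.get("APP_PORT", "8888"))
    return LaunchConfig(token=token, port=port)
```

The image then only needs to honor two variables, whatever application it runs.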

What is the rationale for requiring it to launch jupyter notebook?

remram44 · 2019-02-18T15:38:10Z

Wouldn't it make sense for this to be part of the proxy instead?

yuvipanda · 2019-02-18T23:26:45Z

@remram44 if you do auth at the proxy level, it leaves services exposed to attacks from localhost (inside the container), and from the network if you can bypass the proxy (if you are another user in the same network). Defense in depth, etc.

yuvipanda · 2019-02-18T23:29:30Z

@saulshanabrook

> What is the rationale for requiring it to launch jupyter notebook?

Something needs to do the authentication from inside the container rather than at the proxy. Currently this is the Jupyter Notebook, since that is where we already have the code. However, I'd love for us to write a single binary Rust / Go thing that can do the auth + proxying instead. We can also use a sidecar approach instead, but that would only work for Kubernetes and be non-generalizable.

yuvipanda · 2019-02-18T23:29:59Z

http://jupyter-server-proxy.readthedocs.io/ does all the things I want it to now, but it would be great to not need it in the future.

saulshanabrook · 2019-02-18T23:32:51Z

> Something needs to do the authentication from inside the container rather than at the proxy. Currently this is the Jupyter Notebook, since that is where we already have the code.

Ah I see! Sorry I was totally misunderstanding the problem here. Thanks!

manics · 2021-08-24T16:33:13Z

Does https://github.com/ideonate/jhsingle-native-proxy cover this use-case?
https://discourse.jupyter.org/t/new-package-to-run-arbitrary-web-service-in-jupyterhub-jhsingle-native-proxy/3493

yuvipanda · 2023-10-03T03:11:17Z

I think jhsingle-native-proxy and jupyter-server-proxy together cover this use case now.

willingc added architecture enhancement labels Nov 14, 2017

minrk mentioned this issue Nov 15, 2017

Multi binder launch token problem #211

Closed

ryanlovett mentioned this issue Dec 5, 2017

Open R notebook files with RStudio from launch form #325

Open

yuvipanda mentioned this issue Dec 22, 2017

Add support for jupyter notebook command line options in binder #375

Closed

psychemedia mentioned this issue Apr 25, 2018

OpenRefine running "natively" on Binderhub betatim/openrefineder#3

Open

yuvipanda closed this as completed Oct 3, 2023
