Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] use traefik for the proxy #1162

Closed
wants to merge 22 commits into from
Closed

[WIP] use traefik for the proxy #1162

wants to merge 22 commits into from

Conversation

minrk
Copy link
Member

@minrk minrk commented Feb 26, 2019

Once this is ready, we'll be able to have multiple proxy replicas, enabling restart/upgrade without any downtime while the proxy restarts.

Tasks:

  • remove autohttps, internal ingress, kube-lego
  • start etcd (as single container in hub pod, using hub-db-dir for storage)
  • replace configurable-http-proxy with traefik
  • enable auth for etcd (use existing proxy.secretToken)
  • enable auth for traefik api (also use existing proxy.secretToken)
  • test with multiple traefik replicas
  • test automatic provisioning of certificates by meeting letsencrypt challenges
  • test TLS-termination with use of provisioned certificates
  • run hubtraf with https, multiple traefik replicas

cc @GeorgianaElena who's done all the great work implementing https://github.com/jupyterhub/traefik-proxy

closes #1226
closes #1364

- proxy deployment contains traefik
- proxy-etcd statefulset contains etcd

These are mostly imported subsets from the incubator/etcd and stable/traefik charts
@consideRatio
Copy link
Member

@minrk @GeorgianaElena woooooo I'm excited about this!

@minrk
Copy link
Member Author

minrk commented Mar 5, 2019

This is super close to working (it works for me). The biggest headache is getting authentication setup, because both etcd and traefik work pretty hard to prevent you launching them with simple passwords/token auth set up, which is what we want.

  1. etcd requires launching insecurely and then enabling auth via insecure etcdctl calls (this appears to be working now), and
  2. traefik uses Apache basic or digest auth and there doesn't appear to be a way to put the token into the template in the right format without requiring the user to specify them in their config file after calling htpasswd on them. This is not insurmountable, but it's pretty inconvenient. One hurdle is that these use base64-encoded digests and the sprig digest functions output hex (and base64-encoded hex is not the same as base64-encoded original bytes). The Apache Basic auth spec does define a 'plain' unencrypted format that isn't supported everywhere, and traefik is indeed one of the places that it's not supported.

I'm trying to avoid forcing the user to generate more secrets to set up one thing, but I may punt and say that you need to call htpasswd to paste it into the form traefik wants. That's pretty annoying, though. The alternative is to rely on network policies to protect the proxy (maybe not the best).

@minrk
Copy link
Member Author

minrk commented Mar 6, 2019

I gave up trying to figure out how to enable traefik auth, given only a single token. That doesn't seem to be possible. Instead, users are just going to have to call htpassword to get the chart started. I really dislike this, but I don't see any way around it.

run in an initContainer for traefik

If we added htpasswd to the traefik image, we wouldn’t need to use an initContainer for this,
but we would need to maintain and push our own images, so 🤷
@yuvipanda
Copy link
Collaborator

How about:

  1. Make our own traefik image
  2. CMD is a Python Script that does the things we need it to do. We can ship htpasswd here.
  3. Then it execs to traefik

This should help avoid requiring users to run htpasswd on their own. What do you think?

@meeseeksmachine
Copy link

This pull request has been mentioned on Jupyter Community Forum. There might be relevant details there:

https://discourse.jupyter.org/t/server-becomes-stale-in-a-matter-of-days-even-hours/1920/7

Copy link
Member

@consideRatio consideRatio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review notes

ACME certificate provisioning (Let's encrypt)

It seems like the traefik docs on ACME indicate that the use of acme.storageFile as used in the configmap.yaml file is deprecated, and that the certificates could be stored in the key value store instead. This would perhaps also remove the need for us to have a PVC?

Use a K8s Secret instead of a ConfigMap

We add proxy.secretToken into the configmap.yaml, so it should be made a secret instead I figure.

acme volume / acme.json file

We are trying to persist the acme.json? Hmmm... But it sais we use an emptyDir volume mount for the /acme folder. And, it seems like we are not using the PVC created perhaps with purpose to persist the acme.json file. If we would use a small PVC for this file, we end up in trouble whenever we get two replicas of this pod as they cannot both mount it in a ReadWriteOnce fasion, and ReadWriteMany would require NFS files etc. In short, this is not a viable option I think.

UPDATE: Action points

  • Remove the Acme PVC
  • Remove the storageFile reference, we already have the storage entry towards the KV store
  • Make the ConfigMap a Secret
  • Make the proxy's container relating to traefik startup using config in the KV store using basic command line arguments with details on how to communicate with the KV store
    There was no reason to do this as we understand it.
  • Make the traefik prefix be "/like-this" instead of "/like-this/"
  • Perhaps mount our trafik.toml into /etc/traefik/ where traefik expects to find such file anyhow, in the initContainer.
  • Pass all etcd config through the command line options and strip all etcd config from the configmap. Then, we don't need to make the configmap a secret!
  • Investigate what happens with the communication between Etcd and Traefik when Etcd pod restarts (along with the Hub ~ ET connection problem)
  • Transition to use Consul instead of ETCD
    The reason for doing is relates to performance, currently Traefik cannot work performant with ETCD or Consul when there are plenty of routes, like there are in mybinder.org for example. But, there is work done towards making Traefik performant with Consul.
  • Use a passthrough configuration for traefik.toml where we rely on the toToml command and some modification of the provided yaml that is to be written toToml based on other configuration where needed, just like i did in mattermost-team-edition. For reference on traefik.toml, see the treafik configuration docs.

@manics
Copy link
Member

manics commented Sep 6, 2019

@consideRatio Unfortunately it sounds like Traefik can't save the ACME certificate as a K8S secret, it needs a separately deployed K-V store: traefik/traefik#2542

According to traefik/traefik#2542 (comment) the best suggestion if you don't want to setup your own K-V store is to use cert-manager.

@GeorgianaElena
Copy link
Member

I think there are a few things that changed since first opening this PR and also some questions we need to answer before moving forward with this.

What changed

Questions

@mofanke
Copy link

mofanke commented May 18, 2020

@minrk I have a question about this ,when use traefik for the proxy as HA mode, can i disiable check_routes when jupyterhub started or other situations?

@srggrs
Copy link

srggrs commented May 26, 2020

What's the status of this? My need is just with the simple toeml implementation, no containers, no consul or etcd. Just want to know if this is ready with simple toeml. Thanks

@minrk
Copy link
Member Author

minrk commented May 26, 2020

@srggrs the traefik proxy is working great with toml config. This PR is only about integrating it as the default proxy in the helm chart. You can check out the littlest jupyterhub for an example of deploying traefik proxy with toml config on a single VM.

@mofanke you shouldn't disable check_routes, since that is still a recovery/reconciliation step in case of state changes on JupyterHub's side. You can increase the interval, though, if you think the issues are less likely to occur.

@srggrs
Copy link

srggrs commented May 28, 2020

@minrk thank you! Sorry I meant if was ready for traefik v2, but I realised I'm in the wrong issue. I got lost in chasing issues...

@lxm
Copy link

lxm commented Jul 3, 2020

any progress here?

@@ -67,6 +70,37 @@ spec:
{{- if .Values.hub.extraContainers }}
{{- .Values.hub.extraContainers | toYaml | trimSuffix "\n" | nindent 8 }}
{{- end }}
- name: consul
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if we already run consul? Seems redundant to have this configured, especially since it doesn't have TLS, etc.

@chancez
Copy link
Contributor

chancez commented Jul 30, 2020

I feel somewhat mixed about this. I don't really want to run another loadbalancer at all and I already have ingress controllers in my cluster which work just fine, is it possible to instead just use my existing ingress? How does that play into internal SSL?

@meeseeksmachine
Copy link

This pull request has been mentioned on Jupyter Community Forum. There might be relevant details there:

https://discourse.jupyter.org/t/large-notebooks-save-fails-with-failed-to-fetch/9022/8

@meeseeksmachine
Copy link

This pull request has been mentioned on Jupyter Community Forum. There might be relevant details there:

https://discourse.jupyter.org/t/replace-the-configurable-http-proxy-chp-with-the-traefik-proxy-as-a-cluster-ip-service-using-z2jk/12587/2

@Vaibhav1919
Copy link

Hi everyone , Is there any update on this PR , i.e. is traefik supported in z2jh? Can i use this PR to create multiple replicas of proxy.

@consideRatio
Copy link
Member

Hey @minrk! In this PR I recall that we tried replacing autohttps (traefik) + proxy (chp) for a single traefik proxy deployment with X replicas, but concluded that they didn't support multiple replicas in conjunction with integration to acquire certs from Let's encrypt.

Due to this significant realization we arrived at part way through this PR, I'll close this PR as I see no clear path forward. We could go for traefik to replace chp and the jupyterhub proxy class representing traefik to replace the jupyterhub proxy class representing chp, but we wouldn't get the gains we hoped for. If we want to pursue this anyhow, I'd like to open a discussion in an issue at this point I think to overview the updated motivations more clearly.

@minrk
Copy link
Member Author

minrk commented May 6, 2022

Yes, I think that's fine. I still think we should try to get traefik going, but it's so weird and frustrating that the two biggest selling points for us - native ACME support and multiple replicas, seem to be mutually exclusive!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet