
wip: simplify and automate forwarding of env vars [server -> kernel] and [gateway -> kernel] #1000

Draft · wants to merge 1 commit into main
Conversation

telamonian
Contributor

fixes #957

see discussion at #957

…matically determine ENV_FORWARD from kernel env stanza
@telamonian changed the title to wip: simplify and automate forwarding of env vars [server -> kernel] and [gateway -> kernel] on Aug 22, 2021
@kevin-bates
Member

Hi Max - Thanks for working on this. I'm having some difficulty with the names kernel_env_inherit and kernel_env_forward and I believe it's due to inherit and forward being verbs. I think kernel_inherited_envs and kernel_forwarded_envs denote their intention more clearly (IMO).

I also think we need to include deprecation logic because I'm thinking we'd release this in a minor release (2.6) - with the next major release being the transition to kernel provisioners. I suppose such logic would be necessary even if these were to first appear in a major release anyway. Could you please add some surrounding deprecation code?
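A rough sketch of what such deprecation logic could look like, using the traitlets @observe pattern; the class name is illustrative and the trait names follow the renaming proposed above:

```python
from traitlets import List, observe
from traitlets.config import LoggingConfigurable


class GatewayConfigSketch(LoggingConfigurable):
    # hypothetical new trait name proposed in this comment
    kernel_forwarded_envs = List(
        help="Env names forwarded from the client to the kernel."
    ).tag(config=True)

    # deprecated alias, kept through the 2.x line
    env_whitelist = List(
        help="DEPRECATED since 2.6: use kernel_forwarded_envs instead."
    ).tag(config=True)

    @observe("env_whitelist")
    def _env_whitelist_changed(self, change):
        # warn on assignment and mirror the value onto the new trait
        self.log.warning("env_whitelist is deprecated, use kernel_forwarded_envs")
        self.kernel_forwarded_envs = change["new"]
```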

@telamonian
Contributor Author

telamonian commented Aug 22, 2021

I think kernel_inherited_envs and kernel_forwarded_envs denote their intention more clearly (IMO).

@kevin-bates I agree those are a bit more expressive. I was also thinking about maybe forward_envs_server_to_kernel and forward_envs_gateway_to_kernel, or some other such super declarative approach. I'll keep thinking about more perfect names as I work on the PR.

Before we get too caught up in dickering over the names, I did want to get your feedback on some design stuff. Ideally, I'd like to simplify and automate the env var forwarding logic to the point where I can get rid of at least one of the existing three related config fields. I ended up trying out ~4 different implementations before I started on the current PR. Some possibilities (for clarity I'll use the config field names from current master):

Plan A: ship EnterpriseGatewayApp.env_whitelist to the GatewayClient, remove GatewayClient.env_whitelist

  • AFAIK there's no clear use case for being able to set the whitelist on both the client and the gateway (please correct me if I'm wrong). As a config/admin/security issue, I think it makes more sense to set EnterpriseGatewayApp.env_whitelist alone, and then ship that config value to the client as needed
  • I think it would make sense to add the value of EnterpriseGatewayApp.env_whitelist to the api/kernelspecs response that the client receives here in the code (a sketch follows this list). I think the only issue would be making sure that dynamic updates of EnterpriseGatewayApp.env_whitelist get propagated to the gateway
  • env_whitelist could either be added as a new property in the /api/kernelspecs response schema, or just get loaded into the existing metadata property
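A minimal sketch of Plan A's server side, assuming a helper that decorates each kernelspec model before it is returned; attach_env_whitelist and the metadata key are hypothetical names, not EG's actual API:

```python
def attach_env_whitelist(spec_model: dict, env_whitelist: list) -> dict:
    """Embed the gateway-side whitelist in a kernelspec model so that
    GatewayClient can discover which envs the gateway will accept."""
    metadata = spec_model.setdefault("metadata", {})
    # could instead be a new top-level property in the /api/kernelspecs schema
    metadata["env_whitelist"] = list(env_whitelist)
    return spec_model
```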

Plan B: remove both GatewayClient.env_whitelist and EnterpriseGatewayApp.env_whitelist, use the keys of the env stanza of the to-be-started kernelspec as the sole source of whitelist truth

  • All env vars in the user's gateway client instance with keys that appear in the relevant kernelspec will get automatically picked up and included in any kernel start request.

  • The env stanza in the kernelspec defines default values for each env var, to be overridden (or not) by the env in the client's kernel start request.

  • So the pattern here would be that an admin would put any env vars they want to be settable by individual users in the kernelspec, and then put any "private" env on the gateway instance and in EnterpriseGatewayApp.env_process_whitelist.

  • pros:

    • easy and intuitive to define a kernelspec with an env that "just works"
    • easy to forward different sets of env vars based on the needs of each kernel
    • much easier setup for kube kernelspecs (will also require some improvements to some other JEG k8s machinery)
    • no more "but what do we name the whitelist params?" discussion
    • more of a me thing, but when I first worked with JEG this was actually how I assumed kernel env worked in the first place
  • cons:

    • will produce very different behavior w.r.t. current master, which would mean that this PR should probably wait until @kevin-bates is ready to go with the kernel provisioner support/JEG v3.0.0 release

I'm leaning towards Plan B, and already have some code to support it in the PR, e.g.:

https://github.com/jupyter/enterprise_gateway/blob/f49b82b4af31002eff4b75d8c864aad923e8aaf2/enterprise_gateway/mixins.py#L147-L153
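For readers who don't follow the link, here is a minimal sketch of the Plan B idea (my own illustration, not the linked code): the keys of the kernelspec's env stanza act as the whitelist, and its values are defaults that the client may override:

```python
def build_kernel_env(kernelspec_env: dict, client_env: dict) -> dict:
    kernel_env = dict(kernelspec_env)  # defaults come from the kernelspec
    for name, value in client_env.items():
        if name in kernelspec_env:  # only stanza keys are forwarded
            kernel_env[name] = value  # client value overrides the default
    return kernel_env

# build_kernel_env({"SPARK_OPTS": "--master local"},
#                  {"SPARK_OPTS": "--master yarn", "SECRET": "x"})
# -> {"SPARK_OPTS": "--master yarn"}  ("SECRET" is dropped)
```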

Plan C: like Plan B, but instead use explicit metadata.process_proxy.forward_env kernelspec field as sole source of whitelist truth (EDIT: added in response to concerns below)

  • metadata.process_proxy.forward_env would get handled essentially the same way EnterpriseGatewayApp.env_whitelist is now, but unlike that app-wide config value, metadata.process_proxy.forward_env will be configured per-kernel (an illustrative kernelspec follows this list)
  • pros
    • similar to Plan A, but can be configured per-kernel
    • similar to Plan B, but preserves current semantics (both in terms of kernelspecs env stanza and EG whitelist behavior)
    • explicit is better than implicit
    • ships to gateway client "for free"
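An illustrative kernel.json for Plan C, shown here as a Python dict; all field values are hypothetical:

```python
kernelspec = {
    "display_name": "Spark Python (YARN cluster)",
    "language": "python",
    "metadata": {
        "process_proxy": {
            "class_name": "enterprise_gateway.services.processproxies.yarn.YarnClusterProcessProxy",
            # the per-kernel whitelist proposed by Plan C:
            "forward_env": ["SPARK_OPTS", "HADOOP_CONF_DIR"],
        }
    },
    # env stanza keeps its current semantics: static defaults only
    "env": {"SPARK_OPTS": "--master yarn"},
}
```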

@telamonian
Contributor Author

telamonian commented Aug 22, 2021

'Fun' with CRDs!

I also looked into replacing launch_kubernetes.py and the ever-confusing kernel-pod.yaml.j2 with a single kernel custom resource definition (CRD). Prior to this I didn't know anything about CRDs, aside from broad strokes, but now I think I understand exactly how they can be defined/consumed. Basically, they come in two parts:

1. a schema defining the kernel CRD

  • straightforward and json schema-based
  • in practice the kernel resource would be a light encapsulation layer over a pod resource (and maybe also a secret, volume, etc. resource as well, as implied by the kube api code here in launch_kubernetes.py)
  • On kernel init, the pod resource gets instantiated as a child of the kernel, and the pod assumes the lifecycle of the kernel
  • the kernel resource would have fields equivalent to those in the current jinja template in kernel-pod.yaml.j2
  • writing the CRD schema is the easy part

2. a custom kube controller that implements all of the actual behavior of a kernel by listening for any additions/changes to kernel resources and reacting accordingly

  • can be defined in many different languages/tools. In theory this includes python
  • however, >85% of the OSS controllers available on https://operatorhub.io/ are written in go (as is kube itself)
  • con:
    • to do this part well, i'd have to learn go

After mulling over my kernel CRD idea for a while, I decided that it's overkill and out of scope for this PR. I'll need to learn some go first, but I do think it would be neat to try restructuring JEG's builtin helm deployment into something based on a handful of simple and super declarative kernel, server, gateway, etc. CRDs. I think there's potential here to make cloud deployments of JEG both much simpler to stand up and much more flexible, at least in terms of what components run where.

@kevin-bates
Member

See #991 regarding CRDs. If you could take that for a spin that would be awesome. I'll get back to you tomorrow regarding this (env) PR.

@kevin-bates
Member

Hey Max, I wanted to get back to you regarding #1000 (comment).

I really like the idea behind Plan B for the way it dramatically simplifies things and, as you point out, is perhaps more natural.

The concern I have is that it changes the semantics of the env stanza from static values to flowed values, which, for at least some env entries, is not the intent. However, you address that by saying that those "static" values should be placed into EG and their names configured in the process-whitelist - which makes sense. I suppose where that could run aground would be if there are still kernel-specific values for the same env name. In that case, there's no way to specify that kind of granularity in the process-whitelist.

Is that last comment enough to derail things? I suppose we could add a disallowed list such that envs in that list will NOT be propagated to the kernel, yet those same values could still be configured in the kernelspec on a per-kernel basis. Should any such envs get flowed from the client, we would log their exclusion and continue with the request.
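A compact sketch of that disallowed-list behavior on top of Plan B's stanza-derived whitelist; env_disallowed is the hypothetical new configurable:

```python
import logging

log = logging.getLogger(__name__)

def filter_client_env(kernelspec_env: dict, client_env: dict,
                      env_disallowed: set) -> dict:
    kernel_env = dict(kernelspec_env)  # per-kernel values stay configurable
    for name, value in client_env.items():
        if name in env_disallowed:
            # log the exclusion and continue with the request
            log.warning("Excluding disallowed env %r from kernel start request", name)
        elif name in kernelspec_env:
            kernel_env[name] = value
    return kernel_env
```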

The other semantic difference this introduces is the difference in behavior between non-EG and EG configurations as a straight Lab/Server installation won't behave this way. I suppose that difference will be more blurred once we have parameterization - so I'm not too worried about that.

Regarding plans, I've been thinking we're going to want a 2.6 release prior to the switch to provisioners and I wonder if we could start the deprecation of env-whitelist now and honor both env-whitelist and env-kernelspec (:smile:) simultaneously. We could also do the same in GatewayClient. We'd want to introduce this "disallowed" entry as well. Frankly, I don't think these env-based lists are used that often.

What do you think? If you'd like to meet and discuss, I'd be happy to do that as well.

Btw, I also like the idea in Plan A where the EG whitelist is posted in the kernelspec response and we have a kind of "protocol" in GatewayClient to look and use that value. That approach introduces far fewer semantic differences and eliminates the client-side trait - which admins spend time coordinating today.

The one aspect of things I'd like to retain however, is that all KERNEL_-prefixed envs still flow. Should we take plan B and introduce a disallowed-envs-list, then we'd need to honor it if it included KERNEL_-prefixed names.

@telamonian
Contributor Author

telamonian commented Aug 26, 2021

Hey @kevin-bates, thanks for the detailed response.

I suppose we could add a disallowed list such that envs in that list will NOT be propagated to the kernel, yet those same values could still be configured in the kernelspec on a per-kernel basis. Should any such envs get flowed from the client, we would log their exclusion and continue with the request.

I see your point, but needing to add in an env_disallowed config param kind of kills Plan B for me. If anything, implicit whitelist harvested from kernelspec + explicit blacklist feels even less intuitive than the current state of affairs.

In response to the semantic and other concerns you explain above, I added a new "Plan C" to my earlier post, in which the forward env whitelist gets stored in a new metadata.process_proxy.forward_env kernelspec field. It's similar to Plan A in that it leaves the current semantics mostly alone, but like Plan B allows us to eliminate both GatewayClient.env_whitelist and EnterpriseGatewayApp.env_whitelist.

I'll bring up the general problem of eliminating the need for GatewayClient.env_whitelist at the server meeting today. We can also schedule some dedicated time to talk about this. I'll ping you on gitter.

@telamonian
Contributor Author

@kevin-bates Looking back over your kernel parameterization JEP, I see that the Plan C I suggested is basically the same as your proposal for parameterized environment variables, only without the ability to pass env vars to the "provisioner" (which in this case I assume would be the gateway node) or to specify var "types".

@kevin-bates
Member

Hi @telamonian - I hope all is well.

I'd like to move forward, in some capacity, on these proposals and am finding myself leaning toward Plan C. I'd like to summarize the possible set of changes, along with considerations when older GatewayClients are in play, etc. Relative to the "future release" comments above, I believe the current state of affairs is that we will have a 3.0 release - that is still based on process proxies - followed by a 4.0 release based on provisioners.

  1. Although only somewhat germane to this, the current env_process_whitelist configurable will be renamed to inherited_envs. Environment variables that span kernel configurations and can be statically defined would be defined via this configurable.
  2. Process-proxy configurations will be extended to support a process_proxy.config.forwarded_envs (other names? owner_envs, user_envs, caller_envs) list[str] property consisting of envs that are derived from the client side. This same name will be used in provisioners.
  3. Older GatewayClient implementations that still use env_whitelist will forward those values, but those that are not in the process_proxy.config.forwarded_envs will be ignored.
  4. GatewayClient implementations that are aware of process_proxy.config.forwarded_envs (or kernel_provisioner.config.forwarded_envs) will filter existing envs using those values, and GatewayClient.env_whitelist will be deprecated. If GatewayClient is directed at an EG where xxx.config.forwarded_envs is not present and env_whitelist is configured, it should forward those env values (and assume the EG is "whitelist-based").

Should we also allow EG to define EnterpriseGatewayApp.forwarded_envs? These would be for those env names that span all kernels and this list would be merged into each kernelspec's xxx.config.forwarded_envs entry. This could also ensure that at least an empty list exists in the config stanza so that use-case 4 can know that the targeted gateway does utilize this functionality.
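A sketch of that merge, with hypothetical helper and key names: the app-level list is unioned into each kernelspec's entry, which also guarantees that at least an empty list is present so clients can detect support (per use-case 4):

```python
def merge_forwarded_envs(spec_model: dict, app_forwarded_envs: list) -> dict:
    config = (spec_model.setdefault("metadata", {})
                        .setdefault("process_proxy", {})
                        .setdefault("config", {}))
    per_kernel = config.get("forwarded_envs", [])
    # union the two lists, de-duplicating while preserving order
    config["forwarded_envs"] = list(dict.fromkeys(per_kernel + list(app_forwarded_envs)))
    return spec_model
```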

Does this approach help your use-cases?

Do you mind if I proceed with this? (I'm assuming you have limited bandwidth right now.)

@kevin-bates
Member

Ok, here's how I think we should proceed in the near term.

  • Given @telamonian is not able to work on this at the moment and we're nearing a major release cycle, I think it's imperative that we, at a minimum, get the renaming (and deprecation) of existing names in place. I propose that env_whitelist be renamed to client_envs (although this term was referred to as forwarded_envs above, I think client is more expressive of where their values are coming from) and env_process_whitelist to inherited_envs (as referenced previously).
  • I like "Plan C", but feel it's just too closely tied to process proxies which we'll be moving away from in the next major release, and identifying a more agnostic location may not be in line with the eventual parameterization we want to add.
  • I like the idea of conveying the list of client_envs in the returned kernelspec, but, again, that's only useful in interactive environments (i.e., with parameterized kernels) since a user really needs to be "at the helm".

Since parameterized kernels will not happen in the process-proxy lifetime, I think much of this PR will need to be part of EG 4.0 - when a switch to kernel provisioners takes place and the entire Jupyter ecosystem can benefit from parameterized kernels.

Regarding the env_whitelist in GatewayClient. I think we can safely deprecate its use altogether since, today, admins must already know the EG's env_whitelist value AND must instruct their users to have such envs configured into their environments (or coded into their startup scripts), so, as @telamonian has pointed out, it's essentially useless.

Thoughts, or opinions on the above? Please provide input ASAP.

Unless I hear otherwise, I will be working on the rename/deprecation changes next week beginning July 19.

@kevin-bates added the runtime configuration, kernels, and waiting for author labels on Jul 15, 2022
@kevin-bates
Member

Regarding the env_whitelist in GatewayClient. I think we can safely deprecate its use altogether since, today, admins must already know the EG's env_whitelist value AND must instruct their users to have such envs configured into their environments (or coded into their startup scripts), so, as @telamonian has pointed out, it's essentially useless.

I'm not sure what I was thinking here. Although we can deprecate GatewayClient.env_whitelist we still need to convey the permitted set of envs that a client can send since we certainly don't want to send every environment variable defined on the client. Here's my current line of thinking...

EG will send the current set of defined client_envs that are configured on the EG instance in each kernelspec. This list-valued property will be included under the metadata.client_envs section of the kernelspec. If the kernelspec happens to have an entry for metadata.client_envs, that entry will be extended with the value set in EnterpriseGatewayApp.client_envs.

Kernel start requests will then filter on that same union of envs, taking any included in client_envs AND any prefixed with KERNEL_, so we should probably look at having the cached kernelspec include this union as the value of metadata.client_envs.

GatewayClient will still need to honor GatewayClient.env_whitelist during its deprecation. As a result, the list of envs sent from GatewayClient to EG will be the union of GatewayClient.env_whitelist and the metadata.client_envs list of the kernelspec (if present). EG will log (warning) the list of envs that are NOT in metadata.client_envs (besides those prefixed with KERNEL_) since it should still enforce that requirement (and, per the quoted comment, should still adhere roughly to the previous EG env_whitelist value since admins need to coordinate those values anyway).
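A sketch of the filtering described above, with illustrative names: EG accepts an env from a kernel start request if it appears in the kernelspec's metadata.client_envs or carries the KERNEL_ prefix, and logs everything else before continuing:

```python
import logging

log = logging.getLogger(__name__)

def accept_client_envs(request_env: dict, client_envs: list) -> dict:
    accepted, rejected = {}, []
    for name, value in request_env.items():
        if name.startswith("KERNEL_") or name in client_envs:
            accepted[name] = value
        else:
            rejected.append(name)
    if rejected:
        # warn, then continue with the request using the accepted subset
        log.warning("Ignoring envs not in metadata.client_envs: %s", rejected)
    return accepted
```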

@kevin-bates
Member

Sorry for the spam (if anyone is listening), but I'm beginning to think that the ONLY thing we should do for the 3.0 release relative to this discussion is rename the whitelists and nothing more. This is because it introduces too much divergence from Kernel Gateway if we were to remove GatewayClient.env_whitelist without also updating it to use the same names and send the client_envs in the metadata stanza of the kernelspec.

I'm also hesitant to introduce a field in the metadata stanza that will become obsolete once parameterization is available and parameterization cannot really be implemented, IMHO, until we convert to kernel provisioners, which then begs the question, do we need EG at all?

Perhaps our time is better spent getting EG 3.0 out (with these options renamed), then focusing on getting the provisioners in place (and dropping process proxies)?
