Kernel Provisioning - initial implementation #612

Merged
merged 42 commits into from
Jul 26, 2021

Conversation

kevin-bates
Member

@kevin-bates kevin-bates commented Jan 26, 2021

Here are initial changes to introduce Kernel Provisioning (aka, Environment Provisioning). By default, all kernel launches will use the LocalProvisioner unless a different provisioner_name is specified in the kernel specification's metadata.kernel_provisioner stanza. The LocalProvisioner should yield behavior equivalent to today's kernel lifecycle management. As a result, we should see zero changes to the existing tests (other than changes needed for continuing migration to pytest). Any changes to existing tests must be justified so as to preserve compatibility - few exceptions are expected.

Provisioners are registered via the entrypoints framework and discovered via the group name jupyter_client.kernel_provisioners. When kernelspecs are located, each is checked for a provisioner_name entry. If found, the provisioner is loaded using entrypoints. If the load fails, a warning is logged indicating that the provisioner package is not available, and that kernel specification is either dropped from the results (if using get_all_specs()) or triggers a NoSuchKernel exception (if using get_kernel_spec(name)). If no provisioner_name is found (which will be nearly 100% of cases initially), the LocalProvisioner is instantiated as if it had been configured (per the first paragraph).
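
The lookup rule in the paragraph above can be sketched in a few lines of Python. This is an illustration only, not the code from this PR: `provisioner_name`, `filter_specs`, and the sample kernelspecs are hypothetical, and the default name `local-provisioner` is an assumption about how the LocalProvisioner is registered.

```python
import logging

log = logging.getLogger("provisioning_sketch")

# Assumed registration name for the LocalProvisioner (hypothetical in this sketch).
DEFAULT_PROVISIONER = "local-provisioner"


def provisioner_name(kernel_spec: dict) -> str:
    """Return the name in metadata.kernel_provisioner, or the default if absent."""
    stanza = kernel_spec.get("metadata", {}).get("kernel_provisioner", {})
    return stanza.get("provisioner_name", DEFAULT_PROVISIONER)


def filter_specs(specs: dict, available: set) -> dict:
    """Mimic the get_all_specs() behavior: drop specs whose provisioner is missing."""
    kept = {}
    for name, spec in specs.items():
        wanted = provisioner_name(spec)
        if wanted in available:
            kept[name] = spec
        else:
            log.warning(
                "Kernel '%s' references unavailable provisioner '%s'", name, wanted
            )
    return kept


# Two sample kernelspecs: one without a stanza, one naming a missing provisioner.
specs = {
    "python3": {"argv": ["python", "-m", "ipykernel_launcher"]},
    "remote": {
        "metadata": {"kernel_provisioner": {"provisioner_name": "missing-provisioner"}}
    },
}
usable = filter_specs(specs, available={DEFAULT_PROVISIONER})
```

Here the `python3` spec falls back to the default provisioner and is kept, while `remote` is dropped with a warning because its provisioner is not in the available set.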

A singleton serves as the kernel provisioner factory and cache. A map of provisioner name to EntryPoint instance is built at startup and updated during the process's lifetime whenever the kernelspec manager encounters a provisioner name that is not in the map (provided that entry point loads successfully).
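
A rough sketch of such a cache, under stated assumptions: it uses the standard library's `importlib.metadata` rather than the `entrypoints` package the PR mentions, and `ProvisionerCache`, `find_entry_point`, `is_available`, and `create` are hypothetical names chosen for illustration, not the PR's API. For brevity it fills the map lazily on first use instead of pre-building it at startup.

```python
from importlib.metadata import entry_points

GROUP = "jupyter_client.kernel_provisioners"  # group name from the PR description


def find_entry_point(name, group=GROUP):
    """Return the entry point registered as `name` in `group`, or None."""
    eps = entry_points()
    # Python 3.10+ exposes select(); older versions return a plain dict of groups.
    candidates = eps.select(group=group) if hasattr(eps, "select") else eps.get(group, [])
    for ep in candidates:
        if ep.name == name:
            return ep
    return None


class ProvisionerCache:
    """Process-wide map of provisioner name -> entry point, filled on demand."""

    _instance = None

    def __init__(self, lookup=find_entry_point):
        self._lookup = lookup
        self._entry_points = {}

    @classmethod
    def instance(cls):
        # Create the single shared cache on first use.
        if cls._instance is None:
            cls._instance = cls()
        return cls._instance

    def is_available(self, name):
        # Only consult the entry point registry for names not seen before.
        if name not in self._entry_points:
            found = self._lookup(name)
            if found is None:
                return False
            self._entry_points[name] = found
        return True

    def create(self, name, **kwargs):
        if not self.is_available(name):
            raise ValueError(f"Provisioner '{name}' is not available")
        # load() imports the registered class; then instantiate it.
        return self._entry_points[name].load()(**kwargs)
```

A caller such as the kernelspec manager would ask `ProvisionerCache.instance().is_available(name)` while listing specs, so a failed lookup is retried the next time the name is seen, while a successful one is cached.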

This pull request will remain in DRAFT mode until the following has been addressed:

  • Determine what to do about the connection file and zmq ports. Clearly, the port attributes must remain on KernelManager, but the point at which they are set will need to change, since the provisioner should return the connection file information, which is also consistent with the behavior described in the Kernel handshaking pattern proposal. Also, write_connection_file() is called before launch (which makes sense for local-only and non-handshaking) but this should be moved closer to the actual launch. (Utilized self.parent (i.e., kernel_manager) to write the local connection file in local-provisioner for now.)
  • Should the kernel_cmd trait be removed, as it's been deprecated since 4.0 (April 2014)? Its removal will resolve one potential (and minor) backwards compatibility issue for any applications using that trait. (See TODO: Potential B/C issue... in comment in EnvironmentProvisionerBase.pre_launch().) (Removed in [Release 7.0] Remove deprecations in kernel manager #643)
  • Need to convert any of the Enterprise Gateway process-proxy implementations to provisioners as a POC to be sure there aren't any glaring items missing. If others are doing similar things, it would be good to make sure this proposal addresses those instances. (POC instances of Hadoop YARN, Distributed(ssh), K8s and Docker provisioners have been implemented here.)
  • Add test to include custom provisioner.
  • Add documentation
  • Look at simplifying interface/base class

Implements/Resolves #608

@kevin-bates kevin-bates marked this pull request as draft January 26, 2021 01:05
Contributor

@MSeal MSeal left a comment

A few minor comments, but it's looking clean @kevin-bates .

jupyter_client/launcher.py (outdated, resolved)
jupyter_client/launcher.py (outdated, resolved)
jupyter_client/manager.py (outdated, resolved)
"""Launch the kernel process returning the class instance and connection info."""
pass

async def cleanup(self, restart=False) -> None:
Contributor

Should we force this to be abstract as well?

Member Author

I'm okay with that. Since the default implementation didn't need this (although the connection file management is a concern that could change that), I didn't want to force its (potentially unnecessary) implementation. That said, cleanup() is something generally expected, with having nothing to do being the exception.

Contributor

I'm fine either way. I think forcing implementers to consider what cleanup they need isn't the worst overhead even if it's a pass implementation but my opinion there isn't a strong one.
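
The trade-off in this thread can be seen in a small sketch. It assumes a stripped-down base class; `ProvisionerSketch`, `NoCleanup`, and `Forgetful` are hypothetical names, and the method set is reduced to the two methods visible in the diff excerpt. Marking cleanup() abstract means a subclass that omits it cannot be instantiated, while a deliberate no-op costs two lines.

```python
import asyncio
from abc import ABC, abstractmethod


class ProvisionerSketch(ABC):
    """Hypothetical, reduced stand-in for the provisioner base class."""

    @abstractmethod
    async def launch_kernel(self, cmd: list) -> dict:
        """Launch the kernel and return connection info."""

    @abstractmethod
    async def cleanup(self, restart: bool = False) -> None:
        """Release resources; abstract so implementers must make a choice."""


class NoCleanup(ProvisionerSketch):
    async def launch_kernel(self, cmd: list) -> dict:
        return {"cmd": cmd}

    async def cleanup(self, restart: bool = False) -> None:
        pass  # explicit "nothing to do" implementation


class Forgetful(ProvisionerSketch):
    # cleanup() is missing, so Forgetful() raises TypeError at instantiation.
    async def launch_kernel(self, cmd: list) -> dict:
        return {"cmd": cmd}


# The no-op subclass instantiates and runs normally.
asyncio.run(NoCleanup().cleanup(restart=True))
```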

jupyter_client/provisioning.py (outdated, resolved)
@MSeal
Contributor

MSeal commented Feb 22, 2021

Let me know if I am missing other issues that need attention. I have less time available, so I am missing some threads here and there.

@kevin-bates
Member Author

Thanks Matt - I really appreciate your reviews thus far.

@kevin-bates kevin-bates changed the title from "Kernel Environment Provisioning - initial implementation" to "Kernel Provisioning - initial implementation" Feb 26, 2021
This enables the ability to specify provisioner traits on command line
or config file and override via kernelspec config stanza.
Fix typo.

Co-authored-by: David Brochart <david.brochart@gmail.com>
@kevin-bates
Member Author

Thank you @blink1073 and @davidbrochart for your reviews.

@davidbrochart
Member

I just merged #645, looks like you have conflicts now, but that should not be a big deal. Sorry Kevin!

@kevin-bates
Member Author

No worries at all David. I think we should get at least another approval before merging this anyway. I'll try to resolve the conflict later today.

@blink1073
Contributor

Let's plan to merge this by next Thursday (22 July) if we don't get any more feedback. @MSeal you had some early feedback if you'd like to take a final look.

Member

@Zsailer Zsailer left a comment

This looks great @kevin-bates!

I added one minor comment in line. The PR is going to need a rebase before we can merge. Otherwise, I think it's ready to go!

jupyter_client/kernelspecapp.py (outdated, resolved)
@kevin-bates
Member Author

kevin-bates commented Jul 23, 2021

It looks like this PR encountered an updated version of mypy that complains about missing packages. In this case, it was producing an error related to the (optional) paramiko package used in ssh/tunnel.py. I've updated the GH workflow to add --install-types --non-interactive to resolve this, but this particular commit may be necessary in master prior to this PR's merge (assuming that's the correct way to address this).

interactions still occur via the ``KernelManager`` and ``KernelClient``
classes within ``jupyter_client`` and potentially subclassed by the
application.

Member

Do the KernelManager or KernelClient APIs change (additional optional methods, method signature changes...)? Is there any impact for current consumers of those APIs?

Maybe worth stating this in the documentation?

Member Author

Nothing changed on KernelClient. As for KernelManager there were some changes to "internal" methods and I agree those kinds of changes should be documented (primarily calls to the Popen instance are replaced with calls to the provisioner instance). As a result, only KernelManager subclass authors should be affected (and hopefully minimally). However, I would suggest those be part of the 7.0 release docs (or whatever major release this falls in) rather than embedded in functional/dev docs pertaining to Kernel Provisioning itself.

to use the YARN REST API. Or, similarly, a Kubernetes-based provisioner
would need to implement the process-control methods using the Kubernetes client
API, etc.

Member

Are those poll, wait... lifecycle methods enforced by the API? Here too, it would be good to have a list of (or link to) the lifecycle methods (if the number is not too high and it does not break the reading flow).

Member Author

Again, it's not clear to me what you mean - an example will help me. I also don't understand what you mean by enforced by the API. Sorry.

Member

I should have looked at the code but did not. My question is "Do all remote provisioners need to implement poll, wait...?". Another way to ask is "Are poll, wait... abstract methods that need to be implemented by all remote provisioners?"

Member Author

Ah - yes, abstract methods are used where "enforcement" is required. These methods are documented on KernelProvisionerBase and in the API portion of the provisioner docs.
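
As a rough illustration of what "abstract process-control methods" could look like, here is a sketch with a local implementation that delegates to a Popen instance. It is not the interface from this PR: `ProcessControlSketch` and `LocalProcessSketch` are hypothetical names, and only poll, wait, and kill are shown. A remote provisioner would implement the same methods against its resource manager's API instead of Popen.

```python
import asyncio
import subprocess
import sys
from abc import ABC, abstractmethod
from typing import Optional


class ProcessControlSketch(ABC):
    """Hypothetical subset of the process-control methods discussed above."""

    @abstractmethod
    async def poll(self) -> Optional[int]:
        """Return None while the kernel is running, else its exit status."""

    @abstractmethod
    async def wait(self) -> Optional[int]:
        """Wait until the kernel exits and return its exit status."""

    @abstractmethod
    async def kill(self) -> None:
        """Forcibly stop the kernel."""


class LocalProcessSketch(ProcessControlSketch):
    """Local case: each method delegates to a subprocess.Popen instance."""

    def __init__(self, argv):
        self.process = subprocess.Popen(argv)

    async def poll(self):
        return self.process.poll()

    async def wait(self):
        # Poll in a loop instead of calling Popen.wait(), which would block the loop.
        while self.process.poll() is None:
            await asyncio.sleep(0.05)
        return self.process.returncode

    async def kill(self):
        self.process.kill()


async def demo():
    # A long-running child process stands in for a kernel.
    proc = LocalProcessSketch([sys.executable, "-c", "import time; time.sleep(30)"])
    running = await proc.poll()  # None while the child is still alive
    await proc.kill()
    status = await proc.wait()   # non-zero after a forced kill
    return running, status
```

Running `asyncio.run(demo())` starts the child, confirms it is alive, kills it, and collects its exit status.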

@echarles
Member

Left a few comments/questions, with the future idea of embedding a schema for the config so that the user is able to override the config.

I see 3 approvers already. I don't want to block the merge and my questions/ideas can for sure be discussed in subsequent PRs.

Feel free to merge, but really interested to get feedback on your comments.

@kevin-bates
Member Author

I really appreciate your review Eric - thank you. I've left you with a couple of questions to clarify my understanding regarding your comments.

@echarles
Member

Feeling good with that warning 👍

[W 2021-07-26 09:53:42.722 ServerApp] Kernel 'datalayer_env' is referencing a kernel provisioner ('loca-provisioner') that is not available. Ensure the appropriate package has been installed and retry.

@echarles
Member

echarles commented Jul 26, 2021

As a further experiment, I have built a POC on top of this PR of what it would take for JupyterLab to get a JSON schema baked into the kernelspec, display a form pre-populated with the default values shipped in the schema, and send the user values to the kernel service, which replaces the placeholders defined in the kernelspec with the user values to launch the kernel with parameters. It is doable; we just need to review the changes (not so many, to be honest).
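
The flow described in this comment could look roughly like the following. Everything here is an assumption for illustration, since the POC itself is not shown in this thread: the `parameters_schema` key, the `$name` placeholder syntax, the `my_kernel_launcher` module, and the helper names are all hypothetical.

```python
from string import Template

# Hypothetical kernelspec: "$name" placeholders plus a schema carrying defaults.
kernelspec = {
    "argv": [
        "python", "-m", "my_kernel_launcher",
        "-f", "{connection_file}",
        "--memory", "$memory",
        "--cpus", "$cpus",
    ],
    "metadata": {
        "parameters_schema": {
            "properties": {
                "memory": {"type": "string", "default": "2G"},
                "cpus": {"type": "integer", "default": 1},
            }
        }
    },
}


def schema_defaults(schema: dict) -> dict:
    """Collect the default value of every property that declares one."""
    props = schema.get("properties", {})
    return {key: prop["default"] for key, prop in props.items() if "default" in prop}


def resolve_argv(spec: dict, user_values: dict) -> list:
    """Overlay user values on schema defaults, then fill the placeholders."""
    values = {**schema_defaults(spec["metadata"]["parameters_schema"]), **user_values}
    # Arguments without a "$" placeholder pass through unchanged.
    return [Template(arg).substitute(values) for arg in spec["argv"]]


# The user overrides memory in the form; cpus falls back to the schema default.
argv = resolve_argv(kernelspec, {"memory": "8G"})
```

A form in the front end would be pre-populated from `schema_defaults(...)`, and the values the user submits would be passed as `user_values`.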

I hope we can discuss this at the next server meeting.

cc/ @kevin-bates @Zsailer @blink1073 @mlucool @goanpeca @ericdatakelly @Carreau

@blink1073 blink1073 merged commit 109e7ac into jupyter:master Jul 26, 2021
@blink1073
Contributor

Huge thanks to @kevin-bates for all the work you did here!

@blink1073
Contributor

Ready for preview: https://pypi.org/project/jupyter-client/7.0.0a1/

@meeseeksmachine

This pull request has been mentioned on Jupyter Community Forum. There might be relevant details there:

https://discourse.jupyter.org/t/errors-with-asyncio-io-loop-in-custom-ipython-kernel/10908/2

Development

Successfully merging this pull request may close these issues.

[PROPOSAL] Kernel Provisioning
7 participants