Packaging - the custom msg, entry points, and cached static assets solution. #116
I certainly like this approach. It makes sure that assets are based on the kernel runtime rather than associated with the overall notebook server (or other frontend). Can path be remote or local, depending on the author's implementation?
It breaks the assumption that the kernel does not know it is in a notebook/JS environment, and makes it complicated to map kernel paths to server paths to frontend paths. The Python packaging registry is not language agnostic. It forces each kernel to reimplement a static web server, with our server acting only as a proxy. I can also see a problem with identical paths in many kernels: once one is cached, it shadows other kernels' resources. Or you install a new version and restart your kernel, and you still get the cached versions. Kernel authors will never bother to implement delete messages.
If you include the require bits, I'd say that's true. However, treating this as a resource query and response relative to the kernel does not make it coupled to the notebook. We'd want this for any other HTML based frontends, including Hydrogen. I'm not in agreement about this using a Python packaging registry, as I think resources should be installed per kernel.
Don't you think that would affect their users negatively enough that eventually they would?
The kernel knowing about static assets and telling the server seems problematic. I think if the kernel is being asked about the assets, it should be responsible for serving them, as well.
No they won't; they are developers. It works for them if they restart the server, which they do every 10 minutes.
Nothing tells you that the resources will be the same for Hydrogen and the notebook.
I don't expect this to be a problem. Any resource fetched from a kernel should necessarily be served from a kernel-specific path. So when kernel K is asked for resource R, the server maps it to a URL namespaced by that kernel.

I do think if we are going as far as making the kernels responsible for static resources via messages, the most logical way to do that is to proxy requests to the kernels themselves, and expect kernels to run an HTTP server to serve the files. HTTP already has all the features we are describing here, I think.
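A minimal sketch of that kind of namespacing, assuming an in-memory cache on the server side (the URL scheme and cache layout here are illustrative, not part of any spec):

```python
from urllib.parse import quote

# Hypothetical cache keyed by (kernel_id, resource_name), so identically
# named resources from different kernels never shadow each other.
_asset_cache = {}


def kernel_resource_url(kernel_id, resource_name):
    """Map a kernel-relative resource name to a kernel-specific URL."""
    return "/kernel-resources/%s/%s" % (kernel_id, quote(resource_name, safe=""))


def cache_resource(kernel_id, resource_name, content):
    """Store resource bytes under a key scoped to one kernel."""
    _asset_cache[(kernel_id, resource_name)] = content
```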
This statement isn't true. nbextensions aren't limited to kernel-specific behavior. toc, slideshow, nbgrader, etc. would all not be addressed by the proposal, and would continue to require nbextensions as they exist today.
Hey guys, glad we are talking about this. Here are my responses.
Sorry! I really should have clarified, "path" here means "unique name". It can be whatever string the package author wants!
The only piece of the above that the kernel authors need to implement is the single message, in blue. It's up to kernel authors to choose a mechanism equivalent to Python's entry points, or something that can be used as an alternative.
No.
You missed the part where I mentioned that caches are associated with specific kernels, by id.
That means their kernels aren't up to spec.
But then if the kernel hangs, or is thinking, the assets are unavailable.
Thanks for catching that! I'll edit my post.
That's fair. Wait... how many ports are we talking about, then? That doesn't seem tractable unless those are proxied to the main notebook server.
That's true, but how else are you going to get the resources from the kernel to the notebook server? It sounds like you have to either:
I don't think we'd need to differentiate. The same way the rich display system works: if a front-end can't load an asset, it just won't.
I mean the JS could be different in the notebook than in Hydrogen, or Rodeo, or Thebe. Do you introduce a mimetype per frontend?
I'm certainly going to reject that one. Doesn't work right for thebe or any other remote context.
At first I thought you were joking, then I assumed someone implemented that. Like this? https://github.com/fanout/zurl

My thinking was that resources can be local paths or fully qualified URLs.
One. The notebook server would proxy requests on a kernel-specific path to the kernel.

There's also a question of whether these should be per kernel name or per kernel id. If it's per id, it's going to mean roughly 0 cache hits, as every kernel instance would get its own URL.
I may not understand, but this is what the cache is for. The webserver would ask the kernel about the assets once the kernel is started, and wouldn't need to later.
That is forcing knowledge of the notebook server onto the kernels. Do we really want to do that? I assumed not.
Yes
Cache only helps mitigate future requests, it still needs to get them from the kernel in the first place.
So all resources are known ahead of time, and no new resources are requested during the lifetime of the kernel?
This feels like the wrong way round to do things. The webserver shouldn't be telling the client what to load; the client should be asking the server for the things it determines it needs. Like the way widget display messages can include a require path for a module to load the view from. There are established mechanisms for caching to avoid loading the same thing twice.
Yes, that was our thinking. It's totally possible we overlooked a use case where that was incorrect. Also, you could re-request assets on kernel restart (not just first start).
I'm struggling to see what problems this solves. If we are assuming the kernel knows everything about the server's filesystem in order to tell the server where everything is, then what's the advantage of the kernel managing resources at all, if it can only manage them in a way that the server can understand and access?
The widget display message does exactly that: "hey, load this".
Does this mechanism provide any benefit over a
Possibly I misunderstood. It sounds like in your proposal, the server is just telling the frontend to load something, as a separate message from anything that might actually use it. The widget display messages say 'create this class, loading it from X if you need to'. Crucially, loading the resource is tightly tied to using it, which makes it easy to avoid race conditions where something would try to use the resource just before it was loaded.
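To make that concrete, here is an illustrative payload only (field names follow the ipywidgets convention of `_view_module`/`_view_name`; the module and class names are hypothetical) showing how a display-style message can name both the thing to create and where to load its implementation from, so the frontend fetches the module, with caching, at the moment it needs it:

```python
# Sketch of a widget-style display payload; nothing here is a spec.
widget_like_display = {
    "model_id": "abc123",  # hypothetical identifier for the object to create
    "state": {
        "_view_module": "my-plot-widget",  # require.js module path (hypothetical package)
        "_view_name": "PlotView",          # class to instantiate once the module is loaded
    },
}
```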
If the client, webserver, and kernel exist on three different machines, it does. Also, the
That's a good point about the backend not being aware of when the resource is loaded. Unfortunately, this problem already exists in our current architecture. A solution would be to make the red message a request/response pair, so that in the kernel the API could be implemented using an asynchronous design pattern.
How? I don't see a mechanism for getting the files from the kernel to the webserver, only for communicating paths, which requires the filesystem to be the same.
It also doesn't solve that problem, it just punts it to the kernel. How does the package communicate this information to the kernel, such that the kernel knows at startup, before any imports, what resources are available?
If we use setuptools entry points for this, and communicate files from the kernel to the server at startup and only at startup, this means potentially hundreds of MB of file transfer to the web server on every kernel startup. E.g. if a kernel plugin makes MathJax available, there's no mechanism to make the pieces available on request, which proxying HTTP would do; instead it requires all possible resources to be moved to the server at once on every kernel start.
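To make the size concern concrete, here is a rough sketch of how a server might enumerate every declared asset at kernel start. The entry-point group name and the convention that each entry point resolves to a callable returning file paths are assumptions; `pkg_resources.iter_entry_points` itself is the standard setuptools API:

```python
import os
import pkg_resources

# Hypothetical entry-point group; each entry point is assumed to resolve to a
# callable that returns a list of absolute paths to static asset files.
GROUP = "jupyter_static_assets"

total_bytes = 0
for ep in pkg_resources.iter_entry_points(group=GROUP):
    asset_paths = ep.load()()  # e.g. every file a MathJax plugin ships
    for path in asset_paths:
        total_bytes += os.path.getsize(path)

# With a startup-only transfer, all of total_bytes crosses the wire on every
# kernel start, whether or not the frontend ever requests the files.
print("would transfer %.1f MB at startup" % (total_bytes / 1e6))
```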
The bit about request/response makes sense to me, but I'm not sure what you mean about using async patterns in the kernel. I was thinking about race conditions in the frontend: if 'load this resource' and 'do something that needs that resource' are two separate messages, the 'do something' message can arrive before loading has finished, and then things get tricky. If the frontend requests (with caching) the resources as it needs them, you avoid this problem.
All: can this issue be closed? If not, what next steps are required? Thanks!
I think we have basically solved this in JupyterLab, and we don't have plans on backporting to the classic notebook as it would require a massive amount of work. Closing.
Certainly going to agree there; we ran into too much difficulty with needing to continue supporting requirejs. It's still a core problem that people want to be able to declare an asset once for the life of a document -- we're not solving this in Jupyter notebooks (spec-wise at least).
As this is also a topic relevant to other clients/kernels implementing this stuff: can someone give a pointer to how this issue should now be handled? At first glance, I couldn't find anything about it in http://jupyter-client.readthedocs.io/en/latest/messaging.html. E.g. how should a JavaScript library (e.g. for a plot) be sent from an R kernel so that it is cached in the frontend and doesn't need to be resent (or at least not be saved multiple times)?
Both the classic notebook and JupyterLab have extension mechanisms. I will comment here on that of JupyterLab, as it better represents how things will work in the future.

* All frontend extensions are just npm packages. While one *could* bundle an npm package in a Python package, that is not required.
* We are moving away from kernels sending JS code to the frontend as much as possible. While not an official position of the project, I can imagine that eventually we will remove JavaScript outputs, as they are a huge security problem.
* The frontend JS code is triggered by various declarative messages sent to the frontend. For things like plots, that means returning a display message with a custom MIME type for which an installed npm package/extension is registered as the renderer.
* Any kernel can send such a message and utilize the same frontend extension.
* In no case is it ever required for a kernel to send JS to the frontend.
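For illustration only (the MIME type below is invented, and the renderer extension is assumed to exist), a kernel-side Python library could emit such a declarative message through IPython's standard raw display path:

```python
from IPython.display import display

# Hypothetical MIME type: a frontend extension registered for it renders the
# payload; frontends without a renderer fall back to the text/plain entry.
display(
    {
        "application/vnd.example.plot+json": {"x": [1, 2, 3], "y": [4, 5, 6]},
        "text/plain": "ExamplePlot(3 points)",
    },
    raw=True,
)
```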
Cheers,
Brian
To add on, my position on javascript (and html) is that outputs should be sandboxed in an iframe. Within that iframe, though, we should be able to load assets.
@ellisonbg Are there any examples where a (Python/R) package implemented such a thing to display something? Also, is there an implementation of a "consumer" of such new mime types, e.g. how would nbconvert handle such messages (when converting to docx via pandoc), and how would a package contribute a "handler" to such a consumer? Building an npm extension to display an R-based plot sounds like a lot of work (judging by my knowledge of npm and such stuff, it's probably something new to learn for most R/Python devs). On the R side, knitr is king, and it has a very easy model with a way to display certain JS/HTML only once. It would be unfortunate if we couldn't match that ease of displaying something. So I would be very interested to see such examples. :-)
I'm interested to hear how knitr does it, since they receive such high praise from a lot of folks I work with.
See here: https://cran.r-project.org/web/packages/knitr/vignettes/knit_print.html -> the "Metadata" section. The biggest difference between Jupyter and knitr is that knitr is optimized for converting an object to a single output format (mostly markdown+html/js), while Jupyter tries to display something in as many ways as possible. In contrast to Jupyter, knitr knows about all displayed objects because the complete document is converted, not just a single cell as in the notebook. Knitr and the objects which get converted also know the final output format. To display something, you would add a single-dispatch implementation of the `knit_print` generic. From the above sections:

```r
library(knitr)
knit_print.foo = function(x, ...) {
  res = paste('**This is a `foo` object**:', x)
  asis_output(res, meta = list(
    js = system.file('www', 'shared', 'shiny.js', package = 'shiny'),
    css = system.file('www', 'shared', 'shiny.css', package = 'shiny')
  ))
}
```

Knitr will then render the whole document, insert the object representation in the document, and collect the meta objects. The meta objects will be made unique and then inserted in the head of the document. When we implemented repr (the equivalent of the IPython display system in the IRkernel), one big "problem" was that we couldn't reuse this metadata mechanism.

R also has a very nice way to create and display HTML widgets, which are then handled nicely by knitr (knitr even seems to screenshot the HTML structure to embed it in non-HTML formats).
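For comparison, the closest analogue on the IPython side is registering a per-type formatter with the display formatter (`for_type` is a real IPython API; the `Foo` class and the HTML string are placeholders, and there is no equivalent of knitr's `meta=` collection of JS/CSS dependencies):

```python
# Sketch: per-type HTML formatter registration in IPython, roughly the
# analogue of defining knit_print.foo in R.  `Foo` is a placeholder class.
class Foo:
    def __init__(self, x):
        self.x = x


def foo_to_html(obj):
    return "<strong>This is a Foo object:</strong> %s" % obj.x


ip = get_ipython()  # only defined inside an IPython kernel/session
ip.display_formatter.formatters["text/html"].for_type(Foo, foo_to_html)
```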
mobilechelonian demonstrates one way to get JS to the notebook interface without re-sending it every time: it copies JS to the nbextensions directory, and then sends code to load it from there. The limitation, of course, is that the JS does not become part of the notebook, so it's harder to share the notebook with all its output.
So this is at best a "workaround" for Python packages, but not for kernels of other languages. It would also not work on nbviewer. Is that right? How would nbviewer actually handle plots from plotting libs which send their plot as a new mimetype?
As for the question about support for new mimetypes on nbviewer, support has to be added. Plotly, Vega, geojson, and the new tables are the prime ones to bring in.
Yes, the "javascript in the python package" is not ideal for non-Python
languages. That is why we are improving it in JupyterLab. But at this
point, we don't have plans on fixing it in the classic notebook. We have
actually attempted to fix it, but we would have to break all the APIs
significantly to do so.
My hope is that once JupyterLab stabilizes, we can use its notebook+output
rendering packages to do client side rendering of notebooks on nbviewer,
with the extensible MIME based output rendering.
I don't think this is specific to Python kernels. It's convenient to reuse the existing Python function to install the nbextension, but all it's really doing is copying some files, and it wouldn't be hard to implement in another language. It does assume that the kernel is accessing the same filesystem as the server, which doesn't have to be true, but in practice it usually is.
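As a rough sketch under stated assumptions (the extension name and source directory are made up; `jupyter_data_dir` is a real helper from `jupyter_core`), the language-agnostic equivalent of that helper is just a file copy into the server's nbextensions directory, which is exactly why it only works when kernel and server share a filesystem:

```python
import os
import shutil

from jupyter_core.paths import jupyter_data_dir


def install_js(src_dir, name="my_extension"):
    """Copy a directory of JS into <jupyter data dir>/nbextensions/<name>."""
    dest = os.path.join(jupyter_data_dir(), "nbextensions", name)
    if os.path.isdir(dest):
        shutil.rmtree(dest)  # replace any previously installed copy
    shutil.copytree(src_dir, dest)
    return dest
```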
That is right.
Re-reading this from the future, I want to add some clarification to the comment that prompted closing:
JupyterLab does not solve this. I think it would be more accurate to say that JupyterLab has instead committed to not solving this issue. The JupyterLab approach to extensions makes the target use cases that prompted this issue difficult or impossible to address.
Instead, I would say that JupyterLab draws a more explicit line: kernel packages and frontend packages are fundamentally separate, and installing a tool that has both frontend and backend components will always require two separate, explicit installation steps (which may be encapsulated in a single metapackage install in the common case where the kernel and server are in the same env). I think the JupyterLab position is that kernel packages should never be able to deliver javascript to the frontend, and should instead communicate with mime-types and protocols. There are plenty of reasons for working this way, but we shouldn't claim to have a solution to this issue.
I agree that there's still a problem worth solving here. Today's workarounds require kernel packages to know what notebook client they are installed to, which is suboptimal.

I think, generically, packages need to be able to define blobs. These blobs can then be loaded dynamically by the client, by hash or by alias. Blobs don't have to be JS. Whether the blobs would be stored in the kernel, cached in the notebook server, stored in the notebook server, or kept in a separate service is still unknown.

Since I operate at a much lower capacity now, my hope is that you guys can take ownership of this and push it forward.
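A minimal sketch of the blob idea, with everything here (the store, the alias scheme, the function names) hypothetical rather than any agreed design:

```python
import hashlib

# Hypothetical content-addressed blob store: clients fetch by hash (immutable,
# cacheable forever) or by alias (a mutable pointer to the latest hash).
_blobs = {}    # sha256 hex digest -> bytes
_aliases = {}  # alias string -> sha256 hex digest


def register_blob(content, alias=None):
    """Store a blob; return its content hash and optionally bind an alias."""
    digest = hashlib.sha256(content).hexdigest()
    _blobs[digest] = content
    if alias is not None:
        _aliases[alias] = digest
    return digest


def get_blob(key):
    """Look up a blob by alias if one matches, otherwise by hash."""
    return _blobs[_aliases.get(key, key)]
```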
Since we haven't agreed on this yet, I'm opening an issue instead of writing an IPEP. If this is agreed on, I'll write an IPEP.
The week before last, @ellisonbg and I brainstormed about Jupyter packaging. As I remember it, the best solution we came up with requires a combination of Python packaging entry-points, a new message type, and static asset caching (in the web server). This is my understanding of how this solution would work (my notes from our meeting are at home, and my apartment is being fumigated, so I don't have access to them).
Jupyter level, kernel extensions
A first new message (in blue) would be added, allowing the server to ask the kernel whether the static assets it has cached for that kernel are still correct. The message would be a dict of static asset paths and content hashes.
The same message in the opposite direction is the kernel's response. It would be some type of data structure, maybe a binary message, containing static asset paths and their contents, and a list of the static assets that can be deleted from the cache.
A second new message (in red) would be added, allowing the kernel to invoke a require.js call in the front-end. This is preferred over standard display(JS) calls because the notebook contents remain unaffected.
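A rough sketch of what the two payloads might look like, with all field names and values illustrative rather than a spec:

```python
# Blue message, server -> kernel: "are these cached assets still current?"
assets_query = {
    "assets": {
        "mypkg/widget.js": "sha256:9f2c0a",   # cached path -> content hash
        "mypkg/widget.css": "sha256:41aa7b",
    }
}

# Blue message, kernel -> server: updated contents plus cache deletions.
assets_reply = {
    "update": {"mypkg/widget.js": b"<new file contents>"},
    "delete": ["mypkg/old.css"],
}

# Red message, kernel -> frontend: invoke a require.js module without
# writing anything into the notebook document.
require_call = {
    "module": "mypkg/widget",
    "method": "render",
    "args": [{"target": "output_area_3"}],
}
```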
IPython level, kernel extensions
Python entry points will be used as a registry. Two entry points will be defined.
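A hedged sketch of what declaring two such entry points could look like in a package's setup.py; the group names and targets here are hypothetical, not the ones actually proposed:

```python
# setup.py (sketch) -- hypothetical entry-point groups for the registry
from setuptools import setup

setup(
    name="mypkg",
    version="0.1",
    packages=["mypkg"],
    entry_points={
        # assumed group: enumerate the static assets this package provides
        "jupyter.kernel_static_assets": [
            "mypkg = mypkg.assets:list_assets",
        ],
        # assumed group: hook run when the kernel extension is loaded
        "jupyter.kernel_extensions": [
            "mypkg = mypkg.kernelext:load",
        ],
    },
)
```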
Jupyter level, server extensions and notebook extensions
Python entry points will be used as a registry. Three entry points will be defined, one of which registers assets to be `require`d when the notebook page loads.

EDIT: To help the discussion, issues and specific cases are listed here: https://jupyter.hackpad.com/Packaging-PbIgxnC71or