Integrate jupyter-archive #252
Conversation
I am -1 on this. The point of the split was to keep the server lightweight. Besides, there are many cases where we specifically don't want to have such an endpoint, like in the case of a Voilà application, where we don't want to expose the content of directories... What is the issue with keeping this feature in an extension to the Jupyter server? |
@SylvainCorlay I understand Voilà's requirement. We can think of other features that we could expose to make clients' lives easier but that would not be recommended for specific use cases (Voilà or others). The goal is indeed to see the proliferation of a rich extension ecosystem, and at some point we need to decide whether we integrate an extension that would be used by the masses, or ask the masses to install separate Python packages and change their launch command to enable them. For the archive case, my impression is that it is generic and needed enough to get into the jupyter_server repository, so that end users do not have to install a separate package. To support Voilà's requirements, I was thinking of introducing a notion of profiles. That concept of profiles still needs to be defined/implemented; it is just an idea for now. |
I understand your concern, but as long as a file handler already exists, I don't see why a directory handler could not exist. As for the extension, it's great to have, but it's not shipped by default and so requires a separate install. |
I think that adding new handlers in core makes the server heavier to move forward - while the point of the split was to make it more lightweight. |
I'd consider this to be missing functionality in the notebook server that we are rectifying. |
It is probably ok for this extension to be moved to a jupyter org, and to be a dependency of e.g. lab and/or nbclassic, so that it always gets installed! But I don't see the point of including it in the core repository.
The auto-enabling of server extensions is more security-sensitive than front-end extensions: you install a new package in your environment, and you may have a new endpoint on your server that you were not expecting. I think the way to handle this would be for the ExtensionApp system to allow the app used as the main entry point to disable other extensions (which would be used by Voilà, and e.g. not JupyterLab). |
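The idea of letting an application opt out of auto-enabled server extensions could be sketched with jupyter_server's `jpserver_extensions` trait (a real ServerApp config option; the extension names below are purely illustrative):

```python
# Hypothetical jupyter_server_config.py for an application (e.g. a
# Voilà-style app) that does not want other installed server
# extensions to expose endpoints on its server.
# `c.ServerApp.jpserver_extensions` maps extension package names to
# an enabled/disabled flag; the names used here are examples only.
c.ServerApp.jpserver_extensions = {
    "jupyter_archive": False,        # no directory download/extract endpoints
    "some_other_extension": False,   # any other auto-enabled extension
}
```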
One of the things the server does at its core is serve files, this is fixing a hole in that capability, not adding a new capability. |
I like the "everything is an extension" approach of lab, and I think it is a reasonable thing to do for jupyter_server as well. Making things extensions also makes it easier to have a separation of concerns when building upon it. |
My only usage of Jupyter is through JupyterLab, so I would be fine having it there personally. But @blink1073 has a good point that this is more a missing feature than a new one, considering that a file handler already exists. So it should belong in core. Or, to stay consistent, should we move the existing file handler outside of core? You guys are more experienced than me with the Jupyter ecosystem, so I'll let you decide where you want this to go. |
I agree that this PR belongs in the core server. Regarding @SylvainCorlay's concern about agility and weight, I think we should try to distinguish between today's server extensions and core services. IMHO, an extension brings functionality that a) is not necessarily mainstream, and b) is not an obvious improvement of, or closely related to, existing (core) functionality. Extensions could also be POCs whose end game is to be promoted into the server core once mature (as is the evolution of many lab extensions). It sounds like Voila could leverage service composition where the Contents/Files handlers are not available at runtime, and perhaps we should try to make that easier for applications. |
+1 to get this feature into core. Content is anyway already exposed; we just provide a shortcut to download in one shot what could have been downloaded file by file (disclaimer: I have not yet checked that the content security model is applied in this PR - I assume it is and that we don't allow access to content that should not be allowed). About the comparison with JupyterLab: you can enable/disable a core extension, but it's not that easy... you need to find the correct setting (search for it). |
Please understand that my concern was more architectural (embracing the "everything is an extension" approach) than a concern about Voilà, and is not about the usefulness or the quality of code. If everyone feels that it belongs to core, no problem with moving forward with this. |
I am also -1 to adding new endpoints to the base server. This can and should be done via extensions. We should be conservative about which APIs are guaranteed to be available to consumers. |
Thanks @ivanov. It sounds like we should discuss this further at tomorrow's (July 9) server meeting. These are fundamental concepts regarding direction that we should be on the same page about. |
Side-comment about the functionality -- it might be nice to define restrictions on archive sizes to/from the server that are enforced if this gets added. I'd hate to have the notebook server crash for all users when one user tries to upload a 10GB archive or download a working directory with all their data files. This can technically happen with individual files, but it's a lot easier to do accidentally when exposing directory interactions. |
What about a warning on the JupyterLab side, and/or an option that is set to a certain size by default but can be increased as needed? Both proposals above are frontend-only. |
An option to set the size seems reasonable. A warning is usually too late if you've already started the action, because you're now racing an OOM event. What I've usually seen for doing this dynamically / to support warnings is a configurable server-side default, a 413 response code for oversized requests, and an explicit size limit as an API request parameter. That lets the frontend prompt the user about the issue, and lets interfaces override the limit if they really want to. But really, any interface that keeps the accidental bad action from causing a crash is usually good enough in my experience. |
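The policy described in the comment above could be sketched as a small helper: a configurable server-side default, an optional per-request override, and a 413 status for anything over the effective limit. All names here are hypothetical, not from the PR.

```python
# Sketch of the size-limit policy discussed above (hypothetical names).
# A server-wide default limit applies unless the request passes its own
# explicit limit; oversized payloads get HTTP 413 (Payload Too Large).

DEFAULT_MAX_ARCHIVE_BYTES = 512 * 1024 * 1024  # configurable server default


def effective_limit(requested_limit=None, server_limit=DEFAULT_MAX_ARCHIVE_BYTES):
    """Return the limit to enforce: a request parameter may override the default."""
    return requested_limit if requested_limit is not None else server_limit


def check_size(size_bytes, requested_limit=None):
    """Return an HTTP status code: 200 if acceptable, 413 if oversized."""
    return 200 if size_bytes <= effective_limit(requested_limit) else 413
```

A frontend receiving a 413 could then prompt the user and retry with an explicit, larger limit if the interface allows it.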
Makes sense to me, as long as this is configurable and not hardcoded on the server side. I will let others comment on it. |
Have you decided something about this PR? |
Thanks for pinging here, @hadim. We discussed this a couple of weeks ago (see the notes here). We are planning to accept this change. Unfortunately, our attention has been directed at getting Jupyter Server into JupyterLab before it's 3.0 API freeze scheduled for next week. Once we get to a good place there, we'll circle back and finish reviewing this PR. Thanks for your patience @hadim! I really appreciate the work you're doing here. |
@hadim, are you available to add some missing docstrings to a few spots in this PR? We are getting close to releasing 1.0 and this would make a great addition. |
Sure. Please comment directly in the code and I will try to address your reviews this week. I will also add docstrings where needed. |
@blink1073 let me know whether this is ok for you. By the way, the tests fail on my machine (Python 3.8, Ubuntu 20.04) while they pass on the CI. Errors: FAILED tests/services/contents/test_api.py::test_delete[-inroot] - tornado.httpclient.HTTPClientError: HTTP 500: Internal Server Error
FAILED tests/services/contents/test_api.py::test_delete[Directory with spaces in-inspace] - tornado.httpclient.HTTPClientError: HTTP 500: Internal Server Error
FAILED tests/services/contents/test_api.py::test_delete[unicod\xe9-innonascii] - tornado.httpclient.HTTPClientError: HTTP 500: Internal Server Error
FAILED tests/services/contents/test_api.py::test_delete[foo-a] - tornado.httpclient.HTTPClientError: HTTP 500: Internal Server Error
FAILED tests/services/contents/test_api.py::test_delete[foo-b] - tornado.httpclient.HTTPClientError: HTTP 500: Internal Server Error
FAILED tests/services/contents/test_api.py::test_delete[foo-name with spaces] - tornado.httpclient.HTTPClientError: HTTP 500: Internal Server Error
FAILED tests/services/contents/test_api.py::test_delete[foo-unicod\xe9] - tornado.httpclient.HTTPClientError: HTTP 500: Internal Server Error
FAILED tests/services/contents/test_api.py::test_delete[foo/bar-baz] - tornado.httpclient.HTTPClientError: HTTP 500: Internal Server Error
FAILED tests/services/contents/test_api.py::test_delete[ordering-A] - tornado.httpclient.HTTPClientError: HTTP 500: Internal Server Error
FAILED tests/services/contents/test_api.py::test_delete[ordering-b] - tornado.httpclient.HTTPClientError: HTTP 500: Internal Server Error
FAILED tests/services/contents/test_api.py::test_delete[ordering-C] - tornado.httpclient.HTTPClientError: HTTP 500: Internal Server Error
FAILED tests/services/contents/test_api.py::test_delete[\xe5 b-\xe7 d] - tornado.httpclient.HTTPClientError: HTTP 500: Internal Server Error
FAILED tests/services/contents/test_api.py::test_delete_dirs - tornado.httpclient.HTTPClientError: HTTP 500: Internal Server Error
FAILED tests/services/contents/test_api.py::test_delete_non_empty_dir - tornado.httpclient.HTTPClientError: HTTP 500: Internal Server Error |
Ah, I see.
Good point, that's because they aren't under the |
Kicking CI to see if the timeout was spurious |
@blink1073 anything else I can do here? |
I think we're good, thanks! I'll bring this up again at the server meeting tomorrow to make sure we want to commit to this API expansion. |
Notes from today's meeting
|
@blink1073 if I set
|
I can make the test pass with |
I am glad y'all are working on this, but I will ask again: is there a reason why the jupyter-archive functionality cannot be an extension? It really feels like no one wants to justify putting it in as part of the core API, as is, other than having a case of the "gimmes". I understand the excitement of a new feature, but can we please recognize how much pain it will be to handle changing these APIs if this ships to everyone from the get-go, versus allowing it to bake as an opt-in extension? There's so much complexity around this that it will become its own can of worms. Here are a few that come to mind:
|
@blink1073 can you clarify the status of this during the next jupyter meeting? |
Hi @hadim, we discussed this in today's meeting. We came to the consensus that we would like to see this continue as a separate server extension, but would eventually like it to become a Jupyter community managed extension. We have other repositories that could become siblings to |
Not sure I agree with all the points raised in this ticket against the integration, but you decided as a community, not me. (At least make it a default in JupyterLab...) |
Two new API endpoints:
- r"/directories/(.*)" : download a directory (after packing it into an archive).
- r"/extract-directories/(.*)" : extract an archive into a directory on the server side.

This implementation is also being discussed at #251
And the original project is located at jupyterlab-contrib/jupyter-archive#2
(I used black to format the files I was working on so some diffs are not related to the PR, sorry for that.)
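The two URL patterns quoted in the PR description are plain Tornado-style regexes with one capture group for the path. A minimal sketch of how they capture the target path (illustrative only, not the PR's actual handler code):

```python
import re

# The two URL patterns from the PR description. In jupyter_server these
# would be registered as Tornado handler routes; here we only show how
# the capture group extracts the directory/archive path.
DIRECTORIES_PATTERN = r"/directories/(.*)"
EXTRACT_PATTERN = r"/extract-directories/(.*)"


def match_directory_path(url):
    """Return the path captured by the download pattern, or None."""
    m = re.fullmatch(DIRECTORIES_PATTERN, url)
    return m.group(1) if m else None


def match_extract_path(url):
    """Return the path captured by the extract pattern, or None."""
    m = re.fullmatch(EXTRACT_PATTERN, url)
    return m.group(1) if m else None
```

In the real server, the captured group would then be resolved through the contents manager so the usual content security checks apply.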