identity API #638
Comments
Also pointing to the auth plugin in Jupyverse, where some work was started on this. We use FastAPI-Users which has |
Talking about users (with a That PR should be renamed "Multiuser support for jupyter server". In an RTC context with multiple users on a single jupyter-server, asking for "me" requires that PR (or similar) |
It also has the nice side effect that a user's permissions may evolve during a session. For example, user A may share their server with user B, granting read-only access. Then, during the collaborative session, user A decides to grant execute access to B. This should work transparently for user B.
This is also consistent with the classical security pattern that authorization and authentication rights should not be stored permanently but refreshed periodically. So the authorization and authentication cache will be handled by the server, and the client should not persist the permissions. |
Yes, and as much as possible, the frontend shouldn't need to ask what permissions it has ahead of attempting to take actions that may fail. But I suspect there will be a few cases, at least, where it will want to check to disable certain UI/behaviors. @echarles I'm not quite sure what the sessions would be required for here. In JupyterHub, and custom endpoints in general, defining |
Let's say you have But maybe I am missing something: either it is not necessary to know who is making the request, or jupyter-server/tornado can already tell that today, or in your mental model JupyterHub is responsible for it? I have renamed #391 to "Multi user server with session management" |
I think it does: this is the authentication system that sets a cookie in the user's browser after they log in. |
Yes, it's absolutely JupyterHub's responsibility (or whatever authentication plugin you use that overrides LoginHandler.get_user). When you run under Jupyterhub, this already works. The default LoginHandler implementation sets the same "anonymous" username for all authenticated requests, though, because it hasn't yet had a reason to distinguish between connections. We would also need to change our token authentication to generate different tokens for different users to distinguish between users by default. The system as a whole doesn't discriminate between browser sessions, though, so if you need to distinguish between 'me' in Firefox and 'me' in Safari, that's certainly something Sessions would be needed for, e.g. multiple RTC connections that may have equivalent credentials. Multiple sessions and multiple users are definitely related, but not quite the same thing. |
Maybe the following case will help the discussion. As it is today (without going into the session, cookie... technical stuff), I'd like to make sure the user who queries the single
|
For my understanding of RTC requirements, I definitely think your session-tracking feature makes sense, independent of whether Users are distinguishable. I take back what I said about user_id always being "anonymous", though. That's only true when auth is disabled. It is set to a random uuid for each separate login cookie that's set. So it would already be the case that every browser has a different UUID for a username with the default implementation. We could certainly switch this to be a more realistically populated random User dict, if that would be useful. However, in JupyterHub, or any other LoginHandler that implements actual authentication, multiple browsers logged in as the same user would be equivalent, and you would absolutely need a separate 'session' concept to distinguish them. Still, I think it's important to make clear that two sessions may have the same user, and not conflate the two. We can certainly decide that the |
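To make the user/session distinction concrete, here is a minimal sketch. It is not Jupyter Server code: `User`, `Session`, and `open_session` are hypothetical names, and a random UUID stands in for whatever session identifier a real implementation would pick. It shows two browser sessions that share one authenticated user while staying distinguishable by session id.

```python
import uuid
from dataclasses import dataclass, field


@dataclass(frozen=True)
class User:
    """Identity as reported by the auth provider (hypothetical model)."""
    username: str


@dataclass(frozen=True)
class Session:
    """One browser connection; many sessions may point at the same user."""
    user: User
    # Assumption: a random UUID is good enough as a session identifier here.
    session_id: str = field(default_factory=lambda: uuid.uuid4().hex)


def open_session(user: User) -> Session:
    # Each call models a separate login cookie / browser.
    return Session(user=user)


me = User("alice")
firefox = open_session(me)
safari = open_session(me)

print(firefox.user == safari.user)              # same user
print(firefox.session_id != safari.session_id)  # different sessions
```

Keeping the session id outside the user model is one way to avoid conflating the two concepts: anything per-browser hangs off `Session`, anything per-person hangs off `User`.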
Thx. I was discussing this
I am following the great work the JupyterHub team is doing, especially the latest features around authorization, but for now I am trying to discuss this without a hard requirement on JupyterHub. Jupyter Server Identity should work nicely with JupyterHub, RTC, and the rest of the world.
Web sessions are tricky. They are used to distinguish and store server-side information for User1 (running on laptop1 with browser1) vs User2 (running on laptop2 with browser2). My experience with other frameworks/languages is that User1 (running on laptop1 with browser1) and User1 (running on laptop1 with browser2), think of connecting with the same credentials to the same server on Chrome and on Firefox, will be assigned 2 different server sessions. This is not much different from what I see when I connect (even with 2 different tabs in Chrome) to the same Google Meet session: you will be seen as 2 different users with the same avatar. If only this duplication/multiplication could be real... :) But I would consider this last point (same user with different sessions) an edge case that should be handled for now. |
Yes, I don't want to assume JupyterHub for any of this, either. From Jupyter Server's perspective, all JupyterHub does is implement Jupyter Server's declared extension API of I just want it to be clear that session management is a level above users, and some things are associated with the session, while others are associated with the user. It seems important to not conflate the two. I'm AOK with deciding that |
I am OK with the above. I just want to emphasize that in RTC work, to deliver
Agreed. The complete set of levels looks more like this (RTC needs multiuser, which needs sessions that deliver user info for each user)
|
I'm not up to speed on the session storage requirements for RTC. What would get stored in the session-store that is added in #391? #391 seems to add a lot of per-session logic (arbitrary per-session key-value store). Do you foresee that as a substantial need for RTC, or is the need mostly the presence of unique session ids (since user ids will not be unique)? Storing the user info in the session store doesn't make a lot of sense to me, since that should come from the auth provider. If it's only the unique session id, then it seems like it could be accomplished with a substantially smaller change:
|
There may be smaller changes like you explained, but I am looking at (maybe I am the only one here...) a stronger solution where any server extension would benefit from a read/write KV session storage to put the user information it needs. We are not building an e-commerce website here, but the typical example is an extension maintaining a shopping basket. For Jupyter, this could be the list of opened notebooks with, for each notebook, the connected users and their permissions. Just put the session infrastructure in place (which BTW can still be qualified as a small change) and use cases will come. |
I think we might be getting a little off track here, but I don't agree that we should build significant new features without specific, concrete uses in mind. I'm not saying those don't exist, I just don't know what they are. I'm not sure how relevant this discussion is to the identity api, other than the fact that it will make sense to include a session id if/when one exists. |
I didn't click through to #122 which has the more detailed discussion, sorry! I see the discussion in more detail, there. In any case, I'll leave the session management discussion to those already participating there. |
Sorry, I have given the impression that I wanted a feature first and that use cases would come later. My thought is that the primary use case for the Identity API is RTC and that RTC needs sessions. But it looks like you have a clear view on how to implement that Identity API, so that's fine. Sorry for the distraction. |
Sorry, my fault for not catching up with the relevant discussions! I think the main thing to establish here is if this endpoint is somehow redundant or conflicting with your session storage plan. As long as this endpoint still makes sense, and the main interaction is what exactly goes in the model, then I think everything's alright. |
Thank you @minrk, for opening up the conversation.
Yes. Now that we introduced RTC, we have multiple users accessing the same server. This raises the necessity of identity. We need to show every user the identity of everyone with access to the server. In addition, not every user must have the same level of permission (some of them will be able only to read documents, while others can also write, but only one or a small group of them will have access to settings, terminals, and kernels). Each user will have a slightly different UI depending on the permission level. For this purpose, we need to know in advance the scopes of the current user, and it is not necessary to see the permission level of the user when launching the server (
Even though this approach would be enough for JupyterLab to check the user's scopes, in my opinion, the identity endpoint should return the identity of the user along with their permissions. From my understanding, what #165 proposes is a hook We do not need to declare the list of scopes in advance. We can default to the most restrictive permission if JupyterLab expects a specific scope that is not on this list. |
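The "default to the most restrictive permission" idea can be sketched in a few lines. This is only an illustration under assumptions: the shape of `permissions` (resource name mapped to a list of granted actions) and the helper `has_scope` are hypothetical, not something #165 or JupyterLab defines.

```python
from typing import Dict, List

# Assumed shape: resource name -> list of granted actions.
Permissions = Dict[str, List[str]]


def has_scope(permissions: Permissions, resource: str, action: str) -> bool:
    """Return True only when the scope is explicitly granted.

    Anything missing from the model is treated as denied, which is the
    most restrictive default.
    """
    return action in permissions.get(resource, [])


# Hypothetical identity model for a read-only collaborator.
identity = {"username": "user-b", "permissions": {"contents": ["read"]}}

can_read = has_scope(identity["permissions"], "contents", "read")
can_write = has_scope(identity["permissions"], "contents", "write")
# "terminals" is not listed at all, so the lookup falls back to "denied".
can_use_terminals = has_scope(identity["permissions"], "terminals", "execute")
print(can_read, can_write, can_use_terminals)
```

With this fallback a frontend could disable UI for scopes the server never mentioned, without the server having to enumerate every scope up front.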
Another topic I wanted to discuss (let me know if this should be in another issue) is the possibility of having an endpoint in Jupyter-Server to request who has access to the server and with which permissions. During the RTC meeting, I asked about the possibility of having a connected users endpoint in Jupyter-Server. I was wrong. What we need is an endpoint to know which users have permission to access the server, and also an endpoint to allow users to grant permissions to other users. I believe this is outside the scope of Jupyter-Server and is rather a question the authorization provider should answer. Is it possible to create another hook, like |
Why do we need that? I don't quite follow the need to see information about everyone who might have access.
Requiring that the user be able to return complete permissions is a substantial change in specification (that doesn't mean it's wrong), because it means all possible scopes for all possible extensions must be knowable, and complete permissions for a given user must be available in a static form. The implementation as it is now is very simple because For instance, the default "AlwaysAllowAuthorizer" that preserves the current behavior of the server looks like:

    def is_authorized(action, resource):
        # Allow every action on every resource.
        return True

whereas to return the corresponding list of permissions as part of the user model would require:
It does not need to be in advance, but it does need to be available to compute the list of permissions when the user model is requested. That effectively makes it a required part of the Jupyter Server Extension API to declare any and all permissions it will use. The explicit "check permissions" endpoint, on the other hand, is vastly simpler because it matches calls to |
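The asymmetry described in this comment can be illustrated with a small sketch. It is not the snippet from the original comment and not the Jupyter Server API: the class below reuses the simplified two-argument `is_authorized(action, resource)` signature from the thread, while `list_permissions` and `DECLARED` are hypothetical names introduced here. The point is that answering a single question needs no prior knowledge, whereas building a permissions list needs a catalogue of every (action, resource) pair.

```python
from typing import Dict, Iterable, List, Tuple


class AlwaysAllowAuthorizer:
    """Simplified stand-in using the two-argument signature from the thread."""

    def is_authorized(self, action: str, resource: str) -> bool:
        # No catalogue of scopes is needed to answer a single question.
        return True


def list_permissions(
    authorizer: AlwaysAllowAuthorizer,
    declared: Iterable[Tuple[str, str]],
) -> Dict[str, List[str]]:
    """Enumerate granted actions per resource.

    This only works because `declared` lists every (action, resource)
    pair up front; a pair that is not declared can never show up.
    """
    granted: Dict[str, List[str]] = {}
    for action, resource in declared:
        if authorizer.is_authorized(action, resource):
            granted.setdefault(resource, []).append(action)
    return granted


# Hypothetical catalogue; real names would come from the server and extensions.
DECLARED = [("read", "contents"), ("write", "contents"), ("execute", "kernels")]

authorizer = AlwaysAllowAuthorizer()
print(authorizer.is_authorized("anything", "any-resource"))
print(list_permissions(authorizer, DECLARED))
```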
Problem
As we proceed with authorization (#165, jupyterlab/jupyterlab#11434, jupyterlab/jupyterlab#11355), it's becoming apparent that frontends like JupyterLab are going to want to know, at least to some degree, what permissions they have. They also want things like the name, etc. for populating the identity widget (see jupyterlab/jupyterlab#11443).
Proposed Solution
GET /api/identity endpoint to return the current user identity, and define a minimum model (additional keys should be allowed)
get_current_user and Authorizer extension points.
Example:
User model (coordinate fields with jupyterlab/jupyterlab#11443) should have at least a username string and permissions dict, representing state of #165.
The permissions part is tricky, because this assumes clear, complete declarative permissions.
However, none of the proposed examples in #165 actually work that way.
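As a rough illustration of the minimum model described above, the sketch below builds a response body with a `username` string and a `permissions` dict while letting extra keys through. The helper `identity_model`, the extra `display_name` key, and the inner shape of `permissions` are assumptions made for this example, not part of the proposal.

```python
import json
from typing import Any, Dict, List


def identity_model(
    username: str,
    permissions: Dict[str, List[str]],
    **extra: Any,
) -> Dict[str, Any]:
    """Build a hypothetical body for GET /api/identity.

    Extra keys are allowed, but the two required keys always win if a
    caller tries to pass them again through `extra`.
    """
    return {**extra, "username": username, "permissions": permissions}


model = identity_model(
    "alice",
    {"contents": ["read", "write"]},  # assumed shape: resource -> actions
    display_name="Alice",             # example of an additional key
)
print(json.dumps(model, indent=2, sort_keys=True))
```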
An alternative for the permissions part would be to have an explicit check permissions endpoint that takes a list of permissions to check, and returns which are permitted. This avoids the need to always return all possible permissions, and only returns answers to the questions the client needs to know, and would remove the need for permissions to be part of the identity model.
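A minimal sketch of such a check endpoint, reduced to a plain function so it stays framework-independent. Everything here is an assumption for illustration: the request and response JSON shapes, the name `check_permissions`, and the `read_only` policy. The callable passed in uses the simplified `(action, resource)` signature discussed in the thread.

```python
import json
from typing import Callable


def check_permissions(
    is_authorized: Callable[[str, str], bool],
    request_body: str,
) -> str:
    """Answer only the questions the client asked.

    Assumed request:  {"permissions": [[action, resource], ...]}
    Assumed response: {"permissions": [{"action", "resource", "permitted"}, ...]}
    """
    requested = json.loads(request_body)["permissions"]
    answers = [
        {
            "action": action,
            "resource": resource,
            "permitted": bool(is_authorized(action, resource)),
        }
        for action, resource in requested
    ]
    return json.dumps({"permissions": answers})


def read_only(action: str, resource: str) -> bool:
    # Hypothetical policy: only "read" is allowed, on any resource.
    return action == "read"


body = json.dumps({"permissions": [["read", "contents"], ["execute", "terminals"]]})
response = json.loads(check_permissions(read_only, body))
print(response)
```

Because each answer is just one authorization call, no catalogue of scopes has to exist on the server side; the client names the scopes it cares about.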
In JupyterHub, at least, permissions are a declarative property of the identity model, so we could shift the implementation of the Authorizer in that direction, too.
Additional context