Motivation
It would improve productivity if Jupyter (Lab/Notebook) allowed clicking on a file path in tracebacks (and elsewhere) to open the file (jupyterlab/jupyterlab#13277). The logic would be as follows:
1. If the path points to a file within root_dir, the file should be opened on the frontend for editing.
2. If the path points to a file beyond root_dir, we should either:
   a) do nothing in security-sensitive setups;
   b) ask the kernel to provide the source of the file and display it as read-only; this is already implemented in ipykernel using the Debug Adapter Protocol source request (this would be necessary for remote kernels);
   c) have a custom server extension implementing the ContentsManager API that exposes specific files outside of root_dir based on a block/allow list (see Additional scope for broader filesystem access below; this would not work for remote kernels).
Problem
It is currently impossible to distinguish between 1 and 2 (whether we are within root_dir or outside of it).
For a server started with root_dir = "~/server_root", we can expect an ipykernel traceback that refers to ~/server_root/a_file.py.
The problem is that the frontend cannot tell whether ~/server_root/a_file.py is within root_dir or not.
This is the case even if the frontend knows what root_dir is. For example, if root_dir is /home/my-username/server_root, the frontend does not know what ~ expands to in the kernel's environment (it may well be /home/another-username/).
"Guessing" by trying both is not an option: we want to avoid false positives (turning file-like strings into links with broken URLs, since almost anything can look like a path), and guessing that way would have performance implications.
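To make the ambiguity concrete, here is a minimal Python sketch. The helper names expand_home and is_within are hypothetical (they are not part of Jupyter), and the two home directories are the ones from the example above. It only illustrates that the same ~-prefixed string lands inside or outside root_dir depending on whose home directory is used for the expansion.

```python
from pathlib import PurePosixPath


def expand_home(path: str, home: str) -> PurePosixPath:
    """Expand a leading '~' using an explicitly supplied home directory."""
    if path == "~" or path.startswith("~/"):
        return PurePosixPath(home) / path[2:]
    return PurePosixPath(path)


def is_within(path: PurePosixPath, root: PurePosixPath) -> bool:
    """Return True if `path` is `root` itself or located below it."""
    try:
        path.relative_to(root)
        return True
    except ValueError:
        return False


root_dir = PurePosixPath("/home/my-username/server_root")
reported = "~/server_root/a_file.py"  # path as printed in the traceback

# Kernel running as the same user as the server.
same_user = expand_home(reported, "/home/my-username")
# Kernel running as a different user (assumed for illustration).
other_user = expand_home(reported, "/home/another-username")

print(is_within(same_user, root_dir))
print(is_within(other_user, root_dir))
```

The frontend sees only the string in `reported`, so it has no way to pick between the two outcomes without help from the server or the kernel.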
Proposed Solution
Create a new API endpoint which would tell the frontend whether the given file path is within the scope of the server, kernel, or neither. If the file is within the scope of the server, it would return the normalised path relative to root dir.
This could account for kernels which are spawned in a filesystem different from the one where root_dir resides. As far as I understand, there are no restrictions on kernel location (see the snippet below), so a path could be within the scope of both the kernel and the server (when the kernel is started within root_dir), only one of them, or neither.
jupyter_server/jupyter_server/services/kernels/kernelmanager.py, line 195 in 09c15ce:
# in the case of documents and kernels not being on the same filesystem,
Examples
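For simplicity, let's call the proposed endpoint /api/resolve (although maybe it should be integrated with the existing file ID manager, in which case it could be /api/fileid/resolve). The response would depend on where the server and the kernel were spawned; the cases to cover are:
- a server spawned at ~/server_root with a kernel spawned in the same location;
- a server spawned at ~/server_root with a kernel spawned in ~/server_root/kernel;
- a server spawned at ~/server_root with a kernel spawned in /tmp/kernel.
I am not opinionated on any particular JSON format, but I think it would be useful to return all matching resolutions and allow the frontend client to decide which one to use.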
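As a rough illustration of "return all matching resolutions", the sketch below resolves an absolute path against a server root and a kernel working directory. The function name resolve, its signature, and the /home/my-username prefix are assumptions made for this example; it also assumes that ~ has already been expanded on the server side, which the real endpoint would have to take care of.

```python
import posixpath
from pathlib import PurePosixPath


def resolve(path: str, root_dir: str, kernel_cwd: str) -> list:
    """Return every scope containing `path`, with the path relative to that scope."""
    # Normalise first so that '..' segments cannot escape a scope.
    target = PurePosixPath(posixpath.normpath(path))
    matches = []
    for scope, base in (("server", root_dir), ("kernel", kernel_cwd)):
        try:
            relative = target.relative_to(PurePosixPath(base))
        except ValueError:
            continue  # path is outside of this scope
        matches.append({"scope": scope, "relative": str(relative)})
    return matches


ROOT = "/home/my-username/server_root"  # hypothetical expansion of ~/server_root

# Kernel spawned in the same location as the server.
same_location = resolve(f"{ROOT}/test.py", ROOT, ROOT)
# Kernel spawned in a subdirectory of the server root.
nested_kernel = resolve(f"{ROOT}/kernel/test.py", ROOT, f"{ROOT}/kernel")
# Kernel spawned outside of the server root.
outside_kernel = resolve("/tmp/kernel/test.py", ROOT, "/tmp/kernel")
```

Returning a list rather than a single match leaves the choice to the frontend, for example preferring the server scope (editable) over the kernel scope when both are present.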
Additional context
Additional scope for exposing source access
As noted in (2b), we could expose the source of files known by the kernel (including files beyond its spawn cwd) by reusing the existing DAP source request. The /api/resolve response could advertise that a path is known by the kernel's source handler, for instance as an extra entry alongside the server and kernel resolutions from the first example.
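One possible shape for such an advertisement is sketched below. Everything here is an assumption for illustration: the function advertise_source, the "path" key of the extra entry, and the plain set standing in for whatever the kernel reports as known sources (how that set would be obtained through the DAP is out of scope for this sketch).

```python
def advertise_source(resolutions: list, path: str, kernel_sources: set) -> list:
    """Append a 'source' entry when the kernel reports `path` among its known sources."""
    result = list(resolutions)  # do not mutate the caller's list
    if path in kernel_sources:
        # Hypothetical entry format; the proposal does not fix the JSON shape.
        result.append({"scope": "source", "path": path})
    return result


# Resolutions for a file inside root_dir, as in the first example.
base = [
    {"scope": "server", "relative": "test.py"},
    {"scope": "kernel", "relative": "test.py"},
]
# Stand-in for the set of files the kernel can serve the source of.
known = {"~/server_root/test.py", "/usr/lib/python3/site-packages/pkg/mod.py"}

augmented = advertise_source(base, "~/server_root/test.py", known)
library_only = advertise_source([], "/usr/lib/python3/site-packages/pkg/mod.py", known)
```

A file outside both root_dir and the kernel cwd, such as an installed library module, would then still be displayable read-only through the source entry alone.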
Additional scope for broader filesystem access
Per (2c), it would be desirable to enable the implementation of a custom scope provider that would allow tightly controlled access to the filesystem beyond root_dir. This would benefit other use cases where access to files on the filesystem is desirable (jupyter-lsp/jupyterlab-lsp#850).
A scope provider configured to expose files under ~/shared, with the server (as in the first example) spawned at ~/server_root and the kernel spawned in the same location, would resolve the following:
# /api/resolve?path=~/server_root&kernel={uuid}
[{'scope': 'server', 'relative': '.'}, {'scope': 'kernel', 'relative': '.'}]
# /api/resolve?path=~/server_root/test.py&kernel={uuid}
[{'scope': 'server', 'relative': 'test.py'}, {'scope': 'kernel', 'relative': 'test.py'}]
# /api/resolve?path=~/shared/test.py&kernel={uuid}
[{'scope': 'filesystem', 'relative': '~/shared/test.py'}]  # filesystem is relative to filesystem root (de facto absolute)
# /api/resolve?path=~/not-allowed/test.py&kernel={uuid}
[]
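A block/allow-list scope provider of this kind could be sketched as follows. The class name AllowListScopeProvider and its interface are hypothetical and do not correspond to an existing Jupyter Server API; paths are assumed to be absolute with ~ already expanded (here to /home/my-username).

```python
import posixpath
from pathlib import PurePosixPath


def _within(path: PurePosixPath, root: PurePosixPath) -> bool:
    """Return True if `path` is `root` itself or located below it."""
    try:
        path.relative_to(root)
        return True
    except ValueError:
        return False


class AllowListScopeProvider:
    """Expose only paths under allowed roots; blocked roots take precedence."""

    def __init__(self, allowed, blocked=()):
        self.allowed = [PurePosixPath(p) for p in allowed]
        self.blocked = [PurePosixPath(p) for p in blocked]

    def resolve(self, path: str) -> list:
        # Normalise so that '..' cannot be used to leave an allowed root.
        target = PurePosixPath(posixpath.normpath(path))
        if any(_within(target, root) for root in self.blocked):
            return []
        if any(_within(target, root) for root in self.allowed):
            return [{"scope": "filesystem", "relative": str(target)}]
        return []


provider = AllowListScopeProvider(
    allowed=["/home/my-username/shared"],
    blocked=["/home/my-username/shared/private"],
)
shared = provider.resolve("/home/my-username/shared/test.py")
denied = provider.resolve("/home/my-username/not-allowed/test.py")
```

Checking the block list before the allow list keeps a blocked subdirectory hidden even when its parent is exposed.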
The difference between filesystem and source scope is subtle but noticeable when:
- the kernel is running in a different filesystem than the server
- there are multiple contents managers
Impact on multiplexed content managers
A number of ways to provide multiple content managers have been proposed over the years:
- the Jupyter(Lab) IDrive frontend API, which may be connected to an alternative /api/contents endpoint
- jpmorganchase/jupyter-fs using MetaManager, where drives are managed on the server side rather than on the frontend
- MixedContentsManager (deprecated)
With the proposed solution:
- IDrive would need to be amended to allow providing a URL for an alternative /api/resolve;
- drive-aware meta-managers like jupyter-fs should be able to handle /api/resolve by overriding the implementation of ContentsManager.resolve_path to account for drive prefixes.
Cf. jupyter/notebook#3233
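What "accounting for drive prefixes" might look like is sketched below as a standalone function, not as an actual ContentsManager override (resolve_path is itself only proposed here). The drive:path notation for the returned relative path and the mapping of drive names to root directories are assumptions for this example.

```python
import posixpath
from pathlib import PurePosixPath


def resolve_with_drives(path: str, drives: dict) -> list:
    """Resolve `path` against several drives, prefixing matches with the drive name."""
    target = PurePosixPath(posixpath.normpath(path))
    matches = []
    for drive, root in drives.items():
        try:
            relative = target.relative_to(PurePosixPath(root))
        except ValueError:
            continue  # not managed by this drive
        # Assumed convention: '<drive>:<path relative to the drive root>'.
        matches.append({"scope": "server", "relative": f"{drive}:{relative}"})
    return matches


# Hypothetical drive layout managed on the server side.
drives = {"work": "/srv/work", "data": "/srv/data"}
on_data_drive = resolve_with_drives("/srv/data/tables/a.csv", drives)
unmanaged = resolve_with_drives("/srv/other/a.csv", drives)
```

With overlapping drive roots a path could match more than one drive, which is another reason to return a list of resolutions.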
Impact on security by obscurity
The proposed solution would make it easier to find out root_dir from the frontend, because a user could check numerous paths and deduce the root_dir path from the server responses by brute force. This is not a concern for the majority of administrators, as kernels are typically run locally and hence not only know but also have access to the full runtime path.
Related discussions
- ContentsManager in jupyter-lsp server?