Resolve server absolute path to server or kernel-relative path #1280

krassowski · 2023-05-21T12:02:56Z

Motivation

It would improve productivity if Jupyter (Lab/Notebook) allowed users to click on file paths in tracebacks (and elsewhere) to open the file (jupyterlab/jupyterlab#13277). The logic would be as follows:

1. if the path points to a file within root_dir, the file should be opened on the frontend for editing
2. if the path points to a file beyond root_dir we should either:
   - a) do nothing in security-sensitive setups
   - b) ask the kernel to provide the source of such a file and display it as read-only - this is already implemented in ipykernel using the Debug Adapter Protocol source request (this would be necessary for remote kernels)
   - c) have a custom server extension which would implement the ContentsManager API, exposing specific files outside of root_dir based on a block/allow list (see Additional scope for broader filesystem access below; this would not work for remote kernels)

Problem

It is currently impossible to distinguish between 1 and 2 (whether we are within root_dir or outside of it).

For a server started with root_dir = "~/server_root", we can expect the following traceback from ipykernel:

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[1], line 1
----> 1 from a_file import test

File ~/server_root/a_file.py:1
----> 1 test

NameError: name 'test' is not defined

The problem is that the frontend cannot tell whether ~/server_root/a_file.py is within root_dir or not.

This is the case even if the frontend knows what the root_dir is. For example, if root_dir is /home/my-username/server_root, the frontend does not know what ~ expands to in the kernel space (it may well be /home/another-username/).
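
To illustrate the ambiguity, here is a minimal sketch showing that the same traceback path expands differently depending on the HOME of the process doing the expansion. The two home directories are the ones from the example above; the helper name `expand_as` is hypothetical and only simulates "expanding as another user" by temporarily overriding the HOME environment variable.

```python
import os
import posixpath
from unittest import mock


def expand_as(home: str, path: str) -> str:
    """Expand a leading ``~`` the way a process with HOME=``home`` would."""
    # posixpath.expanduser consults the HOME environment variable first,
    # so overriding it simulates expansion in another user's process.
    with mock.patch.dict(os.environ, {"HOME": home}):
        return posixpath.expanduser(path)


traceback_path = "~/server_root/a_file.py"
server_view = expand_as("/home/my-username", traceback_path)
kernel_view = expand_as("/home/another-username", traceback_path)
```

Only the first expansion falls within a root_dir of /home/my-username/server_root, and the frontend has no way of knowing which of the two the kernel meant.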

"Guessing" by trying both is not an option because we want to avoid false positives (turning file-like strings into links with broken URLs, mostly because almost anything can look like a path) and because guessing that way would have performance implications.

Proposed Solution

Create a new API endpoint which would tell the frontend whether the given file path is within the scope of the server, kernel, or neither. If the file is within the scope of the server, it would return the normalised path relative to root dir.

This could account for kernels which are spawned in a filesystem different from where the root_dir resides - as far as I understand there are no restrictions on kernel location (see snippet below) - a path could be within scope of both kernel and server (when kernel is started within root_dir), only one of them, or neither.

jupyter_server/jupyter_server/services/kernels/kernelmanager.py

Line 195 in 09c15ce

# in the case of documents and kernels not being on the same filesystem,

Examples

For simplicity, let's call the proposed endpoint /api/resolve (although maybe it should be integrated with the existing file ID manager, in which case it could be /api/fileid/resolve). In pseudocode it would be described as:

from typing import Protocol

class PathResolver(Protocol):
    def resolve_path(self, path: str) -> str: ...

class ContentsManager(..., PathResolver): ...
class KernelManager(..., PathResolver): ...

def handle_resolve(self, path: str, kernel_uuid: str):
    # Scopes that may be able to resolve the path: server, kernel, and any extras.
    scopes = [
        self.contents_manager,
        self.multi_kernel_manager.get_kernel(kernel_uuid),
        *self.get_additional_scopes(kernel_uuid)
    ]
    return [
        scope.resolve_path(path)
        for scope in scopes
        if hasattr(scope, 'resolve_path')
    ]
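
The pseudocode above could be made concrete roughly as follows. This is only a sketch under stated assumptions, not a proposed implementation: `DirectoryScope` is a hypothetical stand-in for both the contents manager and the kernel manager, the home directory is passed in explicitly, paths are normalised lexically (symlinks are not resolved), and scopes that do not recognise a path return `None` and are dropped from the response.

```python
import posixpath
from typing import Optional


class DirectoryScope:
    """Hypothetical scope rooted at a single directory (root_dir or a kernel cwd)."""

    def __init__(self, name: str, root: str, home: str):
        self.name = name
        self.home = home  # what "~" means for the process behind this scope
        self.root = self._absolute(root)

    def _absolute(self, path: str) -> str:
        # Expand a leading "~" with this scope's home, then normalise lexically.
        if path == "~" or path.startswith("~/"):
            path = self.home + path[1:]
        return posixpath.normpath(path)

    def resolve_path(self, path: str) -> Optional[dict]:
        target = self._absolute(path)
        inside = target == self.root or target.startswith(self.root.rstrip("/") + "/")
        if not inside:
            return None  # the path is outside of this scope
        return {"scope": self.name, "relative": posixpath.relpath(target, self.root)}


def handle_resolve(path: str, scopes: list) -> list:
    """Collect the resolutions of every scope that recognises the path."""
    results = []
    for scope in scopes:
        if hasattr(scope, "resolve_path"):
            resolved = scope.resolve_path(path)
            if resolved is not None:
                results.append(resolved)
    return results


home = "/home/user"
scopes = [
    DirectoryScope("server", "~/server_root", home),
    DirectoryScope("kernel", "~/server_root/kernel", home),
]
nested = handle_resolve("~/server_root/kernel/test.py", scopes)
outside = handle_resolve("~/", scopes)
```

Here `nested` contains one entry per matching scope and `outside` is empty, since the home directory itself is in neither scope.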

For a server spawned at ~/server_root with a kernel spawned in the same location:

# /api/resolve?path=~/server_root&kernel={uuid}
[{'scope': 'server', 'relative': '.'}, {'scope': 'kernel', 'relative': '.'}]
# /api/resolve?path=~/server_root/test.py&kernel={uuid}
[{'scope': 'server', 'relative': 'test.py'}, {'scope': 'kernel', 'relative': 'test.py'}]
# /api/resolve?path=~/&kernel={uuid}
[]

For a server spawned at ~/server_root with a kernel spawned in ~/server_root/kernel:

# /api/resolve?path=~/server_root&kernel={uuid}
[{'scope': 'server', 'relative': '.'}]
# /api/resolve?path=~/server_root/test.py&kernel={uuid}
[{'scope': 'server', 'relative': 'test.py'}]
# /api/resolve?path=~/server_root/kernel/test.py&kernel={uuid}
[{'scope': 'server', 'relative': 'kernel/test.py'}, {'scope': 'kernel', 'relative': 'test.py'}]
# /api/resolve?path=~/&kernel={uuid}
[]

For a server spawned at ~/server_root with a kernel spawned in /tmp/kernel:

# /api/resolve?path=~/server_root&kernel={uuid}
[{'scope': 'server', 'relative': '.'}]
# /api/resolve?path=~/server_root/test.py&kernel={uuid}
[{'scope': 'server', 'relative': 'test.py'}]
# /api/resolve?path=/tmp/kernel/test.py&kernel={uuid}
[{'scope': 'kernel', 'relative': 'test.py'}]
# /api/resolve?path=~/&kernel={uuid}
[]

I am not opinionated on any particular JSON format, but I think it would be useful to return all matching resolutions and allow the frontend client to decide which one to use.
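
As a sketch of how a client might choose among several resolutions, assuming the list-of-dicts format used in the examples above: the preference order and the function name `pick_resolution` are hypothetical, and a real frontend would be written in TypeScript rather than Python.

```python
from typing import Optional

# Hypothetical preference: an editable server file first, then kernel-relative
# access, then read-only source provided by the kernel.
PREFERENCE = ("server", "kernel", "source")


def pick_resolution(resolutions: list, preference: tuple = PREFERENCE) -> Optional[dict]:
    """Return the resolution with the most preferred scope, or None if none match."""
    by_scope = {entry["scope"]: entry for entry in resolutions}
    for scope in preference:
        if scope in by_scope:
            return by_scope[scope]
    return None


response = [
    {"scope": "kernel", "relative": "test.py"},
    {"scope": "server", "relative": "kernel/test.py"},
]
chosen = pick_resolution(response)
```

An empty response yields `None`, in which case the client would simply not turn the string into a link.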

Additional context

Additional scope for exposing `source` access

As noted in (2b), we could expose the source of files known to the kernel that lie beyond its spawn cwd by reusing the existing DAP source request. The /api/resolve response could advertise that a path is known by the kernel's source handler. Augmenting the first example:

# /api/resolve?path=~/server_root&kernel={uuid}
[{'scope': 'server', 'relative': '.'}, {'scope': 'kernel', 'relative': '.'}, {'scope': 'source', 'relative': '.'}]
# /api/resolve?path=~/server_root/test.py&kernel={uuid}
[{'scope': 'server', 'relative': 'test.py'}, {'scope': 'kernel', 'relative': 'test.py'}, {'scope': 'source', 'relative': 'test.py'}]
# /api/resolve?path=~/test.py&kernel={uuid}
[{'scope': 'source', 'relative': '/home/user/test.py'}]
# /api/resolve?path=/lib/python/library/test.py&kernel={uuid}
[{'scope': 'source', 'relative': '/lib/python/library/test.py'}]
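
For reference, a minimal sketch of what a DAP `source` request body for such a path could look like. The field names follow the Debug Adapter Protocol; the helper name `build_source_request` is hypothetical, and how the message is wrapped and delivered to the kernel is not shown here.

```python
def build_source_request(path: str, seq: int = 1) -> dict:
    """Build a Debug Adapter Protocol ``source`` request for an absolute path."""
    return {
        "seq": seq,  # sequence number of the message
        "type": "request",
        "command": "source",
        "arguments": {
            # The file whose content is requested, identified by its path.
            "source": {"path": path},
            # 0 is used here on the assumption that the source is identified
            # by path rather than by a numeric reference.
            "sourceReference": 0,
        },
    }


request = build_source_request("/lib/python/library/test.py")
```

A response to such a request would carry the file content, which the frontend could display as read-only.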

Additional scope for broader `filesystem` access

Per (2c), it would be desirable to enable the implementation of a custom scope provider that would allow tightly controlled access to the filesystem beyond root_dir. This would benefit other use cases where access to files on the filesystem is desirable (jupyter-lsp/jupyterlab-lsp#850).
A scope provider configured to expose files under ~/shared, with the server (as in the first example) spawned at ~/server_root and the kernel spawned in the same location, would resolve the following:

# /api/resolve?path=~/server_root&kernel={uuid}
[{'scope': 'server', 'relative': '.'}, {'scope': 'kernel', 'relative': '.'}]
# /api/resolve?path=~/server_root/test.py&kernel={uuid}
[{'scope': 'server', 'relative': 'test.py'}, {'scope': 'kernel', 'relative': 'test.py'}]
# /api/resolve?path=~/shared/test.py&kernel={uuid}
[{'scope': 'filesystem', 'relative': '~/shared/test.py'}]  # filesystem is relative to filesystem root (de facto absolute)
# /api/resolve?path=~/not-allowed/test.py&kernel={uuid}
[]
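
A block/allow-list scope provider of this kind could be sketched as below. Everything here is an assumption made for illustration: the class name `AllowListScope`, the explicit `home` argument, the purely lexical normalisation (symlinks are not resolved, which a real implementation would need to handle), and the choice to return the expanded absolute path rather than the `~`-prefixed form shown in the example above.

```python
import posixpath
from typing import Optional


class AllowListScope:
    """Hypothetical 'filesystem' scope exposing only allow-listed directories."""

    def __init__(self, home: str, allow: list, block: tuple = ()):
        self.home = home
        self.allow = [self._absolute(p) for p in allow]
        self.block = [self._absolute(p) for p in block]

    def _absolute(self, path: str) -> str:
        # Expand a leading "~" and collapse "." and ".." segments lexically.
        if path == "~" or path.startswith("~/"):
            path = self.home + path[1:]
        return posixpath.normpath(path)

    @staticmethod
    def _contains(directory: str, path: str) -> bool:
        # Compare whole path segments so "~/shared-other" is not under "~/shared".
        return path == directory or path.startswith(directory.rstrip("/") + "/")

    def resolve_path(self, path: str) -> Optional[dict]:
        target = self._absolute(path)
        if any(self._contains(d, target) for d in self.block):
            return None  # the block list takes precedence over the allow list
        if any(self._contains(d, target) for d in self.allow):
            return {"scope": "filesystem", "relative": target}
        return None


scope = AllowListScope("/home/user", allow=["~/shared"], block=["~/shared/private"])
allowed = scope.resolve_path("~/shared/test.py")
denied = scope.resolve_path("~/not-allowed/test.py")
```

Normalising before the check matters: a path such as ~/shared/../not-allowed/test.py must be rejected even though it starts with an allowed prefix.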

The difference between the filesystem and source scopes is subtle but noticeable when:

- the kernel is running on a different filesystem than the server
- there are multiple contents managers

Impact on multiplexed content managers

A number of ways to provide multiple contents managers have been proposed over the years:

- Jupyter(Lab) IDrive frontend API which may be connected to an alternative /api/contents endpoint
- jpmorganchase/jupyter-fs using MetaManager, where drives are managed on the server side rather than the frontend
- viaduct-ai/hybridcontents - status not clear
- jupyter/jupyter-drive (MixedContentsManager) - deprecated

With the proposed solution:

- the IDrive would need to be amended to allow providing a URL for an alternative /api/resolve.
- the drive-aware meta-managers like jupyter-fs should be able to handle /api/resolve by overriding the implementation of ContentsManager.resolve_path to account for drive prefixes.
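
As a rough illustration of what "accounting for drive prefixes" could mean, here is a sketch of a resolver that maps absolute paths onto `drive:relative/path` strings. The class name `DriveAwareResolver`, the drive names, the `drive:` prefix format, and the mapping of drives to local directories are all assumptions made for the example and are not taken from jupyter-fs.

```python
import posixpath
from typing import Optional


class DriveAwareResolver:
    """Hypothetical resolver that prefixes relative paths with a drive name."""

    def __init__(self, drives: dict):
        # Maps a drive name to the absolute root directory of that drive.
        self.drives = {name: posixpath.normpath(root) for name, root in drives.items()}

    def resolve_path(self, path: str) -> Optional[dict]:
        target = posixpath.normpath(path)
        # Try the longest roots first so that nested drives win over their parents.
        by_specificity = sorted(self.drives.items(), key=lambda kv: len(kv[1]), reverse=True)
        for name, root in by_specificity:
            if target == root or target.startswith(root.rstrip("/") + "/"):
                relative = posixpath.relpath(target, root)
                return {"scope": "server", "relative": f"{name}:{relative}"}
        return None


resolver = DriveAwareResolver({"local": "/home/user/server_root", "data": "/mnt/data"})
on_drive = resolver.resolve_path("/mnt/data/raw/table.csv")
off_drive = resolver.resolve_path("/etc/hosts")
```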

Cf. jupyter/notebook#3233

Impact on security by obscurity

The proposed solution would make it easier to find out root_dir from the frontend because a user could check numerous paths and deduce the root_dir path from the server responses by brute force. This is not a concern for the majority of administrators, as kernels are typically run locally and hence not only know but also have access to the full runtime path.

Related discussions

krassowski added the enhancement label May 21, 2023

krassowski mentioned this issue May 21, 2023

Open files from errors jupyterlab/jupyterlab#13390

Merged

Zsailer mentioned this issue Jul 7, 2023

Meeting Notes 2023 jupyter-server/team-compass#45

Closed

fcollonval mentioned this issue Jul 13, 2023

feat(filebrowser): copy absolute path instead of relative path jupyterlab/jupyterlab#14539

Closed

krassowski mentioned this issue Sep 28, 2023

Path resolver API #1331

Open

krassowski mentioned this issue Oct 23, 2023

Accessing files of a custom drive from a kernel jupyterlab/jupyterlab#15289

Open

This was referenced Jan 31, 2024

Path resolution by kernel manager and providers jupyter/jupyter_client#1005

Open

Exposing kernel contents (file system) via comms jupyter/jupyter_client#1006

Open
