Resolve server absolute path to server or kernel-relative path #1280

Open
krassowski opened this issue May 21, 2023 · 0 comments

Motivation

It would improve productivity if Jupyter (Lab/Notebook) allowed clicking on file paths in tracebacks (and elsewhere) to open the file (jupyterlab/jupyterlab#13277). The logic would be as follows:

  1. if the path points to a file within root_dir, the file should be opened on the frontend for editing
  2. if the path points to a file outside root_dir, we should either:
    • a) do nothing in security-sensitive setups
    • b) ask the kernel to provide the source of the file and display it as read-only; this is already implemented in ipykernel using the Debug Adapter Protocol source request (this would be necessary for remote kernels)
    • c) have a custom server extension implementing the ContentsManager API that exposes specific files outside of root_dir based on a block/allow list (see Additional scope for broader filesystem access below; this would not work for remote kernels)

Problem

It is currently impossible to distinguish between 1 and 2 (whether we are within root_dir or outside of it).

For a server started with root_dir = "~/server_root", we can expect the following traceback from ipykernel:

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[1], line 1
----> 1 from a_file import test

File ~/server_root/a_file.py:1
----> 1 test

NameError: name 'test' is not defined

The problem is that the frontend cannot tell whether ~/server_root/a_file.py is within root_dir or not.

This is the case even if the frontend knows what root_dir is. For example, if root_dir is /home/my-username/server_root, the frontend does not know how ~ expands in the kernel's environment (it may well be /home/another-username/).

"Guessing" by trying both is not an option: we want to avoid false positives (turning file-like strings into links with broken URLs, since almost anything can look like a path), and guessing this way would have performance implications.
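To make the ~ ambiguity concrete, here is a minimal sketch that expands the same traceback path as two different processes would. The helper name expand_as is made up for this illustration, and the two home directories are the ones from the example above; it relies only on the standard-library behaviour of posixpath.expanduser, which reads HOME when it is set.

```python
import os
import posixpath
from unittest import mock

TRACEBACK_PATH = "~/server_root/a_file.py"


def expand_as(home: str, path: str) -> str:
    """Expand a leading ~ the way a process with HOME=home would (hypothetical helper)."""
    with mock.patch.dict(os.environ, {"HOME": home}):
        return posixpath.expanduser(path)


# The server and the kernel may run as different users.
server_view = expand_as("/home/my-username", TRACEBACK_PATH)
kernel_view = expand_as("/home/another-username", TRACEBACK_PATH)

print(server_view)
print(kernel_view)
print("same file?", server_view == kernel_view)
```

The same string from the traceback maps to two different absolute paths, so only the side that produced the string (or an endpoint that knows about both sides) can expand it reliably.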

Proposed Solution

Create a new API endpoint which would tell the frontend whether the given file path is within the scope of the server, kernel, or neither. If the file is within the scope of the server, it would return the normalised path relative to root dir.

This could account for kernels spawned in a filesystem different from the one where root_dir resides. As far as I understand, there are no restrictions on kernel location (see snippet below), so a path could be within the scope of both the kernel and the server (when the kernel is started within root_dir), only one of them, or neither.

# in the case of documents and kernels not being on the same filesystem,

Examples

For simplicity, let's call the proposed endpoint /api/resolve (although maybe it should be integrated with the existing file ID manager, in which case it could be /api/fileid/resolve). In pseudocode it would be described as:

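from typing import Protocol

class PathResolver(Protocol):
    # Returns the resolution of `path` within this scope
    def resolve_path(self, path: str) -> str: ...

# Both managers would implement the resolver protocol
class ContentsManager(..., PathResolver): ...
class KernelManager(..., PathResolver): ...

def handle_resolve(self, path: str, kernel_uuid: str):
    # Candidate scopes: the server, the requested kernel, and any extras
    scopes = [
        self.contents_manager,
        self.multi_kernel_manager.get_kernel(kernel_uuid),
        *self.get_additional_scopes(kernel_uuid)
    ]
    # Collect a resolution from every scope that supports resolving
    return [
        scope.resolve_path(path)
        for scope in scopes
        if hasattr(scope, 'resolve_path')
    ]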
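For a single scope, resolve_path could boil down to a containment check plus a relative-path computation. The sketch below is only one possible shape, not an existing Jupyter Server API: the function name resolve_in_scope and the explicit home argument are assumptions made for the illustration. It works on path strings only, so it does not resolve symlinks, which a real implementation would probably need to consider.

```python
import posixpath
from typing import Optional


def resolve_in_scope(path: str, scope_root: str, home: str) -> Optional[str]:
    """Return `path` relative to `scope_root`, or None if it lies outside.

    `home` is what `~` expands to in the process owning the scope
    (an assumption of this sketch; it is passed in explicitly).
    """

    def absolute(p: str) -> str:
        # Expand a leading ~ and collapse "." / ".." segments.
        if p == "~" or p.startswith("~/"):
            p = home + p[1:]
        return posixpath.normpath(p)

    target, root = absolute(path), absolute(scope_root)
    if not (posixpath.isabs(target) and posixpath.isabs(root)):
        return None  # relative inputs are ambiguous, refuse to guess
    if posixpath.commonpath([target, root]) != root:
        return None  # outside the scope (also covers ../ escapes)
    return posixpath.relpath(target, root)


print(resolve_in_scope("~/server_root/test.py", "~/server_root", "/home/user"))
print(resolve_in_scope("~/", "~/server_root", "/home/user"))
```

Using commonpath rather than a plain string prefix check avoids treating a sibling such as ~/server_root2 as being inside ~/server_root.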

For a server spawned at ~/server_root with a kernel spawned in the same location:

# /api/resolve?path=~/server_root&kernel={uuid}
[{'scope': 'server', 'relative': '.'}, {'scope': 'kernel', 'relative': '.'}]
# /api/resolve?path=~/server_root/test.py&kernel={uuid}
[{'scope': 'server', 'relative': 'test.py'}, {'scope': 'kernel', 'relative': 'test.py'}]
# /api/resolve?path=~/&kernel={uuid}
[]

For a server spawned at ~/server_root with a kernel spawned in ~/server_root/kernel:

# /api/resolve?path=~/server_root&kernel={uuid}
[{'scope': 'server', 'relative': '.'}]
# /api/resolve?path=~/server_root/test.py&kernel={uuid}
[{'scope': 'server', 'relative': 'test.py'}]
# /api/resolve?path=~/server_root/kernel/test.py&kernel={uuid}
[{'scope': 'server', 'relative': 'kernel/test.py'}, {'scope': 'kernel', 'relative': 'test.py'}]
# /api/resolve?path=~/&kernel={uuid}
[]

For a server spawned at ~/server_root with a kernel spawned in /tmp/kernel:

# /api/resolve?path=~/server_root&kernel={uuid}
[{'scope': 'server', 'relative': '.'}]
# /api/resolve?path=~/server_root/test.py&kernel={uuid}
[{'scope': 'server', 'relative': 'test.py'}]
# /api/resolve?path=/tmp/kernel/test.py&kernel={uuid}
[{'scope': 'kernel', 'relative': 'test.py'}]
# /api/resolve?path=~/&kernel={uuid}
[]
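The example responses above can be reproduced with a small sketch that checks a path against several named scopes. It assumes, for illustration only, that the server and the kernel see the same filesystem and that ~ expands to /home/user for both; the names HOME and resolve are hypothetical and not part of any existing API.

```python
import posixpath
from typing import Dict, List

HOME = "/home/user"  # assumed expansion of ~ for this illustration


def _absolute(path: str) -> str:
    # Expand a leading ~ and normalise "." / ".." segments.
    if path == "~" or path.startswith("~/"):
        path = HOME + path[1:]
    return posixpath.normpath(path)


def resolve(path: str, scopes: Dict[str, str]) -> List[Dict[str, str]]:
    """Return one entry per scope whose root contains `path`."""
    target = _absolute(path)
    matches = []
    for name, root in scopes.items():
        root = _absolute(root)
        if posixpath.commonpath([target, root]) == root:
            matches.append(
                {"scope": name, "relative": posixpath.relpath(target, root)}
            )
    return matches


nested = {"server": "~/server_root", "kernel": "~/server_root/kernel"}
separate = {"server": "~/server_root", "kernel": "/tmp/kernel"}

print(resolve("~/server_root/kernel/test.py", nested))
print(resolve("/tmp/kernel/test.py", separate))
print(resolve("~/", separate))
```

With a remote kernel the two scopes would have to be resolved by different processes, which is why the pseudocode above delegates to each scope's own resolve_path.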

I am not opinionated on any particular JSON format, but I think it would be useful to return all matching resolutions and allow the frontend client to decide which one to use.

Additional context

Additional scope for exposing source access

As noted in (2b), we could expose the source of files known to the kernel (including files beyond its spawn cwd) by reusing the existing DAP source request. The /api/resolve response could advertise that a path is known by the kernel's source handler. Augmenting the first example:

# /api/resolve?path=~/server_root&kernel={uuid}
[{'scope': 'server', 'relative': '.'}, {'scope': 'kernel', 'relative': '.'}, {'scope': 'source', 'relative': '.'}]
# /api/resolve?path=~/server_root/test.py&kernel={uuid}
[{'scope': 'server', 'relative': 'test.py'}, {'scope': 'kernel', 'relative': 'test.py'}, {'scope': 'source', 'relative': 'test.py'}]
# /api/resolve?path=~/test.py&kernel={uuid}
[{'scope': 'source', 'relative': '/home/user/test.py'}]
# /api/resolve?path=/lib/python/library/test.py&kernel={uuid}
[{'scope': 'source', 'relative': '/lib/python/library/test.py'}]
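Given such a response, the client could apply the logic from the Motivation section roughly as follows. This is a sketch of one possible policy, written in Python for consistency with the pseudocode above even though the real client would live in the frontend. The function name choose_action, the action labels, and the allow_source switch are all made up for the illustration.

```python
from typing import Dict, List, Optional, Tuple


def choose_action(
    resolutions: List[Dict[str, str]], allow_source: bool = True
) -> Optional[Tuple[str, str]]:
    """Pick what to do with a clicked path, given an /api/resolve response."""
    by_scope = {r["scope"]: r["relative"] for r in resolutions}
    if "server" in by_scope:
        # Case 1: inside root_dir, open for editing via the contents API.
        return ("open-editable", by_scope["server"])
    if allow_source and "source" in by_scope:
        # Case 2b: outside root_dir, show kernel-provided source read-only.
        return ("open-read-only", by_scope["source"])
    # Case 2a (or nothing matched): do not turn the string into a link.
    return None


response = [{"scope": "source", "relative": "/lib/python/library/test.py"}]
print(choose_action(response))
print(choose_action(response, allow_source=False))
```

A security-sensitive deployment would correspond to allow_source=False here, in which case paths outside root_dir are left as plain text.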

Additional scope for broader filesystem access

Per (2c), it would be desirable to enable implementing a custom scope provider that allows tightly controlled access to the filesystem beyond root_dir. This would benefit other use cases where access to files on the filesystem is desirable (jupyter-lsp/jupyterlab-lsp#850).
A scope provider configured to expose files under ~/shared, with the server (as in the first example) spawned at ~/server_root and the kernel spawned in the same location, would resolve the following:

# /api/resolve?path=~/server_root&kernel={uuid}
[{'scope': 'server', 'relative': '.'}, {'scope': 'kernel', 'relative': '.'}]
# /api/resolve?path=~/server_root/test.py&kernel={uuid}
[{'scope': 'server', 'relative': 'test.py'}, {'scope': 'kernel', 'relative': 'test.py'}]
# /api/resolve?path=~/shared/test.py&kernel={uuid}
[{'scope': 'filesystem', 'relative': '~/shared/test.py'}]  # filesystem is relative to filesystem root (de facto absolute)
# /api/resolve?path=~/not-allowed/test.py&kernel={uuid}
[]
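A scope provider of this kind could be as small as an allow-list check. The class below is a hypothetical sketch, not an existing extension point: the name AllowListScope, its constructor arguments, and the dictionary it returns are assumptions. Unlike the example response above, it returns the expanded absolute path rather than the ~ form, and it does not resolve symlinks.

```python
import posixpath
from typing import Dict, List, Optional


class AllowListScope:
    """Hypothetical scope provider exposing only configured directories."""

    name = "filesystem"

    def __init__(self, allowed_roots: List[str], home: str):
        self._home = home
        self._allowed = [self._absolute(root) for root in allowed_roots]

    def _absolute(self, path: str) -> str:
        # Expand a leading ~ and collapse "." / ".." segments.
        if path == "~" or path.startswith("~/"):
            path = self._home + path[1:]
        return posixpath.normpath(path)

    def resolve_path(self, path: str) -> Optional[Dict[str, str]]:
        target = self._absolute(path)
        if not posixpath.isabs(target):
            return None  # refuse ambiguous relative paths
        for root in self._allowed:
            if posixpath.isabs(root) and posixpath.commonpath([target, root]) == root:
                return {"scope": self.name, "relative": target}
        return None  # not on the allow list


scope = AllowListScope(["~/shared"], home="/home/user")
print(scope.resolve_path("~/shared/test.py"))
print(scope.resolve_path("~/not-allowed/test.py"))
```

Normalising before the containment check means a request such as ~/shared/../not-allowed/test.py is rejected rather than slipping through on its prefix.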

The difference between the filesystem and source scopes is subtle but noticeable when:

  • the kernel is running in a different filesystem than the server
  • there are multiple contents managers

Impact on multiplexed content managers

A number of ways to provide multiple content managers have been proposed over the years:

With the proposed solution:

  • the IDrive interface would need to be amended to allow providing a URL for an alternative /api/resolve.
  • drive-aware meta-managers like jupyter-fs should be able to handle /api/resolve by overriding the implementation of ContentsManager.resolve_path to account for drive prefixes.

Cf. jupyter/notebook#3233

Impact on security by obscurity

The proposed solution would make it easier to find out root_dir from the frontend, because a user could probe numerous paths and deduce root_dir from the server responses by brute force. This is not a concern for the majority of administrators, as kernels typically run locally and hence not only know the full runtime path but also have access to it.

Related discussions
