[WIP] New widget : RemoteFileSelector #6301

Closed
wants to merge 7 commits into from


pierrotsmnrd
Collaborator

@pierrotsmnrd pierrotsmnrd commented Feb 4, 2024

This PR is a work in progress

Purpose

This PR introduces a new widget: RemoteFileSelector.

It is inspired by and based on Panel's standard FileSelector, but it displays a remote filesystem instead of the local one.

RemoteFileSelector is responsible only for displaying the data and selecting files and directories. It relies on a RemoteFileProvider, which is responsible for fetching the data from the remote filesystem.
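The split of responsibilities can be sketched roughly as follows. This is a minimal, hypothetical interface, not the PR's actual code: it assumes a provider exposes an async `ls(path)` method (the review below shows a method with that name) returning a `(directories, files)` tuple, and uses a made-up in-memory provider in place of a real remote filesystem.

```python
import asyncio
from pathlib import PurePosixPath


class RemoteFileProvider:
    """Hypothetical base class: fetches listings from a remote filesystem."""

    async def ls(self, path: PurePosixPath) -> tuple[list[str], list[str]]:
        # Assumed contract: return (directory names, file names) under `path`
        raise NotImplementedError()


class InMemoryFileProvider(RemoteFileProvider):
    """Illustrative provider backed by a dict of {directory: [entries]}."""

    def __init__(self, tree: dict[str, list[str]]):
        self.tree = tree

    async def ls(self, path: PurePosixPath) -> tuple[list[str], list[str]]:
        entries = self.tree.get(str(path), [])
        # In this toy format, a trailing "/" marks a directory
        dirs = sorted(e.rstrip("/") for e in entries if e.endswith("/"))
        files = sorted(e for e in entries if not e.endswith("/"))
        return dirs, files


async def main() -> None:
    provider = InMemoryFileProvider({"/": ["data/", "README.md"], "/data": ["a.csv"]})
    # The selector widget would only call provider.ls() and render the result
    print(await provider.ls(PurePosixPath("/")))      # (['data'], ['README.md'])
    print(await provider.ls(PurePosixPath("/data")))  # ([], ['a.csv'])


asyncio.run(main())
```

Keeping the widget ignorant of how listings are obtained is what would let new backends be added without touching the selector itself.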

This PR also contains S3RemoteFileProvider, a RemoteFileProvider that fetches data from an S3 bucket.

In the future, other providers could fetch data from a GitHub repository, an FTP server, etc.

Example

```python
import os

import panel as pn
import s3fs
from dotenv import load_dotenv

# Set up a .env file with content similar to:
#
#   S3_API_KEY=
#   S3_API_SECRET=
#   S3_BUCKET_NAME=
#   S3_SOURCE_URL=

load_dotenv()


def index_page():
    bucket_name = os.environ["S3_BUCKET_NAME"]
    fs = s3fs.S3FileSystem(
        key=os.environ["S3_API_KEY"],
        secret=os.environ["S3_API_SECRET"],
        # Only necessary if you are not using AWS S3; remove it otherwise
        client_kwargs={"endpoint_url": os.environ["S3_SOURCE_URL"]},
        skip_instance_cache=True,
    )

    # S3RemoteFileProvider and RemoteFileSelector are introduced by this PR
    remote_file_provider = pn.widgets.S3RemoteFileProvider(fs=fs, buckets=[bucket_name])
    remote_file_selector = pn.widgets.RemoteFileSelector(provider=remote_file_provider)

    return pn.Column(
        pn.pane.Markdown("## Remote File Selector example"),
        remote_file_selector,
    )


pn.serve({"/": index_page})
```

Given a valid .env file, this example produces a result similar to:
[Screenshot: 2024-02-04 at 14:56:28]

Remaining to do:

  • Tests: I have added a DummyRemoteFileProvider for this purpose in panel/tests/widgets/test_remote_file_selector.py, but I still need to write a proper test that uses it.
  • Documentation: I need to finish writing example/reference/widgets/RemoteFileSelector.ipynb

I could use some help, guidance, and advice on these two points.
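For the tests, one possible shape is a plain unit test against the dummy provider. This is a hypothetical sketch: the real DummyRemoteFileProvider in panel/tests/widgets/test_remote_file_selector.py is not shown here, so the class below and its `(dirs, files)` return format are assumptions. Each coroutine is run with `asyncio.run`, so no async pytest plugin is needed.

```python
import asyncio
from pathlib import PurePosixPath


class DummyRemoteFileProvider:
    """Hypothetical stand-in: serves a fixed tree and never touches the network."""

    TREE = {
        "/": (["bucket"], []),
        "/bucket": (["subdir"], ["a.csv", "b.txt"]),
        "/bucket/subdir": ([], ["c.csv"]),
    }

    async def ls(self, path: PurePosixPath):
        await asyncio.sleep(0)  # yield to the event loop like a real remote call would
        return self.TREE.get(str(path), ([], []))


def test_dummy_provider_lists_bucket():
    provider = DummyRemoteFileProvider()
    dirs, files = asyncio.run(provider.ls(PurePosixPath("/bucket")))
    assert dirs == ["subdir"]
    assert files == ["a.csv", "b.txt"]


def test_dummy_provider_unknown_path_is_empty():
    provider = DummyRemoteFileProvider()
    assert asyncio.run(provider.ls(PurePosixPath("/missing"))) == ([], [])
```

A widget-level test would additionally construct the selector with this provider and check what it displays, which depends on the selector's own API.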


codecov bot commented Feb 4, 2024

Codecov Report

Attention: 133 lines in your changes are missing coverage. Please review.

Comparison is base (79512c7) 82.74% compared to head (2f53a6b) 71.69%.

| Files | Patch % | Lines |
|---|---|---|
| panel/widgets/remote_file_selector.py | 26.16% | 127 Missing ⚠️ |
| panel/tests/widgets/test_remote_file_selector.py | 50.00% | 6 Missing ⚠️ |
Additional details and impacted files
```diff
@@             Coverage Diff             @@
##             main    #6301       +/-   ##
===========================================
- Coverage   82.74%   71.69%   -11.06%
===========================================
  Files         301      303        +2
  Lines       45248    45433      +185
===========================================
- Hits        37442    32573     -4869
- Misses       7806    12860     +5054
```
| Flag | Coverage Δ |
|---|---|
| ui-tests | ? |
| unitexamples-tests | 71.69% <28.10%> (-0.18%) ⬇️ |

Flags with carried forward coverage won't be shown.

@philippjfr
Member

Really nice, will review this week!

Member

@hoxbro hoxbro left a comment


I know the PR is still in draft mode, but I have left some comments.

You don't need to answer them before the PR is ready to review.

```python
        raise NotImplementedError()

# for S3RemoteFileProvider
import s3fs
```
Member


There is no need to make this a required dependency.

As far as I can see, this import can be put inside an if TYPE_CHECKING block.
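A sketch of the suggested pattern, assuming s3fs is only needed for type annotations in this module: the import sits under `if TYPE_CHECKING:`, which is False at runtime, and the annotation is written as a string so it is never evaluated. Type checkers still see the real type, but the module imports fine without s3fs installed. The class body is illustrative, not the PR's code.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Only evaluated by static type checkers (mypy, pyright), never at runtime
    import s3fs


class S3RemoteFileProvider:
    """Illustrative provider; the caller passes in an already-configured filesystem."""

    def __init__(self, fs: "s3fs.S3FileSystem", buckets: list[str]):
        # The quoted annotation is not evaluated, so s3fs need not be importable
        self.fs = fs
        self.buckets = buckets


# Works even when s3fs is not installed, because nothing imports it at runtime
provider = S3RemoteFileProvider(fs=object(), buckets=["my-bucket"])
print(provider.buckets)  # ['my-bucket']
```

If some code path does need s3fs at runtime, it can import it lazily inside that function, so only users of the S3 provider need the package.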

Comment on lines +32 to +33
```python
    def __init__(self):
        super().__init__()
```
Member


Suggested change
```diff
-    def __init__(self):
-        super().__init__()
+    def __init__(self, **params):
+        super().__init__(**params)
```

Otherwise, parameters declared on parent classes cannot be passed to the constructor.

(The __init__ is not needed here at all, but since this PR is still a draft, things may change.)
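Why forwarding `**params` matters can be shown with a plain-Python stand-in for `param.Parameterized`, whose constructor accepts parameter values as keyword arguments. `Base`, `Broken`, and `Fixed` are made-up names for this illustration: a subclass whose `__init__` accepts no keyword arguments rejects every parameter at construction time.

```python
class Base:
    """Stand-in for param.Parameterized: sets declared parameters from kwargs."""

    file_pattern = "*"  # a "parameter" with a class-level default

    def __init__(self, **params):
        for name, value in params.items():
            if not hasattr(type(self), name):
                raise TypeError(f"unknown parameter {name!r}")
            setattr(self, name, value)


class Broken(Base):
    def __init__(self):  # accepts no keyword arguments, so callers cannot set parameters
        super().__init__()


class Fixed(Base):
    def __init__(self, **params):
        super().__init__(**params)  # forwards parameter values to the base class


print(Fixed(file_pattern="*.csv").file_pattern)  # *.csv
try:
    Broken(file_pattern="*.csv")
except TypeError as exc:
    print("Broken:", exc)
```

The same applies to any subclass further down the hierarchy: each `__init__` in the chain has to pass `**params` along for the values to reach the base class.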

```python
        self.buckets = buckets
        self.file_pattern = file_pattern

    async def ls(self, path: PurePosixPath):
```
Member


I'm curious why this is async when there is no await inside it.
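An `async def` with no `await` still runs synchronously, so a blocking call inside it stalls the event loop (and every other session on the server) until it returns. One common fix, sketched here under the assumption that the provider wraps a synchronous client, is to move the blocking call onto a worker thread with `asyncio.to_thread` (Python 3.9+). `blocking_ls` and `ThreadedProvider` are hypothetical names.

```python
import asyncio
import time
from pathlib import PurePosixPath


def blocking_ls(path: str) -> list[str]:
    """Hypothetical synchronous listing call, e.g. a client library without async support."""
    time.sleep(0.2)  # simulate network latency
    return [f"{path.rstrip('/')}/file.csv"]


class ThreadedProvider:
    async def ls(self, path: PurePosixPath) -> list[str]:
        # Run the blocking call in a worker thread so the event loop stays responsive
        return await asyncio.to_thread(blocking_ls, str(path))


async def main() -> None:
    provider = ThreadedProvider()
    start = time.perf_counter()
    # Both listings run concurrently in separate threads
    results = await asyncio.gather(
        provider.ls(PurePosixPath("/a")), provider.ls(PurePosixPath("/b"))
    )
    print(results, f"{time.perf_counter() - start:.2f}s")


asyncio.run(main())
```

If the underlying library offers a native async API, awaiting that directly would avoid the thread hop entirely.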

```python
        super().__init__()

    async def ls(self, path: PurePosixPath):
        time.sleep(1)
```
Member


Suggested change
```diff
-        time.sleep(1)
+        await asyncio.sleep(1)
```

That way, using async actually makes sense.
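The difference is easy to measure. In this sketch (class names are illustrative, loosely modeled on the dummy provider in the PR's tests), two concurrent `ls` calls take about twice the delay with `time.sleep`, because each call blocks the event loop, but only about one delay with `await asyncio.sleep`. Note that calling `asyncio.sleep(1)` without `await` would merely create a coroutine object and not pause at all.

```python
import asyncio
import time
from pathlib import PurePosixPath

DELAY = 0.2  # seconds of simulated latency per call


class BlockingDummyProvider:
    async def ls(self, path: PurePosixPath):
        time.sleep(DELAY)  # blocks the whole event loop
        return [], []


class AsyncDummyProvider:
    async def ls(self, path: PurePosixPath):
        await asyncio.sleep(DELAY)  # yields to the event loop while "waiting"
        return [], []


async def time_two_calls(provider) -> float:
    start = time.perf_counter()
    await asyncio.gather(provider.ls(PurePosixPath("/a")), provider.ls(PurePosixPath("/b")))
    return time.perf_counter() - start


blocking = asyncio.run(time_two_calls(BlockingDummyProvider()))
concurrent = asyncio.run(time_two_calls(AsyncDummyProvider()))
print(f"time.sleep: {blocking:.2f}s, await asyncio.sleep: {concurrent:.2f}s")
```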


```python
        # Set up state
        self._stack = []
        self._cwd = PurePosixPath("/")
```
Member


Is it always the case that it is a PosixPath?

There could be a RemoteFileProvider that connects to a Windows Server.
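One option, sketched here as an assumption rather than anything in the PR, is to let each provider declare which pure path flavor it uses (the `path_class` attribute and `child_path` helper are hypothetical) and have the selector build paths through it instead of hard-coding PurePosixPath. The joining behavior shown is standard `pathlib`.

```python
from pathlib import PurePath, PurePosixPath, PureWindowsPath


class PosixProvider:
    path_class: type[PurePath] = PurePosixPath  # e.g. S3, FTP, most servers


class WindowsProvider:
    path_class: type[PurePath] = PureWindowsPath  # e.g. a share on a Windows server


def child_path(provider, cwd: str, name: str) -> str:
    """Join using the provider's own path rules instead of assuming POSIX."""
    return str(provider.path_class(cwd) / name)


print(child_path(PosixProvider(), "/bucket", "data.csv"))     # /bucket/data.csv
print(child_path(WindowsProvider(), "C:/share", "data.csv"))  # C:\share\data.csv
```

Pure path classes never touch the local filesystem, so a Windows-flavored path can be manipulated correctly even when Panel itself runs on Linux.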

@philippjfr
Member

My main question concerns RemoteFileProvider. I like that it is extensible, but I also wonder whether we can provide a generic wrapper that supports s3fs, gcfs, adlfs, etc. without having to bring a lot of provider-specific logic into Panel.
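A generic wrapper along those lines could rely only on the fsspec `AbstractFileSystem` interface that s3fs, gcsfs (Google Cloud Storage), adlfs, and others implement: `fs.ls(path, detail=True)` returns a list of dicts that include `name` and `type` keys. The sketch below (hypothetical class name, synchronous for brevity) turns that into directory and file lists and applies a glob `file_pattern`; the fake filesystem exists only so the example runs without credentials.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath


class FsspecFileProvider:
    """Hypothetical provider that works with any fsspec-compatible filesystem."""

    def __init__(self, fs, file_pattern: str = "*"):
        self.fs = fs  # e.g. an s3fs, gcsfs, or adlfs filesystem instance
        self.file_pattern = file_pattern

    def ls(self, path: PurePosixPath) -> tuple[list[str], list[str]]:
        dirs, files = [], []
        for entry in self.fs.ls(str(path), detail=True):
            name = PurePosixPath(entry["name"]).name
            if entry["type"] == "directory":
                dirs.append(name)
            elif entry["type"] == "file" and fnmatch(name, self.file_pattern):
                files.append(name)
        return sorted(dirs), sorted(files)


class FakeFS:
    """Mimics the shape of fsspec's ls(path, detail=True) output, for this demo only."""

    def ls(self, path, detail=True):
        return [
            {"name": "bucket/raw", "type": "directory", "size": 0},
            {"name": "bucket/a.csv", "type": "file", "size": 10},
            {"name": "bucket/notes.txt", "type": "file", "size": 5},
        ]


provider = FsspecFileProvider(FakeFS(), file_pattern="*.csv")
print(provider.ls(PurePosixPath("bucket")))  # (['raw'], ['a.csv'])
```

With this design, provider-specific setup (credentials, endpoints) stays in the fsspec filesystem object the user constructs, as in the S3 example above.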

@Coderambling
Contributor

Coderambling commented Mar 2, 2024

Extending this to Google Drive would be great as well (or is that what is meant by gcfs?). Google Drive has a Python API with OAuth 2, but I guess that counts as provider-specific logic. Isn't that unavoidable, though, if we want remote file access to widely used services such as S3, Google Drive, and Microsoft? It would definitely be useful to have this.

Isn't there a Python library somewhere that can access multiple remote filesystems and could be leveraged here? In other words, a well-maintained Python equivalent of something like https://github.com/rclone/rclone?tab=readme-ov-file

@Coderambling
Contributor

Or start with the top 3 besides S3, since that is already there? So Google Drive, Microsoft, and ...

@Coderambling
Contributor

Coderambling commented Mar 11, 2024

Related to this PR?

@Coderambling
Contributor

Coderambling commented Mar 29, 2024

A recently reported working example of using fsspec.gui.FileSelector to access remote files in a notebook, with a screenshot: fsspec/community#9

Could this be an already existing solution for this issue, @philippjfr?

Especially since fsspec supports many different filesystems.


@philippjfr
Member

This was super helpful as a starting point, thanks @pierrotsmnrd. I've picked this up and integrated remote file support into the regular FileSelector and the new FileTreeSelector in #6837.

@philippjfr philippjfr closed this Jun 25, 2024
@Coderambling
Contributor

Coderambling commented Jul 30, 2024

Hi @pierrotsmnrd . Great work!

You mentioned the document below, but I can't find it in the Panel repo or in your fork.

Do you have a link to the doc?

Documentation : I need to finish writing example/reference/widgets/RemoteFileSelector.ipynb
