Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Strip protocol from paths in fsspec.generic.GenericFileSystem._copy #1577

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

ryaminal
Copy link

Seems to fix a problem I was having.
Can't do it upstream in the rsync method because at least one file needs the protocol to resolve the proper filesystem.

@martindurant
Copy link
Member

This is probably reasonable (even given a fix in sshfs), but it would be nice if we could engineer a case that failed before this fix and now passes, as a test against regression.

@ryaminal
Copy link
Author

ryaminal commented Apr 17, 2024

You are referencing this fix in sshfs for the _info/isdir stuff, right? Yeah, that fix got around one problem I was having with GenericFileSystem when it checks for a dir.

This fix is to get around another problem i was having after I resolved the sshfs one above.

And yeah, a breaking/fixing test case is a grand idea. I will work on that. I imagine I can use a mockssh server for this, perhaps? hmmm.

fs = fsspec.url_to_fs("sftp://user@host:1234")[0]

fsspec.generic.rsync(
    "sftp:///stuff",
    "gs://dbucket/dir1/dir2",
    inst_kwargs={"default_method": "current"},
)

is what i was using locally to test this... i wonder if the existing test_rsync isn't catching this here because it doesn't use the default_method=current? the other tests, however, do but not rsync.

@martindurant
Copy link
Member

I imagine I can use a mockssh server for this, perhaps?

memory, local and ftp FSs are the typical ones used in tests (in that order). You could make a mock, but if any of those show the problem, it would be easier.

…col. this might have to be done in all _info and info methods; or just fix genericfilesystem somehow?
@ryaminal
Copy link
Author

hmmm, seems like any filesystem that doesn't strip in their _info or info calls has a similar problem with GenericFileSystem.

I decided to try this with the built in implementation of sftp instead of sshfs and sure enough, same issue with isdir failing and the same fix.

also was able to reproduce this in the test by commenting out the strip in the memory impl here.

with that line commented out the test fails in the same way as sshfs and the sftp impl.

i don't think the fix can be done in AbstractFileSystem or AsyncAbstractFileSystem so every impl needs to make sure they strip the protocol in their info and _info methods?

@martindurant
Copy link
Member

Yes I think all backends should be able to cope with paths with or without protocol specifiers.

@ryaminal
Copy link
Author

bleh, this is becoming quite tricky to test. not certain how to do it.

would it be better to move this out of fsspec and just add another fix to sshfs? in sshfs i think this will have to be done in almost all locations that use a path.

@martindurant
Copy link
Member

If you can state the problem as "sshfs should always use _strip_protocol, but doesn't", then yes.

@martindurant
Copy link
Member

The logic in Generic changed recently, not to update the "name" field of the dircache of its target filesystems (part of #1633 ). That might be enough to fix the issue here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants