Ability to use fsspec.generic.rsync across different filesystems #1398
This could be documented better... You may even be right that the … You could, for instance, do …
where the URLs in …
OK, wow. Yes, that's exactly what I wanted to do, but it really isn't clear from the docs how to do it. So the idea is to define a new "backend" and then mangle my URIs to use the new … I guess the downside is that I can no longer use the URIs I would use in other places. Ideally I want both inputs to rsync to be … It seems far cleaner to have the API I proposed above, with a source_fs and a target_fs, instead of having to mangle the URIs. Any reason not to also support that? Glad there is a workaround, though. Will try this out.
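A minimal sketch of the two-filesystem API proposed here, assuming any pair of fsspec filesystem instances (for example two s3fs.S3FileSystem objects built with different credentials). The name rsync_between and its signature are hypothetical, not part of fsspec. The helper only calls standard AbstractFileSystem methods (find, exists, makedirs, open, _strip_protocol) and, as a simplification, treats a destination file as up to date when its size matches the source's. It never deletes extra files on the target.

```python
import shutil

import fsspec  # noqa: F401  (callers pass in fsspec filesystem instances)


def rsync_between(source_fs, source, target_fs, destination):
    """Hypothetical helper, not an fsspec API.

    Copy every file below `source` on `source_fs` to the same relative
    location below `destination` on `target_fs`, skipping files whose
    size already matches. Returns the sorted relative paths copied.
    """
    # Each filesystem normalises its own path, so no URI mangling is needed.
    src_root = source_fs._strip_protocol(source).rstrip("/")
    dst_root = target_fs._strip_protocol(destination).rstrip("/")

    # find(..., detail=True) maps each file path to its info dict (incl. "size").
    src_files = source_fs.find(src_root, detail=True)
    dst_files = (
        target_fs.find(dst_root, detail=True) if target_fs.exists(dst_root) else {}
    )

    copied = []
    for src_path, info in src_files.items():
        rel = src_path[len(src_root) + 1:]
        dst_path = f"{dst_root}/{rel}"
        existing = dst_files.get(dst_path)
        if existing is not None and existing.get("size") == info.get("size"):
            continue  # simplification: equal size is treated as "up to date"
        target_fs.makedirs(dst_path.rsplit("/", 1)[0], exist_ok=True)
        # Stream bytes from one filesystem to the other.
        with source_fs.open(src_path, "rb") as fsrc, target_fs.open(dst_path, "wb") as fdst:
            shutil.copyfileobj(fsrc, fdst)
        copied.append(rel)
    return sorted(copied)
```

For the S3 case this would be called as rsync_between(s3fs.S3FileSystem(profile="source"), "bucket-a/data", s3fs.S3FileSystem(profile="target"), "bucket-b/data"), where the profile names and bucket paths are placeholders. Because each side carries its own filesystem object, the same URI style works on both sides and nothing needs a custom prefix.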
No reason, except that the generic filesystem came first, so rsync reused its code rather than getting something tailored to be easier to use. Would you like to work on this?
Yes. If this would be of value, I would love to contribute. I'll draft a PR.
I am still missing something about how the solution you mentioned can work. Using a generic filesystem to load a different S3 config doesn't seem to work as expected.
… throws an error about …, since the whole path now gets passed to s3fs, which tries to validate that path and …
Ah, that is annoying! I guess "s3" and "s3a" would work, since those are different but still prefixes that we know and expect. You are right that a better solution is certainly warranted!
I think the issue is that _strip_protocol() is broken for generic in this case.
… doesn't give what you expect. It gives …
Yes, https://github.com/fsspec/filesystem_spec/blob/master/fsspec/generic.py#L177 is wrong. It tries to strip the protocol using s3fs, which would remove an …
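To make the problem concrete, here is a sketch of alias resolution that removes the custom prefix before the backend ever sees the path, assuming a plain dict that maps alias names to filesystem instances. resolve_alias and the registry dict are hypothetical illustrations, not fsspec's GenericFileSystem code or its eventual fix; split_protocol is fsspec's own helper from fsspec.core.

```python
from fsspec.core import split_protocol
from fsspec.implementations.local import LocalFileSystem


def resolve_alias(url, registry):
    """Hypothetical helper: return (filesystem, path) for an aliased URL.

    A URL such as 's3_source://bucket/key' is split into its alias and
    path first, so the backend only receives 'bucket/key', never the
    alias prefix it does not recognise.
    """
    alias, path = split_protocol(url)
    if alias not in registry:
        raise KeyError(f"no filesystem registered for alias {alias!r}")
    fs = registry[alias]
    # The backend now normalises a path that contains only its own syntax.
    return fs, fs._strip_protocol(path)


if __name__ == "__main__":
    # Local stand-ins; in the S3 case these would be two differently
    # configured s3fs.S3FileSystem instances.
    registry = {"src": LocalFileSystem(), "dst": LocalFileSystem()}
    print(resolve_alias("src:///tmp/data", registry))
```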
You can set the instance's value of protocol; that might be enough. But using "s3" and "s3a" would be a workaround for the moment.
OK, s3 and s3a are different protocols, though. I'm also not sure how this would work with something like HDFS if you wanted to copy from one HDFS cluster to another. Still, it's interesting that there is a workaround of just using s3 and s3a for the two. I will still propose a PR to implement an rsync-like method that works without the generic filesystem and takes two explicit filesystems.
The rsync method looks like exactly what I have been looking for, but I am not sure how one would use it to, say, sync data between two different S3 buckets that require different credentials.
What I would want is something like …
but the rsync method only takes an fs= argument, not a to_fs and a from_fs. So how is one supposed to pass in both values? Why does the rsync method take only one if it is meant to be able to copy across filesystems?