-
Notifications
You must be signed in to change notification settings - Fork 298
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for s3 buckets in OLCI and ABI l1 readers #1439
Conversation
Codecov Report
@@ Coverage Diff @@
## master #1439 +/- ##
==========================================
- Coverage 90.74% 90.58% -0.17%
==========================================
Files 238 239 +1
Lines 34139 34313 +174
==========================================
+ Hits 30980 31081 +101
- Misses 3159 3232 +73
Flags with carried forward coverage won't be shown. Click here to find out more.
Continue to review full report at Codecov.
|
DeepCode's analysis on #d7d227 found:
Top issues
👉 View analysis in DeepCode’s Dashboard | Configure the bot |
Is this intended as an alternative or replacement for #1321? |
Yes. |
I haven't tried it yet, but it looks interesting. To be even more useful for my use case, |
Yes, that is the plan indeed, for an upcoming PR. I wanted to keep this PR at a reasonable size. We'll see who gets there first. I also have a idea for an enhancement to find_files_and_reader that, when passed an archive and not finding a matching reader, will put an fsspec wrapper around the archive to be able to look for files inside it for a matching reader. |
Just a note though that this PR just fixes the olci readers to accept FSFiles. The other will for now probably not work with it. |
@carloshorn you might want to have a look at this PR too. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks interesting, good work. I'm looking forward to trying it out. Just a few questions for clarification (see comments for details).
|
||
@total_ordering | ||
class FSFile(os.PathLike): | ||
"""Implementation of a PathLike file object, that can be opened. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm confused by this line. Is it a file object or a path object? If it's a file object it's already opened? I think os.PathLike
represents a path, not an open file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is a path indeed, but can be opened to return a file object.
satpy/readers/__init__.py
Outdated
|
||
This is made to be use in conjuction with fsspec or s3fs. For example:: | ||
|
||
zipfile = S3B_OL_2_WFR____20201103T100807_20201103T101107_20201103T121330_0179_045_179_1980_MAR_O_NR_002.zip |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There appear to be some string-delimiting quotes missing here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
indeed thanks.
satpy/readers/__init__.py
Outdated
This is made to be use in conjuction with fsspec or s3fs. For example:: | ||
|
||
zipfile = S3B_OL_2_WFR____20201103T100807_20201103T101107_20201103T121330_0179_045_179_1980_MAR_O_NR_002.zip | ||
filename = "sentinel-s3-ol2wfr-zips/2020/11/03/" + filename |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be + zipfile
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes.
satpy/readers/__init__.py
Outdated
|
||
zipfile = S3B_OL_2_WFR____20201103T100807_20201103T101107_20201103T121330_0179_045_179_1980_MAR_O_NR_002.zip | ||
filename = "sentinel-s3-ol2wfr-zips/2020/11/03/" + filename | ||
the_files = fsspec.open_files("simplecache::zip://**/*.nc::s3://" + filename, s3={'anon': False}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're passing 'anon': False
— might you have an example that is available with anonymous access such that any user can easily try out the example?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have open an example for olci files unfortunately, and this is the only reader supporting this right now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed through switching to an abi example.
satpy/readers/olci_nc.py
Outdated
@property | ||
@lru_cache(maxsize=2) | ||
def nc(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly for my curiosity: I've never seen lru_cache
used on a property
before. I don't know much about OLCI. What is the purpose and effect of a cache of size 2 on this property? Does that mean that if I open 3 OLCI files that the one I opened first is closed again? Or just that there is work redone if .nc
is accessed on the first one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I put size 2 for safety, but in principle only 1 is needed, as this is a property on the instance. So each instance of the file handler will have a different cache for this accessor, and allows repeated calls to the property not to trigger a reread (especially important when this is a remote file of course).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wasn't this cached_property before? What's the difference?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it was, but cached_property
was introduced in python 3.8, so I have to pass on that for now!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah ok. What about a try/except ImportError on that? Can lru_cache take a maxsize=1? Does it default to that? Basically can we use cached_property when it exists, but use lru_cache for backwards compatibility for now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can do that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm, I thought this would do it, but no... Will investigate more tomorrow
def cached_property(func):
return property(lru_cache(maxsize=1)(func))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, got it to work now.
with suppress(PermissionError): | ||
os.remove(self.zip_name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How can this lead to PermissionError
when you just created this file in setUp
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ask windows...
@mraspaud thanks for adding it. Good idea to pass the file system with the file path object. This preserves the satpy interface and avoids passing the file system down the road as in my draft. I am already convinced of using fsspec filesystems, because it gives the developers a hook where they can reconstruct any expected directory structure and filename conventions without forcing any changes/copies/downloads on the physical filesystem. If this PR gets merged, I would volunteer to add the |
👍
Interesting, we should discuss this next week (PCW) ?
Sounds great! |
It was introduced in python 3.8, so we have to wait a bit for it.
This is ready for re-review and merging I think |
I don't have much expertise here, but to me it looks very thin and simple 👍 Far from re-inventing the wheel in my opinion. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you want to re-title this since it includes changes to ABI? Maybe "Add experimental S3 support to OLCI and ABI readers"?
Do you want to do the self.nc
property trick to the abi base reader too?
Will do (both) |
Done |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One more small/optional request and one question. Otherwise LGTM.
satpy/readers/olci_nc.py
Outdated
|
||
def cached_property(func): | ||
"""Port back functools.cached_property.""" | ||
return property(lru_cache(maxsize=None)(func)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So now you're going to hate me. What about moving this into it's own _compat.py
module so I can use it in other readers and it can be shared by the readers here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can do that. And no I don't hate you :)
self.filename = str(filename) | ||
else: | ||
self.filename = filename | ||
self.filename = filename |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change will break pathlib objects for all other readers right? (assuming the low-level I/O library doesn't support them)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That said, I'm ok with this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
probably yes... Is this bad?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think @gerritholl was the first user to point out that Satpy didn't work with pathlib objects so he should maybe make the final decision. It probably isn't great that this breaks it for all other readers, but I'm not sure how many readers need strings for their lower-level I/O libraries either.
satpy/_compat.py
Outdated
|
||
def cached_property(func): | ||
"""Port back functools.cached_property.""" | ||
return property(lru_cache(maxsize=None)(func)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah I was thinking putting the import of cached_property in here too. That way I could do from satpy._compat import cached_property
with no try/except in my reader module. Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you got it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great! Merge when you can and I'll use this cached_property compat decorator in my GAASP reader.
This PR allows s3 file objects to be passed to the olci readers. To do this, is implements a new class called
FSFile
that abides to the PathLike protocol.flake8 satpy