Add support for s3 buckets in OLCI and ABI l1 readers #1439

Merged: 23 commits, Dec 2, 2020

@mraspaud mraspaud commented Nov 16, 2020

This PR allows s3 file objects to be passed to the olci readers. To do this, it implements a new class called FSFile that abides by the PathLike protocol.

  • Tests added
  • Passes flake8 satpy
  • Fully documented

@mraspaud mraspaud self-assigned this Nov 16, 2020
@mraspaud mraspaud added component:readers enhancement code enhancements, features, improvements labels Nov 16, 2020
@codecov

codecov bot commented Nov 16, 2020

Codecov Report

Merging #1439 (d7d227b) into master (d8d5e9e) will decrease coverage by 0.16%.
The diff coverage is 97.97%.

@@            Coverage Diff             @@
##           master    #1439      +/-   ##
==========================================
- Coverage   90.74%   90.58%   -0.17%     
==========================================
  Files         238      239       +1     
  Lines       34139    34313     +174     
==========================================
+ Hits        30980    31081     +101     
- Misses       3159     3232      +73     
Flag Coverage Δ
behaviourtests 4.53% <5.05%> (?)
unittests 91.05% <97.97%> (?)

Impacted Files Coverage Δ
satpy/readers/olci_nc.py 91.42% <89.47%> (+1.28%) ⬆️
satpy/readers/abi_base.py 94.59% <94.11%> (+2.05%) ⬆️
satpy/_compat.py 100.00% <100.00%> (ø)
satpy/readers/__init__.py 97.45% <100.00%> (+0.35%) ⬆️
satpy/readers/file_handlers.py 97.18% <100.00%> (+1.23%) ⬆️
satpy/tests/reader_tests/test_abi_l1b.py 100.00% <100.00%> (ø)
satpy/tests/reader_tests/test_olci_nc.py 100.00% <100.00%> (ø)
satpy/tests/test_file_handlers.py 100.00% <100.00%> (ø)
satpy/tests/test_readers.py 99.19% <100.00%> (+0.14%) ⬆️
satpy/utils.py 24.09% <0.00%> (-46.39%) ⬇️
... and 9 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@coveralls

coveralls commented Nov 16, 2020

Coverage increased (+0.009%) to 90.755% when pulling 36ff447 on mraspaud:feature-olci-s3 into d8d5e9e on pytroll:master.

@ghost

ghost commented Nov 18, 2020

DeepCode's analysis on #d7d227 found:

  • ℹ️ 2 minor issues. 👇
  • ✔️ 1 issue was fixed.

Top issues

Description Example fixes
Unused cached_property imported from functools Occurrences: 🔧 Example fixes
Value 'self.nc' is unsubscriptable Occurrences: 🔧 Example fixes

@mraspaud mraspaud marked this pull request as ready for review November 20, 2020 13:06
@gerritholl
Member

Is this intended as an alternative or replacement for #1321?

@mraspaud
Member Author

Yes.

@gerritholl
Member

I haven't tried it yet, but it looks interesting. To be even more useful for my use case, find_files_and_readers should, when passed a file_system object via its fs argument, return instances of your FSFile class. Do you plan to implement this? If not I could look into contributing that (not sure when).

@mraspaud
Member Author

Yes, that is the plan indeed, for an upcoming PR. I wanted to keep this PR at a reasonable size. We'll see who gets there first.

I also have an idea for an enhancement to find_files_and_readers that, when passed an archive and not finding a matching reader, will put an fsspec wrapper around the archive to be able to look for files inside it for a matching reader.

@mraspaud
Member Author

Just a note though that this PR only fixes the olci readers to accept FSFiles. The other readers will probably not work with it for now.

@mraspaud
Member Author

@carloshorn you might want to have a look at this PR too.

Member

@gerritholl gerritholl left a comment

This looks interesting, good work. I'm looking forward to trying it out. Just a few questions for clarification (see comments for details).


@total_ordering
class FSFile(os.PathLike):
    """Implementation of a PathLike file object, that can be opened.
Member

I'm confused by this line. Is it a file object or a path object? If it's a file object it's already opened? I think os.PathLike represents a path, not an open file.

Member Author

It is a path indeed, but can be opened to return a file object.
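
To make the distinction concrete, here is a minimal sketch of an object that is a path (it implements `__fspath__`) but can also be opened through an attached filesystem. The names `OpenablePath` and `InMemoryFS` are hypothetical and only illustrate the idea; this is not the FSFile implementation from this PR, and a real setup would presumably attach an fsspec or s3fs filesystem instead of the in-memory stand-in.

```python
import io
import os


class InMemoryFS:
    """Hypothetical stand-in for a filesystem object exposing ``open``."""

    def __init__(self, contents):
        self._contents = contents  # maps path string -> bytes

    def open(self, path, mode="rb"):
        # Hand back a fresh file-like object on every call.
        return io.BytesIO(self._contents[path])


class OpenablePath(os.PathLike):
    """A path that remembers which filesystem it belongs to."""

    def __init__(self, path, fs=None):
        self._path = path
        self._fs = fs

    def __fspath__(self):
        # Makes os.fspath() and friends treat this object as a path.
        return self._path

    def open(self):
        # Delegate to the attached filesystem, or fall back to the local one.
        if self._fs is None:
            return open(self._path, "rb")
        return self._fs.open(self._path)


fs = InMemoryFS({"bucket/file.nc": b"fake netcdf bytes"})
remote = OpenablePath("bucket/file.nc", fs=fs)
with remote.open() as fobj:
    data = fobj.read()
```

The object itself never holds an open handle; a file object only exists once `open()` is called.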


This is made to be used in conjunction with fsspec or s3fs. For example::

zipfile = S3B_OL_2_WFR____20201103T100807_20201103T101107_20201103T121330_0179_045_179_1980_MAR_O_NR_002.zip
Member

There appear to be some string-delimiting quotes missing here.

Member Author

indeed thanks.

This is made to be used in conjunction with fsspec or s3fs. For example::

zipfile = S3B_OL_2_WFR____20201103T100807_20201103T101107_20201103T121330_0179_045_179_1980_MAR_O_NR_002.zip
filename = "sentinel-s3-ol2wfr-zips/2020/11/03/" + filename
Member

Should this be + zipfile?

Member Author

yes.


zipfile = S3B_OL_2_WFR____20201103T100807_20201103T101107_20201103T121330_0179_045_179_1980_MAR_O_NR_002.zip
filename = "sentinel-s3-ol2wfr-zips/2020/11/03/" + filename
the_files = fsspec.open_files("simplecache::zip://**/*.nc::s3://" + filename, s3={'anon': False})
Member

You're passing 'anon': False — might you have an example that is available with anonymous access such that any user can easily try out the example?

Member Author

I don't have an open example for olci files unfortunately, and this is the only reader supporting this right now.

Member Author

Fixed through switching to an abi example.

Comment on lines 112 to 114
@property
@lru_cache(maxsize=2)
def nc(self):
Member

Mostly for my curiosity: I've never seen lru_cache used on a property before. I don't know much about OLCI. What is the purpose and effect of a cache of size 2 on this property? Does that mean that if I open 3 OLCI files that the one I opened first is closed again? Or just that there is work redone if .nc is accessed on the first one?

Member Author

I put size 2 for safety, but in principle only 1 is needed, as this is a property on the instance. So each instance of the file handler will have a different cache for this accessor, and allows repeated calls to the property not to trigger a reread (especially important when this is a remote file of course).
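
One way to probe the question is a small experiment. The sketch below uses a hypothetical `Handler` class (not the OLCI file handler) and records every time the property body runs. As far as the standard library behaves, `lru_cache` applied in a class body creates a single cache on the function, keyed on `self`, so the `maxsize` limit counts entries across instances.

```python
from functools import lru_cache


class Handler:
    """Hypothetical file handler; ``reads`` records every (re)read."""

    reads = []

    def __init__(self, name):
        self.name = name

    @property
    @lru_cache(maxsize=2)
    def nc(self):
        # Stands in for an expensive (possibly remote) dataset read.
        Handler.reads.append(self.name)
        return "dataset:" + self.name


a, b, c = Handler("a"), Handler("b"), Handler("c")
a.nc
a.nc  # served from the cache, no second read
b.nc
c.nc  # cache holds two entries, so the least recently used one (a) is dropped
a.nc  # read again
print(Handler.reads)
```

In this sketch the third instance pushes the first one out of the cache, so the first handler's data is read again on its next access; the eviction itself does not close anything. The cache also keeps a reference to every instance it currently holds.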

Member

Wasn't this cached_property before? What's the difference?

Member Author

It was, but cached_property was introduced in Python 3.8, so I have to pass on that for now!

Member

Ah ok. What about a try/except ImportError on that? Can lru_cache take a maxsize=1? Does it default to that? Basically can we use cached_property when it exists, but use lru_cache for backwards compatibility for now?

Member Author

I can do that.

Member Author

Hmmm, I thought this would do it, but no... Will investigate more tomorrow

def cached_property(func):
    return property(lru_cache(maxsize=1)(func))

Member Author

ok, got it to work now.

Comment on lines +876 to +877
with suppress(PermissionError):
    os.remove(self.zip_name)
Member

How can this lead to PermissionError when you just created this file in setUp?

Member Author

Ask windows...
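
For context: on Windows a file generally cannot be deleted while something still holds it open, and `os.remove` then raises `PermissionError`. A minimal sketch of the cleanup pattern, using a hypothetical `remove_quietly` helper and a throwaway temporary file rather than the test's actual zip file:

```python
import os
import tempfile
from contextlib import suppress


def remove_quietly(path):
    """Try to delete ``path``; ignore PermissionError (e.g. file still open on Windows)."""
    with suppress(PermissionError):
        os.remove(path)


# Create a throwaway file, close our handle, then clean it up.
fd, tmp_name = tempfile.mkstemp(suffix=".zip")
os.close(fd)
remove_quietly(tmp_name)
```

Suppressing the error keeps the teardown from failing the test; the trade-off is that a leftover file can stay behind when the removal is refused.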

@carloshorn

> @carloshorn you might want to have a look at this PR too.

@mraspaud thanks for adding it. Good idea to pass the file system with the file path object. This preserves the satpy interface and avoids passing the file system down the road as in my draft. I am already convinced of using fsspec filesystems, because it gives the developers a hook where they can reconstruct any expected directory structure and filename conventions without forcing any changes/copies/downloads on the physical filesystem.
Still, I do not like that many satpy readers extract metadata from filenames and therefore set constraints on them, but at least this PR gives a solution on how to deal with it programmatically.

If this PR gets merged, I would volunteer to add the FSFile support for the avhrr_l1b_gaclac reader.

@mraspaud
Member Author

> > @carloshorn you might want to have a look at this PR too.
>
> @mraspaud thanks for adding it. Good idea to pass the file system with the file path object. This preserves the satpy interface and avoids passing the file system down the road as in my draft. I am already convinced of using fsspec filesystems, because it gives the developers a hook where they can reconstruct any expected directory structure and filename conventions without forcing any changes/copies/downloads on the physical filesystem.

👍

> Still, I do not like that many satpy readers extract metadata from filenames and therefore set constraints on them, but at least this PR gives a solution on how to deal with it programmatically.

Interesting, we should discuss this next week (PCW)?

> If this PR gets merged, I would volunteer to add the FSFile support for the avhrr_l1b_gaclac reader.

Sounds great!

@mraspaud
Member Author

mraspaud commented Dec 1, 2020

This is ready for re-review and merging I think

@sfinkens
Member

sfinkens commented Dec 1, 2020

I don't have much expertise here, but to me it looks very thin and simple 👍 Far from re-inventing the wheel in my opinion.

Member

@djhoese djhoese left a comment

Do you want to re-title this since it includes changes to ABI? Maybe "Add experimental S3 support to OLCI and ABI readers"?

Do you want to do the self.nc property trick to the abi base reader too?

@mraspaud
Member Author

mraspaud commented Dec 1, 2020

Will do (both)

@mraspaud mraspaud changed the title Add support for s3 buckets in olci reader Add support for s3 buckets in OLCI and ABI l1 readers Dec 2, 2020
@mraspaud
Member Author

mraspaud commented Dec 2, 2020

Done

Member

@djhoese djhoese left a comment

One more small/optional request and one question. Otherwise LGTM.


def cached_property(func):
    """Port back functools.cached_property."""
    return property(lru_cache(maxsize=None)(func))
Member

So now you're going to hate me. What about moving this into its own _compat.py module so I can use it in other readers and it can be shared by the readers here?

Member Author

I can do that. And no I don't hate you :)

self.filename = str(filename)
else:
self.filename = filename
self.filename = filename
Member

This change will break pathlib objects for all other readers right? (assuming the low-level I/O library doesn't support them)

Member

That said, I'm ok with this.

Member Author

probably yes... Is this bad?

Member

I think @gerritholl was the first user to point out that Satpy didn't work with pathlib objects so he should maybe make the final decision. It probably isn't great that this breaks it for all other readers, but I'm not sure how many readers need strings for their lower-level I/O libraries either.
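
If a reader's low-level I/O library does need plain strings, one possible way to stay compatible with pathlib objects is to convert at the point of use with `os.fspath`, which accepts `str`, `pathlib.Path` and any other `os.PathLike`. The helper name `as_str_path` is hypothetical and not part of satpy; this is only a sketch of the option, not what the PR does.

```python
import os
from pathlib import Path


def as_str_path(filename):
    """Return ``filename`` as a plain path string for libraries that require one."""
    # os.fspath leaves str untouched and calls __fspath__ on path-like objects.
    return os.fspath(filename)


from_str = as_str_path("data/file.nc")
from_path = as_str_path(Path("data") / "file.nc")
```

With this approach the file handler can keep the original object in `self.filename`, and only readers that need a string pay for the conversion.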

satpy/_compat.py Outdated

def cached_property(func):
    """Port back functools.cached_property."""
    return property(lru_cache(maxsize=None)(func))
Member

Ah I was thinking putting the import of cached_property in here too. That way I could do from satpy._compat import cached_property with no try/except in my reader module. Thoughts?

Member Author

you got it.
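
A sketch of how the shared module could look once the import lives there too. The fallback body is the one quoted above; the `Reader` class is hypothetical and only there to exercise the decorator. In a real reader the first block would sit in `satpy/_compat.py` and the reader would only need `from satpy._compat import cached_property`.

```python
try:
    # Available in the standard library from Python 3.8 onwards.
    from functools import cached_property
except ImportError:
    from functools import lru_cache

    def cached_property(func):
        """Port back functools.cached_property."""
        return property(lru_cache(maxsize=None)(func))


class Reader:
    """Hypothetical reader using the decorator without its own try/except."""

    def __init__(self):
        self.loads = 0

    @cached_property
    def nc(self):
        # Stands in for opening the dataset; should run once per instance.
        self.loads += 1
        return {"opened": True}


reader = Reader()
first = reader.nc
second = reader.nc
```

The two branches are not fully equivalent: `functools.cached_property` stores the value on the instance, while the fallback is a read-only property backed by an unbounded cache keyed on the instance. For the "compute once per handler" use discussed here, both behave the same.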

Member

@djhoese djhoese left a comment

Great! Merge when you can and I'll use this cached_property compat decorator in my GAASP reader.
