Feature/hdf4 subdatasets #410
podpac/core/data/rasterio_source.py (outdated)

```python
@property
def subdatasets(self):
    return self.dataset.subdatasets
```
We could potentially make the subdatasets different outputs and then open multiple datasets to composite the results. That would take another branch in the `get_data` function, but would result in a much nicer user experience.
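A minimal sketch of what "subdatasets as different outputs" could look like: derive one output name per subdataset from the last colon-separated field of each GDAL subdataset string (the format of the `df.subdatasets` listing later in this thread). `subdataset_outputs` is a hypothetical helper, not part of PODPAC or rasterio, and it assumes the field name itself contains no colon.

```python
def subdataset_outputs(subdatasets):
    """Map a short output name to each full GDAL subdataset string.

    Hypothetical helper (not a PODPAC or rasterio API). Assumes the field
    name is the text after the last colon, optionally wrapped in quotes.
    """
    outputs = {}
    for sds in subdatasets:
        name = sds.rsplit(":", 1)[-1].strip('"')
        if name in outputs:
            raise ValueError(f"duplicate subdataset name: {name!r}")
        outputs[name] = sds
    return outputs


# Two of the MOD13Q1 subdataset strings, in the form rasterio reports them.
subdatasets = [
    "HDF4_EOS:EOS_GRID:MOD13Q1.A2013033.h08v05.006.2015256072248.hdf:"
    "MODIS_Grid_16DAY_250m_500m_VI:250m 16 days NDVI",
    "HDF4_EOS:EOS_GRID:MOD13Q1.A2013033.h08v05.006.2015256072248.hdf:"
    "MODIS_Grid_16DAY_250m_500m_VI:250m 16 days EVI",
]
outputs = subdataset_outputs(subdatasets)
print(list(outputs))  # ['250m 16 days NDVI', '250m 16 days EVI']
```

Under this design, the extra `get_data` branch would open each value with `rasterio.open` and composite the per-output arrays.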
Yeah. A dataset can already have multiple outputs, at least if you open it with the xarray `Dataset` or `H5PY` nodes. Could you end up with a situation with several subdatasets that each have multiple outputs, but no way to have "nested" outputs?
That makes me want to scream -- I don't know. I hope the answer is no, but I suspect it's a resounding yes.
Well, the simple answer then is not to make the subdatasets different outputs, and basically require one node per subdataset. Hopefully the file can be opened concurrently by multiple read-only file pointers.
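With one node per subdataset, each node needs exactly one subdataset string. A minimal sketch, assuming a hypothetical `select_subdataset` helper that lets a user name the field (for example `"250m 16 days NDVI"`) and, on a miss, fails with the list of available names so the user at least learns why nothing was read:

```python
def select_subdataset(subdatasets, field):
    """Return the one subdataset string whose final field equals ``field``.

    Hypothetical helper for a one-node-per-subdataset design; assumes the
    field name follows the last colon and may be wrapped in quotes.
    """
    names = [sds.rsplit(":", 1)[-1].strip('"') for sds in subdatasets]
    matches = [sds for sds, name in zip(subdatasets, names) if name == field]
    if len(matches) != 1:
        raise ValueError(
            f"expected exactly one subdataset named {field!r}; available: {names}"
        )
    return matches[0]


subdatasets = [
    'HDF4_EOS:EOS_GRID:"MOD13Q1.A2013033.h08v05.006.2015256072248.hdf":'
    'MODIS_Grid_16DAY_250m_500m_VI:"250m 16 days NDVI"',
    'HDF4_EOS:EOS_GRID:"MOD13Q1.A2013033.h08v05.006.2015256072248.hdf":'
    'MODIS_Grid_16DAY_250m_500m_VI:"250m 16 days EVI"',
]
print(select_subdataset(subdatasets, "250m 16 days NDVI"))
```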
Based on the research I've done, there should not be any multi-band subdatasets.
I think I'll end up making the subdatasets multiple outputs, to make it really easy for users. I've now heard from two people that they gave up on PODPAC because they couldn't read the HDF files and didn't know why. So it should just work, without users having to know about the intricacies of subdatasets.
Also, the S3 case is not a real use case, since reading an HDF file from S3 seems to involve reading the whole file anyway. In other words, it's very slow.
A related question: right now we read band 1 by default. Should we just go ahead and read all bands by default? What do you think?
Okay. Yeah, I guess we should read all of the bands by default. The `FileKeysMixin` that is used by `CSV`, `Zarr`, `Dataset`, and `H5PY` loads all available `data_keys` by default as separate `outputs`, unless there is only one, in which case `outputs` is `None` and the node is a standard single-output node.
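```python
# In FileKeysMixin (traitlets imported as tl)
@tl.default("data_key")
def _default_data_key(self):
    # One key: use it directly; several keys: use them all.
    if len(self.available_data_keys) == 1:
        return self.available_data_keys[0]
    else:
        return self.available_data_keys

@tl.default("outputs")
def _default_outputs(self):
    # A list of keys makes this a multi-output node; otherwise single-output.
    if not isinstance(self.data_key, list):
        return None
    else:
        return self.data_key
```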
`Rasterio` should be basically the same: something like allowing `band` to be either a list or a single value, and using this:
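```python
@tl.default("band")
def _default_band(self):
    # Default to the only band, or to all bands when there are several.
    if len(self.band_keys) == 1:
        return self.band_keys[0]
    else:
        return self.band_keys

@tl.default("outputs")
def _default_outputs(self):
    # Several bands become separate outputs; one band gives a single-output node.
    if not isinstance(self.band, list):
        return None
    else:
        return self.band
```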
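To check the defaulting rule in isolation, here is a plain-Python stand-in for the two traitlets defaults above (no traitlets needed). `BandDefaults` and `band_count` are hypothetical; in a real node the count would come from the opened rasterio dataset's `count`, and rasterio numbers bands starting at 1, so the sketch assumes band keys `1..band_count`.

```python
class BandDefaults:
    """Hypothetical plain-Python mirror of the _default_band/_default_outputs logic."""

    def __init__(self, band_count, band=None):
        # Assumption: band keys are rasterio's 1-based band indexes.
        self.band_keys = list(range(1, band_count + 1))
        if band is None:
            # A single band becomes a scalar; several bands become a list.
            band = self.band_keys[0] if len(self.band_keys) == 1 else self.band_keys
        self.band = band

    @property
    def outputs(self):
        # Multi-output only when a list of bands is selected.
        return self.band if isinstance(self.band, list) else None


print(BandDefaults(1).outputs)  # None
print(BandDefaults(3).outputs)  # [1, 2, 3]
print(BandDefaults(3, band=2).outputs)  # None
```

Passing an explicit `band` keeps the current single-band behavior available, while the default reads every band.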
Do you want to make an issue for this, or fix it now before merging this PR? Do you have time, or shall I?
I think fixing this inside this PR is fine. And don't worry about timing -- this is complex enough that I think we should do it for the next release and not rush something.
I ended up creating #420 for this instead. This PR was getting pretty old and it should be in develop to support our team right now.
I don't really know what to do here. When I try to open the subdataset using rasterio (no podpac), I get "No such file or directory". Am I doing it correctly?

```
>>> rasterio.open('HDF4_EOS:EOS_GRID:"MOD13Q1.A2013033.h08v05.006.2015256072248.hdf":MODIS_Grid_16DAY_250m_500m_VI:"250m 16 days NDVI"')
Traceback (most recent call last):
  File "rasterio/_base.pyx", line 216, in rasterio._base.DatasetBase.__init__
  File "rasterio/_shim.pyx", line 67, in rasterio._shim.open_dataset
  File "rasterio/_err.pyx", line 205, in rasterio._err.exc_wrap_pointer
rasterio._err.CPLE_OpenFailedError: HDF4_EOS:EOS_GRID:"MOD13Q1.A2013033.h08v05.006.2015256072248.hdf":MODIS_Grid_16DAY_250m_500m_VI:"250m 16 days NDVI": No such file or directory

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/jmilloy/Creare/Pipeline/_podpac38_/lib/python3.8/site-packages/rasterio/env.py", line 433, in wrapper
    return f(*args, **kwds)
  File "/home/jmilloy/Creare/Pipeline/_podpac38_/lib/python3.8/site-packages/rasterio/__init__.py", line 218, in open
    s = DatasetReader(path, driver=driver, sharing=sharing, **kwargs)
  File "rasterio/_base.pyx", line 218, in rasterio._base.DatasetBase.__init__
rasterio.errors.RasterioIOError: HDF4_EOS:EOS_GRID:"MOD13Q1.A2013033.h08v05.006.2015256072248.hdf":MODIS_Grid_16DAY_250m_500m_VI:"250m 16 days NDVI": No such file or directory
```

When I try to just use rasterio to open the hdf file directly, it doesn't work, either. Should it?
What version of rasterio are you using? Do you have hdf4 installed?
rasterio 1.1.5. I'm checking hdf4 now. I guess it is not part of the extra dependencies?
Oh, it's a system library. Okay, I would have expected a better error message from rasterio, but installing hdf4 should help!
No difference.
hdf4 should come with rasterio if it is installed through conda. I'm using 1.1.3 and it works fine.
What's the
So, `df = rasterio.open('MOD13Q1.A2013033.h08v05.006.2015256072248.hdf')` followed by `df.subdatasets` should work. Also, what's the file size? Maybe something got broken in transfer. It should be 189 MB (198,229,812 bytes).
The file is in the cwd. The fact that the
seems okay? 190M, or 198229812 bytes. md5sum: 1d41e6e86c0a247581b66666bd9c9f9b
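To rule out a broken transfer from Python rather than the shell, a minimal sketch that computes the same two numbers compared here: size in bytes and MD5 digest. `file_fingerprint` is a hypothetical name; it uses only the standard-library `hashlib` and `os` modules and reads in chunks, so a ~190 MB file is never held in memory at once.

```python
import hashlib
import os


def file_fingerprint(path, chunk_size=1 << 20):
    """Return (size_in_bytes, md5_hexdigest) for ``path``, reading in chunks."""
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel stops at the first empty read (end of file).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)
    return os.path.getsize(path), md5.hexdigest()


# Values for the MOD13Q1 file as reported and confirmed in this thread.
EXPECTED = (198229812, "1d41e6e86c0a247581b66666bd9c9f9b")
```

For an intact copy, `file_fingerprint("MOD13Q1.A2013033.h08v05.006.2015256072248.hdf") == EXPECTED` should be `True`.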
Yep, MD5SUM is correct... hmmm... A mystery... Here's some of my output:

```
In [1]: import rasterio

In [2]: df = rasterio.open("MOD13Q1.A2013033.h08v05.006.2015256072248.hdf")
C:\Anaconda3\envs\soilmap\lib\site-packages\rasterio\__init__.py:219: NotGeoreferencedWarning: Dataset has no geotransform set. The identity matrix may be returned.
  s = DatasetReader(path, driver=driver, sharing=sharing, **kwargs)

In [3]: df.driver
Out[3]: 'HDF4'

In [4]: df.subdatasets
Out[4]:
['HDF4_EOS:EOS_GRID:MOD13Q1.A2013033.h08v05.006.2015256072248.hdf:MODIS_Grid_16DAY_250m_500m_VI:250m 16 days NDVI',
 'HDF4_EOS:EOS_GRID:MOD13Q1.A2013033.h08v05.006.2015256072248.hdf:MODIS_Grid_16DAY_250m_500m_VI:250m 16 days relative azimuth angle',
 'HDF4_EOS:EOS_GRID:MOD13Q1.A2013033.h08v05.006.2015256072248.hdf:MODIS_Grid_16DAY_250m_500m_VI:250m 16 days composite day of the year',
 'HDF4_EOS:EOS_GRID:MOD13Q1.A2013033.h08v05.006.2015256072248.hdf:MODIS_Grid_16DAY_250m_500m_VI:250m 16 days pixel reliability',
 'HDF4_EOS:EOS_GRID:MOD13Q1.A2013033.h08v05.006.2015256072248.hdf:MODIS_Grid_16DAY_250m_500m_VI:250m 16 days EVI',
 'HDF4_EOS:EOS_GRID:MOD13Q1.A2013033.h08v05.006.2015256072248.hdf:MODIS_Grid_16DAY_250m_500m_VI:250m 16 days VI Quality',
 'HDF4_EOS:EOS_GRID:MOD13Q1.A2013033.h08v05.006.2015256072248.hdf:MODIS_Grid_16DAY_250m_500m_VI:250m 16 days red reflectance',
 'HDF4_EOS:EOS_GRID:MOD13Q1.A2013033.h08v05.006.2015256072248.hdf:MODIS_Grid_16DAY_250m_500m_VI:250m 16 days NIR reflectance',
 'HDF4_EOS:EOS_GRID:MOD13Q1.A2013033.h08v05.006.2015256072248.hdf:MODIS_Grid_16DAY_250m_500m_VI:250m 16 days blue reflectance',
 'HDF4_EOS:EOS_GRID:MOD13Q1.A2013033.h08v05.006.2015256072248.hdf:MODIS_Grid_16DAY_250m_500m_VI:250m 16 days MIR reflectance',
 'HDF4_EOS:EOS_GRID:MOD13Q1.A2013033.h08v05.006.2015256072248.hdf:MODIS_Grid_16DAY_250m_500m_VI:250m 16 days view zenith angle',
 'HDF4_EOS:EOS_GRID:MOD13Q1.A2013033.h08v05.006.2015256072248.hdf:MODIS_Grid_16DAY_250m_500m_VI:250m 16 days sun zenith angle']
```

Do you have GDAL installed? You could try
Just ideas...
Try:
Yeah, there's no HDF4 driver. I still can't figure out how to install it.
`gdalinfo` worked; the file is fine.
I'm trying conda now. If that doesn't work, you might be best off just merging this, since it is working for you. I'd prefer an explicit check for the subdataset string instead of a try/except.
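One way to do the explicit check is to test whether the source string starts with a GDAL subdataset prefix before calling `rasterio.open`. A minimal sketch: `is_subdataset` is hypothetical, and the prefix tuple is an assumption covering the HDF4, HDF5, and netCDF subdataset naming forms, not an exhaustive list of GDAL drivers.

```python
# Assumed, non-exhaustive prefixes of GDAL subdataset names,
# e.g. 'HDF4_EOS:EOS_GRID:"file.hdf":grid:field'.
SUBDATASET_PREFIXES = ("HDF4_EOS:", "HDF4_SDS:", "HDF5:", "NETCDF:")


def is_subdataset(source):
    """Return True if ``source`` looks like a GDAL subdataset string.

    Hypothetical helper: a plain prefix test that never touches the file.
    """
    return source.startswith(SUBDATASET_PREFIXES)


for source in [
    "MOD13Q1.A2013033.h08v05.006.2015256072248.hdf",
    'HDF4_EOS:EOS_GRID:"MOD13Q1.A2013033.h08v05.006.2015256072248.hdf":'
    'MODIS_Grid_16DAY_250m_500m_VI:"250m 16 days NDVI"',
]:
    print(is_subdataset(source))  # False, then True
```

One argument for this over try/except: an unrelated open failure, such as the missing HDF4 driver seen here, surfaces as its own error instead of being silently routed into the subdataset branch.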
Okay, conda was able to include the hdf4 driver for rasterio. I had no trouble opening the file locally using the
Approved, but with a preference for the one change.
Support reading subdatasets in the rasterio Node.
This is motivated by reading MODIS data downloaded from the source.