Merge pull request #166 from NREL/gb/zarr

added zarr example readme and added to docs

Showing 3 changed files with 63 additions and 0 deletions.
@@ -6,3 +6,4 @@ Examples
examples.wind
examples.us_wave
examples.hsds
examples.zarr
@@ -0,0 +1,2 @@
.. include:: ../../../examples/zarr/README.rst
   :start-line: 0
@@ -0,0 +1,60 @@
Zarr
====

You can use `Zarr <https://zarr.dev/>`_ to open NREL h5 resource files hosted on AWS S3. In our internal tests, this approach performs comparably to reading data through a local HSDS server. The benefit of this approach is that you don't need to run an HSDS server; the drawback is that you have to handle a large metadata file for every h5 file you access on S3.

We have not yet integrated this into the ``rex`` resource handler classes and are evaluating whether doing so is worthwhile.
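
For comparison, the HSDS-based access mentioned above looks roughly like the sketch below. This assumes h5pyd and an HSDS endpoint configured as described in the HSDS example; the ``/nrel/nsrdb/v3/nsrdb_2020.h5`` domain path is an assumed example and may differ for the file you want.

.. code-block:: python

    # Rough sketch of the equivalent HSDS-based read, for comparison only.
    # Assumes HSDS/h5pyd are configured per the HSDS example; the domain
    # path below is an assumed example path.
    from rex import NSRDBX

    with NSRDBX('/nrel/nsrdb/v3/nsrdb_2020.h5', hsds=True) as f:
        # rex applies the dataset scale factor automatically
        ghi = f['ghi', :, 0]  # GHI time series at site 0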

Extra Requirements
------------------

You may need some additional software beyond the rex requirements to run this example:

.. code-block:: bash

    pip install s3fs zarr fsspec kerchunk
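
To verify the installation, you can try importing the packages (a quick sanity check, not a required step):

.. code-block:: bash

    python -c "import s3fs, zarr, fsspec, kerchunk"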

Code Example
------------

To open an h5 file hosted on AWS S3, follow the code example below. Keep these caveats in mind:

- Change ``s3_path`` and ``meta_path`` to your desired paths.
- The metadata file at ``meta_path`` may take a long time to generate, typically a few minutes but in rare cases up to an hour.
- The metadata file will be a few hundred MB and should be saved on your local hard drive.
- A separate metadata file is required for every h5 file, even for files that share the same spatial meta data or time index.
- Take care to apply dataset scale factors to convert from integer precision to physical units. In this case the GHI scale factor is just 1, but it is often greater than 1. The rex resource handlers do this automatically, but you need to do it manually when reading the data directly as shown here.

.. code-block:: python

    import fsspec
    import ujson
    import zarr
    from pathlib import Path
    from kerchunk.hdf import SingleHdf5ToZarr

    # Source h5 file on S3 and the local path for its kerchunk metadata file
    s3_path = 's3://nrel-pds-nsrdb/current/nsrdb_2020.h5'
    meta_path = "./nsrdb_2020.json"

    # Options for anonymous, uncached reads from the public S3 bucket
    storage_opts = dict(mode="rb", anon=True, default_fill_cache=False,
                        default_cache_type="none")

    # Scan the remote h5 file and build the zarr reference metadata
    h5chunks = SingleHdf5ToZarr(s3_path, storage_options=storage_opts,
                                inline_threshold=0)

    # Only generate the metadata file if it doesn't already exist
    # (this step can take a few minutes up to an hour)
    metadata_json_path = Path(meta_path)
    if not metadata_json_path.exists():
        with open(metadata_json_path, 'wb') as out:
            out.write(ujson.dumps(h5chunks.translate()).encode())

    # Map the local reference metadata onto the remote chunks on S3
    with open(metadata_json_path, "rb") as f:
        mapper = fsspec.get_mapper("reference://", fo=ujson.load(f),
                                   remote_protocol="s3",
                                   remote_options={"anon": True})

    # Open the mapped file as a zarr group and read the GHI time series
    # at the first site, applying the dataset scale factor
    data = zarr.open(mapper)
    arr = data['ghi'][:, 0] / data['ghi'].attrs["scale_factor"]

    print(list(data.keys()))
    print(data['ghi'], data['ghi'].shape, data['ghi'].attrs)
    print(arr)
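
Once the metadata file has been generated, subsequent reads are cheap because only the small reference lookup and the requested chunks are fetched. As a usage sketch, the pattern above can be wrapped in a small helper function; ``read_nrel_dataset`` below is a hypothetical name, not part of ``rex``:

.. code-block:: python

    # Hypothetical convenience wrapper around the pattern above; not part
    # of rex. Reuses a previously generated kerchunk reference file.
    import fsspec
    import ujson
    import zarr

    def read_nrel_dataset(meta_path, dset, site):
        """Read one dataset at one site, applying the scale factor."""
        with open(meta_path, "rb") as f:
            mapper = fsspec.get_mapper("reference://", fo=ujson.load(f),
                                       remote_protocol="s3",
                                       remote_options={"anon": True})
        data = zarr.open(mapper)
        # Default to 1 if the dataset has no scale factor (an assumption)
        scale = data[dset].attrs.get("scale_factor", 1)
        return data[dset][:, site] / scale

    ghi = read_nrel_dataset("./nsrdb_2020.json", "ghi", 0)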