DMR++ provides direct access to data in S3, and we have made significant advances to supporting HDF5 files in the DMR++ builder and interpreter. We still have two gaps: support for certain Compound variables and support for some kinds of string arrays. This new release of Hyrax brings support for direct I/O transfers from HDF5 to NetCDF4 when using DMR++ .
We have added generic Memory and File caching, tailored specifically toward the cases that arise when serving data from S3 using the DMR++ system. We have added a BES module that can work with S3 using the DMR++ system. This provides a data flow that is similar to the one we provide for Hyrax in the Cloud as developer for NASA, but this new module does not make use of the NASA/ESDIS CMR system to resolve ‘NASA Granules’ to URLs. This will enable other groups to use the DMR++ system to serve data from S3.
We improved the performance of finding the effective URL for a data item when it is accessed via a series of redirect operations, the last of which is a signed AWS URL. This is a common case for data stored in S3.
We have added generic Memory and File caching, tailored specifically toward the cases that arise when serving data from S3 using the DMR++ system.
The BES can sign S3 URLs using the AWS V4 signing scheme. This uses the Credentials Manager system.
As of 1.16.8, we have added experimental support for DMR++ Aggregations in which multi file aggregations can be described in a single DMR++ file, reaping all of the efficiency benefits (and pitfalls) of DMR++ . Furthermore, Hyrax can generate signed S3 requests when processing DMR++ files whose data content live in S3 when the correct credentials are provided (injected) into the server.
Hyrax now implements lazy evaluation of DMR++ files. This change greatly improves efficiency/speed for requests that subset a dataset that contains a large number of variables as only the variables requested will have their Chunk information read and parsed.
Added version and configuration information to dmr files built using the `build_dmrpp` and get_dmrpp applications. This will enable people to recreate and understand the conditions which resulted in a particular + DMR + instance. This also includes a -z switch for get_dmrpp which will return its version.
The DMR++ production chain: get_dmrpp, build_dmrpp, check_dmrpp, merge_dmrpp, and reduce_mdf received the following updates:
-
Support for injecting configuration modifications to allow fine tuning of the dataset representation in the produced DMR++ file.
-
Optional creation and injection of missing (domain coordinate) data as needed.
-
Endian information carried in Chunks.
-
Updated command line options and help page.
Lastly, we have added support for S3 hosted granules to get_dmrpp. Added regression test suite for get_dmrpp.
Improved S3 reliability by adding retry efforts for common S3 error responses that indicate a retry is worth pursuing (because S3 just fails sometimes and a retry is suggested). We have also added caching of S3 “effective” URLs obtained from NGAP service chain.
For more on DMR++ , read the DMR++ wiki.