S3 Support

DMR++ provides direct access to data in S3, and we have made significant advances in supporting HDF5 files in the DMR++ builder and interpreter. Two gaps remain: support for certain Compound variables and support for some kinds of string arrays. This release of Hyrax adds support for direct I/O transfers from HDF5 to NetCDF4 when using DMR++.

We have added generic memory and file caching, tailored specifically toward the cases that arise when serving data from S3 using the DMR++ system. We have also added a BES module that can work with S3 using the DMR++ system. This provides a data flow similar to the one we provide for Hyrax in the Cloud as developed for NASA, but the new module does not use the NASA/ESDIS CMR system to resolve ‘NASA Granules’ to URLs. This will enable other groups to use the DMR++ system to serve data from S3.

We improved the performance of finding the effective URL for a data item when it is accessed via a series of redirect operations, the last of which is a signed AWS URL. This is a common case for data stored in S3.
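As a rough illustration of the idea (not the BES implementation), the sketch below follows a redirect chain to its last URL and remembers the result so later requests skip the chain. The names `resolve_effective_url` and `next_hop` are hypothetical, and the lookup function is injected so the example runs without network access.

```python
from typing import Callable, Dict, Optional

# Hypothetical lookup type: given a URL, return the redirect target,
# or None when the URL answers directly (no further redirect).
RedirectLookup = Callable[[str], Optional[str]]


def resolve_effective_url(url: str, next_hop: RedirectLookup,
                          cache: Dict[str, str], max_hops: int = 10) -> str:
    """Follow redirects from url until a terminal URL is reached.

    The terminal ("effective") URL is stored in cache under the original
    URL. A real cache would also need to expire entries, because signed
    URLs are only valid for a limited time; that is omitted here.
    """
    if url in cache:
        return cache[url]
    current = url
    for _ in range(max_hops):
        target = next_hop(current)
        if target is None:
            # No further redirect: current is the effective URL.
            cache[url] = current
            return current
        current = target
    raise RuntimeError(f"more than {max_hops} redirects for {url}")
```

With a cache in place, only the first request for a data item pays the cost of walking the redirect chain.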

The BES can sign S3 URLs using the AWS V4 signing scheme. This uses the Credentials Manager system.
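For readers unfamiliar with the scheme, here is a minimal sketch of the signing-key derivation step of AWS Signature Version 4, written with the Python standard library. It shows the generic algorithm only and is not the BES or Credentials Manager code; the function names are hypothetical, and building the canonical request and string to sign is left out.

```python
import hashlib
import hmac


def _hmac_sha256(key: bytes, message: str) -> bytes:
    """Return the raw HMAC-SHA256 digest of message under key."""
    return hmac.new(key, message.encode("utf-8"), hashlib.sha256).digest()


def derive_signing_key(secret_key: str, date_stamp: str,
                       region: str, service: str = "s3") -> bytes:
    """Derive the SigV4 signing key.

    date_stamp is the request date in YYYYMMDD form. Each step keys the
    next HMAC with the previous result, so the final key is scoped to
    one day, one region, and one service.
    """
    k_date = _hmac_sha256(("AWS4" + secret_key).encode("utf-8"), date_stamp)
    k_region = _hmac_sha256(k_date, region)
    k_service = _hmac_sha256(k_region, service)
    return _hmac_sha256(k_service, "aws4_request")


def sign(signing_key: bytes, string_to_sign: str) -> str:
    """Return the hex signature that goes into the signed request."""
    return hmac.new(signing_key, string_to_sign.encode("utf-8"),
                    hashlib.sha256).hexdigest()
```

The secret key itself never appears in the request; only the signature computed from the derived key does.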

As of 1.16.8, we have added experimental support for DMR++ Aggregations, in which multi-file aggregations can be described in a single DMR++ file, reaping all of the efficiency benefits (and pitfalls) of DMR++. Furthermore, Hyrax can generate signed S3 requests when processing DMR++ files whose data content lives in S3, provided the correct credentials are injected into the server.

Hyrax now implements lazy evaluation of DMR++ files. This change greatly improves speed for requests that subset a dataset containing a large number of variables, because only the requested variables have their Chunk information read and parsed.
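The general pattern can be sketched as follows. This is an illustration of lazy evaluation, not Hyrax's C++ code: `LazyVariable` and the `load_chunks` callback are hypothetical names, and the callback stands in for whatever reads and parses the Chunk entries for one variable.

```python
from typing import Callable, List, Optional


class LazyVariable:
    """A variable whose chunk list is parsed only on first access."""

    def __init__(self, name: str,
                 load_chunks: Callable[[str], List[dict]]) -> None:
        self.name = name
        self._load_chunks = load_chunks
        self._chunks: Optional[List[dict]] = None  # not parsed yet

    @property
    def chunks(self) -> List[dict]:
        # Parse on first use, then reuse the parsed result.
        if self._chunks is None:
            self._chunks = self._load_chunks(self.name)
        return self._chunks
```

A request that touches two variables out of several hundred then triggers two parses rather than several hundred.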

Added version and configuration information to DMR++ files built using the build_dmrpp and get_dmrpp applications. This will enable people to recreate and understand the conditions that resulted in a particular DMR++ instance. This also includes a -z switch for get_dmrpp, which returns its version.

The DMR++ production chain (get_dmrpp, build_dmrpp, check_dmrpp, merge_dmrpp, and reduce_mdf) received the following updates:

Support for injecting configuration modifications to allow fine tuning of the dataset representation in the produced DMR++ file.
Optional creation and injection of missing (domain coordinate) data as needed.
Endian information carried in Chunks.
Updated command line options and help page.

Lastly, we have added support for S3-hosted granules to get_dmrpp, along with a regression test suite for get_dmrpp.

Improved S3 reliability by retrying requests after common S3 error responses that indicate a retry is worthwhile, since S3 requests occasionally fail transiently. We have also added caching of S3 “effective” URLs obtained from the NGAP service chain.
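A retry policy of this kind might look like the sketch below. It is an assumption-laden illustration rather than the Hyrax implementation: the set of retryable status codes, the attempt count, and the exponential backoff delays are all chosen for the example, and `fetch_with_retry` is a hypothetical name.

```python
import time
from typing import Callable, Tuple

# Assumption: status codes treated as transient for this example.
# This is not the list Hyrax uses.
RETRYABLE_STATUS = frozenset({500, 502, 503, 504})


def fetch_with_retry(request: Callable[[], Tuple[int, bytes]],
                     attempts: int = 4,
                     base_delay: float = 0.5,
                     sleep: Callable[[float], None] = time.sleep
                     ) -> Tuple[int, bytes]:
    """Call request() until it returns a non-retryable status.

    request returns (status, body). Delays double after each failed
    attempt: base_delay, 2 * base_delay, 4 * base_delay, and so on.
    The last response is returned if every attempt is retryable.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        status, body = request()
        if status not in RETRYABLE_STATUS:
            return status, body
        if attempt < attempts - 1:
            sleep(base_delay * 2 ** attempt)
    return status, body
```

Passing `sleep` as a parameter keeps the function easy to test, since a test can record the delays instead of waiting for them.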

For more on DMR++, read the DMR++ wiki.
