Skip to content

Latest commit

 

History

History
38 lines (18 loc) · 3.27 KB

Hyrax_S3New.adoc

File metadata and controls

38 lines (18 loc) · 3.27 KB

S3 Support

DMR++ provides direct access to data in S3, and we have made significant advances to supporting HDF5 files in the DMR++ builder and interpreter. We still have two gaps: support for certain Compound variables and support for some kinds of string arrays. This new release of Hyrax brings support for direct I/O transfers from HDF5 to NetCDF4 when using DMR++ .

We have added generic Memory and File caching, tailored specifically toward the cases that arise when serving data from S3 using the DMR++ system. We have added a BES module that can work with S3 using the DMR++ system. This provides a data flow that is similar to the one we provide for Hyrax in the Cloud as developer for NASA, but this new module does not make use of the NASA/ESDIS CMR system to resolve ‘NASA Granules’ to URLs. This will enable other groups to use the DMR++ system to serve data from S3.

We improved the performance of finding the effective URL for a data item when it is accessed via a series of redirect operations, the last of which is a signed AWS URL. This is a common case for data stored in S3.

We have added generic Memory and File caching, tailored specifically toward the cases that arise when serving data from S3 using the DMR++ system.

The BES can sign S3 URLs using the AWS V4 signing scheme. This uses the Credentials Manager system.

As of 1.16.8, we have added experimental support for DMR++ Aggregations in which multi file aggregations can be described in a single DMR++ file, reaping all of the efficiency benefits (and pitfalls) of DMR++ . Furthermore, Hyrax can generate signed S3 requests when processing DMR++ files whose data content live in S3 when the correct credentials are provided (injected) into the server.

Hyrax now implements lazy evaluation of DMR++ files. This change greatly improves efficiency/speed for requests that subset a dataset that contains a large number of variables as only the variables requested will have their Chunk information read and parsed.

Added version and configuration information to dmr files built using the `build_dmrpp` and get_dmrpp applications. This will enable people to recreate and understand the conditions which resulted in a particular + DMR + instance. This also includes a -z switch for get_dmrpp which will return its version.

The DMR++ production chain: get_dmrpp, build_dmrpp, check_dmrpp, merge_dmrpp, and reduce_mdf received the following updates:

  • Support for injecting configuration modifications to allow fine tuning of the dataset representation in the produced DMR++ file.

  • Optional creation and injection of missing (domain coordinate) data as needed.

  • Endian information carried in Chunks.

  • Updated command line options and help page.

Lastly, we have added support for S3 hosted granules to get_dmrpp. Added regression test suite for get_dmrpp.

Improved S3 reliability by adding retry efforts for common S3 error responses that indicate a retry is worth pursuing (because S3 just fails sometimes and a retry is suggested). We have also added caching of S3 “effective” URLs obtained from NGAP service chain.

For more on DMR++ , read the DMR++ wiki.