GPM IMERG: benchmark virtual and zarr icechunk datasets for comparison with each other and current GES DISC time series service performance #2

abarciauskas-bgse · 2025-01-07T17:14:52Z

Christine Smit at GES DISC shared that their time series service can pull out 15 years of half-hourly data (about 260,000 time points) in about 11-12 seconds. She also shared a notebook and a Locust file for load-testing the service.
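
As a quick sanity check on these figures (not part of the shared notebook), the number of half-hourly steps in 15 years and the implied throughput can be computed directly. This assumes an average 365.25-day year:

```python
# Back-of-the-envelope check of the GES DISC figures quoted above.
HALF_HOURS_PER_DAY = 48
DAYS_PER_YEAR = 365.25  # average year length, folding in leap days
YEARS = 15

time_points = round(YEARS * DAYS_PER_YEAR * HALF_HOURS_PER_DAY)
print(time_points)  # 262980

# Implied throughput for the reported 11-12 s response time.
for seconds in (11, 12):
    print(f"{seconds} s -> {time_points / seconds:,.0f} time points/s")
```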

This issue tracks producing comparable benchmarks for the GPM IMERG virtual and Zarr datasets.

As noted in #1, the current virtual implementation may be quite slow to open and read from because of the size of its chunk manifests. If possible, I think we should run this type of benchmark at each stage of development, so we can measure performance as things currently stand and again after any improvements.
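
A minimal sketch of a stage-by-stage benchmark harness, assuming each stage can be expressed as two callables: one that opens the store and one that reads a time series from it. `benchmark_stage`, `open_store`, and `read_series` are hypothetical names, not a real library API. Timing open and read separately is meant to show whether manifest size mostly affects open time or read time.

```python
import statistics
import time
from typing import Any, Callable, Dict, List


def benchmark_stage(
    open_store: Callable[[], Any],
    read_series: Callable[[Any], Any],
    repeats: int = 5,
) -> Dict[str, float]:
    """Time opening a dataset and reading a time series, separately.

    open_store and read_series are hypothetical hooks for whatever a
    given development stage uses (e.g. a virtual store vs. a native
    Zarr copy); medians are used to damp outliers from caching/network.
    """
    open_times: List[float] = []
    read_times: List[float] = []
    for _ in range(repeats):
        t0 = time.perf_counter()
        ds = open_store()       # e.g. parse manifests / metadata
        t1 = time.perf_counter()
        read_series(ds)         # e.g. extract one grid cell over all time
        t2 = time.perf_counter()
        open_times.append(t1 - t0)
        read_times.append(t2 - t1)
    totals = [o + r for o, r in zip(open_times, read_times)]
    return {
        "open_median_s": statistics.median(open_times),
        "read_median_s": statistics.median(read_times),
        "total_median_s": statistics.median(totals),
    }


if __name__ == "__main__":
    # Stand-in callables so the sketch runs without any real data.
    fake_open = lambda: list(range(262_980))
    fake_read = lambda ds: sum(ds)
    print(benchmark_stage(fake_open, fake_read, repeats=3))
```

For a real comparison, `open_store` might wrap opening the virtual store or the native Zarr copy, and `read_series` might select one grid cell over the full time range; the medians from each stage could then be set against the GES DISC figure of 11-12 seconds.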

rabernat · 2025-01-09T17:43:36Z

Please share more details about the underlying data (chunks, etc.)
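
Chunking along the time axis largely determines the cost of a single-point time-series read: one (lat, lon) cell lies in exactly one spatial chunk, so the number of chunk reads is roughly the time length divided by the time-chunk size. A small sketch of that arithmetic; `chunks_for_point_series` is a hypothetical helper and the chunk sizes below are illustrative, not the actual IMERG layout:

```python
import math

# Length of the time axis from the GES DISC example (15 years, half-hourly).
N_TIME = 262_980


def chunks_for_point_series(n_time: int, time_chunk: int) -> int:
    """Number of chunks read to pull one grid cell's full time series.

    A single (lat, lon) point lies in exactly one spatial chunk, so the
    count depends only on how the time axis is chunked.
    """
    return math.ceil(n_time / time_chunk)


# Hypothetical time-chunk sizes, for illustration only.
layouts = {
    "1 step per chunk (e.g. one chunk per half-hourly file)": 1,
    "1,000 steps per chunk": 1_000,
    "365 days (17,520 steps) per chunk": 17_520,
}
for name, time_chunk in layouts.items():
    print(f"{name}: {chunks_for_point_series(N_TIME, time_chunk):,} chunk reads")
```

With these numbers the loop reports 262,980, 263 and 16 chunk reads respectively, which is why a layout that keeps one time step per chunk (as a virtual reference to per-file chunks may) can be expensive for time-series access.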

abarciauskas-bgse mentioned this issue on Jan 7, 2025 in GES DISC Use Case Exploration for Icechunk NASA-IMPACT/veda-odd#16 (closed)
