GPM IMERG: benchmark virtual and zarr icechunk datasets for comparison with each other and current GES DISC time series service performance #2

Open
abarciauskas-bgse opened this issue Jan 7, 2025 · 1 comment

Comments

@abarciauskas-bgse
Collaborator

Christine Smit at GES DISC shared that their time series service can extract 15 years of half-hourly data (about 260,000 time points) in about 11-12 seconds. She also shared a notebook and a locust file for testing the service's performance.

This issue tracks our goal of providing similar benchmarks for the GPM IMERG virtual and zarr datasets.

As noted in #1, the current virtual implementation may be quite slow to open and read the data back out because of the size of the chunk manifests. If possible, I think it would be good to run this type of benchmark at each stage of development, so we can measure performance as things currently stand and again after any improvements are made.
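
A minimal sketch of what such a repeatable benchmark could look like, using only the Python standard library. This is not the shared notebook or locust file: `benchmark`, `expected_time_points`, and `fake_read` are hypothetical names, and `fake_read` is a placeholder for whatever call actually opens a dataset and extracts a point time series.

```python
import statistics
import time
from typing import Callable


def expected_time_points(years: float, steps_per_day: int = 48) -> int:
    """Approximate number of time steps, assuming 365.25 days per year.

    48 steps per day corresponds to half-hourly data.
    """
    return round(years * 365.25 * steps_per_day)


def benchmark(read_fn: Callable[[], object], repeats: int = 5) -> dict:
    """Call read_fn `repeats` times and summarize wall-clock seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        read_fn()
        timings.append(time.perf_counter() - start)
    return {
        "runs": repeats,
        "min_s": min(timings),
        "median_s": statistics.median(timings),
        "max_s": max(timings),
    }


def fake_read() -> int:
    # Hypothetical stand-in: replace with the real open-and-read call
    # for the virtual or zarr dataset being measured.
    return sum(range(10_000))


if __name__ == "__main__":
    print("expected time points:", expected_time_points(15))
    print(benchmark(fake_read, repeats=3))
```

Under the 365.25-day assumption, `expected_time_points(15)` gives 262,980, which is consistent with the "about 260,000" figure above. Running the same harness against each dataset at each stage of development would give numbers that can be compared with the 11-12 second baseline.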

@rabernat
Contributor

rabernat commented Jan 9, 2025

Please share more details about the underlying data (chunks, etc.)
