Add regridding benchmark #1557
Do we have enough context here to add this to the benchmark post? If you give me bullet points and an image (if one makes sense), I'm happy to write up words.
Hi! I'm one of the de facto maintainers of xESMF, and I think this benchmark is a good start!
To make the test more demanding and exercise the areas where xESMF most needs improvement, I would suggest using a very large grid as either input or output, with some chunking across the spatial dimensions.
IIUC, these benchmarks are geared more towards Dask? Another bottleneck of xESMF is the generation of the weights (the Regridder initialization) with very large grids and more complex methods ("conservative"). That part is neither parallelized nor lazy, so benchmarking it might be out of scope here.
We do have some code to generate the weights in parallel, but I would say it is still experimental and of limited scope.
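To make the distinction between weight generation and weight application concrete, here is a minimal pure-Python sketch. It is not xESMF's API: the function names `conservative_weights_1d` and `apply_weights` are hypothetical, and the 1-D overlap-fraction scheme is a toy stand-in for what ESMF computes on real 2-D grids. It only shows that the weights are built once from the grids, then reused for every time step.

```python
def conservative_weights_1d(src_edges, dst_edges):
    """Build sparse weights as {dst_index: {src_index: weight}}.

    Toy assumption: each weight is the fraction of the destination cell
    that is covered by the source cell (1-D conservative remapping).
    """
    weights = {}
    for j in range(len(dst_edges) - 1):
        d_lo, d_hi = dst_edges[j], dst_edges[j + 1]
        row = {}
        for i in range(len(src_edges) - 1):
            s_lo, s_hi = src_edges[i], src_edges[i + 1]
            overlap = min(d_hi, s_hi) - max(d_lo, s_lo)
            if overlap > 0:
                row[i] = overlap / (d_hi - d_lo)
        weights[j] = row
    return weights


def apply_weights(weights, field):
    """Apply precomputed weights to one time step of source values."""
    return [
        sum(w * field[i] for i, w in row.items())
        for _, row in sorted(weights.items())
    ]


# Generated once: this is the step that is neither lazy nor parallel.
weights = conservative_weights_1d([0, 1, 2, 3, 4], [0, 2, 4])

# Applied independently per time step: this part parallelizes trivially.
time_steps = [[1, 3, 5, 7], [2, 2, 2, 2]]
regridded = [apply_weights(weights, step) for step in time_steps]
```

In this sketch the cost of the first step grows with the product of the grid sizes, while the second step only touches the non-zero weights, which is one way to see why very large grids stress initialization more than the regridding itself.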
@aulemahal: Thanks for the input. I'll add a follow-up issue to look into some of the suggestions for increasing the complexity of this workload.
It is, but it's also aimed at reflecting real workloads. Would more complex methods also result in more complex computations or just in more complex weight generation?
Please let us know if we can help with anything from a Dask perspective.
I'll whip something up. At first glance, this benchmark seems to do alright; it's mostly an embarrassingly parallel computation. Performance and # of tasks could probably look better, but that's already a lot better than some of the other benchmarks.
Mostly more complex weight generation, which is entirely on the ESMF side, so partly in C/Fortran, I think. Maybe two very different grids (curvilinear ones, for example) and more complex methods would produce weights with more connected nodes, but I don't think this would affect the computation much.
xarray-regrid is a much less established tool than The resulting dask workload is very similar to The case in which we have chunking along the dimensions to regrid would also be interesting to add to your benchmark, but I don't know of any publicly available equivalents to the GCP ERA5 ARCO stores with that sort of chunking.
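A rough sketch of why chunking along the regridded dimensions changes the workload: each output chunk may need several input chunks, so the task graph stops being one-to-one. The helper `chunk_dependencies` and the hard-coded weights below are hypothetical and purely illustrative; they are not taken from xESMF, xarray-regrid, or Dask.

```python
def chunk_dependencies(weights, src_chunk, dst_chunk):
    """Map each destination chunk index to the source chunks it reads.

    `weights` is a sparse mapping {dst_index: {src_index: weight}} and the
    chunk sizes are counted in cells along a single regridded dimension.
    """
    deps = {}
    for dst_index, row in weights.items():
        needed = deps.setdefault(dst_index // dst_chunk, set())
        for src_index in row:
            needed.add(src_index // src_chunk)
    return deps


# Hypothetical weights: 3 destination cells, each fed by 2 of 6 source cells.
weights = {
    0: {0: 0.5, 1: 0.5},
    1: {2: 0.5, 3: 0.5},
    2: {4: 0.5, 5: 0.5},
}

# Source chunked in blocks of 3 cells, destination in blocks of 1 cell.
deps = chunk_dependencies(weights, src_chunk=3, dst_chunk=1)
```

Here the middle destination chunk straddles a source chunk boundary and depends on both source chunks, which is the kind of cross-chunk communication a time-chunked-only benchmark never exercises.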
@slevang, thanks for the additional input! Would you be interested in contributing a benchmark implemented with
small comment, otherwise lgtm
thx
Closes #1556