Avoid excess allocations in buildUpdateMap #757

Merged
merged 1 commit into ORNL:master on Mar 20, 2024
Conversation

mattrjackson
Contributor

First of all, thanks for an excellent library. I use TASMANIAN on almost a daily basis for a combination of uncertainty quantification and surrogate dataset generation.

When building larger surrogate datasets, I noticed that runtimes grew rapidly relative to the number of points added during surplus refinement. For some of my faster-running models, TASMANIAN was accounting for 99%+ of the runtime, and most of that was spent in buildUpdateMap.

I profiled the code and found that the vast majority of the time was spent allocating std::vector<int>, which I tracked down to the allocation of global_to_pnts inside the parallel loop. This PR significantly reduces the number of allocations by moving global_to_pnts outside of the parallel loop and giving each thread its own copy via a firstprivate clause. On my laptop, this has improved performance for grids with millions of points by at least an order of magnitude.
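For readers unfamiliar with the OpenMP pattern, here is a minimal sketch of the idea (a generic illustration, not the actual buildUpdateMap code; the function name and loop body are hypothetical): the scratch vector is constructed once before the parallel region, and the firstprivate clause gives each thread its own copy that is reused across iterations instead of being reallocated on every iteration.

```cpp
#include <vector>

// Hypothetical stand-in for a loop that needs per-point scratch space.
void process_points(int num_points, int num_dimensions) {
    // Before: declaring std::vector<int> global_to_pnts(num_dimensions) inside
    // the loop body forced one heap allocation (and deallocation) per iteration.

    // After: construct the scratch vector once, then let OpenMP copy it into
    // each thread via firstprivate, so every thread reuses its own buffer.
    std::vector<int> global_to_pnts(num_dimensions);

    #pragma omp parallel for firstprivate(global_to_pnts)
    for (int i = 0; i < num_points; i++) {
        for (int d = 0; d < num_dimensions; d++)
            global_to_pnts[d] = i + d; // placeholder work standing in for the real mapping
        // ... use global_to_pnts for point i ...
    }
}
```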

@mkstoyanov
Collaborator

Hey @mattrjackson

It's always great to hear from actual users, especially ones that have deployed Tasmanian in production.

I know Tasmanian has a not-insignificant community, but only a very small group of people interact with me on a regular basis. You're the first person I know of who is using the direction-selective refinement since I developed the method a while back, so the update builder hasn't received the proper attention.

Thanks for the PR; I think I see a few other opportunities for improvement. Roughly speaking, what are the dimension and the number of points for the expensive problem you're solving? I want to run a few benchmarks myself to make sure I actually introduce improvements and not regressions.

mkstoyanov merged commit ef62603 into ORNL:master on Mar 20, 2024
8 checks passed
@mattrjackson
Contributor Author

For the surrogate datasets, I've tried up to six dimensions and about 10M points so far. That limit was initially due to the model we were calling (which is developed by a third party and, unfortunately, is not thread-safe). After making a large number of optimizations to that model, I can now afford to run tens of millions of calculations if need be, but how large the grids grow from here is a bit of a question mark.

As always, fewer points would be better, but the data from the model has sharp discontinuities in some spots, so preserving the features that need to be there without blowing up the grid size has proven challenging.

@mkstoyanov
Collaborator

There are fundamental mathematical challenges when dealing with a discontinuous response. Basically, you cannot have convergence in the "inf" norm between the model and the interpolant, which means that the surpluses do not decay to zero, which in turn blows up the size of the grid.
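As a rough illustration (assuming a local linear hierarchical basis): if the model jumps by $J$ across the discontinuity, then on every level there is a cell that still straddles the jump, and the surplus at the new node in that cell is the mismatch between the model and the coarser interpolant, which stays on the order of the jump rather than decaying,

$$s_{\text{new}} \;=\; f(x_{\text{new}}) - I_{\ell-1}[f](x_{\text{new}}) \;\approx\; \pm\tfrac{J}{2} \quad \text{for every level } \ell,$$

so a surplus-based refinement criterion keeps adding points around the jump no matter how deep the grid goes.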

The first question you need to ask is what it means to approximate the discontinuous model, with the understanding that there will be an area near the discontinuity where the difference remains significant.

The second thing you can try is to force convergence (and reduce the blowup in the number of points) with an added scale_correction. The correction is just a multiplier applied to the surpluses: if you correct by zero, then no refinement will happen near that particular point; if you set it to a small number, then refinement will happen only if the surplus is significantly larger than the tolerance; and a correction with a large number will force refinement even if the surplus is small.

One good choice of correction is the area (support) of the hierarchical basis functions, which you can get from getHierarchicalSupport().
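As a rough sketch of how that could look (assuming the setSurplusRefinement overload that takes a level-limits vector and a scale_correction vector with one entry per point for the selected output, and that getHierarchicalSupport() returns one support value per point and dimension; check the Tasmanian documentation for the exact shapes and enum names):

```cpp
#include "TasmanianSparseGrid.hpp"
#include <vector>

// Sketch: use the volume of each basis function's support as the scale
// correction, so points with tiny support (deep refinement near the
// discontinuity) are down-weighted and stop forcing further refinement.
void refine_with_support_correction(TasGrid::TasmanianSparseGrid &grid,
                                    double tolerance, int output) {
    int num_points = grid.getNumPoints();
    int num_dims   = grid.getNumDimensions();

    // Assumed layout: one support value per point and dimension.
    std::vector<double> support = grid.getHierarchicalSupport();

    // Collapse the per-dimension support into a single per-point volume.
    std::vector<double> correction(num_points, 1.0);
    for (int i = 0; i < num_points; i++)
        for (int d = 0; d < num_dims; d++)
            correction[i] *= support[i * num_dims + d];

    // Assumed signature: tolerance, refinement type, output, level limits,
    // scale correction.
    grid.setSurplusRefinement(tolerance, TasGrid::refine_direction_selective,
                              output, std::vector<int>(), correction);
}
```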

We can set up a Zoom/Teams call next week to discuss some of the details.
