Performance improvements to heatmap path #4549

BioTurboNick · 2022-11-24T18:33:30Z

Partially addresses #4520

Type inferrence is often a big issue here:

const mat = randn(512, 512);
function f(x, n)
    plot()
    for _ in 1:n
        heatmap!(x)
    end
end
@btime f(mat, 100)

Master: 1.151 s (26455855 allocations: 614.24 MiB)
Adding one type annotation to `expand_extrema!(a::Axis, Surf::Surface)`: 546.681 ms (241455 allocations: 214.24 MiB)

EDIT: Looks like the switch to foreach in that method in the master branch actually sidestepped the inference issue, but the annotation is probably for the best.

Noticed that expand_extrema was constructing and discarding a vector of all heatmap square edges when it just needed the limits.

Before: 449.488 ms (42563 allocations: 206.52 MiB)
After: 444.067 ms (30763 allocations: 205.35 MiB)

As number of series grows, a lot of allocations are spent constructing color gradients - even when they're all the same - I changed it to look up the original object instead.

After: 446.839 ms (28363 allocations: 202.14 MiB

A big time sink is having to check for infinites when calculating the extrema. If I remove that check:

339.531 ms (28363 allocations: 202.14 MiB)

Borrowing the approach from NaNMath's functions, we can get this closer:

364.563 ms (28363 allocations: 202.14 MiB)

The NaNMath.extrema function (used by the colorbar limits) is also pretty slow; replacing it with minimum(x), maximum(x) we get

187.983 ms (28363 allocations: 202.14 MiB)

codecov · 2022-11-24T19:08:15Z

Codecov Report

Base: 90.96% // Head: 90.93% // Decreases project coverage by -0.02% ⚠️

Coverage data is based on head (5fc7a06) compared to base (17e7d4c).
Patch coverage: 100.00% of modified lines in pull request are covered.

❗ Current head 5fc7a06 differs from pull request most recent head f9fdf4a. Consider uploading reports for the commit f9fdf4a to get more accurate results

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #4549      +/-   ##
==========================================
- Coverage   90.96%   90.93%   -0.03%     
==========================================
  Files          40       40              
  Lines        7744     7753       +9     
==========================================
+ Hits         7044     7050       +6     
- Misses        700      703       +3

Impacted Files	Coverage Δ
src/axes.jl	`90.61% <100.00%> (ø)`
src/components.jl	`90.07% <100.00%> (ø)`
src/utils.jl	`92.55% <100.00%> (-0.39%)`	⬇️

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

BioTurboNick · 2022-11-24T23:17:53Z

Any idea why colorbars use a different extrema logic than the spatial axis limits? @t-bltg

Colorbars ignore nans, axis limits ignore infinites

EDIT: Infs break the colorbar display, so I'm thinking it should skip them too.

t-bltg · 2022-11-25T08:20:09Z

Any idea why colorbars use a different extrema logic than the spatial axis limits ?

Clueless :)

If the current test suite passes with your modifications, I'd say go for it.
Cutting down run time to 50% is a nice improvement.
Thanks for working on this, Plots needs consolidation.

t-bltg · 2022-11-25T09:02:50Z

It would be nice to fix #2410 along the way ;)

BioTurboNick · 2022-11-26T07:33:58Z

I've tried immutable Extrema objects and other organizations; I often end up losing some optimization the compiler is doing in the current state of this PR.

I've poked around at this quite a bit right now, and I think we're up against the minimum possible time if we hold these invariants:

ignore nans, infinites
extrema guaranteed to exactly fit all values

If feels ridiculous that the vast majority of computation time, for large numbers of large series, is spent finding extrema.

Idea - Provide an option to control extrema-finding:

Exact (current)
Sampling (strided subset?)
Disabled (not performed, depends entirely on provided limits)

Sampling would maybe become the default, taking in all data up to a threshold and then sampling the rest. Strided would likely be more performant and deterministic than random, though slight danger of a patterned input missing important values.

Thoughts?

t-bltg · 2022-11-26T08:27:43Z

If feels ridiculous that the vast majority of computation time, for large numbers of large series, is spent finding extrema.

Agreed. Traversing large series can be expensive, but it is sometimes needed.
I've recently added such a traversal (https://github.com/JuliaPlots/Plots.jl/blob/master/src/pipeline.jl#L156-L159) in this bug fix PR #4503.

Providing an option is indeed a good idea, since OPs problem is a bit of a niche one.
IIUC, his data mostly consists of NaNs, with small non overlapping patches. Overlaying these patches creates the final heatmap. This is not how we regularly proceed. We usually create the final heatmap before plotting it (single series), by assembling patches.
The option would retain the current behavior as default and could be toggled upon performance issues. Considering this problem is a niche one, I'd like the "performance" option to be generic for other series too, and not only dedicated to heatmap plotting problems (maybe relevant for mesh3d patches, or large scatters).

OP provided another case, which would be worth investigating (are the slow paths the same as for heatmap ?):

x = randn(100, 1_000)
y = randn(100, 1_000)
z = randn(100, 1_000)
scatter(x, y, zcolor=z)  # slowish
scatter(vec(x), vec(y), zcolor=vec(z))  # fast ?

You can maybe find inspiration from #4536, which uses a deterministic "distance to series" algorithm for computing the legend position.

BioTurboNick · 2022-11-26T14:19:18Z

Yes, it's almost entirely in paths that affect all plots - updating colorbar limits and axis extents.

If I just simply say "take the first 1024 points, then sample ~1024 more in stride" for both I drastically cut the computation time.

65.056 ms (28453 allocations: 202.16 MiB)

BioTurboNick · 2022-11-28T14:38:27Z

I noticed that during the (I think last stage?) of series recipes, the data is copied out and also infinites and missings are replaced with NaNs at the same time.

So a couple notes:

Could we consolidate these passes, checking and storing the extrema at the same time as copying the data?
If we're guaranteed to not have infinites after that first pass, we can skip checking for them later, which would also save some time. Checking for nans is still a bit slow though.

t-bltg · 2022-11-28T14:41:17Z

Could we consolidate these passes, checking and storing the extrema at the same time as copying the data

Of course.

If we're guaranteed to not have infinites after that first pass, we can skip checking for them later, which would also save some time.

I agree that this is the way to go: store some minimal statistics about a series in a struct or a nt e.g. extrema, has/has not nans, has/has not inf, ...

BioTurboNick added 2 commits November 24, 2022 13:25

Add type annotation

833ec9e

Merge branch 'master' into faster-heatmaps

d7c5c63

BioTurboNick added 3 commits November 24, 2022 14:10

Add extrema-only heatmap extrema determination

5fc7a06

Avoid reconstructing the same gradient

f54db27

Borrow strategy from NaNMath to skip infinites

452bd9a

Experimental colorbar limit changes, sampling method

f9fdf4a

t-bltg added the performance speedups and slowdowns label Nov 29, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Performance improvements to heatmap path #4549

Performance improvements to heatmap path #4549

BioTurboNick commented Nov 24, 2022 •

edited

Loading

codecov bot commented Nov 24, 2022 •

edited

Loading

BioTurboNick commented Nov 24, 2022 •

edited

Loading

t-bltg commented Nov 25, 2022 •

edited

Loading

t-bltg commented Nov 25, 2022

BioTurboNick commented Nov 26, 2022

t-bltg commented Nov 26, 2022 •

edited

Loading

BioTurboNick commented Nov 26, 2022

BioTurboNick commented Nov 28, 2022

t-bltg commented Nov 28, 2022

Performance improvements to heatmap path #4549

Are you sure you want to change the base?

Performance improvements to heatmap path #4549

Conversation

BioTurboNick commented Nov 24, 2022 • edited Loading

codecov bot commented Nov 24, 2022 • edited Loading

Codecov Report

BioTurboNick commented Nov 24, 2022 • edited Loading

t-bltg commented Nov 25, 2022 • edited Loading

t-bltg commented Nov 25, 2022

BioTurboNick commented Nov 26, 2022

t-bltg commented Nov 26, 2022 • edited Loading

BioTurboNick commented Nov 26, 2022

BioTurboNick commented Nov 28, 2022

t-bltg commented Nov 28, 2022

BioTurboNick commented Nov 24, 2022 •

edited

Loading

codecov bot commented Nov 24, 2022 •

edited

Loading

BioTurboNick commented Nov 24, 2022 •

edited

Loading

t-bltg commented Nov 25, 2022 •

edited

Loading

t-bltg commented Nov 26, 2022 •

edited

Loading