Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance improvements to heatmap path #4549

Draft
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

BioTurboNick
Copy link
Member

@BioTurboNick BioTurboNick commented Nov 24, 2022

Partially addresses #4520

Type inferrence is often a big issue here:

const mat = randn(512, 512);
function f(x, n)
    plot()
    for _ in 1:n
        heatmap!(x)
    end
end
@btime f(mat, 100)

Master: 1.151 s (26455855 allocations: 614.24 MiB)
Adding one type annotation to `expand_extrema!(a::Axis, Surf::Surface)`: 546.681 ms (241455 allocations: 214.24 MiB)

EDIT: Looks like the switch to foreach in that method in the master branch actually sidestepped the inference issue, but the annotation is probably for the best.

Noticed that expand_extrema was constructing and discarding a vector of all heatmap square edges when it just needed the limits.

Before: 449.488 ms (42563 allocations: 206.52 MiB)
After: 444.067 ms (30763 allocations: 205.35 MiB)

As number of series grows, a lot of allocations are spent constructing color gradients - even when they're all the same - I changed it to look up the original object instead.

After: 446.839 ms (28363 allocations: 202.14 MiB

A big time sink is having to check for infinites when calculating the extrema. If I remove that check:

339.531 ms (28363 allocations: 202.14 MiB)

Borrowing the approach from NaNMath's functions, we can get this closer:

364.563 ms (28363 allocations: 202.14 MiB)

The NaNMath.extrema function (used by the colorbar limits) is also pretty slow; replacing it with minimum(x), maximum(x) we get

187.983 ms (28363 allocations: 202.14 MiB)

@codecov
Copy link

codecov bot commented Nov 24, 2022

Codecov Report

Base: 90.96% // Head: 90.93% // Decreases project coverage by -0.02% ⚠️

Coverage data is based on head (5fc7a06) compared to base (17e7d4c).
Patch coverage: 100.00% of modified lines in pull request are covered.

❗ Current head 5fc7a06 differs from pull request most recent head f9fdf4a. Consider uploading reports for the commit f9fdf4a to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #4549      +/-   ##
==========================================
- Coverage   90.96%   90.93%   -0.03%     
==========================================
  Files          40       40              
  Lines        7744     7753       +9     
==========================================
+ Hits         7044     7050       +6     
- Misses        700      703       +3     
Impacted Files Coverage Δ
src/axes.jl 90.61% <100.00%> (ø)
src/components.jl 90.07% <100.00%> (ø)
src/utils.jl 92.55% <100.00%> (-0.39%) ⬇️

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

@BioTurboNick
Copy link
Member Author

BioTurboNick commented Nov 24, 2022

Any idea why colorbars use a different extrema logic than the spatial axis limits? @t-bltg

Colorbars ignore nans, axis limits ignore infinites

EDIT: Infs break the colorbar display, so I'm thinking it should skip them too.

@t-bltg
Copy link
Member

t-bltg commented Nov 25, 2022

Any idea why colorbars use a different extrema logic than the spatial axis limits ?

Clueless :)

If the current test suite passes with your modifications, I'd say go for it.
Cutting down run time to 50% is a nice improvement.
Thanks for working on this, Plots needs consolidation.

@t-bltg
Copy link
Member

t-bltg commented Nov 25, 2022

It would be nice to fix #2410 along the way ;)

@BioTurboNick
Copy link
Member Author

I've tried immutable Extrema objects and other organizations; I often end up losing some optimization the compiler is doing in the current state of this PR.

I've poked around at this quite a bit right now, and I think we're up against the minimum possible time if we hold these invariants:

  • ignore nans, infinites
  • extrema guaranteed to exactly fit all values

If feels ridiculous that the vast majority of computation time, for large numbers of large series, is spent finding extrema.

Idea - Provide an option to control extrema-finding:

  • Exact (current)
  • Sampling (strided subset?)
  • Disabled (not performed, depends entirely on provided limits)

Sampling would maybe become the default, taking in all data up to a threshold and then sampling the rest. Strided would likely be more performant and deterministic than random, though slight danger of a patterned input missing important values.

Thoughts?

@t-bltg
Copy link
Member

t-bltg commented Nov 26, 2022

If feels ridiculous that the vast majority of computation time, for large numbers of large series, is spent finding extrema.

Agreed. Traversing large series can be expensive, but it is sometimes needed.
I've recently added such a traversal (https://github.com/JuliaPlots/Plots.jl/blob/master/src/pipeline.jl#L156-L159) in this bug fix PR #4503.

Providing an option is indeed a good idea, since OPs problem is a bit of a niche one.
IIUC, his data mostly consists of NaNs, with small non overlapping patches. Overlaying these patches creates the final heatmap. This is not how we regularly proceed. We usually create the final heatmap before plotting it (single series), by assembling patches.
The option would retain the current behavior as default and could be toggled upon performance issues. Considering this problem is a niche one, I'd like the "performance" option to be generic for other series too, and not only dedicated to heatmap plotting problems (maybe relevant for mesh3d patches, or large scatters).

OP provided another case, which would be worth investigating (are the slow paths the same as for heatmap ?):

x = randn(100, 1_000)
y = randn(100, 1_000)
z = randn(100, 1_000)
scatter(x, y, zcolor=z)  # slowish
scatter(vec(x), vec(y), zcolor=vec(z))  # fast ?

You can maybe find inspiration from #4536, which uses a deterministic "distance to series" algorithm for computing the legend position.

@BioTurboNick
Copy link
Member Author

Yes, it's almost entirely in paths that affect all plots - updating colorbar limits and axis extents.

If I just simply say "take the first 1024 points, then sample ~1024 more in stride" for both I drastically cut the computation time.

65.056 ms (28453 allocations: 202.16 MiB)

@BioTurboNick
Copy link
Member Author

I noticed that during the (I think last stage?) of series recipes, the data is copied out and also infinites and missings are replaced with NaNs at the same time.

So a couple notes:

  1. Could we consolidate these passes, checking and storing the extrema at the same time as copying the data?
  2. If we're guaranteed to not have infinites after that first pass, we can skip checking for them later, which would also save some time. Checking for nans is still a bit slow though.

@t-bltg
Copy link
Member

t-bltg commented Nov 28, 2022

Could we consolidate these passes, checking and storing the extrema at the same time as copying the data

Of course.

If we're guaranteed to not have infinites after that first pass, we can skip checking for them later, which would also save some time.

I agree that this is the way to go: store some minimal statistics about a series in a struct or a nt e.g. extrema, has/has not nans, has/has not inf, ...

@t-bltg t-bltg added the performance speedups and slowdowns label Nov 29, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance speedups and slowdowns
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants