Very slow performance of create_annotated_heatmap for small dataset #2299

Closed
haphaeu opened this issue Mar 20, 2020 · 10 comments

@haphaeu

haphaeu commented Mar 20, 2020

This takes over 4 s to run, which seems excessively long for such a small dataset:

import time
import numpy as np
import pandas as pd
import plotly.figure_factory as ff

dfi = pd.DataFrame(np.random.rand(3, 6))

t0 = time.time()

fig = ff.create_annotated_heatmap(
            z=dfi.values,
            x=dfi.columns.tolist(),
            y=dfi.index.tolist(),
            annotation_text=dfi.round(2).values,
        )

print(f'elapsed time {time.time()-t0:.3f} s')

@nicolaskruchten
Contributor

Most of this time is spent importing plotly.graph_objects. If you import it before you start timing, you'll find that the figure factory itself is pretty fast.

Import speed is a known issue that we keep chipping away at, but we have no clear path to resolving it today; see e.g. #740 or #2174.
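A minimal, standard-library sketch of the suggestion above: time the import on its own, then start the clock for the code being benchmarked. Python caches imported modules in `sys.modules`, so a repeated import of the same name returns the cached module instead of executing it again. The helper name `timed_import` is hypothetical, and `json` stands in for the real module so the sketch runs without plotly installed.

```python
import importlib
import sys
import time
from types import ModuleType
from typing import Tuple


def timed_import(name: str) -> Tuple[ModuleType, float]:
    """Import a module by dotted name; return (module, seconds spent importing).

    Hypothetical helper: it measures the one-off import cost separately, so
    that cost is not mixed into the timing of the code being benchmarked.
    """
    t0 = time.perf_counter()
    module = importlib.import_module(name)
    return module, time.perf_counter() - t0


# The first import of a module in a process executes it; later imports of
# the same name are served from the sys.modules cache and return the same object.
mod, first = timed_import("json")
cached, second = timed_import("json")
print(f"first import: {first:.6f} s, repeat import: {second:.6f} s")
print(cached is mod, "json" in sys.modules)
```

For the original report, calling `timed_import("plotly.graph_objects")` before `t0 = time.time()` would keep that import out of the measurement. As the next comment points out, though, some loading can still happen on the first call to `create_annotated_heatmap`.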

@haphaeu
Author

haphaeu commented Mar 20, 2020

But the import statement is already outside the timer. Something else gets imported inside create_annotated_heatmap the first time it is called.

This can be seen in the snippet below.

My bigger problem is that I'm using this inside a Dash callback, where it seems to import these objects every time, so every graph update takes 4 seconds.

import time
import numpy as np
import pandas as pd
import plotly.figure_factory as ff


def create_fig(dfi):
    return ff.create_annotated_heatmap(
            z=dfi.values,
            x=dfi.columns.tolist(),
            y=dfi.index.tolist(),
            annotation_text=dfi.round(2).values,
        )

        
dfi = pd.DataFrame(np.random.rand(3, 6))

t0 = time.time()
create_fig(dfi)
print(f'elapsed time {time.time()-t0:.3f} s')

t0 = time.time()
create_fig(dfi)
print(f'elapsed time {time.time()-t0:.3f} s')

Output:

elapsed time 3.568 s
elapsed time 0.182 s

@nicolaskruchten
Contributor

So in the output above the first call is slow, but the second one is quite fast, and the same should hold for the third, fourth, etc. calls. Basically, once everything is loaded, performance should be quite good. Admittedly, this is still really annoying for local development.
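The pattern described above (one slow first call, then fast ones) is easiest to see by timing several consecutive calls of the same function in one process. Below is a minimal, standard-library sketch; `time_calls` and `summarize` are hypothetical helpers, and `time.perf_counter` is used instead of `time.time` because it is the clock intended for measuring short intervals.

```python
import time
from typing import Any, Callable, List


def time_calls(fn: Callable[[], Any], repeats: int = 5) -> List[float]:
    """Call fn() `repeats` times and return the duration of each call in seconds.

    The first entry includes any one-off cost (lazy imports, caches being
    filled); the remaining entries show the steady-state cost.
    """
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    timings: List[float] = []
    for _ in range(repeats):
        t0 = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - t0)
    return timings


def summarize(timings: List[float]) -> str:
    """Report the first (cold) call separately from the mean of the warm calls."""
    cold, warm = timings[0], timings[1:]
    if not warm:
        return f"cold {cold:.3f} s"
    return f"cold {cold:.3f} s, warm mean {sum(warm) / len(warm):.3f} s"
```

For the heatmap case, `fn` would be something like `lambda: create_fig(dfi)`, using `create_fig` and `dfi` from the snippet above, e.g. `print(summarize(time_calls(lambda: create_fig(dfi))))`.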

@jonmmease
Contributor

jonmmease commented Apr 10, 2020

Using Python 3.7 with the PR at #2368, the code snippet from the original report runs much faster on my workstation:

plotly 4.6: 0.328 s
PR: 0.046 s
7x speedup.

@haphaeu
Author

haphaeu commented May 8, 2020

I'd like to follow up on this one because of a factor not mentioned above: when the code runs inside a Dash app, the slowdown does not seem to come only from importing.

I've recreated the snippet above and run it both in bare Python/plotly and from within a Dash app.

There seems to be a 4 s overhead on the first run, when things are imported. After that, heatmap creation takes 0.25 s to 0.3 s in bare Python, while from Dash it takes 1.25 s to 1.3 s, i.e. around 5x slower.

Bare python/plotly:

elapsed time 4.270 s
elapsed time 0.250 s
elapsed time 0.343 s
elapsed time 0.371 s
elapsed time 0.241 s

From Dash:

Update took 5.778 s
Update took 1.258 s
Update took 1.342 s
Update took 1.306 s
Update took 1.233 s
Update took 1.261 s

And this is the snippet being benchmarked from Dash. There are 4 dataframes (so 4 plots are created), and each dataframe is a flat 6 x 13 table.

# `limits`, `dfs` and the `colorscale()` helper are defined elsewhere in the app (not shown)
figs = []

for limit, dfi in zip(limits, dfs):
    # One annotated heatmap per dataframe, cells labelled with values rounded to 2 decimals
    fig = ff.create_annotated_heatmap(
        z=dfi.values,
        x=dfi.columns.tolist(),
        y=dfi.index.tolist(),
        annotation_text=dfi.round(2).values,
        colorscale=colorscale(limit, dfi.max().max()),
    )
    fig.update_layout(
        xaxis=dict(title="Period [s]", dtick=1),
        yaxis=dict(title="Height [m]", dtick=0.25, autorange="reversed"),
        clickmode="event+select",
    )

    figs.append(fig)

@nicolaskruchten
Contributor

@haphaeu can you confirm that these results were obtained using plotly version 4.7, which was just released and contains many performance enhancements?

@haphaeu
Author

haphaeu commented May 8, 2020

No, those results were obtained with version 4.5.2. Well spotted.

Using the same conda environment, I've updated only plotly, so all other packages are unchanged.

Here's the re-run with plotly=4.7.1:

Bare python/plotly:

elapsed time 1.167 s
elapsed time 0.036 s
elapsed time 0.038 s
elapsed time 0.066 s
elapsed time 0.074 s

From Dash:

Update took 1.120 s
Update took 0.172 s
Update took 0.177 s
Update took 0.164 s
Update took 0.201 s
Update took 0.188 s

Indeed significantly faster.

Thanks and nicely done!
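To put a number on "significantly faster", the timings posted in this thread can be compared directly. This sketch drops the first call of each run, since it includes the one-off import cost, and compares the means of the remaining (warm) calls between plotly 4.5.2 and 4.7.1. The numbers are copied from the comments above; `warm_mean` is a hypothetical helper.

```python
from statistics import mean

# Timings (seconds) posted in this thread. The first call of each run
# includes the one-off import cost, so it is excluded from the warm means.
runs = {
    "bare python, plotly 4.5.2": [4.270, 0.250, 0.343, 0.371, 0.241],
    "bare python, plotly 4.7.1": [1.167, 0.036, 0.038, 0.066, 0.074],
    "dash, plotly 4.5.2": [5.778, 1.258, 1.342, 1.306, 1.233, 1.261],
    "dash, plotly 4.7.1": [1.120, 0.172, 0.177, 0.164, 0.201, 0.188],
}


def warm_mean(timings):
    """Mean of every call after the first (cold) one."""
    return mean(timings[1:])


for setting in ("bare python", "dash"):
    old = runs[f"{setting}, plotly 4.5.2"]
    new = runs[f"{setting}, plotly 4.7.1"]
    print(
        f"{setting}: first call {old[0]:.2f} s -> {new[0]:.2f} s, "
        f"warm-call speedup {warm_mean(old) / warm_mean(new):.1f}x"
    )
```

By this measure, the warm Dash update gets roughly 7x faster (about 1.28 s down to 0.18 s) and the warm bare-Python call roughly 5.6x faster.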

@haphaeu haphaeu closed this as completed May 8, 2020
@nicolaskruchten
Contributor

Great! Is this running on Python 3.7? If not, upgrading to Python 3.7 might increase import performance further :)

@haphaeu
Author

haphaeu commented May 8, 2020

This is Python 3.8.

@nicolaskruchten
Contributor

Ah well, no free lunch there then :)
