Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fast image: add binary_string parameter to imshow #2691

Merged
merged 37 commits into from
Sep 3, 2020
Merged

Conversation

emmanuelle
Copy link
Contributor

@emmanuelle emmanuelle commented Aug 6, 2020

Work in progress!

This PR adds a use_binary_string parameter to imshow, which default value is

  • True for multichannel images (where the Image trace is always used)
  • False for single-channel images, in which case the Heatmap trace is used (when True, an Image trace is used)

I added another new parameter contrast_rescaling in order to reconcile the computation of zmin and zmax, which were computed differently for 2D/Heatmap and 3D/Image before (2D used the min and max of the array as Heatmap does, and 3D used 0 for the min and a max value deduced from the data type and the image range). The former behaviour corresponds to the image value of the newly introduced contrast_rescaling parameter, while dtype corresponds to the latter, and when the value is None it is set internally to ensure backwards compatibility with the current version.

To do

  • hovertemplate when using source
  • remove color from hover when data have been rescaled
  • update doc notebook

@emmanuelle
Copy link
Contributor Author

The scikit-image code which is used here will be vendored later on, but for now I imported the function from scikit-image for faster development.

@@ -405,6 +405,9 @@ jobs:
if [ "${CIRCLE_BRANCH}" != "doc-prod" ]; then
pip uninstall -y plotly
cd ../packages/python/plotly
# To be removed after plotly.js release
pip install inflect black tox
python3 setup.py updateplotlyjsdev --devbranch fast-image
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my idea here was to be able to take a look at the doc even when the corresponding feature is not released in plotly.js, but it does not work since the exported html asks for the regular plotly.js bundle so the feature is not available. Too bad! Or maybe I can find a way to link to another bundle...

@emmanuelle emmanuelle closed this Aug 7, 2020
@emmanuelle emmanuelle reopened this Aug 7, 2020
@nicolaskruchten
Copy link
Contributor

For the naming of contrast_rescaling how about z_bounds_mode and for the values how about "minmax" for the current heatmap logic and "heuristic" or "infer" or something like that for the current image logic?

Aside: I'm amused that basically we have two different sets of logic in this one function and we end up adding new parameters that have one value for one side and another value for the other side and the ability to force it either way :) Makes sense!

@emmanuelle
Copy link
Contributor Author

I've benchmarked Pillow and pypng for different images and different compression levels. The code of the benchmarks is

import numpy as np
from skimage import data
from plotly.express._imshow import _array_to_b64str
from time import time
import pandas as pd
import plotly.express as px

img_rgb = data.astronaut()

sparse_img = np.zeros((512, 512, 3), dtype=np.uint8)
sparse_img[100:100, 150:-150] = 200

img_scientific = data.retina()

images = [img_rgb, img_scientific, sparse_img]
sizes = [im.size for im in images]

timings = []

for mode, img, size in zip(
        ['natural', 'scientific', 'sparse'],
        images,
        sizes):
    for backend in ['png', 'pil']:
        for compression_level in range(0, 10):
            t_start = time()
            _ = (_array_to_b64str(img, backend=backend, compression=compression_level))
            t_end = time()
            timings.append({
                'type':mode, 
                'compression':compression_level,
                'backend': backend,
                'time': t_end - t_start,
                'size_reduction': size/len(_)
                })

df = pd.DataFrame(timings)
fig_1 = px.scatter(
        df,
        x='time', y='size_reduction',
        color='backend',
        hover_data=['compression'],
        height=400, width=900,
        facet_col='type',
        title='size reduction vs. time for a natural image (astronaut)')
fig_1.update_yaxes(matches=None, showticklabels=True)
fig_1.update_xaxes(matches=None)
fig_1.show()

fig_2 = px.scatter(
        df,
        x='compression', y='size_reduction',
        color='backend',
        hover_data=['compression'],
        height=400, width=900,
        facet_col='type',
        title='size reduction vs. time for a natural image (astronaut)')
fig_2.update_yaxes(matches=None, showticklabels=True)
fig_2.update_xaxes(matches=None)
fig_2.show()

fig_3 = px.scatter(
        df,
        x='compression', y='time',
        color='backend',
        hover_data=['compression'],
        height=400, width=900,
        facet_col='type',
        title='size reduction vs. time for a natural image (astronaut)')
fig_3.update_yaxes(matches=None, showticklabels=True)
fig_3.update_xaxes(matches=None)
fig_3.show()

and the figures are
image

Pillow is more efficient in the sense that it reaches a higher level of compression for a given time (except for the very sparse image for which pypng is a little bit better). Probably we should use Pillow when it's installed. How about having a backend argument in imshow with values auto, png and pil? Ideally we would also give the possibility to control the compression_level parameter. The default value of pil is 6, which is a bit overkill for this benchmark (the best trade-off is 4 for these cases).

@nicholas-esterer
Copy link
Contributor

I agree, it looks like Pillow is the bestest 👍
I'm all about benchmarks like this, this is great work!

@emmanuelle
Copy link
Contributor Author

I agree, it looks like Pillow is the bestest +1
I'm all about benchmarks like this, this is great work!

Thanks! And quite easy with px once you've got all your data in a dataframe! Although it would have been great here to have an option within px.scatter to unmatch the axes and leave a little more space for the ticks (@nicolaskruchten is this a parameter which we should add? It could make it easier for people to use px instead of graph_objects).

@nicolaskruchten
Copy link
Contributor

Great benchmark! I'd love to see this in a full Dash context with layout.images to weigh the tradeoff between png's speed and pillow's extra compression, especially after going through gzip compression and decompression on the other end. If it turns out that the optimum happens to be "less compression" then png's speed might make it a better fit. As we saw, the encoding-time was the biggest gain in this optimization, and the transfer time was not that important relatively speaking, so it could be that faster encoding dominates...

That said, I'm fine with a parameter that's auto/pillow/png also.

an option within px.scatter to unmatch the axes and leave a little more space for the ticks

This is pretty easy with .update_(x|y)axes(matches=None) and you can now leave more space for the ticks with facet_(row|col)_spacing ... this is all in the docs now too ;) https://plotly.com/python/facet-plots/

@emmanuelle
Copy link
Contributor Author

Oh yes I had forgotten about facet_col_spacing. I remembered the example about independent axes, maybe we should add one sentence about spacing or a link to the example with spacing in the independent axes example.

Also, png is not really faster than pil, if you take a look at the compression factor (ie the size of the original image divided by the length of the b64 string) vs time. In a diagram of time vs "compression level" (which is some index but it's not completely clear what this is), then png is faster but it achieves a smaller size reduction.

And yes it's interesting to try in a Dash app. Related to this, my previous Dash app demo was a bit flawed I think because I had used a tiled image (the same image on a 4x4 grid) and apparently png compression is capable of optimizing this kind of images quite a lot. I'll use the retina image (https://scikit-image.org/docs/stable/auto_examples/data/plot_scientific.html#sphx-glr-auto-examples-data-plot-scientific-py) for a Dash app, it's quite big (1440^2) with some sparsity, so it's quite representative of CZI-relevant images.

@nicolaskruchten
Copy link
Contributor

Yeah I know compression is an input and not an output but my point is that if the optimum is at level 0 then png dominates because it's faster... full dash benchmark will tell us :)

@emmanuelle
Copy link
Contributor Author

I think the PR is ready for a quick/temporary review, while the "real review" will happen after the source feature has been released with plotly js.

@nicolaskruchten
Copy link
Contributor

Can we get the CI to pass plz? I think there's a dependency issue with either png or PIL

@emmanuelle
Copy link
Contributor Author

Can we get the CI to pass plz? I think there's a dependency issue with either png or PIL

I think we can't get the CI to pass before the js release? Or am I missing something?

@nicolaskruchten
Copy link
Contributor

I think we can't get the CI to pass before the js release?

in principle yes but some of the error messages here seem to be from something other than "can't find the validator" :)

@nicolaskruchten
Copy link
Contributor

Just doing this results in no z or color in the hoverlabel, which is a bit of a downer... intentional?

import plotly.express as px
import numpy as np
img = np.arange(100).reshape((10, 10))
fig = px.imshow(img, binary_string=True)
fig.show()

@emmanuelle
Copy link
Contributor Author

Just doing this results in no z or color in the hoverlabel, which is a bit of a downer... intentional?

import plotly.express as px
import numpy as np
img = np.arange(100).reshape((10, 10))
fig = px.imshow(img, binary_string=True)
fig.show()

Good question. It is intentional, but subject to debate. The idea is not to display z/color if the numerical value is different from the original array element, because it can be misleading.

If you do

import plotly.express as px
import numpy as np
img = np.arange(100).reshape((10, 10)).astype(np.uint8)
fig = px.imshow(img, binary_string=True, contrast_rescaling='infer')
fig.show()

then you will have the hover because there is no rescaling.

The alternative would be to always display z, and add an example in the imshow tutorial warning that it's not the same value and showing how to modify the hovertemplate.

@nicolaskruchten
Copy link
Contributor

It is intentional, but subject to debate

Right, OK, I remember this now. The current behaviour is good, but I think should be mentioned in the docs if possible please.


```python
import plotly.express as px
from skimage import data
img = data.astronaut()
# Increase contrast by clipping the data range between 50 and 200
fig = px.imshow(img, zmin=50, zmax=200)
fig = px.imshow(img, zmin=50, zmax=200, binary_string=False)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should explain why the binary_string=False is necessary here, and/or explain what happens if you don't do it... in this case the hover ends up different, right? We don't do it in the example immediately below, though, is that intentional?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I modified this example so that it does not use binary_string, which has not been introduced for imshow at this point of the tutorial, and then I added two examples about rescaling and hover in the section on imshow and binary_string.

@emmanuelle
Copy link
Contributor Author

Some updates:

  • I vendored the pypng code (just one file! 2000+ LOC though)
  • for the CI I fixed some dependency problems in the test and the current failures come from the fact that the plotly js version is a "dirty" one.

@nicolaskruchten nicolaskruchten merged commit 96cebe3 into master Sep 3, 2020
@nicolaskruchten
Copy link
Contributor

OK I fixed up the CI failures due to too-modern code in the inlined stuff for Python 2.7 and 3.5 tests to pass, and I updated Plotly.js and fixed up those failures in the docs. The remaining failures are CI artifacts that should go away on master so I also merged :)

@emmanuelle if you could add a changelog entry (straight onto master would be fine :) that'd be great!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants