Interactively rendering ~10 million points using datashader #6148

Open
ahnsws opened this issue Aug 14, 2023 · 10 comments
Labels
feature New feature or request

Comments

@ahnsws

ahnsws commented Aug 14, 2023

🚀 Feature

Hello everyone, I'm not sure what work has been done on optimizing the Points layer, but I thought I'd have a go at using datashader to render points into an image that napari can handle.

Motivation

We work with large segmented images that have millions of cells, and visualizing them somehow would be great.

Pitch

I threw together a script as a proof of concept, with the gif below:
[Animated GIF: "Peek 2023-08-14 13-24", screen recording of the proof of concept]

Here's the script. I'm not really familiar with vispy or napari Layers, so please let me know if you see something that doesn't fit the intended API.

Edit: I moved the script to this gist

A few notes:

  • The recording above shows 1e7 points, and it gets a little laggy; with 1e6 points it runs at close to 60 fps.
  • I have also tried rasterizing polygons with datashader, but it is significantly slower.
  • datashader's shade and dynspread can improve the interpretation of dense parts of the image, but I didn't include them here because they ended up being too slow.
  • Although datashader uses numba-optimized Python code, better algorithms or dask parallelization on the spatialpandas/datashader side might improve performance further.
  • When zoomed in far enough, individual points appear as single pixels; it might be possible to switch seamlessly between datashader's downsampled view and an alternative view that renders points better at those zoom levels.
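The zoom-dependent switch in the last note could, for example, be driven by how many points fall inside the current viewport. The sketch below is one hypothetical rule, not something datashader or napari provides: it assumes points are stored in napari's (row, col) = (y, x) order, and `count_in_viewport`, `choose_representation`, and the `max_points` budget are illustrative names and values.

```python
import numpy as np


def count_in_viewport(points_yx, y_range, x_range):
    """Count points whose (y, x) coordinates lie inside the viewport (edges inclusive)."""
    y, x = points_yx[:, 0], points_yx[:, 1]
    inside = (
        (y >= y_range[0]) & (y <= y_range[1])
        & (x >= x_range[0]) & (x <= x_range[1])
    )
    return int(np.count_nonzero(inside))


def choose_representation(points_yx, y_range, x_range, max_points=100_000):
    """Hypothetical level-of-detail rule: draw real points when few are visible,
    otherwise fall back to an aggregated (datashader-style) image."""
    n_visible = count_in_viewport(points_yx, y_range, x_range)
    return "points" if n_visible <= max_points else "histogram"


if __name__ == "__main__":
    rng = np.random.default_rng(seed=0)
    pts = rng.standard_normal(size=(1_000_000, 2)) * 1000
    # Zoomed out: almost every point is visible, so aggregate.
    print(choose_representation(pts, (-5000, 5000), (-5000, 5000)))
    # Zoomed in on a small patch: only a handful are visible, so draw them directly.
    print(choose_representation(pts, (0, 10), (0, 10)))
```

The boolean mask still scans every point on each viewport change, so this rule alone does not remove the per-frame O(N) cost; it only decides which representation to draw.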

I'd love to hear your thoughts!

@ahnsws ahnsws added the feature New feature or request label Aug 14, 2023
@melonora
Contributor

Hi @ahnsws,

Sorry we haven't replied until now. This looks awesome, thank you! Would you be willing to join one of the community meetings at some point to discuss it? You can find the schedule here. We also have a Zulip where I post the agenda shortly before each meeting, and people can freely add items to it.

@psobolewskiPhD
Member

To echo what @melonora said: I was one of the rocket emoji reactions, but forgot to follow up!
I hope you can make a community meeting to give us more context.
From a quick look at the datashader docs, I almost see this as a different layer type than Points? More of a heat map? It could be really amazing for representing certain features, like classifier output, particularly on multiscale images.

@ahnsws
Author

ahnsws commented Sep 20, 2023

Hi @melonora, I'd be happy to! The one tomorrow at 3am EST (my timezone) might be hard to join, though, so I plan to join the one next week (on 9/27).

Hi @psobolewskiPhD, yes, exactly: this layer sits somewhere between an Image and a Points layer. Because there are so many more points than pixels when zoomed out, datashader acts as a 2D histogram, which I believe is actually its intended use case. But I think it would be very useful if it acted as a true Points layer when zoomed in far enough to see individual points.

I don't see why you couldn't use this to visualize other dynamically computed features based on the viewport dimensions and position; basically, anything that has the signature of the _update_draw method. Actually, I had hoped to visualize millions of polygons (not just points), which datashader does support, but it is too slow for the time being, even with spatialpandas and its fancy spatial indexing. I wonder what QuPath is doing there, since it achieves buttery-smooth interactivity even with millions of polygons.

Here are some docs that inspired me:

@melonora
Contributor

@ahnsws Awesome, since I won't be able to join today either, haha. I will add you to the agenda for next week.

@melonora
Contributor

I am familiar with datashader from interactive plotting libraries like Bokeh. I noticed the same thing there: it can get laggy above 10 million pixels. I could be wrong, but I believe this is because rendering the pixel buffer still depends on the whole dataset.
I see the point of it being in between Image and Points, though ultimately the data source is still points.
@andy-sweet @kephale I think it would be interesting to see how this compares to tiled rendering / async. In any case the approach of datashader wouldn't require a chunked data format. What do you think?
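The remark that rendering still depends on the whole dataset can be made concrete: a naive re-aggregation scans all N points on every pan or zoom. Below is a minimal sketch of one way to avoid a full scan, assuming the points are sorted once by x so each viewport query only touches the x-slice it needs. `build_x_index` and `points_in_viewport` are hypothetical helpers, and this simple 1D sort is only an illustration, not how datashader or spatialpandas index their data.

```python
import numpy as np


def build_x_index(points_yx):
    """Sort (y, x) points once by their x coordinate; O(N log N), done up front."""
    order = np.argsort(points_yx[:, 1], kind="stable")
    return points_yx[order]


def points_in_viewport(sorted_yx, y_range, x_range):
    """Return points inside the viewport, examining only the x-slice found by binary search."""
    xs = sorted_yx[:, 1]
    lo = np.searchsorted(xs, x_range[0], side="left")
    hi = np.searchsorted(xs, x_range[1], side="right")
    candidates = sorted_yx[lo:hi]  # only these rows are filtered further
    y = candidates[:, 0]
    keep = (y >= y_range[0]) & (y <= y_range[1])
    return candidates[keep]


if __name__ == "__main__":
    rng = np.random.default_rng(seed=0)
    index = build_x_index(rng.standard_normal(size=(1_000_000, 2)) * 1000)
    visible = points_in_viewport(index, (-50, 50), (-50, 50))
    print(len(visible))
```

When zoomed out, the x-slice still covers nearly every point, so this only pays off at higher zoom levels, which is where scanning the whole dataset per frame is most wasteful.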

@melonora
Contributor

melonora commented Sep 26, 2023

Hi @ahnsws, just as a reminder, tomorrow at 8:30 am PT. Hope to see you there!

Agenda: https://hackmd.io/BXWDZ3i8Q6OAEASrkaSNIQ
Zoom link: https://czi.zoom.us/j/759708263

@ahnsws
Author

ahnsws commented Sep 27, 2023

Hi @melonora, yes planning on it, thank you for the links!

@ahnsws
Author

ahnsws commented Sep 27, 2023

@melonora thanks for inviting me to the community meeting.

Here is a gist for reproducing the polygons example, using the NYC buildings dataset (zip; download warning).

Edit: the dataset has around 1 million polygons, and performance seems to depend on the datashader aggregation method.

And to clarify in general, the code just asks napari for the canvas dimensions, then asks datashader to make a 2D histogram of the geometries, and then asks napari to render that histogram directly, without any transforms.
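The aggregation step described above (turning the points in the current viewport into a canvas-sized 2D histogram) can be sketched without datashader. This example swaps in NumPy's `np.histogram2d` for datashader's numba-based aggregation; in datashader the rough counterpart would be `datashader.Canvas(plot_width=..., plot_height=..., x_range=..., y_range=...).points(df, 'x', 'y')`, which counts points per pixel by default. `rasterize_viewport` is a hypothetical helper, the viewport ranges and canvas size are assumed to be passed in by the caller rather than read from napari, and the napari display side is not shown.

```python
import numpy as np


def rasterize_viewport(points_yx, y_range, x_range, height, width):
    """Count points per screen pixel for the current viewport.

    points_yx: (N, 2) array in napari's (row, col) = (y, x) order.
    Returns a (height, width) array of counts; np.histogram2d ignores
    points that fall outside the given ranges.
    """
    counts, _, _ = np.histogram2d(
        points_yx[:, 0],  # y -> image rows
        points_yx[:, 1],  # x -> image columns
        bins=[height, width],
        range=[list(y_range), list(x_range)],
    )
    return counts


if __name__ == "__main__":
    rng = np.random.default_rng(seed=0)
    pts = rng.standard_normal(size=(1_000_000, 2)) * 1000
    # Canvas of 600 x 800 pixels looking at a 6000 x 6000 world-unit window.
    img = rasterize_viewport(pts, (-3000, 3000), (-3000, 3000), height=600, width=800)
    print(img.shape, int(img.sum()))
```

Because the output already has one bin per canvas pixel, it can be drawn 1:1 on screen, which is what lets the rendered histogram skip any further transform.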

@aganders3
Contributor

Thanks @ahnsws for showing this off at the community meeting today and nice to meet you.

I was curious where the bottleneck is in napari, so I started looking into it (the Points layer specifically). I can't even create a Points layer in a reasonable time with 10M points 😅. However, from playing around a bit, it seems that vispy itself can handle that many.

For example, try this. First, create a layer with only the first 10k points:

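```python
import napari
import numpy as np

viewer = napari.Viewer()  # skip this line if a `viewer` already exists (e.g. in the napari console)
qt_viewer = viewer.window.qt_viewer  # Qt-side handle, used later to reach the vispy visual

rng = np.random.default_rng(seed=0)
N = 10_000_000
points = rng.standard_normal(size=(N, 2)) * 1000

# only tell napari about the first 0.1% of points (10k of 10M)
layer = viewer.add_points(points[:int(0.001 * N)])
```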

Then, in the napari console, update the visual directly to show all the points:

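```python
# Look up the vispy visual backing the layer (private napari internals).
visual = qt_viewer.layer_to_visual[layer]
# Subvisual 0 holds the point markers; napari stores points as (row, col),
# i.e. (y, x), while vispy expects (x, y), hence the column flip.
visual.node._subvisuals[0].set_data(points[:, ::-1])
```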

This remains about as fast as the datashader example for me and suggests there may be a way to achieve this level of interactivity with fewer changes. I think some more tweaks (perhaps in vispy) could also improve the appearance.

Anyway, as mentioned above, adding some type of support for the (very nice-looking) datashader representations has other benefits and applications, but I found this interesting.

@andy-sweet
Member

andy-sweet commented Sep 27, 2023

I played around a bit with the image layer data protocol approach and got something kind of working. It's a little shorter than the approach above, but has some issues and also needs to access parts of the private API, so it's not really any better (maybe actually worse!).

I agree with @aganders3 that there's something suspicious about how slow napari is here. I poked around a bit more, did some profiling, and found the main offending line to be in the symbol setter, which is called in Points.__init__. The setter took an average of 11.2 seconds for 10^6 points (I didn't try 10^7). Replacing that line with just self._symbol = symbol brings the setter down to an average of 11.8 microseconds for the same number of points. I'll write up a separate issue to track that.
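To illustrate how the setter timings above can differ by roughly six orders of magnitude, here is a toy example; it is not napari's actual `Points` code, and the offending line is not quoted in this thread. It assumes a setter that normalizes the symbol once per point in a Python loop, whose cost grows with N, versus storing a single shared value, which is constant time. `VALID_SYMBOLS`, `set_symbol_per_point`, and `set_symbol_once` are hypothetical names.

```python
import time

import numpy as np

# Hypothetical symbol names, for illustration only.
VALID_SYMBOLS = {"disc", "square", "cross"}


def _normalize(symbol):
    """Lower-case and validate a single symbol name."""
    name = str(symbol).lower()
    if name not in VALID_SYMBOLS:
        raise ValueError(f"unknown symbol: {symbol!r}")
    return name


def set_symbol_per_point(symbol, n_points):
    """O(N): normalizes the same symbol once per point in a Python loop."""
    out = np.empty(n_points, dtype=object)
    for i in range(n_points):
        out[i] = _normalize(symbol)
    return out


def set_symbol_once(symbol):
    """O(1): normalizes one shared symbol, independent of the number of points."""
    return _normalize(symbol)


if __name__ == "__main__":
    n = 200_000
    t0 = time.perf_counter()
    set_symbol_per_point("disc", n)
    t1 = time.perf_counter()
    set_symbol_once("disc")
    t2 = time.perf_counter()
    print(f"per-point: {t1 - t0:.3f} s, once: {(t2 - t1) * 1e6:.1f} us")
```

Absolute timings will vary by machine and will not match the numbers reported above; the sketch only shows how each approach scales with the number of points.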

After changing that line, interacting with 10^6 points is pretty good on my machine (2020 Intel MacBook Pro). 10^7 sort of works, but it's a real struggle to do much, whereas the example above works pretty well for me!

5 participants