Interactively rendering ~10 million points using datashader #6148

ahnsws · 2023-08-14T17:37:45Z

🚀 Feature

Hello everyone, I'm not sure what work's been done on optimizing the Points layer, but I thought I'd have a go at using datashader to render the points into an image that napari can handle.

Motivation

We work with large segmented images that have millions of cells, and visualizing them somehow would be great.

Pitch

I threw together a script as a proof of concept, with the gif below:

Here's the script. I'm not really familiar with vispy or napari Layers, so please let me know if you see something that doesn't fit the intended API.

Edit: I moved the script to this gist

A few notes:

The above is of 1e7 points, but it gets a little laggy. 1e6 points is pretty close to 60 fps.
I have also tried rasterizing polygons with datashader, but it is significantly slower.
datashader's shade and dynspread can improve the interpretation of dense parts of the image, but I didn't include them here because they ended up being too slow.
Although datashader uses numba-optimized python code, maybe some better algorithms/dask parallelization on the spatialpandas/datashader side can improve performance.
At high resolution, the individual points appear as single pixels; maybe it'd be possible to seamlessly switch between datashader's downsampled view and an alternative view with better rendering of points at certain zoom levels.

I'd love to hear your thoughts!

melonora · 2023-09-19T13:02:06Z

Hi @ahnsws,

I'm sorry that we haven't replied until now. Looks awesome, thank you! Would you be willing to join one of the community meetings some time to discuss? You can find the schedule here. Also, we have a zulip where I post the agenda shortly before each meeting, and people can freely add items to it.

psobolewskiPhD · 2023-09-19T14:22:57Z

To echo what @melonora said: I'm one of the rocket emoji, but forgot to follow up!
I hope you can make a community meeting to give us more context.
Looking at the datashader docs a smidge I almost see this as a different layer type than Points? More of a heat map? Could be really amazing to represent certain features, like classifier output, particularly on multiscale images?

ahnsws · 2023-09-20T00:44:00Z

Hi @melonora, I'd be happy to! Although the one tomorrow at 3am EST (my timezone) might be hard to join, so I will plan on joining the one next week (on 9/27).

Hi @psobolewskiPhD, yes exactly, this layer is kind of in between an Image and a Points... because there are so many more points than pixels when zoomed out, datashader acts as a 2D histogram, and I believe this is actually the intended use case. But I think it would be very useful if it acted as a true Points layer when zoomed in enough to see individual points.
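To make the "histogram when zoomed out, true points when zoomed in" idea concrete, here is a minimal sketch of how the switch could be decided. It is not taken from the gist: the `choose_mode` helper and its `max_density` cutoff (visible points per canvas pixel) are hypothetical names and values chosen only for illustration, and plain numpy stands in for any napari or datashader calls.

```python
import numpy as np

def choose_mode(points, x_range, y_range, canvas_shape, max_density=0.05):
    """Pick a representation for the current viewport (hypothetical helper).

    points: (N, 2) array of (x, y) coordinates.
    canvas_shape: (height, width) of the canvas in pixels.
    max_density: assumed cutoff, in visible points per canvas pixel.
    """
    x, y = points[:, 0], points[:, 1]
    # Keep only the points that fall inside the current viewport.
    visible = (
        (x >= x_range[0]) & (x <= x_range[1])
        & (y >= y_range[0]) & (y <= y_range[1])
    )
    n_pixels = canvas_shape[0] * canvas_shape[1]
    if visible.sum() / n_pixels <= max_density:
        # Few enough points to draw each one individually.
        return "points", points[visible]
    # Too many points per pixel: fall back to an aggregated image.
    return "aggregate", None

rng = np.random.default_rng(seed=0)
points = rng.uniform(0, 1000, size=(100_000, 2))
canvas_shape = (100, 100)

zoomed_out_mode, _ = choose_mode(points, (0, 1000), (0, 1000), canvas_shape)
zoomed_in_mode, zoomed_in_points = choose_mode(points, (0, 10), (0, 10), canvas_shape)
print(zoomed_out_mode, zoomed_in_mode)
```

A density cutoff is only one possible rule; a fixed cap on the number of visible points, or a threshold on the zoom level itself, would work along the same lines.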

I don't see why you couldn't use this to visualize other dynamically computed features based on the viewport dimensions and position; basically, anything that has the signature of the _update_draw method. Actually, I had hoped that millions of polygons (and not just points) could be visualized. datashader does support this, but it is too slow for the time being, even with spatialpandas and its fancy spatial indexing. I wonder what qupath is doing there, since it is capable of buttery smooth interactivity even with millions of polygons.

Here are some docs that inspired me:

melonora · 2023-09-20T03:06:20Z

@ahnsws Awesome as I won't be able to join today either haha. I will add you to the agenda for next week.

melonora · 2023-09-20T03:42:51Z

I am familiar with datashader from interactive plotting libraries like bokeh. I noticed the same thing there: it can get laggy when going above 10 million pixels. I could be wrong here, but I believe this is because creating the pixel buffer for rendering still depends on the whole dataset.
I see the point of it being in between Image and Points, though ultimately the data source is still points.
@andy-sweet @kephale I think it would be interesting to see how this compares to tiled rendering / async. In any case the approach of datashader wouldn't require a chunked data format. What do you think?

melonora · 2023-09-26T23:40:22Z

Hi @ahnsws, just as a reminder, tomorrow at 8:30 am PT. Hope to see you there!

Agenda: https://hackmd.io/BXWDZ3i8Q6OAEASrkaSNIQ
Zoom link: https://czi.zoom.us/j/759708263

ahnsws · 2023-09-27T00:45:46Z

Hi @melonora, yes planning on it, thank you for the links!

ahnsws · 2023-09-27T16:18:07Z

@melonora thanks for inviting me to the community meeting.

Here is a gist for reproducing the polygons example, using the NYC buildings dataset (zip; download warning).

Edit: the dataset has around 1 million polygons, and performance seems to depend on the datashader aggregation method.

And to clarify in general, the code just asks napari for the canvas dimensions, then asks datashader to make a 2D histogram of the geometries, and then asks napari to render that histogram directly, without any transforms.
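As a rough illustration of the middle step (turning the geometries into a 2D histogram that matches the canvas), here is a minimal sketch for points only. It swaps in `np.histogram2d` for datashader so that it runs with numpy alone; the gist itself uses datashader, and the `rasterize_points` helper, the viewport ranges, and the canvas size below are all assumptions made for the example.

```python
import numpy as np

def rasterize_points(points, x_range, y_range, canvas_shape):
    """Count points per canvas pixel for the visible region (hypothetical helper).

    points: (N, 2) array of (x, y) coordinates.
    canvas_shape: (height, width) of the canvas in pixels.
    """
    height, width = canvas_shape
    # Pass y first so the result is indexed as (row, column), like an image.
    counts, _, _ = np.histogram2d(
        points[:, 1],
        points[:, 0],
        bins=(height, width),
        range=[y_range, x_range],
    )
    return counts

rng = np.random.default_rng(seed=0)
points = rng.standard_normal(size=(100_000, 2)) * 1000

# Assumed viewport and canvas size; in practice these would come from the viewer.
x_range, y_range = (-500, 500), (-500, 500)
image = rasterize_points(points, x_range, y_range, canvas_shape=(600, 800))
print(image.shape, int(image.sum()))
```

The resulting array has one value per canvas pixel, so it can be shown as an ordinary image and recomputed whenever the viewport changes. Points outside the viewport are simply dropped by the `range` argument.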

aganders3 · 2023-09-27T17:12:59Z

Thanks @ahnsws for showing this off at the community meeting today and nice to meet you.

I am curious where the bottleneck is in napari, so started looking into it (points layer specifically). I can't even create a points layer in reasonable time with 10M points 😅. However just playing around it seems that vispy can handle it.

For example try this - first create a layer with only the first 10k points:

import numpy as np

# `viewer` is assumed to be an existing napari viewer,
# e.g. the one predefined in the napari console
qt_viewer = viewer.window.qt_viewer

rng = np.random.default_rng(seed=0)
N = 10_000_000
points = rng.standard_normal(size=(N, 2)) * 1000

# only tell napari about the first 0.1% of points
layer = viewer.add_points(points[:int(0.001 * N)])

Then, in the napari console, update the visual directly to show all the points:

visual = qt_viewer.layer_to_visual[layer]
visual.node._subvisuals[0].set_data(points[:, ::-1])

This remains about as fast as the datashader example for me and suggests there may be a way to achieve this level of interactivity with fewer changes. I think some more tweaks (perhaps in vispy) could also improve the appearance.

Anyway as mentioned above adding some type of support for the (very nice looking) datashader representations has other benefits/applications, but I found this interesting.

andy-sweet · 2023-09-27T23:23:10Z

I played around a bit with the image layer data protocol approach and got something kind of working. It's a little shorter than the approach above, but has some issues and also needs to access parts of the private API, so it's not really any better (maybe actually worse!).

Agreed with @aganders3 that there's something suspicious about the surprisingly slow behavior in napari. I poked around a bit more and did some profiling and found the main offending line to be in the symbols setter which is called in Points.__init__. The setter took an average of 11.2 seconds for 10^6 points - I didn't try for 10^7. Replacing that line with just self._symbol = symbol makes the setter take an average of 11.8 microseconds with the same number of points. I'll write up a separate issue to track that.

After changing that line, interacting with 10^6 points is pretty good on my machine (2020 Intel macbook pro). 10^7 kinda works, but it's a real struggle to do much. Whereas the example above works pretty well for me!

ahnsws added the "feature" (New feature or request) label · Aug 14, 2023

andy-sweet mentioned this issue Sep 28, 2023

Adding one layer with many points is surprisingly slow #6275

Closed

ClementCaporal mentioned this issue Feb 6, 2024

Add How-to / Best practice around large datasets napari/docs#350

Open

LucaMarconato mentioned this issue Feb 9, 2024

Feature: on-the-fly rasterization of grid like data scverse/napari-spatialdata#194

Open

kephale mentioned this issue Feb 29, 2024

Support for Localization Microscopy data #1783

Open
