Performance is slow for tables with a lot of columns (and cells) #68

afshin · 2022-02-15T15:58:19Z

beakerx_tabledisplay performance takes a big hit, and the browser can freeze, when a table has a lot of cells.

To reproduce this issue:

Set up

conda create -y -n slowtable
conda activate slowtable
conda install -y jupyterlab
pip install beakerx_tabledisplay
beakerx_tabledisplay install
jupyter lab

Execute

This is the code I ran to reproduce the issue. Here is a zipped notebook containing the code below: slow-tabledisplay.ipynb.zip

Cell 1

import beakerx_tabledisplay
import pandas as pd
table = pd.DataFrame({k: range(10) for k in range(1000)})

Cell 2

display(table)

Note on JupyterLab 2 vs. 3

Interestingly, while this is slow in both JupyterLab 2.x and JupyterLab 3.x, it seems slightly slower in JupyterLab 3, where it triggers a warning in Firefox.


davidbrochart · 2022-02-16T16:31:18Z

I wanted to see how the new kernel protocol over websocket improved the situation, but it is being implemented in JupyterLab 4.0, which is not compatible with this widget.
Is there a development version that would support JupyterLab 4.0?

afshin · 2022-02-16T17:30:45Z

Hi @davidbrochart! This extension doesn't have a 4.x version yet. It's something we should work on soon.

But I don't think the websocket changes will improve this. You can save the notebook with all of its outputs in the document, open it without a kernel or any websocket traffic, and it will still be slow. This particular slowness is definitely a front-end phenomenon.

davidbrochart · 2022-02-16T17:32:47Z

Thanks @afshin, I managed to install from source. I'll run some benchmarks anyway.

davidbrochart · 2022-02-16T17:54:54Z

I accumulated the time spent on all websocket traffic in Jupyter Server, and I can see that the new protocol is 5.5 times faster on this example:

table = pd.DataFrame({k: range(1000) for k in range(1000)})

Total websocket traffic:

old protocol: 0.088 s
new protocol: 0.016 s

afshin · 2022-02-16T18:15:35Z

That's excellent and I'm glad to see this optimization! But even if the transfer were instantaneous, the front-end would still choke trying to render this table.

davidbrochart · 2022-02-16T18:19:22Z

With a bigger table, the new protocol is now 18x faster:

table = pd.DataFrame({k: range(1000) for k in range(10_000)})

Total websocket traffic:

old protocol: 0.8726 s
new protocol: 0.04865 s
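As a quick check of the quoted speedups, here is a minimal sketch that recomputes them from the timings reported in the two benchmark comments. The `speedup` helper and the dictionary layout are illustrative names, not part of any Jupyter API.

```python
# Timings in seconds, copied from the two benchmark comments in this thread.
# Keys are "rows x columns" of the DataFrame used in each run.
timings = {
    "1000 x 1000": {"old": 0.088, "new": 0.016},
    "1000 x 10_000": {"old": 0.8726, "new": 0.04865},
}


def speedup(old_seconds, new_seconds):
    """Return how many times faster the new protocol is than the old one."""
    return old_seconds / new_seconds


for shape, t in timings.items():
    print(f"{shape}: {speedup(t['old'], t['new']):.1f}x faster")
```

The first ratio comes out at 5.5 and the second just under 18, which matches the rounded figures quoted above.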

davidbrochart · 2022-02-22T17:53:31Z

In terms of data rate (from ZMQ to WebSocket), I got the following results on a consumer laptop (i7 @1.80 GHz, 16 GB of RAM):

old protocol: 53 MB/s
new protocol: 922 MB/s
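A rough cross-check of these rates against the earlier timings, as a sketch only: it assumes the data rates were measured on the same run as the 10_000-column timings, which the thread does not state explicitly. The `payload_mb` helper is a hypothetical name used for illustration.

```python
def payload_mb(rate_mb_per_s, seconds):
    """Estimate the amount of data moved as rate multiplied by elapsed time."""
    return rate_mb_per_s * seconds


# Assumption: these rates pair with the 0.8726 s / 0.04865 s timings above.
old_payload = payload_mb(53, 0.8726)
new_payload = payload_mb(922, 0.04865)
rate_ratio = 922 / 53

print(f"old protocol payload estimate: {old_payload:.1f} MB")
print(f"new protocol payload estimate: {new_payload:.1f} MB")
print(f"data rate ratio: {rate_ratio:.1f}x")
```

Under that assumption both protocols move roughly the same payload (about 45 MB), and the rate ratio is close to the 18x speedup reported for the larger table.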

afshin · 2022-02-22T22:37:00Z

Thanks for investigating this further, @davidbrochart!

Just to clarify, are your results that this slowness is a data transmission / parsing issue? My guess was that this is a client-side performance issue; is that an incorrect assumption?

These figures might also be useful as a case study in your protocol alignment PR in JupyterLab.

davidbrochart · 2022-02-22T22:43:46Z

No, this doesn't explain the slowness of the beakerx display, which, as you said, is a front-end issue. In fact, these benchmarks were done with a table of 10_000_000 elements, which never displays in the notebook, but I can see on the server side that the transfer over the websocket is done.
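For reference, the element counts of the three tables used in this thread can be checked with a short sketch. It builds the same dict-of-ranges shape that is passed to `pd.DataFrame`, but in plain Python so pandas is not needed; `element_count` is an illustrative helper, not a pandas function.

```python
def element_count(n_rows, n_cols):
    """Count cells in a table shaped like
    pd.DataFrame({k: range(n_rows) for k in range(n_cols)})."""
    # range objects are lazy, so this stays cheap even for large shapes.
    columns = {k: range(n_rows) for k in range(n_cols)}
    return sum(len(col) for col in columns.values())


# (rows, columns) for the original report and the two benchmark tables.
shapes = [(10, 1000), (1000, 1000), (1000, 10_000)]
for n_rows, n_cols in shapes:
    print(f"{n_rows} rows x {n_cols} columns = {element_count(n_rows, n_cols):,} elements")
```

The last shape gives the 10_000_000 elements mentioned above, while the table from the original report has only 10,000 cells spread over 1000 columns.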

fcollonval · 2022-02-23T15:45:26Z

Here is a comparison with the same table rendered by ipydatagrid, which also uses the Lumino DataGrid behind the scenes.

So this confirms that the Lumino DataGrid can achieve good performance. The next step is to profile the code to figure out the bottleneck.

fcollonval mentioned this issue Feb 23, 2022

Add benchmark for large table #70

Merged

fcollonval mentioned this issue Mar 8, 2022

Update input node position only if visible #73

Merged
