
Performance is slow for tables with a lot of columns (and cells) #68

Open
afshin opened this issue Feb 15, 2022 · 10 comments

Comments

@afshin

afshin commented Feb 15, 2022

beakerx_tabledisplay performance takes a big hit and can freeze the browser when there are a lot of cells.

To reproduce this issue:

Set up

conda create -y -n slowtable
conda activate slowtable
conda install -y jupyterlab
pip install beakerx_tabledisplay
beakerx_tabledisplay install
jupyter lab

Execute

This is the code I ran to reproduce the issue. Here is a zipped notebook containing the code below: slow-tabledisplay.ipynb.zip

Cell 1

import beakerx_tabledisplay
import pandas as pd
table = pd.DataFrame({k: range(10) for k in range(1000)})

Cell 2

display(table)

Note on JupyterLab 2 vs. 3

Interestingly, while this is slow in both JupyterLab 2.x and 3.x, it seems slightly slower in JupyterLab 3, which triggers this warning in Firefox:

[Screenshot: firefox-warning]

@davidbrochart

I wanted to see how much the new kernel protocol over WebSocket improves the situation, but it is being implemented in JupyterLab 4.0, which this widget does not support yet.
Is there a development version that supports JupyterLab 4.0?

@afshin
Author

afshin commented Feb 16, 2022

Hi @davidbrochart! This extension doesn't have a 4.x version yet. It's something we should work on soon.

But I don't think the WebSocket changes will improve this: you can save the notebook with all of its outputs, reopen it without a kernel or any WebSocket traffic, and it is still slow. This particular slowness is definitely a front-end phenomenon.

@davidbrochart

Thanks @afshin, I managed to install from source. I'll run some benchmarks anyway.

@davidbrochart

I accumulated the time spent on all WebSocket traffic in Jupyter Server, and the new protocol is 5.5 times faster on this example:

table = pd.DataFrame({k: range(1000) for k in range(1000)})

Total WebSocket traffic time:

  • old protocol: 0.088 s
  • new protocol: 0.016 s
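The comment does not show how the WebSocket time was accumulated in Jupyter Server. As a minimal sketch of the idea, assuming you can wrap whichever per-message send path you want to measure, a context manager can sum `time.perf_counter()` deltas across many calls. `TrafficTimer` and `send_message` are hypothetical names for illustration only; they are not Jupyter Server APIs.

```python
import time
from contextlib import contextmanager


class TrafficTimer:
    """Accumulate wall-clock time spent inside repeated sections of code."""

    def __init__(self) -> None:
        self.total = 0.0  # seconds summed over all measured sections
        self.calls = 0    # number of measured sections

    @contextmanager
    def measure(self):
        start = time.perf_counter()
        try:
            yield
        finally:
            # runs even if the wrapped code returns early or raises
            self.total += time.perf_counter() - start
            self.calls += 1


timer = TrafficTimer()


def send_message(payload: bytes) -> int:
    # hypothetical stand-in for forwarding one kernel message to the WebSocket
    with timer.measure():
        return len(payload)


for _ in range(100):
    send_message(b"x" * 1024)

print(f"{timer.calls} messages, {timer.total:.6f} s total")
```

Summing per-message durations (rather than timing one outer loop) isolates the transfer path from unrelated work happening between messages.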

@afshin
Author

afshin commented Feb 16, 2022

That's excellent, and I'm glad to see this optimization! But even if the transfer were instantaneous, the front end would still choke trying to render this table.

@davidbrochart

With a bigger table, the new protocol is now 18x faster:

table = pd.DataFrame({k: range(1000) for k in range(10_000)})

Total WebSocket traffic time:

  • old protocol: 0.8726 s
  • new protocol: 0.04865 s
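The "5.5 times" and "18x" figures are the ratio of old to new total time. A quick check using the numbers reported above:

```python
# Total WebSocket traffic time (seconds) reported in the comments above
results = {
    "1000 rows x 1,000 cols": {"old": 0.088, "new": 0.016},
    "1000 rows x 10,000 cols": {"old": 0.8726, "new": 0.04865},
}


def speedup(old: float, new: float) -> float:
    """How many times faster the new protocol is: old time / new time."""
    return old / new


for label, t in results.items():
    print(f"{label}: {speedup(t['old'], t['new']):.1f}x faster")
```

The second ratio is about 17.9, which rounds to the 18x quoted in the comment.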

@davidbrochart

In terms of data rate (from ZMQ to WebSocket), I got the following results on a consumer laptop (i7 @1.80 GHz, 16 GB of RAM):

  • old protocol: 53 MB/s
  • new protocol: 922 MB/s
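Assuming these data rates were measured on the same 10,000-column table as the timings above (a later comment says the benchmarks used a 10,000,000-element table), rate multiplied by time should give roughly the same payload for both protocols. A sketch of that consistency check:

```python
# Reported data rates (MB/s) and total WebSocket times (s) for the 10,000-column table
rates_mb_s = {"old": 53, "new": 922}
times_s = {"old": 0.8726, "new": 0.04865}

for proto in ("old", "new"):
    # payload = rate x time; if the figures are consistent, both should be similar
    size_mb = rates_mb_s[proto] * times_s[proto]
    print(f"{proto} protocol: ~{size_mb:.1f} MB transferred")
```

Both protocols come out at roughly 45 to 46 MB, so the rates and the timings agree with each other.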

@afshin
Author

afshin commented Feb 22, 2022

Thanks for investigating this further, @davidbrochart!

Just to clarify: do your results show that this slowness is a data transmission/parsing issue? My guess was that it is a client-side performance issue; is that assumption incorrect?


These figures might also be useful as a case study in your protocol alignment PR in JupyterLab.

@davidbrochart

No, this doesn't explain the slowness of the BeakerX display, which, as you said, is a front-end issue. In fact, these benchmarks were done with a table of 10_000_000 elements, which never displays in the notebook; I can only see that the transfer over the WebSocket completes on the server side.
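The benchmark tables differ from the original reproduction only in shape. A small helper (`make_table` is a hypothetical name, not part of any library) repeats the same `pd.DataFrame` construction used in the thread and computes each table's cell count, making the jump from 10,000 cells in the original report to 10,000,000 in the benchmarks explicit:

```python
import pandas as pd


def make_table(n_rows: int, n_cols: int) -> pd.DataFrame:
    """Build a table like the ones in this thread: n_cols integer-named
    columns, each holding 0..n_rows-1 (same construction as the repro cells)."""
    return pd.DataFrame({k: range(n_rows) for k in range(n_cols)})


# (rows, columns) of the three tables discussed in the thread
SHAPES = [(10, 1000), (1000, 1000), (1000, 10_000)]

for n_rows, n_cols in SHAPES:
    # cell count = rows x columns, computed arithmetically so the
    # 10-million-cell table is not allocated just to count it
    print(f"{n_rows:>5} rows x {n_cols:>6,} cols = {n_rows * n_cols:>10,} cells")

# the original reproduction table (Cell 1 in the report)
small = make_table(10, 1000)
assert small.shape == (10, 1000) and small.size == 10_000
```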

@fcollonval
Contributor

Here is a comparison of the same table rendered with ipydatagrid, which also uses the Lumino DataGrid behind the scenes.

[Image: ipydatagrid_vs_beakerx]

This confirms that the Lumino DataGrid can achieve good performance. The next step is profiling the code to find the bottleneck.
