This repository has been archived by the owner on Jun 28, 2022. It is now read-only.

Any guidance around whether jupyterlab-integration or jupyterlab with databricks-connect? #14

Open
fujikosu opened this issue Jul 21, 2020 · 1 comment
Labels
documentation Improvements or additions to documentation

Comments

@fujikosu

I found that databricks-connect supports Jupyter, and I confirmed via this link that JupyterLab works with databricks-connect:
https://docs.databricks.com/dev-tools/databricks-connect.html#jupyter
Which one should I use for working with JupyterLab and Databricks, this library or databricks-connect? Will development on this repo continue? I'm wondering where the Databricks team will be putting its effort to integrate JupyterLab.

@bernhard-42
Contributor

@fujikosu A few thoughts on your question of whether to use DB Connect or the JupyterLab Integration: it depends on your needs.

Assume that

  • you want to work locally (e.g. because you need integration with a local source code revision system),
  • the code you want to write is all Spark code (no single-node scikit-learn on the driver, no deep learning),
  • you don't need Spark progress bars, and
  • you don't need JupyterLab extensions, e.g. for visualisation libraries like plotly or bokeh.

Then I'd recommend going with DB Connect. It is an officially supported way of using Jupyter(Lab) with Databricks.
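For reference, the basic DB Connect setup looks roughly like this (a sketch based on the linked docs; the client version below is a placeholder and must match your cluster's Databricks Runtime, and the configure step prompts you for your own workspace URL, token, and cluster ID):

```shell
# databricks-connect conflicts with a locally installed pyspark, so remove it first
pip uninstall -y pyspark

# Install a client matching the cluster's Databricks Runtime (placeholder version)
pip install -U "databricks-connect==7.3.*"

# Configure the connection (prompts for workspace URL, token, cluster ID, org ID, port)
databricks-connect configure

# Verify that the local client can reach the remote cluster
databricks-connect test

# Then start JupyterLab as usual; SparkSession.builder.getOrCreate() in a notebook
# runs against the remote Databricks cluster.
pip install jupyterlab
jupyter lab
```

Since this only sets up an environment against a remote cluster, there is nothing to run without valid credentials.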

Assume that

  • you want to use Databricks clusters for all kinds of workloads (not only Spark, see above),
  • you do not want to use libraries whose output heavily depends on JupyterLab capabilities like ipywidgets, and
  • you have no issues with notebooks being stored in the Databricks control plane.

Then I'd recommend Databricks notebooks. They are best integrated with all workloads (e.g. MLflow integration, Structured Streaming integration, currently the only collaborative environment, ...).

Finally, assume that

  • you want to work locally (e.g. because you need integration with a local source code revision system),
  • you want to use Databricks clusters for all kinds of workloads (not only Spark, see above),
  • you want to use libraries whose output heavily depends on JupyterLab capabilities like ipywidgets,
  • you do have issues with notebooks being stored in the Databricks control plane,
  • you can live with a tool that has community support rather than official Databricks support, and
  • you are allowed and able to access your Databricks cluster drivers via SSH.

Then give JupyterLab Integration a try; I am happy to help you.
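If SSH access is possible, getting started looks roughly like the following. This is a sketch from memory of this repo's README; treat the exact command names, flags, and the profile name "demo" as assumptions and check the README for your installed version:

```shell
# Install the integration (ideally into a fresh conda/virtualenv environment)
pip install databrickslabs-jupyterlab

# Bootstrap JupyterLab with the required extensions
# (flag per the README; may differ by version)
databrickslabs-jupyterlab -b

# Create a remote kernel for the cluster configured under a Databricks CLI
# profile named "demo" (hypothetical profile name); this connects to the
# cluster driver via SSH
databrickslabs-jupyterlab demo -k

# Start JupyterLab and select the newly created remote kernel
jupyter lab
```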

I will continue to work on it, fix bugs, and help users; however, to emphasise again, this project is currently not an official Databricks project.

Hope this helps

@bernhard-42 bernhard-42 added the documentation Improvements or additions to documentation label Jan 17, 2021