This repository has been archived by the owner on Jun 28, 2022. It is now read-only.

Any guidance around whether jupyterlab-integration or jupyterlab with databricks-connect? #14

Open
fujikosu opened this issue Jul 21, 2020 · 1 comment
Labels
documentation Improvements or additions to documentation

Comments

@fujikosu

I found that databricks-connect supports Jupyter, and I confirmed via this link that JupyterLab works with databricks-connect:
https://docs.databricks.com/dev-tools/databricks-connect.html#jupyter
Which one should I use for working with JupyterLab and Databricks, this library or databricks-connect? Will development on this repo continue? I'm wondering where the Databricks team will be putting its effort to integrate JupyterLab.

@bernhard-42
Contributor

@fujikosu A few thoughts on your question of whether to use DB Connect or the JupyterLab Integration: it depends on your needs.

Assume that

  • you want to work locally (e.g. because you need integration with a local source code revision system),
  • the code you want to write is all Spark code (no single-node scikit-learn on the driver, no deep learning),
  • you don't need Spark progress bars, and
  • you don't need JupyterLab extensions, e.g. for visualisation libraries like plotly or bokeh.

Then I'd recommend going with DB Connect. It is an officially supported way of using Jupyter(Lab) with Databricks.
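For reference, the basic DB Connect setup looks roughly like this (a sketch based on the linked docs; the client version below is a placeholder and must match your cluster's Databricks Runtime, and the configure step prompts you for your own workspace URL, token, and cluster ID):

```shell
# databricks-connect conflicts with a locally installed pyspark, so remove it first
pip uninstall -y pyspark

# Install a client matching the cluster's Databricks Runtime (placeholder version)
pip install -U "databricks-connect==7.3.*"

# Configure the connection (prompts for workspace URL, token, cluster ID, org ID, port)
databricks-connect configure

# Verify that the local client can reach the remote cluster
databricks-connect test

# Then start JupyterLab as usual; SparkSession.builder.getOrCreate() in a notebook
# runs against the remote Databricks cluster.
pip install jupyterlab
jupyter lab
```

Since this only sets up an environment against a remote cluster, there is nothing to run without valid credentials.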

Assume that

  • you want to use Databricks clusters for all kinds of workloads (not only Spark, see above),
  • you do not want to use libraries whose output heavily depends on JupyterLab capabilities like ipywidgets, and
  • you have no issues with notebooks being stored in the Databricks control plane.

Then I'd recommend Databricks notebooks. They are best integrated with all workloads (e.g. MLflow integration, Structured Streaming integration, currently the only collaborative environment, ...).

Finally, assume that

  • you want to work locally (e.g. because you need integration with a local source code revision system),
  • you want to use Databricks clusters for all kinds of workloads (not only Spark, see above),
  • you want to use libraries whose output heavily depends on JupyterLab capabilities like ipywidgets,
  • you do have issues with notebooks being stored in the Databricks control plane,
  • you can live with a tool that has community support rather than official Databricks support, and
  • you are allowed and able to access your Databricks cluster drivers via SSH.

Then give JupyterLab Integration a try; I am happy to help you.
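If SSH access is possible, getting started looks roughly like the following. This is a sketch from memory of this repo's README; treat the exact command names, flags, and the profile name "demo" as assumptions and check the README for your installed version:

```shell
# Install the integration (ideally into a fresh conda/virtualenv environment)
pip install databrickslabs-jupyterlab

# Bootstrap JupyterLab with the required extensions
# (flag per the README; may differ by version)
databrickslabs-jupyterlab -b

# Create a remote kernel for the cluster configured under a Databricks CLI
# profile named "demo" (hypothetical profile name); this connects to the
# cluster driver via SSH
databrickslabs-jupyterlab demo -k

# Start JupyterLab and select the newly created remote kernel
jupyter lab
```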

I will continue to work on it, fix bugs, and help users; however, to emphasise again, this project is currently not an official Databricks project.

Hope this helps

@bernhard-42 bernhard-42 added the documentation Improvements or additions to documentation label Jan 17, 2021