This repository has been archived by the owner on Jun 28, 2022. It is now read-only.

dbutils.library support #13

Open
vkrot-exos opened this issue Jul 15, 2020 · 2 comments

Comments

@vkrot-exos

Hi,
Are there any plans to add support for the dbutils.library module? Right now, even a simple dbutils.library.help("install") produces an error:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-19-28186010f9f4> in <module>
----> 1 dbutils.library.help("install")

AttributeError: 'DbjlUtils' object has no attribute 'library'
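
For reference, an AttributeError like the one above can be avoided by checking for the submodule before calling into it. This is only a minimal sketch: `FakeDbutils` and `has_dbutils_module` are hypothetical names made up for illustration and are not part of jupyterlab-integration, and the stand-in class only assumes that some submodules (here `fs`) exist while `library` does not.

```python
class FakeDbutils:
    """Hypothetical stand-in for a dbutils object that lacks `library`."""

    # Assumption for the sketch: at least one other submodule is present.
    fs = object()


def has_dbutils_module(dbutils, module_name):
    """Return True if the given dbutils object exposes `module_name`."""
    return hasattr(dbutils, module_name)


dbutils_stub = FakeDbutils()

if has_dbutils_module(dbutils_stub, "library"):
    # Only reached on an object that really provides dbutils.library.
    dbutils_stub.library.help("install")
else:
    print("dbutils.library is not available in this environment")
```

In a real notebook the check would be run against the actual `dbutils` object instead of the stub.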

The ML runtime also has a great %pip magic command: https://docs.databricks.com/notebooks/notebooks-python-libraries.html#enable-pip-and-conda-magic-commands
It installs libraries on both the driver and the executor nodes.
In contrast, running %pip install inside a JupyterLab notebook connected to a Databricks cluster installs libraries only on the driver node. That makes it unusable with UDFs, because the executors need the same libraries.
Could you suggest a workaround? Or are there plans to bring such support to jupyterlab-integration?
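
One way to observe the driver-versus-executor difference described above is to ask the executors whether they can find a given module. The following is a rough sketch, not a fix: `hosts_missing_module` and `_probe` are hypothetical helper names, `sc` is assumed to be an existing SparkContext, and spreading a handful of partitions over the cluster does not guarantee that every executor is actually probed.

```python
import importlib.util
import socket


def _probe(module_name):
    """Build a per-partition check that reports (hostname, module found?)."""

    def check(_partition):
        try:
            found = importlib.util.find_spec(module_name) is not None
        except (ImportError, ValueError):
            # Raised for dotted names with a missing parent, or odd module state.
            found = False
        yield (socket.gethostname(), found)

    return check


def hosts_missing_module(sc, module_name, num_partitions=8):
    """Return the sorted hostnames on which `module_name` was not found.

    `sc` is assumed to be a SparkContext; each partition runs the probe
    wherever Spark schedules it, so coverage of all executors is best effort.
    """
    rdd = sc.parallelize(range(num_partitions), num_partitions)
    results = rdd.mapPartitions(_probe(module_name)).collect()
    return sorted({host for host, found in results if not found})
```

Calling `hosts_missing_module(sc, "some_package")` after a driver-only install would be expected to list the executor hosts that still lack the package; an empty list means no probed host reported it missing.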

Is there any way to install notebook-scoped libraries interactively, without init scripts?

Thanks in advance

@bernhard-42
Contributor

Hi @vkrot-exos,
Agreed, notebook-scoped libraries (dbutils, %pip, %conda) would be amazing, and yes, they are on my roadmap.
It just turned out not to be that simple, and I need to do more research to find a way to support them.

At least I know now that there is another person who would love to see them, too :-)

@vkrot-exos
Author

@bernhard-42, great to hear!
Do you have a very rough estimate of when you will start the research?

I'm currently evaluating whether our data scientists should use Databricks notebooks or a private JupyterHub with this integration library.
Databricks notebooks are cool, but all notebooks are stored in the Databricks control plane and are not that well integrated with GitHub. A private JupyterHub gives more flexibility for building workflows, fine-grained permissions, etc.
But the %pip and dbutils.library feature is really missing.

Do you know what else is missing in jupyterlab-integration compared to the Databricks notebooks environment?
