- Get minimal conda environment from RAPIDS-AI Container
- Do a set difference using `pip freeze` between RAPIDS-AI and Databricks for missing packages
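
The set difference in the second step could be sketched in Python roughly as follows. This is a minimal sketch, not a definitive implementation: the function names `parse_freeze` and `missing_packages` are hypothetical, and it assumes "missing" means packages present in the Databricks runtime but absent from the RAPIDS-AI container. It compares names only and ignores version differences.

```python
import re


def parse_freeze(text: str) -> dict[str, str]:
    """Map normalized package name -> version from `pip freeze` output."""
    packages = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and any line without a `==` pin
        # (for example editable or URL installs).
        if not line or line.startswith("#") or "==" not in line:
            continue
        name, version = line.split("==", 1)
        # Normalize so that "Foo_Bar" and "foo-bar" compare equal.
        packages[re.sub(r"[-_.]+", "-", name).lower()] = version
    return packages


def missing_packages(rapids_freeze: str, databricks_freeze: str) -> list[str]:
    """Names found in the Databricks output but not in the RAPIDS-AI output.

    Assumption: this is the direction of the difference the notes intend.
    """
    rapids = parse_freeze(rapids_freeze)
    databricks = parse_freeze(databricks_freeze)
    return sorted(set(databricks) - set(rapids))
```

To use it, save the output of `pip freeze` from each environment to a text file and pass the two file contents to `missing_packages`.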
Install conda-tree
conda install -c conda-forge conda-tree 'networkx>=2.5'
Generate the minimal package list
conda-tree leaves --export
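
For intuition, the "leaves" reported here are the packages that no other installed package depends on, so installing them pulls in everything else. The idea can be sketched in plain Python as below. The `dependencies` mapping and the `leaves` function are hypothetical illustrations, not conda-tree's actual implementation.

```python
def leaves(dependencies: dict[str, set[str]]) -> list[str]:
    """Return packages that no other package in the mapping depends on.

    `dependencies` maps each installed package to the set of packages
    it requires (a hypothetical, hand-built structure for illustration).
    """
    # Every package that appears as somebody's requirement.
    required = set().union(*dependencies.values())
    # Leaves are installed packages that nobody requires.
    return sorted(set(dependencies) - required)
```

With a mapping such as `{"cudf": {"numpy", "pyarrow"}, "pyarrow": {"numpy"}, "numpy": set()}`, only `cudf` is a leaf, because `numpy` and `pyarrow` are both required by something else.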
Get the Python package list from the Databricks Runtime release notes by copying the table to a text file, then use these regex substitutions to clean it up:

| Find | Replace |
|---|---|
| `\s+` | nothing |
| `[\s\n]+(\d)` | (note space preceding) ` $1` |
| `[\s\n]+([a-z])` | `\n$1` |
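
The same cleanup could be scripted instead of done in an editor. The sketch below is a hedged Python translation of the second and third substitutions. The function name `clean_release_notes_table` is hypothetical. The first substitution is approximated with `strip()`, which is an assumption about its intent: removing every whitespace run would leave nothing for the later patterns to match. As in the table, the third pattern only matches names that start with a lowercase letter.

```python
import re


def clean_release_notes_table(text: str) -> list[tuple[str, str]]:
    """Turn copied 'name version name version ...' text into (name, version) pairs."""
    # Assumption: stand-in for the first substitution (trim outer whitespace only).
    text = text.strip()
    # Join each version to the preceding name with a single space.
    text = re.sub(r"[\s\n]+(\d)", r" \1", text)
    # Start a new line before each package name (lowercase initial only).
    text = re.sub(r"[\s\n]+([a-z])", r"\n\1", text)

    pairs = []
    for line in text.splitlines():
        name, _, version = line.partition(" ")
        pairs.append((name, version))
    return pairs
```

The resulting pairs can be written out as `name==version` lines so they are directly comparable with `pip freeze` output.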
Notes:
- `pipdeptree` output won't include `xgboost`, which needs to be installed manually
- GPU runtime has these NVIDIA libraries installed (need to check if RAPIDS-AI already has these):
  - CUDA
  - cuDNN
  - NCCL
  - TensorRT
- Differences between CPU and GPU ML runtime:

| ML CPU | ML GPU |
|---|---|
| prometheus-client | |
| `tensorflow-cpu` | `tensorflow` |
| `torch` CPU | `torch` CUDA |
| `torchvision` CPU | `torchvision` CUDA |

To get the minimal packages, use `pipdeptree` in a Databricks cluster notebook, excluding Databricks-exclusive libraries and default Python packages:
pipdeptree --exclude pip,pipdeptree,setuptools,wheel,databricks-feature-store,databricks-automl-runtime --warn silence | grep -E '^\w+'
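
The `grep -E '^\w+'` filter keeps only the top-level entries, since dependency lines in the `pipdeptree` tree are indented or prefixed with tree characters. If the output has been captured as a string (for example inside a notebook cell), the same filter might look like this in Python. The function name `top_level_packages` is hypothetical and this is only a sketch of the filtering step, not of running `pipdeptree` itself.

```python
import re


def top_level_packages(pipdeptree_output: str) -> list[str]:
    """Keep lines starting with a word character, mirroring the grep filter."""
    return [
        line
        for line in pipdeptree_output.splitlines()
        # re.match anchors at the start of the line, like '^' in the grep pattern.
        if re.match(r"\w", line)
    ]
```

Each kept line is a `name==version` entry for a package that nothing else in the environment requires.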