Enable model custom dependency installation using virtual environment #2910

Merged on Feb 9, 2024 (18 commits)

Collaborator

@namannandan namannandan commented Jan 26, 2024

Description

The current implementation to handle installation of custom dependencies using a requirements.txt file installs packages to a target directory as follows:

commandParts.add(pythonRuntime);
commandParts.add("-m");
commandParts.add("pip");
commandParts.add("install");
commandParts.add("-U");
commandParts.add("-t");
commandParts.add(dependencyPath.getAbsolutePath());
commandParts.add("-r");
commandParts.add(requirementsFilePath.toString());

And includes the target directory in the PYTHONPATH environment variable:

File dependencyPath = new File(modelPath);
if (Files.isSymbolicLink(dependencyPath.toPath())) {
    pythonPath
            .append(dependencyPath.getParentFile().getAbsolutePath())
            .append(File.pathSeparatorChar);
}
} // closes an enclosing block that this excerpt does not show

The outcome of this approach is:

  1. The custom dependencies specified in requirements.txt, along with a copy of all their dependencies, are installed to the target directory regardless of which packages and dependencies are already available in the base Python environment (site-packages).
  2. The most up-to-date supported versions of the dependencies of the packages in requirements.txt are installed in the target directory, even when a supported version is already present in the base Python environment (site-packages).

The above approach can be improved for the following reasons:

  1. Supported dependencies already present in the base Python environment can be reused and don't need to be downloaded and reinstalled.
  2. In containerized environments that ship with specific package versions, for example a specific torch binary that supports the combination of CUDA version and GPU driver version on the target platform, it is desirable to use the pip package already available in the base Python environment rather than upgrade to the most up-to-date version, which may break compatibility with existing packages.

For example:
In the pytorch inference deep learning container 763104351884.dkr.ecr.us-west-2.amazonaws.com/pytorch-inference:2.1.0-gpu-py310 from https://github.com/aws/deep-learning-containers/blob/master/available_images.md

$ pip freeze | grep torch
sagemaker-pytorch-inference==2.0.21
torch @ https://framework-binaries.s3.us-west-2.amazonaws.com/pytorch/v2.1.0/cuda11.8.0/torch-2.1.0%2Bcu118-cp310-cp310-linux_x86_64.whl#sha256=93960a77d4b72fb6d32036912d8193a3a159b1b38f342c4e2b5ac82d279eff5a
torch-model-archiver @ file:///serve/model-archiver
torch-workflow-archiver @ file:///serve/workflow-archiver
torchaudio @ https://framework-binaries.s3.us-west-2.amazonaws.com/pytorch/v2.1.0/cuda11.8.0/torchaudio-2.1.0%2Bcu118-cp310-cp310-linux_x86_64.whl#sha256=56b8ca0bd6b72edb55fbd06114d94e9d3f3c4daf8d456a0dc929d072105d75ee
torchdata @ https://framework-binaries.s3.us-west-2.amazonaws.com/pytorch/v2.1.0/cuda11.8.0/torchdata-0.7.0%2B7c7597b-cp310-cp310-linux_x86_64.whl#sha256=f33afa5a8f8f6979fb7f35cd53e61da1da81a6be852985c2bf6cb4c9bb7fed94
torchserve @ file:///serve
torchtext @ https://framework-binaries.s3.us-west-2.amazonaws.com/pytorch/v2.1.0/cuda11.8.0/torchtext-0.16.0%2Bcu118-cp310-cp310-linux_x86_64.whl#sha256=2fe28a2da3a194e553eeda3683955f4a0821ba732fe10bcb5895c3293525807f
torchvision @ https://framework-binaries.s3.us-west-2.amazonaws.com/pytorch/v2.1.0/cuda11.8.0/torchvision-0.16.0%2Bcu118-cp310-cp310-linux_x86_64.whl#sha256=bf8bfb5351e8d02591bf8833ca6df7621f060d4cb238d6c32511579e53507acb

$ cat test_req.txt 
segment-anything-py==1.0
opencv-python-headless==4.7.0.68
matplotlib==3.6.3

$ mkdir /tmp/deps
$ pip install -U --upgrade-strategy only-if-needed -t /tmp/deps -r test_req.txt

$ ls /tmp/deps
Jinja2-3.1.3.dist-info              idna                                       nvidia_cuda_nvrtc_cu12-12.1.105.dist-info    segment_anything
MarkupSafe-2.1.4.dist-info          idna-3.6.dist-info                         nvidia_cuda_runtime_cu12-12.1.105.dist-info  segment_anything_py-1.0.dist-info
PIL                                 isympy.py                                  nvidia_cudnn_cu12-8.9.2.26.dist-info         share
__pycache__                         jinja2                                     nvidia_cufft_cu12-11.0.2.54.dist-info        six-1.16.0.dist-info
bin                                 kiwisolver                                 nvidia_curand_cu12-10.3.2.106.dist-info      six.py
certifi                             kiwisolver-1.4.5.dist-info                 nvidia_cusolver_cu12-11.4.5.107.dist-info    sympy
certifi-2023.11.17.dist-info        markupsafe                                 nvidia_cusparse_cu12-12.1.0.106.dist-info    sympy-1.12.dist-info
charset_normalizer                  matplotlib                                 nvidia_nccl_cu12-2.18.1.dist-info            torch
charset_normalizer-3.3.2.dist-info  matplotlib-3.6.3-py3.10-nspkg.pth          nvidia_nvjitlink_cu12-12.3.101.dist-info     torch-2.1.2.dist-info
contourpy                           matplotlib-3.6.3.dist-info                 nvidia_nvtx_cu12-12.1.105.dist-info          torchgen
contourpy-1.2.0.dist-info           mpl_toolkits                               opencv_python_headless-4.7.0.68.dist-info    torchvision
cv2                                 mpmath                                     opencv_python_headless.libs                  torchvision-0.16.2.dist-info
cycler                              mpmath-1.3.0.dist-info                     packaging                                    torchvision.libs
cycler-0.12.1.dist-info             networkx                                   packaging-23.2.dist-info                     triton
dateutil                            networkx-3.2.1.dist-info                   pillow-10.2.0.dist-info                      triton-2.1.0.dist-info
filelock                            numpy                                      pillow.libs                                  typing_extensions-4.9.0.dist-info
filelock-3.13.1.dist-info           numpy-1.26.3.dist-info                     pylab.py                                     typing_extensions.py
fontTools                           numpy.libs                                 pyparsing                                    urllib3
fonttools-4.47.2.dist-info          nvfuser                                    pyparsing-3.1.1.dist-info                    urllib3-2.1.0.dist-info
fsspec                              nvidia                                     python_dateutil-2.8.2.dist-info
fsspec-2023.12.2.dist-info          nvidia_cublas_cu12-12.1.3.1.dist-info      requests
functorch                           nvidia_cuda_cupti_cu12-12.1.105.dist-info  requests-2.31.0.dist-info

Although torch-2.1.0+cu118 is already available in the base Python environment and supports segment-anything-py==1.0, torch-2.1.2+cu121 is installed, which may break compatibility with the GPU driver on the host. This is expected behavior: when pip installs packages to a target directory, the requested packages and the latest supported versions of all their dependencies are installed there. Further, since the target directory is added to PYTHONPATH, torch-2.1.2+cu121 masks torch-2.1.0+cu118.
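
To make the masking effect concrete, here is a minimal Java sketch (not part of this PR) that compares the *.dist-info directory names found in a target directory with the versions in the base environment and reports the packages that would be shadowed. The class and method names are hypothetical. It assumes the caller supplies the base-environment map with package names spelled the way they appear in the dist-info directory names. The sample values come from the listings above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; not TorchServe code.
final class ShadowedPackages {
    private static final String DIST_INFO_SUFFIX = ".dist-info";

    /**
     * Returns one "name: baseVersion -> targetVersion" entry for every distribution
     * in the target directory that also exists in the base environment with a
     * different version.
     */
    static List<String> findShadowed(List<String> targetDirEntries, Map<String, String> basePackages) {
        List<String> shadowed = new ArrayList<>();
        for (String entry : targetDirEntries) {
            if (!entry.endsWith(DIST_INFO_SUFFIX)) {
                continue; // not a distribution metadata directory
            }
            String stem = entry.substring(0, entry.length() - DIST_INFO_SUFFIX.length());
            // Directory names look like "<name>-<version>.dist-info"; split on the last dash.
            int dash = stem.lastIndexOf('-');
            if (dash <= 0) {
                continue;
            }
            String name = stem.substring(0, dash);
            String version = stem.substring(dash + 1);
            String baseVersion = basePackages.get(name);
            if (baseVersion != null && !baseVersion.equals(version)) {
                shadowed.add(name + ": " + baseVersion + " -> " + version);
            }
        }
        return shadowed;
    }

    public static void main(String[] args) {
        List<String> targetDir = List.of(
                "torch-2.1.2.dist-info", "torchvision-0.16.2.dist-info", "six-1.16.0.dist-info", "torch");
        Map<String, String> base = Map.of(
                "torch", "2.1.0+cu118", "torchvision", "0.16.0+cu118", "six", "1.16.0");
        findShadowed(targetDir, base).forEach(System.out::println);
    }
}
```

With the sample data, torch and torchvision are reported while six is not, since its version is identical in both locations.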

On the other hand, when using a virtual environment with access to system site packages to install custom dependencies, we see the following:

# original torch packages
root@b323f9ef66ce:/# pip freeze | grep torch
sagemaker-pytorch-inference==2.0.21
torch @ https://framework-binaries.s3.us-west-2.amazonaws.com/pytorch/v2.1.0/cuda11.8.0/torch-2.1.0%2Bcu118-cp310-cp310-linux_x86_64.whl#sha256=93960a77d4b72fb6d32036912d8193a3a159b1b38f342c4e2b5ac82d279eff5a
torch-model-archiver @ file:///serve/model-archiver
torch-workflow-archiver @ file:///serve/workflow-archiver
torchaudio @ https://framework-binaries.s3.us-west-2.amazonaws.com/pytorch/v2.1.0/cuda11.8.0/torchaudio-2.1.0%2Bcu118-cp310-cp310-linux_x86_64.whl#sha256=56b8ca0bd6b72edb55fbd06114d94e9d3f3c4daf8d456a0dc929d072105d75ee
torchdata @ https://framework-binaries.s3.us-west-2.amazonaws.com/pytorch/v2.1.0/cuda11.8.0/torchdata-0.7.0%2B7c7597b-cp310-cp310-linux_x86_64.whl#sha256=f33afa5a8f8f6979fb7f35cd53e61da1da81a6be852985c2bf6cb4c9bb7fed94
torchserve @ file:///serve
torchtext @ https://framework-binaries.s3.us-west-2.amazonaws.com/pytorch/v2.1.0/cuda11.8.0/torchtext-0.16.0%2Bcu118-cp310-cp310-linux_x86_64.whl#sha256=2fe28a2da3a194e553eeda3683955f4a0821ba732fe10bcb5895c3293525807f
torchvision @ https://framework-binaries.s3.us-west-2.amazonaws.com/pytorch/v2.1.0/cuda11.8.0/torchvision-0.16.0%2Bcu118-cp310-cp310-linux_x86_64.whl#sha256=bf8bfb5351e8d02591bf8833ca6df7621f060d4cb238d6c32511579e53507acb

# create virtual env
root@b323f9ef66ce:/# python -m venv --system-site-packages ./venvs/test
root@b323f9ef66ce:/# source ./venvs/test/bin/activate

(test) root@b323f9ef66ce:/# cat test_req.txt 
segment-anything-py==1.0
opencv-python-headless==4.7.0.68
matplotlib==3.6.3

(test) root@b323f9ef66ce:/# python -m pip install --upgrade --upgrade-strategy only-if-needed -r test_req.txt
Collecting segment-anything-py==1.0
  Downloading segment_anything_py-1.0-py3-none-any.whl (40 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 40.2/40.2 kB 5.1 MB/s eta 0:00:00
Collecting opencv-python-headless==4.7.0.68
  Downloading opencv_python_headless-4.7.0.68-cp37-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (49.2 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 49.2/49.2 MB 41.7 MB/s eta 0:00:00
Collecting matplotlib==3.6.3
  Downloading matplotlib-3.6.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (11.8 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 11.8/11.8 MB 103.3 MB/s eta 0:00:00
Requirement already satisfied: torch>=1.7 in /opt/conda/lib/python3.10/site-packages (from segment-anything-py==1.0->-r test_req.txt (line 1)) (2.1.0+cu118)
Requirement already satisfied: torchvision>=0.8 in /opt/conda/lib/python3.10/site-packages (from segment-anything-py==1.0->-r test_req.txt (line 1)) (0.16.0+cu118)
Requirement already satisfied: numpy>=1.17.0 in /opt/conda/lib/python3.10/site-packages (from opencv-python-headless==4.7.0.68->-r test_req.txt (line 2)) (1.24.4)
Requirement already satisfied: kiwisolver>=1.0.1 in /opt/conda/lib/python3.10/site-packages (from matplotlib==3.6.3->-r test_req.txt (line 3)) (1.4.5)
Requirement already satisfied: contourpy>=1.0.1 in /opt/conda/lib/python3.10/site-packages (from matplotlib==3.6.3->-r test_req.txt (line 3)) (1.2.0)
Requirement already satisfied: fonttools>=4.22.0 in /opt/conda/lib/python3.10/site-packages (from matplotlib==3.6.3->-r test_req.txt (line 3)) (4.47.0)
Requirement already satisfied: python-dateutil>=2.7 in /opt/conda/lib/python3.10/site-packages (from matplotlib==3.6.3->-r test_req.txt (line 3)) (2.8.2)
Requirement already satisfied: pyparsing>=2.2.1 in /opt/conda/lib/python3.10/site-packages (from matplotlib==3.6.3->-r test_req.txt (line 3)) (3.1.1)
Requirement already satisfied: cycler>=0.10 in /opt/conda/lib/python3.10/site-packages (from matplotlib==3.6.3->-r test_req.txt (line 3)) (0.12.1)
Requirement already satisfied: packaging>=20.0 in /opt/conda/lib/python3.10/site-packages (from matplotlib==3.6.3->-r test_req.txt (line 3)) (23.2)
Requirement already satisfied: pillow>=6.2.0 in /opt/conda/lib/python3.10/site-packages (from matplotlib==3.6.3->-r test_req.txt (line 3)) (10.2.0)
Requirement already satisfied: six>=1.5 in /opt/conda/lib/python3.10/site-packages (from python-dateutil>=2.7->matplotlib==3.6.3->-r test_req.txt (line 3)) (1.16.0)
Requirement already satisfied: filelock in /opt/conda/lib/python3.10/site-packages (from torch>=1.7->segment-anything-py==1.0->-r test_req.txt (line 1)) (3.13.1)
Requirement already satisfied: jinja2 in /opt/conda/lib/python3.10/site-packages (from torch>=1.7->segment-anything-py==1.0->-r test_req.txt (line 1)) (3.1.2)
Requirement already satisfied: sympy in /opt/conda/lib/python3.10/site-packages (from torch>=1.7->segment-anything-py==1.0->-r test_req.txt (line 1)) (1.12)
Requirement already satisfied: networkx in /opt/conda/lib/python3.10/site-packages (from torch>=1.7->segment-anything-py==1.0->-r test_req.txt (line 1)) (3.2.1)
Requirement already satisfied: fsspec in /opt/conda/lib/python3.10/site-packages (from torch>=1.7->segment-anything-py==1.0->-r test_req.txt (line 1)) (2023.12.2)
Requirement already satisfied: typing-extensions in /opt/conda/lib/python3.10/site-packages (from torch>=1.7->segment-anything-py==1.0->-r test_req.txt (line 1)) (4.9.0)
Requirement already satisfied: requests in /opt/conda/lib/python3.10/site-packages (from torchvision>=0.8->segment-anything-py==1.0->-r test_req.txt (line 1)) (2.31.0)
Requirement already satisfied: MarkupSafe>=2.0 in /opt/conda/lib/python3.10/site-packages (from jinja2->torch>=1.7->segment-anything-py==1.0->-r test_req.txt (line 1)) (2.1.3)
Requirement already satisfied: charset-normalizer<4,>=2 in /opt/conda/lib/python3.10/site-packages (from requests->torchvision>=0.8->segment-anything-py==1.0->-r test_req.txt (line 1)) (3.3.2)
Requirement already satisfied: idna<4,>=2.5 in /opt/conda/lib/python3.10/site-packages (from requests->torchvision>=0.8->segment-anything-py==1.0->-r test_req.txt (line 1)) (3.4)
Requirement already satisfied: urllib3<3,>=1.21.1 in /opt/conda/lib/python3.10/site-packages (from requests->torchvision>=0.8->segment-anything-py==1.0->-r test_req.txt (line 1)) (1.26.18)
Requirement already satisfied: certifi>=2017.4.17 in /opt/conda/lib/python3.10/site-packages (from requests->torchvision>=0.8->segment-anything-py==1.0->-r test_req.txt (line 1)) (2023.11.17)
Requirement already satisfied: mpmath>=0.19 in /opt/conda/lib/python3.10/site-packages (from sympy->torch>=1.7->segment-anything-py==1.0->-r test_req.txt (line 1)) (1.3.0)
Installing collected packages: opencv-python-headless, matplotlib, segment-anything-py
  Attempting uninstall: matplotlib
    Found existing installation: matplotlib 3.8.2
    Not uninstalling matplotlib at /opt/conda/lib/python3.10/site-packages, outside environment /venvs/test
    Can't uninstall 'matplotlib'. No files were found to uninstall.
Successfully installed matplotlib-3.6.3 opencv-python-headless-4.7.0.68 segment-anything-py-1.0

[notice] A new release of pip available: 22.3.1 -> 23.3.2
[notice] To update, run: pip install --upgrade pip

# Same torch packages are visible in venv
(test) root@b323f9ef66ce:/# pip freeze | grep torch
sagemaker-pytorch-inference==2.0.21
torch @ https://framework-binaries.s3.us-west-2.amazonaws.com/pytorch/v2.1.0/cuda11.8.0/torch-2.1.0%2Bcu118-cp310-cp310-linux_x86_64.whl#sha256=93960a77d4b72fb6d32036912d8193a3a159b1b38f342c4e2b5ac82d279eff5a
torch-model-archiver @ file:///serve/model-archiver
torch-workflow-archiver @ file:///serve/workflow-archiver
torchaudio @ https://framework-binaries.s3.us-west-2.amazonaws.com/pytorch/v2.1.0/cuda11.8.0/torchaudio-2.1.0%2Bcu118-cp310-cp310-linux_x86_64.whl#sha256=56b8ca0bd6b72edb55fbd06114d94e9d3f3c4daf8d456a0dc929d072105d75ee
torchdata @ https://framework-binaries.s3.us-west-2.amazonaws.com/pytorch/v2.1.0/cuda11.8.0/torchdata-0.7.0%2B7c7597b-cp310-cp310-linux_x86_64.whl#sha256=f33afa5a8f8f6979fb7f35cd53e61da1da81a6be852985c2bf6cb4c9bb7fed94
torchserve @ file:///serve
torchtext @ https://framework-binaries.s3.us-west-2.amazonaws.com/pytorch/v2.1.0/cuda11.8.0/torchtext-0.16.0%2Bcu118-cp310-cp310-linux_x86_64.whl#sha256=2fe28a2da3a194e553eeda3683955f4a0821ba732fe10bcb5895c3293525807f
torchvision @ https://framework-binaries.s3.us-west-2.amazonaws.com/pytorch/v2.1.0/cuda11.8.0/torchvision-0.16.0%2Bcu118-cp310-cp310-linux_x86_64.whl#sha256=bf8bfb5351e8d02591bf8833ca6df7621f060d4cb238d6c32511579e53507acb

We can see that the torch binary installed in the base python environment already satisfies the dependency of segment-anything-py==1.0 on torch and is not reinstalled:

Requirement already satisfied: torch>=1.7 in /opt/conda/lib/python3.10/site-packages (from segment-anything-py==1.0->-r test_req.txt (line 1)) (2.1.0+cu118)
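
The two shell steps in the session above (create a venv with access to system site-packages, then install requirements with the venv's own interpreter) could be assembled as command-part lists roughly as follows. This is a sketch under assumptions, not the code in this PR: the class and method names are hypothetical, the bin/python layout is the POSIX one, and only flags that appear in the session above are used.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical helper for illustration only; not TorchServe code.
final class VenvCommands {

    /** Builds: <pythonRuntime> -m venv --system-site-packages <venvDir> */
    static List<String> createVenv(String pythonRuntime, Path venvDir) {
        return List.of(pythonRuntime, "-m", "venv", "--system-site-packages", venvDir.toString());
    }

    /**
     * Builds: <venvDir>/bin/python -m pip install --upgrade
     *         --upgrade-strategy only-if-needed -r <requirementsFile>
     * Using the venv's interpreter makes pip install into the venv's site-packages,
     * so no -t target directory is needed.
     */
    static List<String> installRequirements(Path venvDir, Path requirementsFile) {
        String venvPython = venvDir.resolve("bin").resolve("python").toString(); // POSIX layout assumed
        return List.of(
                venvPython, "-m", "pip", "install",
                "--upgrade", "--upgrade-strategy", "only-if-needed",
                "-r", requirementsFile.toString());
    }

    public static void main(String[] args) {
        Path venvDir = Paths.get("/tmp/models/test-model/venv");
        System.out.println(String.join(" ", createVenv("python", venvDir)));
        System.out.println(String.join(" ", installRequirements(venvDir, Paths.get("test_req.txt"))));
    }
}
```

Either list can be handed to java.lang.ProcessBuilder to run the step. The sketch only builds the arguments so that it stays runnable without a Python installation.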

User Experience

  • Existing model archives with a custom requirements.txt should not be affected: dependencies will still be installed in the model's target directory (same as the existing behavior). Note: the logging has been improved to show which packages were downloaded and installed, as shown below.
  • To enable an existing model to use a virtual environment, the only change required is to set useVenv: true in model-config.yaml. Logs for virtual environment creation and dependency installation are shown below.

With useVenv disabled in model-config.yaml:

2024-02-07T18:31:36,984 [INFO ] main org.pytorch.serve.wlm.ModelManager - Installed custom pip packages for model mnist:
Collecting matplotlib (from -r /var/folders/l4/hfy8rtpx0755sys3m8m8c48w0000gs/T/models/8bf761e997c948b7b4b3d2e8a3ed840c/requirements.txt (line 1))
  Using cached matplotlib-3.7.4-cp38-cp38-macosx_10_12_x86_64.whl.metadata (5.7 kB)
Collecting contourpy>=1.0.1 (from matplotlib->-r /var/folders/l4/hfy8rtpx0755sys3m8m8c48w0000gs/T/models/8bf761e997c948b7b4b3d2e8a3ed840c/requirements.txt (line 1))
  Using cached contourpy-1.1.1-cp38-cp38-macosx_10_9_x86_64.whl.metadata (5.9 kB)
Collecting cycler>=0.10 (from matplotlib->-r /var/folders/l4/hfy8rtpx0755sys3m8m8c48w0000gs/T/models/8bf761e997c948b7b4b3d2e8a3ed840c/requirements.txt (line 1))
  Using cached cycler-0.12.1-py3-none-any.whl.metadata (3.8 kB)
Collecting fonttools>=4.22.0 (from matplotlib->-r /var/folders/l4/hfy8rtpx0755sys3m8m8c48w0000gs/T/models/8bf761e997c948b7b4b3d2e8a3ed840c/requirements.txt (line 1))
  Using cached fonttools-4.48.1-cp38-cp38-macosx_10_9_x86_64.whl.metadata (158 kB)
Collecting kiwisolver>=1.0.1 (from matplotlib->-r /var/folders/l4/hfy8rtpx0755sys3m8m8c48w0000gs/T/models/8bf761e997c948b7b4b3d2e8a3ed840c/requirements.txt (line 1))
  Using cached kiwisolver-1.4.5-cp38-cp38-macosx_10_9_x86_64.whl.metadata (6.4 kB)
Collecting numpy<2,>=1.20 (from matplotlib->-r /var/folders/l4/hfy8rtpx0755sys3m8m8c48w0000gs/T/models/8bf761e997c948b7b4b3d2e8a3ed840c/requirements.txt (line 1))
  Using cached numpy-1.24.4-cp38-cp38-macosx_10_9_x86_64.whl.metadata (5.6 kB)
Collecting packaging>=20.0 (from matplotlib->-r /var/folders/l4/hfy8rtpx0755sys3m8m8c48w0000gs/T/models/8bf761e997c948b7b4b3d2e8a3ed840c/requirements.txt (line 1))
  Using cached packaging-23.2-py3-none-any.whl.metadata (3.2 kB)
Collecting pillow>=6.2.0 (from matplotlib->-r /var/folders/l4/hfy8rtpx0755sys3m8m8c48w0000gs/T/models/8bf761e997c948b7b4b3d2e8a3ed840c/requirements.txt (line 1))
  Using cached pillow-10.2.0-cp38-cp38-macosx_10_10_x86_64.whl.metadata (9.7 kB)
Collecting pyparsing>=2.3.1 (from matplotlib->-r /var/folders/l4/hfy8rtpx0755sys3m8m8c48w0000gs/T/models/8bf761e997c948b7b4b3d2e8a3ed840c/requirements.txt (line 1))
  Using cached pyparsing-3.1.1-py3-none-any.whl.metadata (5.1 kB)
Collecting python-dateutil>=2.7 (from matplotlib->-r /var/folders/l4/hfy8rtpx0755sys3m8m8c48w0000gs/T/models/8bf761e997c948b7b4b3d2e8a3ed840c/requirements.txt (line 1))
  Using cached python_dateutil-2.8.2-py2.py3-none-any.whl (247 kB)
Collecting importlib-resources>=3.2.0 (from matplotlib->-r /var/folders/l4/hfy8rtpx0755sys3m8m8c48w0000gs/T/models/8bf761e997c948b7b4b3d2e8a3ed840c/requirements.txt (line 1))
  Using cached importlib_resources-6.1.1-py3-none-any.whl.metadata (4.1 kB)
Collecting zipp>=3.1.0 (from importlib-resources>=3.2.0->matplotlib->-r /var/folders/l4/hfy8rtpx0755sys3m8m8c48w0000gs/T/models/8bf761e997c948b7b4b3d2e8a3ed840c/requirements.txt (line 1))
  Using cached zipp-3.17.0-py3-none-any.whl.metadata (3.7 kB)
Collecting six>=1.5 (from python-dateutil>=2.7->matplotlib->-r /var/folders/l4/hfy8rtpx0755sys3m8m8c48w0000gs/T/models/8bf761e997c948b7b4b3d2e8a3ed840c/requirements.txt (line 1))
  Using cached six-1.16.0-py2.py3-none-any.whl (11 kB)
Using cached matplotlib-3.7.4-cp38-cp38-macosx_10_12_x86_64.whl (7.4 MB)
Using cached contourpy-1.1.1-cp38-cp38-macosx_10_9_x86_64.whl (247 kB)
Using cached cycler-0.12.1-py3-none-any.whl (8.3 kB)
Using cached fonttools-4.48.1-cp38-cp38-macosx_10_9_x86_64.whl (2.3 MB)
Using cached importlib_resources-6.1.1-py3-none-any.whl (33 kB)
Using cached kiwisolver-1.4.5-cp38-cp38-macosx_10_9_x86_64.whl (68 kB)
Using cached numpy-1.24.4-cp38-cp38-macosx_10_9_x86_64.whl (19.8 MB)
Using cached packaging-23.2-py3-none-any.whl (53 kB)
Using cached pillow-10.2.0-cp38-cp38-macosx_10_10_x86_64.whl (3.5 MB)
Using cached pyparsing-3.1.1-py3-none-any.whl (103 kB)
Using cached zipp-3.17.0-py3-none-any.whl (7.4 kB)
Installing collected packages: zipp, six, pyparsing, pillow, packaging, numpy, kiwisolver, fonttools, cycler, python-dateutil, importlib-resources, contourpy, matplotlib
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
sagemaker 2.143.0 requires protobuf<4.0,>=3.1, but you have protobuf 4.25.1 which is incompatible.
sagemaker 2.143.0 requires PyYAML==5.4.1, but you have pyyaml 6.0 which is incompatible.
Successfully installed contourpy-1.1.1 cycler-0.12.1 fonttools-4.48.1 importlib-resources-6.1.1 kiwisolver-1.4.5 matplotlib-3.7.4 numpy-1.24.4 packaging-23.2 pillow-10.2.0 pyparsing-3.1.1 python-dateutil-2.8.2 six-1.16.0 zipp-3.17.0

All custom packages and their dependencies are installed to the target directory.

With useVenv enabled in model-config.yaml:

2024-02-07T18:28:55,027 [INFO ] main org.pytorch.serve.wlm.ModelManager - Created virtual environment for model mnist: /var/folders/l4/hfy8rtpx0755sys3m8m8c48w0000gs/T/models/8d0d560e458a4dd0bfb4763588a2416b/venv
2024-02-07T18:28:57,880 [INFO ] main org.pytorch.serve.wlm.ModelManager - Installed custom pip packages for model mnist:
Requirement already satisfied: matplotlib in /Users/namannan/.pyenv/versions/3.8.13/lib/python3.8/site-packages (from -r /var/folders/l4/hfy8rtpx0755sys3m8m8c48w0000gs/T/models/8d0d560e458a4dd0bfb4763588a2416b/requirements.txt (line 1)) (3.7.4)
Requirement already satisfied: cycler>=0.10 in /Users/namannan/.pyenv/versions/3.8.13/lib/python3.8/site-packages (from matplotlib->-r /var/folders/l4/hfy8rtpx0755sys3m8m8c48w0000gs/T/models/8d0d560e458a4dd0bfb4763588a2416b/requirements.txt (line 1)) (0.11.0)
Requirement already satisfied: contourpy>=1.0.1 in /Users/namannan/.pyenv/versions/3.8.13/lib/python3.8/site-packages (from matplotlib->-r /var/folders/l4/hfy8rtpx0755sys3m8m8c48w0000gs/T/models/8d0d560e458a4dd0bfb4763588a2416b/requirements.txt (line 1)) (1.1.1)
Requirement already satisfied: numpy<2,>=1.20 in /Users/namannan/.pyenv/versions/3.8.13/lib/python3.8/site-packages (from matplotlib->-r /var/folders/l4/hfy8rtpx0755sys3m8m8c48w0000gs/T/models/8d0d560e458a4dd0bfb4763588a2416b/requirements.txt (line 1)) (1.24.3)
Requirement already satisfied: packaging>=20.0 in /Users/namannan/.pyenv/versions/3.8.13/lib/python3.8/site-packages (from matplotlib->-r /var/folders/l4/hfy8rtpx0755sys3m8m8c48w0000gs/T/models/8d0d560e458a4dd0bfb4763588a2416b/requirements.txt (line 1)) (23.2)
Requirement already satisfied: pyparsing>=2.3.1 in /Users/namannan/.pyenv/versions/3.8.13/lib/python3.8/site-packages (from matplotlib->-r /var/folders/l4/hfy8rtpx0755sys3m8m8c48w0000gs/T/models/8d0d560e458a4dd0bfb4763588a2416b/requirements.txt (line 1)) (3.0.9)
Requirement already satisfied: kiwisolver>=1.0.1 in /Users/namannan/.pyenv/versions/3.8.13/lib/python3.8/site-packages (from matplotlib->-r /var/folders/l4/hfy8rtpx0755sys3m8m8c48w0000gs/T/models/8d0d560e458a4dd0bfb4763588a2416b/requirements.txt (line 1)) (1.4.4)
Requirement already satisfied: pillow>=6.2.0 in /Users/namannan/.pyenv/versions/3.8.13/lib/python3.8/site-packages (from matplotlib->-r /var/folders/l4/hfy8rtpx0755sys3m8m8c48w0000gs/T/models/8d0d560e458a4dd0bfb4763588a2416b/requirements.txt (line 1)) (10.2.0)
Requirement already satisfied: python-dateutil>=2.7 in /Users/namannan/.pyenv/versions/3.8.13/lib/python3.8/site-packages (from matplotlib->-r /var/folders/l4/hfy8rtpx0755sys3m8m8c48w0000gs/T/models/8d0d560e458a4dd0bfb4763588a2416b/requirements.txt (line 1)) (2.8.2)
Requirement already satisfied: fonttools>=4.22.0 in /Users/namannan/.pyenv/versions/3.8.13/lib/python3.8/site-packages (from matplotlib->-r /var/folders/l4/hfy8rtpx0755sys3m8m8c48w0000gs/T/models/8d0d560e458a4dd0bfb4763588a2416b/requirements.txt (line 1)) (4.34.4)
Requirement already satisfied: importlib-resources>=3.2.0 in /Users/namannan/.pyenv/versions/3.8.13/lib/python3.8/site-packages (from matplotlib->-r /var/folders/l4/hfy8rtpx0755sys3m8m8c48w0000gs/T/models/8d0d560e458a4dd0bfb4763588a2416b/requirements.txt (line 1)) (5.12.0)
Requirement already satisfied: zipp>=3.1.0 in /Users/namannan/.pyenv/versions/3.8.13/lib/python3.8/site-packages (from importlib-resources>=3.2.0->matplotlib->-r /var/folders/l4/hfy8rtpx0755sys3m8m8c48w0000gs/T/models/8d0d560e458a4dd0bfb4763588a2416b/requirements.txt (line 1)) (3.8.1)
Requirement already satisfied: six>=1.5 in /Users/namannan/.pyenv/versions/3.8.13/lib/python3.8/site-packages (from python-dateutil>=2.7->matplotlib->-r /var/folders/l4/hfy8rtpx0755sys3m8m8c48w0000gs/T/models/8d0d560e458a4dd0bfb4763588a2416b/requirements.txt (line 1)) (1.16.0)
WARNING: You are using pip version 22.0.4; however, version 24.0 is available.
You should consider upgrading via the '/var/folders/l4/hfy8rtpx0755sys3m8m8c48w0000gs/T/models/8d0d560e458a4dd0bfb4763588a2416b/venv/bin/python -m pip install --upgrade pip' command.

Packages already available in the base python environment are not downloaded and reinstalled.

Type of change

  • New feature
  • This change requires a documentation update

Feature/Issue validation/testing

@namannandan namannandan changed the title from "Refactor custom package installation using virtual environment" to "Refactor model custom package installation to virtual environment" Jan 29, 2024
@namannandan namannandan changed the title from "Refactor model custom package installation to virtual environment" to "Refactor model custom package installation to use virtual environment" Jan 29, 2024
@namannandan namannandan marked this pull request as ready for review January 29, 2024 18:44
@namannandan namannandan changed the title from "Refactor model custom package installation to use virtual environment" to "Refactor model custom dependency installation to use virtual environment" Jan 29, 2024
Member

@msaroufim msaroufim left a comment

Some minor comments on the python command you're constructing, I did not review the new path manipulation utils

List<String> commandParts = new ArrayList<>();
commandParts.add(EnvironmentUtils.getPythonRunTime(model));
commandParts.add("-m");
commandParts.add("venv");
Member

should you make this name customizable? Part of the appeal of this PR is different workers should be able to have different virtual environments

Collaborator Author

@namannandan namannandan Jan 30, 2024

Currently this PR creates a virtual environment on a per model basis at model load time. All workers for a given model use the same virtual environment. This replaces installing dependencies on a per model basis in a target directory and is backwards compatible with the existing behavior with no change to customer experience. Although the same name venv is used, they are located within the individual model directories, for ex: /tmp/models/test-model/venv.
Would it be useful to extend this implementation to support separate venv per worker?

Member

No the isolation is fine as is I think

commandParts.add(EnvironmentUtils.getPythonRunTime(model));
commandParts.add("-m");
commandParts.add("venv");
commandParts.add("--clear");
Member

why clear? Seems beneficial to allow users to install their dependencies beforehand

It was always quite weird how we pip installed a bunch of stuff on launching a model

Collaborator Author

@namannandan namannandan Jan 30, 2024

Sounds good, I will check whether venvs can safely be made portable. The symlinks in the venv's bin directory, for example venv/bin/python, may need to be updated for them to work, since the Python binary path may differ between the host on which the venv is created and the host on which it is used.

From the official docs: https://docs.python.org/3/library/venv.html
Warning: Because scripts installed in environments should not expect the environment to be activated, their shebang lines contain the absolute paths to their environment’s interpreters. Because of this, environments are inherently non-portable, in the general case. You should always have a simple means of recreating an environment (for example, if you have a requirements file requirements.txt, you can invoke pip install -r requirements.txt using the environment’s pip to install all of the packages needed by the environment).
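
As an illustration of the portability concern, the following Java sketch extracts the interpreter path from a script's shebang line and checks whether that path exists on the current host. It is only a rough check under assumptions: the class and method names are hypothetical, it is meant for text scripts such as venv/bin/pip (not the python binary itself), and a missing interpreter is just one of the ways a copied venv can break.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;

// Hypothetical helper for illustration only; not TorchServe code.
final class ShebangCheck {

    /** Returns the interpreter path named by a shebang line, or empty if the line is not a shebang. */
    static Optional<String> interpreterOf(String firstLine) {
        if (!firstLine.startsWith("#!")) {
            return Optional.empty();
        }
        String rest = firstLine.substring(2).trim();
        if (rest.isEmpty()) {
            return Optional.empty();
        }
        // The interpreter is the first whitespace-delimited token; the rest are its arguments.
        return Optional.of(rest.split("\\s+")[0]);
    }

    /** True when the line is a shebang whose interpreter path does not exist on this host. */
    static boolean isMissingOnThisHost(String firstLine) {
        return interpreterOf(firstLine)
                .map(path -> !Files.exists(Paths.get(path)))
                .orElse(false);
    }

    /** Reads the first line of a text script and applies the check above. */
    static boolean scriptHasMissingInterpreter(Path script) {
        try (BufferedReader reader = Files.newBufferedReader(script, StandardCharsets.UTF_8)) {
            String firstLine = reader.readLine();
            if (firstLine == null) {
                return false; // empty file, nothing to check
            }
            return isMissingOnThisHost(firstLine);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A venv created on one host and copied to another would typically fail this check for the scripts in its bin directory whenever the original absolute path is not present on the new host.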

commandParts.add("-m");
commandParts.add("venv");
commandParts.add("--clear");
commandParts.add("--system-site-packages");
Member

why system site?

Collaborator Author

This is to make sure that the venv can see packages that are already installed in the base python environment and not have to install them again. If a newer version of an existing package is required or a non existing package is required, they will get installed to the venv site-packages and will take precedence over system-site-packages. This does not affect the base python environment. I've added more details with an example in the PR description: #2910 (comment)
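
The precedence rule described here can be modeled as an ordered lookup: the venv's site-packages is consulted first, and the system site-packages only as a fallback. The sketch below is a simplified model of that ordering, not how Python's import system is implemented. The class and method names are hypothetical, and the sample versions are taken from the pip output in the PR description (matplotlib 3.8.2 in the base environment, 3.6.3 installed into the venv, torch only in the base environment).

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical model for illustration only; not TorchServe code.
final class SitePackagesPrecedence {

    /**
     * Returns the version that would be picked up inside the venv: the venv's own
     * copy if present, otherwise the copy from the system site-packages.
     */
    static Optional<String> resolveVersion(
            String packageName, Map<String, String> venvPackages, Map<String, String> systemPackages) {
        String venvVersion = venvPackages.get(packageName);
        if (venvVersion != null) {
            return Optional.of(venvVersion);
        }
        return Optional.ofNullable(systemPackages.get(packageName));
    }

    public static void main(String[] args) {
        Map<String, String> system = Map.of("torch", "2.1.0+cu118", "matplotlib", "3.8.2");
        Map<String, String> venv = Map.of("matplotlib", "3.6.3");
        System.out.println(resolveVersion("matplotlib", venv, system)); // venv copy wins
        System.out.println(resolveVersion("torch", venv, system));      // falls back to the base environment
    }
}
```

The base environment map is never modified, which mirrors the point that installing into the venv does not affect the base Python environment.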

commandParts.add("-t");
commandParts.add(dependencyPath.getAbsolutePath());
commandParts.add("--upgrade-strategy");
commandParts.add("only-if-needed");
Member

I think this is a good change but do you mind just telling me why you needed to make it?

Collaborator Author

In prior versions of pip (i.e. pip<10.0), when pip install -U -r requirements.txt is used, all the packages listed in requirements.txt and their dependencies are upgraded, since the default --upgrade-strategy was eager. --upgrade-strategy applies to the handling of dependencies of the packages specified in requirements.txt. In pip>=10.0, the default --upgrade-strategy is only-if-needed. This change explicitly sets --upgrade-strategy to only-if-needed irrespective of pip version.

From the pip docs: https://pip.pypa.io/en/stable/user_guide/#only-if-needed-recursive-upgrade
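
The difference between the two strategies, as it applies to a dependency that is already installed, can be summarized in a small decision function. This is a simplified model written for this discussion, with hypothetical names. It does not reproduce pip's resolver and only covers dependencies, not the packages named directly in requirements.txt.

```java
// Hypothetical model for illustration only; not pip's or TorchServe's code.
final class UpgradeStrategyModel {

    enum Strategy { EAGER, ONLY_IF_NEEDED }

    /**
     * Whether `pip install -U` would try to move an already-installed dependency
     * to a newer version.
     *
     * @param installedVersionSatisfies true when the installed version already meets
     *                                  the requirement of the package being installed
     */
    static boolean upgradesInstalledDependency(Strategy strategy, boolean installedVersionSatisfies) {
        if (strategy == Strategy.EAGER) {
            // eager: dependencies are moved to the latest allowed version regardless
            return true;
        }
        // only-if-needed: leave the dependency alone unless it fails the requirement
        return !installedVersionSatisfies;
    }

    public static void main(String[] args) {
        // torch 2.1.0+cu118 already satisfies torch>=1.7 from segment-anything-py==1.0
        System.out.println(upgradesInstalledDependency(Strategy.ONLY_IF_NEEDED, true));
        System.out.println(upgradesInstalledDependency(Strategy.EAGER, true));
    }
}
```

Passing the flag explicitly means the second case cannot occur just because an older pip with the eager default happens to be installed.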

Collaborator

@lxning lxning left a comment

It is expensive to create a Python venv for each model, and it potentially increases the model loading latency. Do you know the root cause of the existing pip dependency installation installing the entire PyTorch package again? Can we have a lightweight solution?

@agunapal
Collaborator

@namannandan I like the overall idea. A couple of questions

  1. Does this break the previous behavior? We should still support the existing behavior.
  2. Does this support a single venv for all models, if the customer wants this?

@namannandan
Collaborator Author

> It is expensive to create a Python venv for each model, and it potentially increases the model loading latency. Do you know the root cause of the existing pip dependency installation installing the entire PyTorch package again? Can we have a lightweight solution?

Agreed, creation of virtual environment adds latency to the model load.
I will measure the latency overhead and add more details on this PR.

The root cause of the existing pip install step installing the entire PyTorch package again is as follows:
The command used to install dependencies is: python -m pip install -U -t <target-dir> -r requirements.txt.

This command will install all the packages listed in requirements.txt and all their dependencies to the target directory.

For example, suppose requirements.txt contains segment-anything-py==1.0, which requires torch>=1.7. Even if torch-2.1.0 is already installed, pip will ignore it and install the latest supported version, torch-2.1.2, in the target directory.

This is expected behavior of pip when using the -t flag since it will only check the target-dir for packages and not site-packages. Reference: https://pip.pypa.io/en/stable/cli/pip_install/#cmdoption-t
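
The difference can be sketched with a toy model. This is not pip's resolver: the plan_install helper, the simplified dotted-integer version parsing, and the dictionaries are all assumptions for illustration. The only idea it captures is that the installer can reuse a package only if it looks in the place where that package is installed.

```python
from typing import Dict, List, Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Simplified parsing: handles plain dotted integers such as '2.1.0' only."""
    return tuple(int(part) for part in version.split("."))


def plan_install(minimum_required: Dict[str, str],
                 visible_packages: Dict[str, str]) -> List[str]:
    """Return the packages that must be downloaded: those that are missing
    or too old among the packages the installer is able to see."""
    to_install = []
    for name, minimum in minimum_required.items():
        installed = visible_packages.get(name)
        if installed is None or parse_version(installed) < parse_version(minimum):
            to_install.append(name)
    return to_install


base_site_packages = {"torch": "2.1.0"}
requirements = {"torch": "1.7"}  # torch>=1.7, as required by segment-anything-py==1.0

# -t <target-dir>: only the (initially empty) target directory is consulted.
with_target_dir = plan_install(requirements, visible_packages={})

# venv created with --system-site-packages: the base packages are visible.
with_venv = plan_install(requirements, visible_packages=base_site_packages)

print(with_target_dir)  # ['torch']
print(with_venv)        # []
```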

> Can we have a lightweight solution?

One potential option is to use the --no-deps flag (Reference: https://pip.pypa.io/en/stable/cli/pip_install/#cmdoption-no-deps) when installing dependencies, as follows:
python -m pip install -U --no-deps -t <target-dir> -r requirements.txt
This way, only the packages in requirements.txt will be installed and none of their dependencies will be installed.

Pros:

  • Avoids re-installation of dependencies

Cons:

  • Could break backwards compatibility, since requirements.txt would be expected to list the packages and all of their dependencies.

@lxning, @msaroufim, @agunapal what are your thoughts on the above approach?

@namannandan
Collaborator Author

> @namannandan I like the overall idea. A couple of questions
>
>   1. Is this breaking the previous behavior? Meaning, we should still support the existing behavior
>   2. Does this support a single venv for all models if the customer wants this.

  1. This does not break existing behavior and is backwards compatible with model archives that have already been created.
  2. Currently, no. A separate venv is created for each model that has a requirements.txt file associated with it. This is done in a manner that does not change the existing customer experience of including a requirements.txt file along with a model archive.

@namannandan
Collaborator Author

Latency and storage analysis

Instance type: g5.2xlarge
OS: Ubuntu 20.04.6 LTS
Python: 3.10.9

Creation of empty virtual environment:

$ time python -m venv --clear --system-site-packages ./test

real    0m2.606s
user    0m2.407s
sys     0m0.180s

$ du -sh test
22M     test

Using the lama model from this example: https://github.com/aws/amazon-sagemaker-examples/tree/main/inference/torchserve/mme-gpu

Installing custom dependencies to a target directory (TorchServe v0.9.0)

$ time curl -X POST "http://127.0.0.1:8081/models?url=lama"
{
  "status": "Model \"lama\" Version: 1.0 registered with 1 initial workers"
}

real    2m28.987s
user    0m0.011s
sys     0m0.000s

$ du -sh /tmp/models/c3f1def4c8bb4cd8bddd2023dffcaa95/lama/
402M    /tmp/models/c3f1def4c8bb4cd8bddd2023dffcaa95/lama/

$ du -sh /tmp/models/c3f1def4c8bb4cd8bddd2023dffcaa95/
6.6G    /tmp/models/c3f1def4c8bb4cd8bddd2023dffcaa95/

Installing custom dependencies using a virtual environment (using the implementation in this PR)

$ time curl -X POST "http://127.0.0.1:8081/models?url=lama"
{
  "status": "Model \"lama\" Version: 1.0 registered with 1 initial workers"
}

real    1m2.334s
user    0m0.000s
sys     0m0.009s

$ du -sh /tmp/models/9cbcb78b2f07441688d9056a2950e314/lama/
402M    /tmp/models/9cbcb78b2f07441688d9056a2950e314/lama/

$ du -sh /tmp/models/9cbcb78b2f07441688d9056a2950e314/
2.1G    /tmp/models/9cbcb78b2f07441688d9056a2950e314/

Summary

  1. Creating a virtual environment adds a latency overhead of around 2.6 seconds and consumes 22M of disk space.
  2. Although creating the virtual environment adds latency, loading a model with custom dependencies can be faster in practice, since existing packages are reused instead of being downloaded and reinstalled. In the above example, the model load took 1m2.334s with a virtual environment, compared to 2m28.987s when installing dependencies to a target directory.
  3. Although the virtual environment has a disk space overhead, it can save space in practice, since existing packages are reused instead of being replicated. In the above example, the model directory used 2.1G with a virtual environment, as opposed to 6.6G when installing dependencies to a target directory.
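
To put the measurements above in relative terms, the arithmetic can be worked out as follows. The numbers are copied from the time and du output in this comment; the variable names are only for illustration, and sizes are in the units du reports.

```python
def to_seconds(minutes: int, seconds: float) -> float:
    """Convert a 'XmY.YYYs' reading from `time` into seconds."""
    return minutes * 60 + seconds


# Model registration time (the `real` value reported by `time curl ...`).
target_dir_s = to_seconds(2, 28.987)  # install to target directory
venv_s = to_seconds(1, 2.334)         # install using a virtual environment

time_saved_s = target_dir_s - venv_s
time_reduction_pct = 100 * time_saved_s / target_dir_s

# Size of the extracted model directory as reported by `du -sh`.
target_dir_size = 6.6
venv_size = 2.1

space_saved = target_dir_size - venv_size
space_reduction_pct = 100 * space_saved / target_dir_size

print(f"load time:  {time_saved_s:.1f} s saved ({time_reduction_pct:.0f}% lower)")
print(f"disk usage: {space_saved:.1f} G saved ({space_reduction_pct:.0f}% lower)")
```

For this one model and instance, that works out to roughly 86.7 seconds (about 58%) less load time and 4.5G (about 68%) less disk usage, which comfortably outweighs the 2.6 seconds and 22M spent on the empty venv.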

@lxning
Collaborator

lxning commented Jan 31, 2024

@namannandan Thanks for the analysis. Let's add this Python venv option to the model-level config (i.e., model-config.yaml).

@msaroufim msaroufim self-requested a review February 3, 2024 01:55
@agunapal
Collaborator

agunapal commented Feb 6, 2024

Hi @namannandan, can you please update the PR description with how the user experience is going to be with and without venv, and show any relevant logs that indicate it is working as expected and that there is no BC issue?

@namannandan
Collaborator Author

> @namannandan Thanks for the analysis. Let's add this Python venv option to the model-level config (i.e., model-config.yaml).

Updated the implementation to support useVenv as a model-level config.

@namannandan
Collaborator Author

namannandan commented Feb 8, 2024

> Hi @namannandan, can you please update the PR description with how the user experience is going to be with and without venv, and show any relevant logs that indicate it is working as expected and that there is no BC issue?

Included a user experience section in the summary with logs: #2910 (comment)

Also included details about useVenv in Readme: https://github.com/pytorch/serve/pull/2910/files#diff-439e668fd67373383bd1cc408a01f95b5d5e9ac65eb7f9e756bc04ab23e8f257R177

@namannandan namannandan changed the title Refactor model custom dependency installation to use virtual environment Enable model custom dependency installation to use virtual environment Feb 8, 2024
@namannandan namannandan changed the title Enable model custom dependency installation to use virtual environment Enable model custom dependency installation using virtual environment Feb 8, 2024
Collaborator

@agunapal agunapal left a comment

LGTM

@namannandan namannandan dismissed msaroufim’s stale review February 9, 2024 20:06

Review comments have been addressed.

@namannandan namannandan added this pull request to the merge queue Feb 9, 2024
Merged via the queue into master with commit ba8f96a Feb 9, 2024
15 checks passed
@namannandan namannandan deleted the fix-requirements-upgrade branch February 14, 2024 08:47
@chauhang chauhang added this to the v0.10.0 milestone Feb 27, 2024
5 participants