From 7d7e3b78a3c265ab3c57eeff43af56f509907998 Mon Sep 17 00:00:00 2001
From: Woosuk Kwon
Date: Thu, 21 Sep 2023 18:26:47 -0700
Subject: [PATCH] Use `--ipc=host` in docker run for distributed inference
 (#1125)

---
 docs/source/getting_started/installation.rst | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/docs/source/getting_started/installation.rst b/docs/source/getting_started/installation.rst
index 7611bc2be228f..1105d050df69a 100644
--- a/docs/source/getting_started/installation.rst
+++ b/docs/source/getting_started/installation.rst
@@ -46,4 +46,5 @@ You can also build and install vLLM from source:
 .. code-block:: console

     $ # Pull the Docker image with CUDA 11.8.
-    $ docker run --gpus all -it --rm --shm-size=8g nvcr.io/nvidia/pytorch:22.12-py3
+    $ # Use `--ipc=host` to make sure the shared memory is large enough.
+    $ docker run --gpus all -it --rm --ipc=host nvcr.io/nvidia/pytorch:22.12-py3
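Editor's note on the change (not part of the patch): Docker caps `/dev/shm` at 64 MB by default, which PyTorch's worker processes use for inter-process communication during distributed inference. The patch switches from an explicit `--shm-size` to `--ipc=host`, which shares the host's IPC namespace and removes the cap entirely. A minimal sketch of the two alternatives (image tag taken from the patch; the 16g size is an illustrative value, not from the source):

```shell
# Option used in this patch: share the host IPC namespace so the
# container's shared memory is not limited to Docker's 64 MB default.
docker run --gpus all -it --rm --ipc=host nvcr.io/nvidia/pytorch:22.12-py3

# Alternative: keep an isolated IPC namespace but raise the /dev/shm
# limit explicitly (the size needed depends on the model and workload).
docker run --gpus all -it --rm --shm-size=16g nvcr.io/nvidia/pytorch:22.12-py3
```

`--ipc=host` avoids having to guess an adequate `--shm-size` up front, at the cost of weaker isolation between the container and the host.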