set use_docker to default to True (microsoft#1147)

* set use_docker to default to true
* black formatting
* centralize checking and add env variable option
* set docker env flag for contrib tests
* set docker env flag for contrib tests
* better error message and cleanup
* disable explicit docker tests
* docker is installed so can't check for that in test
* pr comments and fix test
* rename and fix function descriptions
* documentation
* update notebooks so that they can be run with change in default
* add unit tests for new code
* cache and restore env var
* skip on windows because docker is running in the CI but there are problems connecting the volume
* update documentation
* move header
* update contrib tests
corleroux · Jan 30, 2024 · c7339f5
1 parent 22e36cb

Showing 36 changed files with 547 additions and 106 deletions.
diff --git a/.github/workflows/build.yml b/.github/workflows/build.yml
@@ -40,6 +40,12 @@ jobs:
           pip install -e .
           python -c "import autogen"
           pip install -e. pytest mock
+      - name: Set AUTOGEN_USE_DOCKER based on OS
+        shell: bash
+        run: |
+          if [[ ${{ matrix.os }} != ubuntu-latest ]]; then
+            echo "AUTOGEN_USE_DOCKER=False" >> $GITHUB_ENV
+          fi
       - name: Test with pytest
         if: matrix.python-version != '3.10'
         run: |

diff --git a/.github/workflows/contrib-tests.yml b/.github/workflows/contrib-tests.yml
@@ -45,6 +45,12 @@ jobs:
       - name: Install packages and dependencies for RetrieveChat
         run: |
           pip install -e .[retrievechat]
+      - name: Set AUTOGEN_USE_DOCKER based on OS
+        shell: bash
+        run: |
+          if [[ ${{ matrix.os }} != ubuntu-latest ]]; then
+            echo "AUTOGEN_USE_DOCKER=False" >> $GITHUB_ENV
+          fi
       - name: Test RetrieveChat
         run: |
           pytest test/test_retrieve_utils.py test/agentchat/contrib/test_retrievechat.py test/agentchat/contrib/test_qdrant_retrievechat.py --skip-openai
@@ -81,6 +87,12 @@ jobs:
       - name: Install packages and dependencies for Compression
         run: |
           pip install -e .
+      - name: Set AUTOGEN_USE_DOCKER based on OS
+        shell: bash
+        run: |
+          if [[ ${{ matrix.os }} != ubuntu-latest ]]; then
+            echo "AUTOGEN_USE_DOCKER=False" >> $GITHUB_ENV
+          fi
       - name: Test Compression
         if: matrix.python-version != '3.10' # diversify the python versions
         run: |
@@ -118,6 +130,12 @@ jobs:
       - name: Install packages and dependencies for GPTAssistantAgent
         run: |
           pip install -e .
+      - name: Set AUTOGEN_USE_DOCKER based on OS
+        shell: bash
+        run: |
+          if [[ ${{ matrix.os }} != ubuntu-latest ]]; then
+            echo "AUTOGEN_USE_DOCKER=False" >> $GITHUB_ENV
+          fi
       - name: Test GPTAssistantAgent
         if: matrix.python-version != '3.11' # diversify the python versions
         run: |
@@ -155,6 +173,12 @@ jobs:
       - name: Install packages and dependencies for Teachability
         run: |
           pip install -e .[teachable]
+      - name: Set AUTOGEN_USE_DOCKER based on OS
+        shell: bash
+        run: |
+          if [[ ${{ matrix.os }} != ubuntu-latest ]]; then
+            echo "AUTOGEN_USE_DOCKER=False" >> $GITHUB_ENV
+          fi
       - name: Test TeachableAgent
         if: matrix.python-version != '3.9' # diversify the python versions
         run: |
@@ -192,6 +216,12 @@ jobs:
       - name: Install packages and dependencies for LMM
         run: |
           pip install -e .[lmm]
+      - name: Set AUTOGEN_USE_DOCKER based on OS
+        shell: bash
+        run: |
+          if [[ ${{ matrix.os }} != ubuntu-latest ]]; then
+            echo "AUTOGEN_USE_DOCKER=False" >> $GITHUB_ENV
+          fi
       - name: Test LMM and LLaVA
         run: |
           pytest test/agentchat/contrib/test_img_utils.py test/agentchat/contrib/test_lmm.py test/agentchat/contrib/test_llava.py --skip-openai
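
The workflow steps above export `AUTOGEN_USE_DOCKER=False` on non-Ubuntu runners, and the commit message mentions that the new tests "cache and restore env var". A minimal sketch of that pattern follows; the helper name `temporary_env` is hypothetical and is not taken from the commit, which may implement the caching differently.

```python
import os
from contextlib import contextmanager


@contextmanager
def temporary_env(name, value):
    """Hypothetical helper: set an environment variable, then restore its prior state."""
    previous = os.environ.get(name)  # None means the variable was not set before
    os.environ[name] = value
    try:
        yield
    finally:
        if previous is None:
            # The variable did not exist before, so remove it again
            os.environ.pop(name, None)
        else:
            os.environ[name] = previous
```

Wrapping a test body in `with temporary_env("AUTOGEN_USE_DOCKER", "False"):` keeps the override from leaking into later tests.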

diff --git a/README.md b/README.md
@@ -65,7 +65,7 @@ The easiest way to start playing is
 ## [Installation](https://microsoft.github.io/autogen/docs/Installation)
 ### Option 1. Install and Run AutoGen in Docker
 
-Find detailed instructions for users [here](https://microsoft.github.io/autogen/docs/Installation#option-1-install-and-run-autogen-in-docker), and for developers [here](https://microsoft.github.io/autogen/docs/Contribute#docker).
+Find detailed instructions for users [here](https://microsoft.github.io/autogen/docs/Installation#option-1-install-and-run-autogen-in-docker), and for developers [here](https://microsoft.github.io/autogen/docs/Contribute#docker-for-development).
 
 ### Option 2. Install AutoGen Locally
 
@@ -86,7 +86,7 @@ Find more options in [Installation](https://microsoft.github.io/autogen/docs/Ins
 
 <!-- Each of the [`notebook examples`](https://github.com/microsoft/autogen/tree/main/notebook) may require a specific option to be installed. -->
 
-Even if you are installing AutoGen locally out of docker,  we recommend performing [code execution](https://microsoft.github.io/autogen/docs/FAQ/#code-execution) in docker. Find more instructions [here](https://microsoft.github.io/autogen/docs/Installation#docker).
+Even if you are installing and running AutoGen locally outside of docker, the recommendation and default behavior of agents is to perform [code execution](https://microsoft.github.io/autogen/docs/FAQ/#code-execution) in docker. Find more instructions and how to change the default behaviour [here](https://microsoft.github.io/autogen/docs/Installation#code-execution-with-docker-(default)).
 
 For LLM inference configurations, check the [FAQs](https://microsoft.github.io/autogen/docs/FAQ#set-your-api-endpoints).
 
@@ -111,7 +111,7 @@ from autogen import AssistantAgent, UserProxyAgent, config_list_from_json
 config_list = config_list_from_json(env_or_file="OAI_CONFIG_LIST")
 # You can also set config_list directly as a list, for example, config_list = [{'model': 'gpt-4', 'api_key': '<your OpenAI API key here>'},]
 assistant = AssistantAgent("assistant", llm_config={"config_list": config_list})
-user_proxy = UserProxyAgent("user_proxy", code_execution_config={"work_dir": "coding"})
+user_proxy = UserProxyAgent("user_proxy", code_execution_config={"work_dir": "coding", "use_docker": False}) # IMPORTANT: set to True to run code in docker, recommended
 user_proxy.initiate_chat(assistant, message="Plot a chart of NVDA and TESLA stock price change YTD.")
 # This initiates an automated chat between the two agents to solve the task
 ```

diff --git a/autogen/agentchat/conversable_agent.py b/autogen/agentchat/conversable_agent.py
@@ -9,7 +9,18 @@
 from typing import Any, Awaitable, Callable, Dict, List, Literal, Optional, Tuple, Type, TypeVar, Union
 
 from .. import OpenAIWrapper
-from ..code_utils import DEFAULT_MODEL, UNKNOWN, content_str, execute_code, extract_code, infer_lang
+from ..code_utils import (
+    DEFAULT_MODEL,
+    UNKNOWN,
+    content_str,
+    check_can_use_docker_or_throw,
+    decide_use_docker,
+    execute_code,
+    extract_code,
+    infer_lang,
+)
+
+
 from ..function_utils import get_function_schema, load_basemodels_if_needed, serialize_to_str
 from .agent import Agent
 from .._pydantic import model_dump
@@ -89,11 +100,10 @@ def __init__(
                     The default working directory is the "extensions" directory under
                     "path_to_autogen".
                 - use_docker (Optional, list, str or bool): The docker image to use for code execution.
+                    Default is True, which means the code will be executed in a docker container. A default list of images will be used.
                     If a list or a str of image name(s) is provided, the code will be executed in a docker container
                     with the first image successfully pulled.
-                    If None, False or empty, the code will be executed in the current environment.
-                    Default is True when the docker python package is installed.
-                    When set to True, a default list will be used.
+                    If False, the code will be executed in the current environment.
                     We strongly recommend using docker for code execution.
                 - timeout (Optional, int): The maximum execution time in seconds.
                 - last_n_messages (Experimental, Optional, int or str): The number of messages to look back for code execution. Default to 1. If set to 'auto', it will scan backwards through all messages arriving since the agent last spoke (typically this is the last time execution was attempted).
@@ -128,6 +138,13 @@ def __init__(
         self._code_execution_config: Union[Dict, Literal[False]] = (
             {} if code_execution_config is None else code_execution_config
         )
+
+        if isinstance(self._code_execution_config, dict):
+            use_docker = self._code_execution_config.get("use_docker", None)
+            use_docker = decide_use_docker(use_docker)
+            check_can_use_docker_or_throw(use_docker)
+            self._code_execution_config["use_docker"] = use_docker
+
         self.human_input_mode = human_input_mode
         self._max_consecutive_auto_reply = (
             max_consecutive_auto_reply if max_consecutive_auto_reply is not None else self.MAX_CONSECUTIVE_AUTO_REPLY
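
The hunk above resolves `use_docker` in the constructor: an explicit value in `code_execution_config` is kept, and only a missing (or `None`) value falls back to the `AUTOGEN_USE_DOCKER` environment variable. The following is a self-contained sketch of that precedence, not AutoGen's code. The function name `resolve_use_docker` and the `environ` parameter are assumptions introduced for illustration; the accepted strings are copied from the `decide_use_docker` function in this commit.

```python
import os

TRUTHY = {"1", "true", "yes", "t"}
FALSY = {"0", "false", "no", "f"}


def resolve_use_docker(config, environ=None):
    """Illustrative stand-in for the resolution step (hypothetical helper)."""
    environ = os.environ if environ is None else environ
    explicit = config.get("use_docker")
    if explicit is not None:
        # An explicit setting (bool, image name, or list of images) always wins
        return explicit
    raw = environ.get("AUTOGEN_USE_DOCKER", "True").lower()
    if raw in TRUTHY:
        return True
    if raw in FALSY:
        return False
    if raw == "none":
        return None
    raise ValueError(f"Invalid value for AUTOGEN_USE_DOCKER: {raw}")


print(resolve_use_docker({}, {}))                               # unset everywhere
print(resolve_use_docker({}, {"AUTOGEN_USE_DOCKER": "no"}))      # env opt-out
print(resolve_use_docker({"use_docker": "python:3"}, {}))        # explicit image
```

Passing a plain dict as `environ` keeps the sketch testable without touching the real process environment.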

diff --git a/autogen/agentchat/user_proxy_agent.py b/autogen/agentchat/user_proxy_agent.py
@@ -62,12 +62,11 @@ def __init__(
                     The default working directory is the "extensions" directory under
                     "path_to_autogen".
                 - use_docker (Optional, list, str or bool): The docker image to use for code execution.
+                    Default is True, which means the code will be executed in a docker container. A default list of images will be used.
                     If a list or a str of image name(s) is provided, the code will be executed in a docker container
                     with the first image successfully pulled.
-                    If None, False or empty, the code will be executed in the current environment.
-                    Default is True, which will be converted into a list.
-                    If the code is executed in the current environment,
-                    the code must be trusted.
+                    If False, the code will be executed in the current environment.
+                    We strongly recommend using docker for code execution.
                 - timeout (Optional, int): The maximum execution time in seconds.
                 - last_n_messages (Experimental, Optional, int): The number of messages to look back for code execution. Default to 1.
             default_auto_reply (str or dict or None): the default auto reply message when no code execution or llm based reply is generated.

diff --git a/autogen/code_utils.py b/autogen/code_utils.py
@@ -17,6 +17,7 @@
 except ImportError:
     docker = None
 
+SENTINEL = object()
 DEFAULT_MODEL = "gpt-4"
 FAST_MODEL = "gpt-3.5-turbo"
 # Regular expression for finding a code block
@@ -225,6 +226,70 @@ def _cmd(lang):
     raise NotImplementedError(f"{lang} not recognized in code execution")
 
 
+def is_docker_running():
+    """Check if docker is running.
+
+    Returns:
+        bool: True if docker is running; False otherwise.
+    """
+    if docker is None:
+        return False
+    try:
+        client = docker.from_env()
+        client.ping()
+        return True
+    except docker.errors.DockerException:
+        return False
+
+
+def in_docker_container():
+    """Check if the code is running in a docker container.
+
+    Returns:
+        bool: True if the code is running in a docker container; False otherwise.
+    """
+    return os.path.exists("/.dockerenv")
+
+
+def decide_use_docker(use_docker) -> bool:
+    if use_docker is None:
+        env_var_use_docker = os.environ.get("AUTOGEN_USE_DOCKER", "True")
+
+        truthy_values = {"1", "true", "yes", "t"}
+        falsy_values = {"0", "false", "no", "f"}
+
+        # Convert the value to lowercase for case-insensitive comparison
+        env_var_use_docker_lower = env_var_use_docker.lower()
+
+        # Determine the boolean value based on the environment variable
+        if env_var_use_docker_lower in truthy_values:
+            use_docker = True
+        elif env_var_use_docker_lower in falsy_values:
+            use_docker = False
+        elif env_var_use_docker_lower == "none":  # Special case for 'None' as a string
+            use_docker = None
+        else:
+            # Raise an error for any unrecognized value
+            raise ValueError(
+                f'Invalid value for AUTOGEN_USE_DOCKER: {env_var_use_docker}. Please set AUTOGEN_USE_DOCKER to "1/True/yes", "0/False/no", or "None".'
+            )
+    return use_docker
+
+
+def check_can_use_docker_or_throw(use_docker) -> None:
+    if use_docker is not None:
+        inside_docker = in_docker_container()
+        docker_installed_and_running = is_docker_running()
+        if use_docker and not inside_docker and not docker_installed_and_running:
+            raise RuntimeError(
+                "Code execution is set to be run in docker (default behaviour) but docker is not running.\n"
+                "The options available are:\n"
+                "- Make sure docker is running (advised approach for code execution)\n"
+                '- Set "use_docker": False in code_execution_config\n'
+                '- Set AUTOGEN_USE_DOCKER to "0/False/no" in your environment variables'
+            )
+
+
 def _sanitize_filename_for_docker_tag(filename: str) -> str:
     """Convert a filename to a valid docker tag.
     See https://docs.docker.com/engine/reference/commandline/tag/ for valid tag
@@ -253,7 +318,7 @@ def execute_code(
     timeout: Optional[int] = None,
     filename: Optional[str] = None,
     work_dir: Optional[str] = None,
-    use_docker: Optional[Union[List[str], str, bool]] = None,
+    use_docker: Union[List[str], str, bool] = SENTINEL,
     lang: Optional[str] = "python",
 ) -> Tuple[int, str, str]:
     """Execute code in a docker container.
@@ -273,15 +338,15 @@ def execute_code(
             If None, a default working directory will be used.
             The default working directory is the "extensions" directory under
             "path_to_autogen".
-        use_docker (Optional, list, str or bool): The docker image to use for code execution.
+        use_docker (list, str or bool): The docker image to use for code execution.
+            Default is True, which means the code will be executed in a docker container. A default list of images will be used.
             If a list or a str of image name(s) is provided, the code will be executed in a docker container
             with the first image successfully pulled.
-            If None, False or empty, the code will be executed in the current environment.
-            Default is None, which will be converted into an empty list when docker package is available.
+            If False, the code will be executed in the current environment.
             Expected behaviour:
-                - If `use_docker` is explicitly set to True and the docker package is available, the code will run in a Docker container.
-                - If `use_docker` is explicitly set to True but the Docker package is missing, an error will be raised.
-                - If `use_docker` is not set (i.e., left default to None) and the Docker package is not available, a warning will be displayed, but the code will run natively.
+                - If `use_docker` is not set (i.e. left default to True) or is explicitly set to True and the docker package is available, the code will run in a Docker container.
+                - If `use_docker` is not set (i.e. left default to True) or is explicitly set to True but the Docker package is missing or docker isn't running, an error will be raised.
+                - If `use_docker` is explicitly set to False, the code will run natively.
             If the code is executed in the current environment,
             the code must be trusted.
         lang (Optional, str): The language of the code. Default is "python".
@@ -296,23 +361,13 @@ def execute_code(
         logger.error(error_msg)
         raise AssertionError(error_msg)
 
-    if use_docker and docker is None:
-        error_msg = "Cannot use docker because the python docker package is not available."
-        logger.error(error_msg)
-        raise AssertionError(error_msg)
+    running_inside_docker = in_docker_container()
+    docker_running = is_docker_running()
 
-    # Warn if use_docker was unspecified (or None), and cannot be provided (the default).
-    # In this case the current behavior is to fall back to run natively, but this behavior
-    # is subject to change.
-    if use_docker is None:
-        if docker is None:
-            use_docker = False
-            logger.warning(
-                "execute_code was called without specifying a value for use_docker. Since the python docker package is not available, code will be run natively. Note: this fallback behavior is subject to change"
-            )
-        else:
-            # Default to true
-            use_docker = True
+    # SENTINEL is used to indicate that the user did not explicitly set the argument
+    if use_docker is SENTINEL:
+        use_docker = decide_use_docker(use_docker=None)
+    check_can_use_docker_or_throw(use_docker)
 
     timeout = timeout or DEFAULT_TIMEOUT
     original_filename = filename
@@ -324,15 +379,16 @@ def execute_code(
         filename = f"tmp_code_{code_hash}.{'py' if lang.startswith('python') else lang}"
     if work_dir is None:
         work_dir = WORKING_DIR
+
     filepath = os.path.join(work_dir, filename)
     file_dir = os.path.dirname(filepath)
     os.makedirs(file_dir, exist_ok=True)
+
     if code is not None:
         with open(filepath, "w", encoding="utf-8") as fout:
             fout.write(code)
-    # check if already running in a docker container
-    in_docker_container = os.path.exists("/.dockerenv")
-    if not use_docker or in_docker_container:
+
+    if not use_docker or running_inside_docker:
         # already running in a docker container
         cmd = [
             sys.executable if lang.startswith("python") else _cmd(lang),
@@ -376,7 +432,13 @@ def execute_code(
         return result.returncode, logs, None
 
     # create a docker client
+    if use_docker and not docker_running:
+        raise RuntimeError(
+            "Docker package is missing or docker is not running. Please make sure docker is running or set use_docker=False."
+        )
+
     client = docker.from_env()
+
     image_list = (
         ["python:3-slim", "python:3", "python:3-windowsservercore"]
         if use_docker is True
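
Two details of the `code_utils.py` changes above are easy to miss: `execute_code` uses a sentinel default so that "argument omitted" can be told apart from an explicit `None`, and the availability check only raises when docker is requested, the process is not already inside a container, and no docker daemon is reachable. The sketch below models both in isolation. The names `_UNSET`, `effective_use_docker`, `docker_check_fails`, and the `env_default` parameter are hypothetical simplifications; the real code consults the environment and the docker client instead.

```python
_UNSET = object()  # hypothetical name; the diff calls its marker SENTINEL


def effective_use_docker(use_docker=_UNSET, env_default=True):
    """Return the value the simplified model would act on.

    env_default stands in for the result of decide_use_docker(None).
    """
    if use_docker is _UNSET:
        return env_default
    # Anything passed explicitly, including None, is kept as-is
    return use_docker


def docker_check_fails(use_docker, inside_docker, docker_running):
    """Mirror the condition under which the diff raises RuntimeError."""
    if use_docker is None:
        return False  # the check is skipped entirely for None
    return bool(use_docker) and not inside_docker and not docker_running
```

Under this model, omitting the argument yields the docker default, while an explicit `None` skips the check and, being falsy, leads to native execution in the `if not use_docker or running_inside_docker` branch.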

diff --git a/notebook/agentchat_auto_feedback_from_code_execution.ipynb b/notebook/agentchat_auto_feedback_from_code_execution.ipynb
@@ -348,7 +348,7 @@
     "    is_termination_msg=lambda x: x.get(\"content\", \"\").rstrip().endswith(\"TERMINATE\"),\n",
     "    code_execution_config={\n",
     "        \"work_dir\": \"coding\",\n",
-    "        \"use_docker\": False,  # set to True or image name like \"python:3\" to use docker\n",
+    "        \"use_docker\": False,  # Please set use_docker=True if docker is available to run the generated code. Using docker is safer than running the generated code directly.\n",
     "    },\n",
     ")\n",
     "# the assistant receives a message from the user_proxy, which contains the task description\n",

diff --git a/notebook/agentchat_compression.ipynb b/notebook/agentchat_compression.ipynb
@@ -561,7 +561,9 @@
     "mathproxyagent = MathUserProxyAgent(\n",
     "    name=\"mathproxyagent\",\n",
     "    human_input_mode=\"NEVER\",\n",
-    "    code_execution_config={\"use_docker\": False},\n",
+    "    code_execution_config={\n",
+    "        \"use_docker\": False\n",
+    "    },  # Please set use_docker=True if docker is available to run the generated code. Using docker is safer than running the generated code directly.\n",
     "    max_consecutive_auto_reply=5,\n",
     ")\n",
     "math_problem = (\n",
@@ -835,7 +837,10 @@
     "    is_termination_msg=lambda x: x.get(\"content\", \"\") and x.get(\"content\", \"\").rstrip().endswith(\"TERMINATE\"),\n",
     "    human_input_mode=\"NEVER\",\n",
     "    max_consecutive_auto_reply=10,\n",
-    "    code_execution_config={\"work_dir\": \"coding\"},\n",
+    "    code_execution_config={\n",
+    "        \"work_dir\": \"coding\",\n",
+    "        \"use_docker\": False,\n",
+    "    },  # Please set use_docker=True if docker is available to run the generated code. Using docker is safer than running the generated code directly.\n",
     ")\n",
     "\n",
     "\n",
@@ -1259,7 +1264,10 @@
     "    max_consecutive_auto_reply=10,\n",
     "    is_termination_msg=lambda x: x.get(\"content\", \"\").rstrip().endswith(\"TERMINATE\")\n",
     "    or x.get(\"content\", \"\").rstrip().endswith(\"TERMINATE.\"),\n",
-    "    code_execution_config={\"work_dir\": \"web\"},\n",
+    "    code_execution_config={\n",
+    "        \"work_dir\": \"web\",\n",
+    "        \"use_docker\": False,\n",
+    "    },  # Please set use_docker=True if docker is available to run the generated code. Using docker is safer than running the generated code directly.\n",
     "    system_message=\"\"\"Reply TERMINATE if the task has been solved at full satisfaction.\n",
     "Otherwise, reply CONTINUE, or the reason why the task is not solved yet.\"\"\",\n",
     ")\n",