
add dashscope multimodal #92

Merged · 19 commits · Apr 9, 2024
29 changes: 15 additions & 14 deletions README.md
@@ -64,20 +64,21 @@ applications in a centralized programming manner for streamlined development.
AgentScope provides a list of `ModelWrapper` to support both local model
services and third-party model APIs.

| API | Task | Model Wrapper |
|------------------------|-----------------|---------------------------------------------------------------------------------------------------------------------------------|
| OpenAI API | Chat | [`OpenAIChatWrapper`](https://github.com/modelscope/agentscope/blob/main/src/agentscope/models/openai_model.py) |
| | Embedding | [`OpenAIEmbeddingWrapper`](https://github.com/modelscope/agentscope/blob/main/src/agentscope/models/openai_model.py) |
| | DALL·E | [`OpenAIDALLEWrapper`](https://github.com/modelscope/agentscope/blob/main/src/agentscope/models/openai_model.py) |
| DashScope API | Chat | [`DashScopeChatWrapper`](https://github.com/modelscope/agentscope/blob/main/src/agentscope/models/dashscope_model.py) |
| | Image Synthesis | [`DashScopeImageSynthesisWrapper`](https://github.com/modelscope/agentscope/blob/main/src/agentscope/models/dashscope_model.py) |
| | Text Embedding | [`DashScopeTextEmbeddingWrapper`](https://github.com/modelscope/agentscope/blob/main/src/agentscope/models/dashscope_model.py) |
| Gemini API | Chat | [`GeminiChatWrapper`](https://github.com/modelscope/agentscope/blob/main/src/agentscope/models/gemini_model.py) |
| | Embedding | [`GeminiEmbeddingWrapper`](https://github.com/modelscope/agentscope/blob/main/src/agentscope/models/gemini_model.py) |
| ollama | Chat | [`OllamaChatWrapper`](https://github.com/modelscope/agentscope/blob/main/src/agentscope/models/ollama_model.py) |
| | Embedding | [`OllamaEmbedding`](https://github.com/modelscope/agentscope/blob/main/src/agentscope/models/ollama_model.py) |
| | Generation | [`OllamaGenerationWrapper`](https://github.com/modelscope/agentscope/blob/main/src/agentscope/models/ollama_model.py) |
| Post Request based API | - | [`PostAPIModelWrapper`](https://github.com/modelscope/agentscope/blob/main/src/agentscope/models/post_model.py) |
| API | Task | Model Wrapper |
| ---------------------- |-------------------------| ------------------------------------------------------------ |
| OpenAI API | Chat | [`OpenAIChatWrapper`](https://github.com/modelscope/agentscope/blob/main/src/agentscope/models/openai_model.py) |
| | Embedding | [`OpenAIEmbeddingWrapper`](https://github.com/modelscope/agentscope/blob/main/src/agentscope/models/openai_model.py) |
| | DALL·E | [`OpenAIDALLEWrapper`](https://github.com/modelscope/agentscope/blob/main/src/agentscope/models/openai_model.py) |
| DashScope API | Chat | [`DashScopeChatWrapper`](https://github.com/modelscope/agentscope/blob/main/src/agentscope/models/dashscope_model.py) |
| | Image Synthesis | [`DashScopeImageSynthesisWrapper`](https://github.com/modelscope/agentscope/blob/main/src/agentscope/models/dashscope_model.py) |
| | Text Embedding | [`DashScopeTextEmbeddingWrapper`](https://github.com/modelscope/agentscope/blob/main/src/agentscope/models/dashscope_model.py) |
| | Multimodal Conversation | [`DashScopeMultiModalWrapper`](https://github.com/modelscope/agentscope/blob/main/src/agentscope/models/dashscope_model.py) |
| Gemini API | Chat | [`GeminiChatWrapper`](https://github.com/modelscope/agentscope/blob/main/src/agentscope/models/gemini_model.py) |
| | Embedding | [`GeminiEmbeddingWrapper`](https://github.com/modelscope/agentscope/blob/main/src/agentscope/models/gemini_model.py) |
| ollama | Chat | [`OllamaChatWrapper`](https://github.com/modelscope/agentscope/blob/main/src/agentscope/models/ollama_model.py) |
| | Embedding | [`OllamaEmbedding`](https://github.com/modelscope/agentscope/blob/main/src/agentscope/models/ollama_model.py) |
| | Generation | [`OllamaGenerationWrapper`](https://github.com/modelscope/agentscope/blob/main/src/agentscope/models/ollama_model.py) |
| Post Request based API | - | [`PostAPIModelWrapper`](https://github.com/modelscope/agentscope/blob/main/src/agentscope/models/post_model.py) |

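The table above adds a `DashScopeMultiModalWrapper` whose `model_type` is `"dashscope_multimodal"` (declared later in this diff). A minimal sketch of a model configuration that would select this wrapper — the exact keys besides `model_type` (`config_name`, `model_name`, `api_key`) are assumptions following the pattern of other AgentScope configs, not confirmed by this PR:

```python
# Hypothetical model configuration selecting the new multimodal wrapper.
# Only the "model_type" value is taken from this PR
# (DashScopeMultiModalWrapper.model_type); the other keys are illustrative.
multimodal_config = {
    "config_name": "my-qwen-vl",            # local alias for this config
    "model_type": "dashscope_multimodal",   # routes to DashScopeMultiModalWrapper
    "model_name": "qwen-vl-plus",           # a DashScope vision-language model
    "api_key": "<your-dashscope-api-key>",  # placeholder; never commit real keys
}
```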
**Supported Local Model Deployment**

29 changes: 15 additions & 14 deletions README_ZH.md
@@ -53,20 +53,21 @@ AgentScope is an innovative multi-agent development platform, designed to empower developers

AgentScope provides a list of `ModelWrapper` to support local model services and third-party model APIs.

| API | Task | Model Wrapper |
|------------------------|-----------------|---------------------------------------------------------------------------------------------------------------------------------|
| OpenAI API | Chat | [`OpenAIChatWrapper`](https://github.com/modelscope/agentscope/blob/main/src/agentscope/models/openai_model.py) |
| | Embedding | [`OpenAIEmbeddingWrapper`](https://github.com/modelscope/agentscope/blob/main/src/agentscope/models/openai_model.py) |
| | DALL·E | [`OpenAIDALLEWrapper`](https://github.com/modelscope/agentscope/blob/main/src/agentscope/models/openai_model.py) |
| DashScope API | Chat | [`DashScopeChatWrapper`](https://github.com/modelscope/agentscope/blob/main/src/agentscope/models/dashscope_model.py) |
| | Image Synthesis | [`DashScopeImageSynthesisWrapper`](https://github.com/modelscope/agentscope/blob/main/src/agentscope/models/dashscope_model.py) |
| | Text Embedding | [`DashScopeTextEmbeddingWrapper`](https://github.com/modelscope/agentscope/blob/main/src/agentscope/models/dashscope_model.py) |
| Gemini API | Chat | [`GeminiChatWrapper`](https://github.com/modelscope/agentscope/blob/main/src/agentscope/models/gemini_model.py) |
| | Embedding | [`GeminiEmbeddingWrapper`](https://github.com/modelscope/agentscope/blob/main/src/agentscope/models/gemini_model.py) |
| ollama | Chat | [`OllamaChatWrapper`](https://github.com/modelscope/agentscope/blob/main/src/agentscope/models/ollama_model.py) |
| | Embedding | [`OllamaEmbedding`](https://github.com/modelscope/agentscope/blob/main/src/agentscope/models/ollama_model.py) |
| | Generation | [`OllamaGenerationWrapper`](https://github.com/modelscope/agentscope/blob/main/src/agentscope/models/ollama_model.py) |
| Post Request based API | - | [`PostAPIModelWrapper`](https://github.com/modelscope/agentscope/blob/main/src/agentscope/models/post_model.py) |
| API | Task | Model Wrapper |
| ---------------------- |-------------------------| ------------------------------------------------------------ |
| OpenAI API | Chat | [`OpenAIChatWrapper`](https://github.com/modelscope/agentscope/blob/main/src/agentscope/models/openai_model.py) |
| | Embedding | [`OpenAIEmbeddingWrapper`](https://github.com/modelscope/agentscope/blob/main/src/agentscope/models/openai_model.py) |
| | DALL·E | [`OpenAIDALLEWrapper`](https://github.com/modelscope/agentscope/blob/main/src/agentscope/models/openai_model.py) |
| DashScope API | Chat | [`DashScopeChatWrapper`](https://github.com/modelscope/agentscope/blob/main/src/agentscope/models/dashscope_model.py) |
| | Image Synthesis | [`DashScopeImageSynthesisWrapper`](https://github.com/modelscope/agentscope/blob/main/src/agentscope/models/dashscope_model.py) |
| | Text Embedding | [`DashScopeTextEmbeddingWrapper`](https://github.com/modelscope/agentscope/blob/main/src/agentscope/models/dashscope_model.py) |
| | Multimodal Conversation | [`DashScopeMultiModalWrapper`](https://github.com/modelscope/agentscope/blob/main/src/agentscope/models/dashscope_model.py) |
| Gemini API | Chat | [`GeminiChatWrapper`](https://github.com/modelscope/agentscope/blob/main/src/agentscope/models/gemini_model.py) |
| | Embedding | [`GeminiEmbeddingWrapper`](https://github.com/modelscope/agentscope/blob/main/src/agentscope/models/gemini_model.py) |
| ollama | Chat | [`OllamaChatWrapper`](https://github.com/modelscope/agentscope/blob/main/src/agentscope/models/ollama_model.py) |
| | Embedding | [`OllamaEmbedding`](https://github.com/modelscope/agentscope/blob/main/src/agentscope/models/ollama_model.py) |
| | Generation | [`OllamaGenerationWrapper`](https://github.com/modelscope/agentscope/blob/main/src/agentscope/models/ollama_model.py) |
| Post Request based API | - | [`PostAPIModelWrapper`](https://github.com/modelscope/agentscope/blob/main/src/agentscope/models/post_model.py) |

**Supported Local Model Deployment**

2 changes: 2 additions & 0 deletions src/agentscope/models/__init__.py
@@ -21,6 +21,7 @@
DashScopeChatWrapper,
DashScopeImageSynthesisWrapper,
DashScopeTextEmbeddingWrapper,
DashScopeMultiModalWrapper,
)
from .ollama_model import (
OllamaChatWrapper,
@@ -48,6 +49,7 @@
"DashScopeChatWrapper",
"DashScopeImageSynthesisWrapper",
"DashScopeTextEmbeddingWrapper",
"DashScopeMultiModalWrapper",
"OllamaChatWrapper",
"OllamaEmbeddingWrapper",
"OllamaGenerationWrapper",
191 changes: 162 additions & 29 deletions src/agentscope/models/dashscope_model.py
@@ -153,7 +153,7 @@ def __call__(
)

# TODO: move this to prompt engineering
messages = self._preprocess_role(messages)
messages = _preprocess_role(messages)
# step3: forward to generate response
response = dashscope.Generation.call(
model=self.model_name,
@@ -197,34 +197,6 @@ def __call__(
raw=response,
)

def _preprocess_role(self, messages: list) -> list:
"""preprocess role rules for DashScope"""
# The models in this list require that the roles of messages must
# alternate between "user" and "assistant".
message_length = len(messages)
if message_length % 2 == 1:
# If the length of the message list is odd, roles will
# alternate, starting with "user"
roles = [
"user" if i % 2 == 0 else "assistant"
for i in range(message_length)
]
else:
# If the length of the message list is even, the first role
# will be "system", followed by alternating "user" and
# "assistant"
roles = ["system"] + [
"user" if i % 2 == 1 else "assistant"
for i in range(1, message_length)
]

# Assign the roles list to the "role" key for each message in
# the messages list
for message, role in zip(messages, roles):
message["role"] = role

return messages


class DashScopeImageSynthesisWrapper(DashScopeWrapperBase):
"""The model wrapper for DashScope Image Synthesis API."""
@@ -426,3 +398,164 @@ def __call__(
],
raw=response,
)


class DashScopeMultiModalWrapper(DashScopeWrapperBase):
"""The model wrapper for DashScope MultiModal Conversation API."""

model_type: str = "dashscope_multimodal"

def _register_default_metrics(self) -> None:
# Set monitor accordingly
# TODO: set quota to the following metrics
self.monitor.register(
self._metric("call_counter"),
metric_unit="times",
)
self.monitor.register(
self._metric("prompt_tokens"),
metric_unit="token",
)
self.monitor.register(
self._metric("completion_tokens"),
metric_unit="token",
)
self.monitor.register(
self._metric("total_tokens"),
metric_unit="token",
)

def __call__(
self,
messages: list,
**kwargs: Any,
) -> ModelResponse:
"""Process the messages with DashScope MultiModal Conversation API.

Args:
messages (`list`):
A list of messages to process.
**kwargs (`Any`):
The keyword arguments to DashScope MultiModal API,
e.g. `stream`. Please refer to
https://help.aliyun.com/zh/dashscope/developer-reference/tongyi-qianwen-vl-plus-api
for more detailed arguments.

Returns:
`ModelResponse`:
The response text in text field, and the raw response in
raw field.

Note:
If involving image links, then the messages should be of the
following form:
messages = [
{
"role": "system",
"content": [
{"text": "You are a helpful assistant."},
],
},
{
"role": "user",
"content": [
{"text": "What does this picture depict?"},
{"image": "http://example.com/image.jpg"},
],
},
]
Therefore, the `content` value should be a list matching the form
above. If a message involves only text, a plain string is also
accepted and will be wrapped automatically.

`parse_func`, `fault_handler` and `max_retries` are reserved
for `_response_parse_decorator` to parse and check the response
generated by model wrapper. Their usages are listed as follows:
- `parse_func` is a callable function used to parse and
check the response generated by the model, which takes the
response as input.
- `max_retries` is the maximum number of retries when
`parse_func` raises an exception.
- `fault_handler` is a callable function which is called
when the response generated by the model is invalid after
`max_retries` retries.
"""
# step1: prepare keyword arguments
kwargs = {**self.generate_args, **kwargs}

for message in messages:
if not isinstance(message["content"], list):
message["content"] = [{"text": message["content"]}]
messages = _preprocess_role(messages)

# step2: forward to generate response
response = dashscope.MultiModalConversation.call(
model=self.model_name,
messages=messages,
**kwargs,
)

if response.status_code != HTTPStatus.OK:
error_msg = (
f" Request id: {response.request_id},"
f" Status code: {response.status_code},"
f" error code: {response.code},"
f" error message: {response.message}."
)
raise RuntimeError(error_msg)

# step3: record the model api invocation if needed
self._save_model_invocation(
arguments={
"model": self.model_name,
"messages": messages,
**kwargs,
},
response=response,
)

# step4: update monitor accordingly
self.update_monitor(
call_counter=1,
prompt_tokens=response.usage["input_tokens"],
completion_tokens=response.usage["output_tokens"],
total_tokens=response.usage["input_tokens"]
+ response.usage["output_tokens"],
)

# step5: return response
return ModelResponse(
text=response.output["choices"][0]["message"]["content"][0][
qbc2016 marked this conversation as resolved.
Show resolved Hide resolved
"text"
],
raw=response,
)
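Before calling `dashscope.MultiModalConversation.call`, the `__call__` above normalizes each message's `content` into the list-of-dicts form the API expects. A standalone sketch of that normalization step, using the message values from the docstring (the image URL is illustrative):

```python
# Normalize message content the same way DashScopeMultiModalWrapper.__call__
# does: a plain string becomes a one-element [{"text": ...}] list, while
# content that is already a list passes through unchanged.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {
        "role": "user",
        "content": [
            {"text": "What does this picture depict?"},
            {"image": "http://example.com/image.jpg"},
        ],
    },
]
for message in messages:
    if not isinstance(message["content"], list):
        message["content"] = [{"text": message["content"]}]
```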


def _preprocess_role(messages: list) -> list:
"""preprocess role rules for DashScope"""
# The models in this list require that the roles of messages must
# alternate between "user" and "assistant".
message_length = len(messages)
if message_length % 2 == 1:
# If the length of the message list is odd, roles will
# alternate, starting with "user"
roles = [
"user" if i % 2 == 0 else "assistant"
for i in range(message_length)
]
else:
# If the length of the message list is even, the first role
# will be "system", followed by alternating "user" and
# "assistant"
roles = ["system"] + [
"user" if i % 2 == 1 else "assistant"
for i in range(1, message_length)
]

# Assign the roles list to the "role" key for each message in
# the messages list
for message, role in zip(messages, roles):
message["role"] = role

return messages
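The alternation rule implemented by `_preprocess_role` can be checked in isolation. The sketch below restates the same logic as a standalone function (`assign_roles` is a hypothetical name for this demo, not part of the PR):

```python
# Demonstration of the role-assignment rule in _preprocess_role:
# odd-length message lists alternate starting with "user"; even-length
# lists start with "system" and then alternate "user"/"assistant".
def assign_roles(messages: list) -> list:
    n = len(messages)
    if n % 2 == 1:
        roles = ["user" if i % 2 == 0 else "assistant" for i in range(n)]
    else:
        roles = ["system"] + [
            "user" if i % 2 == 1 else "assistant" for i in range(1, n)
        ]
    for message, role in zip(messages, roles):
        message["role"] = role
    return messages

odd = assign_roles([{"content": c} for c in ("a", "b", "c")])
even = assign_roles([{"content": c} for c in ("a", "b", "c", "d")])
```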