From d44765eb99cf26f4ed35656a70b2980e98c4f9c0 Mon Sep 17 00:00:00 2001
From: Xuye Qin
Date: Wed, 15 Jan 2025 01:30:06 +0800
Subject: [PATCH] DOC: update new models in README and doc (#2761)

---
 README.md                                   | 11 ++---
 README_zh_CN.md                             | 11 ++---
 doc/source/getting_started/installation.rst |  1 +
 doc/source/models/builtin/llm/cogagent.rst  | 31 ++++++++++++++
 doc/source/models/builtin/llm/index.rst     | 14 ++++++
 doc/source/models/builtin/llm/marco-o1.rst  | 47 +++++++++++++++++++++
 doc/source/user_guide/backends.rst          |  1 +
 7 files changed, 106 insertions(+), 10 deletions(-)
 create mode 100644 doc/source/models/builtin/llm/cogagent.rst
 create mode 100644 doc/source/models/builtin/llm/marco-o1.rst

diff --git a/README.md b/README.md
index 3c2c9f1a60..f4c5d5f58f 100644
--- a/README.md
+++ b/README.md
@@ -47,19 +47,20 @@ potential of cutting-edge AI models.
 - Support speech recognition model: [#929](https://github.com/xorbitsai/inference/pull/929)
 - Metrics support: [#906](https://github.com/xorbitsai/inference/pull/906)
 ### New Models
+- Built-in support for [CogAgent](https://github.com/THUDM/CogAgent): [#2740](https://github.com/xorbitsai/inference/pull/2740)
+- Built-in support for [HunyuanVideo](https://github.com/Tencent/HunyuanVideo): [#2721](https://github.com/xorbitsai/inference/pull/2721)
+- Built-in support for [HunyuanDiT](https://github.com/Tencent/HunyuanDiT): [#2727](https://github.com/xorbitsai/inference/pull/2727)
+- Built-in support for [Marco-o1](https://github.com/AIDC-AI/Marco-o1): [#2749](https://github.com/xorbitsai/inference/pull/2749)
 - Built-in support for [Stable Diffusion 3.5](https://huggingface.co/collections/stabilityai/stable-diffusion-35-671785cca799084f71fa2838): [#2706](https://github.com/xorbitsai/inference/pull/2706)
 - Built-in support for [CosyVoice 2](https://huggingface.co/FunAudioLLM/CosyVoice2-0.5B): [#2684](https://github.com/xorbitsai/inference/pull/2684)
 - Built-in support for [Fish Speech V1.5](https://huggingface.co/fishaudio/fish-speech-1.5): [#2672](https://github.com/xorbitsai/inference/pull/2672)
 - Built-in support for [F5-TTS](https://github.com/SWivid/F5-TTS): [#2626](https://github.com/xorbitsai/inference/pull/2626)
-- Built-in support for [GLM Edge](https://github.com/THUDM/GLM-Edge): [#2582](https://github.com/xorbitsai/inference/pull/2582)
-- Built-in support for [QwQ-32B-Preview](https://qwenlm.github.io/blog/qwq-32b-preview/): [#2602](https://github.com/xorbitsai/inference/pull/2602)
-- Built-in support for [Qwen 2.5 Series](https://qwenlm.github.io/blog/qwen2.5/): [#2325](https://github.com/xorbitsai/inference/pull/2325)
-- Built-in support for [DeepSeek-V2.5](https://huggingface.co/deepseek-ai/DeepSeek-V2.5): [#2292](https://github.com/xorbitsai/inference/pull/2292)
 ### Integrations
 - [Dify](https://docs.dify.ai/advanced/model-configuration/xinference): an LLMOps platform that enables developers (and even non-developers) to quickly build useful applications based on large language models, ensuring they are visual, operable, and improvable.
 - [FastGPT](https://github.com/labring/FastGPT): a knowledge-based platform built on the LLM, offers out-of-the-box data processing and model invocation capabilities, allows for workflow orchestration through Flow visualization.
-- [Chatbox](https://chatboxai.app/): a desktop client for multiple cutting-edge LLM models, available on Windows, Mac and Linux.
 - [RAGFlow](https://github.com/infiniflow/ragflow): is an open-source RAG engine based on deep document understanding.
+- [MaxKB](https://github.com/1Panel-dev/MaxKB): MaxKB = Max Knowledge Base, it is a chatbot based on Large Language Models (LLM) and Retrieval-Augmented Generation (RAG).
+- [Chatbox](https://chatboxai.app/): a desktop client for multiple cutting-edge LLM models, available on Windows, Mac and Linux.
 
 ## Key Features

diff --git a/README_zh_CN.md b/README_zh_CN.md
index 97360ec41d..2b569cd0b4 100644
--- a/README_zh_CN.md
+++ b/README_zh_CN.md
@@ -43,19 +43,20 @@ Xorbits Inference(Xinference)是一个性能强大且功能全面的分布
 - 支持语音识别模型: [#929](https://github.com/xorbitsai/inference/pull/929)
 - 增加 Metrics 统计信息: [#906](https://github.com/xorbitsai/inference/pull/906)
 ### 新模型
+- 内置 [CogAgent](https://github.com/THUDM/CogAgent): [#2740](https://github.com/xorbitsai/inference/pull/2740)
+- 内置 [HunyuanVideo](https://github.com/Tencent/HunyuanVideo): [#2721](https://github.com/xorbitsai/inference/pull/2721)
+- 内置 [HunyuanDiT](https://github.com/Tencent/HunyuanDiT): [#2727](https://github.com/xorbitsai/inference/pull/2727)
+- 内置 [Marco-o1](https://github.com/AIDC-AI/Marco-o1): [#2749](https://github.com/xorbitsai/inference/pull/2749)
 - 内置 [Stable Diffusion 3.5](https://huggingface.co/collections/stabilityai/stable-diffusion-35-671785cca799084f71fa2838): [#2706](https://github.com/xorbitsai/inference/pull/2706)
 - 内置 [CosyVoice 2](https://huggingface.co/FunAudioLLM/CosyVoice2-0.5B): [#2684](https://github.com/xorbitsai/inference/pull/2684)
 - 内置 [Fish Speech V1.5](https://huggingface.co/fishaudio/fish-speech-1.5): [#2672](https://github.com/xorbitsai/inference/pull/2672)
 - 内置 [F5-TTS](https://github.com/SWivid/F5-TTS): [#2626](https://github.com/xorbitsai/inference/pull/2626)
-- 内置 [GLM Edge](https://github.com/THUDM/GLM-Edge): [#2582](https://github.com/xorbitsai/inference/pull/2582)
-- 内置 [QwQ-32B-Preview](https://qwenlm.github.io/blog/qwq-32b-preview/): [#2602](https://github.com/xorbitsai/inference/pull/2602)
-- 内置 [Qwen 2.5 Series](https://qwenlm.github.io/blog/qwen2.5/): [#2325](https://github.com/xorbitsai/inference/pull/2325)
-- 内置 [DeepSeek-V2.5](https://huggingface.co/deepseek-ai/DeepSeek-V2.5): [#2292](https://github.com/xorbitsai/inference/pull/2292)
 ### 集成
 - [FastGPT](https://doc.fastai.site/docs/development/custom-models/xinference/):一个基于 LLM 大模型的开源 AI 知识库构建平台。提供了开箱即用的数据处理、模型调用、RAG 检索、可视化 AI 工作流编排等能力,帮助您轻松实现复杂的问答场景。
 - [Dify](https://docs.dify.ai/advanced/model-configuration/xinference): 一个涵盖了大型语言模型开发、部署、维护和优化的 LLMOps 平台。
-- [Chatbox](https://chatboxai.app/): 一个支持前沿大语言模型的桌面客户端,支持 Windows,Mac,以及 Linux。
 - [RAGFlow](https://github.com/infiniflow/ragflow): 是一款基于深度文档理解构建的开源 RAG 引擎。
+- [MaxKB](https://github.com/1Panel-dev/MaxKB): MaxKB = Max Knowledge Base,是一款基于大语言模型和 RAG 的开源知识库问答系统,广泛应用于智能客服、企业内部知识库、学术研究与教育等场景。
+- [Chatbox](https://chatboxai.app/): 一个支持前沿大语言模型的桌面客户端,支持 Windows,Mac,以及 Linux。
 
 ## 主要功能
 
 🌟 **模型推理,轻而易举**:大语言模型,语音识别模型,多模态模型的部署流程被大大简化。一个命令即可完成模型的部署工作。

diff --git a/doc/source/getting_started/installation.rst b/doc/source/getting_started/installation.rst
index b3ac515ad0..ad973c697a 100644
--- a/doc/source/getting_started/installation.rst
+++ b/doc/source/getting_started/installation.rst
@@ -59,6 +59,7 @@ Currently, supported models include:
 - ``qwen1.5-chat``, ``qwen1.5-moe-chat``
 - ``qwen2-instruct``, ``qwen2-moe-instruct``
 - ``QwQ-32B-Preview``
+- ``marco-o1``
 - ``gemma-it``, ``gemma-2-it``
 - ``orion-chat``, ``orion-chat-rag``
 - ``c4ai-command-r-v01``
diff --git a/doc/source/models/builtin/llm/cogagent.rst b/doc/source/models/builtin/llm/cogagent.rst
new file mode 100644
index 0000000000..275b8d1b26
--- /dev/null
+++ b/doc/source/models/builtin/llm/cogagent.rst
@@ -0,0 +1,31 @@
+.. _models_llm_cogagent:
+
+========================================
+cogagent
+========================================
+
+- **Context Length:** 4096
+- **Model Name:** cogagent
+- **Languages:** en, zh
+- **Abilities:** chat, vision
+- **Description:** The CogAgent-9B-20241220 model is based on GLM-4V-9B, a bilingual open-source VLM base model. Through data collection and optimization, multi-stage training, and strategy improvements, CogAgent-9B-20241220 achieves significant advancements in GUI perception, inference prediction accuracy, action space completeness, and task generalizability.
+
+Specifications
+^^^^^^^^^^^^^^
+
+
+Model Spec 1 (pytorch, 9 Billion)
+++++++++++++++++++++++++++++++++++++++++
+
+- **Model Format:** pytorch
+- **Model Size (in billions):** 9
+- **Quantizations:** 4-bit, 8-bit, none
+- **Engines**: Transformers
+- **Model ID:** THUDM/cogagent-9b-20241220
+- **Model Hubs**: `Hugging Face <https://huggingface.co/THUDM/cogagent-9b-20241220>`__, `ModelScope <https://modelscope.cn/models/ZhipuAI/cogagent-9b-20241220>`__
+
+Execute the following command to launch the model; remember to replace ``${engine}`` with an engine
+listed above and ``${quantization}`` with your chosen quantization method::
+
+   xinference launch --model-engine ${engine} --model-name cogagent --size-in-billions 9 --model-format pytorch --quantization ${quantization}
+
diff --git a/doc/source/models/builtin/llm/index.rst b/doc/source/models/builtin/llm/index.rst
index 541ad4c735..38165aa947 100644
--- a/doc/source/models/builtin/llm/index.rst
+++ b/doc/source/models/builtin/llm/index.rst
@@ -91,6 +91,11 @@ The following is a list of built-in LLM in Xinference:
      - 32768
      - Codestrall-22B-v0.1 is trained on a diverse dataset of 80+ programming languages, including the most popular ones, such as Python, Java, C, C++, JavaScript, and Bash
+   * - :ref:`cogagent <models_llm_cogagent>`
+     - chat, vision
+     - 4096
+     - The CogAgent-9B-20241220 model is based on GLM-4V-9B, a bilingual open-source VLM base model. Through data collection and optimization, multi-stage training, and strategy improvements, CogAgent-9B-20241220 achieves significant advancements in GUI perception, inference prediction accuracy, action space completeness, and task generalizability.
+
    * - :ref:`cogvlm2 <models_llm_cogvlm2>`
      - chat, vision
      - 8192
@@ -266,6 +271,11 @@ The following is a list of built-in LLM in Xinference:
      - 131072
      - The Llama 3.3 instruction tuned models are optimized for dialogue use cases and outperform many of the available open source chat models on common industry benchmarks..
+   * - :ref:`marco-o1 <models_llm_marco-o1>`
+     - chat, tools
+     - 32768
+     - Marco-o1: Towards Open Reasoning Models for Open-Ended Solutions
+
    * - :ref:`minicpm-2b-dpo-bf16 <models_llm_minicpm-2b-dpo-bf16>`
      - chat
      - 4096
@@ -606,6 +616,8 @@ The following is a list of built-in LLM in Xinference:
 
    codestral-v0.1
 
+   cogagent
+
    cogvlm2
 
    cogvlm2-video-llama3-chat
@@ -676,6 +688,8 @@ The following is a list of built-in LLM in Xinference:
 
    llama-3.3-instruct
 
+   marco-o1
+
    minicpm-2b-dpo-bf16
 
    minicpm-2b-dpo-fp16
diff --git a/doc/source/models/builtin/llm/marco-o1.rst b/doc/source/models/builtin/llm/marco-o1.rst
new file mode 100644
index 0000000000..a85fb43127
--- /dev/null
+++ b/doc/source/models/builtin/llm/marco-o1.rst
@@ -0,0 +1,47 @@
+.. _models_llm_marco-o1:
+
+========================================
+marco-o1
+========================================
+
+- **Context Length:** 32768
+- **Model Name:** marco-o1
+- **Languages:** en, zh
+- **Abilities:** chat, tools
+- **Description:** Marco-o1: Towards Open Reasoning Models for Open-Ended Solutions
+
+Specifications
+^^^^^^^^^^^^^^
+
+
+Model Spec 1 (pytorch, 7 Billion)
+++++++++++++++++++++++++++++++++++++++++
+
+- **Model Format:** pytorch
+- **Model Size (in billions):** 7
+- **Quantizations:** 4-bit, 8-bit, none
+- **Engines**: vLLM, Transformers (vLLM only available for quantization none)
+- **Model ID:** AIDC-AI/Marco-o1
+- **Model Hubs**: `Hugging Face <https://huggingface.co/AIDC-AI/Marco-o1>`__, `ModelScope <https://modelscope.cn/models/AIDC-AI/Marco-o1>`__
+
+Execute the following command to launch the model; remember to replace ``${engine}`` with an engine
+listed above and ``${quantization}`` with your chosen quantization method::
+
+   xinference launch --model-engine ${engine} --model-name marco-o1 --size-in-billions 7 --model-format pytorch --quantization ${quantization}
+
+
+Model Spec 2 (ggufv2, 7 Billion)
+++++++++++++++++++++++++++++++++++++++++
+
+- **Model Format:** ggufv2
+- **Model Size (in billions):** 7
+- **Quantizations:** Q2_K, Q3_K_L, Q3_K_M, Q3_K_S, Q4_0, Q4_1, Q4_K_M, Q4_K_S, Q5_0, Q5_1, Q5_K_M, Q5_K_S, Q6_K, Q8_0
+- **Engines**: llama.cpp
+- **Model ID:** QuantFactory/Marco-o1-GGUF
+- **Model Hubs**: `Hugging Face <https://huggingface.co/QuantFactory/Marco-o1-GGUF>`__, `ModelScope <https://modelscope.cn/models/QuantFactory/Marco-o1-GGUF>`__
+
+Execute the following command to launch the model; remember to replace ``${engine}`` with an engine
+listed above and ``${quantization}`` with your chosen quantization method::
+
+   xinference launch --model-engine ${engine} --model-name marco-o1 --size-in-billions 7 --model-format ggufv2 --quantization ${quantization}
+
diff --git a/doc/source/user_guide/backends.rst b/doc/source/user_guide/backends.rst
index d322aa46e7..ba610e5d89 100644
--- a/doc/source/user_guide/backends.rst
+++ b/doc/source/user_guide/backends.rst
@@ -66,6 +66,7 @@ Currently, supported model includes:
 - ``qwen1.5-chat``, ``qwen1.5-moe-chat``
 - ``qwen2-instruct``, ``qwen2-moe-instruct``
 - ``QwQ-32B-Preview``
+- ``marco-o1``
 - ``gemma-it``, ``gemma-2-it``
 - ``orion-chat``, ``orion-chat-rag``
 - ``c4ai-command-r-v01``
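
For readers trying out the launch commands this patch adds, the placeholder substitution can be sketched in Python. This is only an illustration; the engine and quantization values below are example choices taken from the marco-o1 ggufv2 spec, not the only valid ones:

```python
# Illustrative sketch: fill the ${engine}/${quantization} placeholders in the
# launch-command template from the marco-o1 ggufv2 spec.
from string import Template

template = Template(
    "xinference launch --model-engine ${engine} --model-name marco-o1 "
    "--size-in-billions 7 --model-format ggufv2 --quantization ${quantization}"
)

# Example choices: the spec lists llama.cpp as the engine for this format,
# and Q2_K ... Q8_0 as the available quantizations.
cmd = template.substitute(engine="llama.cpp", quantization="Q4_K_M")
print(cmd)
```

The same pattern applies to the cogagent and pytorch-format commands; only the model name, size, format, and the allowed engine/quantization values change.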