DOC: Update multimodal doc #2785

Merged (8 commits, Jan 26, 2025)
6 changes: 3 additions & 3 deletions doc/source/index.rst
@@ -188,11 +188,11 @@ Explore the API

Learn how to generate images with Xinference.

.. grid-item-card:: Vision
:link: vision
.. grid-item-card:: Multimodal
:link: multimodal
:link-type: ref

Learn how to process image with LLMs.
Learn how to process images and audio with LLMs.


.. grid:: 2
@@ -7,7 +7,7 @@ msgid ""
msgstr ""
"Project-Id-Version: Xinference \n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2025-01-21 14:27+0800\n"
"POT-Creation-Date: 2025-01-26 11:51+0800\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language: zh_CN\n"
@@ -144,8 +144,8 @@ msgstr ""
#: ../../source/getting_started/installation.rst:47
msgid ""
"``deepseek``, ``deepseek-coder``, ``deepseek-chat``, ``deepseek-coder-"
"instruct``, ``deepseek-v2-chat``, ``deepseek-v2-chat-0628``, "
"``deepseek-v2.5``"
"instruct``, ``deepseek-r1-distill-qwen``, ``deepseek-v2-chat``, "
"``deepseek-v2-chat-0628``, ``deepseek-v2.5``"
msgstr ""

#: ../../source/getting_started/installation.rst:48
@@ -296,25 +296,3 @@ msgstr "其他平台"
msgid ":ref:`Ascend NPU <installation_npu>`"
msgstr ""

#~ msgid ""
#~ "``llama-2``, ``llama-3``, ``llama-3.1``, "
#~ "``llama-2-chat``, ``llama-3-instruct``, "
#~ "``llama-3.1-instruct``"
#~ msgstr ""

#~ msgid "``baichuan``, ``baichuan-chat``, ``baichuan-2-chat``"
#~ msgstr ""

#~ msgid ""
#~ "``internlm-16k``, ``internlm-chat-7b``, "
#~ "``internlm-chat-8k``, ``internlm-chat-20b``"
#~ msgstr ""

#~ msgid ""
#~ "``deepseek``, ``deepseek-coder``, ``deepseek-"
#~ "chat``, ``deepseek-coder-instruct``"
#~ msgstr ""

#~ msgid "``vicuna-v1.3``, ``vicuna-v1.5``"
#~ msgstr ""

14 changes: 9 additions & 5 deletions doc/source/locale/zh_CN/LC_MESSAGES/index.po
@@ -8,7 +8,7 @@ msgid ""
msgstr ""
"Project-Id-Version: Xinference \n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2025-01-22 14:24+0800\n"
"POT-Creation-Date: 2025-01-26 11:51+0800\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language: zh_CN\n"
@@ -120,12 +120,12 @@ msgid "Learn how to generate images with Xinference."
msgstr "学习如何使用Xinference生成图像。"

#: ../../source/index.rst:191
msgid "Vision"
msgstr "视觉"
msgid "Multimodal"
msgstr "多模态"

#: ../../source/index.rst:195
msgid "Learn how to process image with LLMs."
msgstr "学习如何使用 LLM 处理图像。"
msgid "Learn how to process images and audio with LLMs."
msgstr "学习如何使用 LLM 处理图像和音频。"

#: ../../source/index.rst:200
msgid "Audio"
@@ -182,3 +182,7 @@ msgstr "贡献"
#: ../../source/index.rst:276
msgid ":fab:`github` Create a pull request"
msgstr ":fab:`github` 在 Github 上提 PR"

#~ msgid "Vision"
#~ msgstr "视觉"

10 changes: 5 additions & 5 deletions doc/source/locale/zh_CN/LC_MESSAGES/models/index.po
@@ -8,7 +8,7 @@ msgid ""
msgstr ""
"Project-Id-Version: Xinference \n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2024-08-13 17:44+0800\n"
"POT-Creation-Date: 2025-01-26 11:51+0800\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language: zh_CN\n"
@@ -184,12 +184,12 @@ msgid "Learn how to generate images with Xinference."
msgstr "学习如何使用Xinference生成图像。"

#: ../../source/models/index.rst:202
msgid "Vision"
msgstr "视觉"
msgid "Multimodal"
msgstr "多模态"

#: ../../source/models/index.rst:206
msgid "Learn how to process image with LLMs."
msgstr "学习如何使用 LLM 处理图像。"
msgid "Learn how to process images and audio with LLMs."
msgstr "学习如何使用 LLM 处理图像和音频。"

#: ../../source/models/index.rst:211
msgid "Audio"
@@ -8,7 +8,7 @@ msgid ""
msgstr ""
"Project-Id-Version: Xinference \n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2024-12-12 06:41+0000\n"
"POT-Creation-Date: 2025-01-22 14:14+0800\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language: zh_CN\n"
@@ -17,7 +17,7 @@ msgstr ""
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=utf-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Generated-By: Babel 2.16.0\n"
"Generated-By: Babel 2.14.0\n"

#: ../../source/models/model_abilities/audio.rst:5
msgid "Audio (Experimental)"
@@ -152,7 +152,7 @@ msgid "CosyVoice"
msgstr ""

#: ../../source/models/model_abilities/audio.rst:66
msgid "FishSpeech-1.4"
msgid "FishSpeech-1.5"
msgstr ""

#: ../../source/models/model_abilities/audio.rst:67
@@ -288,15 +288,15 @@ msgstr ""

#: ../../source/models/model_abilities/audio.rst:391
msgid ""
"Clone voice, launch model ``FishSpeech-1.4``. Please use `prompt_speech` "
"Clone voice, launch model ``FishSpeech-1.5``. Please use `prompt_speech` "
"instead of `reference_audio` and `prompt_text` instead of "
"`reference_text` to clone voice from the reference audio for the "
"FishSpeech model. This arguments is aligned to voice cloning of "
"CosyVoice."
msgstr ""
"克隆语音,启动模型 ``FishSpeech-1.4``。请使用 `prompt_speech`而不是 `"
"reference_audio` 以及 `prompt_text` 而不是 `reference_text` "
"来为 FishSpeech 模型提供参考音频。这个参数和 CosyVoice 的语音克隆保持一致。"
"克隆语音,启动模型 ``FishSpeech-1.5``。请使用 `prompt_speech`而不是 `"
"reference_audio` 以及 `prompt_text` 而不是 `reference_text` 来为 "
"FishSpeech 模型提供参考音频。这个参数和 CosyVoice 的语音克隆保持一致。"

#: ../../source/models/model_abilities/audio.rst:417
msgid "SenseVoiceSmall Offline usage"
@@ -334,4 +334,3 @@ msgstr ""
"然后当用 Web UI 加载 SenseVoiceSmall 时,添加额外选项,key 是 ``vad_model"
"``,值是之前的下载路径 ``/path/to/fsmn-vad``。用命令行加载时,增加选项 ``"
"--vad_model /path/to/fsmn-vad``。"

61 changes: 25 additions & 36 deletions doc/source/locale/zh_CN/LC_MESSAGES/models/model_abilities/image.po
@@ -8,7 +8,7 @@ msgid ""
msgstr ""
"Project-Id-Version: Xinference \n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2024-12-26 18:49+0800\n"
"POT-Creation-Date: 2025-01-21 14:27+0800\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language: zh_CN\n"
@@ -116,6 +116,7 @@ msgstr ""

#: ../../source/models/model_abilities/image.rst:46
#: ../../source/models/model_abilities/image.rst:155
#: ../../source/models/model_abilities/image.rst:184
msgid "sd3.5-large-turbo"
msgstr ""

@@ -311,7 +312,18 @@ msgstr "支持 GGUF 量化格式"
msgid "F16, Q2_K, Q3_K_S, Q4_0, Q4_1, Q4_K_S, Q5_0, Q5_1, Q5_K_S, Q6_K, Q8_0"
msgstr ""

#: ../../source/models/model_abilities/image.rst:187
#: ../../source/models/model_abilities/image.rst:180
msgid ""
"F16, Q3_K_M, Q3_K_S, Q4_0, Q4_1, Q4_K_M, Q4_K_S, Q5_0, Q5_1, Q5_K_M, "
"Q5_K_S, Q6_K, Q8_0"
msgstr ""

#: ../../source/models/model_abilities/image.rst:182
#: ../../source/models/model_abilities/image.rst:184
msgid "F16, Q4_0, Q4_1, Q5_0, Q5_1, Q8_0"
msgstr ""

#: ../../source/models/model_abilities/image.rst:189
msgid ""
"We stronly recommend to enable additional option ``cpu_offload`` with "
"value ``True`` for WebUI, or specify ``--cpu_offload True`` for command "
@@ -320,18 +332,17 @@ msgstr ""
"我们强烈推荐在 WebUI 上开启额外选项 ``cpu_offload`` 并指定为 ``True``,或"
"对命令行,指定 ``--cpu_offload True``。"

#: ../../source/models/model_abilities/image.rst:190
#: ../../source/models/model_abilities/image.rst:192
msgid "Example:"
msgstr "例如:"

#: ../../source/models/model_abilities/image.rst:196
#: ../../source/models/model_abilities/image.rst:198
msgid ""
"With ``Q2_K`` quantization, you only need around 5 GiB GPU memory to run "
"Flux.1-dev."
msgstr ""
"使用 ``Q2_K`` 量化,你只需要大约 5GB 的显存来运行 Flux.1-dev。"
msgstr "使用 ``Q2_K`` 量化,你只需要大约 5GB 的显存来运行 Flux.1-dev。"

#: ../../source/models/model_abilities/image.rst:198
#: ../../source/models/model_abilities/image.rst:200
msgid ""
"For those models gguf options are not supported internally, or you want "
"to download gguf files on you own, you can specify additional option "
@@ -342,53 +353,31 @@ msgstr ""
"Web UI 指定额外选项 ``gguf_model_path`` 或者用命令行指定 ``--gguf_model_"
"path /path/to/model_quant.gguf`` 。"

#: ../../source/models/model_abilities/image.rst:204
#: ../../source/models/model_abilities/image.rst:206
msgid "Image-to-image"
msgstr "图生图"

#: ../../source/models/model_abilities/image.rst:206
#: ../../source/models/model_abilities/image.rst:208
msgid "You can find more examples of Images API in the tutorial notebook:"
msgstr "你可以在教程笔记本中找到更多 Images API 的示例。"

#: ../../source/models/model_abilities/image.rst:210
#: ../../source/models/model_abilities/image.rst:212
msgid "Stable Diffusion ControlNet"
msgstr ""

#: ../../source/models/model_abilities/image.rst:213
#: ../../source/models/model_abilities/image.rst:215
msgid "Learn from a Stable Diffusion ControlNet example"
msgstr "学习一个 Stable Diffusion 控制网络的示例"

#: ../../source/models/model_abilities/image.rst:216
#: ../../source/models/model_abilities/image.rst:218
msgid "OCR"
msgstr ""

#: ../../source/models/model_abilities/image.rst:218
#: ../../source/models/model_abilities/image.rst:220
msgid "The OCR API accepts image bytes and returns the OCR text."
msgstr "OCR API 接受图像字节并返回 OCR 文本。"

#: ../../source/models/model_abilities/image.rst:220
#: ../../source/models/model_abilities/image.rst:222
msgid "We can try OCR API out either via cURL, or Xinference's python client:"
msgstr "可以通过 cURL 或 Xinference 的 Python 客户端来尝试 OCR API。"

#~ msgid ""
#~ "If you are trying to run large "
#~ "image models liek sd3-medium or FLUX.1"
#~ " series on GPU card that has "
#~ "less memory than 24GB, you may "
#~ "encounter OOM when launching or "
#~ "inference. Try below solutions."
#~ msgstr ""
#~ "如果你试图在显存小于24GB的GPU上运行像"
#~ "sd3-medium或FLUX.1系列这样的大型图像模型"
#~ ",你在启动或推理过程中可能会遇到显存"
#~ "溢出(OOM)的问题。尝试以下解决方案。"

#~ msgid "For FLUX.1 series, try to apply quantization."
#~ msgstr "对于 FLUX.1 系列,尝试应用量化。"

#~ msgid "For sd3-medium, apply quantization to ``text_encoder_3``."
#~ msgstr "对于 sd3-medium 模型,对 ``text_encoder_3`` 应用量化。"

#~ msgid "Or removing memory-intensive T5-XXL text encoder for sd3-medium."
#~ msgstr "或者,移除 sd3-medium 模型中内存密集型的 T5-XXL 文本编码器。"
