diff --git a/assets/wechat.jpg b/assets/wechat.jpg
index af45cab08..6f985e8e8 100644
Binary files a/assets/wechat.jpg and b/assets/wechat.jpg differ
diff --git a/docs/docs/application/started_tutorial/chat_knowledge.md b/docs/docs/application/started_tutorial/chat_knowledge.md
index 5fd72eda3..4d4c57652 100644
--- a/docs/docs/application/started_tutorial/chat_knowledge.md
+++ b/docs/docs/application/started_tutorial/chat_knowledge.md
@@ -56,6 +56,23 @@ and click Process, it will take a few minutes to complete the document segmentat

+:::tip
+**Automatic: The document is automatically segmented according to the document type.**
+
+**Chunk size: The number of words in each segment of the document. The default is 512 words.**
+ - chunk size: The number of words in each segment of the document. The default is 512 words.
+ - chunk overlap: The number of words overlapped between each segment of the document. The default is 50 words.
+** Separator:segmentation by separator **
+ - separator: The separator of the document. The default is `\n`.
+ - enable_merge: Whether to merge the separator chunks according to chunk_size after splits. The default is `False`.
+** Page: page segmentation, only support .pdf and .pptx document.**
+
+** Paragraph: paragraph segmentation, only support .docx document.**
+ - separator: The paragraph separator of the document. The default is `\n`.
+
+** Markdown header: markdown header segmentation, only support .md document.**
+:::
+
 ### Waiting for document vectorization
diff --git a/setup.py b/setup.py
index fafd2da51..b4b931749 100644
--- a/setup.py
+++ b/setup.py
@@ -638,7 +638,7 @@ def init_install_requires():
 setuptools.setup(
     name="db-gpt",
     packages=find_packages(exclude=("tests", "*.tests", "*.tests.*", "examples")),
-    version="0.4.4",
+    version="0.4.5",
     author="csunny",
     author_email="cfqcsunny@gmail.com",
     description="DB-GPT is an experimental open-source project that uses localized GPT large models to interact with your data and environment."
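
Note on the documented parameters (not part of the patch): the tip added above describes `chunk size`, `chunk overlap`, `separator`, and `enable_merge` only in prose. The sketch below illustrates how such parameters might interact, following the wording of the tip (sizes counted in words). The function names `chunk_words` and `split_by_separator` are hypothetical and are not DB-GPT's API; the project's actual splitters may count characters or tokens and may merge differently.

```python
def chunk_words(text, chunk_size=512, chunk_overlap=50):
    """Split text into segments of at most chunk_size words.

    Consecutive segments share chunk_overlap words.
    Hypothetical helper for illustration; not DB-GPT's implementation.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    words = text.split()
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the window already reached the end of the text
    return chunks


def split_by_separator(text, separator="\n", enable_merge=False, chunk_size=512):
    """Split on separator; optionally merge neighbours up to chunk_size words.

    Hypothetical helper for illustration; not DB-GPT's implementation.
    """
    pieces = [p for p in text.split(separator) if p.strip()]
    if not enable_merge:
        return pieces

    merged, current, count = [], [], 0
    for piece in pieces:
        n = len(piece.split())
        # Start a new chunk when adding this piece would exceed chunk_size.
        if current and count + n > chunk_size:
            merged.append(separator.join(current))
            current, count = [], 0
        current.append(piece)
        count += n
    if current:
        merged.append(separator.join(current))
    return merged
```

With the documented defaults (512 and 50), each window in this sketch advances by 462 words, so the last 50 words of one segment reappear at the start of the next. With `enable_merge=False`, every non-empty separator-delimited piece stays its own segment, regardless of length.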