Knowledge base retrieval scores are wildly off #5023

Open
xiaoxin-creactor opened this issue Oct 21, 2024 · 6 comments
Labels
bug Something isn't working

Comments

@xiaoxin-creactor

Problem Description
UserWarning: Relevance scores must be between 0 and 1, got [(Document(page_content='1.飞行器一般有哪些问题?\n一、电子设备故障\n电子设备是飞行器不可或缺的组成部分,它们能够通过自动控制完成飞机的稳定和导航等功能。常见的电子设备故障包括:\n1.通讯故障:飞行器的通讯设备可能会出现故障,如天线故障、电缆连接问题等,造成通讯中断或不稳定。\n2.控制系统故障:如自动驾驶系统、航空电子设备、飞行控制计算机等故障,有可能导致飞机失控或无法正常操作。', metadata={'source': '(非密)飞行器通用故障.txt'}), -8.43927631966375), (Document(page_content='1.通讯故障:飞行器的通讯设备可能会出现故障,如天线故障、电缆连接问题等,造成通讯中断或不稳定。\n2.控制系统故障:如自动驾驶系统、航空电子设备、飞行控制计算机等故障,有可能导致飞机失控或无法正常操作。\n3.供电故障:电力系统故障可能导致电源中断或不稳定,造成飞机系统失灵。\n二、机械结构故障', metadata={'source': '(非密)飞行器通用故障.txt'}), -9.483452246092112), (Document(page_content='2.控制系统故障:如自动驾驶系统、航空电子设备、飞行控制计算机等故障,有可能导致飞机失控或无法正常操作。\n3.供电故障:电力系统故障可能导致电源中断或不稳定,造成飞机系统失灵。\n二、机械结构故障\n机械结构是飞机的骨架和支撑结构,如果出现故障,可能会导致飞机失去稳定性或者无法维持正常飞行。常见的机械结构故障包括:', metadata={'source': '(非密)飞行器通用故障.txt'}), -9.84954204383923)]

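The scores of the retrieved content are abnormal: every one is negative, far outside the expected 0 to 1 range.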
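
The negative values in the warning are consistent with how a Euclidean distance is commonly converted into a relevance score. Below is a minimal sketch, assuming the conversion is `1 - distance / sqrt(2)`; this is my understanding of what LangChain's base vector store does for the Euclidean strategy, so treat the formula as an assumption and check it against your installed version. The helper names are hypothetical. Under that assumption the score stays in [0, 1] only while the raw distance is at most `sqrt(2)`, which will not hold if the stored embeddings are not unit-normalized.

```python
import math


def euclidean_relevance(distance: float) -> float:
    """Assumed conversion from L2 distance to relevance: 1 - d / sqrt(2)."""
    return 1.0 - distance / math.sqrt(2)


def implied_distance(score: float) -> float:
    """Invert the assumed conversion to estimate the raw L2 distance."""
    return (1.0 - score) * math.sqrt(2)


# Scores copied from the warning in this issue.
reported_scores = [-8.43927631966375, -9.483452246092112, -9.84954204383923]

for score in reported_scores:
    distance = implied_distance(score)
    print(f"score={score:.3f} -> implied L2 distance={distance:.3f}")
```

If the assumption holds, the implied distances are all far larger than `sqrt(2)`, which points at the distance strategy (or unnormalized embeddings) rather than at the documents themselves.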

Steps to Reproduce

  1. Run '...'
  2. Click '...'
  3. Scroll to '...'
  4. Problem occurs

Expected Result
Matching content should be retrieved.

Actual Result
The scores are incorrect, so nothing can be retrieved.

Environment Information

  • Langchain-Chatchat version / commit number: 0.3.1
  • Deployment method (PyPI install / source deployment / Docker deployment): dev deployment
  • Model inference framework (Xinference / Ollama / OpenAI API, etc.): Ollama
  • LLM (GLM-4-9B / Qwen2-7B-Instruct, etc.): Qwen2-7B-Instruct
  • Embedding model (bge-large-zh-v1.5 / m3e-base, etc.): bge-large-zh-v1.5
  • Vector store type (faiss / milvus / pg_vector, etc.): pg_vector
  • Operating system and version: Windows
  • Python version: 3.9
  • Inference hardware (GPU / CPU / MPS / NPU, etc.): CPU
  • Other relevant environment information:
xiaoxin-creactor added the bug label Oct 21, 2024
@xiaoxin-creactor
Author

I am retrieving through the pg vector store. Could this be caused by the langchain version being too old? Can the langchain version be upgraded?

@xiaoxin-creactor
Author

Solved: I changed the pg retrieval strategy to cosine similarity.
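
For readers wondering why that switch helps, here is a small self-contained sketch. It assumes the two conversions are `1 - d / sqrt(2)` for Euclidean distance and `1 - d` for cosine distance (my understanding of LangChain's defaults, not verified here), and uses made-up toy vectors and hypothetical helper names. The point it illustrates: Euclidean distance grows with vector magnitude, while cosine distance depends only on direction, so unnormalized embeddings push the Euclidean score negative but leave the cosine score unchanged.

```python
import math


def l2_distance(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    """Cosine distance: 1 - cos(angle between a and b)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def euclidean_score(a, b):
    """Assumed relevance conversion for the Euclidean strategy."""
    return 1.0 - l2_distance(a, b) / math.sqrt(2)


def cosine_score(a, b):
    """Assumed relevance conversion for the cosine strategy."""
    return 1.0 - cosine_distance(a, b)


# Toy vectors (illustrative only), plus the same vectors scaled by 10
# to mimic embeddings that are not unit-normalized.
query, doc = [3.0, 4.0, 0.0], [4.0, 3.0, 0.0]
query_big = [10 * x for x in query]
doc_big = [10 * x for x in doc]

for name, q, d in [("original", query, doc), ("scaled x10", query_big, doc_big)]:
    print(f"{name}: euclidean={euclidean_score(q, d):.3f} cosine={cosine_score(q, d):.3f}")
```

Scaling both vectors leaves the cosine score where it was, while the Euclidean score drops well below zero, which matches the pattern in the warning above.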

@Shujie-Wu

Could you tell me which file this retrieval strategy is set in?

@fjksng

fjksng commented Nov 16, 2024

Same question: where do I change the retrieval strategy?

@onewesong

onewesong commented Nov 28, 2024

diff --git a/libs/chatchat-server/chatchat/server/knowledge_base/kb_service/pg_kb_service.py b/libs/chatchat-server/chatchat/server/knowledge_base/kb_service/pg_kb_service.py
index 2f40f13e..bdff11b5 100644
--- a/libs/chatchat-server/chatchat/server/knowledge_base/kb_service/pg_kb_service.py
+++ b/libs/chatchat-server/chatchat/server/knowledge_base/kb_service/pg_kb_service.py
@@ -29,7 +29,7 @@ class PGKBService(KBService):
         self.pg_vector = PGVector(
             embedding_function=get_Embeddings(self.embed_model),
             collection_name=self.kb_name,
-            distance_strategy=DistanceStrategy.EUCLIDEAN,
+            distance_strategy=DistanceStrategy.COSINE,
             connection=PGKBService.engine,
             connection_string=Settings.kb_settings.kbs_config.get("pg").get("connection_uri"),
         )

@Shujie-Wu

Thank you! I will study this.
