Self Checks
I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
[FOR CHINESE USERS] Please be sure to submit issues in English; otherwise they will be closed. Thank you! :)
Please do not modify this template :) and fill in all the required fields.
Dify version
0.11.0
Cloud or Self Hosted
Self Hosted (Docker), Self Hosted (Source)
Steps to reproduce
Both the documents and document_segments tables in the database have a word_count field, but both have issues:
In segments: in qa mode, the content field holds the question and the answer field holds the answer. However, only content is counted; the length of answer is not included in word_count, as shown in the following figure:
In documents: the value is correct only when the document is first created. When its segments are created, batch-created, updated, or deleted, the document's word_count is not updated accordingly, as shown in the following figure:
In datasets: the word count is the sum over all documents, so it has the same issue:
Existing data can be corrected with the following SQL (PostgreSQL syntax; for reference only).
First, fix the segments (count both question and answer; a NULL answer counts as 0):
UPDATE document_segments
SET word_count = CHAR_LENGTH(content) + COALESCE(CHAR_LENGTH(answer), 0);
Then fix the documents (each document's word_count becomes the sum over its segments):
UPDATE documents
SET word_count = t.word_count
FROM (
    SELECT document_id, SUM(word_count) AS word_count
    FROM document_segments
    GROUP BY document_id
) AS t
WHERE documents.id = t.document_id;
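Before running the UPDATE statements, it can help to dry-run the same arithmetic. The sketch below is a hypothetical Python mirror of the SQL above. It assumes (document_id, content, answer) rows have already been fetched from document_segments, and it produces the per-document totals that the second statement would write. The function name and row shape are illustrative, not part of Dify. Python's len() on a str counts code points, which matches CHAR_LENGTH on a UTF-8 database for ordinary text.

```python
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical row shape: (document_id, content, answer) from document_segments.
SegmentRow = Tuple[str, str, Optional[str]]


def recount_documents(rows: Iterable[SegmentRow]) -> Dict[str, int]:
    """Per-document word_count as the corrected SQL would compute it."""
    totals: Dict[str, int] = {}
    for document_id, content, answer in rows:
        # Same rule as the first UPDATE: content plus answer (None counts as 0).
        segment_count = len(content) + len(answer or "")
        # Same rule as the second UPDATE: SUM(word_count) GROUP BY document_id.
        totals[document_id] = totals.get(document_id, 0) + segment_count
    return totals


rows = [
    ("doc-1", "What is Dify?", "An LLM app platform."),  # qa-mode segment
    ("doc-1", "plain text chunk", None),                 # non-qa segment
    ("doc-2", "hello", None),
]
print(recount_documents(rows))  # {'doc-1': 49, 'doc-2': 5}
```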
✔️ Expected Behavior
The word count is correct.
❌ Actual Behavior
The word count is incorrect.
The issue you're experiencing with the word count in documents and document_segments seems to be related to how the word count is calculated and updated. Here are some insights based on the information available:
Segment Word Count: The word count for document_segments is initially set using the length of the content field. However, in qa mode, it appears that only the content is being counted, and the answer field is not included in the word count. This discrepancy needs to be addressed in the code to ensure both content and answer are considered when calculating the word count for segments.
Document Word Count: The word count for a document is supposed to be dynamically updated whenever segments are created, updated, or deleted. The Document class has a word_count property that calculates the total word count by summing up the word counts of all segments associated with the document. If this is not happening, it might be due to missing triggers or updates in the code that handle these changes [1][2].
Dataset Word Count: The word count for the entire dataset is aggregated from individual documents. If the document word counts are incorrect, this will naturally lead to incorrect dataset word counts as well [2].
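Because the dataset figure is only a sum, a single stale document is enough to make it wrong. The following is a minimal sketch of comparing stored document counts against values recomputed from their segments. The numbers are made up and the function names are hypothetical.

```python
from typing import Dict


def dataset_word_count(document_counts: Dict[str, int]) -> int:
    # The dataset total is the plain sum of its documents' word_count values.
    return sum(document_counts.values())


def find_drift(stored: Dict[str, int], recomputed: Dict[str, int]) -> Dict[str, int]:
    # Positive difference: stored value too low (e.g. uncounted qa answers or
    # added segments). Negative: too high (e.g. deleted segments not subtracted).
    return {
        doc_id: recomputed.get(doc_id, 0) - count
        for doc_id, count in stored.items()
        if recomputed.get(doc_id, 0) != count
    }


stored = {"doc-a": 120, "doc-b": 80}      # made-up stored values
recomputed = {"doc-a": 120, "doc-b": 95}  # made-up values recounted from segments
print(dataset_word_count(stored), dataset_word_count(recomputed))  # 200 215
print(find_drift(stored, recomputed))  # {'doc-b': 15}
```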
To resolve these issues, you may need to review and modify the code responsible for updating word counts in both document_segments and documents to ensure they reflect changes accurately. This might involve ensuring that both content and answer fields are included in the word count calculation for segments and verifying that document word counts are recalculated whenever segment changes occur.
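One way to do what this paragraph describes has two parts. First, count both fields for each segment. Second, apply the change in each segment's count to its document on every create, batch create, update, and delete. The in-memory Segment and Document classes below are hypothetical illustrations, not Dify's actual models. In a real backend, the delta would be written in the same database transaction as the segment change.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Segment:
    id: str
    content: str                  # the question in qa mode
    answer: Optional[str] = None  # only set in qa mode

    @property
    def word_count(self) -> int:
        # Count question and answer; a missing answer contributes 0.
        return len(self.content) + len(self.answer or "")


@dataclass
class Document:
    word_count: int = 0
    segments: Dict[str, Segment] = field(default_factory=dict)

    def create_segments(self, new_segments: List[Segment]) -> None:
        # Handles both single create and batch create.
        for seg in new_segments:
            self.segments[seg.id] = seg
            self.word_count += seg.word_count

    def update_segment(self, seg_id: str, content: str, answer: Optional[str] = None) -> None:
        # Passing answer=None leaves the segment without an answer (non-qa).
        seg = self.segments[seg_id]
        before = seg.word_count
        seg.content, seg.answer = content, answer
        # Apply only the difference, so other segments need not be re-read.
        self.word_count += seg.word_count - before

    def delete_segment(self, seg_id: str) -> None:
        seg = self.segments.pop(seg_id)
        self.word_count -= seg.word_count


doc = Document()
doc.create_segments([Segment("s1", "abc"), Segment("s2", "de", "fgh")])
print(doc.word_count)  # 8
doc.update_segment("s1", "abcdef")
print(doc.word_count)  # 11
doc.delete_segment("s2")
print(doc.word_count)  # 6
```

Deltas are cheap, but the stored value drifts if any code path forgets to apply one, which is the failure this report describes. Recomputing the document total as a sum over its segments on each change is slower but self-correcting.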
@nadirvishun Currently, we only count the query length; this will be improved in the next version.
@JzoNgKVO If the word count does not reach 1K, the front end displays 0K.
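On the display side, one way a count below 1,000 ends up as 0K is flooring it to whole thousands. Dify's actual front-end formatting code is not shown here, and the real front end is not written in Python. The sketch below is a hypothetical illustration contrasting flooring with showing small counts exactly.

```python
def floor_thousands(n: int) -> str:
    # Flooring to whole thousands turns any count under 1,000 into "0K".
    return f"{n // 1000}K"


def format_word_count(n: int) -> str:
    # Hypothetical alternative: exact counts below 1,000, one decimal above.
    if n < 1000:
        return str(n)
    return f"{n / 1000:.1f}K"


print(floor_thousands(850), format_word_count(850))    # 0K 850
print(floor_thousands(1234), format_word_count(1234))  # 1K 1.2K
```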