parallel iteration detail log bug report #11117

Closed
5 tasks done
JungHyunMoon opened this issue Nov 26, 2024 · 5 comments · Fixed by #11158

@JungHyunMoon
Contributor

Self Checks

  • This is only for bug reports; if you would like to ask a question, please head to Discussions.
  • I have searched for existing issues, including closed ones.
  • I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
  • [FOR CHINESE USERS] Please be sure to submit issues in English, otherwise they will be closed. Thank you! :)
  • Please do not modify this template :) and fill in all the required fields.

Dify version

0.12

Cloud or Self Hosted

Self Hosted (Docker)

Steps to reproduce

  1. Receive multiple images from the user.
  2. Use the OCR API to extract text from each image.
  3. Use the extracted text to perform a knowledge-based search to retrieve related documents.
  4. Retrieve 5 related documents (Top-k = 5).
  5. Perform masking and relevance judgment (isRelevant) on each of the retrieved documents.
  6. The execution log should show 3 iterations for the 3 images and 5 iterations for the 5 retrieved documents.
  7. However, in the detailed tracing logs, the second iteration is displayed with 8 iterations: the 3 OCR API calls from the first iteration are added to its own 5, giving an incorrect total.

✔️ Expected Behavior

The OCR API should be called once for each input image.
After searching for related documents, 5 documents should be retrieved (Top-k = 5).
The log should show 3 iterations for the 3 images, and 5 iterations for the 5 retrieved documents.

❌ Actual Behavior

In the detailed tracing logs, the second iteration incorrectly adds the OCR API call count (3), resulting in a total of 8 iterations instead of 5 for document processing.
This discrepancy does not appear in the preview logs, but it is visible in the tracing logs.
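The arithmetic behind the reported numbers can be illustrated with a minimal sketch. It is not Dify's code: the node names and both counting functions are hypothetical, and the sketch only assumes that the tracing view carries one counter across iteration nodes while the preview counts each node separately.

```python
from collections import defaultdict

# Hypothetical stand-ins for the two iteration nodes in this report.
OCR_NODE = "ocr_iteration"        # 3 input images
DOC_NODE = "document_iteration"   # 5 retrieved documents (Top-k = 5)


def count_with_shared_counter(runs):
    """Assumed buggy variant: one counter is carried across every node."""
    total = 0
    displayed = {}
    for node_id, items in runs:
        total += len(items)       # never reset between iteration nodes
        displayed[node_id] = total
    return displayed


def count_per_node(runs):
    """Expected variant: each iteration node keeps its own counter."""
    displayed = defaultdict(int)
    for node_id, items in runs:
        displayed[node_id] += len(items)
    return dict(displayed)


runs = [
    (OCR_NODE, ["img1", "img2", "img3"]),
    (DOC_NODE, ["doc1", "doc2", "doc3", "doc4", "doc5"]),
]

print(count_with_shared_counter(runs))  # {'ocr_iteration': 3, 'document_iteration': 8}
print(count_per_node(runs))             # {'ocr_iteration': 3, 'document_iteration': 5}
```

The shared counter reproduces the 8 seen in the tracing logs (3 + 5), while the per-node counter gives the expected 5.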

dosubot bot commented Nov 26, 2024

The issue you're experiencing with the iteration count in the detailed tracing logs seems to be related to how iteration metadata is being handled and displayed. In Dify, iteration counts are managed through events and metadata, specifically using the IterationNode class, which generates events like IterationRunStartedEvent and IterationRunNextEvent to track iteration progress [1].

The problem might be occurring due to incorrect handling of iteration metadata, where the OCR API call count is being incorrectly added to the document processing iteration count. This could be a bug in how the iteration index is being updated or managed in parallel execution paths.

To address this, you might want to check the implementation of the IterationNode class and how it handles event metadata, particularly focusing on the _handle_event_metadata method and the VariablePool management. Ensuring that the iteration index is correctly reset or managed between different stages (OCR and document processing) could resolve the discrepancy in the iteration count.
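One way to test this hypothesis outside the UI is to group trace records by iteration node and compare the counts with what is expected. The sketch below is illustrative only: the record fields (`node_id`, `index`) and the helper names are assumptions and do not reflect Dify's actual event schema.

```python
from collections import Counter

# Hypothetical trace records; field names are assumptions, not Dify's schema.
trace_events = (
    [{"node_id": "ocr_iteration", "index": i} for i in range(3)]
    + [{"node_id": "document_iteration", "index": i} for i in range(5)]
)


def iterations_per_node(events):
    """Count how many iteration records belong to each node."""
    return Counter(event["node_id"] for event in events)


def find_mismatches(events, expected):
    """Return {node_id: (expected, actual)} for nodes whose counts differ."""
    actual = iterations_per_node(events)
    return {
        node_id: (want, actual.get(node_id, 0))
        for node_id, want in expected.items()
        if actual.get(node_id, 0) != want
    }


expected = {"ocr_iteration": 3, "document_iteration": 5}
print(find_mismatches(trace_events, expected))  # {}

# Simulate the reported symptom: OCR records attributed to the second node.
mislabeled = [{"node_id": "document_iteration", "index": i} for i in range(8)]
print(find_mismatches(mislabeled, expected))
# {'ocr_iteration': (3, 0), 'document_iteration': (5, 8)}
```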

To continue talking to Dosu, mention @dosu.

dosubot bot added the 🐞 bug (Something isn't working) label on Nov 26, 2024
@JungHyunMoon
Contributor Author

@dosu
Max parallelism is set to 2 for the earlier iteration and 3 for the later one.

An additional question: the YML file is too large and contains private information, so how should I get it to the maintainers?


dosubot bot commented Nov 26, 2024

The max parallelism settings you've mentioned (2 for the iteration before and 3 for the iteration after) could be affecting how iterations are processed and logged. When parallelism is enabled, iterations can run concurrently, which might lead to complexities in how iteration counts are tracked and logged, especially if there are issues with synchronization or metadata handling [1].
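As a rough illustration of why the parallelism setting by itself should not change the iteration count, the sketch below runs two independent iterations with different worker limits (2 and 3, matching the settings above). `IterationCounter` and `run_iteration` are hypothetical names, and the sketch assumes each iteration owns its own lock-protected counter; it does not describe how Dify schedules parallel iterations.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class IterationCounter:
    """Thread-safe counter owned by a single iteration (hypothetical)."""

    def __init__(self):
        self._lock = threading.Lock()
        self.count = 0

    def increment(self):
        with self._lock:
            self.count += 1


def run_iteration(items, max_parallelism, work):
    """Apply `work` to every item with at most `max_parallelism` workers."""
    counter = IterationCounter()  # fresh counter, not shared across iterations

    def task(item):
        result = work(item)
        counter.increment()
        return result

    with ThreadPoolExecutor(max_workers=max_parallelism) as pool:
        results = list(pool.map(task, items))  # map preserves input order
    return results, counter.count


ocr_results, ocr_count = run_iteration(["img1", "img2", "img3"], 2, str.upper)
doc_results, doc_count = run_iteration(
    [f"doc{i}" for i in range(1, 6)], 3, str.upper
)
print(ocr_count, doc_count)  # 3 5
```

With a counter per iteration, the counts equal the number of input items regardless of how many workers run concurrently.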

Regarding sharing YML files that are large and contain private information, here are some best practices:

  1. Anonymize Data: Remove or obfuscate any sensitive information before sharing the files. This can include replacing private data with placeholders or generic data.

  2. Use Secure Channels: Share the files through secure, encrypted channels to ensure that they are not intercepted or accessed by unauthorized parties.

  3. Limit Access: Only share the files with individuals who absolutely need access, and ensure that they understand the sensitivity of the information.

  4. File Encryption: Encrypt the YML files before sharing them. This adds an additional layer of security, ensuring that only those with the decryption key can access the contents.

  5. Version Control: If possible, use a version control system with access controls to manage who can view or edit the files.

These steps can help you securely share YML files while protecting private information.
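The anonymization step could look something like the sketch below. It works on an already parsed structure; loading the YML file into a dict is assumed to happen elsewhere (for example with PyYAML's `yaml.safe_load`). The key list and the sample workflow are made up for illustration and would need adjusting to the real file.

```python
# Illustrative list of keys to mask; adjust it to the actual export.
SENSITIVE_KEYS = {"api_key", "password", "token", "secret"}


def redact(value, sensitive_keys=SENSITIVE_KEYS, placeholder="<REDACTED>"):
    """Return a copy of `value` with sensitive entries replaced."""
    if isinstance(value, dict):
        return {
            key: placeholder
            if str(key).lower() in sensitive_keys
            else redact(item, sensitive_keys, placeholder)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [redact(item, sensitive_keys, placeholder) for item in value]
    return value  # scalars are returned unchanged


# Made-up sample standing in for a parsed workflow export.
workflow = {
    "name": "ocr-pipeline",
    "nodes": [
        {"type": "http-request", "api_key": "sk-123", "url": "https://example.com"},
        {"type": "iteration", "parallel_nums": 3},
    ],
}

print(redact(workflow))
```

The original structure is left untouched, so the redacted copy can be dumped back to YAML and attached to the issue.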


@laipz8200
Member

Did any errors occur during the execution process, or are the errors only shown in the logs?

@JungHyunMoon
Contributor Author

JungHyunMoon commented Nov 26, 2024

@laipz8200
I suspect it only happens in the logs.

I have attached a screenshot in English for clarity.
