Knowledge - Logs indicate ingestion of files when no changes were made to them. #525

Open
sangee2004 opened this issue Sep 18, 2024 · 1 comment
Labels
bug Something isn't working

Comments


sangee2004 commented Sep 18, 2024

Desktop build - 576ef7a6fd

The following are some workflows where the logs indicate ingestion of files even though no changes were made to them.
Steps to reproduce the problem:

  1. Create an assistant with 1 knowledge file from Notion.
  2. While still on the edit assistant page, add another knowledge file from Notion.
    Logs will show ingestion for 2 files (the existing one and the new one).
2024-09-18T23:42:10.075Z [server] [INFO] logs for ingesting dataset 979:  Ingested 2 files from "./data" into dataset "979"
2024/09/18 16:42:09 INFO Pruned files count=0 basePath=./data
2024/09/18 16:42:09 INFO Ingested document filename="Thirsty Crow.md" count=3 absolute_path="/Users/sangeethahariharan/Library/Application Support/acorn/Acorn/workspace/knowledge/script_data/979/data/notion/Thirsty Crow.md"
2024/09/18 16:42:10 INFO Ingested document filename="Monkey and Crocodile.md" count=6 absolute_path="/Users/sangeethahariharan/Library/Application Support/acorn/Acorn/workspace/knowledge/script_data/979/data/notion/Monkey and Crocodile.md"
  3. Quit the assistant edit mode and enter the edit mode of an assistant with 2 knowledge files.
    Logs will show ingestion for 2 files. The 2 files being ingested are not listed here, which may mean we are not actually ingesting the files in this case?
2024-09-18T23:45:52.419Z [server] [INFO] logs for ingesting dataset 979:  Ingested 2 files from "./data" into dataset "979"
2024/09/18 16:45:52 INFO Pruned files count=0 basePath=./data
  4. In cases where the knowledge files are from Notion, when I do "Sync files", I see ingestion of all existing knowledge files in the assistant even when no changes were made to these files.
2024-09-18T23:48:13.321Z [server] [INFO] logs for ingesting dataset 979:  Ingested 2 files from "./data" into dataset "979"
2024/09/18 16:48:12 INFO Pruned files count=0 basePath=./data
2024/09/18 16:48:13 INFO Ingested document filename="Thirsty Crow.md" count=3 absolute_path="/Users/sangeethahariharan/Library/Application Support/acorn/Acorn/workspace/knowledge/script_data/979/data/notion/Thirsty Crow.md"
2024/09/18 16:48:13 INFO Ingested document filename="Monkey and Crocodile.md" count=6 absolute_path="/Users/sangeethahariharan/Library/Application Support/acorn/Acorn/workspace/knowledge/script_data/979/data/notion/Monkey and Crocodile.md"

Note

  1. When testing with local knowledge files, the "Ingested document filename" logs seen in step 2 are not seen.
  2. When testing with knowledge files from OneDrive, the "Ingested document filename" logs seen in step 2 and step 4 are not seen.

This issue seems to be specific to Notion, if we can assume that no ingestion of files is happening in step 3.
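
To make the comparison between the steps easier to repeat, here is a rough sketch of a script that reads one run's log output and returns the count from the summary line together with the filenames from the "Ingested document" lines. It assumes the log layout shown in the excerpts above; `summarize_run` and `parse_fields` are names made up for this sketch and are not part of knowledge or the desktop app.

```python
import re

# Summary line, e.g.: Ingested 2 files from "./data" into dataset "979"
SUMMARY_RE = re.compile(r'Ingested (\d+) files from')
# key=value or key="quoted value" pairs on the per-file lines
FIELD_RE = re.compile(r'(\w+)=(?:"([^"]*)"|(\S+))')


def parse_fields(line):
    """Return the key=value pairs of one log line as a dict."""
    fields = {}
    for match in FIELD_RE.finditer(line):
        key, quoted, bare = match.groups()
        fields[key] = quoted if quoted is not None else bare
    return fields


def summarize_run(log_text):
    """Return (summary_count, filenames) for one ingestion run.

    summary_count is the number from the "Ingested N files" line (or None),
    filenames lists the files that have an "Ingested document" line.
    """
    summary_count = None
    filenames = []
    for line in log_text.splitlines():
        summary = SUMMARY_RE.search(line)
        if summary:
            summary_count = int(summary.group(1))
        elif "Ingested document" in line:
            filenames.append(parse_fields(line).get("filename"))
    return summary_count, filenames
```

A run where the summary count is non-zero but the filename list is empty matches what is seen in step 3; a run where both files are listed matches steps 2 and 4.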

sangee2004 added the bug (Something isn't working) label Sep 18, 2024

sangee2004 commented Sep 19, 2024

Testing with the latest desktop build, which uses v0.4.14-rc.11 of knowledge.

We see the following messages show up when entering the edit mode of an assistant with 2 knowledge files.

2024-09-19T20:50:23.168Z [server] [INFO] logs for ingesting dataset 969:  Ingested 2 files from "./data" into dataset "969"
2024/09/19 13:50:23 INFO Pruned files count=0 basePath=./data
2024/09/19 13:50:23 INFO Ignoring duplicate document flow=ingestion rootPath=./data filepath="data/notion/Monkey and Crocodile.md" phase=store filename="Monkey and Crocodile.md" filetype=.md status=skipped reason=duplicate
2024/09/19 13:50:23 INFO Ignoring duplicate document flow=ingestion rootPath=./data filepath="data/notion/Thirsty Crow.md" phase=store filename="Thirsty Crow.md" filetype=.md status=skipped reason=duplicate

So in the case of Step 3, I can confirm that ingestion is not happening.

Ingestion is still happening for Step 2 and Step 4.
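
For reference, the behaviour I would expect in Step 2 and Step 4 is roughly what the sketch below does: only files whose content changed since the last run get ingested, and the rest are skipped. This is only an illustration under my own assumptions (a SHA-256 digest per file path), not how knowledge actually detects duplicates; `file_digest`, `plan_ingestion` and `known_digests` are hypothetical names.

```python
import hashlib
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of the file's content."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def plan_ingestion(paths, known_digests):
    """Split paths into (to_ingest, skipped) based on content digests.

    known_digests maps a path to the digest recorded at its last ingestion
    (hypothetical bookkeeping) and is updated in place for files to ingest.
    """
    to_ingest, skipped = [], []
    for path in paths:
        key = str(path)
        digest = file_digest(path)
        if known_digests.get(key) == digest:
            # Content unchanged since the last run: nothing to ingest.
            skipped.append(key)
        else:
            known_digests[key] = digest
            to_ingest.append(key)
    return to_ingest, skipped
```

With a check like this, adding a second file (Step 2) would ingest only the new file, and a "Sync files" with no upstream changes (Step 4) would ingest nothing.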
