Environment
Device and OS: MBP 16" M3 Max on macOS Sonoma 14.5 in Docker Desktop
App/package versions: 0.10.0
Kubernetes distro being used: k3d
Steps to reproduce
1. Scale the Text Embeddings pod down to 0.
2. Create a vector store via the API endpoint openai/v1/vector_stores.
3. Upload a document via the API endpoint openai/v1/files.
4. Submit the document for indexing via the API endpoint openai/v1/vector_stores/<store_id>/files.
5. Observe that indexing fails (as expected). In the Supabase dashboard, the file's entry in the vector_store_file table has a status of "failed".
6. Scale the Text Embeddings pod back up to 1.
7. Submit the document for indexing via openai/v1/vector_stores/<store_id>/files again.
8. Observe that the request fails with a 500 Internal Server Error, and that the API service logs report leapfrogai_api.backend.rag.index.FileAlreadyIndexedError: File already indexed.
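The steps above can be sketched as kubectl/curl commands. This is a dry-run variant that only prints the commands, since the base URL (localhost:8080), the deployment name (text-embeddings), and the request payloads are assumptions about the deployment — adjust them, then run the printed commands by hand (the <store_id> and <file_id> placeholders come from the earlier responses):

```shell
# Dry-run sketch of the reproduction calls. BASE, the deployment name, and
# the payload shapes are assumptions -- adjust to your deployment.
BASE="http://localhost:8080/openai/v1"

cat <<EOF
kubectl scale deployment text-embeddings --replicas=0
curl -s -X POST "$BASE/vector_stores" -H 'Content-Type: application/json' -d '{"name": "test-store"}'
curl -s -X POST "$BASE/files" -F 'purpose=assistants' -F 'file=@document.pdf'
curl -s -X POST "$BASE/vector_stores/<store_id>/files" -H 'Content-Type: application/json' -d '{"file_id": "<file_id>"}'
kubectl scale deployment text-embeddings --replicas=1
# resubmitting the same file is the call that currently returns the 500:
curl -s -X POST "$BASE/vector_stores/<store_id>/files" -H 'Content-Type: application/json' -d '{"file_id": "<file_id>"}'
EOF
```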
Expected result
Upon resubmission to the openai/v1/vector_stores/<store_id>/files endpoint, if the previous run failed, the API should attempt indexing again. There is currently no way to retry a failed index run without deleting the vector store file. It's not the biggest deal, but it unnecessarily complicates retry logic for people integrating with our API.
If we decide it should still not reindex without deleting first, the endpoint should at least return a 409 Conflict instead of a 500 error with no explanation; right now the explanation is only available in the API service logs.
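One way the expected behavior could look — a minimal sketch, not the actual LeapfrogAI code (the IndexingService class, the in-memory table, and the status strings are assumptions based on the logs below): a prior "failed" status permits a retry, while an already-successful index raises FileAlreadyIndexedError, which the router can then translate into a 409.

```python
class FileAlreadyIndexedError(Exception):
    """Raised when a file already has a successful index entry."""


class IndexingService:
    def __init__(self):
        # Stand-in for the Supabase vector_store_file table:
        # maps (store_id, file_id) -> status ("completed" or "failed").
        self.table = {}

    def index_file(self, store_id: str, file_id: str) -> str:
        status = self.table.get((store_id, file_id))
        if status == "completed":
            # The router should translate this into a 409 Conflict, not a 500.
            raise FileAlreadyIndexedError(f"File already indexed: {file_id}")
        # A previous "failed" run (or no prior run) is allowed to proceed.
        self.table[(store_id, file_id)] = self._run_embedding(store_id, file_id)
        return self.table[(store_id, file_id)]

    def _run_embedding(self, store_id: str, file_id: str) -> str:
        # Placeholder for the real embedding/indexing call.
        return "completed"
```

In the FastAPI router, catching FileAlreadyIndexedError and re-raising it as HTTPException(status_code=409, detail=str(err)) would surface the reason to the client instead of a bare 500.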
Actual Result
Returns a 500 error with no details
Does not retry indexing
Visual Proof (screenshots, videos, text, etc)
response upon resubmission
{
"detail": "Failed to create vector store file"
}
leapfrogai_api service logs
ERROR:root:File already indexed: ecf89c04-c0b3-48aa-a0ed-638028c1029b
ERROR:root:Error indexing file
Traceback (most recent call last):
File "/Users/markvanaken/git/lar-lfai/src/leapfrogai_api/routers/openai/vector_stores.py", line 126, in create_vector_store_file
vector_store_file = await indexing_service.index_file(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/markvanaken/git/lar-lfai/src/leapfrogai_api/backend/rag/index.py", line 60, in index_file
raise FileAlreadyIndexedError("File already indexed")
leapfrogai_api.backend.rag.index.FileAlreadyIndexedError: File already indexed
INFO: 127.0.0.1:51945 - "POST /openai/v1/vector_stores/1cc15f24-0e2c-4d22-99bb-bba72b5213a4/files HTTP/1.1" 500 Internal Server Error
Severity/Priority
Medium
Additional Context
Dealing gracefully with non-happy-path situations like this is important for resilience in a non-monolithic app like this one, so I think fixing this behavior matters — for example, when networking temporarily fails, or when a user submits a file for indexing while we're upgrading our pods (and the embeddings pod goes down in the middle of a request).