
bug: resubmitting a file for indexing that previously failed does not work #926

Open
vanakema opened this issue Aug 17, 2024 · 0 comments
Labels: possible-bug 🐛 Something may not be working

@vanakema (Contributor)

Environment

Device and OS: MBP 16" M3 Max on macOS Sonoma 14.5 in Docker Desktop
App/package versions: 0.10.0
Kubernetes distro being used: k3d

Steps to reproduce

  1. Scale the Text Embeddings pod down to 0
  2. Create a vector store via the API endpoint openai/v1/vector_stores
  3. Upload a document via the API endpoint openai/v1/files
  4. Submit the document for indexing via the API endpoint openai/v1/vector_stores/<store_id>/files
  5. Observe that indexing fails (as expected); in the Supabase dashboard, the file's entry in the vector_store_file table shows a status of "failed"
  6. Scale the Text Embeddings pod back up to 1
  7. Submit the document for indexing via the API endpoint openai/v1/vector_stores/<store_id>/files again
  8. Observe that the request fails with a 500 Internal Server Error and that the API service logs show leapfrogai_api.backend.rag.index.FileAlreadyIndexedError: File already indexed (a scripted version of these steps is sketched below)
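
For reference, a minimal reproduction sketch of steps 2–8 in Python. The base URL, auth header, file name, and exact request payload shapes here are assumptions based on the OpenAI-style endpoints, not copied from our docs:

import os
import requests

BASE = "http://localhost:8080/openai/v1"  # assumption: local API address
HEADERS = {"Authorization": f"Bearer {os.environ['LFAI_API_KEY']}"}  # assumption: bearer auth

# Step 2: create a vector store
store = requests.post(f"{BASE}/vector_stores", headers=HEADERS, json={"name": "repro-store"}).json()

# Step 3: upload a document
with open("example.txt", "rb") as f:
    uploaded = requests.post(
        f"{BASE}/files", headers=HEADERS, files={"file": f}, data={"purpose": "assistants"}
    ).json()

# Step 4: submit the document for indexing (fails while the embeddings pod is scaled to 0)
first = requests.post(
    f"{BASE}/vector_stores/{store['id']}/files", headers=HEADERS, json={"file_id": uploaded["id"]}
)
print("first attempt:", first.status_code)

# Step 7: after scaling the embeddings pod back up, resubmit the same file
second = requests.post(
    f"{BASE}/vector_stores/{store['id']}/files", headers=HEADERS, json={"file_id": uploaded["id"]}
)
# Step 8: observe the 500 instead of a retry (or a 409)
print("second attempt:", second.status_code, second.text)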

Expected result

  • Upon resubmission to the openai/v1/vector_stores/<store_id>/files endpoint, if the previous run failed, it should attempt indexing again. There is currently no way to retry a failed index run without deleting the vector store file. Not the biggest deal, but it unnecessarily complicates retry logic for people integrating with our API.
  • If we decide it should still not reindex without deleting first, it should at least return a 409 Conflict instead of a 500 error with no explanation; the explanation was only available in the API service logs. A rough sketch of either option follows this list.
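
A rough sketch of what I mean inside create_vector_store_file (the surrounding router code is paraphrased from the traceback below, and the lookup/retry helpers on indexing_service are hypothetical, not existing methods):

from fastapi import HTTPException, status

try:
    vector_store_file = await indexing_service.index_file(
        vector_store_id=vector_store_id, file_id=request.file_id
    )
except FileAlreadyIndexedError:
    # Hypothetical helper: look up the existing vector_store_file row and its status.
    existing = await indexing_service.get_vector_store_file(
        vector_store_id=vector_store_id, file_id=request.file_id
    )
    if existing is not None and existing.status == "failed":
        # Preferred behavior: a previously failed run is simply retried.
        # reindex_file is a hypothetical retry path, not an existing method.
        vector_store_file = await indexing_service.reindex_file(
            vector_store_id=vector_store_id, file_id=request.file_id
        )
    else:
        # Fallback behavior: surface the conflict instead of an opaque 500.
        raise HTTPException(
            status_code=status.HTTP_409_CONFLICT,
            detail="File already indexed in this vector store",
        )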

Actual Result

  • Returns a 500 error with no details
  • Does not retry indexing

Visual Proof (screenshots, videos, text, etc)

response upon resubmission

{
  "detail": "Failed to create vector store file"
}

leapfrogai_api service logs

ERROR:root:File already indexed: ecf89c04-c0b3-48aa-a0ed-638028c1029b
ERROR:root:Error indexing file
Traceback (most recent call last):
  File "/Users/markvanaken/git/lar-lfai/src/leapfrogai_api/routers/openai/vector_stores.py", line 126, in create_vector_store_file
    vector_store_file = await indexing_service.index_file(
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/markvanaken/git/lar-lfai/src/leapfrogai_api/backend/rag/index.py", line 60, in index_file
    raise FileAlreadyIndexedError("File already indexed")
leapfrogai_api.backend.rag.index.FileAlreadyIndexedError: File already indexed
INFO:     127.0.0.1:51945 - "POST /openai/v1/vector_stores/1cc15f24-0e2c-4d22-99bb-bba72b5213a4/files HTTP/1.1" 500 Internal Server Error

Severity/Priority

Medium

Additional Context

Handling non-happy-path situations like this gracefully is important for resilience in a non-monolithic app like this one, so I think it's somewhat important that we fix this behavior. Examples include transient network failures, or a user submitting a file for indexing while we're upgrading our pods (where the embeddings pod may go down in the middle of a request).

@vanakema vanakema added the possible-bug 🐛 Something may not be working label Aug 17, 2024
@justinthelaw justinthelaw changed the title resubmitting a file for indexing that previously failed does not work bug: resubmitting a file for indexing that previously failed does not work Sep 4, 2024