Refactor File Indexing for Background Asynchronous Processing #791

Merged
merged 6 commits on Feb 16, 2025

Conversation

ArnoChenFx
Contributor

This PR refactors the file indexing process in the LightRAG server to run as background asynchronous tasks, so the API stays responsive while files are being indexed.

Key Changes:

  • Asynchronous File Indexing: Introduces asynchronous tasks for file processing, allowing the API to return immediately while indexing occurs in the background.
  • Temporary File Handling: Implements a temporary file mechanism for uploaded files, ensuring proper handling and cleanup during asynchronous indexing.
  • Improved Error Handling: Enhances error handling and logging throughout the file indexing process, providing better insights into potential issues.
  • Response Handling: Updates API responses to provide immediate feedback to the user, indicating that file processing is in progress.
  • Consolidated File Parsing Logic: Identifies and merges duplicated file parsing implementations across different API endpoints into a single, reusable function.
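The "return immediately, index in the background" behavior described above can be illustrated with a minimal sketch. It uses plain `asyncio` as a stand-in for the server's actual mechanism (the PR itself uses FastAPI's BackgroundTasks), and the names `index_file`, `handle_upload`, `demo`, and the response fields are hypothetical, chosen only for illustration.

```python
import asyncio

# Hypothetical registry of in-flight indexing jobs, keyed by file name.
_tasks: dict[str, asyncio.Task] = {}
# Hypothetical record of files whose indexing has finished.
indexed: list[str] = []


async def index_file(name: str, text: str) -> None:
    """Stand-in for the slow indexing step (assumed, not LightRAG's code)."""
    await asyncio.sleep(0.01)  # simulate expensive work
    indexed.append(name)


async def handle_upload(name: str, text: str) -> dict:
    """Schedule indexing and return right away with an 'in progress' reply."""
    _tasks[name] = asyncio.create_task(index_file(name, text))
    return {"status": "processing", "file": name}


async def demo() -> tuple[dict, bool, bool]:
    reply = await handle_upload("notes.txt", "hello")
    done_at_reply = "notes.txt" in indexed  # indexing has not finished yet
    await asyncio.gather(*_tasks.values())  # wait for the background work
    return reply, done_at_reply, "notes.txt" in indexed
```

Calling `asyncio.run(demo())` shows the handler replying before the file is indexed, which is the responsiveness gain this PR is after. Keeping a reference to each task, as `_tasks` does here, prevents the task from being garbage-collected before it completes.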

Benefits:

  • Improved Responsiveness: The API remains responsive during file indexing, preventing delays for other requests.
  • Enhanced User Experience: Users receive immediate feedback upon uploading files, providing a smoother and more intuitive experience.
  • Robust Error Handling: Comprehensive error handling and logging provide better insights into potential issues, facilitating troubleshooting and maintenance.
  • Reduced Code Duplication: Consolidation of file parsing logic reduces code duplication, improving code maintainability and reducing the risk of inconsistencies.

Detailed Changes:

  • Replaced synchronous file indexing with asynchronous tasks using BackgroundTasks in FastAPI.
  • Introduced temporary file handling for uploaded files, saving them to a temporary location before indexing.
  • Implemented error handling and logging for file processing, capturing exceptions and providing detailed error messages.
  • Updated API responses to provide immediate feedback to the user, indicating that file processing is in progress.
  • Implemented cleanup of temporary files after successful or failed indexing.
  • Extracted common file parsing code into a dedicated function (e.g., parse_file_content) and reused it across the /documents/upload, /documents/file, and /documents/batch endpoints.
  • Removed duplicated parsing logic from individual endpoint handlers.
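As a rough illustration of the temporary-file, cleanup, and shared-parser items above, here is a standard-library-only sketch. It is not the code from this PR: `save_to_temp`, `index_temp_file`, the `sink` list, and the supported suffixes are assumptions, and `parse_file_content` only borrows the example name mentioned in the list.

```python
import logging
import tempfile
from pathlib import Path

logger = logging.getLogger("indexing_sketch")


def parse_file_content(path: Path) -> str:
    """Hypothetical shared parser: one place that maps a suffix to text."""
    suffix = path.suffix.lower()
    if suffix in {".txt", ".md"}:
        return path.read_text(encoding="utf-8")
    raise ValueError(f"Unsupported file type: {suffix}")


def save_to_temp(filename: str, data: bytes) -> Path:
    """Persist uploaded bytes so they outlive the request that sent them."""
    suffix = Path(filename).suffix
    with tempfile.NamedTemporaryFile(delete=False, suffix=suffix) as tmp:
        tmp.write(data)
    return Path(tmp.name)


def index_temp_file(temp_path: Path, sink: list) -> bool:
    """Parse and 'index' a temp file; delete it whether or not that works."""
    try:
        sink.append(parse_file_content(temp_path))  # assumed indexing step
        return True
    except Exception:
        # Log the full traceback instead of letting the task fail silently.
        logger.exception("Indexing failed for %s", temp_path)
        return False
    finally:
        temp_path.unlink(missing_ok=True)
```

In this sketch each endpoint would call `save_to_temp` during the request and hand `index_temp_file` to the background runner, so parsing lives in one function and the `finally` clause covers cleanup on both the success and failure paths.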

Additional Notes:

  • Consider testing the changes thoroughly to ensure they function as expected and do not introduce any regressions.

@YanSte
Contributor

YanSte commented Feb 15, 2025

@ArnoChenFx by the way, thank you very much for this PR.

If you could apply my feedback, that would be really awesome.

@ArnoChenFx
Contributor Author

> @ArnoChenFx by the way, thank you very much for this PR.
>
> If you could apply my feedback, that would be really awesome.

Applied already, thanks for your suggestion.

@YanSte
Contributor

YanSte commented Feb 16, 2025

Super, thanks a lot!

@LarFii LarFii merged commit 200319f into HKUDS:main Feb 16, 2025
1 check passed
@ArnoChenFx ArnoChenFx deleted the refactor-server branch February 18, 2025 06:00
3 participants