Reuse free space from popped items #111

Draft: wants to merge 1 commit into main

equals215 (Member) commented Aug 4, 2024

The goal of this PR is to drastically slow down the growth of the queue on disk by reusing the space freed by popped items.
The freeSpace index will use a lock-free size-specific slot array (LSSA) for common item sizes (to be determined by analyzing existing indexes) and a stratified list for uncommon free-space sizes. I'm also thinking about a defragmentation algorithm ~~and a way to store freeSpace index operations into the WAL~~ <- the freeSpace index is derived from index adds and pops...
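
For illustration, here is a minimal Go sketch of the two-tier lookup described above. It is a simplification under stated assumptions, not this PR's code: the names `span`, `freeSpace`, `newFreeSpace`, `release`, and `acquire` are hypothetical, a plain map of per-size stacks stands in for the LSSA, a size-sorted slice stands in for the stratified list, and nothing here is lock-free (single-goroutine use is assumed).

```go
package main

import "sort"

// span is a hypothetical record for one freed region of the queue file.
type span struct {
	offset int64 // byte position of the freed region
	size   int64 // length of the freed region in bytes
}

// freeSpace is a hypothetical two-tier index: exact-size buckets for common
// item sizes, plus a size-sorted list for every other size.
// NOTE: not lock-free; a single goroutine (or an external lock) is assumed.
type freeSpace struct {
	common   map[int64][]int64 // size -> stack of offsets, common sizes only
	uncommon []span            // kept sorted by size, ascending
}

// newFreeSpace registers the sizes that get their own exact-size bucket.
func newFreeSpace(commonSizes ...int64) *freeSpace {
	fs := &freeSpace{common: make(map[int64][]int64, len(commonSizes))}
	for _, s := range commonSizes {
		fs.common[s] = nil
	}
	return fs
}

// release records that [offset, offset+size) is reusable (an item was popped).
func (fs *freeSpace) release(offset, size int64) {
	if slots, ok := fs.common[size]; ok {
		fs.common[size] = append(slots, offset)
		return
	}
	// Insert into the sorted list at the first position with size >= ours.
	i := sort.Search(len(fs.uncommon), func(j int) bool { return fs.uncommon[j].size >= size })
	fs.uncommon = append(fs.uncommon, span{})
	copy(fs.uncommon[i+1:], fs.uncommon[i:])
	fs.uncommon[i] = span{offset: offset, size: size}
}

// acquire returns an offset where size bytes can be written, or ok=false
// when the caller should append at the end of the file instead.
func (fs *freeSpace) acquire(size int64) (offset int64, ok bool) {
	// Fast path: an exact-size slot is available.
	if slots := fs.common[size]; len(slots) > 0 {
		fs.common[size] = slots[:len(slots)-1]
		return slots[len(slots)-1], true
	}
	// Slow path: best fit, i.e. the smallest uncommon region that is big enough.
	i := sort.Search(len(fs.uncommon), func(j int) bool { return fs.uncommon[j].size >= size })
	if i == len(fs.uncommon) {
		return 0, false
	}
	s := fs.uncommon[i]
	fs.uncommon = append(fs.uncommon[:i], fs.uncommon[i+1:]...)
	// Give the unused tail of the region back to the index.
	if rest := s.size - size; rest > 0 {
		fs.release(s.offset+size, rest)
	}
	return s.offset, true
}
```

With `fs := newFreeSpace(64)`, a popped 64-byte item lands in the exact-size bucket and `fs.acquire(64)` hands its offset straight back; any other size falls back to the best-fit search, and whatever is left of the chosen region is released again.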

equals215 self-assigned this Aug 4, 2024
equals215 added the enhancement and internal-only labels Aug 4, 2024
equals215 (Member, Author) commented

@CorentinB throw out everything that comes to mind on this: ideas, features, must-haves, must-dos, warnings. Everything.

CorentinB (Collaborator) commented

Do we want to have an option to disable this? (in order to save some disk I/O when we know the crawl will be short and we don't care about saving some disk space while it runs)

equals215 (Member, Author) commented Aug 4, 2024

> Do we want to have an option to disable this? (in order to save some disk I/O when we know the crawl will be short and we don't care about saving some disk space while it runs)

It's fully in memory, gets dumped at the same time as the rest of the index, and is tracked through the queue index WAL: queue index add/pop operations can be replayed and turned into freeSpace index operations.

So yeah, we can make it optional, but since it's in memory there shouldn't be any disk I/O related performance issues.
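
A rough Go sketch of that derivation, for illustration only. The entry layout is an assumption: `walEntry`, `opAdd`, `opPop`, and `rebuildFree` are hypothetical names, and the real WAL format may record different fields. The idea shown is that replaying adds and pops in order is enough to rebuild the set of free regions, so no separate freeSpace log is needed.

```go
package main

// opKind distinguishes the two queue index operations (hypothetical encoding).
type opKind int

const (
	opAdd opKind = iota // an item was written at offset
	opPop               // the item at offset was popped
)

// walEntry is a hypothetical queue index WAL record.
type walEntry struct {
	kind   opKind
	offset int64 // byte position of the item in the queue file
	size   int64 // length of the item in bytes
}

// rebuildFree replays WAL entries in order and returns the free regions
// as a map from offset to size.
func rebuildFree(entries []walEntry) map[int64]int64 {
	free := make(map[int64]int64)
	for _, e := range entries {
		switch e.kind {
		case opPop:
			// A pop frees the region the item occupied.
			free[e.offset] = e.size
		case opAdd:
			// An add that starts on a free region consumes it; any unused
			// tail stays free. Adds appended at the end of the file touch
			// no free region and are ignored here.
			if sz, ok := free[e.offset]; ok {
				delete(free, e.offset)
				if sz > e.size {
					free[e.offset+e.size] = sz - e.size
				}
			}
		}
	}
	return free
}
```

In this sketch the result would then be fed into whatever structure the freeSpace index uses; merging adjacent free regions is left out.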

Base automatically changed from queue to main August 4, 2024 18:36
equals215 linked an issue Aug 4, 2024 that may be closed by this pull request
Labels: enhancement (New feature or request), internal-only (This PR/Issue is reserved for the IA team)

Successfully merging this pull request may close this issue: Queue and Index should reuse free space

2 participants