Minimize interference of concurrent index requests #58
Another idea I've got is to have one reader thread that puts data on a queue while all other threads consume from that queue. This could provide similar benefits without adding too much complexity.
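The single-reader/multiple-consumer idea could look roughly like the following. This is a minimal sketch, not the project's actual code; the function names and the `upper()` stand-in for real index work are hypothetical:

```python
import queue
import threading

def read_into_queue(records, q, n_workers):
    # Single reader: push each record onto the queue, then one
    # sentinel per worker so every consumer shuts down cleanly.
    for rec in records:
        q.put(rec)
    for _ in range(n_workers):
        q.put(None)

def index_from_queue(q, results, lock):
    # Worker: consume records until the sentinel arrives.
    while True:
        item = q.get()
        if item is None:
            break
        with lock:
            results.append(item.upper())  # stand-in for real indexing work

def run(records, n_workers=4):
    q = queue.Queue(maxsize=1024)
    results, lock = [], threading.Lock()
    reader = threading.Thread(target=read_into_queue, args=(records, q, n_workers))
    workers = [threading.Thread(target=index_from_queue, args=(q, results, lock))
               for _ in range(n_workers)]
    reader.start()
    for w in workers:
        w.start()
    reader.join()
    for w in workers:
        w.join()
    return results

print(sorted(run(["a", "b", "c"])))  # → ['A', 'B', 'C']
```

The bounded queue (`maxsize=1024`) gives backpressure: the reader blocks when consumers fall behind, which keeps memory usage flat.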
I've written a demo program that mmaps a file and reads it with 1, 2, 4 and 8 subprocesses (using the multiprocessing library), and we indeed see a speedup:
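The demo itself isn't attached to the issue, but a program along these lines would exercise the same path. This is a sketch under assumptions (the file contents and the byte-summing stand-in for "reading" are mine, not from the demo):

```python
import mmap
import os
import tempfile
from multiprocessing import Pool

def read_slice(args):
    # Each subprocess mmaps the file independently and touches its
    # slice, forcing those pages to actually be read.
    path, start, length = args
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            return sum(mm[start:start + length])

def parallel_read(path, n_procs):
    size = os.path.getsize(path)
    chunk = size // n_procs
    slices = [(path, i * chunk, chunk if i < n_procs - 1 else size - i * chunk)
              for i in range(n_procs)]
    with Pool(n_procs) as pool:
        return sum(pool.map(read_slice, slices))

if __name__ == "__main__":
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(b"\x01" * 1_000_000)
        path = f.name
    try:
        for n in (1, 2, 4, 8):
            total = parallel_read(path, n)
            print(n, total)  # total is identical regardless of worker count
    finally:
        os.unlink(path)
```

Because each worker is a separate process, the byte-level work runs truly in parallel; the cost is that arguments and results must be picklable.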
Contrast this with reading the same file with 1, 2, 4 and 8 threads:
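For the thread-based counterpart, a sketch like the one below shows why the numbers come out worse: the per-byte work holds the GIL, so threads mostly run one at a time. Again hypothetical code, not the actual demo:

```python
import mmap
import os
import tempfile
import threading

def threaded_read(path, n_threads):
    size = os.path.getsize(path)
    chunk = size // n_threads
    results = [0] * n_threads

    def work(i):
        start = i * chunk
        length = chunk if i < n_threads - 1 else size - start
        with open(path, "rb") as f, \
                mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # The Python-level byte summing holds the GIL, so the
            # threads serialize and adding more yields little speedup.
            results[i] = sum(mm[start:start + length])

    threads = [threading.Thread(target=work, args=(i,)) for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(results)
```

Only the actual I/O releases the GIL; once the pages are in the cache, CPU-bound work in threads cannot scale, which is the contrast the measurements show.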
Test platform in both cases:
But implementing this based on multiprocessing adds significant complexity which we'd like to avoid. Hence, the ticket moves to the backlog for now but stays open as a reminder.
This ticket originates from a finding in #9. There we noted two things:
In this ticket we want to tackle these two issues. The idea revolves around mmapping the data file and using the Python multiprocessing library for indexing instead of threads. We'll implement the following steps:
- Use a multiprocessing.Pool, which will then bulk-index the relevant parts of the data file.
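A rough shape for that step might be the following. It is a sketch, not the project's implementation: the helper names are hypothetical, and counting newline-delimited records stands in for the real bulk-indexing work. The key detail is snapping each partition boundary to a record boundary so no record is split across workers:

```python
import os
from multiprocessing import Pool

def find_offsets(path, n_parts):
    # Split the file into n_parts byte ranges, advancing each boundary
    # to the next newline so no record straddles two workers.
    size = os.path.getsize(path)
    offsets = [0]
    with open(path, "rb") as f:
        for i in range(1, n_parts):
            f.seek(i * size // n_parts)
            f.readline()  # skip to the end of the current record
            offsets.append(f.tell())
    offsets.append(size)
    return list(zip(offsets[:-1], offsets[1:]))

def index_part(args):
    # Hypothetical per-worker step: here we just count the records
    # in our byte range as a stand-in for bulk indexing them.
    path, start, end = args
    with open(path, "rb") as f:
        f.seek(start)
        data = f.read(end - start)
    return data.count(b"\n")

def bulk_index(path, n_procs=4):
    parts = [(path, s, e) for s, e in find_offsets(path, n_procs)]
    with Pool(n_procs) as pool:
        return sum(pool.map(index_part, parts))
```

Because the boundaries only ever move forward to the same or a later newline, the ranges never overlap, so each record is handed to exactly one worker.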