Parallelize Vamana indexing #2

Open
unroll opened this issue Nov 16, 2024 · 6 comments
Labels
enhancement New feature or request

Comments


unroll commented Nov 16, 2024

Parallelize Vamana indexing over multiple cores.

Superficially, it should be "easy". Indexing simply means iterating over all points p and (1) doing a greedy search, then (2) running robust prune: update p's out-neighbours using the nodes visited during the search, and create the corresponding back-edges to p.

In reality, this is more challenging, for several reasons:

  • Python's poor support for multithreading due to the GIL.
  • Robust prune updates not just p, but also its new neighbours, to create backlinks. Moreover, when creating a backlink on some neighbour of p, we may need to robust prune that neighbour too if it now has more than R out-edges. (Luckily this is not recursive: the neighbour's own neighbours already have a back-link to it.)
  • The algorithm is iterative by design: after we update p_{i}, the greedy search for p_{i+1} should see whatever updates we made to p_{i} and its new neighbours. Changing that may hurt the goal of faithfully recreating Vamana as described.

Must be careful with tweaks, but something should be done, as indexing is currently not fast enough.
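For reference, here is a minimal sketch of the sequential loop described above. `greedy_search` and `robust_prune` are assumed helper signatures, not annindex's actual API:

```python
# Hypothetical sketch of the sequential Vamana indexing loop.
# greedy_search and robust_prune are assumed helpers, not annindex's API.
def build_index(points, start, R, L, alpha):
    graph = {p: set() for p in range(len(points))}  # out-neighbour lists
    for p in range(len(points)):
        # (1) Greedy search from the start node; collect visited candidates.
        visited = greedy_search(graph, points, start, points[p], L)
        # (2) Robust prune selects at most R out-neighbours for p.
        graph[p] = robust_prune(points, p, visited, alpha, R)
        # Create back-edges; re-prune a neighbour that now exceeds R edges.
        for n in graph[p]:
            graph[n].add(p)
            if len(graph[n]) > R:
                graph[n] = robust_prune(points, n, graph[n], alpha, R)
    return graph
```

Each iteration reads the graph produced by all previous iterations, which is exactly the sequential dependency that makes parallelizing this non-trivial.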

unroll added the enhancement label Nov 16, 2024

unroll commented Nov 16, 2024

How does DiskANN deal with it?

First, it updates p's out-edges under a lock. Second, at the end of robust prune (in Index::inter_insert), when creating a back-edge to p from some new neighbour n, it is careful about such updates and uses locks and delayed pruning (a sketch follows the list):

  1. Lock n
  2. Read its list size
  3. If there is space: add the back-edge to n's list.
  4. Otherwise: make a copy of n's list, add p to it, and schedule it for pruning.
  5. Unlock n
  6. If a list copy was scheduled for pruning:
    1. Prune the copied list (without updating any back-edges, and without checking whether the pruned list is still too large)
    2. Lock n
    3. Set n's edge list to the pruned list
    4. Unlock n
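A minimal Python sketch of this locking scheme (not DiskANN's actual code; `graph`, the per-node `locks`, and `robust_prune` are assumed structures):

```python
import threading

# Assumed setup: graph maps node -> set of out-neighbours,
# locks is e.g. [threading.Lock() for _ in points].
def add_back_edge(graph, locks, points, n, p, R, alpha):
    """Add back-edge n -> p with delayed pruning (steps 1-6 above)."""
    to_prune = None
    with locks[n]:                       # 1. lock n
        if len(graph[n]) < R:            # 2-3. space left: add in place
            graph[n].add(p)
        else:                            # 4. copy, extend, schedule prune
            to_prune = set(graph[n]) | {p}
    # 5. n is unlocked here
    if to_prune is not None:             # 6. prune the copy off-lock ...
        pruned = robust_prune(points, n, to_prune, alpha, R)
        with locks[n]:                   # ... then swap it in under the lock
            graph[n] = pruned
```

Note that the prune in step 6 runs outside the lock, which is exactly what opens the lost-update window described below.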

This is actually not a complete solution!

  1. The copied list is allowed to grow larger than R during construction; the edge lists of all nodes are pruned in a final cleanup phase. This differs from Vamana as described in the paper (and takes more memory).
  2. It can lose added edges and waste effort if two nodes a and b want to add a back-edge to the same new neighbour n. Both might copy the same list under the lock (so a's copy does not see the edge from n to b, and b's copy does not see the edge from n to a). Each prunes its own copy. When writing back, one update overwrites the other: either a's pruned list wins and the edge from n to b is lost, or b's wins and the edge from n to a is lost.
  3. Such a shared-memory approach won't work for Python.


unroll commented Nov 16, 2024

Delaying pruning of back-edge nodes can be a good solution for annindex. Need to be careful not to let the edge lists grow too far, though.


unroll commented Nov 16, 2024

Another alternative is to do that part of building the index on one core, perhaps using minibatches whose size depends on the number of cores (see the sketch after this list):

  1. Compute num_cores nodes in parallel jobs. Each job returns the new out-edges for its node.
  2. Update the out-edges for the num_cores nodes, and create back-edges.
  3. Run jobs to robust prune any of the new neighbours, if needed.
  4. Update the out-edges of those neighbours.
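A sketch of what that could look like with `concurrent.futures`; `search_and_prune` and `robust_prune` are hypothetical helpers, and each parallel job receives a pickled snapshot of the graph:

```python
from concurrent.futures import ProcessPoolExecutor

# Hypothetical sketch of the minibatch scheme above: batched search+prune
# runs in parallel, graph updates are applied serially between batches.
def build_minibatched(points, order, start, num_cores, R, L, alpha):
    graph = {p: set() for p in range(len(points))}
    with ProcessPoolExecutor(max_workers=num_cores) as pool:
        for i in range(0, len(order), num_cores):
            batch = order[i:i + num_cores]
            # 1. Search + prune num_cores nodes in parallel; each job gets
            #    a snapshot of the graph as of the start of the batch.
            jobs = {p: pool.submit(search_and_prune, graph, points,
                                   start, p, L, R, alpha) for p in batch}
            # 2. Apply out-edge updates and create back-edges serially.
            overfull = set()
            for p, job in jobs.items():
                graph[p] = set(job.result())
                for n in graph[p]:
                    graph[n].add(p)
                    if len(graph[n]) > R:
                        overfull.add(n)
            # 3-4. Robust prune neighbours that grew past R
            #      (shown serially here; could also be parallel jobs).
            for n in overfull:
                graph[n] = robust_prune(points, n, graph[n], alpha, R)
    return graph
```

Within a batch, jobs see the graph only as of the start of the batch, which is exactly the deviation from the strictly sequential algorithm noted in the first comment.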

unroll changed the title from "Parallelize indexing" to "Parallelize Vamana indexing" Nov 16, 2024

unroll commented Nov 18, 2024

Another option: build and merge, similar to what DiskANN does for large collections that don't fit in memory.
Divide the vectors across cores (with duplicates, to make sure the final graph is connected), build a partial index on each core, and merge.
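A rough sketch of that idea, with random overlapping partitions standing in for a smarter assignment, and `build_partial_index` / `robust_prune` as hypothetical helpers:

```python
import random
from concurrent.futures import ProcessPoolExecutor

# Hypothetical build-and-merge sketch: overlapping partitions are indexed
# independently, edge lists are unioned, and over-full nodes re-pruned.
def build_and_merge(points, num_cores, R, alpha, overlap=2):
    # Assign each point to `overlap` partitions so the merged graph
    # has shared nodes stitching the pieces together.
    parts = [[] for _ in range(num_cores)]
    for p in range(len(points)):
        for c in random.sample(range(num_cores), overlap):
            parts[c].append(p)
    # Build a partial index per partition in parallel.
    with ProcessPoolExecutor(max_workers=num_cores) as pool:
        subgraphs = list(pool.map(build_partial_index,
                                  [(points, part) for part in parts]))
    # Merge: union the out-edge lists, then re-prune over-full nodes.
    merged = {p: set() for p in range(len(points))}
    for g in subgraphs:
        for p, out in g.items():
            merged[p] |= out
    for p, out in merged.items():
        if len(out) > R:
            merged[p] = robust_prune(points, p, out, alpha, R)
    return merged
```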

unroll closed this as completed in 477873d Nov 22, 2024
unroll reopened this Nov 22, 2024

unroll commented Nov 22, 2024

incorrect tag in comment :(


unroll commented Jan 2, 2025

Another alternative is to do that part of building the index on one core, perhaps using minibatches whose size depends on the number of cores:

  1. Compute num_cores nodes in parallel jobs. Each job returns the new out-edges for its node.
  2. Update the out-edges for the num_cores nodes, and create back-edges.
  3. Run jobs to robust prune any of the new neighbours, if needed.
  4. Update the out-edges of those neighbours.

See also the ParlayANN paper at https://dl.acm.org/doi/abs/10.1145/3627535.3638475
