Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add MST optimization to improve the connectivity of CAGRA graphs. #2218

Closed
wants to merge 10 commits into from

Conversation

anaruse
Copy link
Contributor

@anaruse anaruse commented Mar 7, 2024

Close #2217

Improve connectivity of CAGRA graphs using approximate degree constrained MST (Minimum Spanning Tree).

@anaruse anaruse requested a review from a team as a code owner March 7, 2024 11:07
Copy link

copy-pr-bot bot commented Mar 7, 2024

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@github-actions github-actions bot added the cpp label Mar 7, 2024
@anaruse anaruse marked this pull request as draft March 7, 2024 11:07
@cjnolet cjnolet added improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Apr 10, 2024
@cjnolet cjnolet changed the base branch from branch-24.04 to branch-24.06 April 10, 2024 15:31
@cjnolet
Copy link
Member

cjnolet commented Apr 10, 2024

/ok to test

Copy link
Contributor

@tfeher tfeher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Akira for this PR! Looks good overall, but it would be good to add some more comments and simplify at a few places, so that the code will be easier to maintain. I have some questions and suggestion along these lines.

@@ -224,9 +224,10 @@ template <typename IdxT = uint32_t,
host_device_accessor<std::experimental::default_accessor<IdxT>, memory_type::host>>
void optimize(raft::resources const& res,
mdspan<IdxT, matrix_extent<int64_t>, row_major, g_accessor> knn_graph,
raft::host_matrix_view<IdxT, int64_t, row_major> new_graph)
raft::host_matrix_view<IdxT, int64_t, row_major> new_graph,
const bool use_MST = false)
Copy link
Contributor

@tfeher tfeher Apr 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not enable by default, is it costly to run MST, or can it lead to any issues?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Running an initial mst on a knn graph for the purposes of finding connected components should be really fast.

I also wonder if we should be exposing the fact that we are running an MST to guarantee connectivity- that's an implementation detail AFAIC. I suggest we instead name this argument something along the lines of "guarantee_connectivity".

@@ -90,12 +90,12 @@ void search_main_core(
CagraSampleFilterT sample_filter = CagraSampleFilterT())
{
RAFT_LOG_DEBUG("# dataset size = %lu, dim = %lu\n",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
RAFT_LOG_DEBUG("# dataset size = %lu, dim = %lu\n",
RAFT_LOG_DEBUG("# graph size = %lu, dim = %lu\n",

@@ -316,12 +571,468 @@ void sort_knn_graph(raft::resources const& res,
RAFT_LOG_DEBUG("# Sorting kNN graph time: %.1lf sec\n", time_sort_end - time_sort_start);
}

template <typename IdxT = uint32_t>
void mst_optimization(raft::resources const& res,
Copy link
Contributor

@tfeher tfeher Apr 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice to add a short docstring that describes what this optimization step does, and what approximations we have. Something along the lines:

  • We have a kNN graph as input
  • We have an approximate MST as output. Approximate, because the number of outgoing edges are limited.
  • Additionally the construction method is an approximation (seems similar to Kruskal's algorithm, but we solve in parallel which does not guarantee a global minimum).
  • If the input kNN graph is disconnected then random connection is added to the largest cluster.

The last point is confusing: why do we need an MST, if we just add random edges to the main cluster? (After reading further down how the MST is used, it becomes clearer: if we want to keep a set of edges that guarantees connectivity, then it is better to keep the MST.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Side note- we have an mst implementation in RAFT already but it does not preserve the fixed degree constraint. MST is inherently coupled to neighborhood methods and we are planning to move the existing MST over to cuVS. It would be nice if we could also expose a public API for this MST implementation, since it's generally useful for cases like this where we want to guarantee (or increase) connectedness.

We do this trick in our single-linkage clustering algorithm also, with the only caveat that we need to guarantee the smallest edges are added by the MST because the single-linkage has an exact solution.

#pragma omp parallel for
for (uint64_t i = 0; i < graph_size; i++) {
for (uint64_t k = 0; k < mst_graph_degree; k++) {
mst_graph_ptr[(mst_graph_degree * i) + k] = graph_size;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have an array access operator that makes this simpler

Suggested change
mst_graph_ptr[(mst_graph_degree * i) + k] = graph_size;
mst_graph(i, k) = graph_size;

Comment on lines +848 to +863
uint32_t flag_update = 1;
while (flag_update) {
flag_update = 0;
if (use_gpu) {
stats_ptr[0] = flag_update;
raft::copy(d_stats_ptr, stats_ptr, 1, resource::get_cuda_stream(res));

constexpr uint64_t n_threads = 256;
const dim3 threads(n_threads, 1, 1);
const dim3 blocks((graph_size + n_threads - 1) / n_threads, 1, 1);
kern_mst_opt_labeling<<<blocks, threads, 0, resource::get_cuda_stream(res)>>>(
d_label_ptr, d_mst_graph_ptr, graph_size, mst_graph_degree, d_stats_ptr);

raft::copy(stats_ptr, d_stats_ptr, 1, resource::get_cuda_stream(res));
resource::sync_stream(res);
flag_update = stats_ptr[0];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is very similar to the existing weak_cc function

* @brief Compute weakly connected components. Note that the resulting labels


for (uint64_t k = 0; k < output_graph_degree; k++) {
#pragma omp parallel for
for (uint64_t i = 0; i < graph_size; i++) {
dest_nodes.data_handle()[i] = output_graph_ptr[k + (output_graph_degree * i)];
dest_nodes_ptr[i] = pruned_graph_ptr[k + (output_graph_degree * i)];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of introducing extra pointers, one could use the array access operator (once compiler is finished, it will boil down to the same pointer arithmetic)

Suggested change
dest_nodes_ptr[i] = pruned_graph_ptr[k + (output_graph_degree * i)];
dest_nodes(i) = pruned_graph(i, k);

Comment on lines +1276 to +1310
auto my_fwd_graph = pruned_graph_ptr + (output_graph_degree * i);
auto my_rev_graph = rev_graph_ptr + (output_graph_degree * i);
auto my_out_graph = output_graph_ptr + (output_graph_degree * i);
uint32_t kf = 0;
uint32_t kr = 0;
uint32_t k = mst_graph_num_edges_ptr[i];
while (kf < output_graph_degree || kr < output_graph_degree) {
if (kf < output_graph_degree) {
if (my_fwd_graph[kf] < graph_size) {
auto flag_match = false;
for (uint32_t kk = 0; kk < k; kk++) {
if (my_out_graph[kk] == my_fwd_graph[kf]) {
flag_match = true;
break;
}
}
if (!flag_match) {
my_out_graph[k] = my_fwd_graph[kf];
k += 1;
}
}
kf += 1;
if (k >= output_graph_degree) break;
}
if (kr < output_graph_degree) {
if (my_rev_graph[kr] < graph_size) {
auto flag_match = false;
for (uint32_t kk = 0; kk < k; kk++) {
if (my_out_graph[kk] == my_rev_graph[kr]) {
flag_match = true;
break;
}
}
if (!flag_match) {
my_out_graph[k] = my_rev_graph[kr];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be more concise with mdspan array access operator, and a helper for finding duplicate edges:

Suggested change
auto my_fwd_graph = pruned_graph_ptr + (output_graph_degree * i);
auto my_rev_graph = rev_graph_ptr + (output_graph_degree * i);
auto my_out_graph = output_graph_ptr + (output_graph_degree * i);
uint32_t kf = 0;
uint32_t kr = 0;
uint32_t k = mst_graph_num_edges_ptr[i];
while (kf < output_graph_degree || kr < output_graph_degree) {
if (kf < output_graph_degree) {
if (my_fwd_graph[kf] < graph_size) {
auto flag_match = false;
for (uint32_t kk = 0; kk < k; kk++) {
if (my_out_graph[kk] == my_fwd_graph[kf]) {
flag_match = true;
break;
}
}
if (!flag_match) {
my_out_graph[k] = my_fwd_graph[kf];
k += 1;
}
}
kf += 1;
if (k >= output_graph_degree) break;
}
if (kr < output_graph_degree) {
if (my_rev_graph[kr] < graph_size) {
auto flag_match = false;
for (uint32_t kk = 0; kk < k; kk++) {
if (my_out_graph[kk] == my_rev_graph[kr]) {
flag_match = true;
break;
}
}
if (!flag_match) {
my_out_graph[k] = my_rev_graph[kr];
/*
* template <typename IdxT>
* RAFT_INLINE_FUNCTION bool check_duplicate_edge(IdxT* neighbors_ptr, uint32_t k, IdxT node_id)
* {
* for (uint32_t i = 0; i < k; i++)
* if (neighbors_ptr[i] == node_id) return true;
* return false;
* }
*/
uint32_t kf = 0;
uint32_t kr = 0;
uint32_t k = mst_graph_num_edges_ptr[i];
while (kf < output_graph_degree || kr < output_graph_degree) {
if (kf < output_graph_degree) {
if (pruned_graph(i, kf) < graph_size) {
if (!check_duplicate_edges(&output_graph(i,0), k, pruned_graph(i, kf)) {
output_graph(i, k) = pruned_graph(i, kf);
k += 1;
}
}
kf += 1;
if (k >= output_graph_degree) break;
}
if (kr < output_graph_degree) {
if (rev_graph(i, kr) < graph_size) {
if (!check_duplicate_edges(&output_graph(i,0), k, rev_graph(i, kr) {
my_out_graph[k] = my_rev_graph[kr];

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do I understand correctly, that we leave the MST in output_graph, and append edges from the pruned and reverse graph? Can we add this to the PR description, and possibly as a comment here as well.

auto my_out_graph = output_graph_ptr + (output_graph_degree * i);
uint32_t kf = 0;
uint32_t kr = 0;
uint32_t k = mst_graph_num_edges_ptr[i];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code below keeps k edges from the MST, and fills the rest from the forward and reverse graph. Typically how large is k compared to output_graph_degree?

@cjnolet
Copy link
Member

cjnolet commented May 17, 2024

@anaruse now that CAGRA has been migrated to cuVS, this PR will also have to be moved to cuVS when ready (no rush, obviously)

@cjnolet cjnolet added the 5 - DO NOT MERGE Hold off on merging; see PR for details label May 17, 2024
@cjnolet
Copy link
Member

cjnolet commented May 21, 2024

@tfeher @anaruse this is definiely a welcome change but just a heads up that since we've moved CAGRA over to cuVS, we'll need to migrate these changes over as well.

@cjnolet
Copy link
Member

cjnolet commented Jul 24, 2024

Cleaining up stale PRs. Please feel free to reopen if you would prefer this isn't closed yet.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
5 - DO NOT MERGE Hold off on merging; see PR for details cpp improvement Improvement / enhancement to an existing function non-breaking Non-breaking change
Projects
Development

Successfully merging this pull request may close these issues.

[FEA] Improve connectivity of CAGRA graph index for search
3 participants