Handle degenerate case where all HNSW search candidates are filtered #11787

msokolov · 2022-09-19T17:30:49Z

Description

This test failure reproduces every time. What seems to happen is that we search with a filter that retains more than 50% of documents, yet we hit an unlucky condition: the graph is not fully connected, and every candidate node we visit gets filtered out, so we end up with 0 results. It's a degenerate case that is unlikely to arise in a real graph, but it seems we ought to have some kind of fallback to exact search for it.

./gradlew :lucene:core:test --tests "org.apache.lucene.search.TestKnnVectorQuery.testFilterWithSameScore" -Ptests.jvms=1 -Ptests.jvmargs=-XX:TieredStopAtLevel=1 -Ptests.seed=C5E04AD69C13E006 -Ptests.gui=true -Ptests.file.encoding=ISO-8859-1

Version and environment details

No response

msokolov · 2022-09-19T22:56:40Z

This test is really testing a pathological case ... when the vectors are all the same, everything is equidistant from everything else and "nearest neighbor" ceases to mean anything. I'm not sure we should have this test other than to verify that there is no crash. Maybe I'm misunderstanding, but what is the test really asserting?

jtibshirani · 2022-09-19T23:41:30Z

Thanks for digging into this! I added this test to exercise the tie-breaking logic. But now I think it wasn't a good idea -- HNSW is known to exhibit very poor performance when vectors are duplicated. And this test takes it to an extreme! It's not really a scenario we support well.

Maybe we could just remove this test. It wasn't critical, and I could always follow-up with a better way to test tie-breaking.

msokolov · 2022-09-20T00:13:54Z

We could keep the test if we did #11790, which would cause a fallback to a full scan in this kind of case. That seems like a reasonable fallback to me.
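The failure mode from the description, and the fallback proposed here, can be sketched with a toy example. This is a minimal sketch under stated assumptions, not Lucene's implementation: all class and method names are hypothetical, a plain breadth-first traversal stands in for HNSW's greedy search, and ties between equal scores are broken by ascending doc id (an assumption of this sketch). The graph has two disconnected components, the filter keeps 60% of documents, and all of the kept documents sit in the component the search never reaches. The graph search returns nothing, which is treated as "incomplete" (fewer than topK found, in the spirit of #11790) and triggers an exact scan over the filtered documents.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.BitSet;
import java.util.Comparator;
import java.util.List;

// Hypothetical toy code, not Lucene's API.
public class FilteredSearchFallback {

    // Dot-product similarity; identical vectors always get identical scores.
    static float dot(float[] a, float[] b) {
        float s = 0f;
        for (int i = 0; i < a.length; i++) {
            s += a[i] * b[i];
        }
        return s;
    }

    // Keep the k best docs: descending score, ties broken by ascending doc id (assumed rule).
    static List<Integer> topK(List<Integer> docs, float[][] vectors, float[] query, int k) {
        List<Integer> sorted = new ArrayList<>(docs);
        sorted.sort(Comparator.<Integer>comparingDouble(d -> -dot(vectors[d], query))
                .thenComparingInt(d -> d));
        return new ArrayList<>(sorted.subList(0, Math.min(k, sorted.size())));
    }

    // Stand-in for approximate graph search: traverse from the entry point and collect only
    // docs the filter accepts. Nodes in other connected components are never visited.
    static List<Integer> graphSearch(float[][] vectors, int[][] neighbors, int entryPoint,
                                     float[] query, BitSet accept, int k) {
        BitSet visited = new BitSet(vectors.length);
        ArrayDeque<Integer> queue = new ArrayDeque<>();
        List<Integer> accepted = new ArrayList<>();
        queue.add(entryPoint);
        visited.set(entryPoint);
        while (!queue.isEmpty()) {
            int node = queue.poll();
            if (accept.get(node)) {
                accepted.add(node);
            }
            for (int nbr : neighbors[node]) {
                if (!visited.get(nbr)) {
                    visited.set(nbr);
                    queue.add(nbr);
                }
            }
        }
        return topK(accepted, vectors, query, k);
    }

    // Exact search: score every document the filter accepts.
    static List<Integer> exactSearch(float[][] vectors, float[] query, BitSet accept, int k) {
        List<Integer> accepted = new ArrayList<>();
        for (int d = accept.nextSetBit(0); d >= 0; d = accept.nextSetBit(d + 1)) {
            accepted.add(d);
        }
        return topK(accepted, vectors, query, k);
    }

    // Fall back to exact search when the graph search finds fewer results than it could have.
    static List<Integer> search(float[][] vectors, int[][] neighbors, int entryPoint,
                                float[] query, BitSet accept, int k) {
        List<Integer> results = graphSearch(vectors, neighbors, entryPoint, query, accept, k);
        if (results.size() < Math.min(k, accept.cardinality())) {
            return exactSearch(vectors, query, accept, k);
        }
        return results;
    }

    public static void main(String[] args) {
        // Five identical vectors; components {0, 1} and {2, 3, 4}.
        float[][] vectors = {{1f, 0f}, {1f, 0f}, {1f, 0f}, {1f, 0f}, {1f, 0f}};
        int[][] neighbors = {{1}, {0}, {3, 4}, {2, 4}, {2, 3}};
        BitSet accept = new BitSet();
        accept.set(2, 5); // filter keeps docs 2, 3, 4: all unreachable from entry point 0
        float[] query = {1f, 0f};
        System.out.println(graphSearch(vectors, neighbors, 0, query, accept, 2)); // prints []
        System.out.println(search(vectors, neighbors, 0, query, accept, 2));      // prints [2, 3]
    }
}
```

Comparing against `Math.min(k, accept.cardinality())` rather than plain `k` avoids a pointless exact scan when the filter itself matches fewer than k documents, since no search could return more. Because every vector is identical, the exact results are decided purely by the tie-breaking rule, which is why a test like `testFilterWithSameScore` mostly exercises tie-breaking rather than nearest-neighbor quality.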

msokolov added the type:bug label Sep 19, 2022

msokolov mentioned this issue Sep 19, 2022

Fix bugs in HNSW diversity check introduced in LUCENE-10577 #11782

Closed

jpountz mentioned this issue Sep 20, 2022

Diversity check bugfix #11781

Merged

jtibshirani mentioned this issue Sep 20, 2022

Mark HNSW search results incomplete when fewer than topK are found #11790

Closed

benwtrent added a commit to benwtrent/lucene that referenced this issue Dec 12, 2024

Re-enabling test muted in apache#11787

a70cdd3

benwtrent added a commit that referenced this issue Dec 12, 2024

Re-enabling test muted in #11787 (#14061)

1a931e6
