Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] org.opensearch.search.query.SearchQueryIT.testCommonTermsQuery is flaky for concurrent search as well as non-concurrent search case. #11208

Closed
kasundra07 opened this issue Nov 15, 2023 · 8 comments · Fixed by #11233
Labels
bug Something isn't working flaky-test Random test failure that succeeds on second run

Comments

@kasundra07
Copy link
Contributor

Describe the bug
These test cases are flaky -

  • SearchQueryIT.testCommonTermsQuery [p0={"search.concurrent_segment_search.enabled":"true"}]]
  • SearchQueryIT.testCommonTermsQuery [p0={"search.concurrent_segment_search.enabled":"false"}]]

To Reproduce

REPRODUCE WITH: ./gradlew 'null' --tests "org.opensearch.search.query.SearchQueryIT" -Dtests.method="testCommonTermsQuery [p0={"search.concurrent_segment_search.enabled":"true"}]" -Dtests.seed=D490856B1361B346 -Dtests.locale=bg -Dtests.timezone=America/Mexico_City -Druntime.java=11

java.lang.AssertionError: Count is 2 hits but 1 was expected.  Total shards: 1 Successful shards: 1 & 0 shard failures:

	at __randomizedtesting.SeedInfo.seed([D490856B1361B346:EF5F4A1249FEA20A]:0)
	at org.junit.Assert.fail(Assert.java:89)
	at org.opensearch.test.hamcrest.OpenSearchAssertions.assertHitCount(OpenSearchAssertions.java:303)
	at org.opensearch.search.query.SearchQueryIT.testCommonTermsQuery(SearchQueryIT.java:401)
	at jdk.internal.reflect.GeneratedMethodAccessor25.invoke(Unknown Source)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
	at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1750)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:938)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:974)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:988)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at org.junit.rules.RunRules.evaluate(RunRules.java:20)
	at org.apache.lucene.tests.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:48)
	at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
	at org.apache.lucene.tests.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
	at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
	at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
	at org.junit.rules.RunRules.evaluate(RunRules.java:20)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:817)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:468)
	at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:947)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:832)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:883)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:894)
	at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at org.apache.lucene.tests.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
	at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
	at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at org.apache.lucene.tests.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
	at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
	at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
	at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
	at org.apache.lucene.tests.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:47)
	at org.junit.rules.RunRules.evaluate(RunRules.java:20)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368)
	at java.base/java.lang.Thread.run(Thread.java:829)

REPRODUCE WITH: ./gradlew 'null' --tests "org.opensearch.search.query.SearchQueryIT" -Dtests.method="testCommonTermsQuery [p0={"search.concurrent_segment_search.enabled":"false"}]" -Dtests.seed=D490856B1361B346 -Dtests.locale=bg -Dtests.timezone=America/Mexico_City -Druntime.java=11

java.lang.AssertionError: Count is 2 hits but 1 was expected.  Total shards: 1 Successful shards: 1 & 0 shard failures:

	at __randomizedtesting.SeedInfo.seed([D490856B1361B346:EF5F4A1249FEA20A]:0)
	at org.junit.Assert.fail(Assert.java:89)
	at org.opensearch.test.hamcrest.OpenSearchAssertions.assertHitCount(OpenSearchAssertions.java:303)
	at org.opensearch.search.query.SearchQueryIT.testCommonTermsQuery(SearchQueryIT.java:401)
	at jdk.internal.reflect.GeneratedMethodAccessor25.invoke(Unknown Source)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
	at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1750)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:938)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:974)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:988)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at org.junit.rules.RunRules.evaluate(RunRules.java:20)
	at org.apache.lucene.tests.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:48)
	at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
	at org.apache.lucene.tests.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
	at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
	at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
	at org.junit.rules.RunRules.evaluate(RunRules.java:20)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:817)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:468)
	at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:947)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:832)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:883)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:894)
	at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at org.apache.lucene.tests.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
	at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
	at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at org.apache.lucene.tests.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
	at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
	at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
	at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
	at org.apache.lucene.tests.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:47)
	at org.junit.rules.RunRules.evaluate(RunRules.java:20)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368)
	at java.base/java.lang.Thread.run(Thread.java:829)

Expected behavior
The test must always pass for both the cases.

Additional context
https://build.ci.opensearch.org/job/gradle-check/29721/

@andrross
Copy link
Member

Pretty sure I've tracked this down to commit bc74731. I can't reproduce the error on the commit before that, but after bc74731 I can reproduce within 10 or 20 attempts. It is still reproducible after the latest commit on main. It seems to fail for both concurrent and non-concurrent cases. @neetikasinghal can you take a look?

@jed326
Copy link
Collaborator

jed326 commented Nov 15, 2023

Was able to confirm on the test seed above D490856B1361B346.
Tests pass when I comment out the following:

if (dummyDocuments) {
indexRandomForMultipleSlices(indicesArray);
}

However with those changes the test is flaky.

This is true for both concurrent search enabled and disabled cases.

@jed326
Copy link
Collaborator

jed326 commented Nov 15, 2023

Query:

{
  "common" : {
    "field1" : {
      "query" : "the huge fox",
      "high_freq_operator" : "OR",
      "low_freq_operator" : "OR",
      "cutoff_frequency" : 0.01,
      "minimum_should_match" : {
        "low_freq" : "2"
      },
      "boost" : 1.0
    }
  }
}

Flaky Result:

{
    "took": 2,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 2,
            "relation": "eq"
        },
        "max_score": 1.3157668,
        "hits": [{
            "_index": "test",
            "_id": "2",
            "_score": 1.3157668,
            "_source": {
                "field1": "the quick lazy huge brown fox jumps over the tree"
            }
        }, {
            "_index": "test",
            "_id": "1",
            "_score": 1.1068254,
            "_source": {
                "field1": "the quick brown fox"
            }
        }]
    }
}

Correct Result:

{
    "took": 3,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 1,
            "relation": "eq"
        },
        "max_score": 1.3157668,
        "hits": [{
            "_index": "test",
            "_id": "2",
            "_score": 1.3157668,
            "_source": {
                "field1": "the quick lazy huge brown fox jumps over the tree"
            }
        }]
    }
}

@jed326
Copy link
Collaborator

jed326 commented Nov 15, 2023

Match All query in the flaky scenario does show the expected 3 docs in the index and no bogus docs but it seems like somehow the score for doc _id 1 is high enough to get returned in the flaky case sometimes.

Additionally, I did notice that this test is using a 1/1 shard configuration. After setting the index to 0 primaries I ran the test on the same seed D490856B1361B346 500 times and saw no failures. Before we would see around 20% of the tests fail in this flaky manner.

Current hypothesis is that the replica shard is not being refreshed even when the primary shard is so search requests routed to the replica will encounter this issue. However, the _refresh API is supposed to be synchronous.

@jed326
Copy link
Collaborator

jed326 commented Nov 15, 2023

I wonder if there is an issue with _refresh not waiting on the replicas then -- I'm not sure how else to explain why removing the replicas could fix the flakiness.

@andrross @reta do you know of any issues like this related to _refresh off the top of your head?

@jed326
Copy link
Collaborator

jed326 commented Nov 16, 2023

From TRACE logs I can see that there is test flakiness even when the search request is hitting the primary. Attaching some sample logs:

testCommonTermsQuery_1.txt
testCommonTermsQuery_2.txt

Both are flaky test examples, in the first we see the search request hit the replica and in the second we see the search request hit the primary.

Quick excerpt:

[2023-11-15T19:32:50,238][TRACE][o.o.a.a.i.r.TransportShardRefreshAction] [node_s5] [test][0] refresh request executed on primary
[2023-11-15T19:32:50,318][TRACE][o.o.a.a.i.r.TransportShardRefreshAction] [node_s3] [test][0] refresh request executed on replica
[2023-11-15T19:32:50,325][DEBUG][o.o.s.q.QueryPhaseSearcherWrapper] [node_s3] Using non-concurrent aggregation processor over segments for request with context id [-ftBpaDxQ6-s9uvgqbm-1A][1]
[2023-11-15T19:32:50,325][DEBUG][o.o.s.q.QueryPhaseSearcherWrapper] [node_s3] Using non-concurrent search over segments for request with context id [-ftBpaDxQ6-s9uvgqbm-1A][1]
[2023-11-15T19:32:50,325][TRACE][o.o.s.f.FetchPhase       ] [node_s3] [test][0] source[{"timeout":"1d","query":{"match_all":{"boost":1.0}}}], id[], 

cluster uuid: 0G3YDLHTTpmWJZrGFEBIwA [committed: true]
version: 25
state uuid: MmpCNYRNRHGpyFVnnr2LBw
from_diff: false
meta data version: 20
   coordination_metadata:
      term: 1
      last_committed_config: VotingConfiguration{qyM6N6JnTzC20m5eT6YiNQ,VewvEbaXQMeeqnL9EVM1cw,xeiWmyoIStSZl02qvB3WsA}
      last_accepted_config: VotingConfiguration{qyM6N6JnTzC20m5eT6YiNQ,VewvEbaXQMeeqnL9EVM1cw,xeiWmyoIStSZl02qvB3WsA}
      voting tombstones: []
   [test/VYTqcztNQ8mZP5spA2-HmQ]: v[5], mv[2], sv[1], av[1]
      0: p_term [1], isa_ids [oIyDaBU5RJCXL98jM8pOHg, xRjt2CCHQuiHlKD45x631A]
metadata customs:
   index-graveyard: IndexGraveyard[[[index=[test/RxQIvuKTRjSPkdTQFWDwqA], deleteDate=2023-11-16T01:32:48.222Z]]]
nodes: 
   {node_s4}{OhN8Eeh3SuiVD7m3P0ly7Q}{_UyNQsmGSXehWfb1VP1XxA}{127.0.0.1}{127.0.0.1:61514}{dir}{shard_indexing_pressure_enabled=true}
   {node_s5}{pWGsJ26ZTSmfrCNxbu-KoQ}{ykC9RMfESaGahq9O7sFFyw}{127.0.0.1}{127.0.0.1:61513}{dir}{shard_indexing_pressure_enabled=true}
   {node_s1}{xeiWmyoIStSZl02qvB3WsA}{yI5sjc5dRxmWyVaz3t3pxA}{127.0.0.1}{127.0.0.1:61517}{imr}{shard_indexing_pressure_enabled=true}
   {node_s0}{qyM6N6JnTzC20m5eT6YiNQ}{9x3UO5FBTu6JcDHHjxzGAQ}{127.0.0.1}{127.0.0.1:61515}{imr}{shard_indexing_pressure_enabled=true}, local, cluster-manager
   {node_s2}{VewvEbaXQMeeqnL9EVM1cw}{f9oUv8jDQpux45DKb3cF2A}{127.0.0.1}{127.0.0.1:61512}{imr}{shard_indexing_pressure_enabled=true}
   {node_s3}{9SWyj4IxQoKQhSS_1-SQNQ}{YIS1ZCFbQROMwtYM1dq4xg}{127.0.0.1}{127.0.0.1:61516}{dir}{shard_indexing_pressure_enabled=true}
routing_table (version 11):
-- index [[test/VYTqcztNQ8mZP5spA2-HmQ]]
----shard_id [test][0]
--------[test][0], node[pWGsJ26ZTSmfrCNxbu-KoQ], [P], s[STARTED], a[id=oIyDaBU5RJCXL98jM8pOHg]
--------[test][0], node[9SWyj4IxQoKQhSS_1-SQNQ], [R], s[STARTED], a[id=xRjt2CCHQuiHlKD45x631A]

routing_nodes:
-----node_id[pWGsJ26ZTSmfrCNxbu-KoQ][V]
--------[test][0], node[pWGsJ26ZTSmfrCNxbu-KoQ], [P], s[STARTED], a[id=oIyDaBU5RJCXL98jM8pOHg]
-----node_id[9SWyj4IxQoKQhSS_1-SQNQ][V]
--------[test][0], node[9SWyj4IxQoKQhSS_1-SQNQ], [R], s[STARTED], a[id=xRjt2CCHQuiHlKD45x631A]
-----node_id[OhN8Eeh3SuiVD7m3P0ly7Q][V]
---- unassigned
[2023-11-15T19:32:53,886][TRACE][o.o.a.a.i.r.TransportShardRefreshAction] [node_s5] [test][0] refresh request executed on primary
[2023-11-15T19:32:53,980][TRACE][o.o.a.a.i.r.TransportShardRefreshAction] [node_s3] [test][0] refresh request executed on replica
[2023-11-15T19:32:53,988][DEBUG][o.o.s.q.QueryPhaseSearcherWrapper] [node_s5] Using non-concurrent aggregation processor over segments for request with context id [m2pJBLRpTdKCot2Vjykfzg][5]
[2023-11-15T19:32:53,988][DEBUG][o.o.s.q.QueryPhaseSearcherWrapper] [node_s5] Using non-concurrent search over segments for request with context id [m2pJBLRpTdKCot2Vjykfzg][5]
[2023-11-15T19:32:53,989][TRACE][o.o.s.f.FetchPhase       ] [node_s5] [test][0] source[{"query":{"common":{"field1":{"query":"the huge fox","high_freq_operator":"OR","low_freq_operator":"OR","cutoff_frequency":0.01,"minimum_should_match":{"low_freq":"2"},"boost":1.0}}}}], id[], 

luster uuid: 0G3YDLHTTpmWJZrGFEBIwA [committed: true]
version: 49
state uuid: xWMpjrUtT_Gyv5QeRAdHig
from_diff: false
meta data version: 42
   coordination_metadata:
      term: 1
      last_committed_config: VotingConfiguration{qyM6N6JnTzC20m5eT6YiNQ,VewvEbaXQMeeqnL9EVM1cw,xeiWmyoIStSZl02qvB3WsA}
      last_accepted_config: VotingConfiguration{qyM6N6JnTzC20m5eT6YiNQ,VewvEbaXQMeeqnL9EVM1cw,xeiWmyoIStSZl02qvB3WsA}
      voting tombstones: []
   [test/o5D6DEvlS5e2pYuTaBIphQ]: v[5], mv[2], sv[1], av[1]
      0: p_term [1], isa_ids [4eP3v5KMTreJgsW3xhmU_Q, oFE2grSERpGUPOylAATlmQ]
metadata customs:
   index-graveyard: IndexGraveyard[[[index=[test/RxQIvuKTRjSPkdTQFWDwqA], deleteDate=2023-11-16T01:32:48.222Z], [index=[test/VYTqcztNQ8mZP5spA2-HmQ], deleteDate=2023-11-16T01:32:50.437Z], [index=[test/LnMqX2UcRSew8jYZFqbOPA], deleteDate=2023-11-16T01:32:52.198Z]]]
nodes: 
   {node_s4}{OhN8Eeh3SuiVD7m3P0ly7Q}{_UyNQsmGSXehWfb1VP1XxA}{127.0.0.1}{127.0.0.1:61514}{dir}{shard_indexing_pressure_enabled=true}
   {node_s5}{pWGsJ26ZTSmfrCNxbu-KoQ}{ykC9RMfESaGahq9O7sFFyw}{127.0.0.1}{127.0.0.1:61513}{dir}{shard_indexing_pressure_enabled=true}
   {node_s1}{xeiWmyoIStSZl02qvB3WsA}{yI5sjc5dRxmWyVaz3t3pxA}{127.0.0.1}{127.0.0.1:61517}{imr}{shard_indexing_pressure_enabled=true}
   {node_s0}{qyM6N6JnTzC20m5eT6YiNQ}{9x3UO5FBTu6JcDHHjxzGAQ}{127.0.0.1}{127.0.0.1:61515}{imr}{shard_indexing_pressure_enabled=true}, local, cluster-manager
   {node_s2}{VewvEbaXQMeeqnL9EVM1cw}{f9oUv8jDQpux45DKb3cF2A}{127.0.0.1}{127.0.0.1:61512}{imr}{shard_indexing_pressure_enabled=true}
   {node_s3}{9SWyj4IxQoKQhSS_1-SQNQ}{YIS1ZCFbQROMwtYM1dq4xg}{127.0.0.1}{127.0.0.1:61516}{dir}{shard_indexing_pressure_enabled=true}
routing_table (version 25):
-- index [[test/o5D6DEvlS5e2pYuTaBIphQ]]
----shard_id [test][0]
--------[test][0], node[pWGsJ26ZTSmfrCNxbu-KoQ], [P], s[STARTED], a[id=oFE2grSERpGUPOylAATlmQ]
--------[test][0], node[9SWyj4IxQoKQhSS_1-SQNQ], [R], s[STARTED], a[id=4eP3v5KMTreJgsW3xhmU_Q]

routing_nodes:
-----node_id[pWGsJ26ZTSmfrCNxbu-KoQ][V]
--------[test][0], node[pWGsJ26ZTSmfrCNxbu-KoQ], [P], s[STARTED], a[id=oFE2grSERpGUPOylAATlmQ]
-----node_id[9SWyj4IxQoKQhSS_1-SQNQ][V]
--------[test][0], node[9SWyj4IxQoKQhSS_1-SQNQ], [R], s[STARTED], a[id=4eP3v5KMTreJgsW3xhmU_Q]
-----node_id[OhN8Eeh3SuiVD7m3P0ly7Q][V]
---- unassigned

@jed326
Copy link
Collaborator

jed326 commented Nov 16, 2023

It seems like there's an underlying problem with this query type itself instead of anything related to the new bogus docs indexing.
I added a match all query with both primary and replica preference before the test query here and even in the flaky test cases the match all query is showing no bogus docs in either replica or primary.

@andrross
Copy link
Member

andrross commented Nov 16, 2023

Thanks @jed326. Today @msfroh and I spent some time on this. The addition of extra deleted docs does indeed change the behavior of one of the queries in a really non-obvious way that was difficult to track down. I'll post a PR tomorrow that should fix the test.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working flaky-test Random test failure that succeeds on second run
Projects
Status: Done
Development

Successfully merging a pull request may close this issue.

3 participants