Failing ES Promotion: discover app discover tab field data search php should show the correct hit count #104466

Closed
mistic opened this issue Jul 6, 2021 · 19 comments · Fixed by #104571
Labels
blocker, failed-es-promotion, impact:critical, loe:medium, skipped-test, v7.15.0, v8.0.0

Comments

mistic (Member) commented Jul 6, 2021

This failure is preventing the promotion of the current Elasticsearch nightly snapshot.

For more information on the Elasticsearch snapshot promotion process, including how to reproduce using the unverified nightly ES build, see: https://www.elastic.co/guide/en/kibana/master/development-es-snapshots.html

CI failure logs: https://kibana-ci.elastic.co/job/elasticsearch+snapshots+verify/3032/testReport/junit/Chrome%20UI%20Functional%20Tests/test_functional_apps_discover__field_data%C2%B7ts/Kibana_Pipeline___discover_app_discover_tab_field_data_search_php_should_show_the_correct_hit_count_2/

Error: retry.try timeout: Error: expected '14,004' to equal '445'
    at Assertion.assert (node_modules/@kbn/expect/expect.js:100:11)
    at Assertion.be.Assertion.equal (node_modules/@kbn/expect/expect.js:227:8)
    at Assertion.be (node_modules/@kbn/expect/expect.js:69:22)
    at /dev/shm/workspace/parallel/23/kibana/test/functional/apps/discover/_field_data.ts:43:31
    at runMicrotasks (<anonymous>)
    at processTicksAndRejections (internal/process/task_queues.js:95:5)
    at runAttempt (test/common/services/retry/retry_for_success.ts:27:15)
    at retryForSuccess (test/common/services/retry/retry_for_success.ts:66:21)
    at RetryService.try (test/common/services/retry/retry.ts:31:12)
    at Context.<anonymous> (test/functional/apps/discover/_field_data.ts:39:9)
    at Object.apply (node_modules/@kbn/test/target_node/functional_test_runner/lib/mocha/wrap_function.js:87:16)
    at onFailure (test/common/services/retry/retry_for_success.ts:17:9)
    at retryForSuccess (test/common/services/retry/retry_for_success.ts:57:13)
    at RetryService.try (test/common/services/retry/retry.ts:31:12)
    at Context.<anonymous> (test/functional/apps/discover/_field_data.ts:39:9)
    at Object.apply (node_modules/@kbn/test/target_node/functional_test_runner/lib/mocha/wrap_function.js:87:16)
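
For context, the check that times out is roughly of the following shape. This is a paraphrased sketch based on the stack trace above; the page-object helper name and the import path are approximate, not the exact contents of _field_data.ts:

// Paraphrased sketch of the failing check in test/functional/apps/discover/_field_data.ts
import expect from '@kbn/expect';
import { FtrProviderContext } from '../../ftr_provider_context'; // approximate relative path

export default function ({ getService, getPageObjects }: FtrProviderContext) {
  const retry = getService('retry');                  // RetryService seen in the trace
  const PageObjects = getPageObjects(['discover']);

  describe('discover tab', () => {
    it('field data search php should show the correct hit count', async () => {
      const expectedHitCount = '445';
      await retry.try(async () => {
        // getHitCount() is an approximate page-object helper name; the stale value
        // '14,004' from a previous search keeps being returned, so retry.try
        // eventually times out with "expected '14,004' to equal '445'".
        expect(await PageObjects.discover.getHitCount()).to.be(expectedHitCount);
      });
    });
  });
}
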
elasticmachine (Contributor) commented:

Pinging @elastic/kibana-app-services (Team:AppServices)

tylersmalley (Contributor) commented:

@thelink2012, I bisected the changes, and it looks like this might have been caused by elastic/elasticsearch@1fc09f9

Mind taking a look?

thelink2012 commented:

Sure @tylersmalley, I can take a look at this by the weekend.

Is the number that's coming up wrong (14,004 vs. 445) something that comes from the highlighting section of an ES query response, or from another section of the response?

tylersmalley (Contributor) commented Jul 7, 2021

@thelink2012, I believe it's actually the hit count - @elastic/kibana-app-services should be able to provide more insights.

Here is a screenshot of the failure - looks like it's also raising an offset out of bounds error that could be related.

tsullivan (Member) commented:

> Here is a screenshot of the failure - looks like it's also raising an offset out of bounds error that could be related.

@tylersmalley the link has expired, FYI.

tylersmalley (Contributor) commented:

Will embed it this time:

[screenshot]

tsullivan (Member) commented Jul 7, 2021

I ran the test locally using the latest master of Elasticsearch:

Started server:

node scripts/functional_tests_server.js --esFrom source

Ran the test:

node scripts/functional_test_runner --grep='discover tab field data search php should show the correct hit count'

I saw the test was failing on this URL: http://localhost:5620/app/discover?_t=1625679054850#/?_g=(filters:!(),refreshInterval:(pause:!t,value:0),time:(from:'2015-09-18T06:31:44.000Z',to:'2015-09-23T18:31:44.000Z'))&_a=(columns:!(_source),filters:!(),index:'logstash-*',interval:auto,query:(language:kuery,query:php),sort:!(!('@timestamp',desc)))

I stopped the test runner and opened that URL in a new window.

It looks like when the test runs, the hit count in Discover shows 14,004 hits because it is displaying a stale number from a previous test. When you open this URL directly in a browser, no UI is shown - just the error:

[screenshot]

I'm not seeing any errors in the console, or any contextual info shown in the error toast.

I searched in the code to find where this error gets formatted, and I found:

src/plugins/data/public/search/search_interceptor/search_interceptor.ts
421:          defaultMessage: 'Cannot retrieve search results',

I added a console.log(e) on line 414 of that file and refreshed the browser, and I could see the stringified error in the console. I clicked on the originating file of that console log, added a breakpoint, and hit Refresh again. This let me inspect the error in the browser console:

[screenshot]

Here is the text of the error:

{
  "attributes": {
    "caused_by": {
      "caused_by": {
        "caused_by": {
          "reason": "offset out of bounds",
          "type": "illegal_argument_exception"
        },
        "reason": "offset out of bounds",
        "type": "illegal_argument_exception"
      },
      "failed_shards": [
        {
          "index": "logstash-2015.09.20",
          "node": "cI5ZLHWUSneLUTf8gBuhXQ",
          "reason": {
            "reason": "offset out of bounds",
            "type": "illegal_argument_exception"
          },
          "shard": 0
        },
        {
          "index": "logstash-2015.09.21",
          "node": "cI5ZLHWUSneLUTf8gBuhXQ",
          "reason": {
            "reason": "offset out of bounds",
            "type": "illegal_argument_exception"
          },
          "shard": 0
        },
        {
          "index": "logstash-2015.09.22",
          "node": "cI5ZLHWUSneLUTf8gBuhXQ",
          "reason": {
            "reason": "offset out of bounds",
            "type": "illegal_argument_exception"
          },
          "shard": 0
        }
      ],
      "grouped": true,
      "phase": "fetch",
      "reason": "all shards failed",
      "type": "search_phase_execution_exception"
    },
    "reason": "error while executing search",
    "type": "status_exception"
  },
  "err": {
    "attributes": {
      "caused_by": {
        "caused_by": {
          "caused_by": {
            "reason": "offset out of bounds",
            "type": "illegal_argument_exception"
          },
          "reason": "offset out of bounds",
          "type": "illegal_argument_exception"
        },
        "failed_shards": [
          {
            "index": "logstash-2015.09.20",
            "node": "cI5ZLHWUSneLUTf8gBuhXQ",
            "reason": {
              "reason": "offset out of bounds",
              "type": "illegal_argument_exception"
            },
            "shard": 0
          },
          {
            "index": "logstash-2015.09.21",
            "node": "cI5ZLHWUSneLUTf8gBuhXQ",
            "reason": {
              "reason": "offset out of bounds",
              "type": "illegal_argument_exception"
            },
            "shard": 0
          },
          {
            "index": "logstash-2015.09.22",
            "node": "cI5ZLHWUSneLUTf8gBuhXQ",
            "reason": {
              "reason": "offset out of bounds",
              "type": "illegal_argument_exception"
            },
            "shard": 0
          }
        ],
        "grouped": true,
        "phase": "fetch",
        "reason": "all shards failed",
        "type": "search_phase_execution_exception"
      },
      "reason": "error while executing search",
      "type": "status_exception"
    },
    "message": "status_exception",
    "statusCode": 400
  }
}
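
The interesting part is buried a few caused_by levels deep. Purely as an illustration (this helper is not Kibana code), something like the following can walk that chain and surface the root reason:

// Hypothetical helper: walk the nested `caused_by` chain of an Elasticsearch
// error payload and return the deepest type/reason pair.
interface EsCause {
  type?: string;
  reason?: string;
  caused_by?: EsCause;
}

function rootCause(cause: EsCause): { type?: string; reason?: string } {
  let current = cause;
  while (current.caused_by) {
    current = current.caused_by;
  }
  return { type: current.type, reason: current.reason };
}

// Applied to the `attributes` object above, this prints:
//   { type: 'illegal_argument_exception', reason: 'offset out of bounds' }
console.log(
  rootCause({
    type: 'status_exception',
    reason: 'error while executing search',
    caused_by: {
      type: 'search_phase_execution_exception',
      reason: 'all shards failed',
      caused_by: {
        type: 'illegal_argument_exception',
        reason: 'offset out of bounds',
        caused_by: { type: 'illegal_argument_exception', reason: 'offset out of bounds' },
      },
    },
  })
);
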

Here is the request causing the error:

curl 'http://localhost:5620/internal/bsearch' \
  -H 'Connection: keep-alive' \
  -H 'sec-ch-ua: " Not;A Brand";v="99", "Google Chrome";v="91", "Chromium";v="91"' \
  -H 'DNT: 1' \
  -H 'sec-ch-ua-mobile: ?0' \
  -H 'User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.114 Safari/537.36' \
  -H 'kbn-version: 8.0.0' \
  -H 'X-Chunk-Encoding: deflate' \
  -H 'Content-Type: application/json' \
  -H 'Accept: */*' \
  -H 'Origin: http://localhost:5620' \
  -H 'Sec-Fetch-Site: same-origin' \
  -H 'Sec-Fetch-Mode: cors' \
  -H 'Sec-Fetch-Dest: empty' \
  -H 'Referer: http://localhost:5620/app/discover?_t=1625679054850' \
  -H 'Accept-Language: en-US,en;q=0.9' \
  --data-raw '{"batch":[{"request":{"params":{"index":"logstash-*","body":{"size":500,"sort":[{"@timestamp":{"order":"desc","unmapped_type":"boolean"}}],"version":true,"aggs":{"2":{"date_histogram":{"field":"@timestamp","fixed_interval":"3h","time_zone":"UTC","min_doc_count":1}}},"fields":[{"field":"@timestamp","format":"date_time"},{"field":"relatedContent.article:modified_time","format":"date_time"},{"field":"relatedContent.article:published_time","format":"date_time"},{"field":"utc_time","format":"date_time"}],"script_fields":{},"stored_fields":["*"],"runtime_mappings":{},"_source":{"excludes":[]},"query":{"bool":{"must":[],"filter":[{"multi_match":{"type":"best_fields","query":"php","lenient":true}},{"range":{"@timestamp":{"gte":"2015-09-18T06:31:44.000Z","lte":"2015-09-23T18:31:44.000Z","format":"strict_date_optional_time"}}}],"should":[],"must_not":[]}},"highlight":{"pre_tags":["@kibana-highlighted-field@"],"post_tags":["@/kibana-highlighted-field@"],"fields":{"*":{}},"fragment_size":2147483647}},"track_total_hits":true,"preference":1625679751040}},"options":{"sessionId":"36f6f740-275d-4748-81cd-6743064b4ee0","isRestore":false,"strategy":"ese","isStored":false}}]}' \
  --compressed

tsullivan (Member) commented:

Hi, who should take ownership of this issue? It's not clear whether the UI is responsible for causing this error, but it doesn't look like it.

exalate-issue-sync bot added the impact:critical and loe:small labels on Jul 8, 2021
thelink2012 commented:

Hey. I also think this is probably not a UI/Kibana issue. It's probably some characteristic of the indexed data that's triggering a corner case that went unnoticed in the highlighting change in Elasticsearch. Like I said, I can look into it further over the weekend (your step-by-step guide will be invaluable, thank you), but I'd support whoever wishes to look into this full-time.

spalger added a commit that referenced this issue Jul 9, 2021

spalger (Contributor) commented Jul 10, 2021

Okay, this is impacting a lot of discover tests, and after four attempts to squash all the failing suites I've just committed a skip for the entire set of discover tests until we can get this resolved:

master/8.0: d776c09
7.x/7.15: 7e29910

thelink2012 commented:

Hey. I had a bit of trouble setting up Kibana, so instead I tried to find a clue in the Elasticsearch request @tsullivan sent. I guess this is the culprit: "fragment_size":2147483647.

Could you try applying this patch and see if the issue persists?
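
For reference, the highlight block in the captured request above has the shape sketched below. The patch under discussion changes the offset handling on the Elasticsearch side; the clamp here is purely illustrative of the value involved (2^31 - 1), not a proposed Kibana change, and the helper name is hypothetical:

// The fragment_size Kibana sends is 2147483647 (2^31 - 1), as seen in the captured request.
const KIBANA_FRAGMENT_SIZE = 2147483647;

interface HighlightBlock {
  pre_tags: string[];
  post_tags: string[];
  fields: Record<string, object>;
  fragment_size: number;
}

// Hypothetical helper: build the same highlight block but clamp fragment_size, which
// would sidestep any offset + fragment_size arithmetic issues on the Elasticsearch side
// (the suspected trigger of the "offset out of bounds" failure).
function buildHighlightBlock(fragmentSize: number = KIBANA_FRAGMENT_SIZE): HighlightBlock {
  const SAFE_FRAGMENT_SIZE = 1_000_000; // arbitrary illustrative cap
  return {
    pre_tags: ['@kibana-highlighted-field@'],
    post_tags: ['@/kibana-highlighted-field@'],
    fields: { '*': {} },
    fragment_size: Math.min(fragmentSize, SAFE_FRAGMENT_SIZE),
  };
}

console.log(JSON.stringify(buildHighlightBlock(), null, 2));
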

exalate-issue-sync bot added the loe:medium label and removed the loe:small label on Jul 12, 2021
tylersmalley pushed a commit to tylersmalley/kibana that referenced this issue Jul 12, 2021
tylersmalley (Contributor) commented Jul 12, 2021

@thelink2012, I have tested your patch against this single test locally and it worked. I am running all of CI against it here, as there were multiple tests affected. 🤞

tylersmalley (Contributor) commented:

@thelink2012, there are still quite a few tests failing. I really think we should consider reverting your PR to unblock all the downstream teams relying on ES being up-to-date with Kibana while we work these failures out.

thelink2012 commented Jul 13, 2021 via email

mistic pushed a commit that referenced this issue Jul 14, 2021
* temp use unverified snapshot

Signed-off-by: Tyler Smalley <tyler.smalley@elastic.co>

* Revert "skip failing es promotion suite (#104469)"

This reverts commit 87971e7.

* Revert "skip failing es promotion suite (#104467)"

This reverts commit c72ad3e.

* Revert "skip failing es promotion suite (#104466)"

This reverts commit a0b36c7.

* Revert "skip failing es promotion suite (#104413)"

This reverts commit 9773e3f.

* Revert "skip failing es promotion suite (#104409)"

This reverts commit 79608dc.

* Revert "skip failing es promotion suite (#104366)"

This reverts commit dfc5dbb.

* Revert "skip failing es promotion suite (#104365)"

This reverts commit 2695777.

* Revert "skip failing es promotion suite (#104364)"

This reverts commit 16f69d2.

* Revert "skip failing es promotion suite (#104362)"

This reverts commit 2e492c2.

* Revert another commit to promote ES snapshot

Signed-off-by: Tyler Smalley <tyler.smalley@elastic.co>

Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com>
kibanamachine added a commit to kibanamachine/kibana that referenced this issue Jul 14, 2021 (same revert commit message as above)

kibanamachine added a commit that referenced this issue Jul 14, 2021 (same revert commit message as above)

thelink2012 commented:

Hi. Could someone help me validate that my patch no longer breaks Kibana?

Since that PR is now a year old, let me summarize the thread; no need to re-read it all. I submitted and merged PR elastic/elasticsearch#73785 into Elasticsearch, but a few days later this issue was opened because Kibana's tests were failing, and bisecting Elasticsearch pointed to my PR as the cause. After some discussion, I fixed an out-of-bounds access in my PR and some of the failing functional tests in Kibana started passing, but others didn't. I couldn't look into it at the time, so we ended up reverting the Elasticsearch PR.

I'm now resuming work on that Elasticsearch patch (i.e. fixing the failing tests in Kibana), and to my surprise, all functional tests in Kibana are passing. No changes to my patch; I just rebased it onto elasticsearch/master.

Could someone help me validate this? The patch we are talking about (with the out-of-bounds fix) is elastic/elasticsearch#75306.

I would greatly appreciate anyone's help here. Even instructions on how to test a particular Elasticsearch patch in Kibana CI (like @tylersmalley did in #104571, I think?) would be great. Tests are passing locally, but I would like to confirm in CI so that we do not run into further problems if we merge that PR into Elasticsearch again.

thelink2012 commented Jul 13, 2022

One other observation I have so far is that reverting to Kibana's commit at the time of this issue (i.e. to 434568a) makes the functional tests fail again. That is, something changed in Kibana between a year ago and today that makes my patch work with Kibana. Does anyone have any idea what it could be?

In any case, I'm investigating the issue at Kibana's commit 434568a just to be sure that there is no issue with my patch.

thelink2012 commented Jul 14, 2022

@tylersmalley hm, is it possible that the tests in #104571 ran against a nightly build which didn't include the patch I mentioned in #104466 (comment)? Note that at the time of that comment the fix was not yet merged in Elasticsearch.

Why do I think this is a possibility? Because the test failing in the Firefox smoke test at https://github.com/elastic/kibana/runs/3051098270 is "discover tab field data search php should show the correct hit count", which we definitely fixed during the discussion here, as you verified locally yourself.

Do you think we can merge that PR into Elasticsearch again and check the results in Kibana when it gets the nightly snapshot?

thelink2012 commented Jul 14, 2022

> One other observation I have so far is that reverting to Kibana's commit at the time of this issue (i.e. to 434568a) makes the functional tests fail again. That is, something changed in Kibana between a year ago and today that makes my patch work with Kibana. Does anyone have any idea what it could be?

OK, it also fails without the patch, so it's unrelated to my patch. I think we should do as I suggested above and merge the patch back into ES.
