Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[kbn/journeys] fix hanging on telemetry call & improve logging #175194

Merged

Conversation

dmlemeshko
Copy link
Member

@dmlemeshko dmlemeshko commented Jan 22, 2024

Summary

This PR fixes the issue causing (mostly) login journey stuck for 14 min waiting for Telemetry call response.

Screenshot 2024-01-22 at 11 12 24

I believe the issue was in how we handle the Observables for request events. I added extra comment in the particular code change.

I no longer can reproduce it, all the events are reported correctly:
image

Logs cleaning is to log in console only performance metrics event but not all EBT elements. Also not to report some browser errors that not Kibana specific.

Testing:

run the following script 3-4 times

PERFORMANCE_ENABLE_TELEMETRY=1 node scripts/run_performance.js --journey-path x-pack/performance/journeys/login.ts
  • script is completed without delays (e.g. doesn't hang on after hook in TEST phase)
  • telemetry requests are logged with correct counter and all finished, e.g. Waiting for telemetry request #2 to complete is followed by Telemetry request #2 complete
  • only events started with Report event "performance_metric" are in console output

Comment on lines +348 to +361
const id = ++this.telemetryTrackerCount;
this.log.info(`Waiting for telemetry request #${id} to complete`);
return Rx.of(requestSuccess$).pipe(
Rx.timeout(60_000),
Rx.catchError((error) => {
if (error instanceof Error && error.name === 'TimeoutError') {
this.log.error(`Timeout error occurred: ${error.message}`);
}
// Rethrow the error if it's not a TimeoutError
return Rx.throwError(() => new Error(error));
}),
Rx.tap({
complete: () => this.log.debug(`Telemetry request #${id} complete`),
complete: () => this.log.info(`Telemetry request #${id} complete`),
error: (err) => this.log.error(`Telemetry request was not processed: ${err.message}`),
Copy link
Member Author

@dmlemeshko dmlemeshko Jan 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made a few changes to make it work correctly:

  • telemetryTrackerCount was always returning 1 since it was misplaced before observables.
  • delete requestFailure$ since we don't have to do anything in case of Telemetry request failure event. We fully rely on kibana service to make things the right way.
  • adding timeout of 60 sec to issue error notification in case of no new emitted events;
  • log messages on both complete & error
 └-> Go to Transform Wizard
     └-> "before each" hook: global before each for "Go to Transform Wizard"
     │ debg APM DEBUG start span {"span":"fe0fad39231fa33e","parent":"9cf0f1c9d8df9a6a","trace":"ada4a00b5cac22401cd1ed060e28e4f2","name":"step: Go to Transform Wizard","type":"step","subtype":null,"action":null}
     │ debg APM DEBUG supersedeWithSpanRunContext(<Span fe0fad39231fa33e>)
     │ info Waiting for telemetry request #2 to complete
     │ info Telemetry request #2 complete
     │ info Waiting for telemetry request #3 to complete
     │ info Telemetry request #3 complete
     │ info Waiting for telemetry request #4 to complete
     │ info Telemetry request #4 complete
     │ debg APM DEBUG ended span {"span":"fe0fad39231fa33e","parent":"9cf0f1c9d8df9a6a","trace":"ada4a00b5cac22401cd1ed060e28e4f2","name":"step: Go to Transform Wizard","type":"step","subtype":null,"action":null}
     │ debg APM DEBUG addEndedSpan(step: Go to Transform Wizard)
     │ debg APM DEBUG encoding span {"span":"fe0fad39231fa33e","parent":"9cf0f1c9d8df9a6a","trace":"ada4a00b5cac22401cd1ed060e28e4f2","name":"step: Go to Transform Wizard","type":"step"}
     └- ✓ pass  (1.0m)

Comment on lines +39 to +41
async waitForListViewTable() {
await this.page.waitForSelector(subj('table-is-ready'), { state: 'visible' });
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moving to page object and updating tests for consistency.

Comment on lines +45 to +47
await page.waitForSelector(subj('userMenuButton'), {
state: 'attached',
});
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Improve waiting for home page after login

@dmlemeshko
Copy link
Member Author

/ci

@dmlemeshko dmlemeshko self-assigned this Jan 22, 2024
@dmlemeshko dmlemeshko added release_note:skip Skip the PR/issue when compiling release notes v8.12.1 v8.13.0 labels Jan 22, 2024
@dmlemeshko dmlemeshko marked this pull request as ready for review January 22, 2024 13:00
@dmlemeshko dmlemeshko requested review from a team as code owners January 22, 2024 13:00
@kibana-ci
Copy link
Collaborator

💚 Build Succeeded

Metrics [docs]

✅ unchanged

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

cc @dmlemeshko

Copy link
Member

@pheyos pheyos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@dmlemeshko dmlemeshko merged commit 8be528e into elastic:main Jan 22, 2024
17 checks passed
kibanamachine pushed a commit to kibanamachine/kibana that referenced this pull request Jan 22, 2024
…ic#175194)

## Summary

This PR fixes the issue causing (mostly) [login
journey](https://buildkite.com/elastic/kibana-single-user-performance/builds/12398#018d1149-cc2e-4591-a61c-176768081e2c)
stuck for 14 min waiting for Telemetry call response.

<img width="964" alt="Screenshot 2024-01-22 at 11 12 24"
src="https://github.com/elastic/kibana/assets/10977896/8cadc2ec-ee84-42f6-8a0c-ad949367429c">

I believe the issue was in how we handle the Observables for request
events. I added extra comment in the particular code change.

I no longer can reproduce it, all the events are reported correctly:
<img width="964" alt="image"
src="https://github.com/elastic/kibana/assets/10977896/fa2c4b27-dcf2-480b-a07f-aeb23045149a">

Logs cleaning is to log in console only performance metrics event but
not all EBT elements. Also not to report some browser errors that not
Kibana specific.

Testing:

run the following script 3-4 times
```
PERFORMANCE_ENABLE_TELEMETRY=1 node scripts/run_performance.js --journey-path x-pack/performance/journeys/login.ts
```

- script is completed without delays (e.g. doesn't hang on after hook in
TEST phase)
- telemetry requests are logged with correct counter and all finished,
e.g. `Waiting for telemetry request elastic#2 to complete` is followed by
`Telemetry request elastic#2 complete`
- only events started with `Report event "performance_metric"` are in
console output

(cherry picked from commit 8be528e)
@kibanamachine
Copy link
Contributor

💚 All backports created successfully

Status Branch Result
8.12

Note: Successful backport PRs will be merged automatically after passing CI.

Questions ?

Please refer to the Backport tool documentation

kibanamachine added a commit that referenced this pull request Jan 22, 2024
…ging (#175194) (#175247)

# Backport

This will backport the following commits from `main` to `8.12`:
- [[kbn/journeys] fix hanging on telemetry call &amp; improve logging
(#175194)](#175194)

<!--- Backport version: 9.4.3 -->

### Questions ?
Please refer to the [Backport tool
documentation](https://github.com/sqren/backport)

<!--BACKPORT [{"author":{"name":"Dzmitry
Lemechko","email":"dzmitry.lemechko@elastic.co"},"sourceCommit":{"committedDate":"2024-01-22T17:18:10Z","message":"[kbn/journeys]
fix hanging on telemetry call & improve logging (#175194)\n\n##
Summary\r\n\r\nThis PR fixes the issue causing (mostly)
[login\r\njourney](https://buildkite.com/elastic/kibana-single-user-performance/builds/12398#018d1149-cc2e-4591-a61c-176768081e2c)\r\nstuck
for 14 min waiting for Telemetry call response.\r\n\r\n\r\n<img
width=\"964\" alt=\"Screenshot 2024-01-22 at 11 12
24\"\r\nsrc=\"https://github.com/elastic/kibana/assets/10977896/8cadc2ec-ee84-42f6-8a0c-ad949367429c\">\r\n\r\nI
believe the issue was in how we handle the Observables for
request\r\nevents. I added extra comment in the particular code
change.\r\n\r\nI no longer can reproduce it, all the events are reported
correctly:\r\n<img width=\"964\"
alt=\"image\"\r\nsrc=\"https://github.com/elastic/kibana/assets/10977896/fa2c4b27-dcf2-480b-a07f-aeb23045149a\">\r\n\r\nLogs
cleaning is to log in console only performance metrics event but\r\nnot
all EBT elements. Also not to report some browser errors that
not\r\nKibana specific.\r\n\r\n\r\nTesting:\r\n\r\nrun the following
script 3-4 times\r\n```\r\nPERFORMANCE_ENABLE_TELEMETRY=1 node
scripts/run_performance.js --journey-path
x-pack/performance/journeys/login.ts\r\n```\r\n\r\n- script is completed
without delays (e.g. doesn't hang on after hook in\r\nTEST phase)\r\n-
telemetry requests are logged with correct counter and all
finished,\r\ne.g. `Waiting for telemetry request #2 to complete` is
followed by\r\n`Telemetry request #2 complete`\r\n- only events started
with `Report event \"performance_metric\"` are in\r\nconsole
output","sha":"8be528efb3169b630f44b33912adaf63bdea2a25","branchLabelMapping":{"^v8.13.0$":"main","^v(\\d+).(\\d+).\\d+$":"$1.$2"}},"sourcePullRequest":{"labels":["release_note:skip","v8.12.1","v8.13.0"],"title":"[kbn/journeys]
fix hanging on telemetry call & improve
logging","number":175194,"url":"https://github.com/elastic/kibana/pull/175194","mergeCommit":{"message":"[kbn/journeys]
fix hanging on telemetry call & improve logging (#175194)\n\n##
Summary\r\n\r\nThis PR fixes the issue causing (mostly)
[login\r\njourney](https://buildkite.com/elastic/kibana-single-user-performance/builds/12398#018d1149-cc2e-4591-a61c-176768081e2c)\r\nstuck
for 14 min waiting for Telemetry call response.\r\n\r\n\r\n<img
width=\"964\" alt=\"Screenshot 2024-01-22 at 11 12
24\"\r\nsrc=\"https://github.com/elastic/kibana/assets/10977896/8cadc2ec-ee84-42f6-8a0c-ad949367429c\">\r\n\r\nI
believe the issue was in how we handle the Observables for
request\r\nevents. I added extra comment in the particular code
change.\r\n\r\nI no longer can reproduce it, all the events are reported
correctly:\r\n<img width=\"964\"
alt=\"image\"\r\nsrc=\"https://github.com/elastic/kibana/assets/10977896/fa2c4b27-dcf2-480b-a07f-aeb23045149a\">\r\n\r\nLogs
cleaning is to log in console only performance metrics event but\r\nnot
all EBT elements. Also not to report some browser errors that
not\r\nKibana specific.\r\n\r\n\r\nTesting:\r\n\r\nrun the following
script 3-4 times\r\n```\r\nPERFORMANCE_ENABLE_TELEMETRY=1 node
scripts/run_performance.js --journey-path
x-pack/performance/journeys/login.ts\r\n```\r\n\r\n- script is completed
without delays (e.g. doesn't hang on after hook in\r\nTEST phase)\r\n-
telemetry requests are logged with correct counter and all
finished,\r\ne.g. `Waiting for telemetry request #2 to complete` is
followed by\r\n`Telemetry request #2 complete`\r\n- only events started
with `Report event \"performance_metric\"` are in\r\nconsole
output","sha":"8be528efb3169b630f44b33912adaf63bdea2a25"}},"sourceBranch":"main","suggestedTargetBranches":["8.12"],"targetPullRequestStates":[{"branch":"8.12","label":"v8.12.1","branchLabelMappingKey":"^v(\\d+).(\\d+).\\d+$","isSourceBranch":false,"state":"NOT_CREATED"},{"branch":"main","label":"v8.13.0","branchLabelMappingKey":"^v8.13.0$","isSourceBranch":true,"state":"MERGED","url":"https://github.com/elastic/kibana/pull/175194","number":175194,"mergeCommit":{"message":"[kbn/journeys]
fix hanging on telemetry call & improve logging (#175194)\n\n##
Summary\r\n\r\nThis PR fixes the issue causing (mostly)
[login\r\njourney](https://buildkite.com/elastic/kibana-single-user-performance/builds/12398#018d1149-cc2e-4591-a61c-176768081e2c)\r\nstuck
for 14 min waiting for Telemetry call response.\r\n\r\n\r\n<img
width=\"964\" alt=\"Screenshot 2024-01-22 at 11 12
24\"\r\nsrc=\"https://github.com/elastic/kibana/assets/10977896/8cadc2ec-ee84-42f6-8a0c-ad949367429c\">\r\n\r\nI
believe the issue was in how we handle the Observables for
request\r\nevents. I added extra comment in the particular code
change.\r\n\r\nI no longer can reproduce it, all the events are reported
correctly:\r\n<img width=\"964\"
alt=\"image\"\r\nsrc=\"https://github.com/elastic/kibana/assets/10977896/fa2c4b27-dcf2-480b-a07f-aeb23045149a\">\r\n\r\nLogs
cleaning is to log in console only performance metrics event but\r\nnot
all EBT elements. Also not to report some browser errors that
not\r\nKibana specific.\r\n\r\n\r\nTesting:\r\n\r\nrun the following
script 3-4 times\r\n```\r\nPERFORMANCE_ENABLE_TELEMETRY=1 node
scripts/run_performance.js --journey-path
x-pack/performance/journeys/login.ts\r\n```\r\n\r\n- script is completed
without delays (e.g. doesn't hang on after hook in\r\nTEST phase)\r\n-
telemetry requests are logged with correct counter and all
finished,\r\ne.g. `Waiting for telemetry request #2 to complete` is
followed by\r\n`Telemetry request #2 complete`\r\n- only events started
with `Report event \"performance_metric\"` are in\r\nconsole
output","sha":"8be528efb3169b630f44b33912adaf63bdea2a25"}}]}]
BACKPORT-->

Co-authored-by: Dzmitry Lemechko <dzmitry.lemechko@elastic.co>
CoenWarmer pushed a commit to CoenWarmer/kibana that referenced this pull request Feb 15, 2024
…ic#175194)

## Summary

This PR fixes the issue causing (mostly) [login
journey](https://buildkite.com/elastic/kibana-single-user-performance/builds/12398#018d1149-cc2e-4591-a61c-176768081e2c)
stuck for 14 min waiting for Telemetry call response.


<img width="964" alt="Screenshot 2024-01-22 at 11 12 24"
src="https://github.com/elastic/kibana/assets/10977896/8cadc2ec-ee84-42f6-8a0c-ad949367429c">

I believe the issue was in how we handle the Observables for request
events. I added extra comment in the particular code change.

I no longer can reproduce it, all the events are reported correctly:
<img width="964" alt="image"
src="https://github.com/elastic/kibana/assets/10977896/fa2c4b27-dcf2-480b-a07f-aeb23045149a">

Logs cleaning is to log in console only performance metrics event but
not all EBT elements. Also not to report some browser errors that not
Kibana specific.


Testing:

run the following script 3-4 times
```
PERFORMANCE_ENABLE_TELEMETRY=1 node scripts/run_performance.js --journey-path x-pack/performance/journeys/login.ts
```

- script is completed without delays (e.g. doesn't hang on after hook in
TEST phase)
- telemetry requests are logged with correct counter and all finished,
e.g. `Waiting for telemetry request elastic#2 to complete` is followed by
`Telemetry request elastic#2 complete`
- only events started with `Report event "performance_metric"` are in
console output
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
release_note:skip Skip the PR/issue when compiling release notes v8.12.1 v8.13.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants