Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[kbn-ftr-common-functional-services] extend retry service #178660

Merged
merged 13 commits into from
Mar 19, 2024

Conversation

dmlemeshko
Copy link
Member

@dmlemeshko dmlemeshko commented Mar 13, 2024

Summary

I took the retry wrapper created by jpdjere in #173998 and extended retryForSuccess<T> with required capabilities to provide the same functionality.

This PR:

  1. extends retry service with new function tryWithRetries<T> => : Promise<T> to retry block options.retryCount times within options.timeout period and return block result
  const response = await retry.tryWithRetries<SearchResponse>(
   'search request',
    async () => {
        const response = await supertest
          .post(`/internal/search/es`)
          .set(ELASTIC_HTTP_VERSION_HEADER, '1')
          .send({
            params: {
              body: {
                query: {
                  match_all: {},
                },
              },
            },
          })
          .expect(200);
        return response.body as SearchResponse;
    },
    {
      retryCount: 4,
      retryDelay: 100, // optional
      timeout: 30000, // optional
    }
  1. removes utils/retry.ts wrapper and migrate tests to FTR Retry service
  2. Adds descriptions to Retry service functions explaining the default use case

How the failures look like:

  • when reached timeout before retry count limit
 Error: retry.tryWithRetries reached timeout 11000 ms waiting for 'run console request'
 Error: expected '# Click the Variables button, above, to create your own variables.\nGET ${exampleVariable1} // _search\n{\n  "query": {\n    "${exampleVariable2}": {} // match_all\n  }\n}' to sort of equal '5'
     at Assertion.assert (expect.js:100:11)
     at Assertion.eql (expect.js:244:8)
     at _console.ts:47:41
     at processTicksAndRejections (node:internal/process/task_queues:95:5)
     at runAttempt (retry_with_retries.ts:29:15)
     at retryWithRetries (retry_with_retries.ts:97:21)
     at RetryService.tryForTime (retry.ts:38:12)
     at Context.<anonymous> (_console.ts:44:7)
     at Object.apply (wrap_function.js:73:16)
  • when reached retry count limit before timeout
 Error: retry.tryWithRetries reached the limit of attempts waiting for 'run console request': 2 out of 2
 Error: expected '# Click the Variables button, above, to create your own variables.\nGET ${exampleVariable1} // _search\n{\n  "query": {\n    "${exampleVariable2}": {} // match_all\n  }\n}' to sort of equal '5'
     at Assertion.assert (expect.js:100:11)
     at Assertion.eql (expect.js:244:8)
     at _console.ts:47:41
     at processTicksAndRejections (node:internal/process/task_queues:95:5)
     at runAttempt (retry_for_success.ts:29:15)
     at retryForSuccess (retry_for_success.ts:97:21)
     at RetryService.tryWithRetries (retry.ts:115:12)
     at Context.<anonymous> (_console.ts:44:7)
     at Object.apply (wrap_function.js:73:16)

@dmlemeshko dmlemeshko added release_note:skip Skip the PR/issue when compiling release notes FTR v8.14.0 v8.13.1 labels Mar 18, 2024
@dmlemeshko dmlemeshko marked this pull request as ready for review March 18, 2024 12:33
@dmlemeshko dmlemeshko requested review from a team as code owners March 18, 2024 12:33
@dmlemeshko dmlemeshko requested a review from nikitaindik March 18, 2024 12:33
@dmlemeshko
Copy link
Member Author

/ci

@dmlemeshko
Copy link
Member Author

/ci

Copy link
Member

@pheyos pheyos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM
Also tried a few scenarios locally and everything worked fine ✔️

Copy link
Member

@maryam-saeidi maryam-saeidi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for extending retryService functionality 🚀

onFailure(
lastError,
// optionally extend error message with description
`reached the limit of attempts${addText(description)}: ${
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😍

log,
});
{
retryCount: MAX_RETRIES,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks much nicer by removing the log service and using the retryService function directly 💯

What would be the default values for these configs in case we don't provide it directly? Are they configurable at the test config level?

Copy link
Member Author

@dmlemeshko dmlemeshko Mar 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would be the default values for these configs in case we don't provide it directly? Are they configurable at the test config level?

I believe hardcoding retryCount default value (e.g. to 2) will lead to misusage: similar to retry.waitForWithTimeout takes timeout value, tryWithRetries require options.retryCount for each call.

Few points:

  • I strongly believed that if retryCount is not defined explicitly, retry.try is the correct function to use.
  • Wrapping too many API calls with retry.tryWithRetries may hide instability of the product. By explicitly setting retryCount folks take more responsibility to evaluate individually case to case.

Copy link
Contributor

@maximpn maximpn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dmlemeshko thank you for extending retryForSuccess<T>() utility function 👍

The changes look good to me besides a couple of things I left comment about. In general I'm curious if you considered using p-retry for retrying functionality?


while (true) {
if (retryCount) {
// Use retryCount as an optional condition
if (++attemptCounter > retryCount) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: It's possible to get rid of nested ifs by merging the condition

if (retryCount && ++attemptCounter > retryCount) {
  // ...
}

/**
* Use to retry block {options.retryCount} times within {options.timeout} period and return block result
* @param description block description
* @param block retrying operation
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: If I got properly block here refers to a code block. I'd say it's quite a general term. Maybe rename it to something like action or retriableFn

if (Date.now() - start > timeout) {
await onFailure(lastError);
onFailure(lastError, `reached timeout ${timeout} ms${addText(description)}`);
throw new Error('expected onFailure() option to throw an error');
} else if (lastError && criticalWebDriverErrors.includes(lastError.name)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: It could be refactored to omit else. Since previous if code block throws an error else is optional here.

if (retryCount) {
// Use retryCount as an optional condition
if (++attemptCounter > retryCount) {
onFailure(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if onFailure doesn't break the while loop? The default implementation throws an error but what if a custom onFailure handler doesn't?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch! I will make a fix. Still you can't pass custom onFailure logic outside Retry service.

retryDelay?: number;
retryCount?: number;
}

export async function retryForSuccess<T>(log: ToolingLog, options: Options<T>) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you investigated if p-retry is applicable here? Additionally exponential back off timeout may work better.

Copy link
Member Author

@dmlemeshko dmlemeshko Mar 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We had a chat with @maryam-saeidi about p-retry and the issues she faced while using it. This PR is actually about to help migrate away from p-retry.
I think p-retry implementation is more complex vs FTR retry service, but since we never had stability issues with FTR retry (folks question logging & interface, understood :) ) I would keep logic as simple as possible.
As for logging and capabilities, I still believe we can collaborate with PRs like this one to achieve improvements.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dmlemeshko Could you share found issues with p-retry?

My observation is p-retry feels like a quite popular library with 14M weekly downloads. A lot of bugs should've been fixed already. On top of that p-retry provides flexible configuration like exponential back off retry timeout while it still can be configured linearly. It allows to abort retries as well. Checking Kibana's codebase it's not hard to see it's used in multiple packages and plugins.

Having a comment for retryForSuccess with some explanation and improvements over p-retry will be really helpful for future maintenance.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@maximpn Here is the ticket that I explained what challenge we faced, and in our case, we didn't have any log about the retry attempts which made it hard for us to understand if it was one request that timeout or there were multiple requests and due to exponential back off, the overall attempt failed. (The way that we tested it locally was by throwing an error in the test function and setting the retry to 10 times)

In general, I think it would be better to only rely on one library/package for retry purposes, for easier maintenance and a better understanding of the tool.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@maryam-saeidi thanks for the details 👍

we didn't have any log about the retry attempts
Yes, it sounds like a bug. Or exponential timeout was too high so the test was interrupted due to a timeout. Btw, you could configure p-retry to have a constant retry interval.

@maximpn maximpn removed the request for review from nikitaindik March 19, 2024 09:12
Copy link
Contributor

@jpdjere jpdjere left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @dmlemeshko for this work! 🚀

Glad I could contribute to improve the FTR testing experience

@kibana-ci
Copy link
Collaborator

💚 Build Succeeded

Metrics [docs]

Public APIs missing comments

Total count of every public API that lacks a comment. Target amount is 0. Run node scripts/build_api_docs --plugin [yourplugin] --stats comments for more detailed information.

id before after diff
@kbn/ftr-common-functional-services 31 21 -10
Unknown metric groups

API count

id before after diff
@kbn/ftr-common-functional-services 31 36 +5

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

@dmlemeshko dmlemeshko merged commit 277b3fb into elastic:main Mar 19, 2024
17 checks passed
kibanamachine pushed a commit to kibanamachine/kibana that referenced this pull request Mar 19, 2024
…8660)

## Summary

I took the retry wrapper created by
[jpdjere](https://github.com/jpdjere) in elastic#173998 and extended
`retryForSuccess<T>` with required capabilities to provide the same
functionality.

This PR:
1) extends retry service with new function `tryWithRetries<T> => :
Promise<T>` to retry block `options.retryCount` times within
`options.timeout` period and return block result

```ts
  const response = await retry.tryWithRetries<SearchResponse>(
   'search request',
    async () => {
        const response = await supertest
          .post(`/internal/search/es`)
          .set(ELASTIC_HTTP_VERSION_HEADER, '1')
          .send({
            params: {
              body: {
                query: {
                  match_all: {},
                },
              },
            },
          })
          .expect(200);
        return response.body as SearchResponse;
    },
    {
      retryCount: 4,
      retryDelay: 100, // optional
      timeout: 30000, // optional
    }
```

2) removes `utils/retry.ts` wrapper and migrate tests to FTR Retry
service
3) Adds descriptions to Retry service functions explaining the default
use case

How the failures look like:

- when reached timeout before retry count limit
```
 Error: retry.tryWithRetries reached timeout 11000 ms waiting for 'run console request'
 Error: expected '# Click the Variables button, above, to create your own variables.\nGET ${exampleVariable1} // _search\n{\n  "query": {\n    "${exampleVariable2}": {} // match_all\n  }\n}' to sort of equal '5'
     at Assertion.assert (expect.js:100:11)
     at Assertion.eql (expect.js:244:8)
     at _console.ts:47:41
     at processTicksAndRejections (node:internal/process/task_queues:95:5)
     at runAttempt (retry_with_retries.ts:29:15)
     at retryWithRetries (retry_with_retries.ts:97:21)
     at RetryService.tryForTime (retry.ts:38:12)
     at Context.<anonymous> (_console.ts:44:7)
     at Object.apply (wrap_function.js:73:16)
```
- when reached retry count limit before timeout
```
 Error: retry.tryWithRetries reached the limit of attempts waiting for 'run console request': 2 out of 2
 Error: expected '# Click the Variables button, above, to create your own variables.\nGET ${exampleVariable1} // _search\n{\n  "query": {\n    "${exampleVariable2}": {} // match_all\n  }\n}' to sort of equal '5'
     at Assertion.assert (expect.js:100:11)
     at Assertion.eql (expect.js:244:8)
     at _console.ts:47:41
     at processTicksAndRejections (node:internal/process/task_queues:95:5)
     at runAttempt (retry_for_success.ts:29:15)
     at retryForSuccess (retry_for_success.ts:97:21)
     at RetryService.tryWithRetries (retry.ts:115:12)
     at Context.<anonymous> (_console.ts:44:7)
     at Object.apply (wrap_function.js:73:16)
```

(cherry picked from commit 277b3fb)
@kibanamachine
Copy link
Contributor

💚 All backports created successfully

Status Branch Result
8.13

Note: Successful backport PRs will be merged automatically after passing CI.

Questions ?

Please refer to the Backport tool documentation

kibanamachine added a commit that referenced this pull request Mar 19, 2024
…8660) (#178978)

# Backport

This will backport the following commits from `main` to `8.13`:
- [[kbn-ftr-common-functional-services] extend retry service
(#178660)](#178660)

<!--- Backport version: 9.4.3 -->

### Questions ?
Please refer to the [Backport tool
documentation](https://github.com/sqren/backport)

<!--BACKPORT [{"author":{"name":"Dzmitry
Lemechko","email":"dzmitry.lemechko@elastic.co"},"sourceCommit":{"committedDate":"2024-03-19T15:32:59Z","message":"[kbn-ftr-common-functional-services]
extend retry service (#178660)\n\n## Summary\r\n\r\nI took the retry
wrapper created by\r\n[jpdjere](https://github.com/jpdjere) in #173998
and extended\r\n`retryForSuccess<T>` with required capabilities to
provide the same\r\nfunctionality.\r\n\r\nThis PR:\r\n1) extends retry
service with new function `tryWithRetries<T> => :\r\nPromise<T>` to
retry block `options.retryCount` times within\r\n`options.timeout`
period and return block result\r\n\r\n```ts\r\n const response = await
retry.tryWithRetries<SearchResponse>(\r\n 'search request',\r\n async ()
=> {\r\n const response = await supertest\r\n
.post(`/internal/search/es`)\r\n .set(ELASTIC_HTTP_VERSION_HEADER,
'1')\r\n .send({\r\n params: {\r\n body: {\r\n query: {\r\n match_all:
{},\r\n },\r\n },\r\n },\r\n })\r\n .expect(200);\r\n return
response.body as SearchResponse;\r\n },\r\n {\r\n retryCount: 4,\r\n
retryDelay: 100, // optional\r\n timeout: 30000, // optional\r\n
}\r\n```\r\n\r\n2) removes `utils/retry.ts` wrapper and migrate tests to
FTR Retry\r\nservice\r\n3) Adds descriptions to Retry service functions
explaining the default\r\nuse case\r\n\r\nHow the failures look
like:\r\n\r\n- when reached timeout before retry count limit \r\n```\r\n
Error: retry.tryWithRetries reached timeout 11000 ms waiting for 'run
console request'\r\n Error: expected '# Click the Variables button,
above, to create your own variables.\\nGET ${exampleVariable1} //
_search\\n{\\n \"query\": {\\n \"${exampleVariable2}\": {} //
match_all\\n }\\n}' to sort of equal '5'\r\n at Assertion.assert
(expect.js:100:11)\r\n at Assertion.eql (expect.js:244:8)\r\n at
_console.ts:47:41\r\n at processTicksAndRejections
(node:internal/process/task_queues:95:5)\r\n at runAttempt
(retry_with_retries.ts:29:15)\r\n at retryWithRetries
(retry_with_retries.ts:97:21)\r\n at RetryService.tryForTime
(retry.ts:38:12)\r\n at Context.<anonymous> (_console.ts:44:7)\r\n at
Object.apply (wrap_function.js:73:16)\r\n```\r\n- when reached retry
count limit before timeout\r\n```\r\n Error: retry.tryWithRetries
reached the limit of attempts waiting for 'run console request': 2 out
of 2\r\n Error: expected '# Click the Variables button, above, to create
your own variables.\\nGET ${exampleVariable1} // _search\\n{\\n
\"query\": {\\n \"${exampleVariable2}\": {} // match_all\\n }\\n}' to
sort of equal '5'\r\n at Assertion.assert (expect.js:100:11)\r\n at
Assertion.eql (expect.js:244:8)\r\n at _console.ts:47:41\r\n at
processTicksAndRejections (node:internal/process/task_queues:95:5)\r\n
at runAttempt (retry_for_success.ts:29:15)\r\n at retryForSuccess
(retry_for_success.ts:97:21)\r\n at RetryService.tryWithRetries
(retry.ts:115:12)\r\n at Context.<anonymous> (_console.ts:44:7)\r\n at
Object.apply
(wrap_function.js:73:16)\r\n```","sha":"277b3fbc24889bedd512f23674e768d2f4c43294","branchLabelMapping":{"^v8.14.0$":"main","^v(\\d+).(\\d+).\\d+$":"$1.$2"}},"sourcePullRequest":{"labels":["release_note:skip","FTR","v8.14.0","v8.13.1"],"title":"[kbn-ftr-common-functional-services]
extend retry
service","number":178660,"url":"https://github.com/elastic/kibana/pull/178660","mergeCommit":{"message":"[kbn-ftr-common-functional-services]
extend retry service (#178660)\n\n## Summary\r\n\r\nI took the retry
wrapper created by\r\n[jpdjere](https://github.com/jpdjere) in #173998
and extended\r\n`retryForSuccess<T>` with required capabilities to
provide the same\r\nfunctionality.\r\n\r\nThis PR:\r\n1) extends retry
service with new function `tryWithRetries<T> => :\r\nPromise<T>` to
retry block `options.retryCount` times within\r\n`options.timeout`
period and return block result\r\n\r\n```ts\r\n const response = await
retry.tryWithRetries<SearchResponse>(\r\n 'search request',\r\n async ()
=> {\r\n const response = await supertest\r\n
.post(`/internal/search/es`)\r\n .set(ELASTIC_HTTP_VERSION_HEADER,
'1')\r\n .send({\r\n params: {\r\n body: {\r\n query: {\r\n match_all:
{},\r\n },\r\n },\r\n },\r\n })\r\n .expect(200);\r\n return
response.body as SearchResponse;\r\n },\r\n {\r\n retryCount: 4,\r\n
retryDelay: 100, // optional\r\n timeout: 30000, // optional\r\n
}\r\n```\r\n\r\n2) removes `utils/retry.ts` wrapper and migrate tests to
FTR Retry\r\nservice\r\n3) Adds descriptions to Retry service functions
explaining the default\r\nuse case\r\n\r\nHow the failures look
like:\r\n\r\n- when reached timeout before retry count limit \r\n```\r\n
Error: retry.tryWithRetries reached timeout 11000 ms waiting for 'run
console request'\r\n Error: expected '# Click the Variables button,
above, to create your own variables.\\nGET ${exampleVariable1} //
_search\\n{\\n \"query\": {\\n \"${exampleVariable2}\": {} //
match_all\\n }\\n}' to sort of equal '5'\r\n at Assertion.assert
(expect.js:100:11)\r\n at Assertion.eql (expect.js:244:8)\r\n at
_console.ts:47:41\r\n at processTicksAndRejections
(node:internal/process/task_queues:95:5)\r\n at runAttempt
(retry_with_retries.ts:29:15)\r\n at retryWithRetries
(retry_with_retries.ts:97:21)\r\n at RetryService.tryForTime
(retry.ts:38:12)\r\n at Context.<anonymous> (_console.ts:44:7)\r\n at
Object.apply (wrap_function.js:73:16)\r\n```\r\n- when reached retry
count limit before timeout\r\n```\r\n Error: retry.tryWithRetries
reached the limit of attempts waiting for 'run console request': 2 out
of 2\r\n Error: expected '# Click the Variables button, above, to create
your own variables.\\nGET ${exampleVariable1} // _search\\n{\\n
\"query\": {\\n \"${exampleVariable2}\": {} // match_all\\n }\\n}' to
sort of equal '5'\r\n at Assertion.assert (expect.js:100:11)\r\n at
Assertion.eql (expect.js:244:8)\r\n at _console.ts:47:41\r\n at
processTicksAndRejections (node:internal/process/task_queues:95:5)\r\n
at runAttempt (retry_for_success.ts:29:15)\r\n at retryForSuccess
(retry_for_success.ts:97:21)\r\n at RetryService.tryWithRetries
(retry.ts:115:12)\r\n at Context.<anonymous> (_console.ts:44:7)\r\n at
Object.apply
(wrap_function.js:73:16)\r\n```","sha":"277b3fbc24889bedd512f23674e768d2f4c43294"}},"sourceBranch":"main","suggestedTargetBranches":["8.13"],"targetPullRequestStates":[{"branch":"main","label":"v8.14.0","branchLabelMappingKey":"^v8.14.0$","isSourceBranch":true,"state":"MERGED","url":"https://github.com/elastic/kibana/pull/178660","number":178660,"mergeCommit":{"message":"[kbn-ftr-common-functional-services]
extend retry service (#178660)\n\n## Summary\r\n\r\nI took the retry
wrapper created by\r\n[jpdjere](https://github.com/jpdjere) in #173998
and extended\r\n`retryForSuccess<T>` with required capabilities to
provide the same\r\nfunctionality.\r\n\r\nThis PR:\r\n1) extends retry
service with new function `tryWithRetries<T> => :\r\nPromise<T>` to
retry block `options.retryCount` times within\r\n`options.timeout`
period and return block result\r\n\r\n```ts\r\n const response = await
retry.tryWithRetries<SearchResponse>(\r\n 'search request',\r\n async ()
=> {\r\n const response = await supertest\r\n
.post(`/internal/search/es`)\r\n .set(ELASTIC_HTTP_VERSION_HEADER,
'1')\r\n .send({\r\n params: {\r\n body: {\r\n query: {\r\n match_all:
{},\r\n },\r\n },\r\n },\r\n })\r\n .expect(200);\r\n return
response.body as SearchResponse;\r\n },\r\n {\r\n retryCount: 4,\r\n
retryDelay: 100, // optional\r\n timeout: 30000, // optional\r\n
}\r\n```\r\n\r\n2) removes `utils/retry.ts` wrapper and migrate tests to
FTR Retry\r\nservice\r\n3) Adds descriptions to Retry service functions
explaining the default\r\nuse case\r\n\r\nHow the failures look
like:\r\n\r\n- when reached timeout before retry count limit \r\n```\r\n
Error: retry.tryWithRetries reached timeout 11000 ms waiting for 'run
console request'\r\n Error: expected '# Click the Variables button,
above, to create your own variables.\\nGET ${exampleVariable1} //
_search\\n{\\n \"query\": {\\n \"${exampleVariable2}\": {} //
match_all\\n }\\n}' to sort of equal '5'\r\n at Assertion.assert
(expect.js:100:11)\r\n at Assertion.eql (expect.js:244:8)\r\n at
_console.ts:47:41\r\n at processTicksAndRejections
(node:internal/process/task_queues:95:5)\r\n at runAttempt
(retry_with_retries.ts:29:15)\r\n at retryWithRetries
(retry_with_retries.ts:97:21)\r\n at RetryService.tryForTime
(retry.ts:38:12)\r\n at Context.<anonymous> (_console.ts:44:7)\r\n at
Object.apply (wrap_function.js:73:16)\r\n```\r\n- when reached retry
count limit before timeout\r\n```\r\n Error: retry.tryWithRetries
reached the limit of attempts waiting for 'run console request': 2 out
of 2\r\n Error: expected '# Click the Variables button, above, to create
your own variables.\\nGET ${exampleVariable1} // _search\\n{\\n
\"query\": {\\n \"${exampleVariable2}\": {} // match_all\\n }\\n}' to
sort of equal '5'\r\n at Assertion.assert (expect.js:100:11)\r\n at
Assertion.eql (expect.js:244:8)\r\n at _console.ts:47:41\r\n at
processTicksAndRejections (node:internal/process/task_queues:95:5)\r\n
at runAttempt (retry_for_success.ts:29:15)\r\n at retryForSuccess
(retry_for_success.ts:97:21)\r\n at RetryService.tryWithRetries
(retry.ts:115:12)\r\n at Context.<anonymous> (_console.ts:44:7)\r\n at
Object.apply
(wrap_function.js:73:16)\r\n```","sha":"277b3fbc24889bedd512f23674e768d2f4c43294"}},{"branch":"8.13","label":"v8.13.1","branchLabelMappingKey":"^v(\\d+).(\\d+).\\d+$","isSourceBranch":false,"state":"NOT_CREATED"}]}]
BACKPORT-->

Co-authored-by: Dzmitry Lemechko <dzmitry.lemechko@elastic.co>
wayneseymour added a commit that referenced this pull request Sep 16, 2024
## Summary
Resolves: #192201

- Expose `TryWithRetriesOptions`
  - Tune timeouts to pass ci 
- Add attempt count debug info to `retry/retry_for_success.ts`
  - Helps with tuning timeout params
- Move exposure of `AlertingApiProvider` from
`x-pack/test_serverless/api_integration/services/index.ts` ->
`x-pack/test_serverless/shared/services/deployment_agnostic_services.ts`
- This exposes the alerting api under Deployment Agnostic Services (DA),
and DA is exposed within
`x-pack/test_serverless/functional/services/index.ts` (Shared Services
[Serverless])
- Collapse helper script functions into just another object literal
stanza within `AlertingApiProvider`
- Update all references
- Refactor alerting api to use `retry` service, instead of p-retry
(following [this pr](#178660))

### Additional debug logging 

Run in debug mode (add `-v`):
```
node scripts/functional_tests \
  --config x-pack/test_serverless/api_integration/test_suites/search/common_configs/config.group1.ts \
  --grep "Summary actions" 
  -v
```

#### After
```
       │ sill retry.tryWithRetries('Alerting API - waitForDocumentInIndex, retryOptions: {"retryCount":5,"retryDelay":200}', [object AsyncFunction], [object Object])
         │ debg --- retry.tryWithRetries error: index_not_found_exception
         │      	Root causes:
         │      		index_not_found_exception: no such index [alert-action-es-query] - Attempt #: 1
         │ sill es.search([object Object])
         │ debg --- retry.tryWithRetries failed again with the same message... - Attempt #: 2
         │ sill es.search([object Object])
         │ debg --- retry.tryWithRetries failed again with the same message... - Attempt #: 3
         │ sill es.search([object Object])
         │ debg --- retry.tryWithRetries failed again with the same message... - Attempt #: 4
         │ sill es.search([object Object])
         │ debg --- retry.tryWithRetries failed again with the same message... - Attempt #: 5
         
...
// Msg after all attempts fail:

       │   Error: retry.tryWithRetries reached the limit of attempts waiting for 'Alerting API - waitForDocumentInIndex, retryOptions: {"retryCount":5,"retryDelay":200}': 5 out of 5
       │   ResponseError: index_not_found_exception
       │   	Root causes:
       │   		index_not_found_exception: no such index [alert-action-es-query]
       │       at SniffingTransport._request (node_modules/@elastic/transport/src/Transport.ts:601:17)
       │       at processTicksAndRejections (node:internal/process/task_queues:95:5)
       │       at /Users/trezworkbox/dev/main.worktrees/cleanup-alerting-api/node_modules/@elastic/transport/src/Transport.ts:704:22
       │       at SniffingTransport.request (node_modules/@elastic/transport/src/Transport.ts:701:14)
       │       at Proxy.SearchApi (node_modules/@elastic/elasticsearch/src/api/api/search.ts:96:10)
       │       at alerting_api.ts:123:28
       │       at runAttempt (retry_for_success.ts:30:15)
       │       at retryForSuccess (retry_for_success.ts:99:21)
       │       at Proxy.tryWithRetries (retry.ts:113:12)
       │       at Object.waitForDocumentInIndex (alerting_api.ts:120:14)
       │       at Context.<anonymous> (summary_actions.ts:146:20)
       │       at Object.apply (wrap_function.js:74:16)
       │       at Object.apply (wrap_function.js:74:16)
       │       at onFailure (retry_for_success.ts:18:9)
       │       at retryForSuccess (retry_for_success.ts:75:7)
       │       at Proxy.tryWithRetries (retry.ts:113:12)
       │       at Object.waitForDocumentInIndex (alerting_api.ts:120:14)
       │       at Context.<anonymous> (summary_actions.ts:146:20)
       │       at Object.apply (wrap_function.js:74:16)
       │       at Object.apply (wrap_function.js:74:16)
```
### Notes
Was put back in draft to additional scope detailed in issue linked
above.

---------

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
kibanamachine pushed a commit to kibanamachine/kibana that referenced this pull request Oct 25, 2024
## Summary
Resolves: elastic#192201

- Expose `TryWithRetriesOptions`
  - Tune timeouts to pass ci
- Add attempt count debug info to `retry/retry_for_success.ts`
  - Helps with tuning timeout params
- Move exposure of `AlertingApiProvider` from
`x-pack/test_serverless/api_integration/services/index.ts` ->
`x-pack/test_serverless/shared/services/deployment_agnostic_services.ts`
- This exposes the alerting api under Deployment Agnostic Services (DA),
and DA is exposed within
`x-pack/test_serverless/functional/services/index.ts` (Shared Services
[Serverless])
- Collapse helper script functions into just another object literal
stanza within `AlertingApiProvider`
- Update all references
- Refactor alerting api to use `retry` service, instead of p-retry
(following [this pr](elastic#178660))

### Additional debug logging

Run in debug mode (add `-v`):
```
node scripts/functional_tests \
  --config x-pack/test_serverless/api_integration/test_suites/search/common_configs/config.group1.ts \
  --grep "Summary actions"
  -v
```

#### After
```
       │ sill retry.tryWithRetries('Alerting API - waitForDocumentInIndex, retryOptions: {"retryCount":5,"retryDelay":200}', [object AsyncFunction], [object Object])
         │ debg --- retry.tryWithRetries error: index_not_found_exception
         │      	Root causes:
         │      		index_not_found_exception: no such index [alert-action-es-query] - Attempt #: 1
         │ sill es.search([object Object])
         │ debg --- retry.tryWithRetries failed again with the same message... - Attempt #: 2
         │ sill es.search([object Object])
         │ debg --- retry.tryWithRetries failed again with the same message... - Attempt #: 3
         │ sill es.search([object Object])
         │ debg --- retry.tryWithRetries failed again with the same message... - Attempt #: 4
         │ sill es.search([object Object])
         │ debg --- retry.tryWithRetries failed again with the same message... - Attempt #: 5

...
// Msg after all attempts fail:

       │   Error: retry.tryWithRetries reached the limit of attempts waiting for 'Alerting API - waitForDocumentInIndex, retryOptions: {"retryCount":5,"retryDelay":200}': 5 out of 5
       │   ResponseError: index_not_found_exception
       │   	Root causes:
       │   		index_not_found_exception: no such index [alert-action-es-query]
       │       at SniffingTransport._request (node_modules/@elastic/transport/src/Transport.ts:601:17)
       │       at processTicksAndRejections (node:internal/process/task_queues:95:5)
       │       at /Users/trezworkbox/dev/main.worktrees/cleanup-alerting-api/node_modules/@elastic/transport/src/Transport.ts:704:22
       │       at SniffingTransport.request (node_modules/@elastic/transport/src/Transport.ts:701:14)
       │       at Proxy.SearchApi (node_modules/@elastic/elasticsearch/src/api/api/search.ts:96:10)
       │       at alerting_api.ts:123:28
       │       at runAttempt (retry_for_success.ts:30:15)
       │       at retryForSuccess (retry_for_success.ts:99:21)
       │       at Proxy.tryWithRetries (retry.ts:113:12)
       │       at Object.waitForDocumentInIndex (alerting_api.ts:120:14)
       │       at Context.<anonymous> (summary_actions.ts:146:20)
       │       at Object.apply (wrap_function.js:74:16)
       │       at Object.apply (wrap_function.js:74:16)
       │       at onFailure (retry_for_success.ts:18:9)
       │       at retryForSuccess (retry_for_success.ts:75:7)
       │       at Proxy.tryWithRetries (retry.ts:113:12)
       │       at Object.waitForDocumentInIndex (alerting_api.ts:120:14)
       │       at Context.<anonymous> (summary_actions.ts:146:20)
       │       at Object.apply (wrap_function.js:74:16)
       │       at Object.apply (wrap_function.js:74:16)
```
### Notes
Was put back in draft to additional scope detailed in issue linked
above.

---------

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
(cherry picked from commit 9d22e8c)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
FTR release_note:skip Skip the PR/issue when compiling release notes v8.13.0 v8.13.1 v8.14.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants