Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Ingest Manager] Better default value for fleet long polling timeout #76393

Merged

Conversation

nchaulet
Copy link
Member

@nchaulet nchaulet commented Sep 1, 2020

Summary

Resolve #75552

Increasing the long polling timeout allow Fleet to support a higher number of agents.

Done in this PR:

  • Set the default long polling timeout to 5 minutes instead of 1 minutes
  • Set the socket idle timeout for the agent checkin route to the xpack.ingestManager.fleet.pollingRequestTimeout. (so we can have a different timeout for Fleet than the rest of Kibana)

@nchaulet nchaulet added v8.0.0 release_note:skip Skip the PR/issue when compiling release notes v7.10.0 Team:Fleet Team label for Observability Data Collection Fleet team Ingest Management:beta2 labels Sep 1, 2020
@nchaulet nchaulet requested a review from a team September 1, 2020 15:32
@nchaulet nchaulet self-assigned this Sep 1, 2020
@elasticmachine
Copy link
Contributor

Pinging @elastic/ingest-management (Team:Ingest Management)

Comment on lines +174 to +175
// Set a timeout 3s before the real timeout to have a chance to respond an empty response before socket timeout
Math.max((appContextService.getConfig()?.fleet.pollingRequestTimeout ?? 0) - 3000, 3000)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this mean we can run into an edge case where the response can come in at almost the request time out limit? let's say 2 seconds before the limit. will this logic cause that to be ignored and treated as an empty response?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes we can have an edge case here
The arbitrary 3s delay try to mitigate the risk of not sending any response (the way Kibana handle socket timeout the socket is destroyed immediately so we cannot send any data), but it can still happen if we some huge delay in the event loop.
If the response is happening like 2 seconds before the socket timeout we will send an empty response but it will be fixed during the next checkin.

Copy link
Member Author

@nchaulet nchaulet Sep 2, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jen-huang let me know if I can still clarify that :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it will be fixed during the next checkin

ah got it. that sounds acceptable to me 👍

@kibanamachine
Copy link
Contributor

💚 Build Succeeded

Build metrics

page load bundle size

id value diff baseline
ingestManager 467.8KB +321.0B 467.5KB

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

@nchaulet nchaulet requested a review from jen-huang September 2, 2020 17:42
@nchaulet nchaulet merged commit 3d91157 into elastic:master Sep 3, 2020
@nchaulet nchaulet deleted the feature-fleet-better-polling-timeout branch September 3, 2020 15:06
gmmorris added a commit to gmmorris/kibana that referenced this pull request Sep 3, 2020
* master: (340 commits)
  [data.search.SearchSource] Remove legacy ES client APIs. (elastic#75943)
  [release notes] automatically retry on Github API 5xx errors (elastic#76447)
  [es_ui_shared] Fix eslint exhaustive deps rule (elastic#76392)
  [i18n] Integrate 7.9.1 Translations (elastic#76391)
  [APM] Update aggregations to support script sources (elastic#76429)
  [Security Solution] Refactor Network Top Countries to use Search Strategy (elastic#76244)
  Document security settings available on ESS (elastic#76513)
  [Ingest Manager] Add input revision to the config send to the agent (elastic#76327)
  [DOCS] Identifies cloud settings for Monitoring (elastic#76579)
  [DOCS] Identifies Cloud settings in Dev Tools (elastic#76583)
  [Ingest Manager] Better default value for fleet long polling timeout (elastic#76393)
  [data.indexPatterns] Fix broken rollup index pattern creation (elastic#76593)
  [Ingest Manager] Split Registry errors into Connection & Response (elastic#76558)
  [Security Solution] add an excess validation instead of the exact match (elastic#76472)
  Introduce TS incremental builds & move src/test_utils to TS project  (elastic#76082)
  fix bad merge (elastic#76629)
  [Newsfeed] Ensure the version format when calling the API (elastic#76381)
  remove server_extensions mixin (elastic#76606)
  Remove legacy applications and legacy mode (elastic#75987)
  [Discover] Fix sidebar element focus behavior when adding / removing columns (elastic#75749)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
release_note:skip Skip the PR/issue when compiling release notes Team:Fleet Team label for Observability Data Collection Fleet team v7.10.0 v8.0.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Ingest Manager] Better default for agent checkin long polling
4 participants