-
Notifications
You must be signed in to change notification settings - Fork 8.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Fleet] Enforce 10 min cooldown for agent upgrade #168606
[Fleet] Enforce 10 min cooldown for agent upgrade #168606
Conversation
🤖 GitHub commentsExpand to view the GitHub comments
Just comment with:
|
@elasticmachine merge upstream |
@elasticmachine merge upstream |
Pinging @elastic/fleet (Team:Fleet) |
Opening this for review as CI failures seem unrelated. Can someone confirm whether |
@jillguyonnet Yes we should backport it to 8.11 as it is needed for #135539 |
} | ||
|
||
const elaspedSinceUpgradeInMillis = Date.now() - Date.parse(agent.upgraded_at); | ||
const upgradeCooldownInMin = 10; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: could you uppercase the name so it's clear that is a constant?
@@ -78,7 +78,8 @@ export async function upgradeBatch( | |||
agentsToCheckUpgradeable.map(async (agent) => { | |||
// Filter out agents currently unenrolling, unenrolled, or not upgradeable b/c of version check |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could you also edit this comment to cover the new case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM 🚢
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🚀
return response.customError({ | ||
statusCode: 400, | ||
body: { | ||
message: `agent ${request.params.agentId} was upgraded less than 10 minutes ago`, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This error message should probably interpolate the UPGRADE_COOLDOWN_IN_MIN
constant so we don't forget to update it if that constant ever changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch! Thanks for reviewing 👍
@elasticmachine merge upstream |
return response.customError({ | ||
statusCode: 400, | ||
body: { | ||
message: `agent ${request.params.agentId} was upgraded less than ${AGENT_UPGRADE_COOLDOWN_IN_MIN} minutes ago`, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
message: `agent ${request.params.agentId} was upgraded less than ${AGENT_UPGRADE_COOLDOWN_IN_MIN} minutes ago`, | |
message: `agent ${request.params.agentId} was upgraded less than ${AGENT_UPGRADE_COOLDOWN_IN_MIN} minutes ago, confirming the upgrade will not roll back`, |
I think we need to explain why we have this cooldown or it will just lead to questions. Possibly the explanation should link directly back to our documentation.
Would it also be possible to have the response contain how long the user needs to wait until they can try again? Otherwise they will just be spamming these requests until one of them goes through.
It could be more correct to return a 429 with a Retry-After header in the response: https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Retry-After
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added some basic text for now to explain the cooldown.
I've added elastic/ingest-docs#605 to track the docs for this, and will create a follow-up issue once this PR is merged to capture the need to come back and add a docs link to this text.
I've also updated the API to return a 429 Too Many Requests
and to include a time string in the format XXmYYs
to indicate how long to wait for a retry as well as a retry-after
header with the value in seconds.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also give you a gold medal #168606 (review)
@elasticmachine merge upstream |
@elasticmachine merge upstream |
@elasticmachine merge upstream |
Looks like there's some FTR failures related to the cooldown change, as we have tests that upgrade an agent quickly. Will take a look soon. |
@elasticmachine merge upstream |
💚 Build Succeeded
Metrics [docs]Async chunks
Page load bundle
History
To update your PR or re-run it, just comment with: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
## Summary Closes elastic#168233 This PR adds a check based on the `agent.upgraded_at` field and the time a request to upgrade the issue. If the request is issued sooner than 10 minutes after the last upgrade, it is rejected, even if `force: true` is passed: - `POST agents/{agentId}/upgrade` will fail with 400 - agents included in `POST agents/bulk_upgrade` will not be upgraded ### Checklist - [x] [Unit or functional tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html) were updated or added to match the most common scenarios --------- Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com> Co-authored-by: Kyle Pollich <kyle.pollich@elastic.co> (cherry picked from commit 4fffedd)
💚 All backports created successfully
Note: Successful backport PRs will be merged automatically after passing CI. Questions ?Please refer to the Backport tool documentation |
…169295) # Backport This will backport the following commits from `main` to `8.11`: - [[Fleet] Enforce 10 min cooldown for agent upgrade (#168606)](#168606) <!--- Backport version: 8.9.7 --> ### Questions ? Please refer to the [Backport tool documentation](https://github.com/sqren/backport) <!--BACKPORT [{"author":{"name":"Jill Guyonnet","email":"jill.guyonnet@elastic.co"},"sourceCommit":{"committedDate":"2023-10-18T18:34:33Z","message":"[Fleet] Enforce 10 min cooldown for agent upgrade (#168606)\n\n## Summary\r\n\r\nCloses https://github.com/elastic/kibana/issues/168233\r\n\r\nThis PR adds a check based on the `agent.upgraded_at` field and the time\r\na request to upgrade the issue. If the request is issued sooner than 10\r\nminutes after the last upgrade, it is rejected, even if `force: true` is\r\npassed:\r\n- `POST agents/{agentId}/upgrade` will fail with 400\r\n- agents included in `POST agents/bulk_upgrade` will not be upgraded\r\n\r\n### Checklist\r\n\r\n- [x] [Unit or functional\r\ntests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)\r\nwere updated or added to match the most common scenarios\r\n\r\n---------\r\n\r\nCo-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com>\r\nCo-authored-by: Kyle Pollich <kyle.pollich@elastic.co>","sha":"4fffedd4bb46922ef8355241318cf0db80e4c9f5","branchLabelMapping":{"^v8.12.0$":"main","^v(\\d+).(\\d+).\\d+$":"$1.$2"}},"sourcePullRequest":{"labels":["release_note:skip","Team:Fleet","backport:prev-minor","v8.12.0"],"number":168606,"url":"https://github.com/elastic/kibana/pull/168606","mergeCommit":{"message":"[Fleet] Enforce 10 min cooldown for agent upgrade (#168606)\n\n## Summary\r\n\r\nCloses https://github.com/elastic/kibana/issues/168233\r\n\r\nThis PR adds a check based on the `agent.upgraded_at` field and the time\r\na request to upgrade the issue. If the request is issued sooner than 10\r\nminutes after the last upgrade, it is rejected, even if `force: true` is\r\npassed:\r\n- `POST agents/{agentId}/upgrade` will fail with 400\r\n- agents included in `POST agents/bulk_upgrade` will not be upgraded\r\n\r\n### Checklist\r\n\r\n- [x] [Unit or functional\r\ntests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)\r\nwere updated or added to match the most common scenarios\r\n\r\n---------\r\n\r\nCo-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com>\r\nCo-authored-by: Kyle Pollich <kyle.pollich@elastic.co>","sha":"4fffedd4bb46922ef8355241318cf0db80e4c9f5"}},"sourceBranch":"main","suggestedTargetBranches":[],"targetPullRequestStates":[{"branch":"main","label":"v8.12.0","labelRegex":"^v8.12.0$","isSourceBranch":true,"state":"MERGED","url":"https://github.com/elastic/kibana/pull/168606","number":168606,"mergeCommit":{"message":"[Fleet] Enforce 10 min cooldown for agent upgrade (#168606)\n\n## Summary\r\n\r\nCloses https://github.com/elastic/kibana/issues/168233\r\n\r\nThis PR adds a check based on the `agent.upgraded_at` field and the time\r\na request to upgrade the issue. If the request is issued sooner than 10\r\nminutes after the last upgrade, it is rejected, even if `force: true` is\r\npassed:\r\n- `POST agents/{agentId}/upgrade` will fail with 400\r\n- agents included in `POST agents/bulk_upgrade` will not be upgraded\r\n\r\n### Checklist\r\n\r\n- [x] [Unit or functional\r\ntests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)\r\nwere updated or added to match the most common scenarios\r\n\r\n---------\r\n\r\nCo-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com>\r\nCo-authored-by: Kyle Pollich <kyle.pollich@elastic.co>","sha":"4fffedd4bb46922ef8355241318cf0db80e4c9f5"}}]}] BACKPORT--> Co-authored-by: Jill Guyonnet <jill.guyonnet@elastic.co>
Summary
Closes #168233
This PR adds a check based on the
agent.upgraded_at
field and the time a request to upgrade the issue. If the request is issued sooner than 10 minutes after the last upgrade, it is rejected, even ifforce: true
is passed:POST agents/{agentId}/upgrade
will fail with 400POST agents/bulk_upgrade
will not be upgradedChecklist