[Fleet UI] Forbid Agent from being upgraded if upgrade is currently in progress #168171
Pinging @elastic/fleet (Team:Fleet)
@ycombinator Fleet UI/API considers upgrade ready when
There is a potentially high-impact problem here, even for older versions of the agent that we can't change. We start a process called the upgrade watcher that supervises the new version of the agent for 10 minutes after an upgrade. If another upgrade is attempted in this 10-minute window, the upgrade watcher will interpret the restart that happens during this second upgrade as a crash and try to roll back the agent version. A rollback that occurs while another upgrade is in progress can have unpredictable results; the worst outcome is a broken agent installation. For versions of the agent that don't report upgrade states, the safest thing to do is to forbid another upgrade attempt until 10 minutes have elapsed since the previous upgrade attempt. I am also starting to believe the default 10-minute upgrade watcher period is probably too long. Anything that is going to go wrong in the upgrade process is likely to happen much sooner than that, but this isn't a problem we can fix with Fleet. Edit: added note that the worst-case outcome without doing this is a broken agent installation.
Let's add a 10-minute cooldown for all versions, and leave this issue as is to optimize for versions that report detailed upgrade states. I created #168233 for the cooldown implementation.
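The 10-minute cooldown proposed above amounts to a simple time comparison. The sketch below is hypothetical: the `upgradeStartedAt` field name, the function name, and the exact boundary behaviour are assumptions for illustration, not the implementation tracked in #168233.

```typescript
// Hypothetical sketch of a 10-minute upgrade cooldown; names are illustrative.
const UPGRADE_COOLDOWN_MS = 10 * 60 * 1000; // matches the default watcher period

interface AgentUpgradeTimestamps {
  // ISO-8601 timestamp of the last upgrade attempt, if any (assumed field name).
  upgradeStartedAt?: string;
}

// Returns true while a new upgrade request should be refused because the
// previous attempt started less than UPGRADE_COOLDOWN_MS ago.
function isInUpgradeCooldown(
  agent: AgentUpgradeTimestamps,
  now: Date = new Date(),
): boolean {
  if (!agent.upgradeStartedAt) {
    return false; // no recorded upgrade attempt, so no cooldown applies
  }
  const startedMs = Date.parse(agent.upgradeStartedAt);
  if (Number.isNaN(startedMs)) {
    return false; // unparseable timestamp: treated as no recorded attempt
  }
  return now.getTime() - startedMs < UPGRADE_COOLDOWN_MS;
}
```

Because the check only needs the time of the last attempt, it works for every agent version, including those that never report upgrade states.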
Quick clarification on the requirement:
Is the
It's a good clarification; thank you for raising it. I had forgotten (from the RFC) that we do want to allow a user to retry a failed upgrade (i.e. when As of now there are no plans to automatically clean up the
## Summary

Closes #168171

Prevent upgrading an agent if it is already upgrading.

Agents that report upgrade details are considered to be upgrading when the `upgrade_details` field exists and is not equal to `UPG_FAILED` (cf. [this comment](#168171 (comment))).

Agents that do _not_ report upgrade details are considered to be upgrading when the `upgrade_started_at` field is set and the `upgraded_at` field is not. NB: this is existing behaviour; this PR does not change this logic.

### Checklist

- [ ] [Documentation](https://www.elastic.co/guide/en/kibana/master/development-documentation.html) was added for features that require explanation or tutorials
- [x] [Unit or functional tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html) were updated or added to match the most common scenarios

---------

Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com>
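The two rules in the PR summary can be expressed as a single predicate. This is a minimal sketch, not the code merged in the PR: the interface and function names are hypothetical, and it assumes the upgrade state is carried in a `state` property nested under `upgrade_details`.

```typescript
// Hypothetical sketch of the "is this agent upgrading?" rule from the PR summary.
interface UpgradeDetails {
  state: string; // e.g. "UPG_WATCHING" or "UPG_FAILED" (nesting is assumed)
}

interface AgentUpgradeFields {
  upgrade_details?: UpgradeDetails | null;
  upgrade_started_at?: string | null;
  upgraded_at?: string | null;
}

function isAgentUpgrading(agent: AgentUpgradeFields): boolean {
  // upgrade_details present: upgrading unless the upgrade failed, so that
  // the user can retry a failed upgrade.
  if (agent.upgrade_details) {
    return agent.upgrade_details.state !== "UPG_FAILED";
  }
  // No upgrade_details: fall back to the existing timestamp-based logic.
  return Boolean(agent.upgrade_started_at) && !agent.upgraded_at;
}
```

Checking `upgrade_details` first means newer agents are judged by their reported state, while older agents keep the pre-existing timestamp behaviour the PR leaves unchanged.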
Today, neither the Fleet UI nor the Fleet API will allow the user to initiate the second upgrade if the first one is deemed to be still in progress. The UI grays out the "Upgrade agent" link and the API returns a 400 Bad Request response saying the agent is not upgradeable.
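On the API side, the behaviour described above is a guard that turns "upgrade in progress" into a 400 response. The sketch below is hypothetical: the error class, function name, and message text are illustrative and are not taken from Fleet's code.

```typescript
// Hypothetical API-side guard; names and message text are illustrative.
class BadRequestError extends Error {
  readonly statusCode = 400; // mapped to "400 Bad Request" by the HTTP layer
}

// Throws when an upgrade is requested for an agent that is still upgrading.
function assertAgentUpgradeable(agentId: string, isUpgrading: boolean): void {
  if (isUpgrading) {
    throw new BadRequestError(
      `Agent ${agentId} is not upgradeable: an upgrade is already in progress`,
    );
  }
}
```

The UI can apply the same in-progress check to decide whether to gray out the "Upgrade agent" action, so both paths refuse the same requests.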
Sidebar: The Upgrade Watcher is a process that the old (as in, pre-upgrade) version of Agent spawns right before it starts up the new (as in, post-upgrade) version of Agent. The job of the Upgrade Watcher is to monitor the health of the new Agent and, if necessary, roll back to the old Agent.
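A simplified model shows why a second upgrade during the watch window is dangerous: the watcher only sees the new Agent restarting and cannot tell that restart apart from a crash. This is a hypothetical illustration, not Elastic Agent's implementation (which is written in Go and tracks more signals). The event type and function name are assumptions, and the 10-minute window is the default period mentioned earlier in the thread.

```typescript
// Simplified, hypothetical model of the Upgrade Watcher's rollback decision.
const WATCH_WINDOW_MS = 10 * 60 * 1000; // default watch period

type AgentEvent = { kind: "restart" | "healthy"; atMs: number };

// Decide whether the watcher would roll back to the old Agent, given when the
// new Agent started and the events observed afterwards.
function watcherDecision(startedAtMs: number, events: AgentEvent[]): "rollback" | "keep" {
  for (const event of events) {
    const elapsed = event.atMs - startedAtMs;
    // A restart caused by a second upgrade looks exactly like a crash here:
    // any restart inside the window triggers a rollback.
    if (event.kind === "restart" && elapsed >= 0 && elapsed < WATCH_WINDOW_MS) {
      return "rollback";
    }
  }
  return "keep";
}
```

In this model, a second upgrade at minute 5 produces a restart inside the window and therefore a rollback, while the same upgrade at minute 11 does not.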
I have noticed that Fleet considers an upgrade as no longer in progress even while the Upgrade Watcher from that upgrade is still running. In other words, Fleet today will allow a user to request a second upgrade even while the Upgrade Watcher from the first upgrade is still running.
Going forward, however, once elastic/elastic-agent#3119 is resolved, Agent will start sending details about an ongoing upgrade to Fleet Server via the `upgrade_details` field in check-in API requests. This field will contain, amongst other information, the current state of the upgrade process, with one of the states being `UPG_WATCHING`. This state means the Upgrade Watcher is running.

As long as the `upgrade_details` field is present in check-in API requests, Fleet should consider an upgrade to be in progress and should not allow a second upgrade to be requested.

For more background on this enhancement request, see elastic/elastic-agent#2706 (comment)
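The proposed rule ties "upgrade in progress" to whatever the latest check-in reported. The sketch below is hypothetical: the types and function names are illustrative, and all check-in fields other than `upgrade_details` are omitted. It implements the rule exactly as proposed here; the PR summary earlier in the thread later refines it so that a `UPG_FAILED` state still allows a retry.

```typescript
// Hypothetical sketch: an upgrade is in progress for exactly as long as the
// Agent keeps sending upgrade_details in its check-in requests.
interface CheckinRequestBody {
  // Other check-in fields omitted for brevity.
  upgrade_details?: { state: string } | null; // e.g. { state: "UPG_WATCHING" }
}

interface StoredAgent {
  id: string;
  upgrade_details: { state: string } | null;
}

// Persist what the latest check-in reported; an absent field clears it.
function applyCheckin(agent: StoredAgent, body: CheckinRequestBody): StoredAgent {
  return { ...agent, upgrade_details: body.upgrade_details ?? null };
}

// A second upgrade may only be requested once upgrade_details is gone.
function canRequestUpgrade(agent: StoredAgent): boolean {
  return agent.upgrade_details === null;
}
```

Clearing the stored value whenever a check-in omits the field keeps Fleet's view in sync with the Agent, including while the Upgrade Watcher is still running in the `UPG_WATCHING` state.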