Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixes #4318 - k0s reset should not delete all when drives fail to unmount #5132

Closed
wants to merge 1 commit into from

Conversation

MichaelDausmann
Copy link

Description

Although the issue is closed, #4318 is a serious problem and a critical design flaw IMO. This behaviour recently caused a serious data loss incident at my organisation where a k0s reset command was executed prior to un-mounting critical data as a part of a large and complex release.

If we did not have backup policies in place, this would have been an existential problem for our organisation, as it was, it caused a week of headaches and a large Cloud Egress bill for restoring deleted data.

There is just no way that silent deletion of data is an acceptable behaviour for any software that is intended for enterprise use.

Fixes #4318

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update

How Has This Been Tested?

  • Manual test
  • Auto test added

I was able to run the unit test suite with no issue but I'm not sure how to really set up a local env and test this properly.

Checklist:

  • My code follows the style guidelines of this project
  • My commit messages are signed-off
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings - 2 new errors for the reset command
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • Any dependent changes have been merged and published in downstream modules
  • I have checked my code and corrected any misspellings

@MichaelDausmann MichaelDausmann requested review from a team as code owners October 22, 2024 01:39
@@ -45,19 +45,29 @@ func (d *directories) Run() error {
}

var dataDirMounted bool
allKubeletsUnmounted := true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like it if the logic was flipped so that the two bools work the same way.

So, something like:

var kubeletDirsMounted bool

Then set it to true if unmount fails.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the dataDirMounted variable only appears to control logging, not functional.

Copy link
Contributor

This pull request has merge conflicts that need to be resolved.

@MichaelDausmann
Copy link
Author

MichaelDausmann commented Dec 5, 2024

Closing. Superseded by the excellent work of @ncopa in #5193 and @twz123 in #5187

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

k0s reset deleted all data on persistent volumes
2 participants