Avoid orphaned objects on delete #654

Merged
merged 2 commits into from
Aug 27, 2019

Conversation

Member

@FxKu FxKu commented Aug 21, 2019

If a cluster is deleted immediately after creation, fields of the cluster struct are empty when processing the DELETE event. The same might happen when a cluster doesn't get into the Running state at all.

fixes #551

@sdudoladov
Member

sdudoladov commented Aug 21, 2019

#218 is a totally different issue, caused by an early return when an error happens during deletion.
I'd suggest handling it in a separate PR.

@FxKu
Member Author

FxKu commented Aug 21, 2019

Turns out that #218 has been resolved (partly) by PR #295. Updated the description.

@sdudoladov
Member

sdudoladov commented Aug 23, 2019

@FxKu

  1. Can you post the content of the cl var returned at line 177 for that case of quick deletion?
  2. Will it work if you explicitly set
         cl.Name = clusterName.Name
         cl.Namespace = clusterName.Namespace
     in addCluster?

Looks like line 179 reads an inconsistent state of cl because it does not acquire the relevant mutex (Create() does).
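
To illustrate the concern about reading cl without the mutex, here is a minimal sketch of a guarded read. The Cluster type, its mu field, and the SetIdentity/Identity helpers are hypothetical stand-ins and not the operator's actual code; the names "acid-test" and "default" are made up.

```go
package main

import (
	"fmt"
	"sync"
)

// Cluster is a hypothetical stand-in for the operator's cluster struct.
type Cluster struct {
	mu        sync.Mutex
	Name      string
	Namespace string
}

// SetIdentity writes both fields while holding the lock.
func (c *Cluster) SetIdentity(name, namespace string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.Name = name
	c.Namespace = namespace
}

// Identity reads both fields under the same lock, so a reader never
// observes a half-written pair.
func (c *Cluster) Identity() (string, string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.Name, c.Namespace
}

func main() {
	cl := &Cluster{}

	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		cl.SetIdentity("acid-test", "default") // writer, e.g. CREATE processing
	}()
	wg.Wait()

	name, ns := cl.Identity() // reader, e.g. DELETE processing
	fmt.Printf("%s/%s\n", ns, name)
}
```

Any code path that reads the fields directly, without going through an accessor like Identity, would bypass the lock, which is the situation suspected for line 179.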

@sdudoladov
Member

sdudoladov commented Aug 26, 2019

After some experiments I think the synchronization of CREATE/DELETE processing is correct. It is the setSpec() method that causes the problem. When the Patch call fails, newspec contains the zero values of the respective types, so parts of the Postgres spec are set to empty values.

The following sequence of events leads to that situation:

  1. Operator starts creating a cluster; that takes time.
  2. A DELETE event happens during the processing of CREATE. At that point k8s has already deleted the manifest.
  3. Updating the cluster status in the (no longer existing) manifest as the very last step of CREATE fails, causing setSpec() to nullify the Postgres spec.
  4. Operator attempts to process DELETE but is unable to find certain objects for deletion because Name and Namespace in the spec are empty strings.
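
Steps 2 to 4 can be reproduced in miniature. This is a sketch under assumptions, not the operator's code: Spec, Cluster, patchStatus, setStatusBuggy and deleteObjects are hypothetical names, and the object store is a plain map keyed by namespace/name.

```go
package main

import (
	"errors"
	"fmt"
)

// Spec is a hypothetical, trimmed-down stand-in for the Postgres manifest.
type Spec struct {
	Name      string
	Namespace string
}

// Cluster is a hypothetical stand-in for the operator's cluster struct.
type Cluster struct {
	Spec Spec
}

// patchStatus mimics a status update against the API server. When the
// manifest is already gone it returns a zero-value Spec and an error.
func patchStatus(manifestExists bool, current Spec) (Spec, error) {
	if !manifestExists {
		return Spec{}, errors.New("manifest not found")
	}
	return current, nil
}

// setStatusBuggy logs the error but still stores the returned value.
func (c *Cluster) setStatusBuggy(manifestExists bool) {
	newSpec, err := patchStatus(manifestExists, c.Spec)
	if err != nil {
		fmt.Println("could not update status:", err)
		// No return here: execution falls through to the assignment.
	}
	c.Spec = newSpec
}

// deleteObjects pretends to look up owned objects by namespace and name.
// It reports whether anything was found and removed.
func deleteObjects(objects map[string]bool, s Spec) bool {
	key := s.Namespace + "/" + s.Name
	if !objects[key] {
		return false
	}
	delete(objects, key)
	return true
}

func main() {
	objects := map[string]bool{"default/acid-test": true}
	cl := &Cluster{Spec: Spec{Name: "acid-test", Namespace: "default"}}

	cl.setStatusBuggy(false)                   // steps 2-3: manifest deleted mid-CREATE
	deleted := deleteObjects(objects, cl.Spec) // step 4: lookup uses empty strings

	fmt.Printf("deleted=%v orphaned=%d\n", deleted, len(objects))
	// prints: deleted=false orphaned=1
}
```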

The practical consequence of the bug is that certain resources - for example pods - stay around forever, preventing rolling updates or creation of a cluster with the same name.

Deletions during Update and Sync events should have the same behaviour.

@FxKu the solution I found is to add a return at the end of the error-processing code of setSpec().

@FxKu
Member Author

FxKu commented Aug 26, 2019

Thanks @sdudoladov for finding the source of this problem. Changed the code accordingly.

Btw, the problem also occurred when the cluster failed to get into a running state (e.g. because a non-existing Docker image was specified) and a user then tried to delete the cluster. So you're probably right about Sync and Update.

@FxKu
Member Author

FxKu commented Aug 27, 2019

👍

1 similar comment
@sdudoladov
Member

👍

@sdudoladov sdudoladov merged commit 4a863d2 into master Aug 27, 2019
@FxKu FxKu added this to the v1.3 milestone Sep 10, 2019
Successfully merging this pull request may close these issues.

quick create-delete leaves orphaned objects
2 participants