Avoid orphaned objects on delete #654

FxKu · 2019-08-21T10:07:23Z

If a cluster is deleted immediately after creation, fields of the cluster struct are empty when processing the DELETE event. The same might happen when a cluster doesn't get into the Running state at all.

fixes #551

sdudoladov · 2019-08-21T11:13:14Z

#218 is a totally different issue, caused by an early return when an error happens during deletion.
I'd suggest handling it in a separate PR.

FxKu · 2019-08-21T12:00:47Z

Turns out that #218 has been resolved (partly) by PR #295. Updated the description.

sdudoladov · 2019-08-23T14:23:29Z

@FxKu

Can you post the content of the cl var returned at line 177 for the quick-deletion case?
Will it work if you explicitly set

cl.Name = clusterName.Name
cl.Namespace = clusterName.Namespace

in addCluster?

It looks like line 179 reads an inconsistent state of cl because it does not acquire the relevant mutex (Create() does).

sdudoladov · 2019-08-26T15:22:05Z

After some experiments I think the synchronization of CREATE/DELETE processing is correct. It is the setSpec() method that causes the problem. When the Patch call fails, newspec holds only the zero values of its field types, and setSpec() still assigns it, so parts of the Postgres spec are overwritten with empty values.

The following sequence of events leads to that situation:

1. The operator starts creating a cluster; that takes time.
2. A DELETE event happens while CREATE is still being processed. At that point k8s has already deleted the manifest.
3. Updating the cluster status in the (no longer existing) manifest, the very last step of CREATE, fails, causing setSpec() to nullify the Postgres spec.
4. The operator attempts to process DELETE but cannot find certain objects to delete because Name and Namespace in the spec are empty strings.

The practical consequence of the bug is that certain resources - for example pods - stay around forever, preventing rolling updates or creation of a cluster with the same name.

Deletions during Update and Sync events should have the same behaviour.

@FxKu the solution I found is to add a return at the end of the error-handling code in setSpec().
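To make the failure mode concrete, here is a minimal Go sketch of the pattern described above. It is not the operator's actual code: the types Postgresql and Cluster are pared-down stand-ins, and patchStatus, setStatusBuggy and setStatusFixed are hypothetical names invented for illustration. The only thing it assumes from the discussion is that the failed Patch call hands back a zero-valued spec, which is then assigned unless the error branch returns.

```go
package main

import (
	"errors"
	"fmt"
)

// Postgresql is a hypothetical, simplified stand-in for the manifest type.
type Postgresql struct {
	Name      string
	Namespace string
	Status    string
}

// Cluster is a hypothetical stand-in that keeps an in-memory copy of the manifest.
type Cluster struct {
	Postgresql
}

// patchStatus simulates the Patch call. When the manifest no longer exists
// it returns an error together with a zero-valued Postgresql.
func patchStatus(current Postgresql, status string, manifestExists bool) (Postgresql, error) {
	if !manifestExists {
		return Postgresql{}, errors.New("manifest not found")
	}
	current.Status = status
	return current, nil
}

// setStatusBuggy logs the error but falls through, so the zero-valued
// result overwrites Name and Namespace.
func (c *Cluster) setStatusBuggy(status string, manifestExists bool) {
	newSpec, err := patchStatus(c.Postgresql, status, manifestExists)
	if err != nil {
		fmt.Println("could not update status:", err)
		// no return here: execution continues to the assignment below
	}
	c.Postgresql = newSpec
}

// setStatusFixed returns from the error branch and keeps the old spec.
func (c *Cluster) setStatusFixed(status string, manifestExists bool) {
	newSpec, err := patchStatus(c.Postgresql, status, manifestExists)
	if err != nil {
		fmt.Println("could not update status:", err)
		return
	}
	c.Postgresql = newSpec
}

func main() {
	buggy := &Cluster{Postgresql: Postgresql{Name: "acid-test", Namespace: "default"}}
	buggy.setStatusBuggy("Running", false) // manifest already deleted
	fmt.Printf("buggy: name=%q namespace=%q\n", buggy.Name, buggy.Namespace)

	fixed := &Cluster{Postgresql: Postgresql{Name: "acid-test", Namespace: "default"}}
	fixed.setStatusFixed("Running", false)
	fmt.Printf("fixed: name=%q namespace=%q\n", fixed.Name, fixed.Namespace)
}
```

In the buggy variant the cluster loses its Name and Namespace after the failed patch, which is why a later DELETE cannot look up the objects it should remove. The fixed variant keeps the previous in-memory spec, so the lookup still works.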

FxKu · 2019-08-26T16:12:28Z

Thanks @sdudoladov for finding the source of this problem. Changed the code accordingly.

Btw, the problem also occurred when the cluster failed to get into a running state (e.g. because a non-existent Docker image was specified) and a user then tried to delete the cluster. So you're probably right about Sync and Update.

FxKu · 2019-08-27T09:46:26Z

👍

sdudoladov · 2019-08-27T10:52:11Z

👍

set cluster name if delete happens too soon

c7fc4bf

FxKu requested review from avaczi, CyberDem0n, erthalion, Jan-M, RafiaSabih and sdudoladov as code owners August 21, 2019 10:07

fix source of problem and some typos

451fe44

sdudoladov merged commit 4a863d2 into master Aug 27, 2019

FxKu mentioned this pull request Aug 27, 2019

Incomplete deletion if resources upon delete postgresql #218

Closed

FxKu added this to the v1.3 milestone Sep 10, 2019
