
image-pruner: prune images in their own jobs #19468

Merged
merged 3 commits into openshift:master on Jun 22, 2018

Conversation


@miminar miminar commented Apr 23, 2018

Instead of pruning in phases:

all streams -> all layers -> all blobs -> manifests -> images

Prune individual images in parallel jobs:

all streams -> parallel [
   image1's layers -> image1's blobs -> ... -> image1,
   image2's layers -> image2's blobs -> ... -> image2,
   ...
]

A failure in the streams prune phase is no longer fatal.

Resolves: rhbz#1567657

Additionally, manifest blobs previously weren't removed from the blob store. This PR removes the manifests of deleted images from the blob store as well.
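For a quick feel of the model, below is a minimal, self-contained Go sketch of the job-per-image worker pool described above. Job and JobResult loosely mirror the types added in this PR; the worker logic is a stand-in, not the actual implementation.

package main

import (
    "fmt"
    "sync"
)

// Job asks a worker to prune everything belonging to a single image.
type Job struct{ ImageName string }

// JobResult reports what happened while pruning that image.
type JobResult struct {
    Job      Job
    Failures []error
}

// worker stands in for the real pruning worker in this PR.
type worker struct{ id int }

// prune would remove the image's layers, blobs and manifest, then the image
// object itself; here it only reports success.
func (w *worker) prune(job Job) *JobResult {
    return &JobResult{Job: job}
}

func main() {
    images := []string{"image1", "image2", "image3"}

    in := make(chan Job)
    out := make(chan JobResult)

    // A few parallel workers, each pruning whole images one at a time,
    // instead of the old global layers -> blobs -> manifests -> images pass.
    var wg sync.WaitGroup
    for i := 0; i < 2; i++ {
        w := &worker{id: i}
        wg.Add(1)
        go func() {
            defer wg.Done()
            for job := range in {
                out <- *w.prune(job)
            }
        }()
    }

    // One job per image; closing the channel tells the workers to stop.
    go func() {
        for _, name := range images {
            in <- Job{ImageName: name}
        }
        close(in)
    }()

    // Close the result channel once every worker has exited.
    go func() {
        wg.Wait()
        close(out)
    }()

    for res := range out {
        fmt.Printf("pruned %s (failures: %d)\n", res.Job.ImageName, len(res.Failures))
    }
}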

Remarks:

  • Parallel execution makes the code a bit more complicated.
  • To keep this legible and reviewable, the parallelism, together with the handling of blocked images, could be moved to a follow-up PR.
  • This doesn't aim to make image pruning 100% safe; it just attempts to make it error tolerant while maintaining as much consistency as possible.

TODOs:

  • pretty-print the result summary
  • avoid potential races caused by shared API clients by instantiating them separately for each worker
  • resolve races in tests where counters are shared between goroutines
  • detect changes to image streams and update the graph accordingly on the fly
  • detect image creations/deletions and update the graph accordingly

@openshift-ci-robot openshift-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 23, 2018
@openshift-ci-robot openshift-ci-robot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Apr 23, 2018
@miminar
Author

miminar commented Apr 23, 2018

Publishing this in an early state to gather some feedback.

@dmage, @legionus, @bparees, @coreydaley PTAL

@openshift openshift deleted a comment from michojel Apr 23, 2018
@legionus
Contributor

A dynamically updated graph will not help to achieve greater consistency. You have made the code very complicated, and it will speed up the work of the pruner, but it does not help with maintaining the integrity of the database. Removing an image is not atomic, and in case of an error you will break the database.

For that reason, I believe this approach is not effective. I would suggest a different approach to pruning: mark-and-sweep. In the first iteration we mark objects with an annotation holding the date of the planned deletion (for example, three days out). In the next iteration, we delete the objects whose mark is still present. In this case, the graph is not updated dynamically.
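For context, here is a rough Go sketch of the mark-and-sweep idea proposed above. It was not implemented in this PR; the annotation key, the Image type, and the helpers are all hypothetical.

package prunesketch

import "time"

// Image is a minimal stand-in for the real image API object.
type Image struct {
    Name        string
    Annotations map[string]string
    Referenced  bool // whether anything still references the image
}

// plannedDeletionAnnotation is a made-up annotation key for this sketch.
const plannedDeletionAnnotation = "images.openshift.io/planned-deletion"

// retention is the example "3 days" grace period mentioned above.
const retention = 3 * 24 * time.Hour

// markPhase annotates unreferenced images with a planned deletion date and
// clears the mark from anything that became referenced again.
func markPhase(images []*Image, now time.Time) {
    for _, img := range images {
        if img.Annotations == nil {
            img.Annotations = map[string]string{}
        }
        if img.Referenced {
            delete(img.Annotations, plannedDeletionAnnotation)
            continue
        }
        if _, marked := img.Annotations[plannedDeletionAnnotation]; !marked {
            img.Annotations[plannedDeletionAnnotation] = now.Add(retention).Format(time.RFC3339)
        }
    }
}

// sweepPhase returns the images whose mark is still present and has expired;
// a later step would delete them without consulting a live graph.
func sweepPhase(images []*Image, now time.Time) []*Image {
    var expired []*Image
    for _, img := range images {
        deadline, err := time.Parse(time.RFC3339, img.Annotations[plannedDeletionAnnotation])
        if err != nil {
            continue // never marked, or the mark was cleared
        }
        if now.After(deadline) {
            expired = append(expired, img)
        }
    }
    return expired
}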

return
}
out <- *w.prune(job)
}
Contributor

for job := range in {
  out <- *w.prune(job)
}

In this case you don't need nil sentinels; you just close the channel to stop processing.

Author

@miminar miminar Apr 23, 2018

Good idea, I'll check that out.

Update: rewritten


return true
select {
Contributor

Why do you need select there?

Author

It's supposed to handle events from the workers, the image stream listener, or the image listener (the listeners aren't there yet).


// UnreferencedImageComponentEdgeKind is an edge from an ImageNode to an ImageComponentNode denoting that
// the component is currently being unreferenced in a running job.
UnreferencedImageComponentEdgeKind = "UnreferencedImageComponentToDelete"
Contributor

This feels very artificial. Why do we need these "negative" references?

Can we structure the pruner in such a way that we remove only nodes which don't have any references?

Author

Yeah, I see now that it makes the algorithm more complex without adding value. I wanted to somehow track the blobs being deleted (by the jobs running right now) inside the graph, but it just adds inefficiency and complexity. The tracking can easily be done outside of the graph.
If it weren't for the parallelism, the tracking wouldn't be necessary at all.

Can we structure the pruner in such a way that we remove only nodes which don't have any references?

To answer that, we need to first answer https://github.com/openshift/origin/pull/19468/files/99829b95497e6c39d0bdafc4fa00b6f017e23a6e#r183420937. If we continue to stick with the current behaviour of keeping the image if any error occurs for its components, then the unreferencing won't happen until the objects are deleted.

But I agree it would be more natural and it would simplify the algorithm.
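A minimal sketch of what tracking in-flight deletions outside the graph could look like: a mutex-guarded set of component IDs claimed by running jobs. The type and method names below are made up for illustration.

package prunesketch

import "sync"

// inFlightComponents tracks blobs/layers that some running job is already
// deleting, so other jobs don't schedule them again. This lives next to the
// graph rather than as "negative" edges inside it.
type inFlightComponents struct {
    mu  sync.Mutex
    set map[string]struct{}
}

func newInFlightComponents() *inFlightComponents {
    return &inFlightComponents{set: make(map[string]struct{})}
}

// claim marks the component as being deleted and reports whether the caller
// won the claim (true) or someone else is already deleting it (false).
func (c *inFlightComponents) claim(id string) bool {
    c.mu.Lock()
    defer c.mu.Unlock()
    if _, busy := c.set[id]; busy {
        return false
    }
    c.set[id] = struct{}{}
    return true
}

// release forgets the component once its deletion attempt finished.
func (c *inFlightComponents) release(id string) {
    c.mu.Lock()
    defer c.mu.Unlock()
    delete(c.set, id)
}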

Contributor

+1 on tracking this elsewhere.

Author

Reworked and simplified a bit.

return
}

func strenghtenReferencesFromFailedImageStreams(g genericgraph.Graph, failures []Failure) {
Contributor

s/strenghten/strengthen/

Author

You're not going to believe it, but my offline dictionary contains both 😄. I can't find it online, though, so I'm going to trust you.

Author

Fixed

p.g.AddEdge(imageStreamNode, s, ReferencedImageManifestEdgeKind)
break
default:
panic(fmt.Sprintf("unhandeled image component type %q", cn.Type))
Contributor

Can we return this error?

Author

Well, I don't expect this to ever fire. But sure.

Contributor

Also, it should be "unhandled".

But yeah, I don't think we ever want to panic, even in the CLI.
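For illustration, roughly what returning the error instead of panicking amounts to; the component-type constants and edge-kind strings below are stand-ins, not the pruner's real identifiers.

package prunesketch

import "fmt"

// componentType is a stand-in for the image component type enum used by the
// pruner's graph nodes.
type componentType string

const (
    componentLayer    componentType = "layer"
    componentConfig   componentType = "config"
    componentManifest componentType = "manifest"
)

// edgeKindForComponent mirrors the shape of the switch under discussion:
// every known type maps to an edge kind, and an unknown type is returned to
// the caller as an error instead of panicking.
func edgeKindForComponent(t componentType) (string, error) {
    switch t {
    case componentLayer:
        return "ReferencedImageLayer", nil
    case componentConfig:
        return "ReferencedImageConfig", nil
    case componentManifest:
        return "ReferencedImageManifest", nil
    default:
        return "", fmt.Errorf("unhandled image component type %q", t)
    }
}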

Author

Fixed.

res := &JobResult{Job: job}

// If namespace is specified prune only ImageStreams and nothing more.
if len(w.algorithm.namespace) > 0 {
Contributor

When may this condition become true?

Author

Good catch, the statement was misplaced. It's removed now.

@michojel

@legionus I don't see how the mark-and-sweep approach makes this safer or simpler. What prevents users from making new references to marked images during the sweep phase?
I know we discussed this before and could not agree on anything 100% bulletproof, even when using locks in etcd. And introducing a locking mechanism for the sake of safe image pruning would make the subject even more complex.

The aim of this PR is not to make the pruning safer. The purpose is to tolerate errors that currently prevent customers from pruning anything, by pruning iteratively, image after image, which brings a rough per-image atomicity to the process.

I don't think this change prevents us from making the pruning safer in the future. It does not prevent us from implementing mark-and-sweep, or etcd or storage locking. All it does is introduce job-based pruning while building on top of what we already have.

@@ -333,6 +356,7 @@ func (p *pruner) addImagesToGraph(images *imageapi.ImageList) []error {
//
// addImageStreamsToGraph also adds references from each stream to all the
// layers it references (via each image a stream references).
// TODO: identify streams with non-existing images for later cleanup
func (p *pruner) addImageStreamsToGraph(streams *imageapi.ImageStreamList, limits map[string][]*kapi.LimitRange) []error {
Contributor

This function always returns nil. In case of an error, it panics.

Author

Good catch, I hadn't even realized that. I lean towards panics for errors aimed not at the end user but at a developer (saying: your code is buggy, go fix it), such as this one. But I don't feel that strongly about it, so I'll look into returning it as an error.

Author

Error reported now without a panic.

case *imagegraph.ImageStreamNode:
// ignore
default:
panic(fmt.Sprintf("unhandeled graph node %t", d.Node))
Contributor

"it just attempts to make it error tolerant"? :)

Author

I can return the error, for sure. This was meant more as a debug statement until all the types were handled.

Author

Not panicking any more.


if len(res.Failures) > 0 {
// TODO: include image as a failure as well for the sake of summary's completness
return res
Contributor

If w.algorithm.pruneRegistry == true, then at this point you will end up with a broken image: the image will not have its blobs, but the registry will assume those objects are there. This is a very bad condition.

It would be more correct to delete the image first and then, if there are no errors, try to delete the objects.

Author

True. I'm keeping the current behaviour that preserves image objects for future prunes so that blobs that failed to be pruned are recognized on subsequent prunes. If we remove the image, we lose the option to prune its blobs using this kind of pruner.
But since we have a hard-pruner now, we could do the removal. I just hesitate to make it the default behaviour. Maybe a flag like --keep-broken-images could be useful here. WDYT?

Author

A scenario speaking against deleting the image regardless of prior failure: the registry loses write permission to the storage and the deletion of every blob fails (this has already happened to some customers). In this case all images are deleted but all the blobs remain on the storage.
Is it worse than broken images? Hard to tell. It depends on what the customer wants to achieve: either they're running out of storage space, or etcd is too big and slow, or both.
Therefore I'm more inclined to have a flag like --keep-broken-images-on-failure defaulting to true (the current behaviour) so that the customer can choose what they prefer.

@bparees thoughts?

Contributor

As you note, hard-prune can be used to resolve that scenario (blobs being left in storage). I don't think we need to introduce new flags for something that I would hope is not a typical scenario.

Author

I'm just afraid that hard-prune is not a very popular solution to this problem, and I am reluctant to make it a mandatory follow-up routine.

Contributor

It won't be mandatory unless there are other issues in their registry. If we can't delete layers due to storage issues, they're going to have to fix their storage issues anyway... running hard prune again after doing so seems like a small additional burden.

Is there some other case where you think it would be better to leave the image data in place if we cannot remove the blob data?

Author

Is there some other case where you think it would be better to leave the image data in place if we cannot remove the blob data?

Just a few similar scenarios where the user specifically wants to prune storage: connection issues to the registry, or the registry not recognizing the user as authorized to do the prune.

What about keeping the image only if there were no successful blob deletions? That way broken images get pruned, but healthy images can either be reused or pruned next time.

Contributor

In this case all images are deleted but all the blobs remain on the storage.
Is it worse than broken images?

@miminar No, broken images are much worse. We have a few ways to remove blobs completely, or even to restore images in etcd, but we do not have tools to restore blobs in the storage to fix images. We only have a diagnostic tool which can help find such broken images. The user has to remove the image from etcd and re-push it by hand to fix it. Until they do, those images stay broken and pushes and pulls will fail. It seems to me that this is a much worse scenario, because the cluster goes into an inconsistent state and the fix requires manual intervention.

Author

What about keeping the image only if there were no successful blob deletions?

Went for this one. Please let me know if you have any objections.
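A small sketch of the behaviour settled on here, with made-up parameter names: the image object is kept for a retry only when none of its blobs were actually deleted.

package prunesketch

// keepImageForRetry sketches the agreed behaviour: keep the image object for
// a future prune only when none of its blobs were actually deleted; once any
// blob is gone, the image is pruned as well so it is not left half-referenced.
// The parameters are illustrative, not the PR's exact API.
func keepImageForRetry(blobFailures, blobSuccesses int) bool {
    // All blob deletions failed: nothing in the registry was touched, so the
    // image is still intact and a later prune can retry it.
    if blobFailures > 0 && blobSuccesses == 0 {
        return true
    }
    // Everything succeeded, or the image is already partially pruned; in both
    // cases the image object should be deleted too.
    return false
}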

return
}

func strenghtenReferencesFromFailedImageStreams(g genericgraph.Graph, failures []Failure) {
Contributor

needs godoc

Author

Godoc'd

@openshift-bot openshift-bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 26, 2018
@miminar miminar closed this Apr 26, 2018
@openshift-ci-robot openshift-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. and removed size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Apr 26, 2018
@miminar
Author

miminar commented Apr 26, 2018

Rebased and added imagestream event handling.

@miminar miminar reopened this Apr 26, 2018
@openshift-bot openshift-bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 26, 2018
@openshift-ci-robot openshift-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Apr 26, 2018
@miminar
Author

miminar commented Apr 26, 2018

Added image event handling.

@miminar
Author

miminar commented May 11, 2018

/retest

@miminar
Author

miminar commented May 11, 2018

Ready for review.

g.internal.SetEdge(t, 1.0)
g.internal.SetEdge(t)
}
case simple.Edge:
Contributor

It seems that before these changes this function was able to handle only genericgraph.Edge. Is this case ever executed?

Author

concrete.WeightedEdge used to be handled before. simple.Edge is its equivalent. But no, we don't use upstream edges in our code directly. Nothing prevents us from doing that in the future though.

resultChan <-chan JobResult,
) (deletions []Deletion, failures []Failure) {
imgUpdateChan := p.imageWatcher.ResultChan()
isUpdateChan := p.imageStreamWatcher.ResultChan()
Contributor

What's the purpose of adding these watches? Trying to handle the case where a reference to a layer is added to an image or imagestream while we're in the middle of pruning? It seems unrelated to the fundamental purpose of the PR (to parallelize the image pruning operations).

Author

What's the purpose of adding these watches? Trying to handle the case where a reference to a layer is added to an image or imagestream while we're in the middle of pruning?

exactly

It seems unrelated to the fundamental purpose of the PR

Yes, it's unrelated, but the way the code is structured now, it's easy to add that safety mechanism. From past experience we know that pruning can take hours to complete, and that's far too big a window for changes and inconsistencies to happen.
For the sake of simplicity (or rather, less complexity), I can extract that and move it to a follow-up if desired.
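For readers skimming the thread, a condensed sketch of the event loop with the added watches; the channel roles follow the quoted hunk above, while the types and handling are simplified stand-ins.

package prunesketch

// jobResult and event are minimal stand-ins for the real result and watch
// event types.
type jobResult struct{ failures []error }
type event struct{ kind, name string }

// runLoop sketches the shape of the pruner's event loop: job results drive
// progress, while image and imagestream watch events let the pruner update
// its graph on the fly during a prune that may run for hours.
func runLoop(resultChan <-chan jobResult, imgUpdateChan, isUpdateChan <-chan event) []jobResult {
    var results []jobResult
    for {
        select {
        case res, ok := <-resultChan:
            if !ok {
                // All workers finished; the prune is done.
                return results
            }
            // Real code would record deletions/failures and schedule the
            // next job here.
            results = append(results, res)
        case <-imgUpdateChan:
            // An image was created, updated or deleted mid-prune: refresh
            // the graph so new references are respected.
        case <-isUpdateChan:
            // Same for image stream changes.
        }
    }
}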

Contributor

It's fine, I just wanted to understand why it was being done here. Thanks.

imagegraph "github.com/openshift/origin/pkg/oc/graph/imagegraph/nodes"
)

// ComponentRetention knows all the places where image componenet needs to be pruned (e.g. global blob store
Contributor

s/componenet/component/

Author

fixed

@bparees
Contributor

bparees commented May 15, 2018

/approve
/hold
(hold until 3.11)

@openshift-ci-robot openshift-ci-robot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels May 15, 2018
@openshift-bot openshift-bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 29, 2018
@openshift-bot openshift-bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 30, 2018
@miminar
Author

miminar commented May 30, 2018

Rebased.

/test extended_image_registry

@bparees
Contributor

bparees commented Jun 19, 2018

/hold cancel
/lgtm

@openshift-ci-robot openshift-ci-robot added lgtm Indicates that a PR is ready to be merged. and removed do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. labels Jun 19, 2018
@openshift-bot
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

3 similar comments

Michal Minář added 3 commits June 20, 2018 09:47
Bumped to the kube level.

Signed-off-by: Michal Minář <miminar@redhat.com>
Signed-off-by: Michal Minář <miminar@redhat.com>
Instead of pruning in phases:

all streams -> all layers -> all blobs -> manifests -> images

Prune individual images in parallel jobs:

all streams -> parallel [
   image1's layers -> image1's blobs -> ... -> image1,
   image2's layers -> image2's blobs -> ... -> image2,
   ...
]

A failure in streams prune phase is not fatal anymore.

Signed-off-by: Michal Minář <miminar@redhat.com>
@openshift-ci-robot openshift-ci-robot removed the lgtm Indicates that a PR is ready to be merged. label Jun 20, 2018
@miminar
Author

miminar commented Jun 20, 2018

Rebased

@bparees
Contributor

bparees commented Jun 20, 2018

@deads2k any concerns w/ the bump commit here?

@bparees
Contributor

bparees commented Jun 22, 2018

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Jun 22, 2018
@openshift-ci-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bparees, miminar

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-robot openshift-merge-robot merged commit dabec00 into openshift:master Jun 22, 2018