Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Image pruning fixes #16717

Merged
merged 3 commits into from
Oct 10, 2017

Conversation

legionus
Copy link
Contributor

@legionus legionus commented Oct 6, 2017

No description provided.

@legionus legionus requested review from dmage and miminar October 6, 2017 09:28
@openshift-ci-robot openshift-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Oct 6, 2017
@legionus legionus requested a review from bparees October 6, 2017 09:32
@0xmichalis
Copy link
Contributor

/retest

@@ -208,7 +210,7 @@ var _ Pruner = &pruner{}
//
// Also automatically remove any image layer that is no longer referenced by any
// images.
func NewPruner(options PrunerOptions) Pruner {
func NewPruner(options PrunerOptions) (Pruner, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

godoc

@@ -527,29 +534,31 @@ func addBuildStrategyImageReferencesToGraph(g graph.Graph, strategy buildapi.Bui
_, id, err := imageapi.ParseImageStreamImageName(from.Name)
if err != nil {
glog.V(2).Infof("Error parsing ImageStreamImage name %q: %v - skipping", from.Name, err)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

skipping - this is not true anymore

ids := make(graph.NodeSet)

for _, imageNode := range imageNodes {
glog.V(4).Infof("Examining image %q", imageNode.Image.Name)

if imageIsPrunable(g, imageNode, algorithm) {
glog.V(4).Infof("Image %q is prunable", imageNode.Image.Name)
prunable = append(prunable, imageNode)
if _, ok := prunable[imageNode.Image.Name]; !ok {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When can this condition be false?

streamName := getName(streamNode.ImageStream)

if _, ok := prunableStreams[streamName]; !ok {
prunableStreams[streamName] = streamNode
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do this unconditionally

}

for streamName, streamNode := range prunableStreams {
glog.V(4).Infof("Checking if ImageStream %s", streamName)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Checking if what?

errs = append(errs, pruneImageComponents(p.g, p.registryClient, p.registryURL, prunableComponents, layerLinkPruner)...)
errs = append(errs, pruneBlobs(p.g, p.registryClient, p.registryURL, prunableComponents, blobPruner)...)
errs = append(errs, pruneManifests(p.g, p.registryClient, p.registryURL, prunableImageNodes, manifestPruner)...)

if len(errs) > 0 {
// If we had any errors removing image references from image streams or deleting
// If we had any errors removing image references from deleting
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removing image references from deleting?

glog.V(4).Infof("Checking tag event %d with image %q", i, tagEvent.Image)

if _, ok := imageNodes[tagEvent.Image]; !ok {
glog.V(4).Infof("Tag event doesn't match deleted image - keeping")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New code will produce MUCH better logs, but I still don't like them :-) They are not greppable.

glog.V(4).Infof("Checking image stream tag %s:%s: event %d with image %q - keeping, doesn't match deleted image", streamName, tag, i, tagEvent.Image)
glog.V(4).Infof("Checking image stream tag %s:%s: event %d with image %q - keeping, within the allowed range of revisions", streamName, tag, i, tagEvent.Image)
glog.V(4).Infof("Checking image stream tag %s:%s: event %d with image %q - removing, matches deleted image", streamName, tag, i, tagEvent.Image)

@legionus legionus changed the title Image pruning fixes [WIP] Image pruning fixes Oct 6, 2017
@openshift-ci-robot openshift-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 6, 2017
@legionus legionus force-pushed the image-pruning-fixes branch 2 times, most recently from 750b3ef to f6baf00 Compare October 6, 2017 12:24
Signed-off-by: Gladkov Alexey <agladkov@redhat.com>
@legionus
Copy link
Contributor Author

legionus commented Oct 6, 2017

@mfojtik @dmage comments addressed

@legionus legionus changed the title [WIP] Image pruning fixes Image pruning fixes Oct 6, 2017
@openshift-ci-robot openshift-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 6, 2017
// delete any images. This way, you can rerun prune and retry things that failed.
// If we had any errors deleting layers, blobs, or manifest data from the registry,
// stop here and don't delete any images. This way, you can rerun prune and retry
// things that failed.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we need to get smarter about this and still prune the images for which we successfully pruned all the layers/blobs/manifests.

}

return updatedStream, err
}

func (p *describingImageStreamDeleter) NotifyImageStreamPrune(stream *imageapi.ImageStream, updatedTags []string, deletedTags []string) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

godoc

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bparees done

@bparees
Copy link
Contributor

bparees commented Oct 6, 2017

@legionus is this tied to a bug/issue?

}

updatedStream, err := streamPruner.UpdateImageStream(stream)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should know if we actually changed the stream or not, can't we be smarter here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep. Thanks!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Though, on the previous step we've selected all image stream that should be modified. If the stream is not changed here, it means someone else concurrently removes objects from the image stream.

}
imageID = id
case "DockerImage":
ref, err := imageapi.ParseDockerImageReference(from.Name)
if err != nil {
glog.V(2).Infof("Error parsing DockerImage name %q: %v - skipping", from.Name, err)
return
return err
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If someone manages to create build config with corrupted image reference, it will prevent the pruner from running. Is this really such an important error?

If it is, let's add some flag (e.g. --force) marching over small errors, crippled references and dead men.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@miminar You can not create such strategy:

# buildconfigs "cakephp-ex" was not valid:
# * spec.strategy.sourceStrategy.from.name: Invalid value: "foo:bar:baz": not a valid Docker pull specification: invalid reference format

Also even if I add such a option, then it will need to specify this option constantly until broken strategy is removed.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One of the prior docker/distribution rebases made parsing of references stricter. I can easily imagine really old build strategy with a relaxed reference value. I can also imagine this parser becoming even stricter.

Also even if I add such a option, then it will need to specify this option constantly until broken strategy is removed.

If the error is printed, the admin will have the opportunity to act and tune/delete the object. Make sure to include namespace/name of the build config.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@miminar Done

func calculatePrunableImages(
g graph.Graph,
imageNodes []*imagegraph.ImageNode,
algorithm pruneAlgorithm) (map[string]*imagegraph.ImageNode, graph.NodeSet) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

func calculatePrunableImages(
    ...
    algorithm pruneAlgorithm,
) (map[string]*imagegraph.ImageNode, graph.NodeSet) {
    prunable := make(map[string]*imagegraph.ImageNode)

makes it clearer for eyes where arguments stop and code starts.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@miminar Done

glog.V(4).Infof("Removing tag %q from status.tags of ImageStream %s", tag, getName(stream))
delete(stream.Status.Tags, tag)
if _, ok := imageNodes[tagEvent.Image]; !ok {
glog.V(4).Infof("Image stream tag %s:%s revision %d - keeping because image %q is not deleted", streamName, tag, i, tagEvent.Image)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please verify that the image indeed exists (it is contained in the full image group)? And remove it if not (and the tag event is old enough).

Deleted images that still exist in imagestreamtags polute the output.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@miminar Done

Copy link
Contributor Author

@legionus legionus Oct 9, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@miminar No, we can't do it.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, we can't do it.

Can you elaborate?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@miminar What you suggested cannot be implemented in this place because if the user adds a new image, we'll delete as non-existent.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@miminar I mean that we have to be tolerant to unknown images.

@@ -75,7 +75,9 @@ type ImageStreamDeleter interface {
GetImageStream(stream *imageapi.ImageStream) (*imageapi.ImageStream, error)
// UpdateImageStream removes all references to the image from the image
// stream's status.tags. The updated image stream is returned.
UpdateImageStream(stream *imageapi.ImageStream, image *imageapi.Image, updatedTags []string) (*imageapi.ImageStream, error)
UpdateImageStream(stream *imageapi.ImageStream) (*imageapi.ImageStream, error)
// NotifyImageStreamPrune shows notification about image stream was updated.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

about updated image stream.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@miminar Done

@@ -775,6 +775,13 @@ func (c *CleanUpContainer) Run() {
}
}

out, err := c.OC.SetNamespace(metav1.NamespaceDefault).AsAdmin().
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a commentary why is this necessary.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And a TODO to replace this with removals of individual blobs used in the test case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@miminar Done

@bparees
Copy link
Contributor

bparees commented Oct 6, 2017

/unassign @fabianofranz
/assign

@legionus legionus force-pushed the image-pruning-fixes branch 2 times, most recently from 3e6fea9 to 7ab0ea0 Compare October 9, 2017 13:26
@legionus
Copy link
Contributor Author

legionus commented Oct 9, 2017

is this tied to a bug/issue?

@bparees No.

Copy link
Member

@dmage dmage left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/approve

if err != nil {
// During prune someone removed imagestream
if kerrapi.IsNotFound(err) {
return nil
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I think it's worth of 4th info level at least.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@miminar Done

streamNode.ImageStream = updatedStream
}

return err
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

paranoid mode on: another NotFound could occur here as well :paranoid mode off

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@miminar Done

@legionus legionus force-pushed the image-pruning-fixes branch 2 times, most recently from e2523a6 to c43af16 Compare October 9, 2017 16:42
Signed-off-by: Gladkov Alexey <agladkov@redhat.com>
Signed-off-by: Gladkov Alexey <agladkov@redhat.com>
}
imageID = id
case "DockerImage":
ref, err := imageapi.ParseDockerImageReference(from.Name)
if err != nil {
glog.V(2).Infof("Error parsing DockerImage name %q: %v - skipping", from.Name, err)
return
return fmt.Errorf("error parsing DockerImage name %q: %v", from.Name, err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought I saw agreement to make this a warning, not fail the entire prune process? That would be my inclination.

Same for the ImageStreamImage check above.

If the reference is invalid, it clearly is not going to reference an image and keep us from pruning that image.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought I saw agreement to make this a warning, not fail the entire prune process? That would be my inclination.

@bparees Yes, but my answer was that you can't create such reference.

If the reference is invalid, it clearly is not going to reference an image and keep us from pruning that image.

You can not be sure that this Strategy is not going to reference an image. You just don't know. I believe that in the conditions when we are not sure we need to save user data. In the end the user will see the error and can decide to remove this build/buildconfig or fix reference.

#16717 (comment)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bparees Yes, but my answer was that you can't create such reference.

I think relying on that guarantee is a mistake. Bugs in the buildconfig validation could allow the creation of such a reference and then pruning is broken for no good reason.

You can not be sure that this Strategy is not going to reference an image.

I don't care what it may do in the future. I know it's not currently referencing an image. What the user may update it to reference in the future when they fix their typo is irrelevant. It may have tried to reference an image, but it is not referencing an image (it has an invalid reference, so it's not referencing anything).

I believe that in the conditions when we are not sure we need to save user data.

What are we saving? Can you give me a scenario where a user manages to create an image that we'd prune here despite an invalid reference existing in the system, that would ultimately be a problem?

If your argument is "the user meant to reference something but they did it wrong so we have to save everything until we know what they meant to reference", well what if they meant to reference "ruby" and they typed it as "riby"? It's a "valid" reference, so it won't trip this error, but the "ruby" image is going to get cleaned up in that case too. There's only so much protection we can offer.

In the end the user will see the error and can decide to remove this build/buildconfig or fix reference.

The admin running this is not the same as the user who created the broken reference. So your solution requires the admin to either go edit the user's object themselves, or track down the user and discuss how to resolve it. Neither of those are ideal situations for an admin who's just trying to free up space (not to mention that this is frequently run as an automated job that may or may not have good monitoring in place).

In your referenced comment thread you said "done" but it's not obvious to me what that means. You didn't add the suggested "force" flag and you didn't make this a warning. (And to be clear, I prefer the latter).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think relying on that guarantee is a mistake. Bugs in the buildconfig validation could allow the creation of such a reference and then pruning is broken for no good reason.

@bparees I'll change it, but it's your decision.

In your referenced comment thread you said "done" but it's not obvious to me what that means

My done was about:

If the error is printed, the admin will have the opportunity to act and tune/delete the object. Make sure to include namespace/name of the build config.

and I added the ability to see the name of the object.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll change it, but it's your decision.

I asked a pretty direct question:

What are we saving? Can you give me a scenario where a user manages to create an image that we'd prune here despite an invalid reference existing in the system, that would ultimately be a problem?

You seem very concerned about something here, but I haven't grasped what it is. Can you help me understand the scenario you are envisioning where proceeding with pruning is a mistake and why it is a mistake?

I am trying to make an educated decision.... so far I am not seeing the risk in proceeding with pruning in this situation and i'd like you to help me be confident that I am not overlooking something.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I asked a pretty direct question:

@bparees At first I did not understand correctly. Sorry, of course I will try to explain my point of view.

What are we saving?

Since this operation can not be canceled (we send requests to the registry to delete blobs/manifests), then we need to be very careful with the calculation of references. Otherwise, we will permanently delete objects that the user did not want to lose.

If we have references that we can not parse, it means that there is something wrong in the system (incomplete upgrade or something else). In this case, it is not the best time to perform actions that can not be rolled back.

Ignoring "broken" Build/BuildConfig will lead us to the image pruning if there are no other references in the cluster.

Can you give me a scenario where a user manages to create an image that we'd prune here despite an invalid reference existing in the system, that would ultimately be a problem?

Imagine the situation when in the new version of origin we updated the github.com/openshift/origin/pkg/image/reference in such a way (it can be corner cases) that 10% of the old references stopped parsing. If you ignore the reference parsing error, you will lose 10% of the links in the graph. This will lead to a loss of actual data.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Imagine the situation when in the new version of origin we updated the github.com/openshift/origin/pkg/image/reference in such a way (it can be corner cases) that 10% of the old references stopped parsing. If you ignore the reference parsing error, you will lose 10% of the links in the graph. This will lead to a loss of actual data.

Thanks, that is a compelling argument.

Do we at least have enough information to add the namespace containing the bad buildconfig reference? Right now i'm not convinced we're printing enough information to help an admin go find the bad reference and fix it. They essentially would have to dump all the buildconfigs in all the namespaces and then grep for the string. We should do more to help point them directly to where the problem is, in these cases.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we at least have enough information to add the namespace containing the bad buildconfig reference?

@bparees Right now, I output an error to the user and specifying <namespace>/<name> of the Build/BuildConfig. Is this information enough ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, i hadn't seen that you were catching this error and wrapping it w/ additional information. I see it now.

@legionus
Copy link
Contributor Author

legionus commented Oct 9, 2017

@bparees Also, I should note that pruneStreams has an unsolved bug: If you specify the --keep-tag-revisions=0 option, then this code can remove the tag completely, but the prune process does not count tag references process does not take into account the references to the tags and it can break Builds/BuildConfigs if they have references to these tags.

@bparees
Copy link
Contributor

bparees commented Oct 9, 2017

@bparees Also, I should note that pruneStreams has an unsolved bug: If you specify the --keep-tag-revisions=0 option, then this code can remove the tag completely

Hm. I suppose you're getting what you asked for if you do that but I agree it's likely to surprise people.

@legionus
Copy link
Contributor Author

legionus commented Oct 9, 2017

I suppose you're getting what you asked for.

@bparees Obviously, the user did not ask us to turn the cluster into a non-consistent state. From our documentation:

In order to prune images that are no longer required by the system due to age, status, or exceed limits, administrators may run the following command:
$ oadm prune images [<options>]

and

  1. Specify --keep-tag-revisions=0 with the oadm prune images command.

This action will effectively remove all the tags from all the namespaces with underlying images, unless they are younger or they are referenced by objects younger than the specified threshold.

So we do not have to delete something that is still required by the system.

@bparees
Copy link
Contributor

bparees commented Oct 9, 2017

@bparees Obviously, the user did not ask us to turn the cluster into a non-consistent state.

but they did ask us to remove all the tags from the system, so long as the associated images are older than a certain age.

So we do not have to delete something that is still required by the system.

sorry, I can't follow this statement. What behavior are you proposing would be correct for "--keep-tag-revisions=0"?

@bparees
Copy link
Contributor

bparees commented Oct 9, 2017

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Oct 9, 2017
@openshift-merge-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bparees, dmage, legionus

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@openshift-merge-robot openshift-merge-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 9, 2017
@legionus
Copy link
Contributor Author

legionus commented Oct 9, 2017

sorry, I can't follow this statement. What behavior are you proposing would be correct for "--keep-tag-revisions=0"?

@bparees Let's consider the situation:

We have two Images:

  • Image sha256:image1 was created month ago;
  • Image sha256:image2 was created 5 minutes ago;

ImageStream myproject/foo (was created month ago) has two tags:

  • Tag v1 that points to Image sha256:image1;
  • Tag v2 that points to Image sha256:image2;

One running Pod that points to the Image sha256:image2.
One Build that points to ImageStreamTag myproject/foo:v1.

Now let's run the command:
oc adm prune images --keep-tag-revisions=3 --keep-younger-than=60m

We will make the graph and add the dependences:

So according to the graph, we can remove the Image sha256:image1 because there is no reference to it and it is older than necessary. And since it was the only one in the tag, we remove the v1 tag from ImageStream myproject/foo.

Now we have a Build that points to myproject/foo:v1 tag that no longer exists.

This happened because we did not count the reference to the v1 tag itself when it was deleted. If we did this, we would not be able to remove it and therefore we could not remove the last image from this tag.

@bparees
Copy link
Contributor

bparees commented Oct 9, 2017

Build will not give us strong references too because it points to ImageStreamTag

hm. It seems like for objects that have a reference to an imagestreamtag, we should be resolving what image that imagestreamtag currently points to, and counting that as a hard reference from the object(buildconfig in this case) to the image. Is there a reason we did not do that?

@legionus
Copy link
Contributor Author

legionus commented Oct 9, 2017

Is there a reason we did not do that?

@bparees The tag may change during pruning. Since Build points to the tag we must protect this tag from pruning. Otherwise, if the tag changes, we will remove it anyway. Imagine the situation that after the start of the pruning process, someone updated the tag to something that we want to remove.

<offtopic> In the current conditions, the pruning can not be done correctly. There will always be the possibility of a race condition.</offtopic>

Moreover, this process with a certain probability can damage the registry storage (if you use S3). I mean, if you're pushing data at the same time, there's a chance to get broken images.

@bparees
Copy link
Contributor

bparees commented Oct 9, 2017

@bparees The tag may change during pruning. Since Build points to the tag we must protect this tag from pruning. Otherwise, if the tag changes, we will remove it anyway. Imagine the situation that after the start of the pruning process, someone updated the tag to something that we want to remove.

so do we protect it or not? you said Also, I should note that pruneStreams has an unsolved bug: If you specify the --keep-tag-revisions=0, then this code can remove the tag completely

As you say, we're never going to solve 100% of potential race conditions since we don't have a transactional lock on the graph. Someone could also update their buildconfig to point to a tag or image we already selected for pruning, in the middle of a pruning operation.

I think (if only for my education) we need to have a level set conversation on what the pruning algorithm is and why. It seems like there is some inconsistency as to whether the primary concern is imagestreamtags, or images.

@bparees
Copy link
Contributor

bparees commented Oct 9, 2017

btw you have the same problem with a DeploymentConfig that is not currently deployed, I would imagine. You have no hard reference to an image, just a reference from the DeploymentConfig to an ImageStreamTag.

Which again makes me think we should be resolving the reference to an image and treating that as a strong reference between the image and the deploymentconfig(or buildconfig).

If someone updates the reference it mid-pruning you're still not any worse off than you are today where you ignore the reference entirely.

@legionus
Copy link
Contributor Author

legionus commented Oct 9, 2017

/retest

1 similar comment
@legionus
Copy link
Contributor Author

/retest

@legionus
Copy link
Contributor Author

so do we protect it or not?

@bparees No. You can see it here.

As you say, we're never going to solve 100% of potential race conditions since we don't have a transactional lock on the graph.

We can change the conditions and make it possible. As a crazy idea: Mover the pruning process to the master and block (not reject) write requests while the pruning is in progress. In this case, we guarantee the immutability of the state. Also, we can move blobs/manifests pruning to the registry (we already have hard pruning), rather than sending millions of delete requests, which are rather slow.

I think (if only for my education) we need to have a level set conversation on what the pruning algorithm is and why.

@bparees I did not know anything about image pruning process a weeks ago )))
So I'm in the same position as you.

btw you have the same problem with a DeploymentConfig that is not currently deployed, I would imagine. You have no hard reference to an image, just a reference from the DeploymentConfig to an ImageStreamTag.

Sure. I do not know of all the places where there can be such tag references, so I listed only builds and buildconfigs. But of course there are certainly more of them.

If someone updates the reference it mid-pruning you're still not any worse off than you are today where you ignore the reference entirely.

Sure. It's better to take into account these references than to ignore them completely.

Honestly, I do not understand why the graph is used to prune images. There is enough map of images with a reference counter. This will work well only if we are sure that the state of the cluster can not change, but since we ignore this fact, we can not take it into account.

@openshift-merge-robot
Copy link
Contributor

/test all [submit-queue is verifying that this PR is safe to merge]

@openshift-ci-robot
Copy link

openshift-ci-robot commented Oct 10, 2017

@legionus: The following tests failed, say /retest to rerun them all:

Test name Commit Details Rerun command
ci/openshift-jenkins/origin/unit 53f1fdb link /test origin-ut
ci/openshift-jenkins/experimental/integration eeb7388 link /test origin-it
ci/openshift-jenkins/integration eeb7388 link /test integration
ci/openshift-jenkins/unit eeb7388 link /test unit
ci/openshift-jenkins/extended_networking_minimal eeb7388 link /test extended_networking_minimal

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@openshift-merge-robot
Copy link
Contributor

Automatic merge from submit-queue (batch tested with PRs 16717, 16759, 16756).

@openshift-merge-robot openshift-merge-robot merged commit 4fc5ca3 into openshift:master Oct 10, 2017
rdoproject pushed a commit to rdo-infra/rdo-container-registry that referenced this pull request Nov 6, 2017
This commits allows to deploy the RDO registry with OpenShift 3.7.

It currently uses the "rdo-test" branch because there is still one unmerged
pull request upstream that hasn't merged yet upstream.

Delta from OpenShift 3.5 that is interesting to us:

- Significant improvements for registry and image pruning see [1][2][3][4]
- docker-registry can now use reencrypt routes [5]
- Metrics and logging were deployed by default in 3.5, this is no longer the
  case in 3.7, avoiding an unnecessary impact on performance. [6]
- We're now deploying a persistent volume for the docker-registry service on
  the local filesystem.

[1]: openshift/origin#13671
[2]: openshift/origin#16717
[3]: openshift/origin#16656
[4]: openshift/origin#17020
[5]: openshift/origin#14249
[6]: openshift/openshift-ansible@660bafe

Change-Id: I5c364a1aab883b6af061051bf190ce857bf2e1f9
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

10 participants