Integration flake: TestV2RegistryGetTags #16012

Closed
smarterclayton opened this issue Aug 28, 2017 · 15 comments
Labels
component/imageregistry kind/test-flake priority/P1

Comments

@smarterclayton
Contributor

https://openshift-gce-devel.appspot.com/build/origin-ci-test/pr-logs/pull/15624/test_pull_request_origin_integration/6615/#githubcomopenshiftorigintestintegrationrunner-testv2registrygettags

    		I0828 09:50:50.938907   29013 handler.go:150] user.openshift.io-apiserver: GET "/apis/user.openshift.io/v1/users/admin" satisfied by gorestful with webservice /apis/user.openshift.io/v1
    		I0828 09:50:50.939716   29013 wrap.go:42] GET /apis/user.openshift.io/v1/users/admin: (1.267754ms) 200 [[integration.test/v1.7.0+695f48a16f (linux/amd64) kubernetes/d2e5420] 127.0.0.1:42432]
    		I0828 09:50:50.940014   29013 rbac.go:116] RBAC DENY: user "admin" groups ["system:authenticated:oauth" "system:authenticated"] cannot "get" resource "images.image.openshift.io" named "sha256:8ab0b9dd51671a70959a95dd92b9a44d4af48dca7ae67f67bd36a11eccba9e07" cluster-wide
    		I0828 09:50:50.940165   29013 wrap.go:42] GET /apis/image.openshift.io/v1/images/sha256:8ab0b9dd51671a70959a95dd92b9a44d4af48dca7ae67f67bd36a11eccba9e07: (2.764532ms) 403 [[integration.test/v1.7.0+695f48a16f (linux/amd64) kubernetes/d2e5420] 127.0.0.1:42572]
    		time="2017-08-28T09:50:50.940408719Z" level=error msg="unable to get image: sha256:8ab0b9dd51671a70959a95dd92b9a44d4af48dca7ae67f67bd36a11eccba9e07" go.version=go1.8.3 http.request.host="127.0.0.1:5000" http.request.id=cb93796f-1d0b-4044-98ba-d981ab217ca4 http.request.method=PUT http.request.remoteaddr="127.0.0.1:39554" http.request.uri="/v2/integration/test/manifests/latest" http.request.useragent="Go-http-client/1.1" instance.id=3d733449-c4a8-48f6-a519-ac9281f3a2f2 openshift.auth.user=admin openshift.auth.userid=5f56ab2b-8bd6-11e7-aed0-0242ac110002 openshift.logger=registry vars.name="integration/test" vars.reference=latest 
    		time="2017-08-28T09:50:50.940623144Z" level=error msg="response completed with error" err.code=unknown err.detail="User \"admin\" cannot get images.image.openshift.io at the cluster scope" err.message="unknown error" go.version=go1.8.3 http.request.host="127.0.0.1:5000" http.request.id=cb93796f-1d0b-4044-98ba-d981ab217ca4 http.request.method=PUT http.request.remoteaddr="127.0.0.1:39554" http.request.uri="/v2/integration/test/manifests/latest" http.request.useragent="Go-http-client/1.1" http.response.contenttype="application/json; charset=utf-8" http.response.duration=51.326717ms http.response.status=500 http.response.written=371 instance.id=3d733449-c4a8-48f6-a519-ac9281f3a2f2 openshift.auth.user=admin openshift.auth.userid=5f56ab2b-8bd6-11e7-aed0-0242ac110002 openshift.logger=registry vars.name="integration/test" vars.reference=latest 
    		127.0.0.1 - - [28/Aug/2017:09:50:50 +0000] "PUT /v2/integration/test/manifests/latest HTTP/1.1" 500 371 "" "Go-http-client/1.1"
    		--- FAIL: TestV2RegistryGetTags (12.95s)
    			v2_docker_registry_test.go:158: unexpected put status code: 500
    			etcd.go:141: dumping etcd to "/tmp/openshift/test-integration/tmp-testv2registrygettags470828217/etcd-dump-TestV2RegistryGetTags-v2.json"

Needs to wait longer

@mfojtik @bparees can someone prioritize this so it gets fixed soon? Just need to wait longer, I think.

@smarterclayton added the kind/test-flake and priority/P1 labels Aug 28, 2017
@smarterclayton
Contributor Author

Or could be a @deads2k bug in startup logic where the roles aren't being fully created I guess.

@smarterclayton
Contributor Author

@deads2k
Contributor

deads2k commented Aug 28, 2017

I see "failed to get image: User \"admin\" cannot get images.image.openshift.io at the cluster scope", but in that test the user admin is a project admin, and I don't see anything trying to add permissions. Are you sure that particular 403 is unexpected?

I don't see anywhere in the logic waiting for access on something.

@mfojtik
Contributor

mfojtik commented Aug 28, 2017

AFAIK only cluster admins can get images... I saw several docker.io flakes today where we failed to fetch stuff from dockerhub, maybe this one is related.

@bparees
Contributor

bparees commented Sep 1, 2017

@smarterclayton @deads2k what was the final assessment here? What I gathered is:

  1. @smarterclayton thinks the rules aren't being set up consistently.
  2. @deads2k thinks nothing is attempting to set up the rules.

if (2) is true, this test would presumably never pass, so I don't buy that unless the 403 isn't the root cause of the 500 error that's being returned.

@deads2k
Contributor

deads2k commented Sep 1, 2017

> @smarterclayton @deads2k what was the final assessment here? What I gathered is:
>
>   1. @smarterclayton thinks the rules aren't being setup consistently.
>   2. @deads2k thinks nothing is attempting to setup the rules
>
> if (2) is true, this test would presumably never pass, so I don't buy that unless the 403 isn't the root cause of the 500 error that's being returned.

Since the server reports healthy (eventually), the bootstrap rules were created

@bparees
Contributor

bparees commented Sep 1, 2017

> Since the server reports healthy (eventually), the bootstrap rules were created

are you saying we're running tests before the server has reported healthy? that seems like a fundamental flaw in our integration test model if true.

@bparees
Contributor

bparees commented Sep 12, 2017

The StartTestMasterAPI helper waits for the server to report healthy before proceeding, so the server was healthy, thus the bootstrap roles were created (And since this is a flake, not a permfail, those roles must be correct at least some of the time).

So this still sounds like a flaw in RBAC to me.

https://github.com/openshift/origin/blob/master/test/integration/v2_docker_registry_test.go#L80
https://github.com/openshift/origin/blob/master/test/util/server/server.go#L526
https://github.com/openshift/origin/blob/master/test/util/server/server.go#L467-L474

@deads2k am i missing something?

@deads2k
Contributor

deads2k commented Sep 13, 2017

@bparees do you know which calls it's responding to? I would expect that in normal operation the docker registry would make SARs or attempt to make calls which would be forbidden. I know that it does when deciding about the provisioning of imagestreams, for instance. Are you sure that the denials aren't happening in "normal" code paths from the docker registry?

@bparees
Contributor

bparees commented Sep 13, 2017

the call is this (get on apis/image.openshift.io/v1/images as an admin user. Looks like perhaps that user is only admin for the project, not the cluster, but again if the permissions are insufficient why wouldn't this always fail?):

    		I0828 09:50:50.940165   29013 wrap.go:42] GET /apis/image.openshift.io/v1/images/sha256:8ab0b9dd51671a70959a95dd92b9a44d4af48dca7ae67f67bd36a11eccba9e07: (2.764532ms) 403 [[integration.test/v1.7.0+695f48a16f (linux/amd64) kubernetes/d2e5420] 127.0.0.1:42572]
    		time="2017-08-28T09:50:50.940408719Z" level=error msg="unable to get image: sha256:8ab0b9dd51671a70959a95dd92b9a44d4af48dca7ae67f67bd36a11eccba9e07" go.version=go1.8.3 http.request.host="127.0.0.1:5000" http.request.id=cb93796f-1d0b-4044-98ba-d981ab217ca4 http.request.method=PUT http.request.remoteaddr="127.0.0.1:39554" http.request.uri="/v2/integration/test/manifests/latest" http.request.useragent="Go-http-client/1.1" instance.id=3d733449-c4a8-48f6-a519-ac9281f3a2f2 openshift.auth.user=admin openshift.auth.userid=5f56ab2b-8bd6-11e7-aed0-0242ac110002 openshift.logger=registry vars.name="integration/test" vars.reference=latest 
    		time="2017-08-28T09:50:50.940623144Z" level=error msg="response completed with error" err.code=unknown err.detail="User \"admin\" cannot get images.image.openshift.io at the cluster scope" err.message="unknown error" go.version=go1.8.3 http.request.host="127.0.0.1:5000" http.request.id=cb93796f-1d0b-4044-98ba-d981ab217ca4 http.request.method=PUT http.request.remoteaddr="127.0.0.1:39554" http.request.uri="/v2/integration/test/manifests/latest" http.request.useragent="Go-http-client/1.1" http.response.contenttype="application/json; charset=utf-8" http.response.duration=51.326717ms http.response.status=500 http.response.written=371 instance.id=3d733449-c4a8-48f6-a519-ac9281f3a2f2 openshift.auth.user=admin openshift.auth.userid=5f56ab2b-8bd6-11e7-aed0-0242ac110002 openshift.logger=registry vars.name="integration/test" vars.reference=latest 

@bparees
Contributor

bparees commented Sep 15, 2017

@deads2k bump.

@deads2k
Contributor

deads2k commented Sep 15, 2017

@bparees have you managed to figure out where in the docker registry you're failing? What it's trying to do? This could very easily be a case of, "I'm trying to access A which failed so I try to access B".

It could also be that your test needs to wait until the project access rules have synchronized (which I noted that it isn't doing back here: #16012 (comment)).

@bparees bparees assigned legionus and unassigned bparees Sep 15, 2017
@deads2k
Contributor

deads2k commented Sep 15, 2017

I turned on the audit trail and ran a successful run. The only audit log entry says that on successes, only system:admin is requesting images.

2017-09-15T12:22:10.416052742-04:00 AUDIT: id="328c7d31-9258-4c08-a1b6-cf244d044c3b" ip="127.0.0.1" method="GET" user="system:admin" groups="\"system:cluster-admins\",\"system:authenticated\"" as="<self>" asgroups="<lookup>" namespace="<none>" uri="/apis/image.openshift.io/v1/images/sha256:6f41df09ec020ee02fc178dda86d1ee9a54ee5608b083a9a655840a30292c810"

Either you're hitting a different path on failures or you're accidentally changing users. You need to figure out where the call is coming from.

@bparees
Contributor

bparees commented Sep 15, 2017

Thank you @deads2k

@aveshagarwal
Contributor

8 participants