Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Retry build instantiation on conflict in rest api #13910

Merged
merged 1 commit into from
May 11, 2017

Conversation

jim-minter
Copy link
Contributor

fixes #13694
fixes #13220

@jim-minter
Copy link
Contributor Author

@csrwng @bparees at the moment I'm just putting this out here for discussion ;)

Is retry-N-times a preferred strategy to retry-forever?

Separately, actually there are quite a few calls to Instantiate() in our codebase. Some specifically catch a Conflict response and don't retry (um?). Others don't catch it.

I'm wondering whether a better plan is to refactor Instantiate() to retry itself (according to the retry strategy you prefer) and clear up the conflict paths where they exist in callers?

@openshift/devex

@gabemontero
Copy link
Contributor

retry forever generically scares me, but I admittedly don't have the requisite experience with the build instantiate/create path to say it does not fit into some special case category.

I like the latter idea, assuming the implementation cost/risks are tolerable.

my 2 cents in any event

@csrwng
Copy link
Contributor

csrwng commented Apr 27, 2017

Is retry-N-times a preferred strategy to retry-forever?

I vote for retry-N-times

I'm wondering whether a better plan is to refactor Instantiate() to retry itself (according to the retry strategy you prefer) and clear up the conflict paths where they exist in callers?

Seems reasonable

@bparees
Copy link
Contributor

bparees commented Apr 27, 2017

Near as i can tell, the only place we update the buildconfig is within instantiate() and clone(). So again i'm not clear how we're hitting this failure unless someone else is also starting a build at the same time and we're in a race.

anyway I vote for N retries and moving the retry logic into the instantiate/clone methods, but i'm still concerned we don't really understand why the conflict is happening.

@openshift-bot openshift-bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 27, 2017
@jim-minter
Copy link
Contributor Author

@bparees hmm, then could it be the jenkins plugin that's updating the buildconfig under our feet? I still need to go through the logs in more detail.

@bparees
Copy link
Contributor

bparees commented Apr 27, 2017

@jim-minter w/o looking at the test/pipeline definition etc myself, all i can say is "anything's possible" :)

@gabemontero
Copy link
Contributor

gabemontero commented Apr 27, 2017

It could be the plugin. After initiating the build request, once we have the build object, we annotate the build with the job url. See https://github.com/openshift/jenkins-plugin/blob/master/src/main/java/com/openshift/jenkins/plugins/pipeline/model/IOpenShiftBuilder.java#L251-L258

When we implemented that job annotation pieced, I had to add some retry there because I was getting update conflicts. See https://github.com/openshift/jenkins-plugin/blob/master/src/main/java/com/openshift/jenkins/plugins/pipeline/model/IOpenShiftPlugin.java#L612-L614

Bottom line ... I think this could be viewed as a valid explanation for the concurrent access.

@gabemontero
Copy link
Contributor

Also, that blue-green job has some multithreaded-ness, including accessing the build with background threads .... i would hope that etcd manages the RO vs RW type of locking as one would expect, but I only have a high level understanding of etcd

@bparees
Copy link
Contributor

bparees commented Apr 27, 2017

It could be the plugin. After initiating the build request, once we have the build object, we annotate the build with the job url

we're getting the conflict on the update to the buildconfig, not the build, though.

@gabemontero
Copy link
Contributor

gabemontero commented Apr 27, 2017 via email

@jim-minter
Copy link
Contributor Author

The source of the conflict is definitely Java (okhttp user agent):

I0426 12:55:04.100810    2597 panics.go:76] GET /oapi/v1/namespaces/extended-test-jenkins-pipeline-1n0dc-8cr85/buildconfigs/bluegreen-pipeline: (5.846ms) 200 [[okhttp/3.4.2] 172.17.0.4:58212]
I0426 12:55:04.124480    2597 panics.go:76] GET /oapi/v1/namespaces/extended-test-jenkins-pipeline-1n0dc-8cr85/buildconfigs/bluegreen-pipeline: (7.214965ms) 200 [[okhttp/3.4.2] 172.17.0.4:58212]
I0426 12:55:04.282048    2597 panics.go:76] GET /oapi/v1/namespaces/extended-test-jenkins-pipeline-1n0dc-8cr85/buildconfigs/bluegreen-pipeline: (3.17854ms) 200 [[openshift/v3.6.0 (linux/amd64) openshift/b4cdb81] 172.18.15.5:34530]
I0426 12:55:04.320678    2597 panics.go:76] PUT /oapi/v1/namespaces/extended-test-jenkins-pipeline-1n0dc-8cr85/buildconfigs/bluegreen-pipeline: (7.782331ms) 200 [[okhttp/3.4.2] 172.17.0.4:58212]
I0426 12:55:04.339865    2597 panics.go:76] POST /oapi/v1/namespaces/extended-test-jenkins-pipeline-1n0dc-8cr85/buildconfigs/bluegreen-pipeline/instantiate: (64.612288ms) 409 [[oc/v3.6.0 (linux/amd64) openshift/b4cdb81] 172.18.15.5:47686]
I0426 12:55:04.919017    2597 panics.go:76] DELETE /oapi/v1/namespaces/extended-test-jenkins-pipeline-1n0dc-8cr85/buildconfigs/bluegreen-pipeline: (12.213644ms) 200 [[openshift/v3.6.0 (linux/amd64) openshift/b4cdb81] 172.18.15.5:34530]

I think that the issue is probably PipelineJobListener.java in openshift-sync-plugin firing in response to the pipeline being created initially and sending a needless buildconfig update back to OpenShift. @gabemontero would you be able to help me fix this?

Regardless, I think the origin server code should be hardened up: will update this PR to do so.

@jim-minter
Copy link
Contributor Author

[test]

@gabemontero
Copy link
Contributor

| @gabemontero would you be able to help me fix this?

I should be able to look into the sync plugin item soon @jim-minter

@jim-minter
Copy link
Contributor Author

flake #14109
flake #14110
[test]
@bparees, @csrwng ptal

Copy link
Contributor

@bparees bparees left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think Clone() should do the same thing.

@bparees
Copy link
Contributor

bparees commented May 10, 2017

[testextended][extended:core(builds)]

@bparees
Copy link
Contributor

bparees commented May 10, 2017

flake #10773
[test]

@openshift-bot
Copy link
Contributor

Evaluated for origin test up to 84906c4

@openshift-bot
Copy link
Contributor

Evaluated for origin testextended up to 84906c4

@openshift-bot
Copy link
Contributor

continuous-integration/openshift-jenkins/test SUCCESS (https://ci.openshift.redhat.com/jenkins/job/test_pull_request_origin/1315/) (Base Commit: f714687)

@openshift-bot
Copy link
Contributor

continuous-integration/openshift-jenkins/testextended SUCCESS (https://ci.openshift.redhat.com/jenkins/job/test_pull_request_origin_extended/364/) (Base Commit: f714687) (Extended Tests: core(builds))

@bparees
Copy link
Contributor

bparees commented May 10, 2017

@jim-minter i'm ok w/ merging this if you're happy, just take the WIP off the description.

@jim-minter jim-minter changed the title WIP retry build instantiation on conflict in rest api Retry build instantiation on conflict in rest api May 11, 2017
@jim-minter
Copy link
Contributor Author

@bparees done - please merge

@bparees
Copy link
Contributor

bparees commented May 11, 2017

[merge]

@bparees
Copy link
Contributor

bparees commented May 11, 2017

something appears to have hung in the networking tests
[merge]

@openshift-bot
Copy link
Contributor

Evaluated for origin merge up to 84906c4

@openshift-bot
Copy link
Contributor

openshift-bot commented May 11, 2017

continuous-integration/openshift-jenkins/merge SUCCESS (https://ci.openshift.redhat.com/jenkins/job/merge_pull_request_origin/610/) (Base Commit: 012de50) (Image: devenv-rhel7_6225)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
5 participants