Retry build instantiation on conflict in rest api #13910
@csrwng @bparees at the moment I'm just putting this out here for discussion ;) Is retry-N-times a preferred strategy to retry-forever? Separately, actually there are quite a few calls to Instantiate() in our codebase. Some specifically catch a Conflict response and don't retry (um?). Others don't catch it. I'm wondering whether a better plan is to refactor Instantiate() to retry itself (according to the retry strategy you prefer) and clear up the conflict paths where they exist in callers? @openshift/devex |
retry forever generically scares me, but I admittedly don't have the requisite experience with the build instantiate/create path to say it does not fit into some special case category. I like the latter idea, assuming the implementation cost/risks are tolerable. my 2 cents in any event |
I vote for retry-N-times
Seems reasonable |
Near as i can tell, the only place we update the buildconfig is within instantiate() and clone(). So again i'm not clear how we're hitting this failure unless someone else is also starting a build at the same time and we're in a race. anyway I vote for N retries and moving the retry logic into the instantiate/clone methods, but i'm still concerned we don't really understand why the conflict is happening. |
@bparees hmm, then could it be the jenkins plugin that's updating the buildconfig under our feet? I still need to go through the logs in more detail. |
@jim-minter w/o looking at the test/pipeline definition etc myself, all i can say is "anything's possible" :) |
It could be the plugin. After initiating the build request, once we have the build object, we annotate the build with the job url. See https://github.com/openshift/jenkins-plugin/blob/master/src/main/java/com/openshift/jenkins/plugins/pipeline/model/IOpenShiftBuilder.java#L251-L258 When we implemented that job annotation piece, I had to add some retry there because I was getting update conflicts. See https://github.com/openshift/jenkins-plugin/blob/master/src/main/java/com/openshift/jenkins/plugins/pipeline/model/IOpenShiftPlugin.java#L612-L614 Bottom line ... I think this could be viewed as a valid explanation for the concurrent access. |
Also, that blue-green job has some multithreaded-ness, including accessing the build with background threads .... i would hope that etcd manages the RO vs RW type of locking as one would expect, but I only have a high level understanding of etcd |
we're getting the conflict on the update to the buildconfig, not the build, though. |
On Thu, Apr 27, 2017 at 2:39 PM, Ben Parees ***@***.***> wrote:
It could be the plugin. After initiating the build request, once we have
the build object, we annotate the build with the job url
we're getting the conflict on the update to the buildconfig, not the
build, though.
Ah ... then yeah, after my review I don't see the jenkins plugin contributing
to the race/conflict
|
The source of the conflict is definitely Java (okhttp user agent):
I think that the issue is probably PipelineJobListener.java in openshift-sync-plugin firing in response to the pipeline being created initially and sending a needless buildconfig update back to OpenShift. @gabemontero would you be able to help me fix this? Regardless, I think the origin server code should be hardened up: will update this PR to do so. |
[test] |
> @gabemontero would you be able to help me fix this?
I should be able to look into the sync plugin item soon @jim-minter |
I think Clone() should do the same thing.
[testextended][extended:core(builds)] |
flake #10773 |
Evaluated for origin test up to 84906c4 |
Evaluated for origin testextended up to 84906c4 |
continuous-integration/openshift-jenkins/test SUCCESS (https://ci.openshift.redhat.com/jenkins/job/test_pull_request_origin/1315/) (Base Commit: f714687) |
continuous-integration/openshift-jenkins/testextended SUCCESS (https://ci.openshift.redhat.com/jenkins/job/test_pull_request_origin_extended/364/) (Base Commit: f714687) (Extended Tests: core(builds)) |
@jim-minter i'm ok w/ merging this if you're happy, just take the WIP off the description. |
@bparees done - please merge |
[merge] |
something appears to have hung in the networking tests |
Evaluated for origin merge up to 84906c4 |
continuous-integration/openshift-jenkins/merge SUCCESS (https://ci.openshift.redhat.com/jenkins/job/merge_pull_request_origin/610/) (Base Commit: 012de50) (Image: devenv-rhel7_6225) |
fixes #13694
fixes #13220