Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(Orca): Feature: option to add a delay before polling starts in Webhook stage #3450 #2974

Merged
merged 5 commits into from
Jun 11, 2019
Merged

Conversation

neelimamanubolu
Copy link
Contributor

Hi,
As suggested added 404 status for the webhook stage. Please find the below link for refernce.

spinnaker/spinnaker#3450

@spinnakerbot
Copy link
Contributor

The following commits need their title changed:

  • d42bfd3: Merge pull request #1 from spinnaker/master

  • 4d5e37b: Merge pull request #2 from spinnaker/master

Please format your commit title into the form:

<type>(<scope>): <subject>, e.g. fix(kubernetes): address NPE in status check

This allows us to easily generate changelogs & determine semantic version numbers when cutting releases. You can read more about commit conventions here.

@asher asher merged commit 2c887f2 into spinnaker:master Jun 11, 2019
@@ -101,7 +101,7 @@ class CreateWebhookTask implements RetryableTask {
return TaskResult.builder(ExecutionStatus.TERMINAL).context(outputs).build()
}

if (statusCode.is5xxServerError() || statusCode.value() == 429) {
if (statusCode.is5xxServerError() || statusCode.value() == 429 || statusCode.value() == 404) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe the issue is talking about the monitoring version of the webhook task:
https://github.com/spinnaker/orca/blob/master/orca-webhook/src/main/groovy/com/netflix/spinnaker/orca/webhook/tasks/MonitorWebhookTask.groovy#L92

I am a little concerned about blindly retrying on 404 - I think in more cases than not 404 is an irrecoverable error rather than retryable.

Adding this to the monitor side is a little more palatable but I'd rather we don't add it here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Mark Vulfson ,
Kindly please go through @eric Zimanyi comments for the issue " Adding two parameters in the UI to create delay before polling starts in Webhook stage spinnaker/spinnaker#4470 ".
Original issue is "Feature: option to add a delay before polling starts in Webhook stage spinnaker/spinnaker#3450".
We proposed solution that we will go in the form of issue #4470, Eric suggested to go with two parts.. 1. add 404 status and 2.add delay.
We done 404 status as the first attempt and trying to add delay in the next step.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@neelimamanubolu I see what @ezimanyi is saying. But note that as I mentioned earlier, the changes you made do not implement his suggestion. His idea is to retry on 404 in the MonitorWebhookTask not WebhookTask. I would be more OK with retrying on 404 on the MonitorTask (though I still think, and @ezimanyi notes in his comment as well, this is a breaking change.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In MonitorWebhookTask, changes are not reflecting, @ezimanyi gave suggestion to add like 5xx and 429, but not exactly in the same class. So we created in WebhookTask , tested and fially made PR.

Copy link
Contributor

@marchello2000 marchello2000 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see noted in the comment

@marchello2000
Copy link
Contributor

Also, the title is misleading. option to add a delay before polling starts in Webhook stage - this change doesn't do that

@marchello2000
Copy link
Contributor

marchello2000 commented Jun 11, 2019

@neelimamanubolu : (chatted with @asher) I am going to revert this PR for now, please see comments and put up another PR, I am happy to review it then!

marchello2000 added a commit to marchello2000/orca that referenced this pull request Jun 11, 2019
marchello2000 added a commit that referenced this pull request Jun 11, 2019
@neelimamanubolu
Copy link
Contributor Author

neelimamanubolu commented Jun 12, 2019

Also, the title is misleading. option to add a delay before polling starts in Webhook stage - this change doesn't do that

As posted earlier this title is the suggested solution for the issue #3450. For your refrence pls go through the comments from @eric in the "Adding two parameters in the UI to create delay before polling starts in Webhook stage #4470 ".

Still if u find the commit doesnt suit then we can discuss more brief about this.

@marchello2000
Copy link
Contributor

Also, the title is misleading. option to add a delay before polling starts in Webhook stage - this change doesn't do that

As posted earlier this title is the suggested solution for the issue #3450. For your refrence pls go through the comments from @eric in the "Adding two parameters in the UI to create delay before polling starts in Webhook stage #4470 ".

Still if u find the commit doesnt suit then we can discuss more brief about this.

I understand that, but think about someone going through the commits 6 months from now trying to understand why something happened. The person will read the title "add a delay before starting in webhook stage" but when they look at the code it has nothing to do with a delay. All it does is retry on 404s.
It's fine to add all those details to the body of the commit message, but the title should be a short description of what the change does

@neelimamanubolu
Copy link
Contributor Author

Also, the title is misleading. option to add a delay before polling starts in Webhook stage - this change doesn't do that

As posted earlier this title is the suggested solution for the issue #3450. For your refrence pls go through the comments from @eric in the "Adding two parameters in the UI to create delay before polling starts in Webhook stage #4470 ".
Still if u find the commit doesnt suit then we can discuss more brief about this.

I understand that, but think about someone going through the commits 6 months from now trying to understand why something happened. The person will read the title "add a delay before starting in webhook stage" but when they look at the code it has nothing to do with a delay. All it does is retry on 404s.
It's fine to add all those details to the body of the commit message, but the title should be a short description of what the change does

Ok..Mark, as requested will raise another PR for the other issue "Adding two parameters in the UI to create delay before polling starts in Webhook stage #4470 ".

bpowell pushed a commit to armory-io/orca that referenced this pull request Jun 13, 2019
creating a new ObjectMapper instance for each Stage or other per-request
contexts can be wasteful; ObjectMapper is safe to reuse across threads
(after it is configured). See
spinnaker/spinnaker#4367 for more details.

fix(sql): optimize reading executions by pipeline id (spinnaker#2949)

fix(sql): add precondition check to pipeline_config_id_idx migration (spinnaker#2953)

chore(*): Bump kork (spinnaker#2954)

fix(pipelines): save pipelines task shouldn’t reuse pipeline id (spinnaker#2951)

When save pipelines task is used multiple times in a stage, it now reuses the pipeline ID. This is problematic because if new pipelines are created they will reuse a previous pipeline ID and overwrite it rather than creating a new pipeline. I think the real problem here is the use of “pipeline.id” in the stage context for multiple purposes (both of which make sense independently).

1) SaveServiceAccountTask generates a new pipeline.id if one doesn’t exist for use with regenerateCronTriggerIds. It effectively changes the contract from a new pipeline not having an id, to it being generated beforehand.

2) Later in this class, after a pipeline is updated or created, its id is stored in “pipeline.id” to aid with refresh logic.

But the two of these together keeps the context “pipeline.id” from the last saved pipeline, and reuses it in the case that the pipeline.id is missing (when it should be created).
It may be that these two usages should have different names in the stage context. In that case this loop wouldn’t happen. Since SavePipelinesFromArtifactStage is the only stage where multiple pipelines are saved, and it does not currently use managed service accounts, it seems easier to just avoid the new regenerateCronTriggers logic when isSavingMultiplePipelines is true. Changing the model can have unintended consequences.

fix(kubernetes): force cache refresh after deploy stage artifact cleanup

fix issue 4469: orca-core skipping most tests in build (spinnaker#2957)

* fix(tests): rename fields in LockContextSpec

works around the java.lang.VerifyError mentioned in
spinnaker/spinnaker#4469

* fix(ExecutionSpec): fix test failure due to slf4j-simple

With LockContextSpec fixed in the previous commit to no longer break the
junit-vintage test runner, ExecutionSpec was also broken in commit
a1de32d due to the change from using log4j's
MDC to slf4j's MDC class.

orca-core was setting `slf4j-simple` as `testRuntimeScope` - and
`slf4j-simple` has a no-op MDC implementation which quietly discards any
values set via `MDC.put()`.

To fix this, allow the MDCAdapter provided by `slf4j-api` to be used
instead by removing the dependency on slf4j-simple, and add an assertion
to the test class to ensure that any MDCAdapter instance besides the
no-op adapter is in use.

chore(dependencies): Autobump korkVersion (spinnaker#2959)

perf(artifacts): use pageSize=1 when resolving prior artifacts (spinnaker#2955)

This is a smaller-scoped attempt at
spinnaker#2938, which relates to
spinnaker/spinnaker#4367.

`ArtifactResolver.resolveArtifacts(Map)` currently attempts to load
_all_ executions for a pipeline in order to find the most recent
execution in the course of resolving expected artifacts. This causes a
lot of unnecessary data to be loaded from Redis/SQL, only to be
discarded a few operations later.

This change makes use of the fact that the ExecutionRepository
implementations will respect the `executionCriteria.pageSize` argument
when retrieving Execitions for a pipelineId.

In the Redis-based implementation, the executions are stored in a sorted
set scored on `buildTime` (or `currentTimeMillis()` if `buildTime` is
null), so retrieving all of the executions for the pipelineId with a
pageSize=1 should load the Execution with the most recent `buildTime`.

In the SQL-based implementation,
`retrievePipelinesForPipelineConfigId(String, ExecutionCriteria)` sorts
the query results based on the `id` field.

For both implementations, this is a small change from the existing
behavior of ArtifactResolver, which retrieves all executions and then
uses the one with the most recent `startTime` (or `id`). This change
seems like it will lead to the same result in most cases though, since
buildTime is set when the Execution is stored, and the `id` field is
ascending based on the timestamp of when it is generated.

The `retrievePipelinesForPipelienConfigId` method in both
implementations currently ignores the `executionCriteria.sortType`
field, but I've added this in the call from ArtifactResolver to at
least document ArtifactResolver's desire.

fix(zombie-check): make zombie check run regardless of instance status (spinnaker#2962)

Remove check for all activators being up before running a zombie check.
It doesn't really buy us anything other than confusion.
E.g. if there is a dedicated instance that runs zombie checks but has
queue disabled the zombie check will not run.
Even if down instances run a zombie check that should be ok since their
view of the queue should still be accurate

chore(dependencies): Autobump korkVersion (spinnaker#2964)

chore(dependencies): kork 5.4.8, fiat 1.1.0 (spinnaker#2969)

chore(dependencies): Autobump korkVersion (spinnaker#2970)

fix(RRB): Add dedicated pin stage (spinnaker#2968)

For RRB we start by pinning the source SG so that it doesn't scale down during deploy (and end with unpinning it).
However, if for whatever reason some of the instances are taken out of discovery/targetgroup we will timeout
waiting for capacity match on this operation.
Since this pin/unpin operations are purely "cautionary" and don't require us to actually match the capacity,
don't wait for capacity match so we don't fail for this non-critical reason

fix(findImage) : Fix child deploy to use correct find image when clou… (spinnaker#2960)

* fix(findImage) : Fix child deploy to use correct find image when cloud providers differ

* fixup

* adding tests

fix(spel): Fix NPE in SpEL evaluation

Evaluating SpEL on a null object throws an NPE; if the input object
is null, just return null as the evaluated expression.

fix(gcb): Bind artifacts produced from GCB stage

Because we didn't include the BindProducedArtifacts task in the
GCB stage, produced artifacts were not matched to expected artifacts,
which prevented correctly using them downstream.

fix(Orca): Feature: option to add a delay before polling starts in Webhook stage spinnaker#3450 (spinnaker#2974)

feat(pipeline/expressions): support yaml parsing (spinnaker#2946)

This patch adds two helper functions `readYaml` and
`yamlFromUrl` to transfrom yaml strings (either
embedded or from retrieved from an URL) to their
object representation.

DNMY: missing tests

Revert "fix(Orca): Feature: option to add a delay before polling starts in Webhook stage spinnaker#3450 (spinnaker#2974)" (spinnaker#2976)

This reverts commit 2c887f2.

fix(core): Mark mptv2 items as inherited (spinnaker#2971)

* fix(core): Mark mptv2 items as inherited

spinnaker/spinnaker#4451

* fix(core): Refactor how items are marked inherited

spinnaker/spinnaker#4451

fix(findImage) : Fix findImage bug introduced in spinnaker#2960 (spinnaker#2980)

fix(fromUrl): Better error message for redirects (spinnaker#2978)

When using SpEL functions like `*FromUrl` when the request requires auth and causes
a redirect response, we get a confusing "Not retrying" error message.
Make it a more descriptive error message that indicated where we are redirecting to (e.g. `login.asp`)
and state that it's not supported

fix(runJob): cancel underlying job on cancel stage (spinnaker#2966)

fix(plugins): added dependency
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants