Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

operators: ignore node deletion errors on absence #3113

Merged
merged 2 commits into from
May 22, 2024

Conversation

burgerdev
Copy link
Contributor

@burgerdev burgerdev commented May 21, 2024

Context

If a node deletion operation fails transiently after the instance was deleted at the CSP, we need to ensure that the retry succeeds, even if the instance we attempt to delete is already gone at the CSP.

Example operator logs: https://gist.github.com/burgerdev/c80de86ae44552a383364ef031243a41

Proposed change(s)

  • AWS: Ignore the specific error message for missing images.
  • Azure: nothing - deleting absent instance IDs is no error here.
  • GCP: turn on SkipInstancesOnValidationError, which is designed for this use case.

Additional info

Checklist

@burgerdev burgerdev added the bug fix Fixing a bug label May 21, 2024
@burgerdev burgerdev added this to the v2.17.0 milestone May 21, 2024
@burgerdev burgerdev requested a review from malt3 as a code owner May 21, 2024 12:51
Copy link

netlify bot commented May 21, 2024

Deploy Preview for constellation-docs canceled.

Name Link
🔨 Latest commit 5949a5d
🔍 Latest deploy log https://app.netlify.com/sites/constellation-docs/deploys/664cc07ce4e2070008434d91

Comment on lines 222 to 227
func isInstanceNotFoundError(err error) bool {
if err == nil {
return false
}
return strings.Contains(err.Error(), "Instance Id not found")
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
func isInstanceNotFoundError(err error) bool {
if err == nil {
return false
}
return strings.Contains(err.Error(), "Instance Id not found")
}
func isInstanceNotFoundError(err error) bool {
var respErr *awshttp.ResponseError // using "awshttp" for package github.com/aws/aws-sdk-go-v2/aws/transport/http
return errors.As(err, &respErr) && respErr.HTTPStatusCode() == http.StatusNotFound
}

From https://github.com/aws/aws-sdk-go-v2/blob/main/CHANGELOG.md#error-handling
Did not tests this, but if it works it would be preferable over string comparisons.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would be preferable indeed, but AWS does not return 404s in that case, but 400s. I meant to also link to the logs but forgot - fixed that now (see above). I would not want to infer nonexistence from a 400, because that might hide other failure conditions, so I'm back at string comparison :/

<ErrorResponse xmlns="http://autoscaling.amazonaws.com/doc/2011-01-01/">
  <Error>
    <Type>Sender</Type>
    <Code>ValidationError</Code>
    <Message>Instance Id not found - No managed instance found for instance ID: i-061c63c5eb45f0416</Message>
  </Error>
  <RequestId>702f44ea-6d55-480b-8dfa-3da9e62f1634</RequestId>
</ErrorResponse>

Copy link
Contributor

Coverage report

Package Old New Trend
operators/constellation-node-operator/internal/cloud/aws/client 34.40% 34.60% ↗️
operators/constellation-node-operator/internal/cloud/gcp/client 43.80% 43.90% ↗️

@burgerdev burgerdev merged commit 902b7f4 into main May 22, 2024
22 of 24 checks passed
@burgerdev burgerdev deleted the burgerdev/aws-node-lifecycle branch May 22, 2024 13:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug fix Fixing a bug
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants