Bug 1837564: pkg/terraform: add diagnostics errors for terraform apply operations #3535

Merged
merged 2 commits into from
May 20, 2020

Conversation

abhinavdahiya
Contributor

Terraform errors are tracked in a buffer. This buffer is then matched against various
known conditions to determine the reasons for the errors.

This allows terraform apply to return specific errors in these cases instead of the previous constant
string message, `failed to apply Terraform`.
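
As a rough illustration of the approach described above, here is a minimal sketch of matching captured output against a list of known conditions. The regular expression and the `AzureMultiOperationFailure` reason are taken from the review snippet in this PR; the `knownCondition` type, the `findReason` helper, and the sample output string are assumptions made up for this example and may not match the merged code.

```go
package main

import (
	"fmt"
	"regexp"
)

// knownCondition pairs a pattern with the reason reported when it matches.
// The type and field names are hypothetical.
type knownCondition struct {
	match  *regexp.Regexp
	reason string
}

// knownConditions holds a single entry copied from the review snippet.
var knownConditions = []knownCondition{{
	match:  regexp.MustCompile(`Error: Error Creating/Updating Subnet .*: network.SubnetsClient#CreateOrUpdate: .* Code="AnotherOperationInProgress" Message="Another operation on this or dependent resource is in progress`),
	reason: "AzureMultiOperationFailure",
}}

// findReason returns the reason of the first condition that matches the
// buffered output, or false when nothing matches.
func findReason(output string) (string, bool) {
	for _, c := range knownConditions {
		if c.match.MatchString(output) {
			return c.reason, true
		}
	}
	return "", false
}

func main() {
	// Made-up output shaped to satisfy the pattern; not a real log line.
	sample := `Error: Error Creating/Updating Subnet "demo": network.SubnetsClient#CreateOrUpdate: Failure sending request: Code="AnotherOperationInProgress" Message="Another operation on this or dependent resource is in progress"`
	if reason, ok := findReason(sample); ok {
		fmt.Println("diagnosed:", reason)
	} else {
		fmt.Println("failed to apply Terraform")
	}
}
```

When no condition matches, a caller could fall back to the generic `failed to apply Terraform` message, as the sketch does.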

/cc @openshift/openshift-team-installer

@openshift-ci-robot openshift-ci-robot requested a review from a team May 3, 2020 19:15
@abhinavdahiya
Contributor Author

/test e2e-gcp
/test e2e-azure

@abhinavdahiya
Contributor Author

/test all

@abhinavdahiya abhinavdahiya force-pushed the tf_diagnose branch 2 times, most recently from a27c07b to 4043e43 Compare May 4, 2020 16:54
@jhixson74
Member

/approve

@abhinavdahiya
Contributor Author

/test e2e-azure

@abhinavdahiya
Contributor Author

/test e2e-gcp

@jhixson74
Member

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label May 5, 2020
Contributor

@patrickdillon patrickdillon left a comment

Looks good to me too. I would be interested to see some actual log errors.

@rna-afk PTAL re metric relevance

Review comment on pkg/terraform/diagnose.go (outdated, resolved)
@openshift-ci-robot openshift-ci-robot removed the lgtm Indicates that a PR is ready to be merged. label May 7, 2020
@abhinavdahiya
Contributor Author

/test e2e-azure
/test e2e-gcp

@abhinavdahiya
Contributor Author

abhinavdahiya commented May 7, 2020

@patrickdillon here is an example of the error logged by the installer https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/pr-logs/pull/openshift_installer/3535/pull-ci-openshift-installer-master-e2e-azure/541#1:build-log.txt%3A108

level=fatal msg="failed to fetch Cluster: failed to generate asset \"Cluster\": failed to create cluster: failed to apply Terraform: error(AzureVirtualMachineFailure) from \"InfrastructureProvider\": Some virtual machines failed to provision in alloted time. Virtual machines can fail to provision if the bootstap virtual machine has failing services."

@abhinavdahiya
Contributor Author

/retest

}, {
match: regexp.MustCompile(`Error: Error Creating/Updating Subnet .*: network.SubnetsClient#CreateOrUpdate: .* Code="AnotherOperationInProgress" Message="Another operation on this or dependent resource is in progress`),

reason: "AzureMultiOperationFailure",
Member

Can we structure this so Azure conditions can be defined in a pkg/terraform/azure subpackage, so per-platform sub-teams can be assigned as owners for their platform's conditions? There might be some generic conditions that apply to all platforms, but I'd expect most of the time, you'd want a whole bunch of platform-specific conditions from the platform the installer is using, plus maybe a handful of generic conditions.

Contributor Author

For now I'm keeping this list in the same file; we can split it into platforms in future work.

func (e *Err) Error() string {
buf := &bytes.Buffer{}
if len(e.Source) > 0 {
fmt.Fprintf(buf, "error(%s) from %q", e.Reason, e.Source)
Member

@wking wking May 14, 2020

Do we need %q for Source, or can we use %s (here and in your later Error from %q)? Both of those strings are compiled into the installer right? I'd expect to use %q only if we were passing along some information slurped from the provider, where we weren't sure if there was going to be whitespace or other potentially confusing characters.

Contributor Author

fixed.
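
For readers unfamiliar with the two verbs discussed in this thread, the small sketch below shows how `%q` and `%s` differ for a plain identifier-like string. The helper names `formatQuoted` and `formatPlain` are made up for this example; the reason and source values are the ones from the log line quoted earlier in the conversation.

```go
package main

import "fmt"

// formatQuoted uses %q, which wraps the source in double quotes and
// escapes special characters.
func formatQuoted(reason, source string) string {
	return fmt.Sprintf("error(%s) from %q", reason, source)
}

// formatPlain uses %s, which prints the source verbatim.
func formatPlain(reason, source string) string {
	return fmt.Sprintf("error(%s) from %s", reason, source)
}

func main() {
	fmt.Println(formatQuoted("AzureVirtualMachineFailure", "InfrastructureProvider"))
	// prints: error(AzureVirtualMachineFailure) from "InfrastructureProvider"
	fmt.Println(formatPlain("AzureVirtualMachineFailure", "InfrastructureProvider"))
	// prints: error(AzureVirtualMachineFailure) from InfrastructureProvider
}
```

Because the logger quotes the whole message, the `%q` form shows up with escaped quotes (`\"InfrastructureProvider\"`) in the log line above, which is part of why `%s` reads more cleanly for strings compiled into the installer.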

@abhinavdahiya abhinavdahiya changed the title from "pkg/terraform: add diagnostics errors for terraform apply operations" to "Bug 1837564: pkg/terraform: add diagnostics errors for terraform apply operations" May 19, 2020
@openshift-ci-robot openshift-ci-robot added bugzilla/severity-medium Referenced Bugzilla bug's severity is medium for the branch this PR is targeting. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. labels May 19, 2020
@openshift-ci-robot
Contributor

@abhinavdahiya: This pull request references Bugzilla bug 1837564, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target release (4.5.0) matches configured target release for branch (4.5.0)
  • bug is in the state NEW, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)

In response to this:

Bug 1837564: pkg/terraform: add diagnostics errors for terraform apply operations

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

The diagnostics error provides important context for better error reporting to users.
The error allows installer assets etc. to provide structured information:
- Source: the source of the error. Installer assets can surface errors from cloud providers or internal errors; the source
provides that context to better categorize these errors.
- Reason: a single-word reason that correctly summarizes the type of error, so users can quickly understand it.
It should also allow internal metrics to track these error types.

Terraform errors are tracked in a buffer. This buffer is then matched against various
known conditions to determine the reasons for the errors.

This allows terraform apply to return specific errors in these cases instead of the previous constant
string message, `failed to apply Terraform`.
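
The commit message above describes an error carrying a Source and a Reason. A minimal sketch of such a type follows, assuming the `Err` name and the opening of `Error()` from the review snippet. The `Message` field, the branch for an empty source, and the exact output format are assumptions for illustration, not the merged implementation.

```go
package main

import (
	"bytes"
	"fmt"
)

// Err is a sketch of a diagnostics error. Source and Reason come from the
// commit message; Message is a hypothetical field added for this example.
type Err struct {
	Source  string
	Reason  string
	Message string
}

// Error renders the reason, the optional source, and the optional message.
func (e *Err) Error() string {
	buf := &bytes.Buffer{}
	if len(e.Source) > 0 {
		fmt.Fprintf(buf, "error(%s) from %s", e.Reason, e.Source)
	} else {
		fmt.Fprintf(buf, "error(%s)", e.Reason)
	}
	if len(e.Message) > 0 {
		fmt.Fprintf(buf, ": %s", e.Message)
	}
	return buf.String()
}

func main() {
	var err error = &Err{
		Source:  "InfrastructureProvider",
		Reason:  "AzureVirtualMachineFailure",
		Message: "some virtual machines failed to provision",
	}
	fmt.Println(err)
}
```

Keeping Reason as a single word makes it usable as a metrics label, which matches the stated goal of tracking error types.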
@abhinavdahiya
Contributor Author

/retest

/test e2e-azure
/test e2e-gcp

@abhinavdahiya
Contributor Author

/retest

@jhixson74
Member

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label May 20, 2020
@jhixson74
Member

/approve

@abhinavdahiya
Contributor Author

/approve

@openshift-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: abhinavdahiya, jhixson74

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 20, 2020
@openshift-bot
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-merge-robot openshift-merge-robot merged commit 254680f into openshift:master May 20, 2020
@openshift-ci-robot
Contributor

@abhinavdahiya: All pull requests linked via external trackers have merged: openshift/installer#3535. Bugzilla bug 1837564 has been moved to the MODIFIED state.


Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. bugzilla/severity-medium Referenced Bugzilla bug's severity is medium for the branch this PR is targeting. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. lgtm Indicates that a PR is ready to be merged.

7 participants