[BUG] Using a wrong cidr_v4 does not get validated #263

Closed
fernando-villalba opened this issue Jul 17, 2024 · 5 comments · Fixed by #275
Labels
bug Something isn't working

Comments

@fernando-villalba
Contributor

fernando-villalba commented Jul 17, 2024

Issues

Using a completely wrong cidr_v4 causes terraform to keep trying to provision this resource for two minutes rather than failing straight away:

resource "civo_network" "volume-issue" {
  label = "blah-network"
  cidr_v4 = "sfdsdfsdf"

}
(screenshot: terraform retrying instead of failing immediately)

This should fail straight away as it does on the API, CLI, etc.

Acceptance Criteria

  • Have this resource fail straight away when entering the wrong value.

This ticket should be used as a reference for a bigger piece of work revamping the way we handle errors in the terraform provider. The default behaviour of our provider seems to be to keep retrying, and it does NOT return the error the API gives, which is a problem.

We need to go through the entire provider and change this behaviour everywhere: not by writing error messages in the terraform provider, but by returning the errors from the API.
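As a reference for the fix, here is a minimal sketch of what plan-time validation of this field could look like, assuming the resource schema is built on the Terraform Plugin SDK v2 (the helper name networkCIDRSchema is hypothetical; validation.IsCIDR is a stock SDKv2 validator):

import (
	"github.com/hashicorp/terraform-plugin-sdk/v2/helper/schema"
	"github.com/hashicorp/terraform-plugin-sdk/v2/helper/validation"
)

// Hypothetical schema fragment: validation.IsCIDR rejects malformed input
// such as "sfdsdfsdf" at plan time, before any API call is made. Stricter
// API rules (e.g. allowed subnet-mask sizes) still apply server-side.
func networkCIDRSchema() *schema.Schema {
	return &schema.Schema{
		Type:         schema.TypeString,
		Optional:     true,
		ValidateFunc: validation.IsCIDR,
	}
}

This only covers input that is not a CIDR at all; errors the API returns for otherwise well-formed values still need to be surfaced, which is what the rest of this ticket is about.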

@fernando-villalba added the bug label on Jul 17, 2024
@Praveen005
Contributor

Praveen005 commented Jul 22, 2024

Hi team,

I've been working on this issue and would like to share my approach for your review.

Outcome:

  1. Improved error handling.
  2. This resolves issues 269, 263, 256, and 249. (Screenshots attached below)

Approach:

  1. Retry Mechanism: I have implemented a centralized retry mechanism in our utils using the Terraform Plugin SDK's retry package, so it can be used across the provider.

  2. Error Classification: The Civo API returns more than 240 different types of errors, so I have categorized them into Retryable and Non-Retryable groups. The categorization is certainly not exact, but I have tried to be lenient to strike a balance. The approach is extensible: errors can be added to or removed from either group at any time without affecting the core functionality.

  3. Error Message Parsing: We get errors from the Civo API in the following form:

Error: Unknown error response - status: 400 Bad Request, code: 400, reason: {"code":"invalid_v4_cidr_provided","reason":"The provided v4 CIDR must be a valid CIDR with a subnet mask comprehended between /22 and /28"}

So I have tried to extract the JSON part of it and return only the code & reason, as sketched below. The user then has enough information about what went wrong, and the error message doesn't look cluttered.
Is this okay, or should I return the complete error?
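For illustration, a minimal sketch of that extraction, assuming the error payload always carries a code and a reason field (a hypothetical version of the ParseErrorResponse helper used further below; the actual implementation is in the gist):

package utils

import (
	"encoding/json"
	"fmt"
	"regexp"
)

// apiError mirrors the JSON payload embedded in Civo API error strings.
type apiError struct {
	Code   string `json:"code"`
	Reason string `json:"reason"`
}

// jsonPayload matches the {...} object embedded in the error text.
var jsonPayload = regexp.MustCompile(`\{.*\}`)

// ParseErrorResponse pulls the code/reason JSON out of a raw API error
// string and returns it as a compact error; the second return value
// reports whether the payload could be extracted at all.
func ParseErrorResponse(raw string) (error, error) {
	match := jsonPayload.FindString(raw)
	if match == "" {
		return nil, fmt.Errorf("no JSON payload in error: %s", raw)
	}
	var e apiError
	if err := json.Unmarshal([]byte(match), &e); err != nil {
		return nil, err
	}
	return fmt.Errorf("%s: %s", e.Code, e.Reason), nil
}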

The complete code can be seen in the gist here.


Regarding Error Message Rewriting:

We need to go through the entire provider and change this behaviour everywhere: not by writing error messages in the terraform provider, but by returning the errors from the API.

I am a bit confused: would you like error handling refactored throughout the provider as shown below?

Current:

kubeCluster, err := apiClient.FindKubernetesCluster(name.(string))
if err != nil {
	return diag.Errorf("[ERR] failed to retrieve kubernetes cluster: %s", err)
}

Proposed:

kubeCluster, err := apiClient.FindKubernetesCluster(name.(string))
if err != nil {
	return diag.FromErr(err)
}


Screenshots for issues 263, 256, 249 & 269:

You can see them erroring out immediately, which prevents ghost resource creation and quota exhaustion.

Issue 263: (screenshot)

Issue 256: (screenshot)

Issue 249: (screenshot)

Issue 269: (screenshot)

Regards,
Praveen

@uzaxirr
Member

uzaxirr commented Jul 22, 2024

Hi @Praveen005,
Your approach looks fine to me, but one small correction: we have decided to remove retries for all of the resources. So you just have to remove the retry mechanism wherever it is implemented and return the error we get from the API to the user in a clean format.

@Praveen005
Contributor

Praveen005 commented Jul 22, 2024

Hi @uzaxirr,

Thanks for the input.

We currently have retries only in resource_instance.go and resource_network.go.
I will replace the relevant parts with the following:

instance, err := apiClient.CreateInstance(config)
if err != nil {
	customErr, parseErr := utils.ParseErrorResponse(err.Error())
	if parseErr == nil {
		err = customErr
	}
	return diag.Errorf("[ERR] failed to create an instance: %s", err)
}

In resource_kubernetes_cluster_nodepool.go we have:

// Retry until the node pool is confirmed deleted.
err = retry.RetryContext(ctx, d.Timeout(schema.TimeoutDelete)-time.Minute, func() *retry.RetryError {
	_, err := apiClient.GetKubernetesClusterPool(getKubernetesCluster.ID, d.Id())
	if err != nil {
		if errors.Is(err, civogo.DatabaseClusterPoolNotFoundError) {
			log.Printf("[INFO] kubernetes node pool %s deleted", d.Id())
			return nil
		}
		log.Printf("[INFO] error trying to read kubernetes cluster pool: %s", err)
		return retry.NonRetryableError(fmt.Errorf("error waiting for Kubernetes node pool to be deleted: %s", err))
	}
	log.Printf("[INFO] kubernetes node pool %s still exists", d.Id())
	return retry.RetryableError(fmt.Errorf("kubernetes node pool still exists"))
})
if err != nil {
	return diag.FromErr(err)
}

I feel we shouldn't delete this one; it often errors out because of the node pool's dependency on other resources. Or should we?

Also, is the format of the error below okay?

(screenshot)

Is there anything else I should take care of?

@uzaxirr
Member

uzaxirr commented Jul 22, 2024

@Praveen005, the above seems fine. Can you raise a PR?

@Praveen005
Contributor

Hi @uzaxirr,

I have raised the PR. When you have a moment, could you please review it?

Regards,
Praveen
