
[Helm/Release] uninstall should be able to wait #1739

Closed
gitfool opened this issue Sep 30, 2021 · 1 comment · Fixed by #1742
Labels: area/helm · helm-release-ga-blockers (Items blocking Helm Release GA) · kind/bug (Some behavior is incorrect or out of spec) · resolution/fixed (This issue was fixed)
Milestone: 0.63

Comments


gitfool commented Sep 30, 2021

Issue details

When installing a series of Helm charts using the preview Helm Release feature, with a dependency chain between these Helm releases and other resources, destroying the stack does not wait for a Helm release's uninstall to complete before continuing to the resources it depends on. Dependent resources with finalizers (owned by the Helm release) are therefore orphaned, and the destroy eventually times out and fails.

Steps to reproduce

The following is a .NET (C#) extract of a very specific case that captures the gist of the issue:

// aws load balancer controller; https://github.com/aws/eks-charts/tree/master/stable/aws-load-balancer-controller
Logger.LogDebug("Installing aws load balancer controller");
var awsLbcRole = new RoleX($"{k8sPrefix}-aws-load-balancer-controller",
    new RoleXArgs
    {
        AssumeRolePolicy = IamHelpers.AssumeRoleForServiceAccount(oidcArn, oidcUrl, "kube-system", "aws-load-balancer-controller", awsProvider),
        InlinePolicies = { ["policy"] = ReadResource("AwsLoadBalancerPolicy.json") }
    },
    new ComponentResourceOptions { Provider = awsProvider });

var awsLbcCrds = new ConfigGroup("aws-load-balancer-controller-crds",
    new ConfigGroupArgs { Yaml = ReadResource("AwsLoadBalancerCrds.yaml") },
    new ComponentResourceOptions { Provider = k8sProvider });

// workaround https://github.com/pulumi/pulumi-kubernetes/issues/1725
var _clusterName = OutputUtilities.GetValueAsync(clusterName).GetAwaiter().GetResult();
var _awsLbcRoleArn = OutputUtilities.GetValueAsync(awsLbcRole.Arn).GetAwaiter().GetResult();

var awsLbcValues =
    new Dictionary<string, object>
    {
        ["clusterName"] = _clusterName,
        ["enableCertManager"] = true,
        ["serviceAccount"] = new { annotations = new Dictionary<string, string> { ["eks.amazonaws.com/role-arn"] = _awsLbcRoleArn } }
    }.ToDictionary(); // workaround https://github.com/pulumi/pulumi/issues/8013

var awsLbcRelease = new Release("aws-load-balancer-controller", // ingress records with alb.ingress.kubernetes.io annotations depend on chart finalizers
    new ReleaseArgs
    {
        Namespace = "kube-system",
        Name = "aws-load-balancer-controller",
        RepositoryOpts = new RepositoryOptsArgs { Repo = "https://aws.github.io/eks-charts" },
        Chart = "aws-load-balancer-controller",
        Version = K8sConfig.AwsLbcChartVersion,
        Values = awsLbcValues,
        Atomic = true,
        SkipCrds = true
    },
    new CustomResourceOptions { DependsOn = { awsLbcCrds, certManagerRelease }, Provider = k8sProvider });

...

// kube prometheus stack; https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack
Logger.LogDebug("Installing kube prometheus stack");
var monitoringNs = new Namespace("monitoring",
    new NamespaceArgs { Metadata = new ObjectMetaArgs { Name = "monitoring" } },
    new CustomResourceOptions { Provider = k8sProvider });

GrafanaPassword = string.IsNullOrEmpty(K8sConfig.GrafanaPassword)
    ? new RandomPassword("kps-grafana-password", new RandomPasswordArgs { Length = 16, Special = true }).Result
    : Output.CreateSecret(K8sConfig.GrafanaPassword);

// workaround https://github.com/pulumi/pulumi-kubernetes/issues/1725
var _grafanaPassword = OutputUtilities.GetValueAsync(GrafanaPassword).GetAwaiter().GetResult();

var kubePrometheusStackValues =
    new Dictionary<string, object>
    {
        ["nameOverride"] = "kps",
        ["fullnameOverride"] = "kps",
        ["kubeVersionOverride"] = K8sConfig.Version,
        ["kubeTargetVersionOverride"] = K8sConfig.Version,
        ["alertmanager"] = new
        {
            alertmanagerSpec = new
            {
                storage = new
                {
                    volumeClaimTemplate = new
                    {
                        metadata = new { name = "alertmanager-db" },
                        spec = new
                        {
                            storageClassName = "efs-sc",
                            accessModes = new[] { "ReadWriteOnce" },
                            resources = new { requests = new { storage = "10Gi" } } // mandatory but ignored by efs
                        }
                    }
                },
                nodeSelector = new { role = "monitoring" },
                tolerations = new[] { new { key = "role", @operator = "Exists" } }
            },
            ingress = new
            {
                enabled = true,
                hosts = new[] { $"alertmanager.{AwsConfig.Route53.Internal.Domain}" },
                pathType = "Prefix",
                annotations = new Dictionary<string, string>
                {
                    ["alb.ingress.kubernetes.io/load-balancer-name"] = $"{EnvName}-{AwsConfig.Alb.Internal.Default.Group}",
                    ["alb.ingress.kubernetes.io/scheme"] = AwsConfig.Alb.Internal.Default.Scheme,
                    ["alb.ingress.kubernetes.io/group.name"] = AwsConfig.Alb.Internal.Default.Group,
                    ["alb.ingress.kubernetes.io/target-type"] = AwsConfig.Alb.Internal.Default.TargetType,
                    ["alb.ingress.kubernetes.io/healthcheck-path"] = "/-/healthy",
                    ["external-dns.alpha.kubernetes.io/hostname"] = $"alertmanager.{AwsConfig.Route53.Internal.Domain}",
                    ["kubernetes.io/ingress.class"] = AwsConfig.Alb.Internal.Default.Class
                }
            }
        },
        ["grafana"] = new
        {
            nameOverride = "kps-grafana",
            fullnameOverride = "kps-grafana",
            adminPassword = _grafanaPassword,
            nodeSelector = new { role = "monitoring" },
            tolerations = new[] { new { key = "role", @operator = "Exists" } },
            ingress = new
            {
                enabled = true,
                hosts = new[] { $"grafana.{AwsConfig.Route53.Internal.Domain}" },
                pathType = "Prefix",
                annotations = new Dictionary<string, string>
                {
                    ["alb.ingress.kubernetes.io/load-balancer-name"] = $"{EnvName}-{AwsConfig.Alb.Internal.Default.Group}",
                    ["alb.ingress.kubernetes.io/scheme"] = AwsConfig.Alb.Internal.Default.Scheme,
                    ["alb.ingress.kubernetes.io/group.name"] = AwsConfig.Alb.Internal.Default.Group,
                    ["alb.ingress.kubernetes.io/target-type"] = AwsConfig.Alb.Internal.Default.TargetType,
                    ["alb.ingress.kubernetes.io/healthcheck-path"] = "/api/health",
                    ["external-dns.alpha.kubernetes.io/hostname"] = $"grafana.{AwsConfig.Route53.Internal.Domain}",
                    ["kubernetes.io/ingress.class"] = AwsConfig.Alb.Internal.Default.Class
                }
            }
        },
        ["kubeControllerManager"] = new { enabled = false },
        ["kubeEtcd"] = new { enabled = false },
        //["kubelet"] = new
        //{
        //    serviceMonitor = new
        //    {
        //        metricRelabelings = new[]
        //        {
        //            new
        //            {
        //                sourceLabels = new[] { "node" },
        //                targetLabel = "instance"
        //            }
        //        }
        //    }
        //},
        ["kubeScheduler"] = new { enabled = false },
        ["kube-state-metrics"] = new
        {
            nameOverride = "kps-kube-state-metrics",
            fullnameOverride = "kps-kube-state-metrics",
            nodeSelector = new { role = "monitoring" },
            tolerations = new[] { new { key = "role", @operator = "Exists" } }
        },
        ["prometheus"] = new
        {
            prometheusSpec = new
            {
                podMonitorSelectorNilUsesHelmValues = false,
                probeSelectorNilUsesHelmValues = false,
                ruleSelectorNilUsesHelmValues = false,
                serviceMonitorSelectorNilUsesHelmValues = false,
                retention = "30d",
                storageSpec = new
                {
                    volumeClaimTemplate = new
                    {
                        metadata = new { name = "prometheus-db" },
                        spec = new
                        {
                            storageClassName = "efs-sc",
                            accessModes = new[] { "ReadWriteOnce" },
                            resources = new { requests = new { storage = "150Gi" } } // mandatory but ignored by efs
                        }
                    }
                },
                nodeSelector = new { role = "monitoring" },
                tolerations = new[] { new { key = "role", @operator = "Exists" } }
            },
            ingress = new
            {
                enabled = true,
                hosts = new[] { $"prometheus.{AwsConfig.Route53.Internal.Domain}" },
                pathType = "Prefix",
                annotations = new Dictionary<string, string>
                {
                    ["alb.ingress.kubernetes.io/load-balancer-name"] = $"{EnvName}-{AwsConfig.Alb.Internal.Default.Group}",
                    ["alb.ingress.kubernetes.io/scheme"] = AwsConfig.Alb.Internal.Default.Scheme,
                    ["alb.ingress.kubernetes.io/group.name"] = AwsConfig.Alb.Internal.Default.Group,
                    ["alb.ingress.kubernetes.io/target-type"] = AwsConfig.Alb.Internal.Default.TargetType,
                    ["alb.ingress.kubernetes.io/healthcheck-path"] = "/-/healthy",
                    ["external-dns.alpha.kubernetes.io/hostname"] = $"prometheus.{AwsConfig.Route53.Internal.Domain}",
                    ["kubernetes.io/ingress.class"] = AwsConfig.Alb.Internal.Default.Class
                }
            }
        },
        //["nodeExporter"] = new
        //{
        //    serviceMonitor = new
        //    {
        //        relabelings = new[]
        //        {
        //            new
        //            {
        //                sourceLabels = new[] { "__meta_kubernetes_pod_node_name" },
        //                targetLabel = "kubernetes_node"
        //            }
        //        }
        //    }
        //},
        ["prometheus-node-exporter"] = new
        {
            nameOverride = "kps-prometheus-node-exporter",
            fullnameOverride = "kps-prometheus-node-exporter",
            tolerations = new[] { new { key = "role", @operator = "Exists" } }
        },
        ["prometheusOperator"] = new
        {
            admissionWebhooks = new
            {
                certManager = new { enabled = true },
                patch = new
                {
                    nodeSelector = new { role = "monitoring" },
                    tolerations = new[] { new { key = "role", @operator = "Exists" } }
                }
            },
            nodeSelector = new { role = "monitoring" },
            tolerations = new[] { new { key = "role", @operator = "Exists" } }
        }
    }.ToDictionary(); // workaround https://github.com/pulumi/pulumi/issues/8013

var kubePrometheusStackRelease = new Release("kube-prometheus-stack",
    new ReleaseArgs
    {
        Namespace = "monitoring",
        Name = "kube-prometheus-stack",
        RepositoryOpts = new RepositoryOptsArgs { Repo = "https://prometheus-community.github.io/helm-charts" },
        Chart = "kube-prometheus-stack",
        Version = K8sConfig.KubePrometheusStackChartVersion,
        Values = kubePrometheusStackValues,
        Atomic = true,
        SkipCrds = true
    },
    new CustomResourceOptions { DependsOn = { awsLbcRelease /* finalizers */, certManagerRelease, efsDriverRelease /* finalizers */, kubePrometheusStackCrds, monitoringNs }, Provider = k8sProvider });

TL;DR: the kps Helm release depends on the aws lbc Helm release since it creates Ingress records that are annotated for the aws lbc, which has the side effect of creating an AWS ALB for the given group. The dependency in this case matters not so much for creation as for deletion: the aws lbc injects a finalizer into each Ingress record so that it can tear down the AWS ALB when the reference count drops to zero. If the aws lbc is deleted first, the Ingress records are orphaned waiting on the finalizer, and consequently the monitoring namespace containing the Ingress records cannot be deleted either.

It is obviously important to respect the dependency chain, specifically tearing down resources in the chain in reverse order, but as part of that process the destroy needs to wait synchronously for dependent resources to complete their teardown before continuing.

I'm not sure whether it's better to wait only when configured to do so, or based on tracked dependencies, or to always wait, which would simplify the implementation, assuming it would not hurt performance by blocking concurrent destruction of resources, and given that destroying a stack should ultimately wait for all resources to be destroyed anyway.
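
For illustration only, such an opt-in might be surfaced on the Release resource itself. The property name in the sketch below is hypothetical (it is not part of the current ReleaseArgs API) and only shows the shape the option could take:

// Hypothetical sketch only: "WaitForUninstall" is not an existing ReleaseArgs option.
// The intent is that, on destroy, Pulumi would block until this release's resources
// (e.g. the Ingress records carrying aws lbc finalizers) are fully deleted before
// moving on to uninstall the releases it depends on.
var kubePrometheusStackRelease = new Release("kube-prometheus-stack",
    new ReleaseArgs
    {
        Namespace = "monitoring",
        Chart = "kube-prometheus-stack",
        RepositoryOpts = new RepositoryOptsArgs { Repo = "https://prometheus-community.github.io/helm-charts" },
        Values = kubePrometheusStackValues,
        Atomic = true
        // WaitForUninstall = true // hypothetical opt-in; not valid against today's API
    },
    new CustomResourceOptions { DependsOn = { awsLbcRelease }, Provider = k8sProvider });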

References:

  • The helm uninstall CLI has --wait and --timeout options (see the example below).
  • Presumably the Helm SDK used by pulumi-kubernetes supports these options too?
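
For reference, the equivalent behavior at the Helm CLI level (with a recent Helm version that supports waiting on uninstall) looks something like this; the release name, namespace, and timeout are illustrative:

helm uninstall aws-load-balancer-controller --namespace kube-system --wait --timeout 5m
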
@gitfool added the kind/bug (Some behavior is incorrect or out of spec) label on Sep 30, 2021
@viveklak added the helm-release-ga-blockers (Items blocking Helm Release GA) label on Sep 30, 2021
@mikhailshilkov added this to the 0.63 milestone on Oct 6, 2021
@pulumi-bot added the resolution/fixed (This issue was fixed) label on Oct 6, 2021

gitfool commented Oct 7, 2021

@viveklak since you didn't have a test case, I'm confirming that #1742 fixes this issue. Thanks! 🎉
