Installing cert-manager with CRDs using helm hangs pulumi #1222
Comments
#1130 seems related - sometimes it would hang on planning, sometimes on applying, depending on whether I had run pulumi refresh and some related operations. |
In #1219 there's a hint that downgrading cert-manager to 0.15.2 may help - I will try that, but something is clearly wrong in the pulumi-kubernetes process, and verbose logging doesn't help much in figuring out what. |
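For reference, a minimal sketch of what pinning the chart to 0.15.2 could look like, using the same Helm support shown later in this thread - assuming the chart comes from the https://charts.jetstack.io repo and that the CRDs are installed by the chart; the exact version string may need a "v" prefix depending on how the chart is published:

import * as k8s from "@pulumi/kubernetes";

// Sketch only: pin cert-manager to the 0.15.2 chart instead of floating on the latest release.
const certManagerPinned = new k8s.helm.v3.Chart("certmanager", {
    chart: "cert-manager",
    namespace: "cert-manager",
    version: "0.15.2",
    fetchOpts: {
        repo: "https://charts.jetstack.io",
    },
    values: {
        installCRDs: true,
    },
});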
hmm ok I think I found the issue upstream here: https://cert-manager.io/docs/installation/upgrading/upgrading-0.15-0.16/ Kubernetes bug: kubernetes/kubernetes#91615 |
It's not only with Helm; I got a hang by doing this:

import * as k8s from "@pulumi/kubernetes";

export const certManager = new k8s.yaml.ConfigFile("cert-manager", {
    file:
        "https://github.com/jetstack/cert-manager/releases/download/v0.16.0/cert-manager.yaml",
}); |
I'm encountering the same issue. Here is my code
|
After further investigation, this issue appears to be triggered by the Once the fix has merged, I'll update the Pulumi k8s provider's dependency. For now, I'd suggest sticking with a previous version of
Edit: In the interest of fixing this more quickly, I forked the upstream repo and applied the fix in the fork. I'll cut a release with the fix on Monday. |
cert-manager v1.0.0 is still having various issues with pulumi. |
Same here with |
I tested this again this morning with the latest k8s provider release (v2.6.1) and did not encounter the reported hangs. I expected this to be fixed by the changes in #1223, so can you verify that you're using a recent version of the provider? Here's the code that deployed successfully for me:

import * as k8s from "@pulumi/kubernetes";

export const certManager = new k8s.yaml.ConfigFile("cert-manager", {
    file: "https://github.com/jetstack/cert-manager/releases/download/v1.0.1/cert-manager.yaml",
});

It also worked with a Helm deployment:

import * as k8s from "@pulumi/kubernetes";
const certManagerNamespace = new k8s.core.v1.Namespace("certmanager", { metadata: { name: "cert-manager" } });
const certmanager = new k8s.helm.v3.Chart("certmanager", {
chart: "cert-manager",
namespace: "cert-manager",
version: "1.0.1",
fetchOpts: {
repo: "https://charts.jetstack.io"
},
values: {
installCRDs: true,
}
}); |
@lblackstone Tried multiple things, using the latest versions of all tools (kubectl and pulumi). My cluster is in Azure. Changing the cluster size had some effect on installing cert-manager with Pulumi, but it still ended up hanging forever (I waited ~30 minutes, with just one other service deployment in addition to cert-manager). I didn't want to give it more tries, since stopping the hung Pulumi process corrupted the state, and I needed to remove all cluster resources and then recreate them :( |
We actually just abandoned cert-manager and went for Azure Front Door for certificate generation/ingress. Cert-manager seems to be more trouble than it's worth, and tearing down clusters always causes pulumi stack issues when cert-manager resources are present. Here's an example component resource I'm using for Front Door:

export interface frontDoorOpts extends pulumi.CustomResourceOptions {
name: string,
port: number,
https?: boolean,
additionalRoutingRules?:pulumi.Input<azure.types.input.frontdoor.FrontdoorRoutingRule>[]
}
export class FrontDoor extends pulumi.ComponentResource {
public readonly publicIp:azure.network.PublicIp;
public readonly frontDoor:azure.frontdoor.Frontdoor;
public readonly frontDoorHttps:azure.frontdoor.CustomHttpsConfiguration;
public readonly frontDoorCustomHttps:azure.frontdoor.CustomHttpsConfiguration;
public readonly dns:azure.dns.CNameRecord;
constructor(name:string, opts:frontDoorOpts) {
super("pkg:index:fd", name, {}, opts);
this.publicIp = new azure.network.PublicIp(`${name}-public-ip`, {
name: `${name}-ip`,
resourceGroupName: resourceGroup.name,
location: config.get('location'),
allocationMethod: 'Static',
sku: 'Standard',
tags: globalTags
}, { parent: this })
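// CNAME record pointing the custom hostname at the Front Door default endpoint.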
this.dns = new azure.dns.CNameRecord(`${name}-dns`, {
name: opts.name,
zoneName: zone.name,
resourceGroupName: zone.resourceGroupName,
ttl: 300,
record: `${name}-ingress.azurefd.net`
}, {parent: this});
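// Front Door profile: one backend pool targeting the public IP, default and custom-domain frontend endpoints, an HTTPS forwarding rule, and an HTTP-to-HTTPS redirect rule.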
this.frontDoor = new azure.frontdoor.Frontdoor(`${name}-frontdoor`, {
name: `${name}-ingress`,
resourceGroupName: resourceGroup.name,
backendPools: [{
name,
loadBalancingName: name,
healthProbeName: name,
backends: [{
address: this.publicIp.ipAddress,
httpPort: opts.port,
httpsPort: opts.port,
hostHeader: `${opts.name}.${subdomain}`,
}]
}],
frontendEndpoints: [{
name,
hostName: `${name}-ingress.azurefd.net`
}, {
name: `${name}custom`,
hostName: `${opts.name}.${subdomain}`
}],
backendPoolHealthProbes: [{
name,
protocol: opts.https ? 'Https' : 'Http'
}],
backendPoolLoadBalancings: [{
name,
}],
enforceBackendPoolsCertificateNameCheck: false,
routingRules: [{
name,
acceptedProtocols: ['Https'],
frontendEndpoints: [name, `${name}custom`],
patternsToMatches: ['/*'],
forwardingConfiguration: {
forwardingProtocol: opts.https ? 'HttpsOnly' : 'HttpOnly',
backendPoolName: name
}
}, {
name: 'redirect',
acceptedProtocols: ['Http'],
frontendEndpoints: [name, `${name}custom`],
patternsToMatches: ['/*'],
redirectConfiguration: {
redirectProtocol: 'HttpsOnly',
redirectType: 'Moved'
}
},
...opts.additionalRoutingRules ? opts.additionalRoutingRules : []
],
}, {parent: this, dependsOn: [this.dns, kubernetesCluster]});
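// Front Door-managed HTTPS for the default frontend endpoint.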
this.frontDoorHttps = new azure.frontdoor.CustomHttpsConfiguration(`${name}-https`, {
frontendEndpointId: this.frontDoor.frontendEndpoints.apply(frontendEndpoints => frontendEndpoints[0].id || '/subscriptions/random-id/resourceGroups/fake-rg/providers/Microsoft.FrontDoor/frontendEndpoints/shi'),
customHttpsProvisioningEnabled: true,
resourceGroupName: resourceGroup.name,
customHttpsConfiguration: {
certificateSource: 'FrontDoor'
}
}, {parent: this, dependsOn: this.frontDoor})
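// Front Door-managed HTTPS for the custom-domain frontend endpoint.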
this.frontDoorCustomHttps = new azure.frontdoor.CustomHttpsConfiguration(`${name}-custom-https`, {
frontendEndpointId: this.frontDoor.frontendEndpoints.apply(frontendEndpoints => frontendEndpoints[1].id || '/subscriptions/random-id/resourceGroups/fake-rg/providers/Microsoft.FrontDoor/frontendEndpoints/shi'),
customHttpsProvisioningEnabled: true,
resourceGroupName: resourceGroup.name,
customHttpsConfiguration: {
certificateSource: 'FrontDoor'
}
}, {parent: this, dependsOn: this.frontDoor});
this.registerOutputs({
dns: this.dns,
frontDoor: this.frontDoor,
frontDoorHttps: this.frontDoorHttps,
frontDoorCustomHttps: this.frontDoorCustomHttps,
publicIp: this.publicIp
})
}
}

Then I just pass the public IP created in this Front Door resource into my services:

const frontDoor = new FrontDoor(`${name}-frontdoor`, {
name: ingressName,
port
});
const service = new k8s.core.v1.Service(`${name}-app-service`, {
metadata: {
namespace,
name,
labels: {
app: selector
},
annotations: serviceAnnotations
},
spec: {
externalTrafficPolicy: ingressName ? 'Cluster' : 'Local',
loadBalancerIP: frontDoor.publicIp.ipAddress,
ports: [{
port,
protocol: 'TCP'
}],
selector: {
app: selector
},
sessionAffinity: 'None',
type: 'LoadBalancer'
}
}, { provider: kubernetesProvider, deleteBeforeReplace: true, dependsOn: serviceDependencies, customTimeouts: {
create: '1h'
} }); |
Problem description
I used this setup for a while and it was just fine.
At some point, problems started where pulumi up would just hang forever working on cert-manager (maybe it was "caused" by a new Helm release of cert-manager, since I didn't specify a version).
I tried deleting CRDs and the namespace, then doing pulumi refresh, removing the above code from index.ts - and then my deploy worked just fine.
Whenever I tried to apply that cert-manager code cleanly on the same cluster, it would hang again.
I noticed that while it hangs, the pulumi-kubernetes process just keeps using 100% CPU forever...
Running strace -ff -p $PID on that process showed only a spam of timer-related syscalls; I saw no network or I/O activity...
I have a feeling that it's related to the CRDs and that installing them separately would solve the problem, but I haven't checked yet (a sketch of that approach follows below).
It might be related to finalizers as well, as patching or removing CRDs may cause finalizers to hang forever.
I tried deleting the CRDs once and it would hang unless I removed the finalizers first.
On a second try, on a clean setup, it would delete them just fine.
I am not convinced it's even trying to patch these CRDs, but I am 100% sure it hangs while "working" on one of them.
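For what it's worth, here is a minimal, untested sketch of installing the CRDs separately from the chart - assuming the release publishes a standalone cert-manager.crds.yaml manifest alongside cert-manager.yaml and that the chart's bundled CRDs can be disabled via installCRDs:

import * as k8s from "@pulumi/kubernetes";

// Sketch only: apply the CRDs from the standalone manifest first
// (the URL pattern follows the release manifests used elsewhere in this thread).
const certManagerCrds = new k8s.yaml.ConfigFile("cert-manager-crds", {
    file: "https://github.com/jetstack/cert-manager/releases/download/v1.0.1/cert-manager.crds.yaml",
});

// ...then deploy the chart without its bundled CRDs, waiting on the CRDs above.
// (Namespace creation is omitted here; see the Helm example earlier in the thread.)
const certManager = new k8s.helm.v3.Chart("certmanager", {
    chart: "cert-manager",
    namespace: "cert-manager",
    version: "1.0.1",
    fetchOpts: { repo: "https://charts.jetstack.io" },
    values: { installCRDs: false },
}, { dependsOn: certManagerCrds });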
Also, I noticed that during the deploy, DigitalOcean's Kubernetes API server got into a super aggressive throttling mode, where it would start dropping connections even before the handshake.
I am not sure yet what the deal with the throttling is; I'm waiting on a ticket response.
It may be that pulumi was spamming the server all the time because of a bug, or it may be that their limits are just incorrectly set.
If you think throttling can cause pulumi-kubernetes to go into a 100% CPU loop, maybe that's the bug here?
I tried running with maximum verbosity but found nothing interesting, except maybe serialization debug output containing the CRDs.
Errors & Logs
Affected product version(s)
Latest pulumi and kubernetes plugin.
DigitalOcean Kubernetes.
Latest Helm 3
Reproducing the issue
Suggestions for a fix