Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ServiceAccountKey gives error private_key: not a string and never becomes ready after gcp-provider pod is restarted (for instance after a version upgrade) #307

Closed
jonnylangefeld opened this issue May 20, 2023 · 4 comments · Fixed by #314
Labels
bug Something isn't working community is:triaged

Comments

@jonnylangefeld
Copy link

jonnylangefeld commented May 20, 2023

What happened?

We use the pattern of a ServiceAccount together with a ServiceAccountKey quite a lot, as suggested by the examples.

However, we recently upgraded provider-gcp from v0.27.0 to v.0.32.0, which restarted the pod. After this process all our ServiceAccountKey resources got a condition like

  - lastTransitionTime: "2023-05-20T00:59:13Z"
    message: 'observe failed: cannot get connection details: cannot get additional
      connection details: private_key: not a string'
    reason: ReconcileError
    status: "False"
    type: Synced

This gets all resources depending on ServiceAccountKeys into a non-ready state.

How can we reproduce it?

  1. Create a ServiceAccount and a ServiceAccountKey as suggested by the example:
    https://github.com/upbound/provider-gcp/blob/a74e05300b59c0e804ee241a974122bfad0b82d2/examples/cloudplatform/serviceaccountkey.yaml#L1-L28

  2. Wait for ServiceAccountKey to be synced and ready.

  3. Restart the gcp-provider pod.

What environment did it happen in?

  • Crossplane Version: v1.12.0
  • Provider Version: v1.32.0
  • Kubernetes Version: 1.23
  • Kubernetes Distribution: GKE
@jonnylangefeld jonnylangefeld added bug Something isn't working needs:triage labels May 20, 2023
@jonnylangefeld
Copy link
Author

Some minimal analysis shows that the specific error is produced by the upbound/upjet library.

I believe it has to do with the GCP API only responding with the private_key property upon first creation. Any time after it doesn't respond with private_key anymore. This is indicated by the terraform provider code: This is only populated when creating a new key.

@ulucinar
Copy link
Collaborator

ulucinar commented May 25, 2023

Hi @jonnylangefeld,
Thank you for reporting this issue. I assume you are not using a connection details secret with the ServiceAccountKey.cloudplatform resources with which you observe this issue, right? One way to publish the connection details (in the ServiceAccountKey's case, this includes the associated private_key sensitive attribute), you may use something like:

apiVersion: cloudplatform.gcp.upbound.io/v1beta1
kind: ServiceAccountKey
metadata:
  annotations:
    meta.upbound.io/example-id: cloudplatform/v1beta1/serviceaccountkey
  labels:
    testing.upbound.io/example-name: example-service-account-key
  name: example-service-account-key
spec:
  forProvider:
    publicKeyType: TYPE_X509_PEM_FILE
    serviceAccountIdSelector:
      matchLabels:
        testing.upbound.io/example-name: example-service-account-key
  writeConnectionSecretToRef:
    name: gcp-service-account-key
    namespace: upbound-system

Please note the spec.writeConnectionSecretToRef for publishing the connection details as a Kubernetes secret.

If/when you do not do this, I think we are hitting a bug with the associated resource configuration here:
https://github.com/upbound/provider-gcp/blob/43d33c797f4368ae126438cda6be165720e965fa/config/cloudplatform/config.go#LL64C13-L64C13

I guess the paved object's GetString call done here:
https://github.com/upbound/provider-gcp/blob/43d33c797f4368ae126438cda6be165720e965fa/config/common/config.go#L62
is probably failing (we need to check but I think the not a string in the reported error originates from this call).

So, if this is the case, we need to fix the configuration so that it does not assume the existence of the string private_key attribute because it might be missing if that attribute's value has not been published to a connection details secret.

But still, I think it makes sense to use a connection details secret with this resource. What do you think?

@ulucinar
Copy link
Collaborator

A relevant testing issue is here: upbound/official-providers-ci#92

@jonnylangefeld
Copy link
Author

jonnylangefeld commented May 26, 2023

We don't use

  writeConnectionSecretToRef:
    name: gcp-service-account-key
    namespace: upbound-system

But we use

  publishConnectionDetailsTo:
    name: <a generated name filled in by a patch>
    configRef:
      name: default

and the default StoreConfig the configRef is referring to has spec.type: Kubernetes.
With that we were still hitting the issue.

I might have misinterpreted this part of your reply:

But still, I think it makes sense to use a connection details secret with this resource. What do you think?
I thought with publishConnectionDetailsTo we do use a connection details secret. Or did you mean something else?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working community is:triaged
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants