Occasionally the XR Synced status switches to False temporarily due to conversion webhook error(s). This status does not propagate upwards to the claim.
Relevant Error Output Snippet
Warning ComposeResources 2m15s (x9 over 16m) defined/compositeresourcedefinition.apiextensions.crossplane.io cannot compose resources: cannot apply composed resource "aks_cluster": failed to prune fields: failed add back owned items: failed to convert pruned object at version containerservice.azure.upbound.io/v1beta2: conversion webhook for containerservice.azure.upbound.io/v1beta1, Kind=KubernetesCluster returned invalid metadata: invalid metadata of type <nil> in input object
And the corresponding crossplane beta trace:
$ crossplane beta trace -n claim-namespace myaks dev-aks-cluster-1 -o wide
NAME RESOURCE SYNCED READY STATUS
AKS/dev-aks-cluster-1 (dev-azure-eastus2) True True Available
└─ XAKS/dev-aks-cluster-1-mnxjr False True ReconcileError: cannot compose resources: cannot apply composed resource "aks_cluster": failed to prune fields: failed add back owned items: failed to convert pruned object at version containerservice.azure.upbound.io/v1beta2: conversion webhook for containerservice.azure.upbound.io/v1beta1, Kind=KubernetesCluster returned invalid metadata: invalid metadata of type <nil> in input object
├─ XAKSNodepoolSet/dev-aks-cluster-1-nodepool-set nodepool_set True True Available
│ ├─ KubernetesClusterNodePool/dev-aks-cluster-1-generalnp ng_generalnp True True Available
│ └─ NodePoolCalculation/dev-aks-cluster-1-calc node_pool_calculation True True Available
├─ XPermission/dev-aks-cluster-1-ca-permission cluster_autoscaler_permission True True Available
│ ├─ RoleAssignment/dev-aks-cluster-1-cluster-autoscaler-assignment role_assignment True True Available
│ ├─ RoleDefinition/dev-aks-cluster-1-cluster-autoscaler-definition role_definition True True Available
│ ├─ FederatedIdentityCredential/dev-aks-cluster-1-cluster-autoscaler-fedid federated_identity True True Available
│ └─ UserAssignedIdentity/dev-aks-cluster-1-cluster-autoscaler-identity identity True True Available
├─ KubernetesCluster/dev-aks-cluster-1-mnxjr aks_cluster True True Available
├─ EventHubNamespace/dev-aks-cluster-1-test queue-dev-aks-cluster-1-test True True Available
├─ EventHubNamespace/dev-aks-cluster-1-test-data queue-dev-aks-cluster-1-test-data True True Available
Crossplane Version
1.14.3
Provider Version
1.3.0
Kubernetes Version
v1.28.9
Kubernetes Distribution
AKS
Additional Info
Hi, we were previously hitting #645, and the workaround/fix was to upgrade our MR from v1beta1 to v1beta2.
Since upgrading, we've noticed that occasionally (and seemingly randomly) the XR Synced status flips to False due to a conversion webhook failure. The condition only lasts for tens of seconds before Synced reverts to True, and it can then take over an hour to reoccur. The status change does not propagate upwards to the claim (as shown in the crossplane beta trace output above).
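Since the flap only lasts tens of seconds, it is easy to miss with ad-hoc kubectl calls. A sketch of how one might stream the Synced condition as it changes; the resource kind and name are taken from the trace above, and the lowercase plural "xaks" is an assumption about how the XRD names the composite resource:

```shell
# Stream every status update of the XR and print just the Synced condition.
# kubectl's -w flag re-renders the jsonpath template on each watch event, so
# a False blip shows up as an extra line even if it only lasts seconds.
watch_synced() {
  kubectl get xaks dev-aks-cluster-1-mnxjr -w \
    -o jsonpath='{.metadata.name}{" Synced="}{.status.conditions[?(@.type=="Synced")].status}{" reason="}{.status.conditions[?(@.type=="Synced")].reason}{"\n"}'
}
# Run manually in a spare terminal: watch_synced
```

Logging this output over a few hours would also make it easier to correlate the flaps with anything else happening in the cluster.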
Unfortunately I don't have a minimal reproduction available: the environment/configuration displaying this behaviour is complex (multiple pipeline steps, etc.), and I haven't had much luck reproducing it in an environment running in kind.
There are no interesting logs in the Crossplane, Provider, or Function pods (even with --debug enabled), nor in the AKS control plane.
My limited debugging led me to this Kubernetes bug: kubernetes/kubernetes#117356, which seems plausible, as I can see that the CRD only stores v1beta1 (provider-upjet-azure/package/crds/containerservice.azure.upbound.io_kubernetesclusters.yaml, line 11089 in 65f8cdc).
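For anyone wanting to verify the same state on their own cluster, the served/storage versions and the stored-version history live on the CRD itself. The kubectl commands below are real, but the version values in the runnable check are example values matching the symptom described here, not output captured from our cluster:

```shell
# On a live cluster (assumes kubectl access to the management cluster):
#
#   kubectl get crd kubernetesclusters.containerservice.azure.upbound.io \
#     -o jsonpath='{range .spec.versions[*]}{.name}{" storage="}{.storage}{"\n"}{end}'
#   kubectl get crd kubernetesclusters.containerservice.azure.upbound.io \
#     -o jsonpath='{.status.storedVersions}'
#
# The suspicious state from kubernetes#117356 is a storage version that has
# never made it into .status.storedVersions. Simulated check with example values:
storage_version="v1beta2"       # .spec.versions[?(@.storage==true)].name
stored_versions='["v1beta1"]'   # .status.storedVersions as reported by the API
if ! grep -q "\"$storage_version\"" <<<"$stored_versions"; then
  msg="stale: $storage_version is the storage version but is missing from storedVersions"
  echo "$msg"
fi
```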
We recreated our claim (by deleting it and all associated XRs, MRs etc.) and the conversion webhook errors haven't reappeared for a number of days.
This chart shows the count of webhook conversion failures over a ~7 day period. You can see that the failures have stopped entirely around the time we deleted/recreated the claim.
It's worth noting that we had originally upgraded the MR from v1beta1 to v1beta2, so I'm not sure if recreating it at v1beta2 has anything to do with the lack of failures, but that seems unlikely to me as this error only appeared weeks after the initial claim creation.
Any ideas on where to look next? If we see the error reappear, I'll update the issue.
Just adding here that we've seen this reappear ~3 weeks later. There's no obvious correlation between the errors reappearing and changes to our composition, etc.
Is there an existing issue for this?
Affected Resource(s)
kubernetesclusters.containerservice.azure.upbound.io/v1beta1
Resource MRs required to reproduce the bug
No response
Steps to Reproduce
A KubernetesCluster managed resource composed via a Composition Function (we're using https://github.com/crossplane-contrib/function-cue).