Server Side diff failing for fluent-bit #17568

Open
andrewjamesbrown opened this issue Mar 19, 2024 · 22 comments
Labels
bug Something isn't working component:argo-cd server-side-diff version:2.13 Latest confirmed affected version is 2.13

Comments

@andrewjamesbrown

andrewjamesbrown commented Mar 19, 2024

Checklist:

  • I've searched in the docs and FAQ for my answer: https://bit.ly/argocd-faq.
  • I've included steps to reproduce the bug.
  • I've pasted the output of argocd version.

Describe the bug

One of our ArgoCD instances is showing the following error when upgrading fluent-bit 0.30.4 -> 0.44.0:

Failed to compare desired state to live state: failed to calculate diff: error calculating server side diff: serverSideDiff error: error removing non config mutations for resource DaemonSet/fluent-bit: error reverting webhook removed fields in predicted live resource: .spec.template.spec.containers: element 0: associative list with keys has an element that omits key field "name" (and doesn't have default value)

To Reproduce
We use a Kyverno policy to rewrite images so they point to a local ECR cache instead of being pulled from Docker Hub directly.
When upgrading the fluent-bit Helm chart from 0.30.4 to 0.44.0, we get the error above.
We are using ArgoCD v2.10.3+0fd6344.

Expected behavior

Screenshots

Version

v2.10.3+0fd6344

Logs

@andrewjamesbrown andrewjamesbrown added the bug Something isn't working label Mar 19, 2024
@bryanhorstmann

Just ran into a similar issue with kube-prometheus-stack. I had to set controller.diff.server.side: "false" in order to unblock myself. ArgoCD server: v2.10.0+2175939

I ran into this after deleting a list from my values files.
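The `controller.diff.server.side` setting mentioned above is an entry in the `argocd-cmd-params-cm` ConfigMap. Here is a minimal sketch of turning server-side diff off for every Application. It assumes Argo CD is installed in the `argocd` namespace, so adjust that to your install. The labels mirror the stock manifests and are not what makes the setting work.

```yaml
# Assumption: Argo CD is installed in the "argocd" namespace.
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-cmd-params-cm
  namespace: argocd
  labels:
    app.kubernetes.io/name: argocd-cmd-params-cm
    app.kubernetes.io/part-of: argocd
data:
  # "false" makes the controller use client-side diff for all Applications.
  # ConfigMap values are strings, so keep the quotes.
  controller.diff.server.side: "false"
```

The application controller reads this ConfigMap through environment variables. It therefore typically needs a restart, for example a rollout restart of the `argocd-application-controller` StatefulSet, before the change takes effect.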

@0xDones

0xDones commented Mar 20, 2024

I'm having the same issue.

I had to set controller.diff.server.side: "true" to fix another error I was getting with SyncOptions.ServerSideApply=true on my applications, but now I'm getting this new error.

Failed to compare desired state to live state: failed to calculate diff: error calculating server side diff: 
serverSideDiff error: error removing non config mutations for resource StatefulSet/loki-backend: 
error reverting webhook removed fields in predicted live resource: errors: 
.spec.template.spec.containers: element 0: associative list with keys has an element that omits key field "name" (and doesn't have default value) 
.spec.template.spec.containers: element 1: associative list with keys has an element that omits key field "name" (and doesn't have default value)
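For reference, the `ServerSideApply=true` sync option mentioned above is set per Application under `spec.syncPolicy.syncOptions`. This is a minimal sketch. The Application name, repository URL, path and destination namespace are hypothetical placeholders.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: loki                                        # hypothetical name
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://example.com/org/deploy.git     # hypothetical repository
    targetRevision: HEAD
    path: loki                                      # hypothetical path
  destination:
    server: https://kubernetes.default.svc
    namespace: loki                                 # hypothetical namespace
  syncPolicy:
    syncOptions:
      # Sync this Application with Kubernetes server-side apply
      # instead of client-side apply.
      - ServerSideApply=true
```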

@algo7

algo7 commented Apr 2, 2024

Same issue with

{
    "Version": "v2.10.5+335875d",
    "BuildDate": "2024-03-28T15:02:45Z",
    "GitCommit": "335875d13e018bed6e03873f4742582582964745",
    "GitTreeState": "clean",
    "GoVersion": "go1.21.3",
    "Compiler": "gc",
    "Platform": "linux/amd64",
    "KustomizeVersion": "v5.2.1 2023-10-19T20:13:51Z",
    "HelmVersion": "v3.14.3+gf03cc04",
    "KubectlVersion": "v0.26.11",
    "JsonnetVersion": "v0.20.0"
}

and csi-driver-nfs-v4.6.0

@ptr1120

ptr1120 commented Apr 5, 2024

I am now having the same issue with our custom deployment and server-side diff enabled, after updating ArgoCD from v2.10.4 to v2.10.6:

Failed to compare desired state to live state: failed to calculate diff: error calculating server side diff: serverSideDiff error: error removing non config mutations for resource StatefulSet/xxx: error reverting webhook removed fields in predicted live resource: .spec.template.spec.containers: element 0: associative list with keys has an element that omits key field "name" (and doesn't have default value)

This happens even though my container 0 has a name.

How can this be fixed?

@atlasoft

Just ran into this and checking "apply only" allowed me to sync the application again.


@Skaronator

Skaronator commented May 21, 2024

Just ran into this as well with the loki-distributed, mimir-distributed and grafana-agent Helm charts. We had server-side apply and server-side diff enabled for a few weeks, and now it suddenly broke. I only modified CPU/memory resource limits and requests; nothing else changed.

The "apply only" mentioned above didn't help either.

Running the latest version, 2.11.0.

@Skaronator

I found a workaround for my issue. I deleted the affected StatefulSet/Deployment with the Orphan deletion policy, which means it doesn't delete any pods. Then I ran an ArgoCD sync again, which re-created the StatefulSet/Deployment resources.

@Skaronator

Skaronator commented May 22, 2024

Looks like my issue is more related to ignoreDifferences not working with ServerSideDiff. There is already an open issue: #17362

Edit: never mind, removing all ignoreDifferences didn't fix it.
Edit2: We switched back to client-side diff and apply, and only use server-side apply for specific resources (e.g. very large Grafana dashboards).
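Argo CD can also enable server-side apply for a single resource through a sync-options annotation, which matches the approach in Edit2. Here is a sketch for a large dashboard ConfigMap. The resource name, the `grafana_dashboard` label and the data are hypothetical.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: big-grafana-dashboard          # hypothetical name
  labels:
    grafana_dashboard: "1"             # hypothetical: label watched by a dashboard sidecar
  annotations:
    # Only this resource is applied server-side; the rest of the
    # Application keeps the default client-side apply.
    argocd.argoproj.io/sync-options: ServerSideApply=true
data:
  dashboard.json: |
    {"title": "example"}
```

Large objects usually need this because client-side apply stores the whole object in the `kubectl.kubernetes.io/last-applied-configuration` annotation, which can exceed the annotation size limit. Server-side apply does not use that annotation.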

@STollenaar

I think I found the cause of the server-side diff issue, which is in the gitops-engine repo. Basically, when a nested value is changed, it breaks the map traversal used for these server-side comparisons. I tried debugging it and applied a band-aid fix, although I don't know if it is even correct: argoproj/gitops-engine@c25fd94#diff-00282c65a618a9ea64cdb99da5137663dc5773f2c3fd8c37ed2e9a99f3d67f09L254

@algo7

algo7 commented Jul 23, 2024

Any update on this?

@gmauleon

Having the same problem: .spec.template.spec.containers: element 0: associative list with keys has an element that omits key field "name" (and doesn't have default value)

I modified some container environment variables in my spec.
ArgoCD 2.11.3 with server-side diff enabled. Syncing manually fixed the problem.

But our main worry is that it's a fairly silent error: the app flips from Synced to Unknown for a couple of minutes, then back to Synced for about an hour, and this repeats every hour until we discovered it and synced manually. That last part might be because we use ApplicationSets with progressive sync, though.

@adberger
Contributor

We also have this problem (Unknown Error State) and we don't use ApplicationSets with Progressive Sync.

@gmauleon

gmauleon commented Jul 24, 2024

Update: as @STollenaar stated, the error happens in the portion of the code that reverts webhook mutations in the diff.

On our side, adding IncludeMutationWebhook=true to the already present compare-options annotation, like so: argocd.argoproj.io/compare-options: ServerSideDiff=true,IncludeMutationWebhook=true, bypasses the error.

Not sure what other problems can arise down the line by setting this option though...
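On an Application, the annotation from this workaround would look roughly like the fragment below. The Application name is hypothetical, and the spec is omitted, so this is not a complete manifest. As discussed below, keeping mutating-webhook changes in the diff may surface differences you previously didn't see.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: fluent-bit            # hypothetical name
  namespace: argocd
  annotations:
    # Server-side diff for this Application; IncludeMutationWebhook=true
    # keeps changes made by mutating admission webhooks in the diff.
    argocd.argoproj.io/compare-options: ServerSideDiff=true,IncludeMutationWebhook=true
# spec omitted for brevity
```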

@adberger
Contributor

Is the Unknown error state also gone?

@gmauleon

Yes, the Unknown state in this case was caused by the errors during diffing, so there is definitely a problem in the code that ignores webhook mutations in server-side diffs. So far, though, this is a good workaround.

Including webhook mutations in diffs will probably cause some unwanted differences, depending on which webhooks run in your clusters. In our case, ignoring /metadata/generation for all resources did the trick:

      ignoreDifferences:
      - group: '*'
        jsonPointers:
        - /metadata/generation
        kind: '*'

@sstarcher

I'm seeing similar issues for kube-prometheus-stack


ComparisonError: Failed to compare desired state to live state: failed to perform pre-diff normalization: error building typed results: error creating typedConfig: .spec.containers[1].ports: element 0: associative list with keys has an element that omits key field "protocol" (and doesn't have default value)
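This variant fails on `ports`, a list whose entries are keyed by both `containerPort` and `protocol`. One commonly suggested workaround, which is not confirmed in this thread, is to set `protocol` explicitly on every port in the rendered manifest, for example through the chart's values if it exposes them. Below is a sketch of a partial pod spec; the container name, image and port are hypothetical.

```yaml
containers:
  - name: sidecar                  # hypothetical container
    image: example/sidecar:1.0     # hypothetical image
    ports:
      - name: http
        containerPort: 8080
        # Spell out the list-map key explicitly instead of relying on
        # the API server defaulting it to TCP.
        protocol: TCP
```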

@andrii-korotkov-verkada
Contributor

ArgoCD versions 2.10 and below have reached EOL. Can you upgrade and tell us if the issue is still present, please?

@andrii-korotkov-verkada andrii-korotkov-verkada added the version:EOL Latest confirmed affected version has reached EOL label Nov 11, 2024
@sstarcher

Still an issue in 2.12.6

@andrii-korotkov-verkada andrii-korotkov-verkada added version:2.12 Latest confirmed affected version is 2.12 and removed version:EOL Latest confirmed affected version has reached EOL labels Nov 11, 2024
@zswanson

zswanson commented Dec 2, 2024

Also observed in 2.13
(the issue should probably be renamed)

@andrii-korotkov-verkada andrii-korotkov-verkada added version:2.13 Latest confirmed affected version is 2.13 and removed type:bug version:2.12 Latest confirmed affected version is 2.12 labels Dec 2, 2024
@thecosmicfrog

thecosmicfrog commented Dec 3, 2024

People experiencing this issue may want to follow my issue here, where @andrii-korotkov-verkada has been doing some digging into Kubernetes structured-merge-diff: #20792

I'm thinking we're all seeing roughly the same symptoms?

@cwstrommer

@thecosmicfrog I'm inclined to agree that you're seeing the same issue from a different perspective. The same goes for mine, #19994, where the block is renamed instead of removed, with the result that the block is duplicated. I focused on the steps needed to trigger this consistently, but I agree that @andrii-korotkov-verkada has likely pinpointed the root cause.

@thecosmicfrog

Hi all. A fix has been added to the master branch of Argo CD that resolves the [...] omits key field "name" (and doesn't have default value) issue, at least for the issue I reported:

#20792 (comment)

I no longer get this error with server-side diff and server-side apply enabled. You may want to re-test.

Per @andrii-korotkov-verkada, you can build the latest image by cloning https://github.com/argoproj/argo-cd and running make image. I then re-tagged by running:

docker image tag argocd:latest <your-docker-repo>/argocd:issue-20792-fix

I then inserted this image into argo-helm by setting the following values:

[...]

global:
  image:
    repository: <your-docker-repo>/argocd
    tag: issue-20792-fix

  # If uploaded to a private registry, you might need to add this:
  imagePullSecrets:
    - name: <secret-name>

[...]

...and ran helm upgrade on my Argo CD release.

Hopefully saves people a few minutes looking for the relevant values 🙂
