Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

(Fix) Updating helm charts to support version 3 windows integration #754

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

RagalahariP
Copy link

@RagalahariP RagalahariP commented May 18, 2023

Updating helm charts to support version 3 windows integration

Testing : Tested in local EKS cluster with the following node config

kubectl get nodes -o wide
NAME                          STATUS   ROLES    AGE     VERSION                INTERNAL-IP   EXTERNAL-IP      OS-IMAGE                         KERNEL-VERSION                 CONTAINER-RUNTIME
ip-10-0-11-231.ec2.internal   Ready    <none>   3d21h   v1.23.17-eks-a59e1f0   10.0.11.231   18.208.247.135   Windows Server 2022 Datacenter   10.0.20348.1668                docker://20.10.21
ip-10-0-110-39.ec2.internal   Ready    <none>   22d     v1.23.17-eks-a59e1f0   10.0.110.39   <none>           Windows Server 2019 Datacenter   10.0.17763.4252                docker://20.10.21
ip-10-0-27-190.ec2.internal   Ready    <none>   22d     v1.23.17-eks-a59e1f0   10.0.27.190   3.95.250.93      Amazon Linux 2                   5.4.238-148.347.amzn2.x86_64   docker://20.10.17

Dashboard: https://onenr.io/0BQ18La75jx

Original PR (with comments): #744

@RagalahariP RagalahariP requested a review from a team as a code owner May 18, 2023 07:20
enableWindows: false
# -- List of Windows versions present in the cluster. Our Kubernetes integration supported Windows versions LTSC 2019 (1809), 20H2, and LTSC 2022
windowsOsList: []
# - version: 2019
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it intended for those to be commented out?

Copy link
Author

@RagalahariP RagalahariP May 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nrepai Yes, WindowsOsList is required if enableWindows is set to True.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nrepai I have updated comment.

@RagalahariP RagalahariP requested a review from nrepai May 22, 2023 18:08
Copy link
Contributor

@juanjjaramillo juanjjaramillo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. My only concern is that in the description it seems you only tested using Kubernetes v1.23:

kubectl get nodes -o wide
NAME                          STATUS   ROLES    AGE     VERSION                INTERNAL-IP   EXTERNAL-IP      OS-IMAGE                         KERNEL-VERSION                 CONTAINER-RUNTIME
ip-10-0-11-231.ec2.internal   Ready    <none>   3d21h   v1.23.17-eks-a59e1f0   10.0.11.231   18.208.247.135   Windows Server 2022 Datacenter   10.0.20348.1668                docker://20.10.21
ip-10-0-110-39.ec2.internal   Ready    <none>   22d     v1.23.17-eks-a59e1f0   10.0.110.39   <none>           Windows Server 2019 Datacenter   10.0.17763.4252                docker://20.10.21
ip-10-0-27-190.ec2.internal   Ready    <none>   22d     v1.23.17-eks-a59e1f0   10.0.27.190   3.95.250.93      Amazon Linux 2                   5.4.238-148.347.amzn2.x86_64   docker://20.10.17

This version is no longer supported. If possible, can we test using a v1.26 cluster?

RagalahariP added 2 commits May 25, 2023 17:00
…inux nodes. And collecting metrics from Windows nodes will continue the same approach in V2
@juanjjaramillo
Copy link
Contributor

The added selectors look good to me and match others that we have included already. Anyone else has a different opinion?

@@ -161,13 +161,15 @@ integrations that you have configured.
| kubelet.config.retries | int | `3` | Number of retries after timeout expired |
| kubelet.config.scraperMaxReruns | int | `4` | Max number of scraper rerun when scraper runtime error happens |
| kubelet.config.timeout | string | `"10s"` | Timeout for the kubelet APIs contacted by the integration |
| kubelet.enableWindows | bool | `false` | Enable Windows monitoring and update `windowsOsList` with the list of Windows versions present in the cluster. |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make sure this readme is auto-generated by helm-docs, otherwise, it will be overwritten

@@ -18,6 +18,7 @@ rules:
- "services"
- "nodes"
- "namespaces"
- "pods" # Needed for our Windows integration to be able to access KSM on a Linux node.
Copy link
Contributor

@xqi-nr xqi-nr Jun 1, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@RagalahariP From v2 to v3, the pods resource is removed for linux support for some reason as you mentioned during the meeting. The reason may be reducing the security surface.

What I mean during the meeting is if we can adding it back for windows only, that will be better .

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@xqi-nr after you left the meeting we agreed that this comment is a good compromise. @RagalahariP will contact @paologallinaharbur to request his review of this line change since he was involved in the project to upgrade V2 to V3 and may have historical context to validate this change

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello! Yes, I'll take a look!

In general, we have never added support for Windows due to the lack of a canary environment, tooling, and proper pipelines to build the images.
Consider to specify clearly that this is experimental and might change in the future.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By the way, answering this particular point, it was a permission not needed anymore and therefore it was dropped.
You can reintroduce it without any issues, however if you want to ask for as few permission as possible you can add this specific line behind a "if windows.enabled=true" instead of a comment

Copy link
Member

@paologallinaharbur paologallinaharbur left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was expecting a different approach.
If merged you will end-up in a mixed scenario with both V2 a V3 images running (V2 on Windows and V3 on Linux).

This could lead to multiple issues if not well configured, for example, KSM could be scraped form two different places (from a linux and a Windows node).

I would have expected to see V3 Windows images

nodeSelector:
kubernetes.io/os: linux
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are we sure that kubernetes.io/os: linux is present on all user nodes?

serviceAccountName: {{ include "newrelic.common.serviceAccount.name" $ }}
containers:
- name: agent
image: newrelic/infrastructure-k8s:{{ .imageTag }}
Copy link
Member

@paologallinaharbur paologallinaharbur Jun 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a V2 image, why not installing the nri-bundle-v2 next to the V3 then? In this way we are mixing them up in the same chart!

@@ -161,13 +161,15 @@ integrations that you have configured.
| kubelet.config.retries | int | `3` | Number of retries after timeout expired |
| kubelet.config.scraperMaxReruns | int | `4` | Max number of scraper rerun when scraper runtime error happens |
| kubelet.config.timeout | string | `"10s"` | Timeout for the kubelet APIs contacted by the integration |
| kubelet.enableWindows | bool | `false` | Enable Windows monitoring and update `windowsOsList` with the list of Windows versions present in the cluster. |
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are adding the V2 image under the Kubelet component. However, the V2 is a bundle of kubelet+KSM+controlPlane component

| kubelet.enabled | bool | `true` | Enable kubelet monitoring. Advanced users only. Setting this to `false` is not supported and will break the New Relic experience. |
| kubelet.extraEnv | list | `[]` | Add user environment variables to the agent |
| kubelet.extraEnvFrom | list | `[]` | Add user environment from configMaps or secrets as variables to the agent |
| kubelet.extraVolumeMounts | list | `[]` | Defines where to mount volumes specified with `extraVolumes` |
| kubelet.extraVolumes | list | `[]` | Volumes to mount in the containers |
| kubelet.hostNetwork | bool | Not set | Sets pod's hostNetwork. When set bypasses global/common variable |
| kubelet.tolerations | list | Schedules in all tainted nodes | Tolerations for the control plane DaemonSet. |
| kubelet.windowsOsList | list | `[]` | `windowsOsList` is only required if `enableWindows` is true. It contains list of Windows versions present in the cluster. Our Kubernetes integration supported Windows versions LTSC 2019 (1809), 20H2, and LTSC 2022 |
Copy link
Member

@paologallinaharbur paologallinaharbur Jun 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Our Kubernetes integration supported Windows versions"

I would always specify that it is alpha

template:
metadata:
annotations:
checksum/nri-kubernetes: {{ include (print $.Template.BasePath "/kubelet/scraper-configmap.yaml") $ | sha256sum }}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nri-kubernetes in V2 branch does not read any configuration, that was introduced in V3.

I wonder if you actually compiled the V3 branch and pushed it to the old repo newrelic/infrastructure-k8s, is that the case?

There are some discrepancies:

  • kubelet/scraper-configmap.yaml is V3
  • NRIA_PASSTHROUGH_ENVIRONMENT is V2
  • app.kubernetes.io/component: kubelet is V3
  • having just one container is definitely V2

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how a user could change the config mounted? or set a proxy?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I may not understand the approach. However, this seems a V2 component being merged into the V3 chart.

The image used is newrelic/infrastructure-k8s, the is just one container, and the nri-kubernetes process seems controlled by the agent. Maybe this is the fastest way to get Windows support, but there is a big risk of tech dept

Copy link
Contributor

@juanjjaramillo juanjjaramillo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@paologallinaharbur has raised valid and serious issues that need to be addressed before we can merge the PR. I'm just marking the PR such that we do not merge it by mistake before his comments are fully addressed

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants