Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add network policy for RKE2 Ingress Nginx #4300

Open
wants to merge 2 commits into
base: dev-v2.10
Choose a base branch
from

Conversation

dharmit
Copy link
Contributor

@dharmit dharmit commented Jul 30, 2024

Issue:

Addresses issue raised in rancher/rancher#45603.

Problem

Prometheus target failing to come up

Solution

Using the modified chart in this PR enables the target to come up

Testing

  • Install rancher-monitoring-crd through this PR
  • Install rancher-monitoring using following Helm command:
    helm install rancher-monitoring charts/rancher-monitoring/105.0.0-rc1+up57.0.3/ \
    --namespace=cattle-monitoring-system \
    --set alertmanager.alertmanagerSpec.configSecret=alertmanager-rancher-monitoring-alertmanager \
    --set alertmanager.alertmanagerSpec.useExistingSecret=true \
    --set global.cattle.clusterId=local \
    --set global.cattle.clusterName=local \
    --set global.cattle.rkePathPrefix="" \
    --set global.cattle.rkeWindowsPathPrefix="" \
    --set global.cattle.systemDefaultRegistry="" \
    --set global.cattle.systemProjectId=p-cwh9d \
    --set global.cattle.url=https://13.202.137.42.sslip.io \
    --set global.systemDefaultRegistry="" \
    --set prometheus.prometheusSpec.evaluationInterval=1m \
    --set prometheus.prometheusSpec.retentionSize=50GiB \
    --set prometheus.prometheusSpec.scrapeInterval=1m \
    --set rke2ControllerManager.enabled=true \
    --set rke2Etcd.enabled=true \
    --set rke2IngressNginx.enabled=true \
    --set rke2IngressNginx.networkPolicy.enabled=true \
    --set rke2Proxy.enabled=true \
    --set rke2Scheduler.enabled=true
  • Go to Rancher UI; Monitoring; Prometheus Targets
  • In the Prometheus UI search for the ServiceMonitor serviceMonitor/kube-system/rancher-monitoring-ingress-nginx/0. It should be UP.

Engineering Testing

Manual Testing

Automated Testing

QA Testing Considerations

Regressions Considerations

Backporting considerations

@dharmit dharmit requested a review from a team as a code owner July 30, 2024 13:48
Copy link

Validation steps

  • Ensure all container images have repository and tag on the same level to ensure that all container images are included in rancher-images.txt which are used by airgap customers.
  Ex:-
    longhorn-controller:
      repository: rancher/hardened-sriov-cni
      tag: v2.6.3-build20230913
  
  • Add a 👍 (thumbs up) reaction to this comment once done. CI won't pass without this reaction to the github-action bot's latest validation comment.
  • Approve the PR to run the CI check.

@dharmit
Copy link
Contributor Author

dharmit commented Jul 30, 2024

I have honestly got no clue about why there are so many lines modified in this PR. Steps I followed are below:

$ export PACKAGE=rancher-monitoring/rancher-monitoring
$ make prepare
// made changes
$ make patch
$ make charts

Did I do something wrong here? 🤔

Copy link
Contributor

@alexandreLamarre alexandreLamarre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rancher monitoring version needs to be bumped to 104.1.1-rc1. Recommend separating make patch into a separate commit from make charts for exactly this scenario, where you would to do something like rebase with drop.

Need to remove packages/rancher-monitoring/rancher-monitoring/generated-changes/overlay/crds as a bug in charts-build-scripts includes these by accident.

Copy link

Validation steps

  • Ensure all container images have repository and tag on the same level to ensure that all container images are included in rancher-images.txt which are used by airgap customers.
  Ex:-
    longhorn-controller:
      repository: rancher/hardened-sriov-cni
      tag: v2.6.3-build20230913
  
  • Add a 👍 (thumbs up) reaction to this comment once done. CI won't pass without this reaction to the github-action bot's latest validation comment.
  • Approve the PR to run the CI check.

@dharmit
Copy link
Contributor Author

dharmit commented Jul 31, 2024

@joshmeranda @alexandreLamarre thanks for your feedback, guys! I have modified as per your recommendations. Can you PTAL again? Thanks.

Copy link
Contributor

@joshmeranda joshmeranda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking better, but git history could use some housekeeping. Can you update the commits to look closer to the steps outline here? Generally this will look like this:

  1. export PACKAGE=rancher-monitoring/rancher-monitoring
  2. make prepare
  3. Make the changes you need to the prepare chart, in this case adding the network policy
  4. make patch
  5. Commit the changes to packages/rancher-monitoring/rancher-monitoring/generated-changes (don't forget to remove generated-changes/overlay/crds introduced by the bug)
  6. Update the version in package.yaml
  7. Commit changes to package.yaml
  8. make charts
  9. Commit changes made by make charts
  10. Update and commit release.yaml with new version of chart

Following these steps will make it a good bit easier for the reviewer to see what was changed and why. Also make each commit smaller and prevents new chart versions from stalling our browsers when trying to load 100+ new files. Thanks!

@dharmit
Copy link
Contributor Author

dharmit commented Jul 31, 2024

@joshmeranda I referred to the Making Changes To Packages section which mentions making version related changes first. In context of your comment, I followed steps 6-9 first and 1-5 after that.

If I follow the steps you suggested, it'd make changes to 104.1.0. And, AFAIK, 104.1.0 has already released, right? Correct me if I'm wrong because there's a good chance I am. :)

@joshmeranda
Copy link
Contributor

joshmeranda commented Jul 31, 2024

@dharmit You're right that we don't want to make changes to 104.1.0 but since you won't run make charts until after you bump the the version in step 6/7 you shouldn't be updating any existing charts.

I keep forgetting that the docs recommend you update the version first. That's fine as long as its a separate commit. Generally I'll update package.yaml in its own commit since its easier to see the new version in a commit with 1 changed file rather than 100+ changes files, then run make charts at the end. This will still create the new charts at versions 104.1.1 since package.yaml will have the bumped version.

index.yaml Outdated
- name: Chart Source
url: https://github.com/prometheus-community/helm-charts
- name: Upstream Project
url: https://github.com/prometheus-operator/kube-prometheus
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's an extra space before url:. Strictly speaking it's not a yaml error, as this block is a string, but it looks like it's intended to be input for some other yaml process, and therefore should be conforming yaml code

Copy link
Contributor

@ericpromislow ericpromislow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something in this process is generating broken yaml.

@dharmit
Copy link
Contributor Author

dharmit commented Aug 2, 2024

I keep forgetting that the docs recommend you update the version first. That's fine as long as its a separate commit. Generally I'll update package.yaml in its own commit since its easier to see the new version in a commit with 1 changed file rather than 100+ changes files, then run make charts at the end. This will still create the new charts at versions 104.1.1 since package.yaml will have the bumped version.

I see your point. However, what I have done is again just as per the docs recommendation. It recommends updating the version in package.yaml, then doing make charts, and committing the changes.

I can totally do what you're recommending if that's the general practice followed by O&B team. Let me know.

@joshmeranda
Copy link
Contributor

joshmeranda commented Aug 2, 2024

@dharmit Yeah that's totally fine to do it that way, its just less familiar. It makes it easier to ensure that the changes you made were brought into the generated chart. Feel free to continue!

But you'll still want to have separate commits for updating package.yaml, building the new charts at that version, making the necessary changes, and rebuilding the charts. The commit history for something like this pr should probably look like this:

  1. Commit 1: bump rancher-monitoring/rancher-monitoring version to 104.1.1+up57.0.3
  2. Commit 2: make charts
  3. Commit 3: update release.yaml (can go anywhere as long as its updated`
  4. Commit 4: Add network policy for RKE2 Nginx Ingress
  5. Commit 5: 'make charts'

I get that this feels somewhat nitpicky, but because of the sheer amount of files that are updated in charts PRs it makes it much easier for reviewers to give a review.

Copy link

github-actions bot commented Aug 2, 2024

Validation steps

  • Ensure all container images have repository and tag on the same level to ensure that all container images are included in rancher-images.txt which are used by airgap customers.
  Ex:-
    longhorn-controller:
      repository: rancher/hardened-sriov-cni
      tag: v2.6.3-build20230913
  
  • Add a 👍 (thumbs up) reaction to this comment once done. CI won't pass without this reaction to the github-action bot's latest validation comment.
  • Approve the PR to run the CI check.

@dharmit
Copy link
Contributor Author

dharmit commented Aug 2, 2024

I get that this feels somewhat nitpicky, but because of the sheer amount of files that are updated in charts PRs it makes it much easier for reviewers to give a review.

No, it doesn't sound nitpicky at all, TBH. Thanks a tonne for laying down the steps. I'll update the PR accordingly. 👍🏽

Copy link

github-actions bot commented Aug 5, 2024

Validation steps

  • Ensure all container images have repository and tag on the same level to ensure that all container images are included in rancher-images.txt which are used by airgap customers.
  Ex:-
    longhorn-controller:
      repository: rancher/hardened-sriov-cni
      tag: v2.6.3-build20230913
  
  • Add a 👍 (thumbs up) reaction to this comment once done. CI won't pass without this reaction to the github-action bot's latest validation comment.
  • Approve the PR to run the CI check.

Copy link

github-actions bot commented Aug 5, 2024

Validation steps

  • Ensure all container images have repository and tag on the same level to ensure that all container images are included in rancher-images.txt which are used by airgap customers.
  Ex:-
    longhorn-controller:
      repository: rancher/hardened-sriov-cni
      tag: v2.6.3-build20230913
  
  • Add a 👍 (thumbs up) reaction to this comment once done. CI won't pass without this reaction to the github-action bot's latest validation comment.
  • Approve the PR to run the CI check.

Copy link

github-actions bot commented Aug 5, 2024

Validation steps

  • Ensure all container images have repository and tag on the same level to ensure that all container images are included in rancher-images.txt which are used by airgap customers.
  Ex:-
    longhorn-controller:
      repository: rancher/hardened-sriov-cni
      tag: v2.6.3-build20230913
  
  • Add a 👍 (thumbs up) reaction to this comment once done. CI won't pass without this reaction to the github-action bot's latest validation comment.
  • Approve the PR to run the CI check.

Copy link

github-actions bot commented Aug 5, 2024

Validation steps

  • Ensure all container images have repository and tag on the same level to ensure that all container images are included in rancher-images.txt which are used by airgap customers.
  Ex:-
    longhorn-controller:
      repository: rancher/hardened-sriov-cni
      tag: v2.6.3-build20230913
  
  • Add a 👍 (thumbs up) reaction to this comment once done. CI won't pass without this reaction to the github-action bot's latest validation comment.
  • Approve the PR to run the CI check.

Copy link

github-actions bot commented Aug 5, 2024

Validation steps

  • Ensure all container images have repository and tag on the same level to ensure that all container images are included in rancher-images.txt which are used by airgap customers.
  Ex:-
    longhorn-controller:
      repository: rancher/hardened-sriov-cni
      tag: v2.6.3-build20230913
  
  • Add a 👍 (thumbs up) reaction to this comment once done. CI won't pass without this reaction to the github-action bot's latest validation comment.
  • Approve the PR to run the CI check.

release.yaml Outdated
fleet:
- 104.0.1+up0.10.1-rc.1
fleet-agent:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should avoid adding references to charts that weren't updated in this PR. Remove the fleet-agent and rancher-vsphere-csi fields please. Otherwise LGTM!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, those changes are based on make check-release-yaml.

Copy link
Contributor

@joshmeranda joshmeranda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see release.yaml comments above

Copy link

github-actions bot commented Aug 9, 2024

Validation steps

  • Ensure all container images have repository and tag on the same level to ensure that all container images are included in rancher-images.txt which are used by airgap customers.
  Ex:-
    longhorn-controller:
      repository: rancher/hardened-sriov-cni
      tag: v2.6.3-build20230913
  
  • Add a 👍 (thumbs up) reaction to this comment once done. CI won't pass without this reaction to the github-action bot's latest validation comment.
  • Approve the PR to run the CI check.

@ericpromislow
Copy link
Contributor

I don't think you need my review here -- reviews from Josh and Alexandre are sufficient.

Copy link

Validation steps

  • Ensure all container images have repository and tag on the same level to ensure that all container images are included in rancher-images.txt which are used by airgap customers.
  Ex:-
    longhorn-controller:
      repository: rancher/hardened-sriov-cni
      tag: v2.6.3-build20230913
  
  • Add a 👍 (thumbs up) reaction to this comment once done. CI won't pass without this reaction to the github-action bot's latest validation comment.
  • Approve the PR to run the CI check.

@dharmit dharmit changed the base branch from dev-v2.9 to dev-v2.10 August 13, 2024 10:31
@dharmit
Copy link
Contributor Author

dharmit commented Aug 13, 2024

@joshmeranda @alexandreLamarre I have cherry-picked commits to be added on top of dev-v2.10 branch. Can you PTAL? I think the CI needs to be re-triggered to be run against 2.10 branch.

@mallardduck
Copy link
Member

Seeing this upon install:

W0813 13:11:43.161908 44 warnings.go:70] spec.template.spec.containers[2].ports[0]: duplicate port definition with spec.template.spec.containers[1].ports[0]

Copy link
Member

@mallardduck mallardduck left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not quite working out the box for me when I set to enabled. In the rancher installed apps section it thinks that it has network policies, but they 404 when I click their links.

annotations:
np.rke2.io/ingress: resolved
name: rke2-ingress-network-policy
namespace: {{ .Values.rke2IngressNginx.namespaceOverride }}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like this doesn't actually have a default value. Is there supposed to be some sort of helper or lookup with this as an override when set? Or is this required? If required, we should add it as an empty (null) value and put a note that it should be set or something.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's there in values.yaml.

rke2IngressNginx:
enabled: false
metricsPort: 10254
component: ingress-nginx
# in the RKE2 cluster, the ingress-nginx-controller is deployed
# as a non-hostNetwork workload starting at the following versions
# - >= v1.22.12+rke2r1 < 1.23.0-0
# - >= v1.23.9+rke2r1 < 1.24.0-0
# - >= v1.24.3+rke2r1 < 1.25.0-0
# - >= v1.25.0+rke2r1
# As a result we do not need clients and proxies as we can directly create
# a service that targets the workload with the given app name
namespaceOverride: kube-system

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I think looking at other similar situations to this one, what we actually want to do here is:

  1. define a new item in _helpers.tpl for this,
  2. Call that helper here to reference that instead,
  3. Allow the values.yaml to have the namespaceOverride value default be empty.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And in the helpers file, I think we would mimic something like:
https://github.com/dharmit/rancher-charts/blob/aa9a6b007735706c311588752309cda89139982f/charts/rancher-monitoring/104.1.1-rc1%2Bup57.0.3/templates/_helpers.tpl#L301-L310

Since we know the default value should generally be kube-system, similar to how the default there is kube-state-metrics.

@mallardduck mallardduck requested review from alexandreLamarre and removed request for alexandreLamarre August 13, 2024 13:28
@mallardduck
Copy link
Member

mallardduck commented Aug 13, 2024

I don't think you need my review here -- reviews from Josh and Alexandre are sufficient.

Screenshot 2024-08-13 at 9 35 29 AM

@ericpromislow - I think that because you left a "Requested Changes" review you must provide an "Approved" review before this PR will be passable. Ignore mine since I added them today intentionally for changes I suggested. But notice your's and alex...I think big picture we need a new PR even after mine and yours are approved since Alex can review currently.

@joshmeranda
Copy link
Contributor

Seeing this upon install:

W0813 13:11:43.161908 44 warnings.go:70] spec.template.spec.containers[2].ports[0]: duplicate port definition with spec.template.spec.containers[1].ports[0]

@mallardduck This one is coming from upstream: https://github.com/grafana/helm-charts/blob/main/charts/grafana/templates/_pod.tpl#L1016-L1021

Copy link

Validation steps

  • Ensure all container images have repository and tag on the same level to ensure that all container images are included in rancher-images.txt which are used by airgap customers.
  Ex:-
    longhorn-controller:
      repository: rancher/hardened-sriov-cni
      tag: v2.6.3-build20230913
  
  • Add a 👍 (thumbs up) reaction to this comment once done. CI won't pass without this reaction to the github-action bot's latest validation comment.
  • Approve the PR to run the CI check.

Copy link

Validation steps

  • Ensure all container images have repository and tag on the same level to ensure that all container images are included in rancher-images.txt which are used by airgap customers.
  Ex:-
    longhorn-controller:
      repository: rancher/hardened-sriov-cni
      tag: v2.6.3-build20230913
  
  • Add a 👍 (thumbs up) reaction to this comment once done. CI won't pass without this reaction to the github-action bot's latest validation comment.
  • Approve the PR to run the CI check.

release.yaml Outdated Show resolved Hide resolved
Copy link

Validation steps

  • Ensure all container images have repository and tag on the same level to ensure that all container images are included in rancher-images.txt which are used by airgap customers.
  Ex:-
    longhorn-controller:
      repository: rancher/hardened-sriov-cni
      tag: v2.6.3-build20230913
  
  • Add a 👍 (thumbs up) reaction to this comment once done. CI won't pass without this reaction to the github-action bot's latest validation comment.
  • Approve the PR to run the CI check.

@dharmit
Copy link
Contributor Author

dharmit commented Aug 16, 2024

Not quite working out the box for me when I set to enabled. In the rancher installed apps section it thinks that it has network policies, but they 404 when I click their links.

@mallardduck IIUC (which is based on description in rancher/rancher#45603), you have to open Prometheus Targets from the Monitoring view and check if serviceMonitor/kube-system/rancher-monitoring-ingress-nginx is up. I'm not sure where in the Installed Apps would you see a Network Policy because I see none on my RKE2 setup. Also, please set up an RKE2 with CIS enabled. Below are the configuration files I used:

$ cat /etc/rancher/rke2/config.yaml
cni: calico
write-kubeconfig-mode: "0644"
profile: cis
pod-security-admission-config-file: /etc/rancher/rke2/rke2-custom-pss.yaml

$ cat /etc/rancher/rke2/rke2-custom-pss.yaml
apiVersion: apiserver.config.k8s.io/v1
kind: AdmissionConfiguration
plugins:
- name: PodSecurity
  configuration:
    apiVersion: pod-security.admission.config.k8s.io/v1beta1
    kind: PodSecurityConfiguration
    defaults:
      enforce: "restricted"
      enforce-version: "latest"
      audit: "restricted"
      audit-version: "latest"
      warn: "restricted"
      warn-version: "latest"
    exemptions:
      usernames: []
      runtimeClasses: []
      namespaces: [kube-system, cis-operator-system, tigera-operator, cattle-system, cattle-fleet-system, cattle-fleet-local-system, cattle-monitoring-system]

Copy link

Validation steps

  • Ensure all container images have repository and tag on the same level to ensure that all container images are included in rancher-images.txt which are used by airgap customers.
  Ex:-
    longhorn-controller:
      repository: rancher/hardened-sriov-cni
      tag: v2.6.3-build20230913
  
  • Add a 👍 (thumbs up) reaction to this comment once done. CI won't pass without this reaction to the github-action bot's latest validation comment.
  • Approve the PR to run the CI check.

1 similar comment
Copy link

Validation steps

  • Ensure all container images have repository and tag on the same level to ensure that all container images are included in rancher-images.txt which are used by airgap customers.
  Ex:-
    longhorn-controller:
      repository: rancher/hardened-sriov-cni
      tag: v2.6.3-build20230913
  
  • Add a 👍 (thumbs up) reaction to this comment once done. CI won't pass without this reaction to the github-action bot's latest validation comment.
  • Approve the PR to run the CI check.

Signed-off-by: Dharmit Shah <dharmit.shah@suse.com>
Signed-off-by: Dharmit Shah <dharmit.shah@suse.com>
Copy link

github-actions bot commented Sep 2, 2024

Validation steps

  • Ensure all container images have repository and tag on the same level to ensure that all container images are included in rancher-images.txt which are used by airgap customers.
  Ex:-
    longhorn-controller:
      repository: rancher/hardened-sriov-cni
      tag: v2.6.3-build20230913
  
  • Add a 👍 (thumbs up) reaction to this comment once done. CI won't pass without this reaction to the github-action bot's latest validation comment.
  • Approve the PR to run the CI check.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants