ROX-16090: Start data plane CD on integration environment. #1190

Merged: 3 commits into main from idp-setup-fixes on Aug 24, 2023

Conversation

@porridge (Collaborator) commented Aug 4, 2023:

Description

This change also splits the IDP setup script to resolve some dependency loops between it and the terraform config.

Checklist (Definition of Done)

  • Unit and integration tests added
  • Added test description under Test manual
  • Documentation added if necessary (i.e. changes to dev setup, test execution, ...)
  • CI and all relevant tests are passing
  • Add the ticket number to the PR title if available, i.e. ROX-12345: ...
  • Discussed security and business related topics privately. Will move any security and business related topics that arise to private communication channel.
  • Add secret to app-interface Vault or Secrets Manager if necessary
  • RDS changes were e2e tested manually
  • Check AWS limits are reasonable for changes provisioning new resources

Test manual

CI should suffice.

@porridge temporarily deployed to development (×3) August 4, 2023 07:43 — with GitHub Actions Inactive
openshift-ci bot added the approved label Aug 4, 2023
@porridge temporarily deployed to development (×3) August 4, 2023 08:15 — with GitHub Actions Inactive
@ebensh self-requested a review August 4, 2023 08:49
@ebensh (Collaborator) left a comment:

I like the cleanup for sure :) Thank you for splitting it out.

Resolved review threads:
  • .github/workflows/deploy-integration.yaml
  • docs/development/setup-osd-cluster-idp.md (outdated)
  • dp-terraform/cd-robot-account-setup.sh (outdated)
  • dp-terraform/cd-robot-account-setup.sh
Review comments on the following hunk:

```
@@ -45,6 +45,21 @@ case $ENVIRONMENT in
SECURED_CLUSTER_ENABLED="true"
;;

integration)
FM_ENDPOINT="https://qj3layty4dynlnz.api.integration.openshift.com"
```
@porridge (Collaborator, Author):
I will once it's actually up.

Review comments on the following hunk:

```
integration)
FM_ENDPOINT="https://qj3layty4dynlnz.api.integration.openshift.com"
OBSERVABILITY_GITHUB_TAG="master"
OBSERVABILITY_OBSERVATORIUM_GATEWAY="https://observatorium-mst.api.stage.openshift.com"
```
@ebensh (Collaborator):
Will this be a problem using the Observatorium stage environment? Will Stage and Int environments clobber each other?

@porridge (Collaborator, Author) replied:
Excellent point!
Some time ago @stehessel mentioned we could use the stage observatorium for this environment, but I guess we should first disambiguate the metrics. Taking a quick look at the config, the tenant helm value looks tempting, but I doubt we can just set it to something like rhacs-int in this case without some external setup first?

I guess the same concern applies to the logging, cloudwatch and perhaps audit-logs subcharts?

@stehessel (Contributor) replied:
Prometheus metrics have clusterName as an external label, so I don't see an issue with using the stage observatorium tenant. The cloudwatch subchart should also be fine to re-use; you just need to adapt the environment and cluster name.

@porridge (Collaborator, Author) replied:
@stehessel can you please clarify what you mean by "just need to adapt the environment and cluster name"? Is this something I need to do beforehand? 😅

Also, what about the logging and audit-logs subcharts? Are you familiar with them?

@stehessel (Contributor) replied:
Cluster name and environment are set up via https://github.com/stackrox/acs-fleet-manager/blob/main/dp-terraform/helm/rhacs-terraform/terraform_cluster.sh#L18-L19. For logging and audit-logs I'd suggest taking a look at the current Helm values for stage. You need to adapt a few parameters from stage-dp-02 to integration-...
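
A minimal sketch of how the currently deployed stage values could be inspected, assuming the Helm release is called rhacs-terraform and lives in a namespace of the same name (both names are assumptions, not confirmed in this PR):

```bash
# Assumption: both the release and its namespace are named "rhacs-terraform".
# Show the user-supplied values of the deployed stage release, so anything
# tied to cluster name / environment / AWS account stands out.
helm get values rhacs-terraform --namespace rhacs-terraform

# Include computed chart defaults as well, for the complete picture.
helm get values rhacs-terraform --namespace rhacs-terraform --all
```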

@porridge (Collaborator, Author) replied:
I'm still at a loss as to where I should be looking for "the current Helm values for stage". AFAICT the Helm values are set dynamically by the shell script you pointed at, so as long as the deploy_clusters parameter in the GHA workflow file is set correctly (and it is), I don't need to change anything in the charts 🤔

I had a closer look and determined the following w.r.t. the child charts:

audit-logs

Seems to configure feeding k8s (?) audit logs into the AWS CloudWatch service in the configured account. Since integration uses a separate account, there should be no clash with staging.

BTW, helm is called with --set audit-logs.annotations.rhacs\\.redhat\\.com/cluster-name="${CLUSTER_NAME}", but it's not clear whether this affects the streamed logs in any way.

cloudwatch

Watches AWS resources and exports metrics for prometheus to scrape, so does not directly touch anything related to other environments.

logging

This one is a bit mysterious to me since the README assumes knowledge of OCP technologies that I'm unfamiliar with. Either way, the log forwarder output configuration has groupPrefix: {{ .Values.groupPrefix | quote }}, which we set as --set logging.groupPrefix="${CLUSTER_NAME}" when calling helm. So while it's not clear to me where these logs end up, I think this should ensure the results are not mixed with other clusters.

observability

As discussed above, the clusterId label is set based on --set observability.clusterName="${CLUSTER_NAME}".
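
Pulling the flags quoted above together, a sketch of the relevant part of the helm invocation; only the three --set flags are taken verbatim from this conversation, while the release name and chart path are assumptions for illustration:

```bash
# Sketch only: the real terraform_cluster.sh passes many more values.
# Release name and chart path are assumed; the --set flags are as quoted above.
helm upgrade --install rhacs-terraform ./dp-terraform/helm/rhacs-terraform \
  --set audit-logs.annotations.rhacs\\.redhat\\.com/cluster-name="${CLUSTER_NAME}" \
  --set logging.groupPrefix="${CLUSTER_NAME}" \
  --set observability.clusterName="${CLUSTER_NAME}"
```

Since every per-cluster identifier here is derived from CLUSTER_NAME, a distinctly named integration cluster should not collide with the stage ones.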

@porridge (Collaborator, Author):
@ebensh can you please approve?

@stehessel (Contributor) replied:
I meant that there is probably nobody who knows all the charts and their values. So you could look at the generated Helm values for stage, note anything that is tied to the cluster name / env / AWS account, and either confirm that it is dynamically generated or set it manually. Which, it looks like, is what you did, so all good from my POV.

> So while it's not clear to me where these logs end up, I think this should ensure the results are not mixed with other clusters.

The logs should end up in CloudWatch. In the secrets manager there should be AWS credentials for this.
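
A hedged way to check where the forwarded logs land, assuming the log forwarder creates CloudWatch log groups whose names start with the configured groupPrefix (i.e. the cluster name) and that credentials for the data-plane AWS account are available:

```bash
# Assumption: log group names are prefixed with the value of logging.groupPrefix,
# i.e. the cluster name. Requires AWS credentials for the data-plane account.
aws logs describe-log-groups \
  --log-group-name-prefix "${CLUSTER_NAME}" \
  --query 'logGroups[].logGroupName' \
  --output table
```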

Resolved review threads:
  • dp-terraform/helm/rhacs-terraform/terraform_cluster.sh (outdated)
  • dp-terraform/osd-cluster-idp-setup.sh
@porridge temporarily deployed to development (×3) August 7, 2023 06:48 — with GitHub Actions Inactive
openshift-ci bot added the lgtm label Aug 23, 2023
openshift-ci bot (Contributor) commented Aug 23, 2023:

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ebensh, porridge

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@porridge merged commit 47653e4 into main Aug 24, 2023
8 checks passed
@porridge deleted the idp-setup-fixes branch August 24, 2023 05:23