ROX-16090: Start data plane CD on integration environment. #1190

Merged: 3 commits into main from idp-setup-fixes on Aug 24, 2023

Conversation

@porridge (Collaborator) commented Aug 4, 2023:

Description

This change also splits the IDP setup script to resolve some dependency loops between it and the terraform config.

Checklist (Definition of Done)

  • Unit and integration tests added
  • Added test description under Test manual
  • Documentation added if necessary (i.e. changes to dev setup, test execution, ...)
  • CI and all relevant tests are passing
  • Add the ticket number to the PR title if available, i.e. ROX-12345: ...
  • Discussed security and business related topics privately. Will move any security and business related topics that arise to private communication channel.
  • Add secret to app-interface Vault or Secrets Manager if necessary
  • RDS changes were e2e tested manually
  • Check AWS limits are reasonable for changes provisioning new resources

Test manual

CI should suffice.

@porridge temporarily deployed to development (×3) August 4, 2023 07:43 — with GitHub Actions Inactive
openshift-ci bot added the approved label Aug 4, 2023
@porridge temporarily deployed to development (×3) August 4, 2023 08:15 — with GitHub Actions Inactive
@ebensh self-requested a review August 4, 2023 08:49
@ebensh (Collaborator) left a comment:

I like the cleanup for sure :) Thank you for splitting it out.

Resolved review threads:
  • .github/workflows/deploy-integration.yaml
  • docs/development/setup-osd-cluster-idp.md (outdated)
  • dp-terraform/cd-robot-account-setup.sh (outdated)
  • dp-terraform/cd-robot-account-setup.sh
Review comments on the following hunk:

```
@@ -45,6 +45,21 @@ case $ENVIRONMENT in
SECURED_CLUSTER_ENABLED="true"
;;

integration)
FM_ENDPOINT="https://qj3layty4dynlnz.api.integration.openshift.com"
```
@porridge (Collaborator, Author):
I will once it's actually up.

Review comments on the following hunk:

```
integration)
FM_ENDPOINT="https://qj3layty4dynlnz.api.integration.openshift.com"
OBSERVABILITY_GITHUB_TAG="master"
OBSERVABILITY_OBSERVATORIUM_GATEWAY="https://observatorium-mst.api.stage.openshift.com"
```
@ebensh (Collaborator):
Will this be a problem using the Observatorium stage environment? Will Stage and Int environments clobber each other?

@porridge (Collaborator, Author) replied:
Excellent point!
Some time ago @stehessel mentioned we could use the stage observatorium for this environment, but I guess we should first disambiguate the metrics. Taking a quick look at the config, the tenant helm value looks tempting, but I doubt we can just set it to something like rhacs-int in this case without some external setup first?

I guess the same concern applies to the logging, cloudwatch and perhaps audit-logs subcharts?

@stehessel (Contributor) replied:
Prometheus metrics have clusterName as an external label, so I don't see an issue with using the stage observatorium tenant. The cloudwatch subchart should also be fine to re-use; you just need to adapt the environment and cluster name.

@porridge (Collaborator, Author) replied:
@stehessel can you please clarify what you mean by "just need to adapt the environment and cluster name"? Is this something I need to do beforehand? 😅

Also, what about the logging and audit-logs subcharts? Are you familiar with them?

@stehessel (Contributor) replied:
Cluster name and environment are set up via https://github.com/stackrox/acs-fleet-manager/blob/main/dp-terraform/helm/rhacs-terraform/terraform_cluster.sh#L18-L19. For logging and audit-logs I'd suggest taking a look at the current Helm values for stage. You need to adapt a few parameters from stage-dp-02 to integration-...
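
A minimal sketch of how the currently deployed stage values could be inspected, assuming the Helm release is called rhacs-terraform and lives in a namespace of the same name (both names are assumptions, not confirmed in this PR):

```bash
# Assumption: both the release and its namespace are named "rhacs-terraform".
# Show the user-supplied values of the deployed stage release, so anything
# tied to cluster name / environment / AWS account stands out.
helm get values rhacs-terraform --namespace rhacs-terraform

# Include computed chart defaults as well, for the complete picture.
helm get values rhacs-terraform --namespace rhacs-terraform --all
```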

@porridge (Collaborator, Author) replied:
I'm still at a loss as to where I should be looking for "the current Helm values for stage". AFAICT the Helm values are set dynamically by the shell script you pointed at, so as long as the deploy_clusters parameter in the GHA workflow file is set correctly (and it is), I don't need to change anything in the charts 🤔

I had a closer look and determined the following w.r.t. the child charts:

audit-logs

Seems to configure feeding k8s (?) audit logs into the AWS CloudWatch service in the configured account. Since integration uses a separate account, there should be no clash with staging.

BTW, helm is called with --set audit-logs.annotations.rhacs\\.redhat\\.com/cluster-name="${CLUSTER_NAME}", but it's not clear whether this affects the streamed logs in any way.

cloudwatch

Watches AWS resources and exports metrics for prometheus to scrape, so does not directly touch anything related to other environments.

logging

This one is a bit mysterious to me since the README assumes knowledge of OCP technologies that I'm unfamiliar with. Either way, the log forwarder output configuration has groupPrefix: {{ .Values.groupPrefix | quote }}, which we set as --set logging.groupPrefix="${CLUSTER_NAME}" when calling helm. So while it's not clear to me where these logs end up, I think this should ensure the results are not mixed with other clusters.

observability

As discussed above, the clusterId label is set based on --set observability.clusterName="${CLUSTER_NAME}".
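
Pulling the flags quoted above together, a sketch of the relevant part of the helm invocation; only the three --set flags are taken verbatim from this conversation, while the release name and chart path are assumptions for illustration:

```bash
# Sketch only: the real terraform_cluster.sh passes many more values.
# Release name and chart path are assumed; the --set flags are as quoted above.
helm upgrade --install rhacs-terraform ./dp-terraform/helm/rhacs-terraform \
  --set audit-logs.annotations.rhacs\\.redhat\\.com/cluster-name="${CLUSTER_NAME}" \
  --set logging.groupPrefix="${CLUSTER_NAME}" \
  --set observability.clusterName="${CLUSTER_NAME}"
```

Since every per-cluster identifier here is derived from CLUSTER_NAME, a distinctly named integration cluster should not collide with the stage ones.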

@porridge (Collaborator, Author):
@ebensh can you please approve?

@stehessel (Contributor) replied:
I meant that there is probably nobody who knows all the charts and their values. So you could look at the generated Helm values for stage, note anything that is tied to the cluster name / env / AWS account, and either confirm that it is dynamically generated or set it manually. Which, it looks like, is what you did, so all good from my POV.

> So while it's not clear to me where these logs end up, I think this should ensure the results are not mixed with other clusters.

The logs should end up in CloudWatch. In the secrets manager there should be AWS credentials for this.
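
A hedged way to check where the forwarded logs land, assuming the log forwarder creates CloudWatch log groups whose names start with the configured groupPrefix (i.e. the cluster name) and that credentials for the data-plane AWS account are available:

```bash
# Assumption: log group names are prefixed with the value of logging.groupPrefix,
# i.e. the cluster name. Requires AWS credentials for the data-plane account.
aws logs describe-log-groups \
  --log-group-name-prefix "${CLUSTER_NAME}" \
  --query 'logGroups[].logGroupName' \
  --output table
```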

Resolved review threads:
  • dp-terraform/helm/rhacs-terraform/terraform_cluster.sh (outdated)
  • dp-terraform/osd-cluster-idp-setup.sh
@porridge temporarily deployed to development (×3) August 7, 2023 06:48 — with GitHub Actions Inactive
openshift-ci bot added the lgtm label Aug 23, 2023
openshift-ci bot (Contributor) commented Aug 23, 2023:

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ebensh, porridge

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@porridge merged commit 47653e4 into main Aug 24, 2023
8 checks passed
@porridge deleted the idp-setup-fixes branch August 24, 2023 05:23