MON-3934: Clean up and parallelize some e2e tests for faster runs #2397

Open
wants to merge 1 commit into
base: master

@machine424 machine424 commented Jun 27, 2024

MON-3934: Parallelize some readonly/non disruptive e2e tests for faster runs

Isolate specific tests to enable parallel execution.

Enhance the resilience of some tests and fix those prone to errors.

Fix some tests that were running the wrong checks.

Make some of the tests idempotent so they are easier to debug and run locally.
  • I added CHANGELOG entry for this change.
  • No user facing changes, so no entry in CHANGELOG was needed.

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 27, 2024

openshift-ci bot commented Jun 27, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: machine424

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 27, 2024
@machine424 machine424 changed the title from "WIP (REVIEW NOT NEEDED): Parallel e2e" to "WIP (REVIEW NOT NEEDED): make some e2e tests run in parallel to save some time" Jun 27, 2024
@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 1, 2024
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 10, 2024
@machine424 machine424 changed the title from "WIP (REVIEW NOT NEEDED): make some e2e tests run in parallel to save some time" to "MON-3934: Parallelize some readonly/non disruptive e2e tests for faster runs" Jul 10, 2024
@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Jul 10, 2024

openshift-ci-robot commented Jul 10, 2024

@machine424: This pull request references MON-3934 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "4.17.0" version, but no target version was set.

In response to this:

  • I added CHANGELOG entry for this change.
  • No user facing changes, so no entry in CHANGELOG was needed.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jul 10, 2024
@machine424

/retest

@machine424 machine424 changed the title from "MON-3934: Parallelize some readonly/non disruptive e2e tests for faster runs" to "MON-3934: Clean up and parallelize some e2e tests for faster runs" Jul 12, 2024
@machine424 machine424 force-pushed the parallel-e2e branch 4 times, most recently from 3871d3d to f010c91 Compare August 2, 2024 07:28
@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 2, 2024
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 2, 2024

openshift-ci-robot commented Aug 2, 2024

@machine424: This pull request references MON-3934 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "4.17.0" version, but no target version was set.

In response to this:

  • I added CHANGELOG entry for this change.
  • No user facing changes, so no entry in CHANGELOG was needed.


@@ -747,51 +752,6 @@ func TestAlertmanagerDisabling(t *testing.T) {
})
}

func TestAlertManagerHasAdditionalAlertRelabelConfigs(t *testing.T) {
Contributor Author

Moved to TestMultinamespacePrometheusRule, where we're sure to get an alert with the appropriate labels.

err = framework.Poll(5*time.Second, time.Minute, func() error {
// The uwm alerts port (9095) is only exposed in-cluster, so we need to use
// port forwarding to access kube-rbac-proxy.
host, cleanUp, err := f.ForwardPort(t, f.UserWorkloadMonitoringNs, "alertmanager-user-workload", 9095)
Contributor Author

@machine424 machine424 Aug 2, 2024

Moved inside the poll for better resiliency.

}
func assertCMOImageRegistryIsUsed(t *testing.T, ns string) {
getRegistry := func(t *testing.T, image string) string {
// This first attempt is needed; otherwise, we may blindly add a second scheme,
Contributor Author

Makes the test less error-prone.
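
The truncated code comment above hints at a "parse first, add a scheme only if needed" approach to extracting the registry from an image reference. The helper's real body is not shown in this thread, so the following is only a sketch under that assumption; `registryOf` and the `dummy://` placeholder scheme are hypothetical names, and only `net/url` is a real API here.

```go
package main

import (
	"fmt"
	"net/url"
)

// registryOf returns the registry host of an image reference.
// Hypothetical helper: it is not the function from this PR.
func registryOf(image string) (string, error) {
	// First attempt: the reference may already carry a scheme
	// (for example "https://..."), in which case Host is populated.
	if u, err := url.Parse(image); err == nil && u.Host != "" {
		return u.Host, nil
	}
	// Otherwise prepend a placeholder scheme so that net/url treats
	// the leading segment as the host instead of a path or scheme.
	u, err := url.Parse("dummy://" + image)
	if err != nil {
		return "", fmt.Errorf("parsing image %q: %w", image, err)
	}
	if u.Host == "" {
		return "", fmt.Errorf("no registry found in image %q", image)
	}
	return u.Host, nil
}
```

Skipping the first attempt would turn a reference that already has a scheme into something like `dummy://https://...`, which is presumably the "second scheme" problem the comment warns about. References without any registry part (such as a bare `name:tag`) are not handled by this sketch and return an error.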

t.Cleanup(cleanUp)
func verifyAlertmanagerReceivedAlerts(t *testing.T, namespace, svc string) {
err := framework.Poll(time.Second, 5*time.Minute, func() error {
host, cleanUp, err := f.ForwardPort(t, namespace, svc, 9093)
Contributor Author

Moved inside the poll for better resiliency.

}
}

func TestUserWorkloadMonitoringAlerting(t *testing.T) {
Contributor Author

Merged into TestUserWorkloadMonitoring so the tests share the same setup and can run in parallel.

}
}

func TestUserWorkloadMonitoringOptOut(t *testing.T) {
Contributor Author

Merged into TestUserWorkloadMonitoring so the tests share the same setup and can run in parallel.

This check now runs against a standalone namespace, userWorkloadOptOutTestNs, so it can run in parallel.
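
To illustrate the pattern described in these comments (one shared setup, with read-only checks running in parallel against dedicated namespaces), here is a minimal sketch. The `fakeCluster` type, the namespace names, and the list of checks are hypothetical stand-ins and not the repository's code; only `t.Run` and `t.Parallel` are real `testing` APIs.

```go
package main

import (
	"fmt"
	"testing"
)

// fakeCluster stands in for the state a real e2e setup would create once.
type fakeCluster struct {
	namespaces map[string]bool
}

// setup creates one namespace per concern so that parallel checks
// never share mutable state.
func setup(names ...string) *fakeCluster {
	c := &fakeCluster{namespaces: map[string]bool{}}
	for _, n := range names {
		c.namespaces[n] = true
	}
	return c
}

// check is a read-only assertion against the shared setup.
type check struct {
	name string
	run  func(c *fakeCluster) error
}

// namespaceExists returns a check body that only reads from the cluster.
func namespaceExists(ns string) func(*fakeCluster) error {
	return func(c *fakeCluster) error {
		if !c.namespaces[ns] {
			return fmt.Errorf("namespace %q not found", ns)
		}
		return nil
	}
}

// readOnlyChecks lists the checks that are safe to run concurrently.
var readOnlyChecks = []check{
	{name: "alerting", run: namespaceExists("user-workload-test")},
	{name: "opt-out", run: namespaceExists("user-workload-optout-test")},
}

func TestUserWorkloadMonitoring(t *testing.T) {
	c := setup("user-workload-test", "user-workload-optout-test")

	// The wrapping t.Run returns only after all of its parallel
	// subtests have finished, so teardown can safely follow it.
	t.Run("checks", func(t *testing.T) {
		for _, tc := range readOnlyChecks {
			tc := tc // capture the loop variable for older Go versions
			t.Run(tc.name, func(t *testing.T) {
				t.Parallel()
				if err := tc.run(c); err != nil {
					t.Fatal(err)
				}
			})
		}
	})
	// Shared teardown would go here.
}
```

Giving the opt-out check its own namespace is what makes `t.Parallel()` safe in this sketch: no check mutates state that another one reads.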

}
}

func TestUserWorkloadMonitoringGrpcSecrets(t *testing.T) {
Contributor Author

Merged into TestUserWorkloadMonitoring so the tests share the same setup and can run in parallel.

@@ -436,6 +380,7 @@ func assertUserWorkloadMetrics(t *testing.T) {
err = framework.Poll(5*time.Second, 5*time.Minute, func() error {
body, err := f.AlertmanagerClient.GetAlertmanagerAlerts(
"filter", `alertname="VersionAlert"`,
"filter", fmt.Sprintf(`namespace="%s"`, userWorkloadTestNs),
Contributor Author

We now create two namespaces; see setupUserApplications.
This also makes the test better isolated and less error-prone.

f.ThanosQuerierClient.WaitForQueryReturn(
-t, time.Minute, fmt.Sprintf(`count(up{service="%s",namespace="openshift-user-workload-monitoring"} == 1)`, service),
+t, 5*time.Minute, fmt.Sprintf(`count(up{service="%s",namespace="openshift-user-workload-monitoring"} == 1)`, service),
Contributor Author

These tests now run in parallel, so some checks that used to run at the end may need more time.

…er runs

Isolate specific tests to enable parallel execution

Enhance the resilience of some tests and fix those prone to errors.

openshift-ci-robot commented Aug 2, 2024

@machine424: This pull request references MON-3934 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "4.17.0" version, but no target version was set.

In response to this:

MON-3934: Parallelize some readonly/non disruptive e2e tests for faster runs

Isolate specific tests to enable parallel execution

Enhance the resilience of some tests and fix those prone to errors.

  • I added CHANGELOG entry for this change.
  • No user facing changes, so no entry in CHANGELOG was needed.


@@ -1488,7 +1488,7 @@ func (c *Client) CreateOrUpdateConfigMap(ctx context.Context, cm *v1.ConfigMap)
return err
}

-func (c *Client) DeleteIfExists(ctx context.Context, nsName string) error {
+func (c *Client) DeleteNSIfExists(ctx context.Context, nsName string) error {
Contributor Author

Yes, this is not related to the PR :)

Contributor

(nit) the "IfExists" is implicit IMHO. Also the function could be moved to the framework package since it's not invoked by the operator.

Suggested change
-func (c *Client) DeleteNSIfExists(ctx context.Context, nsName string) error {
+func (c *Client) DeleteNamespace(ctx context.Context, nsName string) error {


openshift-ci-robot commented Aug 2, 2024

@machine424: This pull request references MON-3934 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "4.17.0" version, but no target version was set.

In response to this:

MON-3934: Parallelize some readonly/non disruptive e2e tests for faster runs

Isolate specific tests to enable parallel execution

Enhance the resilience of some tests and fix those prone to errors.

Fix some tests that were running wrong checks.

Make some the tests idempotent to be easily debugged and run locally

  • I added CHANGELOG entry for this change.
  • No user facing changes, so no entry in CHANGELOG was needed.


@@ -109,15 +119,14 @@ func createPrometheusRule(t *testing.T) {
}
}

func verifyAlertmanagerAlertReceived(t *testing.T) {

host, cleanUp, err := f.ForwardPort(t, f.Ns, "alertmanager-operated", 9093)
Contributor Author

This was requesting the default AM instead of the additional one.


openshift-ci bot commented Aug 2, 2024

@machine424: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name | Commit | Details | Required | Rerun command
ci/prow/e2e-aws-ovn-single-node | a1cd4c2 | link | false | /test e2e-aws-ovn-single-node

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@@ -1488,7 +1488,7 @@ func (c *Client) CreateOrUpdateConfigMap(ctx context.Context, cm *v1.ConfigMap)
return err
}

-func (c *Client) DeleteIfExists(ctx context.Context, nsName string) error {
+func (c *Client) DeleteNSIfExists(ctx context.Context, nsName string) error {
Contributor

(nit) the "IfExists" is implicit IMHO. Also the function could be moved to the framework package since it's not invoked by the operator.

Suggested change
-func (c *Client) DeleteNSIfExists(ctx context.Context, nsName string) error {
+func (c *Client) DeleteNamespace(ctx context.Context, nsName string) error {

Contributor

I'd keep this change for another PR if you don't mind.

if err != nil {
return err
}
t.Cleanup(cleanUp)
Contributor

This might leave many port forwards open if there are retries?
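
The concern can be made concrete with a small sketch. A callback registered through `t.Cleanup` runs only when the test finishes, so registering one per attempt inside a retry loop keeps every forward open until then. Releasing the forward at the end of each attempt avoids that. The `poll`, `forwarder`, and `queryWithRetries` names below are hypothetical stand-ins, not the framework's `Poll` or `ForwardPort`.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// poll retries fn every interval until it succeeds or timeout elapses.
// Hypothetical stand-in for a helper such as framework.Poll.
func poll(interval, timeout time.Duration, fn func() error) error {
	deadline := time.Now().Add(timeout)
	for {
		err := fn()
		if err == nil {
			return nil
		}
		if time.Now().Add(interval).After(deadline) {
			return fmt.Errorf("timed out: %w", err)
		}
		time.Sleep(interval)
	}
}

// forwarder counts open port forwards so that leaks are observable.
type forwarder struct{ open int }

// forward pretends to open a port forward and returns its address
// together with a function that closes it.
func (f *forwarder) forward() (string, func()) {
	f.open++
	return "127.0.0.1:9093", func() { f.open-- }
}

// queryWithRetries opens a fresh forward on every attempt and closes
// it before the next one starts. The first `failures` attempts fail.
func queryWithRetries(f *forwarder, failures int) error {
	attempt := 0
	return poll(time.Millisecond, time.Second, func() error {
		_, cleanUp := f.forward()
		defer cleanUp() // released at the end of this attempt, not at the end of the test
		attempt++
		if attempt <= failures {
			return errors.New("alert not received yet")
		}
		return nil
	})
}
```

With the `defer` inside the closure, at most one forward is open at any time, regardless of how many retries are needed.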

Contributor

The change here is huge. Can you split it to another PR?
