Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

e2e: fix RTE lane #653

Merged
merged 6 commits into from
Jul 26, 2023
Merged

e2e: fix RTE lane #653

merged 6 commits into from
Jul 26, 2023

Conversation

ffromani
Copy link
Member

We previously failed to bail out early if the RTE lane was failing.
The runner script ultimately returns the exit code of the last executed command. Problem is: also the latest if block
is a command, and the evaluation of the command was successful, swallowing the RTE lane failure (if any).

The test per se seems to be failed because we incorrectly always check for the exporter binary availability. But the exporter binary is not meant to be here, in this lane: it is tested explicitely in the separate RTE local lane.

@ffromani ffromani requested a review from Tal-or July 14, 2023 12:05
@openshift-ci openshift-ci bot requested review from MarSik and mrniranjan July 14, 2023 12:05
@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 14, 2023
@ffromani ffromani added the cherry-pick-candidate Possible cherry-pick in the future label Jul 14, 2023
@ffromani
Copy link
Member Author

/cherry-pick release-4.13

@openshift-cherrypick-robot

@ffromani: once the present PR merges, I will cherry-pick it on top of release-4.13 in a new PR and assign it to you.

In response to this:

/cherry-pick release-4.13

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Copy link
Collaborator

@swatisehgal swatisehgal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jul 14, 2023
@ffromani
Copy link
Member Author

/retest

it seems infra issue?

@ffromani
Copy link
Member Author

ok, the failure is real, but the log is misleading. Will work on that.

@ffromani
Copy link
Member Author

actual error:

End Captured GinkgoWriter Output�[0m

  �[38;5;9merr=unable to upgrade connection: container not found ("resource-topology-exporter") stderr=
  Unexpected error:
      <*errors.errorString | 0xc0001145a0>: {
          s: "unable to upgrade connection: container not found (\"resource-topology-exporter\")",
      }
      unable to upgrade connection: container not found ("resource-topology-exporter")
  occurred�[0m
  �[38;5;9mIn �[1m[It]�[0m�[38;5;9m at: �[1m/go/src/github.com/openshift-kni/numaresources-operator/test/e2e/rte/rte_test.go:315

@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Jul 14, 2023
@ffromani ffromani mentioned this pull request Jul 15, 2023
@ffromani
Copy link
Member Author

/retest

hack/run-test-e2e.sh Outdated Show resolved Hide resolved
@ffromani ffromani force-pushed the fix-e2e-lane branch 2 times, most recently from da4055b to cf27ee7 Compare July 18, 2023 05:46
@Tal-or
Copy link
Collaborator

Tal-or commented Jul 18, 2023

   W0718 06:42:29.151312   25843 metrics_grabber.go:110] Can't find any pods in namespace kube-system to grab metrics from
    Jul 18 06:42:29.208: INFO: 
    Latency metrics for node ip-10-0-241-243.us-east-2.compute.internal
    Jul 18 06:42:29.208: INFO: Waiting up to 3m0s for all (but 0) nodes to be ready
    STEP: Destroying namespace "topology-updater-5473" for this suite. 07/18/23 06:42:29.226
  << End Captured GinkgoWriter Output
  Timed out after 60.004s.
  Expected
      <bool>: false
  to be true
  In [It] at: /go/src/github.com/openshift-kni/numaresources-operator/vendor/github.com/k8stopologyawareschedwg/resource-topology-exporter/test/e2e/topology_updater/topology_updater.go:100

seems like a real error

@ffromani
Copy link
Member Author

   W0718 06:42:29.151312   25843 metrics_grabber.go:110] Can't find any pods in namespace kube-system to grab metrics from
    Jul 18 06:42:29.208: INFO: 
    Latency metrics for node ip-10-0-241-243.us-east-2.compute.internal
    Jul 18 06:42:29.208: INFO: Waiting up to 3m0s for all (but 0) nodes to be ready
    STEP: Destroying namespace "topology-updater-5473" for this suite. 07/18/23 06:42:29.226
  << End Captured GinkgoWriter Output
  Timed out after 60.004s.
  Expected
      <bool>: false
  to be true
  In [It] at: /go/src/github.com/openshift-kni/numaresources-operator/vendor/github.com/k8stopologyawareschedwg/resource-topology-exporter/test/e2e/topology_updater/topology_updater.go:100

seems like a real error

yup, is a real error, previously masked by the silent failure. Working on it.

@ffromani
Copy link
Member Author

/hold

requires much more tuning

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 18, 2023
@ffromani
Copy link
Member Author

ffromani commented Jul 21, 2023

/hold

surely broken until RTE v0.12.0 is released and consumed

@ffromani
Copy link
Member Author

depends on #663

@ffromani
Copy link
Member Author

requires: openshift/release#41614

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jul 25, 2023
@ffromani ffromani force-pushed the fix-e2e-lane branch 3 times, most recently from d4cf414 to a4a2091 Compare July 25, 2023 12:45
@ffromani ffromani force-pushed the fix-e2e-lane branch 2 times, most recently from 921cdc9 to d257f8b Compare July 25, 2023 16:58
Only a well known subset of tests (RTE local) needs
the local exporter binary, so let's look for it and consume
it only for that subset instead for the whole suite.

Signed-off-by: Francesco Romani <fromani@redhat.com>
Focus on the subset of e2e tests u/s deems most stable.
We should increase this set gradually - upstream first.

Signed-off-by: Francesco Romani <fromani@redhat.com>
@ffromani ffromani changed the title WIP: e2e: fix RTE lane e2e: fix RTE lane Jul 25, 2023
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jul 25, 2023
@ffromani
Copy link
Member Author

ended up rewriting a good part of the lane, and there are a lot of potential followups, but fixed at last and reviewable.

@ffromani
Copy link
Member Author

/hold cancel

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 26, 2023
hack/run-test-e2e.sh Outdated Show resolved Hide resolved
hack/run-test-e2e.sh Outdated Show resolved Hide resolved
rte/hack/deploy-devices.sh Show resolved Hide resolved
tools/catkubeletconfmap/catkubeletconfmap.go Show resolved Hide resolved
hack/run-test-e2e.sh Show resolved Hide resolved
like u/s RTE does, needed to run the RTE tests

Signed-off-by: Francesco Romani <fromani@redhat.com>
add a new tool, to be used in CI mostly, to dump the
Topology Manager configuration from the configmap populated
by the kubeletconfig controller.

Could be useful also for troubleshooting.

Signed-off-by: Francesco Romani <fromani@redhat.com>
Use the newly added catkubeletconfmap tool to populate
the env variables before run the RTE CI lane, so we
can perform the full validation on RTE behavior.

Signed-off-by: Francesco Romani <fromani@redhat.com>
the e2e runner is one of the oldest parts still left

back in time we thought it could make sense to run the e2e against
a real cluster, but this option never materialized, and we ended up
using the runner only on CI.

When running against existing clusters, we run the specific suites
with extra parameters and config.

The end result is the runner script is unnecessary complex and actually
obfuscates the failure. Rewrite and simplify, tailoring for use on CI.

Signed-off-by: Francesco Romani <fromani@redhat.com>
@Tal-or
Copy link
Collaborator

Tal-or commented Jul 26, 2023

/lgtm
/approve
Thanks for the changes

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jul 26, 2023
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Jul 26, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ffromani, swatisehgal, Tal-or

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [Tal-or,ffromani,swatisehgal]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-robot openshift-merge-robot merged commit 47ff292 into main Jul 26, 2023
10 checks passed
@openshift-cherrypick-robot

@ffromani: #653 failed to apply on top of branch "release-4.13":

Applying: e2e: rte: check the binary path only in local test
Using index info to reconstruct a base tree...
M	test/e2e/rte/rte_e2e_suite_test.go
Falling back to patching base and 3-way merge...
Auto-merging test/e2e/rte/rte_e2e_suite_test.go
CONFLICT (content): Merge conflict in test/e2e/rte/rte_e2e_suite_test.go
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
Patch failed at 0001 e2e: rte: check the binary path only in local test
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".

In response to this:

/cherry-pick release-4.13

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cherry-pick-candidate Possible cherry-pick in the future lgtm Indicates that a PR is ready to be merged.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants