Bug 1970315: testPodSandboxCreation: skip sandbox errors for pods which were not deleted during network update #26208

Conversation

vrutkovs
Member

@vrutkovs vrutkovs commented Jun 8, 2021

The "pods should successfully create sandboxes" test should mark these pod events as flakes if the network is being updated.
During a network update the CNI binaries may themselves be in the middle of being updated, which can cause sandbox errors such as:

  • error adding container to network "ovn-kubernetes": failed to send CNI request: Post "http://dummy/": EOF
  • Multus: [openshift-dns/dns-default-nbkz2]: have you checked that your default network is ready? still waiting for readinessindicatorfile @ /var/run/multus/cni/net.d/10-ovn-kubernetes.conf. pollimmediate error: timed out waiting for the condition
  • error adding container to network "openshift-sdn": failed to find plugin "openshift-sdn" in path [/opt/multus/bin /var/lib/cni/bin /usr/libexec/cni]

"never deleted" search in 4.7 -> 4.8 upgrades for the last 7 days

If these events occur during a network or machine-config update and the sandboxes are eventually created (i.e. the pod is never deleted), they are marked as flakes.

Test runs:

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 8, 2021
@openshift-ci openshift-ci bot requested review from csrwng and mfojtik June 8, 2021 10:06
@vrutkovs
Member Author

vrutkovs commented Jun 8, 2021

/test e2e-aws-upgrade

@vrutkovs vrutkovs force-pushed the sandboxes-neverdeleted-network-update branch from 92d2deb to 124818d Compare June 8, 2021 14:31
@vrutkovs vrutkovs changed the title WIP testPodSandboxCreation: skip sandbox errors for pods which were not deleted during network update Bug 1970315: testPodSandboxCreation: skip sandbox errors for pods which were not deleted during network update Jun 10, 2021
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 10, 2021
@openshift-ci
Contributor

openshift-ci bot commented Jun 10, 2021

@vrutkovs: This pull request references Bugzilla bug 1970315, which is invalid:

  • expected the bug to target the "4.8.0" release, but it targets "---" instead

Comment /bugzilla refresh to re-evaluate validity if changes to the Bugzilla bug are made, or edit the title of this pull request to link to a different bug.

In response to this:

Bug 1970315: testPodSandboxCreation: skip sandbox errors for pods which were not deleted during network update

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci openshift-ci bot added bugzilla/severity-medium Referenced Bugzilla bug's severity is medium for the branch this PR is targeting. bugzilla/invalid-bug Indicates that a referenced Bugzilla bug is invalid for the branch this PR is targeting. labels Jun 10, 2021
@petr-muller
Member

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jun 10, 2021
@@ -44,16 +45,30 @@ func testPodSandboxCreation(events monitorapi.Intervals) []*ginkgo.JUnitTestCase
}
deletionTime := getPodDeletionTime(eventsForPods[event.Locator], event.Locator)
if deletionTime == nil {
// mark sandboxes errors as flakes if networking is being updated
// these pods eventually get created
operatorsProgressing := intervalcreation.IntervalsFromEvents_OperatorProgressing(events, event.From, event.To)
Contributor

Hrm, this could be O(N^2) on a pretty big N. Have you verified that IntervalsFromEvents uses binary search?

Member Author

It seems IntervalsFromEvents_OperatorProgressing is O(N).

Contributor

The list should be sorted, so if you know from and to, you can do a binary search (O(log n)) to find the start, and then the same for the end. Or maybe just do a single pass at the beginning to calculate all the intervals during which the operator is progressing (which should be a very small set), and then do the smaller loop here?

Contributor

(see intervals.go / monitor.go for a method that already uses sort.Search() to do this)

Member Author

Reworked this to use monitorapi functions:

  • CopyAndSort to create a copy of events and sort them by type
  • IntervalsFromEvents_OperatorProgressing to build a list of operator progressing events
  • sort.Search to find events for network/machine-config

@openshift-bot
Contributor

/bugzilla refresh

Recalculating validity in case the underlying Bugzilla bug has changed.

@openshift-ci
Contributor

openshift-ci bot commented Jun 11, 2021

@openshift-bot: This pull request references Bugzilla bug 1970315, which is invalid:

  • expected the bug to target the "4.8.0" release, but it targets "---" instead

Comment /bugzilla refresh to re-evaluate validity if changes to the Bugzilla bug are made, or edit the title of this pull request to link to a different bug.

In response to this:

/bugzilla refresh

Recalculating validity in case the underlying Bugzilla bug has changed.

@vrutkovs
Member Author

/bugzilla refresh

@openshift-ci
Contributor

openshift-ci bot commented Jun 11, 2021

@vrutkovs: This pull request references Bugzilla bug 1970315, which is invalid:

  • expected the bug to target the "4.8.0" release, but it targets "---" instead

Comment /bugzilla refresh to re-evaluate validity if changes to the Bugzilla bug are made, or edit the title of this pull request to link to a different bug.

In response to this:

/bugzilla refresh

@openshift-bot
Contributor

/bugzilla refresh

Recalculating validity in case the underlying Bugzilla bug has changed.

@openshift-ci
Contributor

openshift-ci bot commented Jun 12, 2021

@openshift-bot: This pull request references Bugzilla bug 1970315, which is invalid:

  • expected the bug to target the "4.8.0" release, but it targets "---" instead

Comment /bugzilla refresh to re-evaluate validity if changes to the Bugzilla bug are made, or edit the title of this pull request to link to a different bug.

In response to this:

/bugzilla refresh

Recalculating validity in case the underlying Bugzilla bug has changed.

@openshift-bot
Contributor

/bugzilla refresh

Recalculating validity in case the underlying Bugzilla bug has changed.

@openshift-ci
Contributor

openshift-ci bot commented Jun 13, 2021

@openshift-bot: This pull request references Bugzilla bug 1970315, which is invalid:

  • expected the bug to target the "4.8.0" release, but it targets "---" instead

Comment /bugzilla refresh to re-evaluate validity if changes to the Bugzilla bug are made, or edit the title of this pull request to link to a different bug.

In response to this:

/bugzilla refresh

Recalculating validity in case the underlying Bugzilla bug has changed.

@openshift-bot
Contributor

/bugzilla refresh

Recalculating validity in case the underlying Bugzilla bug has changed.

@openshift-ci
Contributor

openshift-ci bot commented Jun 14, 2021

@openshift-bot: This pull request references Bugzilla bug 1970315, which is invalid:

  • expected the bug to target the "4.8.0" release, but it targets "---" instead

Comment /bugzilla refresh to re-evaluate validity if changes to the Bugzilla bug are made, or edit the title of this pull request to link to a different bug.

In response to this:

/bugzilla refresh

Recalculating validity in case the underlying Bugzilla bug has changed.

@openshift-bot
Contributor

/bugzilla refresh

The main branch will open for development of next OCP version. Recalculating validity of PRs linked to this PR.

@openshift-ci
Contributor

openshift-ci bot commented Jun 14, 2021

@openshift-bot: This pull request references Bugzilla bug 1970315, which is invalid:

  • expected the bug to target the "4.9.0" release, but it targets "---" instead

Comment /bugzilla refresh to re-evaluate validity if changes to the Bugzilla bug are made, or edit the title of this pull request to link to a different bug.

In response to this:

/bugzilla refresh

The main branch will open for development of next OCP version. Recalculating validity of PRs linked to this PR.

@openshift-bot
Contributor

/bugzilla refresh

Recalculating validity in case the underlying Bugzilla bug has changed.

@openshift-ci openshift-ci bot added bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. and removed bugzilla/invalid-bug Indicates that a referenced Bugzilla bug is invalid for the branch this PR is targeting. labels Jun 15, 2021
@openshift-ci
Contributor

openshift-ci bot commented Jun 15, 2021

@openshift-bot: This pull request references Bugzilla bug 1970315, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target release (4.9.0) matches configured target release for branch (4.9.0)
  • bug is in the state NEW, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)

Requesting review from QA contact:
/cc @zhaozhanqi

In response to this:

/bugzilla refresh

Recalculating validity in case the underlying Bugzilla bug has changed.

@vrutkovs
Member Author

/retest

1 similar comment
@yselkowitz
Contributor

/retest

Contributor

@ravisantoshgudimetla ravisantoshgudimetla left a comment

/lgtm

Thank you for working on this @vrutkovs

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jun 29, 2021
@smarterclayton smarterclayton added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 29, 2021
@smarterclayton
Contributor

/retest

@openshift-ci
Contributor

openshift-ci bot commented Jun 29, 2021

[APPROVALNOTIFIER] This PR is APPROVED

Approval requirements bypassed by manually added approval.

This pull-request has been approved by: petr-muller, ravisantoshgudimetla, vrutkovs

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-bot
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

6 similar comments

@vrutkovs
Member Author

/test e2e-aws-upgrade

@vrutkovs
Member Author

/cherrypick release-4.8

@openshift-cherrypick-robot

@vrutkovs: once the present PR merges, I will cherry-pick it on top of release-4.8 in a new PR and assign it to you.

In response to this:

/cherrypick release-4.8

@vrutkovs
Member Author

/skip

e2e-aws-upgrade failing due to openshift/release#19836

@vrutkovs
Member Author

/retest

@vrutkovs
Member Author

/test e2e-metal-ipi-ovn-ipv6

1 similar comment

@openshift-bot
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-ci
Contributor

openshift-ci bot commented Jun 30, 2021

@vrutkovs: The following tests failed, say /retest to rerun all failed tests:

Test name | Commit | Details | Rerun command
ci/prow/e2e-gcp-disruptive | 124818d | link | /test e2e-gcp-disruptive
ci/prow/e2e-aws-disruptive | 124818d | link | /test e2e-aws-disruptive
ci/prow/e2e-aws-upgrade | 54660ee | link | /test e2e-aws-upgrade

Full PR test history. Your PR dashboard.

@openshift-bot
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

1 similar comment

@openshift-merge-robot openshift-merge-robot merged commit 7a95251 into openshift:master Jul 1, 2021
@openshift-ci
Contributor

openshift-ci bot commented Jul 1, 2021

@vrutkovs: All pull requests linked via external trackers have merged:

Bugzilla bug 1970315 has been moved to the MODIFIED state.

In response to this:

Bug 1970315: testPodSandboxCreation: skip sandbox errors for pods which were not deleted during network update

@openshift-cherrypick-robot

@vrutkovs: new pull request created: #26297

In response to this:

/cherrypick release-4.8

Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. bugzilla/severity-medium Referenced Bugzilla bug's severity is medium for the branch this PR is targeting. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. lgtm Indicates that a PR is ready to be merged.
8 participants