Retry clone kill which is flaky #1950

awels · 2021-09-20T20:12:57Z

Signed-off-by: Alexander Wels awels@redhat.com

What this PR does / why we need it:
The test [test_id:4000] "Create a data volume and then clone it while killing the container and verify retry count" has been pretty flaky lately. The likely culprit is the small amount of data being cloned: the pods are removed before we have a chance to connect and issue the kill command.
Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #

Special notes for your reviewer:

Release note:

NONE

awels · 2021-09-20T20:13:14Z

/hold
going to retry the runs to ensure this really fixes the problem.

awels · 2021-09-21T17:18:35Z

/test all

brybacki

line 1319:
err = utils.WaitTimeoutForPodReady(f.K8sClient, utils.UploadPodName(targetPvc), targetNs.Name, utils.PodWaitForTime)

This wait has its polling interval set to 2 seconds, so in the worst case it can miss any action that finishes within those 2 seconds. I think that under good conditions our upload server can handle tinyCoreIso in milliseconds.

[test_id:4000] "Create a data volume and then clone it while killing the container and verify retry count" was pretty flaky lately. It was failing when attempting to connect to the upload server pod to kill it. This PR adds a retry.

Signed-off-by: Alexander Wels <awels@redhat.com>

awels · 2021-09-22T20:12:27Z

/retest

awels · 2021-09-23T00:53:13Z

/test all

brybacki · 2021-09-23T07:39:29Z

The hpp destructive lane has a problem in BeforeEach; it looks like it cannot find CDI:

s: "Unable to find pod containing cdi-deployment",
Other than that, looks good. Trying again.

/test all

brybacki · 2021-09-23T11:01:00Z

/retest

awels · 2021-09-23T13:25:44Z

/test all

awels · 2021-09-23T13:26:51Z

Yeah, the destructive lane has a flake where it sometimes messes up the CDI object, which is one of the main reasons we put it in a separate lane to begin with. I am mainly interested in seeing several runs in which the clone test that is currently failing often does not fail.

awels · 2021-09-23T17:55:51Z

/test pull-containerized-data-importer-e2e-k8s-1.19-ceph

awels · 2021-09-23T21:40:13Z

/hold cancel
/test all
/approve

kubevirt-bot · 2021-09-23T21:40:21Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: awels

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [awels]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

brybacki · 2021-09-24T08:08:58Z

/lgtm

brybacki · 2021-09-24T08:09:54Z

/retest

kubevirt-bot added release-note-none Denotes a PR that doesn't merit a release note. dco-signoff: yes Indicates the PR's author has DCO signed all their commits. labels Sep 20, 2021

kubevirt-bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Sep 20, 2021

kubevirt-bot requested review from aglitke and maya-r September 20, 2021 20:13

kubevirt-bot added size/XS, then added size/S and removed size/XS labels Sep 20, 2021

brybacki reviewed Sep 22, 2021

awels added 2 commits September 22, 2021 13:14

Retrying didn't work, try to slow down the clone so we have time to kill the process

1a2b4c7

Signed-off-by: Alexander Wels <awels@redhat.com>

awels force-pushed the fix_cloner_retry_test branch from 38a3edf to 6ce07d7 September 22, 2021 18:15

kubevirt-bot added size/M and removed size/S labels Sep 22, 2021

Add shorter poll interval wait for pod ready

2d604b6

Signed-off-by: Alexander Wels <awels@redhat.com>

awels force-pushed the fix_cloner_retry_test branch from 6ce07d7 to 2d604b6 September 22, 2021 18:26

kubevirt-bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Sep 23, 2021

kubevirt-bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Sep 23, 2021

kubevirt-bot assigned brybacki Sep 24, 2021

kubevirt-bot added the lgtm Indicates that a PR is ready to be merged. label Sep 24, 2021

kubevirt-bot merged commit 61684d3 into kubevirt:main Sep 24, 2021
