Add a timeout when initializing the Podman client (broken Podman should not affect odo dev on cluster) #6808

@rm3l rm3l commented May 10, 2023

What type of PR is this:
/kind bug
/area dev

What does this PR do / why we need it:
This introduces a timeout for the Podman client initialization (e.g., a timeout for the podman version command to finish). The default value is 1s.

For some reason, this made the Podman tests flaky on GitHub (sometimes, podman version would take more than 1s to return). The timeout has therefore been made configurable (via the PODMAN_CMD_INIT_TIMEOUT environment variable, defaulting to 1s), and set to a slightly higher value in the tests.
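A minimal sketch of how such a configurable timeout could be resolved, assuming the variable holds a Go duration string such as `1s` or `5s` (the description above does not spell out the accepted format). `parseInitTimeout` is a hypothetical helper written for illustration, not odo's actual code.

```go
package main

import (
	"fmt"
	"os"
	"time"
)

// envVar is the variable name given in the PR description.
const envVar = "PODMAN_CMD_INIT_TIMEOUT"

// defaultTimeout mirrors the 1s default given in the PR description.
const defaultTimeout = time.Second

// parseInitTimeout is a hypothetical helper: it returns the default when the
// value is empty and rejects values that are malformed or not positive.
// Assumption: the value uses Go duration syntax ("1s", "500ms", ...).
func parseInitTimeout(raw string) (time.Duration, error) {
	if raw == "" {
		return defaultTimeout, nil
	}
	d, err := time.ParseDuration(raw)
	if err != nil {
		return 0, fmt.Errorf("invalid %s value %q: %w", envVar, raw, err)
	}
	if d <= 0 {
		return 0, fmt.Errorf("invalid %s value %q: must be positive", envVar, raw)
	}
	return d, nil
}

func main() {
	timeout, err := parseInitTimeout(os.Getenv(envVar))
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	fmt.Println("Podman init timeout:", timeout)
}
```

Rejecting malformed values instead of silently falling back to the default makes a typo in a CI configuration visible immediately.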

Which issue(s) this PR fixes:
Fixes #6575

PR acceptance criteria:

How to test changes / Special notes to the reviewer:
This can be tested by adding a script/program to your PATH that would delay all calls to Podman, e.g.:

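mkdir -p ~/.local/bin

# Quote the heredoc delimiter so that "$@" is written literally instead of being expanded now
cat <<'EOF' > ~/.local/bin/podman-with-delay
#!/bin/bash

sleep 300 && podman "$@"
EOF

# The wrapper must be executable to be usable as PODMAN_CMD
chmod +x ~/.local/bin/podman-with-delay

export PODMAN_CMD=~/.local/bin/podman-with-delay

# If ~/.local/bin is not already in your PATH
export PATH="$HOME/.local/bin:$PATH"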

At this point, odo dev on cluster should no longer hang.

odo dev --platform=podman would error out, but this is expected.

@openshift-ci openshift-ci bot added kind/bug Categorizes issue or PR as related to a bug. area/dev Issues or PRs related to `odo dev` labels May 10, 2023
netlify bot commented May 10, 2023

Deploy Preview for odo-docusaurus-preview canceled.

Latest commit: 2d0a448
Latest deploy log: https://app.netlify.com/sites/odo-docusaurus-preview/deploys/645ce681d6ea6900081e51df

@openshift-ci openshift-ci bot requested review from kadel and rnapoles-rh May 10, 2023 10:18
@rm3l rm3l requested review from feloy and valaparthvi and removed request for rnapoles-rh May 10, 2023 10:18
odo-robot bot commented May 10, 2023

NoCluster Tests on commit 5a0e0ac finished successfully.

odo-robot bot commented May 10, 2023

OpenShift Unauthenticated Tests on commit 5a0e0ac finished successfully.

odo-robot bot commented May 10, 2023

Unit Tests on commit 5a0e0ac finished successfully.

odo-robot bot commented May 10, 2023

Validate Tests on commit 5a0e0ac finished successfully.

odo-robot bot commented May 10, 2023

Windows Tests (OCP) on commit 5a0e0ac finished with errors.

@rm3l rm3l changed the title Add a timeout of 1s when initializing the Podman client (broken Podman should not affect odo dev on cluster) Add a timeout when initializing the Podman client (broken Podman should not affect odo dev on cluster) May 10, 2023
odo-robot bot commented May 10, 2023

Kubernetes Tests on commit 5a0e0ac finished successfully.

odo-robot bot commented May 10, 2023

Kubernetes Docs Tests on commit 7ff460b finished successfully.

odo-robot bot commented May 10, 2023

OpenShift Tests on commit 5a0e0ac finished with errors.

rm3l and others added 4 commits May 10, 2023 17:48
This command is called at dependency injection time to initialize a (nil-able) Podman client,
even if users won't use Podman at all.
As discussed, this command is supposed to be
quite fast to return, hence this timeout of 1 second.

Initially, we were using cmd.Output to get the command output,
but as reported in [1], cmd.Output does not respect the context timeout.
This explains the workaround of reading from both stdout and stderr pipes,
*and* relying on cmd.Wait() to close those pipes properly when the program exits
(either as expected or when the timeout is reached).

[1] golang/go#57129

Co-authored-by: Philippe Martin <phmartin@redhat.com>
…l Kubernetes/Podman clients could not be initialized

This helps debug such potential issues instead of swallowing the errors.
This will allow setting a different value for environments such as
GitHub, where the Podman client takes slightly more time to return
(I guess because we are running a lot of Podman commands in parallel?).
@rm3l rm3l force-pushed the 6575-broken-podman-affects-odo-dev-run-against-cluster branch from 1e4370e to c3a5b23 Compare May 10, 2023 15:48
Some tests did not pass because the Podman client did not
initialize in 1s; I guess because we are running a lot of Podman commands in parallel?
This should hopefully improve this situation.
@rm3l rm3l force-pushed the 6575-broken-podman-affects-odo-dev-run-against-cluster branch from c3a5b23 to f985ae7 Compare May 10, 2023 15:52
@rm3l rm3l requested a review from feloy May 11, 2023 13:00
Resolved review thread on pkg/podman/version.go
@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. Required by Prow. label May 11, 2023
@rm3l rm3l closed this May 11, 2023
@rm3l rm3l reopened this May 11, 2023
@sonarqubecloud

Kudos, SonarCloud Quality Gate passed!

0 Bugs (rating A)
0 Vulnerabilities (rating A)
0 Security Hotspots (rating A)
1 Code Smell (rating A)

No coverage information
0.0% duplication

rm3l commented May 11, 2023

Running oc.exe with args [oc create configmap config-map-for-cleanup --from-literal type=testing --from-literal team=odo -n cmd-devfile-deploy-test161cji] and odo env: []
  [oc] error: failed to create configmap: Post "https://c114-e.eu-de.containers.cloud.ibm.com:30329/api/v1/namespaces/cmd-devfile-deploy-test161cji/configmaps?fieldManager=kubectl-create": dial tcp 149.81.180.114:30329: connectex: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
  [FAILED] in [BeforeEach] - C:/Users/Administrator.ANSIBLE-TEST-VS/3936/tests/helper/helper_cmd_wrapper.go:101 @ 05/11/23 11:03:24.24
  Deleting project: cmd-devfile-deploy-test47osp
  Running oc.exe with args [oc delete project cmd-devfile-deploy-test47osp --wait=false] and odo env: []
  [oc] Error from server (NotFound): namespaces "cmd-devfile-deploy-test47osp" not found
  [FAILED] in [AfterEach] - C:/Users/Administrator.ANSIBLE-TEST-VS/3936/tests/helper/helper_cmd_wrapper.go:101 @ 05/11/23 11:03:24.523
  << Timeline

Network errors.

/override windows-integration-test/Windows-test

openshift-ci bot commented May 11, 2023

@rm3l: Overrode contexts on behalf of rm3l: windows-integration-test/Windows-test

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

rm3l commented May 11, 2023

dial tcp: lookup c100-e.eu-de.containers.cloud.ibm.com

...
Summarizing 1 Failure:
  [FAIL] odo remove binding command tests when the component with binding is bootstrapped (bindingName=my-nodejs-app-cluster-sample-ocp) when odo dev is running when binding is removed [It] should have led odo dev to delete ServiceBinding from the cluster
  /go/odo_1/tests/integration/cmd_remove_binding_test.go:73

Ran 486 of 886 Specs in 1788.480 seconds
FAIL! -- 485 Passed | 1 Failed | 0 Pending | 400 Skipped

Previous run passed.

/override OpenShift-Integration-tests/OpenShift-Integration-tests

openshift-ci bot commented May 11, 2023

@rm3l: Overrode contexts on behalf of rm3l: OpenShift-Integration-tests/OpenShift-Integration-tests

rm3l commented May 11, 2023

/override kubernetes-infra-stage-test

Not related.

openshift-ci bot commented May 11, 2023

@rm3l: Overrode contexts on behalf of rm3l: kubernetes-infra-stage-test

@openshift-merge-robot openshift-merge-robot merged commit c217741 into redhat-developer:main May 11, 2023
@rm3l rm3l deleted the 6575-broken-podman-affects-odo-dev-run-against-cluster branch May 11, 2023 19:35
Development

Successfully merging this pull request may close these issues.

Broken podman affects odo dev run against cluster
3 participants