Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Revert "agent: fix the issue of exec hang with a backgroud process" #7042

Conversation

fidencio
Copy link
Member

@fidencio fidencio commented Jun 6, 2023

This reverts commit 25d2fb0.

The reason we're reverting the commit is because it to check whether it's the cause for the regression on devmapper tests.

Fixes: #0000

@fidencio
Copy link
Member Author

fidencio commented Jun 6, 2023

/test-devmapper

@fidencio
Copy link
Member Author

fidencio commented Jun 6, 2023

kata-containers/tests#5682 is the issue raised, and if we can fix it on this repo, I'll move the issue here.

@katacontainersbot katacontainersbot added the size/tiny Smallest and simplest task label Jun 6, 2023
@fidencio
Copy link
Member Author

fidencio commented Jun 6, 2023

@lifupan, I don't think reverting is the way to go, but I'd like to ask if you have some cycles to take a look at the regression introduced by the commit.

@lifupan lifupan marked this pull request as ready for review June 7, 2023 03:40
@jepio
Copy link
Member

jepio commented Jun 7, 2023

/test

@fidencio fidencio added the do-not-merge PR has problems or depends on another label Jun 7, 2023
@lifupan
Copy link
Member

lifupan commented Jun 7, 2023

Hi @fidencio

I had tried to reproduce this issue, but actually I haven't by now.

I had noticed that the failed cases were not always the same one, some times the [run_kubernetes_tests.sh:153] ERROR: bats k8s-pid-ns.bats failed, sometimes failed on "[run_kubernetes_tests.sh:153] ERROR: bats k8s-copy-file.bats", I'd try to figure out the root cause later, and it wouldn't be a big issue, so I think we can just keep this PR open until I find the root cause.

@fidencio
Copy link
Member Author

fidencio commented Jun 7, 2023

Hi @fidencio

I had tried to reproduce this issue, but actually I haven't by now.

I had noticed that the failed cases were not always the same one, some times the [run_kubernetes_tests.sh:153] ERROR: bats k8s-pid-ns.bats failed, sometimes failed on "[run_kubernetes_tests.sh:153] ERROR: bats k8s-copy-file.bats", I'd try to figure out the root cause later, and it wouldn't be a big issue, so I think we can just keep this PR open until I find the root cause.

Hey @lifupan, I agree. :-)
The consistent failing that I got on the systems using devmapper was on the k8s-copy-file.bats, in case that helps.

@ananos
Copy link
Member

ananos commented Jun 21, 2023

/retest-devmapper
/retest-fc

@wainersm
Copy link
Contributor

@lifupan @fidencio the firecracker CI jobs leverage device-mapper, they are mostly failing at this point and preventing PRs from being merged. Could we merge this revert now? An alternative is to make the firecracker jobs non-required....

@fidencio fidencio force-pushed the topic/revert-25d2fb0fdecf5d76fcb8a4ecde5c54c0d4c3e240 branch from ee896c8 to 67bdd23 Compare July 7, 2023 10:25
@fidencio
Copy link
Member Author

fidencio commented Jul 7, 2023

/test-dragonball

@fidencio fidencio removed the do-not-merge PR has problems or depends on another label Jul 7, 2023
@fidencio fidencio force-pushed the topic/revert-25d2fb0fdecf5d76fcb8a4ecde5c54c0d4c3e240 branch from 67bdd23 to 6048ca7 Compare July 7, 2023 11:59
@fidencio
Copy link
Member Author

fidencio commented Jul 7, 2023

/test

@fidencio fidencio added ok-to-test wip Work in Progress (PR incomplete - needs more work or rework) labels Jul 7, 2023
@fidencio fidencio force-pushed the topic/revert-25d2fb0fdecf5d76fcb8a4ecde5c54c0d4c3e240 branch from 6048ca7 to ec1ca6c Compare July 7, 2023 16:28
@katacontainersbot katacontainersbot added size/small Small and simple task and removed size/tiny Smallest and simplest task labels Jul 7, 2023
@fidencio fidencio force-pushed the topic/revert-25d2fb0fdecf5d76fcb8a4ecde5c54c0d4c3e240 branch from b066a7c to 5bd6e0d Compare July 8, 2023 08:44
@katacontainersbot katacontainersbot added size/tiny Smallest and simplest task and removed size/small Small and simple task labels Jul 8, 2023
@fidencio fidencio force-pushed the topic/revert-25d2fb0fdecf5d76fcb8a4ecde5c54c0d4c3e240 branch from 5bd6e0d to 07381b3 Compare July 8, 2023 09:02
@katacontainersbot katacontainersbot added size/small Small and simple task and removed size/tiny Smallest and simplest task labels Jul 8, 2023
@fidencio fidencio force-pushed the topic/revert-25d2fb0fdecf5d76fcb8a4ecde5c54c0d4c3e240 branch 2 times, most recently from 763032f to 2419fa3 Compare July 8, 2023 09:31
@fidencio
Copy link
Member Author

fidencio commented Jul 8, 2023

After running tests here I can see the following behavior.

With 25d2fb0 reverted, I got a green run for Cloud Hypervisor, and QEMU.

Dragonball was still failing in two tests:

  • k8s-numbers-cpus
  • k8s-oom

With those two tests disabled for dragonball, I got 5 consecutive green runs for the dragonball Ci:

This makes me seriously believe that we should revert 25d2fb0 ASAP.

@fidencio fidencio force-pushed the topic/revert-25d2fb0fdecf5d76fcb8a4ecde5c54c0d4c3e240 branch from 2419fa3 to b25927a Compare July 8, 2023 12:24
@fidencio
Copy link
Member Author

fidencio commented Jul 8, 2023

/test

This reverts commit 25d2fb0.

The reason we're reverting the commit is because it to check whether
it's the cause for the regression on devmapper tests.

Fixes: kata-containers#7253
Depends-on: github.com/kata-containers/tests#5705

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Let's skip the k8s-number-cpus, as the test is currently failing.

We've an issue opened for that, and we'll be working on re-enabling it
as soon as possible.

Reference:
kata-containers#7270

Fixes: kata-containers#7253

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Let's skip the k8s-oom, as the test is currently failing.

We've an issue opened for that, and we'll be working on re-enabling it
as soon as possible.

Reference:
kata-containers#7271

Fixes: kata-containers#7253

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
@fidencio fidencio force-pushed the topic/revert-25d2fb0fdecf5d76fcb8a4ecde5c54c0d4c3e240 branch from b25927a to 828a721 Compare July 8, 2023 12:28
@fidencio
Copy link
Member Author

fidencio commented Jul 8, 2023

/test

@fidencio
Copy link
Member Author

fidencio commented Jul 8, 2023

What I'd like to do here is ...

This will allow us to have a CI running ASAP.

I'll then do:

  • Work on k8s-number-cpus and have it re-enabled
  • Work on k8s-oom and have it re-enabled

Does it sound like a fair plan to everyone involved?

@fidencio
Copy link
Member Author

fidencio commented Jul 8, 2023

/test

@fidencio fidencio removed the wip Work in Progress (PR incomplete - needs more work or rework) label Jul 8, 2023
@fidencio
Copy link
Member Author

I am closing this one, as it's part of #7272

@fidencio fidencio closed this Jul 10, 2023
justxuewei added a commit to justxuewei/kata-containers that referenced this pull request Jul 16, 2023
Issue kata-containers#4747 and pull request kata-containers#4748 fix exec hang issues where the exec
command hangs when a process's stdout is not closed. However, the PR might
cause the exec command not to work as expected, leading to CI failure. The
PR was reverted in kata-containers#7042. This PR resolves the exec hang issues and has
undergone 1000 rounds of testing to verify that it would not cause any CI
failures.

Fixes: kata-containers#4747

Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
studychao pushed a commit to openanolis/kata-containers that referenced this pull request Nov 22, 2023
Issue kata-containers#4747 and pull request kata-containers#4748 fix exec hang issues where the exec
command hangs when a process's stdout is not closed. However, the PR might
cause the exec command not to work as expected, leading to CI failure. The
PR was reverted in kata-containers#7042. This PR resolves the exec hang issues and has
undergone 1000 rounds of testing to verify that it would not cause any CI
failures.

Fixes: kata-containers#4747

Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
mayamrinal pushed a commit to mayamrinal/kata-containers that referenced this pull request Dec 13, 2023
Issue kata-containers#4747 and pull request kata-containers#4748 fix exec hang issues where the exec
command hangs when a process's stdout is not closed. However, the PR might
cause the exec command not to work as expected, leading to CI failure. The
PR was reverted in kata-containers#7042. This PR resolves the exec hang issues and has
undergone 1000 rounds of testing to verify that it would not cause any CI
failures.

Fixes: kata-containers#4747

Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ok-to-test size/small Small and simple task
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants