[SPARK-44049][K8S][TESTS] Fix KubernetesSuite to use inNamespace for validating driver pod cleanup #41586

Closed
wants to merge 1 commit into from

@dongjoon-hyun dongjoon-hyun commented Jun 14, 2023

What changes were proposed in this pull request?

This PR aims to fix KubernetesSuite to use inNamespace API for validating driver pod cleanup.

Why are the changes needed?

This is a tricky bug for the following reasons.

  • Although all test cases pass, the K8s integration tests currently run extremely slowly.
  • The reported running time of each individual test case looks normal.
  • The slowness occurs during the transition from one test to the next.

The root cause is that, after a test passes, the K8s test hits a namespace not specified error, and this bug blocks every test case at the driver pod cleanup and validation stage for up to 3 minutes (the maximum timeout).

[info]   The code passed to eventually never returned normally. Attempted 190 times over 3.011156453483333 minutes.
Last failure message: namespace not specified for an operation requiring one and no default was found in the Config.. (KubernetesSuite.scala:612)
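
The blocking behavior can be reproduced in miniature. The following is a minimal sketch, not the suite's actual code: `retryUntil` is a hypothetical stand-in for ScalaTest's `Eventually.eventually`, `getPod` is a hypothetical fake of a pod lookup, and the timeout is shortened from minutes to milliseconds. It shows that a check which throws on every attempt consumes the whole timeout, while the same check with a namespace returns on the first attempt.

```scala
import scala.concurrent.duration._
import scala.util.{Failure, Success, Try}

object EventuallySketch {
  // Hypothetical stand-in for Eventually.eventually: re-run `check` until it
  // stops throwing or `timeout` elapses. Returns (attempts, succeeded).
  def retryUntil(timeout: FiniteDuration, interval: FiniteDuration)(check: => Unit): (Int, Boolean) = {
    val deadline = timeout.fromNow

    @annotation.tailrec
    def loop(attempts: Int): (Int, Boolean) =
      Try(check) match {
        case Success(_) => (attempts, true)
        case Failure(_) if deadline.isOverdue() => (attempts, false)
        case Failure(_) =>
          Thread.sleep(interval.toMillis)
          loop(attempts + 1)
      }

    loop(1)
  }

  // Hypothetical fake lookup: without a namespace it always throws, mimicking
  // the "namespace not specified" failure. With a namespace it reports that
  // the driver pod is already gone.
  def getPod(namespace: Option[String], name: String): Option[String] =
    namespace match {
      case None => throw new IllegalStateException("namespace not specified")
      case Some(_) => None
    }

  def main(args: Array[String]): Unit = {
    // Missing namespace: every attempt throws, so the full timeout is spent.
    val (badAttempts, badOk) = retryUntil(300.millis, 50.millis) {
      assert(getPod(None, "driver").isEmpty)
    }
    // Namespace supplied: the cleanup check passes immediately.
    val (goodAttempts, goodOk) = retryUntil(300.millis, 50.millis) {
      assert(getPod(Some("spark"), "driver").isEmpty)
    }
    println(s"without namespace: attempts=$badAttempts succeeded=$badOk")
    println(s"with namespace: attempts=$goodAttempts succeeded=$goodOk")
  }
}
```

Because the failure is an exception rather than a pod that is still present, no number of retries can make the check pass, which is why the delay only shows up between tests and not in the per-test timings.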

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Pass the CIs.

Also, I manually verified that the suite now completes in 13 minutes. Previously, it took over 1 hour.

[info] YuniKornSuite:
[info] - SPARK-42190: Run SparkPi with local[*] (17 seconds, 144 milliseconds)
[info] - Run SparkPi with no resources (20 seconds, 406 milliseconds)
[info] - Run SparkPi with no resources & statefulset allocation (15 seconds, 531 milliseconds)
...
[info] Run completed in 13 minutes, 46 seconds.
[info] Total number of tests run: 27
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 27, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
[success] Total time: 842 s (14:02), completed Jun 13, 2023, 9:33:02 PM
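
As a rough sanity check on these numbers, the worst-case overhead of the bug can be estimated from the figures above. This is a back-of-the-envelope sketch only: it assumes every one of the 27 tests hit the full 3-minute timeout, which the description does not state, and the object and value names are illustrative.

```scala
object CleanupOverhead {
  val testCount = 27          // "Total number of tests run: 27"
  val maxBlockMinutes = 3     // maximum timeout per cleanup check
  val fixedRunSeconds = 842   // "Total time: 842 s" after the fix

  // Upper bound on time spent blocked in cleanup validation, in minutes.
  val worstCaseOverheadMinutes: Int = testCount * maxBlockMinutes

  // Upper bound on the pre-fix total, in whole minutes (rounded down).
  val worstCaseTotalMinutes: Int = worstCaseOverheadMinutes + fixedRunSeconds / 60

  def main(args: Array[String]): Unit = {
    println(s"worst-case cleanup overhead: $worstCaseOverheadMinutes minutes")
    println(s"worst-case total before the fix: about $worstCaseTotalMinutes minutes")
  }
}
```

Under that assumption the bound is 81 minutes of pure waiting, which is consistent with the reported drop from over 1 hour to about 14 minutes.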

@@ -613,6 +613,7 @@ class KubernetesSuite extends SparkFunSuite
     Eventually.eventually(TIMEOUT, INTERVAL) {
       assert(kubernetesTestComponents.kubernetesClient
         .pods()
+        .inNamespace(kubernetesTestComponents.namespace)

@dongjoon-hyun dongjoon-hyun Jun 14, 2023

This is the same code pattern as on line 610.

@dongjoon-hyun

Could you review this when you have some time, @viirya ?

@viirya viirya left a comment

Good catch!

@dongjoon-hyun

Thank you, @pan3793 and @viirya ! Merged to master.

@dongjoon-hyun dongjoon-hyun deleted the SPARK-44049 branch June 14, 2023 07:21
czxm pushed a commit to czxm/spark that referenced this pull request Jun 19, 2023
…r validating driver pod cleanup

Closes apache#41586 from dongjoon-hyun/SPARK-44049.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>