
[Feature-13763][K8S Task] collect real-time log #14379

Merged · 11 commits into apache:dev · Jul 20, 2023

Conversation

Radeity
Member

@Radeity Radeity commented Jun 19, 2023

Purpose of the pull request

Brief change log

Verify this pull request

  • Manually tested.
[screenshot: manual test run attached]

@Radeity Radeity self-assigned this Jun 19, 2023
@Radeity Radeity added the 3.2.0 for 3.2.0 version label Jun 19, 2023
@Radeity Radeity added this to the 3.2.0 milestone Jun 19, 2023
@codecov-commenter

codecov-commenter commented Jun 20, 2023

Codecov Report

Merging #14379 (fa7e57e) into dev (1d0d85b) will decrease coverage by 0.03%.
The diff coverage is 7.44%.

❗ Current head fa7e57e differs from pull request most recent head b019235. Consider uploading reports for the commit b019235 to get more accurate results

```diff
@@             Coverage Diff              @@
##                dev   #14379      +/-   ##
============================================
- Coverage     38.50%   38.48%   -0.03%
- Complexity     4562     4564       +2
============================================
  Files          1260     1260
  Lines         43791    43828      +37
  Branches       4829     4832       +3
============================================
+ Hits          16861    16866       +5
- Misses        25057    25089      +32
  Partials       1873     1873
```
| Impacted Files | Coverage Δ |
| --- | --- |
| ...duler/plugin/task/api/AbstractCommandExecutor.java | 0.00% <0.00%> (ø) |
| ...ugin/task/api/am/KubernetesApplicationManager.java | 0.00% <0.00%> (ø) |
| ...scheduler/plugin/task/api/k8s/AbstractK8sTask.java | 0.00% <0.00%> (ø) |
| ...r/plugin/task/api/k8s/AbstractK8sTaskExecutor.java | 77.77% <ø> (+24.44%) ⬆️ |
| ...nscheduler/plugin/task/api/utils/ProcessUtils.java | 0.00% <0.00%> (ø) |
| ...uler/plugin/task/api/k8s/impl/K8sTaskExecutor.java | 36.11% <13.20%> (-2.19%) ⬇️ |


@Radeity Radeity added the feature new feature label Jun 21, 2023
@Radeity
Member Author

Radeity commented Jul 10, 2023

Hi, @caishunfeng , would you like to help review this PR?

@Gallardot
Member

Hi @Radeity, good job!
I tested this PR. Sometimes it works fine, but sometimes there's an error in the log.

```
[INFO] 2023-07-10 19:42:57.004 +0800 - [K8sJobExecutor-echo-77] start to submit job
[INFO] 2023-07-10 19:42:57.073 +0800 - [K8sJobExecutor-echo-77] submitted job successfully
[INFO] 2023-07-10 19:42:57.095 +0800 - event received : job:echo-77 action:ADDED
[INFO] 2023-07-10 19:42:57.421 +0800 - event received : job:echo-77 action:MODIFIED
[INFO] 2023-07-10 19:42:57.421 +0800 - job echo-77 status 1
[INFO] 2023-07-10 19:43:02.459 +0800 - [K8S-pod-log] {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"container \"echo-77\" in pod \"echo-77-pngf7\" is waiting to start: ContainerCreating","reason":"BadRequest","code":400}
[INFO] 2023-07-10 19:43:25.109 +0800 - event received : job:echo-77 action:MODIFIED
[INFO] 2023-07-10 19:43:25.109 +0800 - job echo-77 status 0
[INFO] 2023-07-10 19:43:25.110 +0800 - [K8sJobExecutor-echo-77] succeed in k8s
```

Maybe some fault tolerance work is needed here. For example, before getting pod logs, check the status of the current task pod?
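For illustration, a minimal sketch of that fault-tolerance idea, assuming the fabric8 `KubernetesClient` API; the helper method and its parameter names are hypothetical, not this PR's actual code:

```java
import io.fabric8.kubernetes.api.model.Pod;
import io.fabric8.kubernetes.client.KubernetesClient;
import io.fabric8.kubernetes.client.dsl.LogWatch;

// Hypothetical helper: poll the pod phase until it leaves Pending (the phase
// that covers ContainerCreating); before that point, the log endpoint answers
// with the HTTP 400 "waiting to start" error seen in the log above.
LogWatch openPodLogWatch(KubernetesClient client, String namespace, String podName,
                         String containerName) throws InterruptedException {
    Pod pod;
    do {
        Thread.sleep(1_000L); // back off while the container is still being created
        pod = client.pods().inNamespace(namespace).withName(podName).get();
    } while (pod == null || "Pending".equals(pod.getStatus().getPhase()));

    // The container has started, so streaming its log is now safe.
    return client.pods()
            .inNamespace(namespace)
            .withName(podName)
            .inContainer(containerName)
            .watchLog(System.out); // caller closes the LogWatch when the job finishes
}
```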

@Radeity
Member Author

Radeity commented Jul 10, 2023

Hi @Gallardot, thanks very much for your comments and testing!

> Maybe some fault tolerance work is needed here. For example, before getting pod logs, check the status of the current task pod?

Agree with you. I will add a status check to handle the different pod statuses :D

```java
return null;
}
Pod driver = driverPod.get(0);
Pod pod = podList.get(0);
```
Member


Not sure if there will be multiple Pods. Maybe you should filter for the latest pod based on its creation time?

https://github.com/apache/dolphinscheduler/blob/dev/dolphinscheduler-task-plugin/dolphinscheduler-task-api/src/main/java/org/apache/dolphinscheduler/plugin/task/api/am/KubernetesApplicationManager.java#L100-L110

When getting the pod list, maybe add the job name as a label filter?
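For illustration, a hedged sketch of both ideas with the fabric8 client; it assumes `client`, `namespace`, and `jobName` are in scope, and the label key `job-name` is made up for the example:

```java
import java.util.Comparator;
import java.util.List;
import io.fabric8.kubernetes.api.model.Pod;

// Narrow the listing to one job's pods via a label selector.
List<Pod> pods = client.pods()
        .inNamespace(namespace)
        .withLabel("job-name", jobName)
        .list()
        .getItems();

// creationTimestamp is an RFC 3339 string in UTC, so the lexicographically
// largest value belongs to the most recently created pod.
Pod latest = pods.stream()
        .max(Comparator.comparing((Pod p) -> p.getMetadata().getCreationTimestamp()))
        .orElse(null);
```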

Member Author


> Not sure if there will be multiple Pods. Maybe you should filter for the latest pod based on its creation time?

You are right, one job can launch multiple pods. For this scenario, I think it's better to support reading logs from different pods in a future PR, rather than only reading from the latest one, WDYT?

> When getting the pod list, maybe add the job name as a label filter?

The label value here is taskAppId, which acts as a unique identifier of one job.
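In other words, a selector on that label already scopes the listing to a single job. A hedged fragment of what that looks like; the label key `task-app-id` is invented for illustration, and the actual constant in the codebase may be named differently:

```java
import java.util.List;
import io.fabric8.kubernetes.api.model.Pod;

// taskAppId is unique per job, so this returns only that job's pods.
List<Pod> jobPods = client.pods()
        .inNamespace(namespace)
        .withLabel("task-app-id", taskAppId)
        .list()
        .getItems();
```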

Member


Got it. After the current PR is merged, we can create a new issue to track this.

@Radeity
Member Author

Radeity commented Jul 11, 2023

Hi, @Gallardot , I have added a check only for the PENDING pod phase, do you have any other suggestions here?

```java
if (pod.getStatus().getPhase().equals(PENDING)) {
    ThreadUtils.sleep(SLEEP_TIME_MILLIS);
} else {
    podIsReady = true;
}
```

@Gallardot
Member

> Hi, @Gallardot , I have added a check only for the PENDING pod phase, do you have any other suggestions here?
>
> ```java
> if (pod.getStatus().getPhase().equals(PENDING)) {
>     ThreadUtils.sleep(SLEEP_TIME_MILLIS);
> } else {
>     podIsReady = true;
> }
> ```

I think we can also add UNKNOWN.
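A sketch of what the extended condition could look like, assuming an `UNKNOWN` phase constant defined next to the existing `PENDING` one:

```java
String phase = pod.getStatus().getPhase();
// Back off while the pod is not yet scheduled (Pending) or its state cannot
// be obtained from its node (Unknown); in both cases logs are not readable yet.
if (PENDING.equals(phase) || UNKNOWN.equals(phase)) {
    ThreadUtils.sleep(SLEEP_TIME_MILLIS);
} else {
    podIsReady = true;
}
```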


@Gallardot Gallardot left a comment


Rest LGTM

@Radeity
Member Author

Radeity commented Jul 11, 2023

Hi, @Gallardot , I've made some changes based on your suggestions, please help review :D

@Radeity
Member Author

Radeity commented Jul 14, 2023

Hi, @caishunfeng @SbloodyS , would you guys like to help double-check?

@zhongjiajie
Member

Re-running the failed CI to find out whether the failure is due to a code bug or a flaky test.


@SbloodyS SbloodyS left a comment


+1

@zhongjiajie
Member

Seems it still fails when I re-run the CI, could you please take a look @Radeity

@Radeity Radeity changed the title [Improvement-13763][K8S Task] collect real-time log [Feature-13763][K8S Task] collect real-time log Jul 20, 2023
@sonarqubecloud

SonarCloud Quality Gate failed.

Bug: 1 (rating C)
Vulnerability: 0 (rating A)
Security Hotspot: 0 (rating A)
Code Smell: 2 (rating A)

Coverage: 6.8%
Duplication: 0.0%


@SbloodyS SbloodyS merged commit 04800a4 into apache:dev Jul 20, 2023
biaoma-ty pushed a commit to Kasma-Inc/dolphinscheduler that referenced this pull request Aug 17, 2023
* [Improvement-13763][K8S Task] collect real-time log

* fix codesmell

* get pod watcher until pod is ready

* fix codesmell

* specify container name & loop waiting pod creation

* sleep when pod is not initialized

---------

Co-authored-by: Jay Chung <zhongjiajie955@gmail.com>
zhongjiajie pushed a commit that referenced this pull request Aug 30, 2023
* [Improvement-13763][K8S Task] collect real-time log

* fix codesmell

* get pod watcher until pod is ready

* fix codesmell

* specify container name & loop waiting pod creation

* sleep when pod is not initialized

---------

Co-authored-by: Jay Chung <zhongjiajie955@gmail.com>
(cherry picked from commit 04800a4)
Labels
3.2.0 (for 3.2.0 version) · backend · feature (new feature) · kubernetes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Improvement][K8S Task] K8S task should collect whole pod log
5 participants