SparkSubmitOperator could not get Exit Code after log stream interrupted by k8s old resource version exception #8963
Comments
Thanks for opening your first issue here! Be sure to follow the issue template!
I am now trying the Kubernetes client invoked by Airflow instead of the shell command; working on it.
@ywan2017 I also have a PR open on Airflow to work with Spark 3.0: #8730
I saw you are trying to fix the Spark watcher on k8s, which is awesome! That issue affects Airflow scheduling too much. Sadly, I am using Spark 2.4.4, which makes it hard to merge your code change.
@ywan2017 Yeah, once this is merged I want to try to backport it to 2.4.x. But the code has been refactored a lot in 3.x, so this will take a while.
Hi guys, any update on this issue?
There was an attempt by the author to fix this issue in #9081, but the PR was abandoned.
This issue has been automatically marked as stale because it has been open for 30 days with no response from the author. It will be closed in the next 7 days if no further activity occurs from the issue author.
This issue has been closed because it has not received a response from the issue author.
SparkSubmitOperator could not get Exit Code after log stream interrupted by k8s old resource version exception
Description
I use Airflow to schedule Spark jobs on Kubernetes via SparkSubmitOperator.
When a Spark job runs on k8s for a long time (>30 min), Airflow often marks it as failed while it is still running, even though the job eventually finishes successfully.
When this happens, the k8s "too old resource version" exception often, but not always, appears in the logs at the same time.
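For context, SparkSubmitOperator decides the task state on Kubernetes by scanning the spark-submit log stream for the driver's exit code. The snippet below is a simplified, paraphrased sketch of that logic (not the verbatim hook code), illustrating why an interrupted stream leaves the operator with no exit code:

```python
import re

# Simplified, paraphrased sketch of how Airflow's SparkSubmitHook scans
# the spark-submit log stream when running on Kubernetes. If the stream
# is cut off before the "Exit code:" line appears, no exit code is ever
# captured and the task is marked failed.
EXIT_CODE_PATTERN = re.compile(r"\s*[eE]xit code: (\d+)")

def scan_for_exit_code(log_lines):
    """Return the driver exit code found in the log stream, or None."""
    exit_code = None
    for line in log_lines:
        match = EXIT_CODE_PATTERN.search(line)
        if match:
            exit_code = int(match.group(1))
    return exit_code

# A stream interrupted by the "too old resource version" exception ends
# before the exit-code line, so this returns None and the operator
# cannot tell that the job actually succeeded.
```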
Environment
Submitting jobs manually and reading the k8s logs directly:
Scenario 1: job succeeds; log returned from k8s
Scenario 2: job fails; log returned from k8s
Airflow log examples
Scenario 1: job succeeds; log returned from Airflow shows the correct status:
Scenario 2: job fails; log returned from Airflow shows the correct status:
Scenario 3: job succeeds, but the log returned from Airflow shows the wrong status (this is the scenario we want to analyze):
Submitting the Scenario 3 job manually and reading the k8s logs:
Scenario 3: log returned from k8s
The difference is here:
Conclusion
Comparing the log details, there is a difference in how the log stream is terminated:
Airflow side (source code)
k8s side (k8s client)
Spark side
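On the k8s client side, the "too old resource version" error is an HTTP 410 raised when a watch's resourceVersion has expired; the common pattern is to catch it and restart the watch rather than letting the stream die. Below is a minimal sketch using the Python kubernetes client (the namespace and label selector are illustrative assumptions):

```python
from kubernetes import client, config, watch
from kubernetes.client.rest import ApiException

# Sketch of the usual workaround for the k8s client's "too old resource
# version" (HTTP 410) error: catch the exception and restart the watch
# instead of letting the event/log stream terminate.
def watch_pods_forever(namespace="default", label_selector="spark-role=driver"):
    config.load_kube_config()  # or load_incluster_config() inside a pod
    v1 = client.CoreV1Api()
    resource_version = None
    while True:
        w = watch.Watch()
        try:
            for event in w.stream(
                v1.list_namespaced_pod,
                namespace=namespace,
                label_selector=label_selector,
                resource_version=resource_version,
            ):
                resource_version = event["object"].metadata.resource_version
                print(event["type"], event["object"].metadata.name)
        except ApiException as e:
            if e.status == 410:
                # Resource version expired: restart the watch from scratch.
                resource_version = None
                continue
            raise
```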
Ask for suggestions
Are there any suggestions for avoiding this issue?
Actions that may be considered
[airflow][source_code_change] (see the sketch below)
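As one possible source-code change (in line with the comment above about invoking the Kubernetes client from Airflow directly), the operator could fall back to querying the driver pod's status when the log stream ends without an exit code, instead of failing immediately. A minimal sketch, assuming the driver pod name is already known; the function name and namespace below are illustrative:

```python
from kubernetes import client, config

# Hypothetical fallback: if the log stream was interrupted and no
# "Exit code:" line was seen, query the driver pod's phase directly
# instead of marking the task failed.
def driver_pod_succeeded(pod_name, namespace="default"):
    config.load_kube_config()
    v1 = client.CoreV1Api()
    pod = v1.read_namespaced_pod(name=pod_name, namespace=namespace)
    # phase is one of: Pending, Running, Succeeded, Failed, Unknown
    return pod.status.phase == "Succeeded"
```

This fallback would only change behavior when the exit code is missing, leaving the normal log-parsing path untouched.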