-
Notifications
You must be signed in to change notification settings - Fork 751
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[GOBBLIN-2181] ]Handling for Non transient exceptions #4084
Conversation
65c6873
to
d6bdfea
Compare
d6bdfea
to
fdf09f9
Compare
if (jobFailedTimer != null) { | ||
jobFailedTimer.stop(jobMetadata); | ||
// Only mark the job as failed in case of non transient exceptions | ||
if(!DagProcessingEngine.isTransientException(e)){ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: missing spaces
if (!DagProcessingEngine.isTransientException(e)) {
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
if(!DagProcessingEngine.isTransientException(e)){ | ||
TimingEvent jobFailedTimer = DagProc.eventSubmitter.getTimingEvent(TimingEvent.LauncherTimings.JOB_FAILED); | ||
String message = "Cannot submit job " + DagUtils.getFullyQualifiedJobName(dagNode) + " on executor " + specExecutorUri; | ||
log.error(message, e); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we atleast log the message? Even if it is a transient exception.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right now all exceptions would be considered non transient, still updated as per suggestion
} | ||
Dag<JobExecutionPlan> dag = new Dag<>(dagNodeList); | ||
Mockito.doNothing().when(dagManagementStateStore).addJobDagAction(Mockito.anyString(), Mockito.anyString(), Mockito.anyLong(), Mockito.anyString(), Mockito.any()); | ||
DagProcUtils.submitNextNodes(dagManagementStateStore, dag, dagId); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add a Mockito.verify for all the tests on dagManagementStateStore?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure makes sense, added in all tests
a1288bc
to
dffb79a
Compare
log.error("Ignoring non transient exception. DagTask {} will conclude and will not be retried. Exception - {} ", | ||
dagTask, e); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the exception is included as part of the message string instead of passed as a separate parameter, this would log only the exception's toString(), without the full stack trace.
Please update the log statement to pass the exception as a separate parameter to log the stack trace
log.error("Ignoring non-transient exception. DagTask {} will conclude and will not be retried.", dagTask, e);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated
@@ -67,6 +70,8 @@ public class DagProcessingEngine extends AbstractIdleService { | |||
public static final String DEFAULT_JOB_START_DEADLINE_TIME_MS = "defaultJobStartDeadlineTimeMillis"; | |||
@Getter static long defaultJobStartDeadlineTimeMillis; | |||
public static final String DEFAULT_FLOW_FAILURE_OPTION = FailureOption.FINISH_ALL_POSSIBLE.name(); | |||
// Todo Update to fetch list from config once transient exception handling is implemented and retryable exceptions defined |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: the convention is to use TODO:
, please update from Todo
to TODO:
for consistency
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated
@@ -149,6 +158,13 @@ public void run() { | |||
dagTask.conclude(); | |||
log.info(dagProc.contextualizeStatus("concluded dagTask")); | |||
} catch (Exception e) { | |||
if(!DagProcessingEngine.isTransientException(e)) { | |||
log.error("Ignoring non transient exception. DagTask {} will conclude and will not be retried. Exception - {} ", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't see toString()
for dagTask, what does dagTask log? It would be useful to have dagId & dagAction in the log.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for pointing out, I had moved the existing log as it is, updated it now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good start!
...lin-service/src/main/java/org/apache/gobblin/service/modules/orchestration/proc/DagProc.java
Outdated
Show resolved
Hide resolved
...vice/src/main/java/org/apache/gobblin/service/modules/orchestration/DagProcessingEngine.java
Outdated
Show resolved
Hide resolved
...ervice/src/main/java/org/apache/gobblin/service/modules/orchestration/proc/DagProcUtils.java
Outdated
Show resolved
Hide resolved
.../src/test/java/org/apache/gobblin/service/modules/orchestration/DagProcessingEngineTest.java
Show resolved
Hide resolved
...ce/src/test/java/org/apache/gobblin/service/modules/orchestration/proc/DagProcUtilsTest.java
Outdated
Show resolved
Hide resolved
...ce/src/test/java/org/apache/gobblin/service/modules/orchestration/proc/DagProcUtilsTest.java
Outdated
Show resolved
Hide resolved
...ce/src/test/java/org/apache/gobblin/service/modules/orchestration/proc/DagProcUtilsTest.java
Outdated
Show resolved
Hide resolved
7418e58
to
ae6aa56
Compare
ae6aa56
to
f2cb9e9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nearly there!
...vice/src/main/java/org/apache/gobblin/service/modules/orchestration/DagProcessingEngine.java
Outdated
Show resolved
Hide resolved
...vice/src/main/java/org/apache/gobblin/service/modules/orchestration/DagProcessingEngine.java
Outdated
Show resolved
Hide resolved
...ervice/src/main/java/org/apache/gobblin/service/modules/orchestration/proc/DagProcUtils.java
Show resolved
Hide resolved
@Test | ||
public void isNonTransientExceptionTest(){ | ||
Assert.assertTrue(!DagProcessingEngine.isTransientException(new RuntimeException("Simulating a non retryable exception!"))); | ||
Assert.assertTrue(!DagProcessingEngine.isTransientException(new AzkabanClientException("Simulating a retryable exception!"))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could use clarifying comment that there's nothing intrinsically transient or not about these, and in fact it just comes down to config, which is not presently provided.
even better however, would be to actually pass some test config and then demonstrate that the impl accordingly judges some as transient and others not. BTW, in authoring such a test you may find that it's challenging to have isTransientException
as a static
method, and yet be configuration-driven
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added clarifying statement, probably test config based tests can be added when we actually start maintaining config and we also add tests for transient exceptions handling.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
...and make isTransientException
non-static
to accept config
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure makes sense
jobExecutionPlan.getFlowExecutionId(), jobExecutionPlan.getJobName(), | ||
DagActionStore.DagActionType.REEVALUATE); | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
definitely an improvement on verification! you forgot Mockito.verifyNoMoreInteractions(DMSS)
(please add to other two tests too)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
very nice work here!
Dear Gobblin maintainers,
Please accept this PR. I understand that it will not be reviewed until I have checked off all the steps below!
JIRA
Description
Tests
Commits