[GOBBLIN-2113] Process Heartbeat DagAction CDC messages with empty FlowExecutionId str #4004

Merged 3 commits on Jul 17, 2024

umustafi
Contributor

Dear Gobblin maintainers,

Please accept this PR. I understand that it will not be reviewed until I have checked off all the steps below!

JIRA

Description

  • Here are some details about my PR, including screenshots (if applicable):
    After a previous PR changed flowExecutionId from a String to a long, DagActionStoreChangeMonitor throws a NumberFormatException when it processes heartbeat (HB) events, because those events carry an empty flowExecutionId string that cannot be parsed as a long. The exception stops the HighLevelConsumer queues on the hosts that receive HB events in their partition. Example log:
{{Encountered exception while processing record so stopping queue processing. Record: LiKafka10ConsumerRecord(consumerRecord=ConsumerRecord(topic = ds_mysql_makto-db-152_prod_SHARED_GOBBLIN_DAG_ACTION_STORE_20221208211255, partition = 0, leaderEpoch = null, offset = 905733, NoTimestampType = -1, serialized key size = -1, serialized value size = -1, headers = RecordHeaders(headers = [], isReadOnly = false), key = , value = {"changeEventIdentifier":
{"key": "", "txId": "", "produceTimestampMillis": 1721078158347, "operationType": "HEARTBEAT"}
, "flowGroup": "", "flowName": "", "flowExecutionId": "", "jobName": "", "dagAction": null})) Exception: java.lang.NumberFormatException: For input string: ""}}
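The failure mode in the log above can be reproduced in isolation. The following is a minimal sketch, not the change merged in this PR: it assumes a hypothetical helper `parseFlowExecutionId` that returns an empty `OptionalLong` for the empty string a heartbeat event carries, instead of letting `Long.parseLong` throw.

```java
import java.util.OptionalLong;

public class HeartbeatIdParsing {
  // Hypothetical helper (not from the Gobblin code base): treats a null or
  // empty flowExecutionId, as sent in HEARTBEAT events, as "no id present".
  static OptionalLong parseFlowExecutionId(String raw) {
    if (raw == null || raw.isEmpty()) {
      return OptionalLong.empty();
    }
    return OptionalLong.of(Long.parseLong(raw));
  }

  public static void main(String[] args) {
    // Unguarded parsing of the empty string throws NumberFormatException.
    try {
      Long.parseLong("");
    } catch (NumberFormatException e) {
      System.out.println("unguarded parse failed: " + e.getMessage());
    }
    // Guarded parsing lets the caller skip heartbeat events safely.
    System.out.println("heartbeat id present: " + parseFlowExecutionId("").isPresent());
    System.out.println("regular id: " + parseFlowExecutionId("1721078158347").getAsLong());
  }
}
```

Returning an `OptionalLong` is only one possible design; the caller could equally check the operation type for `HEARTBEAT` before attempting to parse any flow fields.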

Tests

  • My PR adds the following unit tests OR does not need testing for this extremely good reason:
    Updates the existing unit test for heartbeat (HB) events. The test used valid flow group, flow name, and flowExecutionId values, which did not accurately reflect an HB event received from CDC. After updating its values to match, the test failed locally:
Gradle suite > Gradle test > org.apache.gobblin.runtime.DagActionStoreChangeMonitorTest > testProcessMessageWithHeartbeatAndNullDagAction FAILED
    java.lang.NumberFormatException: For input string: ""
        at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
        at java.lang.Long.parseLong(Long.java:601)
        at java.lang.Long.parseLong(Long.java:631)
        at org.apache.gobblin.service.monitoring.DagActionStoreChangeMonitor.processMessage(DagActionStoreChangeMonitor.java:203)
        at org.apache.gobblin.runtime.DagActionStoreChangeMonitorTest$MockDagActionStoreChangeMonitor.processMessageForTest(DagActionStoreChangeMonitorTest.java:97)
        at org.apache.gobblin.runtime.DagActionStoreChangeMonitorTest.testProcessMessageWithHeartbeatAndNullDagAction(DagActionStoreChangeMonitorTest.java:139)
Failed tests:
[org.apache.gobblin.runtime.DagActionStoreChangeMonitorTest::testProcessMessageWithHeartbeatAndNullDagAction

It passes after the update.

Commits

  • My commits all reference JIRA issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "How to write a good git commit message":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

@codecov-commenter

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 38.72%. Comparing base (b44a91e) to head (36a3f58).
Report is 4 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff            @@
##             master    #4004   +/-   ##
=========================================
  Coverage     38.72%   38.72%           
  Complexity     1596     1596           
=========================================
  Files           388      388           
  Lines         15957    15957           
  Branches       1578     1578           
=========================================
  Hits           6180     6180           
  Misses         9283     9283           
  Partials        494      494           

@@ -150,7 +150,7 @@ public void testProcessMessageWithHeartbeatAndNullDagAction() throws SpecNotFoun
   @Test (dependsOnMethods = "testProcessMessageWithHeartbeatAndNullDagAction")
   public void testProcessMessageWithHeartbeatAndFlowInfo() throws SpecNotFoundException {
     Kafka09ConsumerClient.Kafka09ConsumerRecord consumerRecord =
-        wrapDagActionStoreChangeEvent(OperationType.HEARTBEAT, FLOW_GROUP, FLOW_NAME, FLOW_EXECUTION_ID, DagActionValue.RESUME);
+        wrapDagActionStoreChangeEvent(OperationType.HEARTBEAT, "", "", "", DagActionValue.RESUME);

Contributor

Is this a real case? Can a heartbeat message have no flow name but still carry a dag action?

Contributor Author

Not a real case, but I wanted to check some edge-case behavior.

@umustafi
Contributor Author

checks on my branch umustafi#34

if (!jobExecutionPlanDagOptional.isPresent()) {
  return Optional.absent();
}
try {

Contributor

Do we still want the try/catch here?

  flowCompilationTimer.stop(flowMetadata);
  return jobExecutionPlanDagOptional;
} catch (IOException e) {
  log.error("Encountered exception when attempting to compile and perform checks for flow: {}", flowSpec);

Contributor

If you want to log this, maybe print only the FlowSpec IDs.
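
One way to act on this suggestion, sketched under assumptions: the `FlowId` class and `describe` helper below are hypothetical stand-ins and not Gobblin's FlowSpec API. They only illustrate logging a compact group/name identifier rather than the whole spec object.

```java
public class FlowIdLogging {
  // Hypothetical stand-in for a flow spec; a real spec carries far more state
  // than is useful in a single error line.
  static final class FlowId {
    final String flowGroup;
    final String flowName;

    FlowId(String flowGroup, String flowName) {
      this.flowGroup = flowGroup;
      this.flowName = flowName;
    }
  }

  // Builds a short identifier suitable for a log message.
  static String describe(FlowId id) {
    return id.flowGroup + "/" + id.flowName;
  }

  public static void main(String[] args) {
    FlowId id = new FlowId("exampleGroup", "exampleFlow");
    // Log only the identifier instead of the full object.
    System.err.println(
        "Encountered exception when attempting to compile and perform checks for flow: "
            + describe(id));
  }
}
```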

@Will-Lo Will-Lo merged commit 0ab5b33 into apache:master Jul 17, 2024
6 checks passed