[HUDI-7071] Throw exceptions when clustering/index job fail #10050

Merged (4 commits) on Dec 2, 2023

@askwang askwang commented Nov 10, 2023

Change Logs

When a job throws an exception such as org.apache.hudi.exception.HoodieException: unable to read next record from parquet file, the exception is not handled. As a result, the job does not fail and its final state is reported as success.

Impact

none

Risk level (write none, low medium or high below)

If medium or high, explain what verification was done to mitigate the risks.

Documentation Update

Describe any necessary documentation update if there is any new feature, config, or user-facing change

  • The config description must be updated if new configs are added or the default value of the configs are changed
  • Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the
    ticket number here and follow the instruction to make
    changes to the website.

Contributor's checklist

  • Read through contributor's guide
  • Change Logs and Impact were stated clearly
  • Adequate tests were added if applicable
  • CI passed

LOG.error(errorMessage);
if (t instanceof HoodieException) {
  throw new HoodieException(errorMessage, t);
}

Contributor

Let's not modify the very basic utilities, it is shared by many components.

@nsivabalan added the release-0.14.1 and priority:critical labels Nov 15, 2023
@danny0405
Contributor

Is it fixed via: #10108 ?

@askwang
Contributor Author

askwang commented Nov 16, 2023

Is it fixed via: #10108 ?

It's good. All services that call UtilHelpers.retry have a similar problem. This PR fixes the clustering and index jobs along the same lines.

@askwang changed the title from "[HUDI-7071] Throw exception when clustering/compactin job fail" to "[HUDI-7071] Throw exceptions when clustering/index job fail" on Nov 16, 2023

if (ret != 0) {
  throw new HoodieException("Fail to run compaction for " + cfg.tableName + ", return code: " + ret);
}
LOG.info("Success to run compaction for " + cfg.tableName);
jsc.stop();

Contributor

Could you help me understand why the try-catch block is removed here?
If L175 failed and threw an exception, then L180 would not be executed.

Contributor Author

L175 calls UtilHelpers.retry internally, which has its own try-catch block, so the outer try-catch block here is unreachable.
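
To illustrate the point made in this thread, here is a minimal sketch of a retry helper that catches exceptions internally and reports failure through its return code. `RetrySketch`, `retry`, and `runJob` are hypothetical names and this is not Hudi's actual `UtilHelpers.retry`; the only behavior taken from the discussion is that exceptions are caught inside the helper and that a failed job yields -1. Under that assumption, a caller's catch block never fires, so the return code has to be checked explicitly.

```java
import java.util.concurrent.Callable;

// Hypothetical stand-in for illustration only; NOT Hudi's UtilHelpers.retry.
final class RetrySketch {

    // Runs the task up to maxAttempts times. Every exception is caught here,
    // so failure is only visible to the caller as a non-zero return code.
    static int retry(int maxAttempts, Callable<Integer> task) {
        int ret = -1;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                ret = task.call();
                if (ret == 0) {
                    return 0;
                }
            } catch (Throwable t) {
                System.err.println("Attempt " + attempt + " failed: " + t.getMessage());
                ret = -1;
            }
        }
        return ret;
    }

    // Mirrors the pattern in the diff: turn a non-zero return code into an exception
    // so the process does not finish in a "success" state.
    static void runJob(Callable<Integer> task) {
        int ret = retry(2, task);
        if (ret != 0) {
            throw new IllegalStateException("Job failed, return code: " + ret);
        }
    }

    public static void main(String[] args) {
        Callable<Integer> failing = () -> {
            throw new RuntimeException("unable to read next record");
        };

        try {
            int ret = retry(2, failing);
            System.out.println("return code: " + ret); // prints "return code: -1"
        } catch (RuntimeException e) {
            // Not reached: retry() already swallowed the exception.
            System.out.println("caught in caller");
        }

        try {
            runJob(failing);
        } catch (IllegalStateException e) {
            System.out.println("job failed loudly: " + e.getMessage());
        }
    }
}
```

In this sketch, wrapping the call to `retry` in a try-catch adds nothing, which is the argument given above for dropping the outer block and checking `ret` instead.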

}
LOG.info(resultMsg + " success");
jsc.stop();

Contributor

Same here. I think we can add a try-catch block here to make sure jsc exits gracefully.

Contributor

Looks like HoodieIndexer also uses UtilHelpers.retry

Contributor Author

Yes. HoodieCompactor, HoodieClusteringJob, and HoodieIndexer all use UtilHelpers.retry, so there is no need to add a try-catch block in the main method.

}
LOG.info(resultMsg + " success");
jsc.stop();

Contributor

Same here

Contributor Author

There was no try-catch block in HoodieClusteringJob originally. If cluster throws a HoodieException, the job returns -1 and jsc stops normally.
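
The reviewer's concern about stopping the context gracefully can be sketched separately from the retry question. The example below is only an illustration and not what this PR does: `StopSketch`, `FakeContext`, and `runAndStop` are hypothetical names, and `FakeContext` merely stands in for a Spark context. It shows a try/finally arrangement in which `stop()` runs whether or not the return-code check throws.

```java
// Hypothetical illustration; FakeContext stands in for a Spark context.
final class StopSketch {

    static final class FakeContext {
        boolean stopped = false;

        void stop() {
            stopped = true;
        }
    }

    // Throws when the return code is non-zero, but always stops the context first.
    static void runAndStop(FakeContext ctx, int ret) {
        try {
            if (ret != 0) {
                throw new IllegalStateException("Job failed, return code: " + ret);
            }
            System.out.println("Job succeeded");
        } finally {
            // Runs on both the success path and the exception path.
            ctx.stop();
        }
    }

    public static void main(String[] args) {
        FakeContext ctx = new FakeContext();
        try {
            runAndStop(ctx, -1);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage() + ", stopped=" + ctx.stopped);
        }
    }
}
```

Whether this is needed depends on how the process exits after the exception, which the thread does not settle.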

@askwang
Contributor Author

askwang commented Nov 24, 2023

plz cc @CTTY @danny0405

@danny0405
Contributor

cc @CTTY for the review~

@askwang askwang requested a review from CTTY December 1, 2023 08:30
Contributor

@CTTY CTTY left a comment

LGTM

5 participants