[SPARK-48268][CORE] Add a configuration for SparkContext.setCheckpointDir #46571

Closed
wants to merge 7 commits into from

Member

@HyukjinKwon HyukjinKwon commented May 14, 2024

What changes were proposed in this pull request?

This PR adds the spark.checkpoint.dir configuration so that users can set the checkpoint directory when they submit their application.

Why are the changes needed?

Separating the configuration from the application logic lets the same app run with a different checkpoint directory.
In addition, this would be useful for Spark Connect with #46570.
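
As an illustration of the motivation above, here is a minimal sketch of submitting the same application with two different checkpoint directories. It only assembles a spark-submit command line using the standard `--conf key=value` flag; the helper name `build_submit_command`, the application file `app.py`, and both paths are hypothetical and not part of this PR.

```python
import shlex


def build_submit_command(app_path, checkpoint_dir):
    """Return a spark-submit argument list that sets spark.checkpoint.dir.

    Hypothetical helper for illustration only; it is not part of Spark.
    """
    return [
        "spark-submit",
        "--conf", f"spark.checkpoint.dir={checkpoint_dir}",
        app_path,
    ]


# Same application, two checkpoint locations (both paths are made up).
staging = build_submit_command("app.py", "/tmp/checkpoints/staging")
production = build_submit_command("app.py", "hdfs:///checkpoints/prod")

# Print shell-ready command lines for inspection.
print(shlex.join(staging))
print(shlex.join(production))
```

The application code itself does not need to call SparkContext.setCheckpointDir in this setup; the directory is chosen at submit time.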

Does this PR introduce any user-facing change?

Yes, it adds a new user-facing configuration.

How was this patch tested?

A unit test was added.

Was this patch authored or co-authored using generative AI tooling?

No.

@HyukjinKwon HyukjinKwon marked this pull request as draft May 14, 2024 05:09
@github-actions github-actions bot added the CORE label May 14, 2024
@HyukjinKwon HyukjinKwon changed the title from "[DO-NOT-MERGE][SPARK-48268][CORE] Add a configuration for SparkContext.setCheckpointDir" to "[SPARK-48268][CORE] Add a configuration for SparkContext.setCheckpointDir" May 15, 2024
@HyukjinKwon HyukjinKwon marked this pull request as ready for review May 15, 2024 05:07
Member

@dongjoon-hyun dongjoon-hyun left a comment

I'm wondering about a corner case: what happens when users have different values for this configuration and SparkContext.setCheckpointDir? It can happen during migration as a kind of human mistake.

  • I guess SparkContext.setCheckpointDir will override this configuration. If so, please document the precedence in the config documentation.
  • Also, I'm wondering if we want to show a proper warning or even to raise exceptions because this is a critical mistake.

@HyukjinKwon
Member Author

Also, I'm wondering if we want to show a proper warning or even to raise exceptions because this is a critical mistake.

I think this one is fine. We already have similar pairs of a configuration and a setter, such as spark.log.level vs. setLogLevel.
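
The precedence discussed in this exchange can be sketched with a toy model. It assumes, as the reviewer guesses above, that the configured value serves as a startup default and that a later explicit call overrides it without a warning. `ToyContext` and `set_checkpoint_dir` are hypothetical names for illustration; this is not Spark's implementation.

```python
from typing import Dict, Optional


class ToyContext:
    """Toy stand-in for a context object; NOT Spark's SparkContext."""

    def __init__(self, conf: Dict[str, str]) -> None:
        self.checkpoint_dir: Optional[str] = None
        configured = conf.get("spark.checkpoint.dir")
        if configured is not None:
            # The configured value acts as the default at startup.
            self.set_checkpoint_dir(configured)

    def set_checkpoint_dir(self, directory: str) -> None:
        # Assumed precedence: the most recent explicit call wins, silently.
        self.checkpoint_dir = directory


ctx = ToyContext({"spark.checkpoint.dir": "/tmp/from-conf"})
default_dir = ctx.checkpoint_dir      # value taken from the configuration
ctx.set_checkpoint_dir("/tmp/from-code")
overridden_dir = ctx.checkpoint_dir   # value set by the explicit call
```

Under this assumption, a mismatch between the two values is resolved in favor of the explicit call, which is the same pattern the reply points to with spark.log.level and setLogLevel.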

@github-actions github-actions bot added the DOCS label May 15, 2024
Contributor

@mridulm mridulm left a comment

Just a minor comment on wording. The code change itself looks good to me.

docs/configuration.md (resolved)
@HyukjinKwon
Member Author

Merged to master.

6 participants