-
Notifications
You must be signed in to change notification settings - Fork 28.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-48928] Log Warning for Calling .unpersist() on Locally Checkpointed RDDs #47391
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
It is unclear to me what the value of adding this to unpersist is. For example, if there is a node loss, things will fail - even without an unpersist call. |
I think it at least gives users an explict warning in first place rather than relying on the lazily triggered error msg in the case of the misuse of local checkpoint. And local checkpoint is still used by the public API today, |
My point is, use of persist/unpersist is orthogonal to this - so why special case only here ? |
Yes, you are right that the use of persist/unpersist can be orthogonal to this. However, we special case because calling
When user invokes |
If there is no executor failure/loss, whether unpersist is called or not, there should be no failure as there is locally checkpointed data on the node. |
You're right that when there is no executor failure, there is no failure when you call
The last line results in an exception because the checkpointed RDD is evicted, even without node failure. Logging a warning after |
Ah, I see what you mean - sorry, I had forgotten that this is simply relying on storage level and not actually writing to the local disk (similar to hdfs for regular checkpoint). Make sense to add this. |
Merged to master, thanks for fixing this @mingkangli-db ! Also thanks to everyone for answering my queries - I really was misremembering how local checkpoint was happening. |
…inted RDDs ### What changes were proposed in this pull request? This pull request proposes logging a warning message when the `.unpersist()` method is called on RDDs that have been locally checkpointed. The goal is to inform users about the potential risks associated with unpersisting locally checkpointed RDDs without changing the current behavior of the method. ### Why are the changes needed? Local checkpointing truncates the lineage of an RDD, preventing it from being recomputed from its source. If a locally checkpointed RDD is unpersisted, it loses its data and cannot be regenerated, potentially leading to job failures if subsequent actions or transformations are attempted on the RDD (which was seen on some user workloads). Logging a warning message helps users avoid such pitfalls and aids in debugging. ### Does this PR introduce _any_ user-facing change? Yes, this PR adds a warning log message when .unpersist() is called on a locally checkpointed RDD, but it does not alter any existing behavior. ### How was this patch tested? This PR does not change any existing behavior and therefore no tests are added. ### Was this patch authored or co-authored using generative AI tooling? No. Closes apache#47391 from mingkangli-db/warning_unpersist. Authored-by: Mingkang Li <mingkang.li@databricks.com> Signed-off-by: Mridul Muralidharan <mridul<at>gmail.com>
…inted RDDs ### What changes were proposed in this pull request? This pull request proposes logging a warning message when the `.unpersist()` method is called on RDDs that have been locally checkpointed. The goal is to inform users about the potential risks associated with unpersisting locally checkpointed RDDs without changing the current behavior of the method. ### Why are the changes needed? Local checkpointing truncates the lineage of an RDD, preventing it from being recomputed from its source. If a locally checkpointed RDD is unpersisted, it loses its data and cannot be regenerated, potentially leading to job failures if subsequent actions or transformations are attempted on the RDD (which was seen on some user workloads). Logging a warning message helps users avoid such pitfalls and aids in debugging. ### Does this PR introduce _any_ user-facing change? Yes, this PR adds a warning log message when .unpersist() is called on a locally checkpointed RDD, but it does not alter any existing behavior. ### How was this patch tested? This PR does not change any existing behavior and therefore no tests are added. ### Was this patch authored or co-authored using generative AI tooling? No. Closes apache#47391 from mingkangli-db/warning_unpersist. Authored-by: Mingkang Li <mingkang.li@databricks.com> Signed-off-by: Mridul Muralidharan <mridul<at>gmail.com>
…inted RDDs ### What changes were proposed in this pull request? This pull request proposes logging a warning message when the `.unpersist()` method is called on RDDs that have been locally checkpointed. The goal is to inform users about the potential risks associated with unpersisting locally checkpointed RDDs without changing the current behavior of the method. ### Why are the changes needed? Local checkpointing truncates the lineage of an RDD, preventing it from being recomputed from its source. If a locally checkpointed RDD is unpersisted, it loses its data and cannot be regenerated, potentially leading to job failures if subsequent actions or transformations are attempted on the RDD (which was seen on some user workloads). Logging a warning message helps users avoid such pitfalls and aids in debugging. ### Does this PR introduce _any_ user-facing change? Yes, this PR adds a warning log message when .unpersist() is called on a locally checkpointed RDD, but it does not alter any existing behavior. ### How was this patch tested? This PR does not change any existing behavior and therefore no tests are added. ### Was this patch authored or co-authored using generative AI tooling? No. Closes apache#47391 from mingkangli-db/warning_unpersist. Authored-by: Mingkang Li <mingkang.li@databricks.com> Signed-off-by: Mridul Muralidharan <mridul<at>gmail.com>
What changes were proposed in this pull request?
This pull request proposes logging a warning message when the
.unpersist()
method is called on RDDs that have been locally checkpointed. The goal is to inform users about the potential risks associated with unpersisting locally checkpointed RDDs without changing the current behavior of the method.Why are the changes needed?
Local checkpointing truncates the lineage of an RDD, preventing it from being recomputed from its source. If a locally checkpointed RDD is unpersisted, it loses its data and cannot be regenerated, potentially leading to job failures if subsequent actions or transformations are attempted on the RDD (which was seen on some user workloads). Logging a warning message helps users avoid such pitfalls and aids in debugging.
Does this PR introduce any user-facing change?
Yes, this PR adds a warning log message when .unpersist() is called on a locally checkpointed RDD, but it does not alter any existing behavior.
How was this patch tested?
This PR does not change any existing behavior and therefore no tests are added.
Was this patch authored or co-authored using generative AI tooling?
No.