[SPARK-733] Add documentation on use of accumulators in lazy transformation #4022
Test build #25474 has started for PR 4022 at commit
Test build #25474 has finished for PR 4022 at commit
Test PASSed.
lgtm
@@ -1316,7 +1316,35 @@ For accumulator updates performed inside <b>actions only</b>, Spark guarantees t
will only be applied once, i.e. restarted tasks will not update the value. In transformations, users should be aware
of that each task's update may be applied more than once if tasks or job stages are re-executed.

In addition, accumulators do not maintain lineage for the operations that use them. Consequently, accumulator updates are not guaranteed to be executed when made within a lazy transformation like `map()`. Unless something has triggered the evaluation of the lazy transformation that updates the value of the accumulator, subsequent operations will not themselves trigger that evaluation and the value of the accumulator will remain unchanged. The below code fragment demonstrates this issue:
I found this is worded a bit confusingly: what would it mean for an accumulator to "maintain lineage"? I think this is from @JoshRosen's PR description, but IMO it might be better to remove that particular phrasing. What about a slight re-wording:
Accumulators do not change the lazy evaluation model of Spark. Their value is only
updated once the RDD in which they are being modified is computed as part of an
action. The below code fragment demonstrates this property:
I also didn't call it an "issue" because it's just a property of how they work, I don't think it's necessarily a bug.
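For readers without the diff at hand, the property described above can be sketched in plain Python. This is a toy stand-in and not Spark's API: `Accumulator`, `LazyDataset`, and `add_and_pass` are hypothetical names invented for illustration, with `map()` playing the role of a lazy transformation and `collect()` the role of an action.

```python
# Toy model of lazy evaluation. None of these classes are Spark APIs;
# they are assumptions made only to illustrate the behaviour discussed.

class Accumulator:
    """Hypothetical add-only counter, loosely modelled on an accumulator."""

    def __init__(self, value=0):
        self.value = value

    def add(self, amount):
        self.value += amount


class LazyDataset:
    """Hypothetical dataset whose map() only records work for later."""

    def __init__(self, items, funcs=()):
        self._items = list(items)
        self._funcs = tuple(funcs)

    def map(self, func):
        # Transformation: remember func, but run nothing yet.
        return LazyDataset(self._items, self._funcs + (func,))

    def collect(self):
        # Action: apply every recorded function to every item now.
        out = []
        for item in self._items:
            for func in self._funcs:
                item = func(item)
            out.append(item)
        return out


def add_and_pass(acc):
    """Build a function that adds its input to acc and returns it unchanged."""
    def f(x):
        acc.add(x)
        return x
    return f


accum = Accumulator(0)
data = LazyDataset([1, 2, 3, 4])

mapped = data.map(add_and_pass(accum))
value_before_action = accum.value  # map() alone has not run the update

result = mapped.collect()          # the action forces the computation
value_after_action = accum.value   # the updates have now been applied
```

In this sketch the accumulator stays at its initial value until `collect()` runs. Calling `collect()` a second time would apply the updates again, which loosely mirrors the caveat in the diff that updates made in transformations may be applied more than once when work is re-executed.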
Thanks for the suggestion - I've updated the doc.
Test build #25668 has started for PR 4022 at commit
Test build #25668 has finished for PR 4022 at commit
Test FAILed.
Okay new version LGTM! Jenkins, test this please.
Test build #25671 has started for PR 4022 at commit
Test build #25671 has finished for PR 4022 at commit
Test PASSed.
…mation

I've added documentation clarifying the particular lack of clarity highlighted in the relevant JIRA. I've also added code examples for this issue to clarify the explanation.

Author: Ilya Ganelin <ilya.ganelin@capitalone.com>

Closes #4022 from ilganeli/SPARK-733 and squashes the following commits:

587def5 [Ilya Ganelin] Updated to clarify verbage
df3afd7 [Ilya Ganelin] Revert "Partially updated task metrics to make some vars private"
3f6c512 [Ilya Ganelin] Revert "Completed refactoring to make vars in TaskMetrics class private"
58034fb [Ilya Ganelin] Merge remote-tracking branch 'upstream/master' into SPARK-733
4dc2cdb [Ilya Ganelin] Merge remote-tracking branch 'upstream/master' into SPARK-733
3a38db1 [Ilya Ganelin] Verified documentation update by building via jekyll
33b5a2d [Ilya Ganelin] Added code examples for java and python
1fd59b2 [Ilya Ganelin] Updated documentation for accumulators to highlight lazy evaluation issue
5525c20 [Ilya Ganelin] Completed refactoring to make vars in TaskMetrics class private
c64da4f [Ilya Ganelin] Partially updated task metrics to make some vars private

(cherry picked from commit fd3a8a1)
Signed-off-by: Imran Rashid <irashid@cloudera.com>