[FEA] decide skip agg or not based on not only first batch #11770

Open
binmahone opened this issue Nov 26, 2024 · 2 comments
Labels
performance A performance related task/issue

Comments

@binmahone
Collaborator

This is a follow-up to #11712, as requested in #11712 (comment).

(Regarding the skip-agg logic):

Right now we check the first batch and decide for the entire task if we want to skip full aggregation or not. I think it would probably be better if we did it for each batch individually. We could even do it for each batch when we merge them too.
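To make the difference between the two policies concrete, here is a minimal Python sketch. It is an illustration only, not the plugin's code: the row shape, the `SKIP_RATIO` threshold, and every function name are hypothetical, and the distinct-key ratio is a stand-in for whatever reduction check the real implementation uses.

```python
from typing import Dict, Iterable, List, Optional, Tuple

Row = Tuple[str, int]   # hypothetical row shape: (group key, value)
Batch = List[Row]

# Hypothetical threshold: skip when aggregation would keep >= 90% of the rows.
SKIP_RATIO = 0.9


def aggregate(batch: Batch) -> Batch:
    """Sum values per key; keys come out in first-seen order."""
    sums: Dict[str, int] = {}
    for key, value in batch:
        sums[key] = sums.get(key, 0) + value
    return list(sums.items())


def should_skip(batch: Batch) -> bool:
    """True when the batch would barely shrink under aggregation."""
    if not batch:
        return False
    distinct_keys = len({key for key, _ in batch})
    return distinct_keys / len(batch) >= SKIP_RATIO


def first_batch_policy(batches: Iterable[Batch]) -> List[Batch]:
    """Behaviour described above: the first batch decides for the whole task."""
    skip: Optional[bool] = None
    out: List[Batch] = []
    for batch in batches:
        if skip is None:
            skip = should_skip(batch)  # decided once, then reused
        out.append(batch if skip else aggregate(batch))
    return out


def per_batch_policy(batches: Iterable[Batch]) -> List[Batch]:
    """Proposed behaviour: every batch gets its own skip decision."""
    return [batch if should_skip(batch) else aggregate(batch) for batch in batches]
```

With `batches = [[("a", 1), ("b", 2), ("c", 3)], [("x", 1), ("x", 2), ("x", 3)]]`, the first-batch policy passes both batches through unaggregated because the first batch has all-distinct keys, while the per-batch policy still reduces the second batch to `[("x", 6)]`.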

The priority of this issue depends on whether we need it in real customer cases. For now we are just leaving it as a TODO.

@binmahone binmahone added ? - Needs Triage Need team to review and classify feature request New feature or request labels Nov 26, 2024
@revans2
Collaborator

revans2 commented Nov 26, 2024

I just noticed that Spark has a very similar feature that is in all versions of Spark that we support.

https://issues.apache.org/jira/browse/SPARK-31973 We may want to go off of this and the corresponding configs as a starting point to evaluate what we really want to do.

@mattahrens mattahrens added performance A performance related task/issue and removed feature request New feature or request ? - Needs Triage Need team to review and classify labels Nov 26, 2024
@binmahone
Collaborator Author

@revans2 https://issues.apache.org/jira/browse/SPARK-31973 looks like it is still in progress. Why did you say it "is in all versions of Spark that we support"?


3 participants