Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(rust): Tighten up error checking on join keys #17517

Merged
merged 3 commits into from
Jul 15, 2024

Conversation

wence-
Copy link
Collaborator

@wence- wence- commented Jul 9, 2024

After discussion in #17184:

  • Reject multiple join key expressions if they are not all elementwise
  • Emit UserWarning if a coalescing join is requested but not all key expressions are column references, in which case the coalescing behaviour is turned off.

Copy link

codecov bot commented Jul 9, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 80.68%. Comparing base (f90753b) to head (ee2cb68).
Report is 1 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main   #17517      +/-   ##
==========================================
- Coverage   80.69%   80.68%   -0.01%     
==========================================
  Files        1484     1484              
  Lines      195421   195417       -4     
  Branches     2782     2782              
==========================================
- Hits       157695   157679      -16     
- Misses      37214    37226      +12     
  Partials      512      512              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

// Otherwise every expression must be elementwise
// so that we are guaranteed the keys for a join
// are all the same length.
|| aexprs.iter().all(|expr_ir| {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can use has_aexpr here.

I do think we should be stricter here. For instance a head is not allowed as its length doesn't match the dataframe. Probably allowing only elementwise makes most sense, as it doesn't seem sensible to join by a col(..).sort() anyway.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do think we should be stricter here.

Should I drop the aexprs.len() <= 1 part of the condition?

Is there a full classification of "elementwise" yet? As you note, sort answers "yes" to single_aexpr_is_elementwise, but should probably (?) be rejected.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can remove single_aexpr_is_elementwise and replace it with is_streamable. It seems redundant. is_streamable is more correctly implemented as well.

Should I drop the aexprs.len() <= 1 part of the condition?

Yes.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

// Otherwise every expression must be elementwise
// so that we are guaranteed the keys for a join
// are all the same length.
|| aexprs.iter().all(|expr_ir| {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

};
single_aexpr_is_elementwise(ae)
});
let is_elementwise = is_streamable(expr_ir.node(), arena, Context::Default);
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: Should one perhaps introduce an aliasing name (is_elementwise?) to allow for the option that the implementation of is_streamable (which might encompass more than just elementwise) diverges from the usage here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, might be more future proof indeed. Currently we see it as anything that is elementwise and can operate on batches.

If more than one join key is provided, they should all be elementwise
so that all computed join keys for a given table are the same length.

We use is_streamable to define if an AExpr is elementwise
A key-coalescing join is only supported if all join keys are column
references. If they are not, we turn off coalescing. Join in the case
that the user explicitly requested coalescing but is not going to get it.
Use is_streamable instead in the one remaining place it is used (slice
pushdown).
@ritchie46 ritchie46 merged commit e570e77 into pola-rs:main Jul 15, 2024
26 checks passed
@wence- wence- deleted the wence/fix/17184 branch July 15, 2024 14:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
fix Bug fix rust Related to Rust Polars
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants