Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEA] More improvements for Tiered Project #11182

Open
revans2 opened this issue Jul 12, 2024 · 0 comments
Open

[FEA] More improvements for Tiered Project #11182

revans2 opened this issue Jul 12, 2024 · 0 comments
Labels
feature request New feature or request performance A performance related task/issue

Comments

@revans2
Copy link
Collaborator

revans2 commented Jul 12, 2024

Is your feature request related to a problem? Please describe.
When tiered project first went in, it was very conservative in what it tried to deduplicate. #11179 extends that some, but it is not really a complete solution to what we can/should deduplicate.

The complexity is really around wanting to be sure that we don't get into a situation where an expression that could return an error is executed when a conditional (if/else, case when, coalesce) could have prevented it from executing. But, we also want to deduplicate expressions that would execute no matter what, even if they are under a conditional.

I think the algorithm we want to do this is to walk all of the children of all expressions, but keep some metadata about sub-expressions that appear in different conditional paths.

When we see an expression, if it has no side effects, then we just add it to the dedupe map.
If it does have side effects and we are not under any conditional expression, then we just add it to the dedupe map.
If we are under a conditional expression, then we conditionally add it to the dedupe map. Meaning that we include the conditional paths that would be taken to get to execute that expression.

If an expression is conditionally added to the dedupe map, and that expression is already in the map as unconditionally executed, then we can upgrade it to a regular deduplication, because we know it is already going to run no matter what.

If an unconditional expression is added to the map and there are conditional versions in there already, then we can upgrade all of them to always be executed.

After that we need to walk all of the conditionally added expressions and see if at least one version of them will be executed on all possible conditional paths.

For example:

If(x > y, a + 5, ((a + 5) DIV 100)) in ANSI mode.

Here we would end up with (a+5) conditionally added for [expression 0, first if, condition is true] and [expression 0, first if, condition is false] branches (I am just making up an arbitrary names right now). Because it appeared in both branches, it can be upgraded to always be deduplicated.

Another Example:

CASE 
  WHEN info['test'] like '123%' THEN 'FOO'
  WHEN info['test'] like '124%' THEN 'BAR'
  WHEN info['test'] like '125%' THEN 'BAZ'
  ELSE info['test']
END

Again in ANSI mode.

Here info['test'] can fail if 'test' is not in info. But because the first predicate includes info['test'] all of the others can be deduplicated because the first predicate is guaranteed to be executed.

But if we change it slightly.

CASE 
  WHEN map_contains_key(info, 'fixed_test') THEN info['fixed_test']
  WHEN info['test'] like '123%' THEN 'FOO'
  WHEN info['test'] like '124%' THEN 'BAR'
  WHEN info['test'] like '125%' THEN 'BAZ'
  ELSE info['test']
END

Now the first predicate does not include info['test'] and we cannot deduplicate any of them because there is a path through expression where info['test'] would never be executed (The first predicate is true).

But if we change it yet again

CASE 
  WHEN test_info_fixed THEN info['test']
  WHEN info['test'] like '123%' THEN 'FOO'
  WHEN info['test'] like '124%' THEN 'BAR'
  WHEN info['test'] like '125%' THEN 'BAZ'
  ELSE info['test']
END

Now the first predicate does not include info['test'], so we don't get it for free. But we can check each of the paths we see that no matter what happens it will be executed. test_info_fixed == true it executes in the THEN clause. test_info_fixed == false the next predicate will execute so we are good to mark it for deduplication.

Be aware that this can happen between top level expressions in a project. So if only one of the top level expressions upgrades something to always execute, then all of them can dedupe it, because it will execute for at least one of them. Note that this might change the error message that would be thrown, if there are multiple errors that could show up in a single batch. But it will still be a valid error message for that batch.

@revans2 revans2 added feature request New feature or request ? - Needs Triage Need team to review and classify performance A performance related task/issue labels Jul 12, 2024
@mattahrens mattahrens removed the ? - Needs Triage Need team to review and classify label Aug 15, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature request New feature or request performance A performance related task/issue
Projects
None yet
Development

No branches or pull requests

2 participants