Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add min_fold_count_limit optimization #423

Merged
merged 22 commits into from
Aug 5, 2023

Conversation

u9g
Copy link
Contributor

@u9g u9g commented Aug 2, 2023

Stop expanding @fold early if the fold contains no @output fields and the count of the fold is not outputted, while also not invaliding other filters by skipping the rest of the iterator such as filtering for an upperbound. (@filter(< 10))

Copy link
Contributor Author

@u9g u9g left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some thoughts

trustfall_core/src/interpreter/execution.rs Outdated Show resolved Hide resolved
trustfall_core/src/interpreter/execution.rs Outdated Show resolved Hide resolved
trustfall_core/src/interpreter/execution.rs Outdated Show resolved Hide resolved
// We do not apply the min_fold_count_limit optimization if we have an upper bound,
// in the form of max_fold_count_limit, because if we have a required upperbound,
// then we can't stop at the lower bound, if we are required to observe whether we hit the
// upperbound, it's no longer safe to stop at the lower bound.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it too much to repeat the same exact thing another time at the end here after the comma?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah this is a bit hard to read. Can you try rephrasing it a bit, perhaps splitting into two or more sentences with clearer structure?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reworded to now be:

If we must collect the fold up to our upperbound of `max_fold_count_limit`,
then we won't use our lowerbound of `min_fold_count_limit`, as by definition
the upperbound will be larger than the lowerbound.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm getting tripped up on the "by definition" bit. What definition? Don't we get the upper and lower bounds from separate filters, and it's technically possible that we have a nonsensical query with self-contradicting filters that end up violating this "definition"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn’t that be an impossible filter, and since we solve for the solution space of filters, it would be impossible?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Being impossible means it produces no results, not that it can't be executed by a user. The code still has to handle it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can assert on it anyways, if that would be better

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we can assert on it — that's what I'm saying. A query with self-contradicting filters is valid. It's okay to execute it, and Trustfall must not crash on it. We may one day choose to trigger a lint on such queries, but even then, the query is allowed to be executed, and must still produce the correct — empty — result.

Come to think of it, it's probably best to make some query test cases like that. How about these:

{
    Number(min: 30, max: 30) {
        ... on Composite {
            value @output
            
            primeFactor @fold @transform(op: "count") @filter(op: ">=", value: ["$min"]) @filter(op: "<=", value: ["$max"])
        }
    }
}

args:
{
    "min": 2,
    "max": 3,
}
  • This one should return { value: 30 } because the fold count is 3, which is >= 2 and <= 3.
  • The new fold count optimization shouldn't kick in because we still need to check the other filter.
{
    Number(min: 30, max: 30) {
        ... on Composite {
            value @output
            
            primeFactor @fold @transform(op: "count") @filter(op: ">=", value: ["$min"]) @filter(op: "<=", value: ["$max"])
        }
    }
}

args:
{
    "min": 3,
    "max": 2,
}
  • This one should return no results because the filter conditions are >= 3 && <= 2.
  • The new fold count optimization probably shouldn't kick in here either, for the same reason.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a test for this

Copy link
Owner

@obi1kenobi obi1kenobi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just some nitpicks on wording. Writing clear and concise comments is an art worth practicing.

The code change looks broadly correct and the test case changes look great.

// We do not apply the min_fold_count_limit optimization if we have an upper bound,
// in the form of max_fold_count_limit, because if we have a required upperbound,
// then we can't stop at the lower bound, if we are required to observe whether we hit the
// upperbound, it's no longer safe to stop at the lower bound.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah this is a bit hard to read. Can you try rephrasing it a bit, perhaps splitting into two or more sentences with clearer structure?

trustfall_core/src/interpreter/execution.rs Outdated Show resolved Hide resolved
trustfall_core/src/interpreter/execution.rs Outdated Show resolved Hide resolved
trustfall_core/src/interpreter/execution.rs Outdated Show resolved Hide resolved
trustfall_core/src/interpreter/execution.rs Outdated Show resolved Hide resolved
u9g and others added 5 commits August 3, 2023 21:38
Copy link
Owner

@obi1kenobi obi1kenobi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just some nitpicks to make the Rust more idiomatic, otherwise this is good to go.

trustfall_core/src/interpreter/execution.rs Outdated Show resolved Hide resolved
trustfall_core/src/interpreter/execution.rs Outdated Show resolved Hide resolved
trustfall_core/src/interpreter/execution.rs Outdated Show resolved Hide resolved
trustfall_core/src/interpreter/execution.rs Show resolved Hide resolved
&& tagged_fold_count.kind == FoldSpecificFieldKind::Count
})
});
let safe_to_skip_part_of_fold =
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is safe_to_skip_part_of_fold too non-specific as to which part we can skip?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name isn't great. But see the comment above on an idea how we can skip it entirely.

Meta: try naming things for what they represent and not for what should be done with them. Right now, you tend to give a lot of imperative names like that, but those tend to age poorly because they might end up being used differently than originally intended. Names that describe what the value represents, not how it's going to be used, will age much better. Try thinking of some names like that and maybe put a few ideas in reply if you want feedback?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, to name this variable something better, I think fold_values_not_observed or fold_values_ignored would suffice since that's what this boolean represents is that the fold values are not observed, neither the final count of values not the individual values by @fold'ing on them.

Comment on lines 344 to 347
// This optimization is only valid if we know that every
// filter applied to the folded element count is a comparison
// that we can determine the number of elements needed to satisfy
// the filter.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this portion of this comment doesn't really make sense to me:

a comparison that we can determine the number of elements needed to satisfy the filter.

To simplify, you can out "this optimization is only valid if" and replace it with "every filter on the fold count must ..."

But also, I'm not sure this comment is at the right level of abstraction. This comment makes a general statement about "this optimization" where the optimization is never defined — the function's docs say "further optimizations." It doesn't say anything about this match branch, so it would be hard for the reader to connect all the clues to figure out what's going on.

Copy link
Contributor Author

@u9g u9g Aug 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, you're right. Maybe something like If we find a filter that we can't partially evaluate, fail early. would be a better comment.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Much better! Small nit: "partial evaluation" is also the term for an unrelated idea, which might confuse the reader. Perhaps "filter that would require evaluating the fold past its minimum count" or something?

I don't love my suggestion, perhaps you can come up with something even better.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see what you're saying, If we find a filter that requires us to evaluate the entire fold, fail early. is better I think since it doesn't require knowing what we are partially evaluating. Although this gives the mistaken impression that every fold count filter that gets here will need to be fully evaluated which isn't necessarily the case. Another try would be If we don't know how many elements of the fold are required to satisfy this filter, fail early.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The latter is good, I like that. "Requires us to evaluate the entire fold" is technically incorrect as you point out.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay great, updated to that

trustfall_core/src/interpreter/execution.rs Show resolved Hide resolved
Comment on lines 400 to 408
// Queries that do not observe the fold count nor any fold contents may be able to
// be optimized by only partially expanding the fold, just enough to check any filters
// that may be applied to the fold count.
//
// For example, if `@filter(op: ">", value: ["$ten"])` is our only filter on the count
// of the fold, we can stop computing the rest of the fold after seeing we have 11 elements.
Some(min_fold_count_limit) if safe_to_skip_part_of_fold => {
iterator.take(*min_fold_count_limit).collect()
}
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think it's worth tweaking the implementation here a bit to simplify the logic?

Here's what I noticed: safe_to_skip_part_of_fold = false with min_fold_count_limit = Some(...) is the same as safe_to_skip_part_of_fold = true with min_fold_count_limit = None. Also, safe_to_skip_part_of_fold has a pretty awkward name, which is usually a sign that the concept it represents is muddy as well.

In that light, it seems weird that we're splitting the logic so that the caller of this function prepares the two values, but isn't allowed to combine them. Perhaps we can eliminate safe_to_skip_part_of_fold completely by making the caller turn the min_fold_count_limit to None if the fold expansion couldn't be short-circuited?

In that case, the bulk of this comment would be best moved outside into the caller as well, since that's where most of the logic is on whether this optimization can be applied. This function just ends up doing what it's told.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, I removed this if safe_to_skip_part_of_fold from collect_fold() and put it in the caller, and combined the boolean checks, this way collect_fold() only has to execute on the option, it doesn't have to deal with extra conditions if the option already has one built in.

&& tagged_fold_count.kind == FoldSpecificFieldKind::Count
})
});
let safe_to_skip_part_of_fold =
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name isn't great. But see the comment above on an idea how we can skip it entirely.

Meta: try naming things for what they represent and not for what should be done with them. Right now, you tend to give a lot of imperative names like that, but those tend to age poorly because they might end up being used differently than originally intended. Names that describe what the value represents, not how it's going to be used, will age much better. Try thinking of some names like that and maybe put a few ideas in reply if you want feedback?

@obi1kenobi obi1kenobi added A-adapter Area: plugging data sources into the interpreter C-enhancement Category: raise the bar on expectations R-relnotes Release: document this in the release notes of the next release labels Aug 5, 2023
@obi1kenobi obi1kenobi merged commit 3fd0c1c into obi1kenobi:main Aug 5, 2023
@u9g u9g deleted the min_fold_count_limit branch August 5, 2023 19:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-adapter Area: plugging data sources into the interpreter C-enhancement Category: raise the bar on expectations R-relnotes Release: document this in the release notes of the next release
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants