Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add min_fold_count_limit optimization #423

Merged
merged 22 commits into from
Aug 5, 2023
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
Show all changes
22 commits
Select commit Hold shift + click to select a range
2922874
add min_fold_count_limit optimization
u9g Aug 2, 2023
6233217
Merge branch 'main' into min_fold_count_limit
u9g Aug 4, 2023
4d5927a
use usize_from_field_value in get_min_fold_count_limit
u9g Aug 4, 2023
8d8b871
Update trustfall_core/src/interpreter/execution.rs
u9g Aug 4, 2023
a5574af
Update trustfall_core/src/interpreter/execution.rs
u9g Aug 4, 2023
f960c05
Improve comment about the optimization
u9g Aug 4, 2023
d197c10
Merge branch 'main' into min_fold_count_limit
u9g Aug 4, 2023
acd55be
disable optimization when we have a tag on the count of folded elements
u9g Aug 4, 2023
22af47d
rewrite comment again
u9g Aug 4, 2023
735684a
improve comments
u9g Aug 4, 2023
c8a6e23
Merge branch 'main' into min_fold_count_limit
u9g Aug 5, 2023
9b7fe6a
use values() over iter()
u9g Aug 5, 2023
a68060f
improve variable names
u9g Aug 5, 2023
78a49bb
Update trustfall_core/src/interpreter/execution.rs
u9g Aug 5, 2023
62d776e
Merge branch 'min_fold_count_limit' of https://github.com/u9g/trustfa…
u9g Aug 5, 2023
9e6545b
use .any
u9g Aug 5, 2023
625392d
combine booleans to make safe_to_skip_part_of_fold arg
u9g Aug 5, 2023
2318846
Merge branch 'main' into min_fold_count_limit
u9g Aug 5, 2023
3e2caa1
remove comments that point out something obvious and rely on implem
u9g Aug 5, 2023
17ff414
add comment and return None if we try getting the lowerbound of a fil…
u9g Aug 5, 2023
8814461
get rid of safe_to_skip_part_of_fold and move around comments
u9g Aug 5, 2023
7e6354a
improve comment for why we return None
u9g Aug 5, 2023
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
81 changes: 77 additions & 4 deletions trustfall_core/src/interpreter/execution.rs
Original file line number Diff line number Diff line change
Expand Up @@ -250,6 +250,8 @@ fn get_max_fold_count_limit(carrier: &mut QueryCarrier, fold: &IRFold) -> Option
let query_arguments = &carrier.query.as_ref().expect("query was not returned").arguments;
for post_fold_filter in fold.post_filters.iter() {
let next_limit = match post_fold_filter {
// Equals and OneOf must be visited here as they are not visited
// in `get_min_fold_count_limit`
Operation::Equals(FoldSpecificFieldKind::Count, Argument::Variable(var_ref))
| Operation::LessThanOrEqual(
FoldSpecificFieldKind::Count,
Expand Down Expand Up @@ -286,10 +288,52 @@ fn get_max_fold_count_limit(carrier: &mut QueryCarrier, fold: &IRFold) -> Option
result
}

/// If this IRFold has a filter on the folded element count, and that filter imposes
/// a min size that can be statically determined, return that min size so it can
/// be used for further optimizations. Otherwise, return None.
fn get_min_fold_count_limit(carrier: &mut QueryCarrier, fold: &IRFold) -> Option<usize> {
let mut result: Option<usize> = None;

let query_arguments = &carrier.query.as_ref().expect("query was not returned").arguments;
for post_fold_filter in fold.post_filters.iter() {
let next_limit = match post_fold_filter {
// We do not need to visit Equals and OneOf here,
// since those will be handled by `get_max_fold_count_limit`
Operation::GreaterThanOrEqual(
FoldSpecificFieldKind::Count,
Argument::Variable(var_ref),
) => {
let variable_value = query_arguments[&var_ref.variable_name].as_usize().unwrap();
Some(variable_value)
}
Operation::GreaterThan(FoldSpecificFieldKind::Count, Argument::Variable(var_ref)) => {
let variable_value = query_arguments[&var_ref.variable_name].as_usize().unwrap();
Some(variable_value + 1)
u9g marked this conversation as resolved.
Show resolved Hide resolved
}
_ => None,
u9g marked this conversation as resolved.
Show resolved Hide resolved
};

match (result, next_limit) {
(None, _) => result = next_limit,
(Some(l), Some(r)) if l < r => result = next_limit,
_ => {}
}
}

result
}

fn collect_fold_elements<'query, Vertex: Clone + Debug + 'query>(
mut iterator: ContextIterator<'query, Vertex>,
max_fold_count_limit: &Option<usize>,
min_fold_count_limit: &Option<usize>,
obi1kenobi marked this conversation as resolved.
Show resolved Hide resolved
no_outputs_in_fold: bool,
has_output_on_fold_count: bool,
) -> Option<Vec<DataContext<Vertex>>> {
// We do not apply the min_fold_count_limit optimization if we have an upper bound,
// in the form of max_fold_count_limit, because if we have a required upperbound,
// then we can't stop at the lower bound, if we are required to observe whether we hit the
// upperbound, it's no longer safe to stop at the lower bound.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it too much to repeat the same exact thing another time at the end here after the comma?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah this is a bit hard to read. Can you try rephrasing it a bit, perhaps splitting into two or more sentences with clearer structure?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reworded to now be:

If we must collect the fold up to our upperbound of `max_fold_count_limit`,
then we won't use our lowerbound of `min_fold_count_limit`, as by definition
the upperbound will be larger than the lowerbound.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm getting tripped up on the "by definition" bit. What definition? Don't we get the upper and lower bounds from separate filters, and it's technically possible that we have a nonsensical query with self-contradicting filters that end up violating this "definition"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn’t that be an impossible filter, and since we solve for the solution space of filters, it would be impossible?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Being impossible means it produces no results, not that it can't be executed by a user. The code still has to handle it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can assert on it anyways, if that would be better

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we can assert on it — that's what I'm saying. A query with self-contradicting filters is valid. It's okay to execute it, and Trustfall must not crash on it. We may one day choose to trigger a lint on such queries, but even then, the query is allowed to be executed, and must still produce the correct — empty — result.

Come to think of it, it's probably best to make some query test cases like that. How about these:

{
    Number(min: 30, max: 30) {
        ... on Composite {
            value @output
            
            primeFactor @fold @transform(op: "count") @filter(op: ">=", value: ["$min"]) @filter(op: "<=", value: ["$max"])
        }
    }
}

args:
{
    "min": 2,
    "max": 3,
}
  • This one should return { value: 30 } because the fold count is 3, which is >= 2 and <= 3.
  • The new fold count optimization shouldn't kick in because we still need to check the other filter.
{
    Number(min: 30, max: 30) {
        ... on Composite {
            value @output
            
            primeFactor @fold @transform(op: "count") @filter(op: ">=", value: ["$min"]) @filter(op: "<=", value: ["$max"])
        }
    }
}

args:
{
    "min": 3,
    "max": 2,
}
  • This one should return no results because the filter conditions are >= 3 && <= 2.
  • The new fold count optimization probably shouldn't kick in here either, for the same reason.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a test for this

if let Some(max_fold_count_limit) = max_fold_count_limit {
// If this fold has more than `max_fold_count_limit` elements,
// it will get filtered out by a post-fold filter.
Expand Down Expand Up @@ -318,9 +362,28 @@ fn collect_fold_elements<'query, Vertex: Clone + Debug + 'query>(

Some(fold_elements)
} else {
// We weren't able to find any early-termination condition for materializing the fold,
// so materialize the whole thing and return it.
Some(iterator.collect())
let collected = match min_fold_count_limit {
// If we have a min_fold_count_limit then it is safe to .take(min_fold_count_limit)
// because .take will take at max min_fold_count_limit elements, and if the iterator has
// less it will just take all of them.
u9g marked this conversation as resolved.
Show resolved Hide resolved

// This optimization requires that the user can never observe that we didn't fully
// iterate the entire iterator. This is only possible if the fold has no outputs and
// the user does not have output the count of elements in the fold.

u9g marked this conversation as resolved.
Show resolved Hide resolved
// Additionally, we only apply this optimization if we don't have an upper bound,
// in the form of max_fold_count_limit, because if we dont have a required upperbound
// can we stop at the lower bound, if we are required to observe whether we hit the
// upperbound, it's no longer safe to stop at the lower bound.
u9g marked this conversation as resolved.
Show resolved Hide resolved
Some(min_fold_count_limit) if no_outputs_in_fold && !has_output_on_fold_count => {
iterator.take(*min_fold_count_limit).collect()
}
// We weren't able to find any early-termination condition for materializing the fold,
// so materialize the whole thing and return it.
_ => iterator.collect(),
};

Some(collected)
}
}

Expand Down Expand Up @@ -411,6 +474,10 @@ fn compute_fold<'query, AdapterT: Adapter<'query> + 'query>(
let fold_component = fold.component.clone();
let fold_eid = fold.eid;
let max_fold_size = get_max_fold_count_limit(carrier, fold.as_ref());
let min_fold_size = get_min_fold_count_limit(carrier, fold.as_ref());
let no_outputs_in_fold = fold.component.outputs.is_empty();
let has_output_on_fold_count =
fold.fold_specific_outputs.values().any(|x| *x == FoldSpecificFieldKind::Count);
u9g marked this conversation as resolved.
Show resolved Hide resolved
let moved_fold = fold.clone();
let folded_iterator = edge_iterator.filter_map(move |(mut context, neighbors)| {
let imported_tags = context.imported_tags.clone();
Expand All @@ -434,7 +501,13 @@ fn compute_fold<'query, AdapterT: Adapter<'query> + 'query>(
let fold_elements = if fold_exists {
// N.B.: Note the `?` at the end here!
// This lets us early-discard folds that failed a post-processing filter.
Some(collect_fold_elements(computed_iterator, &max_fold_size)?)
Some(collect_fold_elements(
computed_iterator,
&max_fold_size,
&min_fold_size,
u9g marked this conversation as resolved.
Show resolved Hide resolved
no_outputs_in_fold,
has_output_on_fold_count,
)?)
} else {
None
};
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -118,16 +118,6 @@ TestInterpreterOutputTrace(
),
Opid(15): TraceOp(
opid: Opid(15),
parent_opid: Some(Opid(12)),
content: YieldFrom(ResolveNeighborsInner(2, Prime(PrimeNumber(5)))),
),
Opid(16): TraceOp(
opid: Opid(16),
parent_opid: Some(Opid(12)),
content: OutputIteratorExhausted,
),
Opid(17): TraceOp(
opid: Opid(17),
parent_opid: Some(Opid(4)),
content: YieldInto(SerializableContext(
active_vertex: Some(Composite(CompositeNumber(30, [
Expand Down Expand Up @@ -156,18 +146,12 @@ TestInterpreterOutputTrace(
Vid(2): Some(Prime(PrimeNumber(3))),
},
),
SerializableContext(
active_vertex: Some(Prime(PrimeNumber(5))),
vertices: {
Vid(2): Some(Prime(PrimeNumber(5))),
},
),
]),
},
)),
),
Opid(18): TraceOp(
opid: Opid(18),
Opid(16): TraceOp(
opid: Opid(16),
parent_opid: Some(Opid(4)),
content: YieldFrom(ResolveProperty(SerializableContext(
active_vertex: Some(Composite(CompositeNumber(30, [
Expand Down Expand Up @@ -196,70 +180,64 @@ TestInterpreterOutputTrace(
Vid(2): Some(Prime(PrimeNumber(3))),
},
),
SerializableContext(
active_vertex: Some(Prime(PrimeNumber(5))),
vertices: {
Vid(2): Some(Prime(PrimeNumber(5))),
},
),
]),
},
), Int64(30))),
),
Opid(19): TraceOp(
opid: Opid(19),
Opid(17): TraceOp(
opid: Opid(17),
parent_opid: None,
content: ProduceQueryResult({
"value": Int64(30),
}),
),
Opid(18): TraceOp(
opid: Opid(18),
parent_opid: Some(Opid(4)),
content: AdvanceInputIterator,
),
Opid(19): TraceOp(
opid: Opid(19),
parent_opid: Some(Opid(3)),
content: AdvanceInputIterator,
),
Opid(20): TraceOp(
opid: Opid(20),
parent_opid: Some(Opid(4)),
parent_opid: Some(Opid(2)),
content: AdvanceInputIterator,
),
Opid(21): TraceOp(
opid: Opid(21),
parent_opid: Some(Opid(3)),
content: AdvanceInputIterator,
parent_opid: Some(Opid(1)),
content: OutputIteratorExhausted,
),
Opid(22): TraceOp(
opid: Opid(22),
parent_opid: Some(Opid(2)),
content: AdvanceInputIterator,
content: InputIteratorExhausted,
),
Opid(23): TraceOp(
opid: Opid(23),
parent_opid: Some(Opid(1)),
parent_opid: Some(Opid(2)),
content: OutputIteratorExhausted,
),
Opid(24): TraceOp(
opid: Opid(24),
parent_opid: Some(Opid(2)),
parent_opid: Some(Opid(3)),
content: InputIteratorExhausted,
),
Opid(25): TraceOp(
opid: Opid(25),
parent_opid: Some(Opid(2)),
parent_opid: Some(Opid(3)),
content: OutputIteratorExhausted,
),
Opid(26): TraceOp(
opid: Opid(26),
parent_opid: Some(Opid(3)),
parent_opid: Some(Opid(4)),
content: InputIteratorExhausted,
),
Opid(27): TraceOp(
opid: Opid(27),
parent_opid: Some(Opid(3)),
content: OutputIteratorExhausted,
),
Opid(28): TraceOp(
opid: Opid(28),
parent_opid: Some(Opid(4)),
content: InputIteratorExhausted,
),
Opid(29): TraceOp(
opid: Opid(29),
parent_opid: Some(Opid(4)),
content: OutputIteratorExhausted,
),
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -118,16 +118,6 @@ TestInterpreterOutputTrace(
),
Opid(15): TraceOp(
opid: Opid(15),
parent_opid: Some(Opid(12)),
content: YieldFrom(ResolveNeighborsInner(2, Prime(PrimeNumber(5)))),
),
Opid(16): TraceOp(
opid: Opid(16),
parent_opid: Some(Opid(12)),
content: OutputIteratorExhausted,
),
Opid(17): TraceOp(
opid: Opid(17),
parent_opid: Some(Opid(4)),
content: YieldInto(SerializableContext(
active_vertex: Some(Composite(CompositeNumber(30, [
Expand Down Expand Up @@ -156,18 +146,12 @@ TestInterpreterOutputTrace(
Vid(2): Some(Prime(PrimeNumber(3))),
},
),
SerializableContext(
active_vertex: Some(Prime(PrimeNumber(5))),
vertices: {
Vid(2): Some(Prime(PrimeNumber(5))),
},
),
]),
},
)),
),
Opid(18): TraceOp(
opid: Opid(18),
Opid(16): TraceOp(
opid: Opid(16),
parent_opid: Some(Opid(4)),
content: YieldFrom(ResolveProperty(SerializableContext(
active_vertex: Some(Composite(CompositeNumber(30, [
Expand Down Expand Up @@ -196,70 +180,64 @@ TestInterpreterOutputTrace(
Vid(2): Some(Prime(PrimeNumber(3))),
},
),
SerializableContext(
active_vertex: Some(Prime(PrimeNumber(5))),
vertices: {
Vid(2): Some(Prime(PrimeNumber(5))),
},
),
]),
},
), Int64(30))),
),
Opid(19): TraceOp(
opid: Opid(19),
Opid(17): TraceOp(
opid: Opid(17),
parent_opid: None,
content: ProduceQueryResult({
"value": Int64(30),
}),
),
Opid(18): TraceOp(
opid: Opid(18),
parent_opid: Some(Opid(4)),
content: AdvanceInputIterator,
),
Opid(19): TraceOp(
opid: Opid(19),
parent_opid: Some(Opid(3)),
content: AdvanceInputIterator,
),
Opid(20): TraceOp(
opid: Opid(20),
parent_opid: Some(Opid(4)),
parent_opid: Some(Opid(2)),
content: AdvanceInputIterator,
),
Opid(21): TraceOp(
opid: Opid(21),
parent_opid: Some(Opid(3)),
content: AdvanceInputIterator,
parent_opid: Some(Opid(1)),
content: OutputIteratorExhausted,
),
Opid(22): TraceOp(
opid: Opid(22),
parent_opid: Some(Opid(2)),
content: AdvanceInputIterator,
content: InputIteratorExhausted,
),
Opid(23): TraceOp(
opid: Opid(23),
parent_opid: Some(Opid(1)),
parent_opid: Some(Opid(2)),
content: OutputIteratorExhausted,
),
Opid(24): TraceOp(
opid: Opid(24),
parent_opid: Some(Opid(2)),
parent_opid: Some(Opid(3)),
content: InputIteratorExhausted,
),
Opid(25): TraceOp(
opid: Opid(25),
parent_opid: Some(Opid(2)),
parent_opid: Some(Opid(3)),
content: OutputIteratorExhausted,
),
Opid(26): TraceOp(
opid: Opid(26),
parent_opid: Some(Opid(3)),
parent_opid: Some(Opid(4)),
content: InputIteratorExhausted,
),
Opid(27): TraceOp(
opid: Opid(27),
parent_opid: Some(Opid(3)),
content: OutputIteratorExhausted,
),
Opid(28): TraceOp(
opid: Opid(28),
parent_opid: Some(Opid(4)),
content: InputIteratorExhausted,
),
Opid(29): TraceOp(
opid: Opid(29),
parent_opid: Some(Opid(4)),
content: OutputIteratorExhausted,
),
Expand Down
Loading