-
Notifications
You must be signed in to change notification settings - Fork 3.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow rolling aggregations for window functions #464
Allow rolling aggregations for window functions #464
Conversation
Thank you for your pull request and welcome to our community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. In order for us to review and merge your code, please submit the signed CLA to cla@prestosql.io. If you are contributing on behalf of someone else (e.g., your employer), the individual CLA may not be sufficient and your employer may need the Corporate CLA signed. |
It is WIP because I haven't run any tests. I did simple stupid benchmark on a single node Presto server. Results are below, second run was executed on Presto without this patch:
|
|
@kokosing ping me when you consider this ready for review |
Travis says:
|
8bd5a0c
to
bea1565
Compare
Thank you for your pull request and welcome to our community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. In order for us to review and merge your code, please submit the signed CLA to cla@prestosql.io. For more information, see https://github.com/prestosql/cla. |
bea1565
to
cb58f28
Compare
Thank you for your pull request and welcome to our community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. In order for us to review and merge your code, please submit the signed CLA to cla@prestosql.io. For more information, see https://github.com/prestosql/cla. |
Thank you for your pull request and welcome to our community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. In order for us to review and merge your code, please submit the signed CLA to cla@prestosql.io. For more information, see https://github.com/prestosql/cla. |
c4f7fdb
to
2defb0b
Compare
Thank you for your pull request and welcome to our community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. In order for us to review and merge your code, please submit the signed CLA to cla@prestosql.io. For more information, see https://github.com/prestosql/cla. |
2defb0b
to
a51c9af
Compare
Thank you for your pull request and welcome to our community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. In order for us to review and merge your code, please submit the signed CLA to cla@prestosql.io. For more information, see https://github.com/prestosql/cla. |
a51c9af
to
600dfbc
Compare
Thank you for your pull request and welcome to our community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. In order for us to review and merge your code, please submit the signed CLA to cla@prestosql.io. For more information, see https://github.com/prestosql/cla. |
The "Minor fixes" commit looks good. |
600dfbc
to
a28b298
Compare
Thank you for your pull request and welcome to our community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. In order for us to review and merge your code, please submit the signed CLA to cla@prestosql.io. For more information, see https://github.com/prestosql/cla. |
Add a test for rolling sum that includes nulls |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me. @kokosing I'm assuming you did a complete review as well.
{ | ||
// Only include methods which take the same parameters as the corresponding input function | ||
List<Method> removeInputFunctions = FunctionsParserHelper.findPublicStaticMethodsWithAnnotation(clazz, RemoveInputFunction.class).stream() | ||
.filter(method -> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would be simpler as two filter
calls
.filter(method -> | ||
Arrays.equals(method.getParameterTypes(), inputFunction.getParameterTypes()) | ||
&& Arrays.deepEquals(method.getParameterAnnotations(), inputFunction.getParameterAnnotations())) | ||
.collect(toImmutableList()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The rest of this can be replaced with toOptional()
from MoreCollectors
@@ -86,6 +86,12 @@ public InternalAggregationFunction specialize(BoundVariables variables, int arit | |||
|
|||
// Bind provided dependencies to aggregation method handlers | |||
MethodHandle inputHandle = bindDependencies(concreteImplementation.getInputFunction(), concreteImplementation.getInputDependencies(), variables, metadata); | |||
Optional<MethodHandle> removeInputHandle = concreteImplementation.getRemoveInputFunction().map( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's format this like
Optional<MethodHandle> removeInputHandle = concreteImplementation.getRemoveInputFunction().map(
removeInputFunction -> bindDependencies(removeInputFunction, concreteImplementation.getRemoveInputDependencies(), variables, metadata));
This visually aligns the bindDependencies
call with the others, allowing us to easily see that it's similar.
currentStart = 0; | ||
currentEnd = -1; | ||
} | ||
int overlapStart = Integer.max(frameStart, currentStart); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use Math.max/min
and static import
return; | ||
} | ||
} | ||
// We couldn't or didn't want to modify the accumulation: instead, discard the current accumulation and start fresh. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Formatting nits:
- add newline before this
- use single space after colon
@@ -57,6 +57,25 @@ public void testCountRowsOrdered() | |||
.build()); | |||
} | |||
|
|||
@Test | |||
public void testCountRowsRolling() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's add a test for average as well, since it might be able to catch a problem that count would not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also added tests for PARTITION BY
Add a test for the WindowNode use case for Accumulators. Fix a few bugs in the aggregation tests uncovered by the additional coverage. The new tests didn't uncover any product bugs.
Add a removeInput() function to some Accumulators, and when it exists, use it in aggregate window functions to roll the aggregation forward incrementally. Dramatically speeds up queries such as: SELECT COUNT(quantity) OVER (ROWS BETWEEN 2000 PRECEDING AND 2000 FOLLOWING) Extracted from: prestodb/presto#8974
Implement removeInput() in some SUM aggregations, to speed up rolling window functions. This requires additional storage in the AggregationState for the input count, so that the aggregator knows when its result should become null. Extracted from: prestodb/presto#8974
a28b298
to
e6b97d0
Compare
Thank you for your pull request and welcome to our community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. In order for us to review and merge your code, please submit the signed CLA to cla@prestosql.io. For more information, see https://github.com/prestosql/cla. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comments addressed.
@@ -57,6 +57,25 @@ public void testCountRowsOrdered() | |||
.build()); | |||
} | |||
|
|||
@Test | |||
public void testCountRowsRolling() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also added tests for PARTITION BY
@alandpost Thank you! It took us a while to merge your work. |
@@ -31,27 +32,34 @@ | |||
private LongSumAggregation() {} | |||
|
|||
@InputFunction | |||
public static void sum(@AggregationState NullableLongState state, @SqlType(StandardTypes.BIGINT) long value) | |||
public static void sum(@AggregationState LongLongState state, @SqlType(StandardTypes.BIGINT) long value) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, and just filed an issue to showcase the problem #18818
Hi Trino, This change appears to have resulted in a substantial performance regression for certain queries we have. For example something that was taking 9min is now taking 57min to run, a 6.3x increase. Per #18818 Is this something that could be looked into? |
No description provided.