
ESQL: Paginate MV_EXPAND output #100598

Merged


@luigidellaquila (Contributor) commented on Oct 10, 2023

Let the MV_EXPAND operator paginate its output, so that its memory footprint stays low.

With this change, even queries like

row 
a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30], 
b = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30], 
c = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30], 
d = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30], 
e = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30], 
f = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30], 
g = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30], 
x = 10000000000000 
| mv_expand a | mv_expand b | mv_expand c | mv_expand d | mv_expand e | mv_expand f | mv_expand g

consume only a small amount of memory, and avoid computing rows that a subsequent LIMIT would discard anyway (the seven chained expansions alone would produce 30^7, roughly 21.9 billion, rows).
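To make the memory argument concrete, here is a minimal Java sketch under simplifying assumptions: a single input row, int values only, and chained expansions modeled as the cartesian product of the multivalued columns. The class LazyCartesianPages and its methods are hypothetical and are not part of the Elasticsearch compute API; the point is only that rows are generated one page at a time, so a consumer that stops early (as a LIMIT would) never forces the remaining rows to exist.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model, not the Elasticsearch API: chained MV_EXPANDs over one input row
// produce the cartesian product of its multivalued columns. Rows are generated lazily,
// page by page, so a consumer that stops early never pays for the rest.
final class LazyCartesianPages {
    private final int[][] columns; // values of each multivalued column
    private final int[] cursor;    // current index into each column, like an odometer
    private boolean exhausted;

    LazyCartesianPages(int[][] columns) {
        this.columns = columns;
        this.cursor = new int[columns.length];
        // An empty column (or no columns at all) yields no rows in this simplified model.
        this.exhausted = columns.length == 0 || Arrays.stream(columns).anyMatch(c -> c.length == 0);
    }

    boolean hasNext() {
        return !exhausted;
    }

    // Builds at most pageSize rows; each row holds one value per column.
    List<int[]> nextPage(int pageSize) {
        List<int[]> page = new ArrayList<>();
        while (!exhausted && page.size() < pageSize) {
            int[] row = new int[columns.length];
            for (int c = 0; c < columns.length; c++) {
                row[c] = columns[c][cursor[c]];
            }
            page.add(row);
            advance();
        }
        return page;
    }

    // Moves to the next combination, last column fastest; marks exhaustion after the final row.
    private void advance() {
        for (int c = columns.length - 1; c >= 0; c--) {
            if (++cursor[c] < columns[c].length) {
                return;
            }
            cursor[c] = 0;
        }
        exhausted = true;
    }

    public static void main(String[] args) {
        int[][] columns = new int[7][30]; // seven columns holding 1..30, as in the query above
        for (int[] column : columns) {
            Arrays.setAll(column, i -> i + 1);
        }
        LazyCartesianPages pages = new LazyCartesianPages(columns);
        // Only the first page is materialized, although the full product has 30^7 rows.
        for (int[] row : pages.nextPage(3)) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

Calling nextPage(3) on the seven 30-value columns builds three rows and nothing more. The real operator works on Blocks and Pages and also has to duplicate the non-expanded columns, which this sketch leaves out.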

Fixes #100533

@elasticsearchmachine (Collaborator): Pinging @elastic/es-ql (Team:QL)

@elasticsearchmachine (Collaborator): Pinging @elastic/elasticsearch-esql (:Query Languages/ES|QL)

assertThat(status.pagesProcessed(), equalTo(1));
assertThat(status.noops(), equalTo(0));
}

// TODO: remove this once possible
// https://github.com/elastic/elasticsearch/issues/99826
Contributor Author:

The memory accounting should be fine now.
I also ran this test in a loop a few hundred times and it never failed.

Member:

I think you can remove the whole canLeak method. This is its last caller, and it needs to go anyway.

@@ -582,7 +582,8 @@ private PhysicalOperation planLimit(LimitExec limit, LocalExecutionPlannerContex

private PhysicalOperation planMvExpand(MvExpandExec mvExpandExec, LocalExecutionPlannerContext context) {
PhysicalOperation source = plan(mvExpandExec.child(), context);
return source.with(new MvExpandOperator.Factory(source.layout.get(mvExpandExec.target().id()).channel()), source.layout);
int blockSize = 5000;// TODO estimate row size and use context.pageSize()
Contributor Author:

This needs a bit more plumbing. It's probably not too complicated, but we could also leave it for a follow-up PR.

Member:

++
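One possible direction for the TODO in the hunk above (estimate the row size and derive the page size from it, instead of hard-coding 5000) is sketched below. Everything here is an assumption for illustration: the byte budget, the per-row estimate and the helper name rowsPerPage are hypothetical, not existing planner APIs.

```java
// Hypothetical helper, not an Elasticsearch API: derive how many expanded rows fit in one page.
final class PageSizeEstimator {
    /**
     * @param targetPageBytes   assumed memory budget for one output page
     * @param estimatedRowBytes assumed average size of one output row
     * @param maxPageSize       upper bound, e.g. the configured page size
     */
    static int rowsPerPage(long targetPageBytes, long estimatedRowBytes, int maxPageSize) {
        if (estimatedRowBytes <= 0) {
            return maxPageSize; // no usable estimate: fall back to the configured page size
        }
        long rows = targetPageBytes / estimatedRowBytes;
        // Emit at least one row per page, and never more than maxPageSize.
        return (int) Math.max(1, Math.min(rows, maxPageSize));
    }

    public static void main(String[] args) {
        // Small rows: the budget would allow 15625 rows, so the cap applies.
        System.out.println(rowsPerPage(1_000_000, 64, 5000));
        // Large rows: the budget is the limiting factor.
        System.out.println(rowsPerPage(1_000_000, 1_000, 5000));
    }
}
```

The clamp to at least one row keeps the operator making progress even when a single row exceeds the budget.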

}

protected Page process() {
Block expandingBlock = prev.getBlock(channel);
Block expandedBlock = expandingBlock.expand();
Contributor Author:

Hmm, this can be done once per block; there's no need to do it on every process() call...

Member:

++
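Doing the expansion once per block, as agreed here, means computing it when a new input page arrives and keeping the result in fields that successive process() calls slice. A minimal sketch with hypothetical names (addInput, process, and int[][] standing in for a Block); the expansions counter exists only to show that the work happens once per input page. In this model, expanding one column costs memory proportional to its values, so caching it is cheap; the blow-up in the query from the description comes from repeating the other columns.

```java
import java.util.Arrays;

// Hypothetical sketch, not the real operator: the expansion of the incoming page is
// computed once in addInput() and kept in fields; process() only slices it.
final class CachingExpander {
    private final int pageSize;
    private int[] expanded; // cached expansion of the current input page, or null
    private int nextRow;    // first row of `expanded` not yet emitted
    int expansions;         // how many times the expansion ran (for illustration only)

    CachingExpander(int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        this.pageSize = pageSize;
    }

    // Called once per input page: one output row per value; empty positions add no rows here.
    void addInput(int[][] positions) {
        expanded = Arrays.stream(positions).flatMapToInt(Arrays::stream).toArray();
        nextRow = 0;
        expansions++;
    }

    // Returns the next slice of at most pageSize rows, or null when the page is used up.
    int[] process() {
        if (expanded == null || nextRow >= expanded.length) {
            return null;
        }
        int end = Math.min(nextRow + pageSize, expanded.length);
        int[] page = Arrays.copyOfRange(expanded, nextRow, end);
        nextRow = end;
        return page;
    }

    public static void main(String[] args) {
        CachingExpander expander = new CachingExpander(2);
        expander.addInput(new int[][] { {1, 2, 3}, {4} });
        for (int[] page = expander.process(); page != null; page = expander.process()) {
            System.out.println(Arrays.toString(page));
        }
        System.out.println("expansions: " + expander.expansions);
    }
}
```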

return new Page(expandedBlock);
}

int[] duplicateFilter = buildDuplicateExpandingFilter(expandingBlock, expandedBlock.getPositionCount());
int[] duplicateFilter = nextDuplicateExpandingFilter(expandingBlock, pageSize);
Member:

pageSize, right? I think once you also move expandedBlock to a member variable, this method won't need any arguments, right?

private int[] buildDuplicateExpandingFilter(Block expandingBlock, int newPositions) {
int[] duplicateFilter = new int[newPositions];
private int[] nextDuplicateExpandingFilter(Block expandingBlock, int size) {
int[] duplicateFilter = new int[size];
Member:

Probably min(size, expanded.positionCount - nextPositionToProcess). Maybe?
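The sizing suggestion can be illustrated with a small, self-contained duplicate-filter builder. In this sketch (hypothetical class and method names; per-position value counts stand in for a Block), the filter holds, for each output row, the index of the input position it came from, and the array is sized to min(pageSize, rows left) so the last page is not padded. Positions with zero values are simply skipped here, which may not match how the real operator handles nulls.

```java
import java.util.Arrays;

// Hypothetical sketch: for each output row, the index of the input position it came from.
// Other columns can then be copied through this filter so each value repeats once per
// expanded value.
final class DuplicateFilter {
    private final int[] valueCounts; // number of values at each input position
    private final int totalRows;     // total expanded rows (sum of valueCounts)
    private int emittedRows = 0;     // expanded rows already emitted
    private int position = 0;        // input position currently being expanded
    private int valueInPosition = 0; // values of `position` already emitted

    DuplicateFilter(int[] valueCounts) {
        this.valueCounts = valueCounts;
        this.totalRows = Arrays.stream(valueCounts).sum();
    }

    int[] next(int pageSize) {
        // Never allocate more than the rows that are actually left.
        int size = Math.min(pageSize, totalRows - emittedRows);
        int[] filter = new int[size];
        for (int n = 0; n < size; n++) {
            while (valueInPosition >= valueCounts[position]) { // skip exhausted or empty positions
                position++;
                valueInPosition = 0;
            }
            filter[n] = position;
            valueInPosition++;
        }
        emittedRows += size;
        return filter;
    }

    public static void main(String[] args) {
        DuplicateFilter filter = new DuplicateFilter(new int[] {2, 0, 3});
        System.out.println(Arrays.toString(filter.next(4)));
        System.out.println(Arrays.toString(filter.next(4)));
    }
}
```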

int n = 0;
for (int p = 0; p < expandingBlock.getPositionCount(); p++) {
int count = expandingBlock.getValueCount(p);
while (nextPositionToProcess < expandingBlock.getPositionCount()) {
Member:

I tend to use while (true) when there are multiple interesting ways to break out of the loop. It's a signal to the reader that "something weird is here".

Contributor Author:

I'm not a big fan of while(true), but in this case I think it will make the code easier to read, so 👍

prevCompleted = true;
}
} else {
nextMvToProcess = nextMvToProcess + toAdd;
Member:

It might be worth adding comments about the meaning of these two arms. I think one is "we're done expanding this page" and the other is "we've filled the page we're building and may have exhausted the current position". But we might not have: we might be halfway through expanding a position.
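The loop discussed in these comments can be sketched with while (true) and two commented exits: one when the whole input page has been expanded, one when the output page is full, possibly in the middle of a position. The field names nextPositionToProcess, nextMvToProcess and prevCompleted mirror those visible in the diff, but the logic is a simplified reconstruction working on per-position value counts, not the merged code.

```java
// Simplified reconstruction, not the merged MvExpandOperator code. valueCounts[p] is the
// number of values at input position p of the page currently being expanded.
final class ExpandLoop {
    private final int[] valueCounts;
    private int nextPositionToProcess = 0; // input position to resume from
    private int nextMvToProcess = 0;       // value within that position to resume from
    boolean prevCompleted = false;         // true once the whole input page is expanded

    ExpandLoop(int[] valueCounts) {
        this.valueCounts = valueCounts;
    }

    // Returns how many rows the next output page holds, advancing the cursors.
    int fill(int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        int rows = 0;
        while (true) {
            if (nextPositionToProcess >= valueCounts.length) {
                // Exit 1: every position of the input page has been expanded.
                prevCompleted = true;
                return rows;
            }
            if (rows == pageSize) {
                // Exit 2: the output page is full. We may be halfway through a position;
                // nextMvToProcess records where the next call resumes.
                return rows;
            }
            int remaining = valueCounts[nextPositionToProcess] - nextMvToProcess;
            int toAdd = Math.min(pageSize - rows, remaining);
            rows += toAdd;
            if (toAdd == remaining) {
                // Current position exhausted: move to the next one.
                nextPositionToProcess++;
                nextMvToProcess = 0;
            } else {
                // Page filled mid-position: remember how far we got.
                nextMvToProcess += toAdd;
            }
        }
    }

    public static void main(String[] args) {
        ExpandLoop loop = new ExpandLoop(new int[] {2, 0, 3});
        while (!loop.prevCompleted) {
            System.out.println("page rows: " + loop.fill(4));
        }
    }
}
```

Checking the completion exit first means a page that fills up exactly at the end of the input is reported as complete, rather than leaving an empty page for the next call.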

public static final class Status extends AbstractPageMappingOperator.Status {
public static final class Status implements Operator.Status {

private final int pagesProcessed;
Member:

It'd be nice to have this as pagesIn and pagesOut. That way you can see what the multiplication factor is too.

Contributor:

++
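With separate pagesIn and pagesOut counters, the multiplication factor falls out as a simple ratio. Below is a hypothetical stand-in for the status object (the real MvExpandOperator.Status also handles serialization), assuming its three counters are pagesIn, pagesOut and noops.

```java
// Hypothetical stand-in for the operator status, not the real MvExpandOperator.Status.
record ExpandStatus(int pagesIn, int pagesOut, int noops) {
    ExpandStatus {
        if (pagesIn < 0 || pagesOut < 0 || noops < 0) {
            throw new IllegalArgumentException("counters must be non-negative");
        }
    }

    // Average number of output pages produced per input page (the "multiplication factor").
    double pagesOutPerPageIn() {
        return pagesIn == 0 ? 0.0 : (double) pagesOut / pagesIn;
    }

    public static void main(String[] args) {
        ExpandStatus status = new ExpandStatus(4, 10, 0);
        System.out.println("multiplication factor: " + status.pagesOutPerPageIn());
    }
}
```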

@Override
protected boolean canLeak() {
return true;
public void testExpandWithBytesRefs() {
Contributor Author:

@nik9000 I stole this from #100548

Member:

Nice.

@luigidellaquila (Contributor Author): I guess this is ready for a final review now.

@luigidellaquila (Contributor Author): @elasticmachine update branch

@nik9000 (Member) left a comment:

I left a small comment about the silly mutation testing. Otherwise, looks great to me.

@@ -35,20 +35,22 @@ protected Writeable.Reader<MvExpandOperator.Status> instanceReader() {

@Override
public MvExpandOperator.Status createTestInstance() {
return new MvExpandOperator.Status(randomNonNegativeInt(), randomNonNegativeInt());
return new MvExpandOperator.Status(randomNonNegativeInt(), randomNonNegativeInt(), randomNonNegativeInt());
}

@Override
protected MvExpandOperator.Status mutateInstance(MvExpandOperator.Status instance) {
switch (between(0, 1)) {
Member:

This should change to between(0, 2) and there should be an arm that just changes pagesIn and another that just changes pagesOut.

Contributor Author:

Makes sense, doing it now
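The mutation pattern requested here changes exactly one field per arm, so an equality test catches a difference in any single counter. A sketch assuming the three fields are pagesIn, pagesOut and noops, and using java.util.Random in place of the test framework's between(0, 2) and randomNonNegativeInt() helpers; the class and method names are hypothetical.

```java
import java.util.Random;

// Hypothetical sketch of the mutation pattern; the real test uses the framework's helpers.
final class StatusMutation {
    record Status(int pagesIn, int pagesOut, int noops) {}

    // Returns a non-negative int that differs from `current`.
    static int otherThan(int current, Random random) {
        int candidate;
        do {
            candidate = random.nextInt(Integer.MAX_VALUE);
        } while (candidate == current);
        return candidate;
    }

    // Each arm changes exactly one field; random.nextInt(3) plays the role of between(0, 2).
    static Status mutate(Status s, Random random) {
        return switch (random.nextInt(3)) {
            case 0 -> new Status(otherThan(s.pagesIn(), random), s.pagesOut(), s.noops());
            case 1 -> new Status(s.pagesIn(), otherThan(s.pagesOut(), random), s.noops());
            case 2 -> new Status(s.pagesIn(), s.pagesOut(), otherThan(s.noops(), random));
            default -> throw new AssertionError("unreachable");
        };
    }

    public static void main(String[] args) {
        Random random = new Random(42);
        Status original = new Status(1, 2, 3);
        System.out.println(original + " -> " + mutate(original, random));
    }
}
```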

@luigidellaquila (Contributor Author): @elasticmachine run elasticsearch-ci/bwc

@luigidellaquila (Contributor Author): @elasticmachine update branch

@elasticsearchmachine (Collaborator): 💚 Backport successful

Backport branch: 8.11

luigidellaquila added a commit to luigidellaquila/elasticsearch that referenced this pull request Oct 11, 2023

Successfully merging this pull request may close these issues.

ESQL: Leaking operators awaitsFix
5 participants