Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Account page serde memory #15277

Merged
merged 5 commits into from
Dec 5, 2022

Conversation

arhimondr
Copy link
Contributor

Description

When compression and encryption enabled page serialization / deserialization requires additional buffering space (up to 128kB). This PR ensures this memory is properly accounted.

Additional context and related issues

#15273

Release notes

(X) This is not user-visible or docs only and no release notes are required.
( ) Release notes are required, please propose a release note for me.
( ) Release notes are required, with the following suggested text:

# Section
* Fix some things. ({issue}`issuenumber`)

@arhimondr arhimondr requested review from findepi and losipiuk December 2, 2022 00:50
@cla-bot cla-bot bot added the cla-signed label Dec 2, 2022
LocalMemoryContext memoryContext = operatorContext.newLocalUserMemoryContext(MergeOperator.class.getSimpleName());
// memory footprint of deserializer does not change over time
memoryContext.setBytes(deserializer.getRetainedSizeInBytes());
closer.register(memoryContext::close);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why so? We call close() on finish. And memory is still not reclaimed yet.

Also you do not use this pattern in ExchangeOperator why a difference?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great catch.

I honestly don't know why are we calling close in finish.

finish indicates that no more pages will be added to operator via addInput, and it is meaningless for source operators (such as ExchangeOperator, MergeOperator) and it should not get called (and from what I understand it doesn't).

Regardless, today both of these operators simply call close on finish what effectively means operator is no longer usable.

Perhaps we should follow up with an another PR to resolve this confusion.

@@ -60,8 +62,8 @@

private final FileHolder targetFile;
private final Closer closer = Closer.create();
private final PageSerializer serializer;
private final PageDeserializer deserializer;
private final PagesSerdeFactory serdeFactory;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In commit messge you wrote:

Given that spilling streams are not expected to be tiny there is very
little benefit from caching serializer / deserializer.

If they are not tiny, then the caching is a benefit, right?
Or did you mean that a single call to writePages is not short?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean a single call to writePages / readPages is not short, but the time spend between these two calls could in theory be long. @findepi could you please take a look and let me know if my understanding is correct?

@arhimondr arhimondr force-pushed the account-page-serde-memory branch from bf79e1d to 15e34ea Compare December 2, 2022 18:05
@arhimondr
Copy link
Contributor Author

Another bug: cb31a78

In ExchangeOperator the memory is now reserved at the moment of the operator creation. When a downstream operator fails during creation already created operators don't get properly closed what results in a memory leak.

I also considered avoiding reserving memory in a constructor, but regardless it doesn't seem right not to close operators hence I decided to keep memory allocation in a constructor.

@arhimondr
Copy link
Contributor Author

Updated

}
}
}
for (OperatorContext operatorContext : driverContext.getOperatorContexts()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we know that at this point the operator context which are present in driverContext only correspond to operators created above?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is possible that Operator creation fails after the corresponding OperatorContext has already been created

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah - that I understand. My quesiton is can there be context from other operators, which should not be closed here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There shouldn't be any other OperatorContext's but the ones for a driver that we failed to created

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah - looked into the code. I though there should be 1:1 mapping between Driver and DriverContext, just it looked weird that both are not created at the same time. I guess some optimization.

Usually a single call site (such as an operator) does not need both.
Hence there is a logic for lazy buffer allocation for serialize and
deserialize. Lazy allocation makes memory accounting less
straightforward.
Account memory retained by compressor/decompressor and cipher
Due to potentially high number of streams it could be expensive to cache
serializer / deserializer as those could retain a significant amount of
memory (up to 128kb each).

Given that spilling streams are not expected to be tiny there is very
little benefit from caching serializer / deserializer.
@arhimondr arhimondr force-pushed the account-page-serde-memory branch from 15e34ea to 2f709da Compare December 5, 2022 16:22
@arhimondr arhimondr merged commit 11a9b69 into trinodb:master Dec 5, 2022
@arhimondr arhimondr deleted the account-page-serde-memory branch December 5, 2022 21:24
@github-actions github-actions bot added this to the 404 milestone Dec 5, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

2 participants