-
Notifications
You must be signed in to change notification settings - Fork 3.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Account page serde memory #15277
Account page serde memory #15277
Conversation
LocalMemoryContext memoryContext = operatorContext.newLocalUserMemoryContext(MergeOperator.class.getSimpleName()); | ||
// memory footprint of deserializer does not change over time | ||
memoryContext.setBytes(deserializer.getRetainedSizeInBytes()); | ||
closer.register(memoryContext::close); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why so? We call close() on finish
. And memory is still not reclaimed yet.
Also you do not use this pattern in ExchangeOperator
why a difference?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great catch.
I honestly don't know why are we calling close
in finish
.
finish
indicates that no more pages will be added to operator via addInput
, and it is meaningless for source operators (such as ExchangeOperator
, MergeOperator
) and it should not get called (and from what I understand it doesn't).
Regardless, today both of these operators simply call close
on finish
what effectively means operator is no longer usable.
Perhaps we should follow up with an another PR to resolve this confusion.
@@ -60,8 +62,8 @@ | |||
|
|||
private final FileHolder targetFile; | |||
private final Closer closer = Closer.create(); | |||
private final PageSerializer serializer; | |||
private final PageDeserializer deserializer; | |||
private final PagesSerdeFactory serdeFactory; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In commit messge you wrote:
Given that spilling streams are not expected to be tiny there is very
little benefit from caching serializer / deserializer.
If they are not tiny, then the caching is a benefit, right?
Or did you mean that a single call to writePages
is not short?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean a single call to writePages / readPages is not short, but the time spend between these two calls could in theory be long. @findepi could you please take a look and let me know if my understanding is correct?
bf79e1d
to
15e34ea
Compare
Another bug: cb31a78 In ExchangeOperator the memory is now reserved at the moment of the operator creation. When a downstream operator fails during creation already created operators don't get properly closed what results in a memory leak. I also considered avoiding reserving memory in a constructor, but regardless it doesn't seem right not to close operators hence I decided to keep memory allocation in a constructor. |
Updated |
} | ||
} | ||
} | ||
for (OperatorContext operatorContext : driverContext.getOperatorContexts()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we know that at this point the operator context which are present in driverContext only correspond to operators created above?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is possible that Operator creation fails after the corresponding OperatorContext
has already been created
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah - that I understand. My quesiton is can there be context from other operators, which should not be closed here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There shouldn't be any other OperatorContext's but the ones for a driver that we failed to created
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah - looked into the code. I though there should be 1:1 mapping between Driver and DriverContext, just it looked weird that both are not created at the same time. I guess some optimization.
Usually a single call site (such as an operator) does not need both. Hence there is a logic for lazy buffer allocation for serialize and deserialize. Lazy allocation makes memory accounting less straightforward.
Account memory retained by compressor/decompressor and cipher
Due to potentially high number of streams it could be expensive to cache serializer / deserializer as those could retain a significant amount of memory (up to 128kb each). Given that spilling streams are not expected to be tiny there is very little benefit from caching serializer / deserializer.
15e34ea
to
2f709da
Compare
Description
When compression and encryption enabled page serialization / deserialization requires additional buffering space (up to 128kB). This PR ensures this memory is properly accounted.
Additional context and related issues
#15273
Release notes
(X) This is not user-visible or docs only and no release notes are required.
( ) Release notes are required, please propose a release note for me.
( ) Release notes are required, with the following suggested text: