-
Notifications
You must be signed in to change notification settings - Fork 351
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Replace ConditionalWeakTable with simpler non-static Dictionary-based cache #2506
Replace ConditionalWeakTable with simpler non-static Dictionary-based cache #2506
Conversation
src/Microsoft.OData.Client/Materialization/EnumValueMaterializationPolicy.cs
Outdated
Show resolved
Hide resolved
src/Microsoft.OData.Client/Materialization/MaterializerAnnotationsCache.cs
Outdated
Show resolved
Hide resolved
src/Microsoft.OData.Client/Materialization/MaterializerAnnotationsCache.cs
Outdated
Show resolved
Hide resolved
src/Microsoft.OData.Client/Materialization/MaterializerAnnotationsCache.cs
Outdated
Show resolved
Hide resolved
src/Microsoft.OData.Client/Materialization/MaterializerAnnotationsCache.cs
Outdated
Show resolved
Hide resolved
.../Client.TDD.Tests/Tests/Materialization/EntryMaterializationPolicyForComplexResourceTests.cs
Outdated
Show resolved
Hide resolved
src/Microsoft.OData.Client/Materialization/IODataMaterializerContext.cs
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm approving due to the urgent nature of this change, and because I find the design changes to be a significant improvement over what we currently have. I have not reviewed in detail.
I haven't had time to do a line-by-line review, but I agree with the intent and design of this solution. Thanks, Clement, for the research and for putting a solution together so quickly, and thanks everyone who has jumped on to provide feedback and help turn this around. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
…er.cs Co-authored-by: Elizabeth Okerio <elizaokerio@gmail.com>
Co-authored-by: John Gathogo <john.gathogo@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
src/Microsoft.OData.Client/Materialization/IODataMaterializerContext.cs
Outdated
Show resolved
Hide resolved
…ontext.cs Co-authored-by: Kennedy Kang'ethe <kemunga@microsoft.com>
310e03b
This PR has Quantification details
Why proper sizing of changes matters
Optimal pull request sizes drive a better predictable PR flow as they strike a
What can I do to optimize my changes
How to interpret the change counts in git diff output
Was this comment helpful? 👍 :ok_hand: :thumbsdown: (Email) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// <summary> | ||
/// Cache used to store temporary metadata used for materialization of OData items. | ||
/// </summary> | ||
protected MaterializerCache materializerCache = new MaterializerCache(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We probably want to expose this through the constructor rather than always using the default cache in case caches become configurable in the future
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's a good idea, but since this is an internal class, I think it would be better to instantiate the cache until there's a need to make it configurable. This way, we think of it as an implementation detail of the BaseAsyncResult
that the caller does not need to know about. If we pass it on the constructor, the caller will have to take responsibility for passing down the cache, and we might make the argument that it should be configurable at that level, all the way up to the DataServiceContext
. But the buck has to stop somewhere since this isn't a user input. I think BaseAsyncResult
is good place because it's what glues the request together, and this cache is scoped to a request.
Issues
*This pull request addresses #2503
When OData Client is reading items from the reader, it stores some metadata as "annotations" in a cache using the
SetAnnotation()
extension method. It retrieves this metadata during materialization when converting the OData items into client CLR types. For the cache, it uses the customInternalDictionary
static class that is based onConditionalWeakTable
.The
ConditionalWeakTable
has the following properties relevant to understand the issue and the solution implemented in this PR:DependentHandle
to be precise) and are not kept alive simply by being in theConditionalWeakTable
. If an entry's key is not referenced from outside the table, it will be eventually garbage collected even if the table is still alive.GetHashCode
,Equals
or comparer implementations)ConditionalWeakTable
is thread-safe. It uses locks to synchronize writes to the tableConditionalWeakTable
contains an internalContainer
class that actually holds the entries array. This class defines a destructor but is not IDisposable. When the array gets full a newContainer
instance with more capacity will be created and the old one will be eventually garbage collector. The destructor also needs to acquire the lock to clean up property and avoid data corruption.ConditionalWeakTable
does not shrink in size. It can recycle the space of old entries that have expired or have been removed, over item, it either remains the size or grows larger (by creating containers with larger capacities).OData Client create static
ConditionalWeakTable<ODataAnnotatable, T>
, a different instance static instance is created for each typeT
that is used.MaterializerEntry
is used as the value type forODataResource
keys,IEnumerable<ODataResource>
as the value type forODataResourceSet
keys, etc.The fact that they are static means they never get garbage collected and they are shared by all requests. This why it's important the tables are thread-safe. But this also causes efficiency and performance issues: when we have a lot of concurrent requests, we will have concurrent writes to the same
ConditionalWeakTable
, which result in:ConditionalWeakTable
being resizedThis explains (at least in part) why the customer reported so much memory use and GC activity (up to 11-12GB heap size). This is not sufficient to explain the discrepancy between heap allocations observed in .NET Core 3.1 (~1-3GB) and .NET 6.0.
Description
This PR removes the
ConditionalWeakTable
-based static cache and replaces it with a newMaterializerCache
. The new cache is based on a simpleDictionary<ODataAnnotatable, object>
and is not static. Instead, it's created for each request and only remains alive for the duration of the request. This has the following benefits over the previous implementation:The customers tested this implementation in a nightly build and reported that memory use dropped from 11-12GB to about 4.5-6GB (~50% improvement) and remains stable around that level. This is however still higher than what it was on .NET Core 3.1, so there's still more investigations to be carried out.
When implementing this PR I had to make the following considerations:
MaterializerCache
is functionally equivalent to old cacheHow
MaterializerCache
is functionally equivalent to the old cache implementationThe old cache was actually a handful of static caches. For example, in a given request, an
<ODataResource, MaterializerEntry>
pair would be stored in a differentConditionalWeakTable
from an<ODataProperty, MaterializedPropertyValue>
pair. So, theConditionalWeakTable<ODataAnnotatable, MaterializedPropertyValue>
actually containsODataProperty
and values from different requests. But the important thing that makes this work is each odata item only has a single value. AnODataProperty
instance is only once once as key and exists in only one dictionary, you won't find anODataProperty
as key in theConditionalWeakTable<ODataAnnotatble, MaterializerEntry>
dictionary. Otherwise, that would lead to incorrect behaviour.Since each instance is only used as key once, then it's safe to have keys of different types in the same
Dictionary<ODataAnnotatable, object>
dictionary. We know that when we fetch the value for anODataResource
it will be aMaterializerEntry
value, and when fetch anODataProperty
it will return aMaterializedPropertyValue
value since this how OData Client is already using the cache. If these assumptions do not hold, then a different design would be necessary. I have also ensured that the dictionary uses a reference-equality comparer.Where to create the
MaterializerCache
I noticed that the
ODataMaterializer
and similar classes accept or create anODataMaterializerContext
instance. Initially I created theMaterializerCache
inside the context class and exposed it as a property. This worked fine forExecuteAsync()
requests but did not work well forSaveChangesAsync
requests. The latter callsHandleOperationResponse
at least twice, leading two different materializer contexts, but expects to find data in the cache that was stored in a previouse context. This caused tests to fail. For this reason, I moved theMaterializerCache
toBaseAsyncResult
. For context,QueryResult
(which extendsBaseAsyncResult
) is created insideExecute()
orGetValue()
methods of aDataServiceRequest
object. So, the lifetime is limited to those methods. I don't know if there's a way to give it short lifespan.How to pass
MaterializerCache
aroundSince the materializer cache is no longer static, I needed to refactor methods that access it to accept is an argument. For most methods, instead of passing the materializer cache directly as method, I passed the materializer context instead. This appeared to more convenient and more robust to changes in the future since we can add stuff to the context instead of adding more granular parameters. It also avoids the inconsistency of having to figure when to pass materializer context and when to pass something else. But on the other hand, it has coupled some methods to the materializer context even though they only need to access the cache. And once, in some cases it has forced me to create a new instance of materializer context where it did not previous exist so that I can pass it around. If people believe that this is not a good approach, I'm open to refactoring it to reduce places where I use the materializer context as parameter (it still makes sense to keep the materializer cache as a property of the materializer context in classes that already had dependency on the materializer context).
Checklist (Uncheck if it is not completed)
Additional work necessary
If documentation update is needed, please add "Docs Needed" label to the issue and provide details about the required document change in the issue.