Backport of Improve IdentityStore Invalidate performance into release/1.16.x #27230
Backport
This PR is auto-generated from #27184 to be assessed for backporting due to the inclusion of the label backport/1.16.x.
🚨
The person who merged in the original PR is:
@marcboudreau
This person should manually cherry-pick the original PR into a new backport PR,
and close this one when the manual backport PR is merged in.
The below text is copied from the body of the original PR.
The Invalidate method of the IdentityStore struct was using a simplistic algorithm to synchronize the MemDB records (entities, groups, local entity aliases) with those from the storage bucket. This algorithm resulted in a large number of MemDB operations within a single transaction whenever the storage bucket contained a large number of records. That many operations caused MemDB to use a much slower comparer function, which made the Invalidate function take a long time to complete. The node could then fall so far behind in processing WALs sent over by the primary cluster that the replication state would transition to `merkle-sync`.
The simplistic approach consisted of deleting everything from MemDB that was associated with the invalidated storage bucket and re-inserting those resources using the state contained in the storage bucket. Invalidations usually signal that a single resource has been changed, added, or deleted, so when the storage bucket also contains a large number of unchanged resources, a lot of unnecessary work was being done (deleting and re-adding them).
These changes replace the simplistic approach for the handling of entities and local entity aliases, since those are the resources most likely to exist in large numbers where this problem occurs.
The new approach consists of comparing the contents of the invalidated storage bucket with the set of resources from MemDB associated with that storage bucket. Resources that match in both systems are left alone, and only the differences are rectified in MemDB.
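The difference between the two approaches can be illustrated with a minimal sketch. This is not the code from this PR: `Entity`, `memDB`, `naiveSync`, and `diffSync` are hypothetical names, and a plain Go map stands in for the MemDB records associated with one storage bucket. Each function returns the number of write operations it performed so the two can be compared.

```go
package main

import "fmt"

// Entity is a hypothetical stand-in for a record stored in a bucket.
type Entity struct {
	ID      string
	Version int
}

// memDB is a hypothetical stand-in for the in-memory records that belong
// to a single storage bucket, keyed by entity ID.
type memDB map[string]Entity

// naiveSync mirrors the old approach: delete every cached record for the
// bucket, then re-insert everything found in the bucket.
func naiveSync(db memDB, bucket []Entity) int {
	ops := 0
	for id := range db {
		delete(db, id)
		ops++
	}
	for _, e := range bucket {
		db[e.ID] = e
		ops++
	}
	return ops
}

// diffSync mirrors the new approach: leave matching records alone and
// only insert, update, or delete where the bucket and the cache differ.
func diffSync(db memDB, bucket []Entity) int {
	ops := 0
	seen := make(map[string]struct{}, len(bucket))
	for _, e := range bucket {
		seen[e.ID] = struct{}{}
		if cur, ok := db[e.ID]; ok && cur == e {
			continue // identical in both systems, nothing to do
		}
		db[e.ID] = e // new or changed record
		ops++
	}
	for id := range db {
		if _, ok := seen[id]; !ok {
			delete(db, id) // record no longer present in the bucket
			ops++
		}
	}
	return ops
}

func main() {
	const n = 1000
	bucket := make([]Entity, n)
	naive, diff := memDB{}, memDB{}
	for i := range bucket {
		e := Entity{ID: fmt.Sprintf("entity-%d", i), Version: 1}
		bucket[i] = e
		naive[e.ID] = e
		diff[e.ID] = e
	}
	bucket[42].Version = 2 // a single entity changes in storage

	fmt.Println("naive ops:", naiveSync(naive, bucket))
	fmt.Println("diff ops: ", diffSync(diff, bucket))
}
```

With one changed record among a thousand, the delete-and-reinsert version touches every record twice, while the diff version performs a single write. The real implementation works inside a MemDB transaction and has to handle local entity aliases as well, which this sketch leaves out.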
Overview of commits