Support huge values in RdbLoader #3760

Closed
romange opened this issue Sep 22, 2024 · 2 comments
Labels: enhancement (New feature or request)

romange (Collaborator) commented Sep 22, 2024

Currently RdbLoader::LoadKeyValPair loads all the data of a single entry, even if it's huge (set, map, zset, list, etc.).
This creates the following problems:

  1. A huge spike in memory during the load process (or on the replica), because we need to hold the load traces before we apply them on a shard.
  2. The shard thread gets stuck for many seconds while processing these traces (inside LoadItemsBuffer()), which in turn blocks subsequent calls to FlushShardAsync and blocks replication.
  3. If full-sync replication stalls on the replica, the stall eventually propagates to the master and blocks its snapshotting flow. At that point we cannot distinguish a deadlocked replication from a merely slow one.

This might trigger DflyCmd::BreakStalledFlowsInShard on the master side, effectively cancelling a possibly correct full sync.
If the replica were more responsive, this whole chain of failures would disappear.

Suggestion:
Change the loading code to a streaming model to support huge sets/lists etc., allowing a single entry to be added in multiple parts. The total CPU time for loading a huge entry stays the same, but if the LoadTrace size is capped, the replica no longer stalls for dozens of seconds.
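
A minimal sketch of the capping idea. All names here (`Blob`, `EntryReader`, `FlushPartialToShard`, `kMaxBlobsPerTrace`) are hypothetical stand-ins for illustration, not Dragonfly's real types or API; the real batching goes through FlushShardAsync:

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for illustration only; not Dragonfly's real API.
using Blob = std::string;

struct EntryReader {
  size_t remaining = 0;  // elements left in the current entry
  Blob ReadNextBlob() { --remaining; return Blob(); }  // stub reader
};

// Stand-in for handing a bounded batch to the shard thread
// (the role FlushShardAsync plays today).
void FlushPartialToShard(std::vector<Blob> chunk, bool is_last) {
  (void)chunk;
  (void)is_last;
}

constexpr size_t kMaxBlobsPerTrace = 1024;  // assumed cap on a LoadTrace

// Streams a huge entry in bounded chunks instead of materializing it whole,
// so neither memory usage nor shard-thread latency grows with entry size.
void LoadHugeEntry(EntryReader& reader) {
  std::vector<Blob> trace;
  while (reader.remaining > 0) {
    trace.push_back(reader.ReadNextBlob());
    if (trace.size() == kMaxBlobsPerTrace || reader.remaining == 0) {
      FlushPartialToShard(std::move(trace), /*is_last=*/reader.remaining == 0);
      trace.clear();  // moved-from vector: reset to a known-empty state
    }
  }
}
```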

romange added the enhancement label on Sep 22, 2024
romange (Collaborator, Author) commented Sep 22, 2024

Related to this PR: #3759, and created based on the learnings from issue #3726.

romange (Collaborator, Author) commented Sep 23, 2024

In more detail: LoadKeyValPair loads all the blobs of an entry into memory (OpaqueObj) but does not create Valkey objects (i.e. hashes, strings, etc.). Instead, it delegates object creation to the shard thread via the FlushShardAsync function, which takes batches of Items and creates an object from each of them inside LoadItemsBuffer.

This works well, but with huge objects (sets, for example) we need to load all their blobs into OpaqueObj, which takes up lots of memory; later, the process stalls inside LoadItemsBuffer simply because its CPU is busy creating millions of set fields from the OpaqueObj. The sketch below shows this current, non-streaming shape.
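
For orientation, a heavily simplified sketch of that current flow; `RdbSource`, `ReadLen`, `ReadString`, and `LoadKeyValPairSketch` are illustrative stand-ins, not the real rdb_load.cc interfaces:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Illustrative stand-ins; the real reader lives in rdb_load.cc.
struct RdbSource {
  size_t ReadLen() { return 0; }           // number of elements (stub)
  std::string ReadString() { return {}; }  // next blob (stub)
};

struct Item {
  std::string key;
  std::vector<std::string> blobs;  // stand-in for the OpaqueObj contents
};

// Rough shape of today's LoadKeyValPair: the entire entry is materialized
// in memory before FlushShardAsync hands it to LoadItemsBuffer, which is
// exactly what hurts for a set with millions of members.
Item LoadKeyValPairSketch(RdbSource& src) {
  Item item;
  item.key = src.ReadString();
  size_t n = src.ReadLen();  // can be millions for a huge set
  item.blobs.reserve(n);
  for (size_t i = 0; i < n; ++i)
    item.blobs.push_back(src.ReadString());  // memory grows with the entry
  return item;
}
```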

The suggestion here is to break the creation flow into parts. Nothing in the high-level flow should change, I think, but RdbLoader::Item should be able to describe that it carries not the whole object but only a part of it, and then LoadItemsBuffer will be able to "append" the additional parts to the already existing object.

Note that the current logic treats "duplicate" keys during load as a logical error; this should of course change when an Item is the continuation of an object that was broken down into parts, as in the sketch below.
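
A sketch of how the duplicate-key rule could be relaxed. The `is_partial` flag, `Entry`, `ShardTable`, and `ApplyItem` are hypothetical names for illustration; the real Item and shard table look different:

```cpp
#include <cstdio>
#include <iterator>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical shapes for illustration; the real Item and table differ.
struct Item {
  std::string key;
  std::vector<std::string> blobs;  // one part of the object's elements
  bool is_partial = false;         // true if more parts of this key follow
};

struct Entry {
  std::vector<std::string> obj;  // stand-in for the materialized object
  bool awaiting_more_parts = false;
};

using ShardTable = std::unordered_map<std::string, Entry>;

// Simplified core of what LoadItemsBuffer would do per item: a duplicate
// key is an error only when the existing object is already complete.
void ApplyItem(ShardTable& table, Item item) {
  auto [it, inserted] = table.try_emplace(item.key);
  if (!inserted && !it->second.awaiting_more_parts) {
    std::fprintf(stderr, "duplicate key during load: %s\n", item.key.c_str());
    return;
  }
  auto& obj = it->second.obj;  // create on first part, append on continuations
  obj.insert(obj.end(), std::make_move_iterator(item.blobs.begin()),
             std::make_move_iterator(item.blobs.end()));
  it->second.awaiting_more_parts = item.is_partial;
}
```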

To summarize: we have code that loads series of blobs into memory (ReadObj, ReadSet, ReadHMap, etc.) and code that creates Dragonfly data structures (RdbLoaderBase::OpaqueObjLoader).

The first milestone for this issue would be to convert ReadSet, for example, into a streaming variant that emits one or more Items, and to make sure that OpaqueObjLoader can handle this correctly.
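
One possible shape for that milestone, as a sketch only: the real ReadSet signature differs, and `kChunkSize`, `RdbSource`, and the `emit` callback are assumptions:

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Illustrative stand-in for the RDB reader; not the real interface.
struct RdbSource {
  size_t ReadLen() { return 0; }           // set cardinality (stub)
  std::string ReadString() { return {}; }  // next member blob (stub)
};

constexpr size_t kChunkSize = 512;  // assumed cap on elements per emitted Item

// A streaming ReadSet variant: instead of returning one OpaqueObj with all
// members, it emits bounded chunks; the caller wraps each chunk in an Item
// whose partial/final flag mirrors `is_last`.
void ReadSetStreaming(
    RdbSource& src,
    const std::function<void(std::vector<std::string> chunk, bool is_last)>& emit) {
  size_t remaining = src.ReadLen();
  do {
    size_t n = std::min(remaining, kChunkSize);
    std::vector<std::string> chunk;
    chunk.reserve(n);
    for (size_t i = 0; i < n; ++i)
      chunk.push_back(src.ReadString());
    remaining -= n;
    emit(std::move(chunk), /*is_last=*/remaining == 0);
  } while (remaining > 0);
}
```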

chakaz added a commit that referenced this issue Nov 19, 2024

fix: Huge entries fail to load outside RDB / replication

We have an internal utility tool that we use to deserialize values in some use cases:

* `RESTORE`
* Cluster slot migration
* `RENAME`, if the source and target shards are different

We [recently](#3760) changed this area of the code, which caused this regression as it only handled RDB / replication streams.

Fixes #4143
chakaz added a commit that referenced this issue Nov 20, 2024
fix: Huge entries fail to load outside RDB / replication (same message as above)
adiholden pushed a commit that referenced this issue Nov 20, 2024
fix: Huge entries fail to load outside RDB / replication (same message as above)