
[hashset feature] Scan shortcut fix #1217

Open · wants to merge 2 commits into base: hashset
Conversation

SoftlyRaining

This is targeting the hashset branch and will update the hashset PR when merged. This fixes the issue I saw where scan can miss elements when we're rehashing: #1186 (comment)

The UT fails without my fix, and passes with my fix 😁

Signed-off-by: Rain Valentine <rsg000@gmail.com>

codecov bot commented Oct 23, 2024

Codecov Report

Attention: Patch coverage is 48.27586% with 30 lines in your changes missing coverage. Please review.

Project coverage is 70.34%. Comparing base (5129254) to head (9bf0c39).

Files with missing lines | Patch % | Lines
src/hashset.c            | 48.27%  | 30 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           hashset    #1217      +/-   ##
===========================================
- Coverage    70.77%   70.34%   -0.44%     
===========================================
  Files          115      115              
  Lines        64714    63803     -911     
===========================================
- Hits         45804    44880     -924     
- Misses       18910    18923      +13     
Files with missing lines | Coverage Δ
src/hashset.c            | 40.52% <48.27%> (+0.55%) ⬆️

... and 14 files with indirect coverage changes

Contributor

@zuiderkwast zuiderkwast left a comment


Finally you convinced me that the current logic has a problem!

This PR does indeed solve the issue, and I'm inclined to merge it. I just posted some suggestions.

Letting rehash follow probing chains can add a lot of latency to individual commands though. I'm not sure that's worse than adding very little latency to each scan call.

Some context: When debugging a test case of AOF rewrite, I observed probing chains of length 6000. This is when reading an AOF file. Actually I believe all non-empty buckets of the table were in the same probing chain.

Reading an AOF file means reading commands and executing them. The AOF file created by AOF rewrite has the commands stored per key, and the keys are in iterator order = probing order. Do you see the problem?

All commands that hash to the same bucket in a large table come together in the beginning of the AOF file, creating huge probing chains in small tables before growing kicks in. A SET command takes 300 times longer than normal. I tried to let rehashing follow probing chains, but this produced even worse spikes, with long probe chains plus rehashing of very long chains.

Here's the test case: https://github.com/valkey-io/valkey/pull/1178/files#r1814492725

If we could let the iterator walk in some other order (such as index++) we could avoid this, but that doesn't completely solve the problem either. I guess there's simply a chance of getting very long probe chains, and this causes severe degradation.

So what can we do? Maybe keep track of not only the number of 'everfulls' but also the maximum probe chain length, and grow the table if it gets too long? This might help.
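
For illustration, a minimal sketch of that heuristic. Only hashset, bucket_exp and hashsetIsRehashing appear in the actual code; max_chain, MAX_PROBE_CHAIN and resizeTables are hypothetical names:

#define MAX_PROBE_CHAIN 64 /* hypothetical threshold */

/* Hypothetical: called after probing a chain of chain_len buckets. */
static void noteProbeChain(hashset *s, size_t chain_len) {
    if (chain_len > s->max_chain) s->max_chain = chain_len; /* hypothetical field */
    /* Grow even below the normal fill factor if probing degenerates. */
    if (s->max_chain > MAX_PROBE_CHAIN && !hashsetIsRehashing(s)) {
        resizeTables(s, s->bucket_exp[0] + 1); /* hypothetical helper */
        s->max_chain = 0;
    }
}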

But I think the best solution is Madelyn's idea about chaining instead of probing.

I think we need to implement it. It has other benefits too (a rough sketch follows the list below), like

  • we can have higher fill factor and we can avoid rehashing indefinitely when there's a fork running;
  • scan doesn't need to follow chains wrapping around zero, so it won't return duplicates.
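
A rough sketch of the chained-bucket layout being discussed, with made-up names (ELEMENTS_PER_BUCKET and the fields are illustrative, not the actual hashset.c definitions):

#define ELEMENTS_PER_BUCKET 7 /* assumed: sized so a bucket fits a cache line */

typedef struct bucket {
    uint8_t presence;        /* bitmap of used slots in this bucket */
    struct bucket *overflow; /* overflow chain instead of probing, or NULL */
    void *elements[ELEMENTS_PER_BUCKET];
} bucket;

A lookup walks only its own bucket's overflow chain, so it never spills into neighboring buckets, and scan can emit a whole chain per cursor position without wrapping past index zero.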

src/hashset.c (review thread resolved)
    hash = idx;
} else {
    hash = hashElement(s, element);
    uint8_t prev_bucket_everfull = s->tables[0][prevCursor(idx, expToMask(s->bucket_exp[0]))].everfull;
Contributor

Computing this requires a memory access that's not useful in the growing case. Growing is normally the most performance-sensitive situation.

My guess is that it's cheaper to check it only when we need it...

if (s->bucket_exp[1] <= s->bucket_exp[0] && ...here...)

On the other hand, the previous bucket has most likely been rehashed just before, so it's in the L1 cache already. Anyway, I'm not sure if one is better than the other.
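
For concreteness, a fragment of that suggestion (identifiers taken from the diff above; the surrounding control flow is assumed), relying on && to skip the load whenever we're growing:

if (s->bucket_exp[1] <= s->bucket_exp[0] && /* shrinking: only now pay for the load */
    s->tables[0][prevCursor(idx, expToMask(s->bucket_exp[0]))].everfull) {
    /* previous bucket overflowed: its probe chain spills into this one */
}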

Author
Hmm, good point. I realized we only actually need to check when we're first starting a rehash; in all other cases we know that we ended rehashStep with a bucket without everfull set :)

Comment on lines -1263 to +1282
-/* Mask the start cursor to the bigger of the tables, so we can detect if we
+/* Mask the start cursor to the smaller of the tables, so we can detect if we
  * come back to the start cursor and break the loop. It can happen if enough
  * tombstones (in both tables while rehashing) make us continue scanning. */
-cursor = cursor & (expToMask(s->bucket_exp[0]) | expToMask(s->bucket_exp[1]));
+cursor &= expToMask(s->bucket_exp[0]);
+if (hashsetIsRehashing(s)) {
+    cursor &= expToMask(s->bucket_exp[1]);
+}
Contributor
With this change, I guess there's a small increased risk of returning duplicates.

Time 1:

  • Table size: 16 buckets.
  • Scan order: 0 -> 8 -> 4 -> 12 -> 2 -> 10 -> 6 -> 14 -> 1 -> ... -> 15.
  • SCAN 0 returned new cursor 8.

Time 2:

  • Table sizes 16 and 2. Shrinking started. Scan order: 0 -> 1.
  • User continues previous scan with SCAN 8.
  • Old implementation: SCAN 8 returns elements from small table bucket 0 (masked with small table mask) and large table bucket 8 (masked with large table mask). Large table bucket 0 can safely be skipped.
  • New implementation: SCAN 8 returns elements from small table bucket 0 (masked with small table mask) and large table bucket 0 and 8 (expansion of cursor masked with small table mask).
  • Both implementations: Return new cursor 1.
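
To make the difference concrete, a standalone illustration of the two maskings in this scenario (plain C, not code from the PR):

#include <stdio.h>

int main(void) {
    /* Large table: 16 buckets (mask 15). Small table: 2 buckets (mask 1).
     * The user resumes with SCAN 8 after the shrink has started. */
    unsigned cursor = 8, m_small = 1, m_large = 15;

    /* Old masking keeps the large-table bits of the cursor. */
    printf("old: small bucket %u, large bucket %u\n",
           cursor & m_small, cursor & (m_small | m_large));

    /* New masking collapses the cursor to the small table first, losing
     * the bits recording that large-table bucket 0 was already visited. */
    printf("new: small bucket %u, large expansion restarts at bucket %u\n",
           cursor & m_small, cursor & m_small);
    return 0;
}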

Author
Hmm, I see! Yes, that makes sense. I think it'll be fixed if I don't modify cursor here and instead apply the masks to start_cursor.
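
Sketched, that variant could look like the following fragment (start_cursor is the name from the comment; its placement inside the scan loop is assumed):

/* Leave the live cursor untouched; mask only the saved copy used to
 * detect that the scan has wrapped back to where it started. */
size_t mask = expToMask(s->bucket_exp[0]);
if (hashsetIsRehashing(s)) mask &= expToMask(s->bucket_exp[1]);
size_t start_cursor = cursor & mask;
/* ... advance cursor through the scan loop, then at the bottom: ... */
if ((cursor & mask) == start_cursor) break; /* wrapped around: stop */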

Contributor
Yes, maybe. But it doesn't matter much, especially if we want to change probing to chaining anyway... Let's not spend too much effort on this.

src/hashset.c (two more review threads, resolved)