Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: memory leak when assigning the same key with ttl multiply times #155

Merged
merged 7 commits into from
Nov 19, 2024

Conversation

rustatian
Copy link
Member

@rustatian rustatian commented Nov 19, 2024

Reason for This PR

closes: roadrunner-server/roadrunner#2051

Description of Changes

  • Fix memory leak.
  • Remove redundant callbacks map.
  • Add memory and goroutine leak test.

License Acceptance

By submitting this pull request, I confirm that my contribution is made under
the terms of the MIT license.

PR Checklist

[Author TODO: Meet these criteria.]
[Reviewer TODO: Verify that these criteria are met. Request changes if not]

  • All commits in this PR are signed (git commit -s).
  • The reason for this PR is clearly provided (issue no. or explanation).
  • The description of changes is clear and encompassing.
  • Any required documentation changes (code and docs) are included in this PR.
  • Any user-facing changes are mentioned in CHANGELOG.md.
  • All added/changed functionality is tested.

Summary by CodeRabbit

Release Notes

  • New Features

    • Introduced a concurrent-safe hash map implementation for improved key-value management.
    • Added a new configuration file for the in-memory driver with specific settings.
    • New test for high-load scenarios in the in-memory key-value store.
  • Bug Fixes

    • Enhanced handling of TTL items and callback management within the memory driver.
  • Documentation

    • Updated configuration files to reflect changes in versioning and removed deprecated sections.

Signed-off-by: Valery Piashchynski <piashchynski.valery@gmail.com>
@rustatian rustatian added the bug Something isn't working label Nov 19, 2024
@rustatian rustatian self-assigned this Nov 19, 2024
Copy link

coderabbitai bot commented Nov 19, 2024

Walkthrough

The changes introduce a new field callback in the Item struct of the memorykv package, modifying its structure. The Driver struct is updated to replace the sync.Map with a pointer to a new concurrent-safe hash map type hmap, which integrates callback functionality directly into items. A new file map.go implements the hmap type with methods for concurrent access. Configuration files are updated, and a new test function is added to assess the in-memory key-value store under high load.

Changes

File Change Summary
memorykv/item.go Added field: callback *cb to Item struct.
memorykv/kv.go Changed heap from sync.Map to *hmap, removed callbacks, updated methods to use hmap, modified ttlcallback method.
memorykv/map.go Introduced hmap type with methods: newHMap(), Get(), Set(), LoadAndDelete(), Clean(), Delete(), removeEntry().
tests/configs/.rr-in-memory-memory.yaml Added new configuration file for memory driver with version and settings for RPC and logging.
tests/configs/.rr-in-memory.yaml Updated version declaration format, removed otel section.
tests/kv_memory_test.go Added new test function TestSetManyMemory for high load testing of the in-memory key-value store.

Assessment against linked issues

Objective Addressed Explanation
Identify Memory Leak Changes related to callback management are made, but it's unclear if the leak is resolved.
Analyze Callback Function Callback functionality has been integrated into the item structure.
Address Panic Condition No specific changes to handle panic conditions for TTL settings are evident.
Ensure Compatibility Changes appear to align with existing version requirements.

🐰 In the memory store where items play,
Callbacks hop and dance all day.
With hmaps now to hold them tight,
No leaks to fear, just pure delight!
High loads we test, with joy we cheer,
The rabbit's work is done, have no fear! 🐇✨


Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 13

🧹 Outside diff range and nitpick comments (8)
tests/configs/.rr-in-memory.yaml (1)

Line range hint 10-13: Document TTL configuration options

Since this PR fixes a memory leak related to TTL assignments, it would be helpful to document the available TTL configuration options in the memory driver section, even if using default values.

Consider adding comments or documentation like:

 kv:
   memory-rr:
     driver: memory
-    config: {}
+    config: {} # Supports TTL configuration for key expiration
memorykv/item.go (1)

4-7: Consider memory lifecycle management for callbacks

Moving the callback into the Item struct helps fix the memory leak, but requires careful consideration of cleanup:

  1. Ensure callbacks are properly cleaned up when items expire or are removed
  2. Consider weak references or cleanup mechanisms to prevent circular references
  3. Document the lifecycle management expectations for implementers
memorykv/map.go (1)

7-16: Add documentation and consider initial capacity.

The implementation would benefit from:

  1. Documentation explaining the purpose of hmap and its thread-safety guarantees
  2. Justification or configuration option for the initial capacity of 10 items

Consider adding documentation and making the capacity configurable:

+// hmap provides a thread-safe hash map implementation for storing Items
+// with proper cleanup of TTL callbacks on deletion.
 type hmap struct {
 	mu    sync.RWMutex
 	items map[string]*Item
 }

+// newHMap creates a new thread-safe hash map with a default initial capacity.
+// TODO: Consider making the initial capacity configurable based on expected load.
 func newHMap() *hmap {
 	return &hmap{
 		items: make(map[string]*Item, 10),
 	}
 }
tests/kv_memory_test.go (1)

97-119: Enhance test documentation and cleanup

A few suggestions to improve the test setup:

  1. Add a comment explaining why this test needs manual execution
  2. Consider adding a cleanup defer to ensure resources are released even if the test fails

Apply these changes:

 func TestSetManyMemory(t *testing.T) {
-    t.Skip("This test is executed manually")
+    t.Skip("Manual test: Requires significant memory and time to reproduce the memory leak")
+    
+    // Ensure cleanup even if test fails
+    defer func() {
+        if err := cont.Stop(); err != nil {
+            t.Errorf("cleanup failed: %v", err)
+        }
+    }()
memorykv/kv.go (4)

183-189: Correct log message for expired TTL

The log message indicates saving without TTL when the TTL is incorrect. Ensure the message accurately reflects the action taken.

Consider updating the log message for clarity:

-	d.log.Warn("incorrect TTL time, saving without it", zap.String("key", items[i].Key()))
+	d.log.Warn("TTL has already expired, saving the item without TTL", zap.String("key", items[i].Key()))

384-389: Ensure all resources are cleaned up on callback stop

In the case <-stopCbCh: block, the timer is not explicitly stopped. Call ta.Stop() to ensure the ticker is properly stopped when the callback receives a stop signal.

 case <-stopCbCh:
     // item stop channel
     d.log.Debug("ttl removed, callback call",
         zap.String("id", hid),
         zap.Int("ttl seconds", cbttl),
     )
+    ta.Stop()
     return

26-27: Redundant mutex pointer in struct

The mu field is a pointer to sync.Mutex, but it can be a value type since sync.Mutex is already a reference type.

Consider changing mu to a value type:

-	mu              *sync.Mutex
+	mu              sync.Mutex

84-84: Consider removing resolved TODO comment

The TODO comment on moving the map to a sync.Pool might have been addressed with the new hmap implementation. If no longer relevant, consider removing it.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 67b3cda and 2e76913.

⛔ Files ignored due to path filters (1)
  • go.work.sum is excluded by !**/*.sum
📒 Files selected for processing (6)
  • memorykv/item.go (1 hunks)
  • memorykv/kv.go (11 hunks)
  • memorykv/map.go (1 hunks)
  • tests/configs/.rr-in-memory-memory.yaml (1 hunks)
  • tests/configs/.rr-in-memory.yaml (1 hunks)
  • tests/kv_memory_test.go (3 hunks)
✅ Files skipped from review due to trivial changes (1)
  • tests/configs/.rr-in-memory-memory.yaml
🔇 Additional comments (6)
tests/configs/.rr-in-memory.yaml (1)

Line range hint 1-13: Consider impact of removing OpenTelemetry configuration

Given that this PR addresses a memory leak issue, removing the otel section might impact our ability to monitor and debug similar memory-related issues in the future. Consider keeping or documenting alternative monitoring approaches.

Let's check if there are any other test configurations that retain OpenTelemetry settings:

memorykv/item.go (2)

Line range hint 9-20: LGTM: Getter methods maintain good encapsulation

The getter methods are appropriately unchanged, maintaining encapsulation of the new callback field while preserving the public API.


4-7: Verify the definition and thread-safety of the cb type

The addition of the callback field aligns with the PR objective to fix the memory leak. However, we should verify the implementation details of the cb type.

Let's check the definition and usage of the cb type:

✅ Verification successful

Let me gather more information about how the cb type is used to verify thread-safety.


Let me gather one final piece of information about the TTL callback implementation.


Based on the gathered information, I can now provide the final response about the cb type's thread-safety and implementation:

The cb type implementation is thread-safe and properly manages TTL callbacks

The callback field in the Item struct is well-implemented with proper concurrency handling:

  • Uses buffered channels (updateCh and stopCh) for thread-safe communication
  • Properly cleaned up when items are removed or updated
  • Integrated with the item lifecycle to prevent memory leaks

Key implementation details:

  • Channel cleanup is handled in hmap.Delete() and hmap.Clear()
  • TTL updates are managed through the updateCh channel
  • Graceful shutdown is implemented via the stopCh channel
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Find the definition and usage of the cb type
# Expected: Definition of cb type and its thread-safety guarantees

# Find cb type definition
rg -p "type\s+cb\s+" --type go

# Find cb type usage patterns
ast-grep --pattern 'type cb struct { $$$ }'

Length of output: 258


Script:

#!/bin/bash
# Find usage patterns of cb fields and methods
rg -p "updateCh|stopCh" --type go -A 3 -B 3

# Find how callback is initialized and managed
ast-grep --pattern 'callback = &cb{
  $$$
}'

# Find goroutine usage related to callbacks
rg "go\s+func.*callback" --type go -A 5

Length of output: 10408


Script:

#!/bin/bash
# Find the ttlcallback implementation
rg "ttlcallback" --type go -A 10 -B 2

# Find goroutine management for TTL
rg "go\s+func.*ttl" --type go -A 10

Length of output: 1799

memorykv/map.go (1)

1-5: LGTM!

The package structure and imports are clean and appropriate for a concurrent map implementation.

tests/kv_memory_test.go (1)

9-9: LGTM: Appropriate imports for memory leak detection

The addition of runtime for goroutine monitoring and require for strict error checking aligns well with the test's objective of detecting memory leaks.

Also applies to: 27-27

memorykv/kv.go (1)

292-295: Handle missing TTL in TTL method

In the TTL method, if the item exists but doesn't have a TTL, the returned TTL might be empty. Ensure this case is handled appropriately.

Verify how the method behaves when items without TTL are requested. Consider returning a default value or indicating that the TTL is not set.

memorykv/map.go Outdated Show resolved Hide resolved
memorykv/map.go Show resolved Hide resolved
memorykv/map.go Show resolved Hide resolved
memorykv/map.go Outdated Show resolved Hide resolved
memorykv/map.go Outdated Show resolved Hide resolved
memorykv/kv.go Outdated Show resolved Hide resolved
memorykv/kv.go Show resolved Hide resolved
memorykv/kv.go Show resolved Hide resolved
memorykv/kv.go Outdated Show resolved Hide resolved
memorykv/kv.go Show resolved Hide resolved
Signed-off-by: Valery Piashchynski <piashchynski.valery@gmail.com>
Signed-off-by: Valery Piashchynski <piashchynski.valery@gmail.com>
Signed-off-by: Valery Piashchynski <piashchynski.valery@gmail.com>
Signed-off-by: Valery Piashchynski <piashchynski.valery@gmail.com>
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Outside diff range and nitpick comments (2)
memorykv/kv.go (1)

84-84: Consider implementing the sync.Pool suggestion

The TODO comment suggests using sync.Pool for map allocation. This could improve performance by reducing GC pressure.

Would you like me to help implement this optimization or create an issue to track it?

memorykv/map.go (1)

39-44: Refactor Duplication: Extract Callback Cleanup into Helper Function

The logic for cleaning up item callbacks is duplicated in the LoadAndDelete, Clean, and Delete methods. Refactoring this code into a helper function will improve maintainability and reduce repetition.

You can create a helper function like this:

func (h *hmap) cleanupCallback(item *Item) {
	if item != nil && item.callback != nil {
		select {
		case item.callback.stopCh <- struct{}{}:
		default:
			// Channel is full or closed, continue
		}
	}
}

And update the methods accordingly. For example, in LoadAndDelete:

 func (h *hmap) LoadAndDelete(key string) (*Item, bool) {
 	h.mu.Lock()
 	defer h.mu.Unlock()

 	item, ok := h.items[key]
 	if ok {
-		if item.callback != nil {
-			select {
-			case item.callback.stopCh <- struct{}{}:
-			default:
-			}
-		}
+		h.cleanupCallback(item)
 		delete(h.items, key)
 	}

 	return item, ok
 }

Apply similar changes to Delete and Clean methods.

Also applies to: 57-63, 75-80

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 2e76913 and 1b8af58.

📒 Files selected for processing (2)
  • memorykv/kv.go (11 hunks)
  • memorykv/map.go (1 hunks)
🔇 Additional comments (6)
memorykv/kv.go (5)

26-32: LGTM: Improved synchronization and data structure

The changes improve the design by:

  • Consolidating item and callback management in a single data structure
  • Adding proper synchronization with mutex

50-53: LGTM: Proper initialization of new fields

The initialization of mutex and hmap is correct and consistent with the struct changes.


167-171: LGTM: Proper cleanup of duplicates

The code now properly removes duplicates before setting new items, which helps prevent memory leaks from orphaned callbacks.


328-333: LGTM: Proper callback cleanup in Delete

The code properly cleans up callbacks when deleting items, preventing memory leaks.


375-377: ⚠️ Potential issue

Replace removeEntry with Delete in callback

Using removeEntry in the callback could lead to memory leaks as it bypasses callback cleanup. This is consistent with the issue mentioned in past reviews.

Replace removeEntry with Delete in both TTL expiration and broadcast stop cases:

-d.heap.removeEntry(hid)
+d.heap.Delete(hid)

Also applies to: 385-387

memorykv/map.go (1)

26-31: Existing Issue: Memory Leak in Set Method

As previously noted, the Set method overwrites existing items without cleaning up their callbacks, potentially leading to memory leaks from orphaned goroutines. Please refer to the earlier review comment for details on addressing this issue.

memorykv/kv.go Show resolved Hide resolved
memorykv/kv.go Show resolved Hide resolved
Signed-off-by: Valery Piashchynski <piashchynski.valery@gmail.com>
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Outside diff range and nitpick comments (3)
tests/kv_memory_test.go (3)

96-99: Remove empty cleanup function

The empty cleanup function serves no purpose and should be removed.

-	t.Cleanup(func() {
-
-	})

164-167: Improve baseline metrics collection

Consider capturing multiple baseline samples to account for normal variations in memory usage.

+    // Collect multiple baseline samples
+    var baselineStats []runtime.MemStats
+    var baselineGoroutines []int
+    for i := 0; i < 3; i++ {
+        runtime.GC()
+        ms := &runtime.MemStats{}
+        runtime.ReadMemStats(ms)
+        baselineStats = append(baselineStats, *ms)
+        baselineGoroutines = append(baselineGoroutines, runtime.NumGoroutine())
+        time.Sleep(time.Second)
+    }
+    
+    // Use average or maximum as baseline
+    var maxAlloc uint64
+    var maxGoroutines int
+    for i := range baselineStats {
+        if baselineStats[i].Alloc > maxAlloc {
+            maxAlloc = baselineStats[i].Alloc
+        }
+        if baselineGoroutines[i] > maxGoroutines {
+            maxGoroutines = baselineGoroutines[i]
+        }
+    }
-    ms := &runtime.MemStats{}
-    runtime.ReadMemStats(ms)
-    prevAlloc := ms.Alloc
-    ngprev := runtime.NumGoroutine()

201-204: Add progress monitoring during the test

Consider adding periodic memory stats collection to better understand memory growth patterns.

     for i := 0; i < 10000; i++ {
+        if i > 0 && i%1000 == 0 {
+            runtime.GC()
+            ms := &runtime.MemStats{}
+            runtime.ReadMemStats(ms)
+            t.Logf("Iteration %d - Memory: %d MB, Goroutines: %d", 
+                i, ms.Alloc/1024/1024, runtime.NumGoroutine())
+        }
         err = client.Call("kv.Set", data, ret)
         require.NoError(t, err)
     }
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 1b8af58 and 9b6e1df.

📒 Files selected for processing (1)
  • tests/kv_memory_test.go (4 hunks)

tests/kv_memory_test.go Show resolved Hide resolved
tests/kv_memory_test.go Show resolved Hide resolved
Signed-off-by: Valery Piashchynski <piashchynski.valery@gmail.com>
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (2)
tests/kv_memory_test.go (2)

172-174: Enhance connection cleanup

While the connection is properly closed in the defer block, we should also ensure cleanup of any in-flight requests.

 defer func() {
+    // Cancel any in-flight requests
+    ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
+    defer cancel()
+    _ = client.Call("kv.Clear", &kvProto.Request{Storage: "memory-rr"}, &kvProto.Response{})
     _ = client.Close()
 }()

204-207: Add progress logging and memory metrics collection

The test loop should log progress and collect metrics periodically to help diagnose any issues.

 for i := 0; i < 10000; i++ {
+    if i%1000 == 0 {
+        runtime.ReadMemStats(ms)
+        t.Logf("Progress: %d/10000, Memory: %d MB, Goroutines: %d",
+            i, ms.Alloc/1024/1024, runtime.NumGoroutine())
+    }
     err = client.Call("kv.Set", data, ret)
     require.NoError(t, err)
 }
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 9b6e1df and 66a08bd.

📒 Files selected for processing (1)
  • tests/kv_memory_test.go (4 hunks)
🔇 Additional comments (1)
tests/kv_memory_test.go (1)

164-168: 🛠️ Refactor suggestion

Improve memory leak detection methodology

The current implementation uses hardcoded thresholds which might not be reliable across different environments. Consider using relative thresholds and more sophisticated detection methods.

The previous review already suggested improvements to the leak detection logic. That suggestion is still valid and should be implemented to make the test more robust.

Also applies to: 210-225

tests/kv_memory_test.go Show resolved Hide resolved
@rustatian rustatian merged commit f1c0fe2 into master Nov 19, 2024
9 checks passed
@rustatian rustatian deleted the fix/extensive_memory_allocations branch November 19, 2024 13:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[🐛 BUG]: Memory Leak in KV 'in-memory' driver
1 participant