config, util.kvcache: support the memory guard to prevent OOM for the plan cache #8339
Conversation
Hi contributor, thanks for your PR. This patch needs to be approved by one of the admins. They should reply with "/ok-to-test" to accept this PR for automatic testing.
/run-all-tests
Hi @dbjoa Thanks for your PR! But using `runtime.ReadMemStats` here has two issues: there may be other memory allocations happening between 'memBefore' and 'memAfter', e.g. another user request's goroutine doing a hash join, and `runtime.ReadMemStats` itself does a stop-the-world (STW) pause.
@lysu I am also aware of that. If new prepared statements are put into the plan cache massively, this PR will cause a performance drop. However, IMHO, that scenario might not be practical because the prepared statements are determined in advance and their number can be limited.
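For context, here is a minimal sketch of the delta-style accounting being cautioned against; `putWithDelta` and the 1 MiB allocation are hypothetical illustration, not code from this PR:

```go
package main

import (
	"fmt"
	"runtime"
)

var sink []byte // forces the sample allocation onto the heap

// putWithDelta measures a Put by diffing runtime.MemStats.Alloc before and
// after. Both ReadMemStats calls stop the world, and any allocation done by
// other goroutines in between is charged to this Put (the delta can even wrap
// around if a GC shrinks Alloc), which is the concern raised above.
func putWithDelta(put func()) uint64 {
	var before, after runtime.MemStats
	runtime.ReadMemStats(&before)
	put()
	runtime.ReadMemStats(&after)
	return after.Alloc - before.Alloc
}

func main() {
	delta := putWithDelta(func() { sink = make([]byte, 1<<20) })
	fmt.Printf("apparent cost of this Put: %d bytes\n", delta)
}
```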
/run-all-tests
/run-integration-ddl-test
2 similar comments
/run-integration-ddl-test
/run-integration-ddl-test
util/kvcache/simple_lru.go
Outdated
var rtm runtime.MemStats
runtime.ReadMemStats(&rtm)

if rtm.Alloc > uint64(float64(l.quota)*(1.0-l.guard)) {
- `MemQuotaQuery` seems to be a session-level variable controlling memory usage for each SQL query, while `runtime.ReadMemStats` gets memory usage at the tidb-server process level; these two are not comparable.
- If we simply return when the guard ratio is hit, it is inconsistent with the virtue of LRU. Shall we remove cache entries from the back until we can safely put this entry in?
@eurekaka
(1) I agree that `MemQuotaQuery` is not at the same level. I should define another configuration to represent the system-wide memory quota, or detect the system memory size.
(2) Sure, we can remove the least recently used items until the memory guard condition is met.
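A minimal, self-contained sketch of that eviction loop, using `container/list` as a stand-in for the real LRU cache and an injected `memUsed` callback (both are placeholders, not the PR's code):

```go
package main

import (
	"container/list"
	"fmt"
)

// evictUntilUnderGuard drops least-recently-used entries from the back of the
// list until the reported memory usage is at or below quota*(1-guard), or the
// cache is empty.
func evictUntilUnderGuard(lru *list.List, quota uint64, guard float64, memUsed func() uint64) {
	limit := uint64(float64(quota) * (1.0 - guard))
	for lru.Len() > 0 && memUsed() > limit {
		lru.Remove(lru.Back()) // the back of the list holds the LRU entry
	}
}

func main() {
	lru := list.New()
	for i := 0; i < 5; i++ {
		lru.PushFront(i) // newest entries go to the front
	}
	// Simulated accounting: an 800-unit baseline plus 100 units per cached entry.
	memUsed := func() uint64 { return uint64(800 + 100*lru.Len()) }
	evictUntilUnderGuard(lru, 1000, 0.1, memUsed)
	fmt.Println("entries left:", lru.Len()) // 1
}
```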
The updated PR addresses the two comments.
How about this solution:
@zz-jason
How about removing the least recently used items until the memory guard condition is met?
/run-all-tests
/run-sqllogic-test
2 similar comments
/run-sqllogic-test
/run-sqllogic-test
tidb-server/main.go
Outdated
plannercore.PreparedPlanCacheMemoryGuardRatio = cfg.PreparedPlanCache.MemoryGuardRatio
plannercore.PreparedPlanCacheMaxMemory = cfg.Performance.MaxMemory
if plannercore.PreparedPlanCacheMaxMemory == 0 {
	v, err := mem.VirtualMemory()
According to https://github.com/shirou/gopsutil/blob/3b882b034ca24606010516bb521239cfcaf69cbd/mem/mem_linux.go#L40, though the function name is `VirtualMemory()`, `v.Total` is actually the physical memory of the machine, because the data source is `MemTotal` of `/proc/meminfo`.
`ReadMemStats`, on the other hand, returns Go memory allocator statistics, so it should be the virtual memory consumption of this process (see also https://github.com/golang/go/blob/ae65615fd8784919f11e744b3a26d9dfa844c222/src/runtime/mstats.go#L573), so the check would indeed be too strict.
I still think it is pretty tricky to do memory accounting; it is difficult for the DBA to set a proper value for this parameter, especially when multiple TiDB servers are deployed on a single machine.
@eurekaka
Thank you for the detailed comments.
The updated PR computes both the total memory and the used memory from the physical memory only, so it should resolve the semantic mismatch in the previous revision.
I agree that setting a proper value is not simple. However, we should provide a way to automatically calculate the value, or allow the DBA to define it, in order to reduce or prevent the chance of OOM.
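To make the distinction above concrete, a small sketch (assuming `github.com/shirou/gopsutil/mem` is available) printing both views side by side:

```go
package main

import (
	"fmt"
	"runtime"

	"github.com/shirou/gopsutil/mem"
)

func main() {
	// The Go runtime's view: heap usage of this single tidb-server process.
	var rtm runtime.MemStats
	runtime.ReadMemStats(&rtm)
	fmt.Printf("Go heap in use by this process: %d bytes\n", rtm.HeapInuse)

	// gopsutil's view: machine-wide physical memory from /proc/meminfo.
	v, err := mem.VirtualMemory()
	if err != nil {
		panic(err)
	}
	fmt.Printf("physical RAM on the machine:    %d bytes\n", v.Total)
	fmt.Printf("machine-wide used RAM:          %d bytes\n", v.Used)
}
```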
/run-all-tests
func MemUsed() (uint64, error) {
	v, err := mem.VirtualMemory()
	return v.Total - (v.Free + v.Buffers + v.Cached), err
}
Could you please write a bench test for these 2 functions to see how expensive they are? thanks.
The updated PR should include the bench test. Here are the results on my local machine (Intel(R) Xeon(R) CPU E5-2603 v2 @ 1.80GHz).
% ~/dev/deps/go/src/github.com/pingcap/tidb/util/memory$ /usr/local/go/bin/go test -v github.com/pingcap/tidb/util/memory -bench "^BenchmarkMemTotal|BenchmarkMemUsed$" -run ^$
goos: linux
goarch: amd64
pkg: github.com/pingcap/tidb/util/memory
BenchmarkMemTotal-8 20000 95504 ns/op
BenchmarkMemUsed-8 20000 92750 ns/op
PASS
ok github.com/pingcap/tidb/util/memory 5.735s
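For reference, benchmarks along these lines could look like the sketch below; the `memory` package name and the `MemTotal`/`MemUsed` signatures are assumed from the diff and the benchmark output, not copied from the PR:

```go
package memory

import "testing"

// BenchmarkMemTotal exercises the machine-wide total-memory lookup.
func BenchmarkMemTotal(b *testing.B) {
	for i := 0; i < b.N; i++ {
		if _, err := MemTotal(); err != nil {
			b.Fatal(err)
		}
	}
}

// BenchmarkMemUsed exercises the machine-wide used-memory lookup.
func BenchmarkMemUsed(b *testing.B) {
	for i := 0; i < b.N; i++ {
		if _, err := MemUsed(); err != nil {
			b.Fatal(err)
		}
	}
}
```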
LGTM
// MemUsed returns the total used amount of RAM on this system
func MemUsed() (uint64, error) {
	v, err := mem.VirtualMemory()
	return v.Total - (v.Free + v.Buffers + v.Cached), err
Why not directly return `v.Used`?
`v.Used` is `v.Total - v.Free`. That is, `v.Free` does not count `v.Buffers` and `v.Cached`, which can be reclaimed as free memory.
got it.
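Toy numbers make the difference concrete; the figures below are invented purely for illustration:

```go
package main

import "fmt"

func main() {
	const gib = uint64(1) << 30
	total, free, buffers, cached := 16*gib, 2*gib, 1*gib, 5*gib

	// Total - Free, i.e. what v.Used reports per the comment above: 14 GiB.
	totalMinusFree := total - free
	// Excluding buffers and page cache, which the kernel can reclaim: 8 GiB.
	appUsed := total - (free + buffers + cached)

	fmt.Println(totalMinusFree/gib, appUsed/gib) // 14 8
}
```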
@@ -287,6 +289,7 @@ var defaultConf = Config{
		MetricsInterval: 15,
	},
	Performance: Performance{
		MaxMemory: 0,
If the configured `MaxMemory` is larger than the total memory of the machine, should we adjust this value to the total memory of that machine?
The updated PR should address the issue.
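A sketch of that adjustment, assuming a `memTotal` helper that wraps gopsutil's `VirtualMemory().Total`; `effectiveMaxMemory` is a hypothetical name, not the PR's exact code:

```go
package main

import (
	"fmt"

	"github.com/shirou/gopsutil/mem"
)

// memTotal reports the machine's physical RAM.
func memTotal() (uint64, error) {
	v, err := mem.VirtualMemory()
	if err != nil {
		return 0, err
	}
	return v.Total, nil
}

// effectiveMaxMemory clamps a configured MaxMemory: 0 (unset) or anything
// larger than the physical memory falls back to the machine total.
func effectiveMaxMemory(configured uint64) (uint64, error) {
	total, err := memTotal()
	if err != nil {
		return 0, err
	}
	if configured == 0 || configured > total {
		return total, nil
	}
	return configured, nil
}

func main() {
	got, err := effectiveMaxMemory(1 << 62) // absurdly large configured value
	fmt.Println(got, err)                   // clamped down to the physical RAM
}
```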
util/kvcache/simple_lru.go
Outdated
if memUsed > uint64(float64(l.quota)*(1.0-l.guard)) {
	memUsed, err = memory.MemUsed()
	if err != nil {
		memUsed = math.MaxUint64
Setting `memUsed` to `math.MaxUint64` is equivalent to clearing all the cached plans, so how about directly clearing the cache? That way we can break out of the for loop early.
The updated PR should address the issue.
2. introduce kvcache.DeleteAll() to clear the cache early
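A self-contained sketch of that behaviour; the `cache` type, `memUsed` stub, and `put` method are stand-ins for the real LRU cache, `memory.MemUsed`, and `Put`, kept only to show the early-clear error path:

```go
package main

import (
	"errors"
	"fmt"
)

// cache is a stand-in for the plan cache; DeleteAll mirrors the helper
// introduced in the PR (exact signature assumed).
type cache struct{ entries map[string][]byte }

func (c *cache) DeleteAll() { c.entries = make(map[string][]byte) }

// memUsed is a stand-in for the system memory probe; here it always fails so
// the error path is exercised.
func memUsed() (uint64, error) { return 0, errors.New("cannot read /proc/meminfo") }

func (c *cache) put(key string, val []byte) {
	if _, err := memUsed(); err != nil {
		// Instead of treating usage as MaxUint64 and evicting one entry per
		// loop iteration, clear everything and bail out early.
		c.DeleteAll()
		return
	}
	c.entries[key] = val
}

func main() {
	c := &cache{entries: map[string][]byte{"stale": nil}}
	c.put("plan1", []byte("..."))
	fmt.Println("cached entries:", len(c.entries)) // 0: cache was cleared
}
```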
LGTM
/run-all-tests
What problem does this PR solve?
Fix #8330
What is changed and how it works?
Check the memory usage via `gopsutil.VirtualMemory()` whenever a new element is put into the cache.
Check List
Tests
Code changes
Side effects
Related changes
This change is