config, util.kvcache: support the memory guard to prevent OOM for the plan cache #8339
Conversation
Hi contributor, thanks for your PR. This patch needs to be approved by one of the admins. They should reply with "/ok-to-test" to accept this PR for automatic testing.
/run-all-tests
Hi @dbjoa Thanks for your PR! But using `runtime.ReadMemStats` here has two issues: there may be other memory allocations happening between 'memBefore' and 'memAfter', e.g. another user request's goroutine doing a hash join, and `runtime.ReadMemStats` itself does a stop-the-world (STW) pause.
@lysu I am also aware of that. If new prepared statements are put into the plan cache massively, this PR will cause a performance drop. However, IMHO, that scenario might not be practical because the prepared statements are determined in advance and their number can be limited.
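For context, here is a minimal sketch of the delta-style accounting being cautioned against; `putWithDelta` and the 1 MiB allocation are hypothetical illustration, not code from this PR:

```go
package main

import (
	"fmt"
	"runtime"
)

var sink []byte // forces the sample allocation onto the heap

// putWithDelta measures a Put by diffing runtime.MemStats.Alloc before and
// after. Both ReadMemStats calls stop the world, and any allocation done by
// other goroutines in between is charged to this Put (the delta can even wrap
// around if a GC shrinks Alloc), which is the concern raised above.
func putWithDelta(put func()) uint64 {
	var before, after runtime.MemStats
	runtime.ReadMemStats(&before)
	put()
	runtime.ReadMemStats(&after)
	return after.Alloc - before.Alloc
}

func main() {
	delta := putWithDelta(func() { sink = make([]byte, 1<<20) })
	fmt.Printf("apparent cost of this Put: %d bytes\n", delta)
}
```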
/run-all-tests
/run-integration-ddl-test
2 similar comments
/run-integration-ddl-test
/run-integration-ddl-test
util/kvcache/simple_lru.go
Outdated
var rtm runtime.MemStats
runtime.ReadMemStats(&rtm)

if rtm.Alloc > uint64(float64(l.quota)*(1.0-l.guard)) {
- `MemQuotaQuery` seems to be a session-level variable controlling memory usage for each SQL query, while `runtime.ReadMemStats` gets memory usage at the tidb-server process level; these two are not comparable.
- If we simply return when the guard ratio is hit, it is inconsistent with the virtue of LRU. Shall we remove cache entries from the back until we can safely put this entry in?
@eurekaka
(1) I agree that `MemQuotaQuery` is not at the same level. I should define another configuration to represent the system-wide memory quota, or detect the system memory size.
(2) Sure, we can remove the least recently used items until the memory guard condition is met.
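A minimal, self-contained sketch of that eviction loop, using `container/list` as a stand-in for the real LRU cache and an injected `memUsed` callback (both are placeholders, not the PR's code):

```go
package main

import (
	"container/list"
	"fmt"
)

// evictUntilUnderGuard drops least-recently-used entries from the back of the
// list until the reported memory usage is at or below quota*(1-guard), or the
// cache is empty.
func evictUntilUnderGuard(lru *list.List, quota uint64, guard float64, memUsed func() uint64) {
	limit := uint64(float64(quota) * (1.0 - guard))
	for lru.Len() > 0 && memUsed() > limit {
		lru.Remove(lru.Back()) // the back of the list holds the LRU entry
	}
}

func main() {
	lru := list.New()
	for i := 0; i < 5; i++ {
		lru.PushFront(i) // newest entries go to the front
	}
	// Simulated accounting: an 800-unit baseline plus 100 units per cached entry.
	memUsed := func() uint64 { return uint64(800 + 100*lru.Len()) }
	evictUntilUnderGuard(lru, 1000, 0.1, memUsed)
	fmt.Println("entries left:", lru.Len()) // 1
}
```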
The updated PR addresses the two comments.
How about this solution:
@zz-jason
How about removing the least recently used items until the memory guard condition is met?
/run-all-tests
/run-sqllogic-test
2 similar comments
/run-sqllogic-test
/run-sqllogic-test
tidb-server/main.go
Outdated
plannercore.PreparedPlanCacheMemoryGuardRatio = cfg.PreparedPlanCache.MemoryGuardRatio
plannercore.PreparedPlanCacheMaxMemory = cfg.Performance.MaxMemory
if plannercore.PreparedPlanCacheMaxMemory == 0 {
	v, err := mem.VirtualMemory()
According to https://github.com/shirou/gopsutil/blob/3b882b034ca24606010516bb521239cfcaf69cbd/mem/mem_linux.go#L40, though the function name is `VirtualMemory()`, `v.Total` is actually the physical memory of the machine, because the data source is `MemTotal` of `/proc/meminfo`.
`ReadMemStats`, on the other hand, returns Go memory allocator statistics, so it should be the virtual memory consumption of this process (see also https://github.com/golang/go/blob/ae65615fd8784919f11e744b3a26d9dfa844c222/src/runtime/mstats.go#L573), so the check would indeed be too strict.
I still think it is pretty tricky to do memory accounting; it is difficult for the DBA to set a proper value for this parameter, especially when multiple TiDB servers are deployed on a single machine.
@eurekaka
Thank you for the detailed comments.
The updated PR computes both the total memory and the used memory from the physical memory only, so it should resolve the semantic mismatch in the previous revision.
I agree that setting a proper value is not simple. However, we should provide a way to automatically calculate the value, or allow the DBA to define it, in order to reduce or prevent the chance of OOM.
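To make the distinction above concrete, a small sketch (assuming `github.com/shirou/gopsutil/mem` is available) printing both views side by side:

```go
package main

import (
	"fmt"
	"runtime"

	"github.com/shirou/gopsutil/mem"
)

func main() {
	// The Go runtime's view: heap usage of this single tidb-server process.
	var rtm runtime.MemStats
	runtime.ReadMemStats(&rtm)
	fmt.Printf("Go heap in use by this process: %d bytes\n", rtm.HeapInuse)

	// gopsutil's view: machine-wide physical memory from /proc/meminfo.
	v, err := mem.VirtualMemory()
	if err != nil {
		panic(err)
	}
	fmt.Printf("physical RAM on the machine:    %d bytes\n", v.Total)
	fmt.Printf("machine-wide used RAM:          %d bytes\n", v.Used)
}
```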
/run-all-tests
func MemUsed() (uint64, error) {
	v, err := mem.VirtualMemory()
	return v.Total - (v.Free + v.Buffers + v.Cached), err
}
Could you please write a bench test for these 2 functions to see how expensive they are? thanks.
The updated PR should include the bench test. Here are the results on my local machine (Intel(R) Xeon(R) CPU E5-2603 v2 @ 1.80GHz).
% ~/dev/deps/go/src/github.com/pingcap/tidb/util/memory$ /usr/local/go/bin/go test -v github.com/pingcap/tidb/util/memory -bench "^BenchmarkMemTotal|BenchmarkMemUsed$" -run ^$
goos: linux
goarch: amd64
pkg: github.com/pingcap/tidb/util/memory
BenchmarkMemTotal-8 20000 95504 ns/op
BenchmarkMemUsed-8 20000 92750 ns/op
PASS
ok github.com/pingcap/tidb/util/memory 5.735s
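For reference, benchmarks along these lines could look like the sketch below; the `memory` package name and the `MemTotal`/`MemUsed` signatures are assumed from the diff and the benchmark output, not copied from the PR:

```go
package memory

import "testing"

// BenchmarkMemTotal exercises the machine-wide total-memory lookup.
func BenchmarkMemTotal(b *testing.B) {
	for i := 0; i < b.N; i++ {
		if _, err := MemTotal(); err != nil {
			b.Fatal(err)
		}
	}
}

// BenchmarkMemUsed exercises the machine-wide used-memory lookup.
func BenchmarkMemUsed(b *testing.B) {
	for i := 0; i < b.N; i++ {
		if _, err := MemUsed(); err != nil {
			b.Fatal(err)
		}
	}
}
```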
LGTM
// MemUsed returns the total used amount of RAM on this system
func MemUsed() (uint64, error) {
	v, err := mem.VirtualMemory()
	return v.Total - (v.Free + v.Buffers + v.Cached), err
Why not directly return `v.Used`?
`v.Used` is `v.Total - v.Free`. That is, `v.Free` does not count `v.Buffers` and `v.Cached`, which can be reclaimed as free memory.
got it.
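Toy numbers make the difference concrete; the figures below are invented purely for illustration:

```go
package main

import "fmt"

func main() {
	const gib = uint64(1) << 30
	total, free, buffers, cached := 16*gib, 2*gib, 1*gib, 5*gib

	// Total - Free, i.e. what v.Used reports per the comment above: 14 GiB.
	totalMinusFree := total - free
	// Excluding buffers and page cache, which the kernel can reclaim: 8 GiB.
	appUsed := total - (free + buffers + cached)

	fmt.Println(totalMinusFree/gib, appUsed/gib) // 14 8
}
```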
@@ -287,6 +289,7 @@ var defaultConf = Config{
		MetricsInterval: 15,
	},
	Performance: Performance{
		MaxMemory: 0,
If the configured `MaxMemory` is larger than the total memory of the machine, should we adjust this value to the total memory of that machine?
The updated PR should address the issue.
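A sketch of that adjustment, assuming a `memTotal` helper that wraps gopsutil's `VirtualMemory().Total`; `effectiveMaxMemory` is a hypothetical name, not the PR's exact code:

```go
package main

import (
	"fmt"

	"github.com/shirou/gopsutil/mem"
)

// memTotal reports the machine's physical RAM.
func memTotal() (uint64, error) {
	v, err := mem.VirtualMemory()
	if err != nil {
		return 0, err
	}
	return v.Total, nil
}

// effectiveMaxMemory clamps a configured MaxMemory: 0 (unset) or anything
// larger than the physical memory falls back to the machine total.
func effectiveMaxMemory(configured uint64) (uint64, error) {
	total, err := memTotal()
	if err != nil {
		return 0, err
	}
	if configured == 0 || configured > total {
		return total, nil
	}
	return configured, nil
}

func main() {
	got, err := effectiveMaxMemory(1 << 62) // absurdly large configured value
	fmt.Println(got, err)                   // clamped down to the physical RAM
}
```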
util/kvcache/simple_lru.go
Outdated
if memUsed > uint64(float64(l.quota)*(1.0-l.guard)) {
	memUsed, err = memory.MemUsed()
	if err != nil {
		memUsed = math.MaxUint64
Setting `memUsed` to `math.MaxUint64` is equivalent to clearing all the cached plans, so how about directly clearing the cache? That way we can break out of the for loop early.
The updated PR should address the issue.
2. introduce kvcache.DeleteAll() to clear the cache early
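A self-contained sketch of that behaviour; the `cache` type, `memUsed` stub, and `put` method are stand-ins for the real LRU cache, `memory.MemUsed`, and `Put`, kept only to show the early-clear error path:

```go
package main

import (
	"errors"
	"fmt"
)

// cache is a stand-in for the plan cache; DeleteAll mirrors the helper
// introduced in the PR (exact signature assumed).
type cache struct{ entries map[string][]byte }

func (c *cache) DeleteAll() { c.entries = make(map[string][]byte) }

// memUsed is a stand-in for the system memory probe; here it always fails so
// the error path is exercised.
func memUsed() (uint64, error) { return 0, errors.New("cannot read /proc/meminfo") }

func (c *cache) put(key string, val []byte) {
	if _, err := memUsed(); err != nil {
		// Instead of treating usage as MaxUint64 and evicting one entry per
		// loop iteration, clear everything and bail out early.
		c.DeleteAll()
		return
	}
	c.entries[key] = val
}

func main() {
	c := &cache{entries: map[string][]byte{"stale": nil}}
	c.put("plan1", []byte("..."))
	fmt.Println("cached entries:", len(c.entries)) // 0: cache was cleared
}
```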
LGTM
/run-all-tests
What problem does this PR solve?
Fix #8330
What is changed and how it works?
Check the memory usage via `gopsutil.VirtualMemory()` whenever a new element is put into the cache.
Check List
Tests
Code changes
Side effects
Related changes
This change is