
config, util.kvcache: support the memory guard to prevent OOM for the plan cache #8339

Merged
merged 5 commits into from
Nov 22, 2018

dbjoa
Contributor

@dbjoa dbjoa commented Nov 16, 2018

What problem does this PR solve?

Fix #8330

What is changed and how it works?

  • add new configuration parameters to represent the memory guard ratio and the total memory size to prevent OOM for the plan cache
  • cache a new input item only if the ratio of the available memory to the total memory size is larger than the memory guard ratio
  • compute the current memory usage approximately by using gopsutil.VirtualMemory() whenever a new element is put
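The guard condition in the second bullet can be sketched in Go as below. This is an illustration under assumptions, not the PR's code. `shouldCache` is a hypothetical helper, and the memory figures are passed in as plain numbers instead of being read through gopsutil.

```go
package main

import "fmt"

// shouldCache reports whether a new plan may be cached: the ratio of
// available memory to total memory must be larger than the guard ratio.
// Hypothetical helper for illustration only.
func shouldCache(total, available uint64, guardRatio float64) bool {
	if total == 0 {
		return false // unknown total memory: refuse to cache
	}
	return float64(available)/float64(total) > guardRatio
}

func main() {
	const gib = uint64(1) << 30
	// A 16 GiB machine with a guard ratio of 0.1 (10% must stay available).
	fmt.Println(shouldCache(16*gib, 4*gib, 0.1)) // true: 25% available
	fmt.Println(shouldCache(16*gib, 1*gib, 0.1)) // false: 6.25% available
}
```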

Check List

Tests

  • Unit test

Code changes

  • Has exported function/method change
  • Has changed the default value
  • Has added a new configuration parameter

Side effects

  • None

Related changes

  • Need to be included in the release note

@sre-bot
Contributor

sre-bot commented Nov 16, 2018

Hi contributor, thanks for your PR.

This patch needs to be approved by one of the admins. They should reply with "/ok-to-test" to accept this PR for running tests automatically.

@dbjoa
Contributor Author

dbjoa commented Nov 16, 2018

/run-all-tests

@lysu lysu added contribution This PR is from a community contributor. sig/planner SIG: Planner labels Nov 16, 2018
@lysu
Contributor

lysu commented Nov 16, 2018

Hi @dbjoa Thanks for your PR!

However, using runtime.ReadMemStats to calculate memory usage raises some issues that need to be addressed:

  1. ReadMemStats returns global runtime statistics.

Other memory allocations may happen between 'memBefore' and 'memAfter'. For example, goroutines serving other users' requests may be doing a hash join.

  2. ReadMemStats seems to be a heavy operation:
func ReadMemStats(m *MemStats) {
	stopTheWorld("read mem stats")

	systemstack(func() {
		readmemstats_m(m)
	})

	startTheWorld()
}

It stops the world (STW) itself.
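One rough way to see what the STW call costs on a particular machine is to time it with the standard library's `testing.Benchmark`, which can run outside `go test`. This sketch reports no figures, because results vary with heap size and goroutine count.

```go
package main

import (
	"fmt"
	"runtime"
	"testing"
)

// benchReadMemStats measures runtime.ReadMemStats, which stops the world
// on every call, using the standard testing.Benchmark helper.
func benchReadMemStats() testing.BenchmarkResult {
	return testing.Benchmark(func(b *testing.B) {
		var m runtime.MemStats
		for i := 0; i < b.N; i++ {
			runtime.ReadMemStats(&m)
		}
	})
}

func main() {
	r := benchReadMemStats()
	fmt.Printf("ReadMemStats: %d ns/op over %d iterations\n", r.NsPerOp(), r.N)
}
```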

@dbjoa
Contributor Author

dbjoa commented Nov 16, 2018

@lysu I am also aware of that. If new prepared statements are put into the plan cache in large numbers, this PR could cause a performance drop. However, IMHO, this scenario is unlikely in practice, because prepared statements are determined in advance and their number can be limited.

@dbjoa dbjoa changed the title [WIP] config, util.kvcache: support the byte-sized capacity for the plan cache config, util.kvcache: support the memory guard to prevent OOM for the plan cache Nov 19, 2018
@dbjoa
Contributor Author

dbjoa commented Nov 19, 2018

/run-all-tests

@dbjoa
Contributor Author

dbjoa commented Nov 19, 2018

/run-integration-ddl-test

@dbjoa
Contributor Author

dbjoa commented Nov 19, 2018

/run-integration-ddl-test

@dbjoa
Contributor Author

dbjoa commented Nov 19, 2018

/run-integration-ddl-test

var rtm runtime.MemStats
runtime.ReadMemStats(&rtm)

if rtm.Alloc > uint64(float64(l.quota)*(1.0-l.guard)) {
Contributor

  • MemQuotaQuery seems to be a session-level variable that controls the memory usage of each SQL query, while runtime.ReadMemStats gets memory usage at the tidb-server process level. Aren't these two incomparable?
  • If we simply return when the guard ratio is hit, that is inconsistent with the spirit of LRU. Shall we remove cache entries from the back until we can safely put this entry in?

Contributor Author

@eurekaka
(1) I agree that MemQuotaQuery is not at the same level. I should define another configuration item to represent the system-wide memory quota, or detect the system memory size.
(2) Sure, we can remove the least recently used items until the memory guard condition is met.

Contributor Author

The updated PR addresses the two comments.

@zz-jason
Member

zz-jason commented Nov 20, 2018

How about this solution:

  1. limit the global memory usage of the plan cache with a new config item.
  2. use https://github.com/OneOfOne/go-utils/tree/master/memory/ to calculate the memory usage of the to-be-cached plan and key.
  3. remove the least recently used cache items from that session-level plan cache until the to-be-cached plan can safely be pushed into the cache.

@dbjoa
Contributor Author

dbjoa commented Nov 20, 2018

@zz-jason
I have already tested the method you mentioned. I had to abandon memory.SizeOf because of its performance: for a simple plan, a TableReader, SizeOf() takes 7.1s to compute the plan size (103449103). Here is the example query:

mysql> prepare stmt_sel2 from 'select * from prepare_test where id > ?';
mysql> set @a=1;execute stmt_sel2 using @a;

How about removing the least recently used items until the memory guard condition is met?
(Note: this method needs to call runtime.ReadMemStats every time an item is deleted.)

@dbjoa
Contributor Author

dbjoa commented Nov 20, 2018

/run-all-tests

@dbjoa
Contributor Author

dbjoa commented Nov 20, 2018

/run-sqllogic-test

@dbjoa
Contributor Author

dbjoa commented Nov 20, 2018

/run-sqllogic-test

@dbjoa
Contributor Author

dbjoa commented Nov 21, 2018

/run-sqllogic-test

plannercore.PreparedPlanCacheMemoryGuardRatio = cfg.PreparedPlanCache.MemoryGuardRatio
plannercore.PreparedPlanCacheMaxMemory = cfg.Performance.MaxMemory
if plannercore.PreparedPlanCacheMaxMemory == 0 {
v, err := mem.VirtualMemory()
Contributor

According to https://github.com/shirou/gopsutil/blob/3b882b034ca24606010516bb521239cfcaf69cbd/mem/mem_linux.go#L40, although the function is named VirtualMemory(), v.Total is actually the physical memory of the machine, because the data source is MemTotal in /proc/meminfo.

ReadMemStats, on the other hand, returns Go memory allocator statistics, which reflect the virtual memory consumption of this process (see also https://github.com/golang/go/blob/ae65615fd8784919f11e744b3a26d9dfa844c222/src/runtime/mstats.go#L573), so the check would indeed be too strict.

I still think memory accounting is pretty tricky. It is difficult for the DBA to set a proper value for this parameter, especially when multiple TiDB servers are deployed on a single machine.

Contributor Author

@eurekaka
Thank you for the detailed comments.

The updated PR computes both the total memory and the used memory from physical memory only, so it should resolve the semantic mismatch in the previous version.

I agree that setting a proper value is not simple. However, we should provide a way to calculate the value automatically, or allow the DBA to define it, in order to reduce the risk of OOM or prevent it.

@dbjoa
Contributor Author

dbjoa commented Nov 22, 2018

/run-all-tests

func MemUsed() (uint64, error) {
v, err := mem.VirtualMemory()
return v.Total - (v.Free + v.Buffers + v.Cached), err
}
Contributor

Could you please write a benchmark for these two functions to see how expensive they are? Thanks.

Contributor Author

@dbjoa dbjoa Nov 22, 2018

The updated PR includes the benchmark. Here are the results on my local machine (Intel(R) Xeon(R) CPU E5-2603 v2 @ 1.80GHz):

% ~/dev/deps/go/src/github.com/pingcap/tidb/util/memory$ /usr/local/go/bin/go test -v github.com/pingcap/tidb/util/memory -bench "^BenchmarkMemTotal|BenchmarkMemUsed$" -run ^$      
goos: linux
goarch: amd64
pkg: github.com/pingcap/tidb/util/memory
BenchmarkMemTotal-8        20000             95504 ns/op
BenchmarkMemUsed-8         20000             92750 ns/op
PASS
ok      github.com/pingcap/tidb/util/memory     5.735s

Contributor

@eurekaka eurekaka left a comment

LGTM

@eurekaka eurekaka added the status/LGT1 Indicates that a PR has LGTM 1. label Nov 22, 2018
@eurekaka
Contributor

@zz-jason @lysu PTAL

// MemUsed returns the total used amount of RAM on this system
func MemUsed() (uint64, error) {
v, err := mem.VirtualMemory()
return v.Total - (v.Free + v.Buffers + v.Cached), err
Member

Why not directly return v.Used?

Contributor Author

v.Used is v.Total - v.Free, and v.Free does not count v.Buffers and v.Cached, which can be reclaimed as free memory.

Member

got it.

@@ -287,6 +289,7 @@ var defaultConf = Config{
MetricsInterval: 15,
},
Performance: Performance{
MaxMemory: 0,
Member

If the configured MaxMemory is larger than the total memory of the machine, should we adjust it to the machine's total memory?

Contributor Author

The updated PR should address the issue.
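The MaxMemory rules discussed in this review can be summarized as below: 0 means "use the machine total", as in the snippet that falls back to mem.VirtualMemory(), and an oversized value is clamped to the total. The helper names are hypothetical, the total is passed in rather than probed, and the PR's actual code may differ.

```go
package main

import "fmt"

// effectiveMaxMemory picks the quota the plan-cache guard works against:
// 0 means "use the machine's total memory", and a configured value larger
// than the machine is clamped down to the total.
func effectiveMaxMemory(configured, total uint64) uint64 {
	if configured == 0 || configured > total {
		return total
	}
	return configured
}

// guardThreshold is the usage above which eviction starts, mirroring the
// quota*(1-guard) expression in the reviewed code.
func guardThreshold(quota uint64, guard float64) uint64 {
	return uint64(float64(quota) * (1.0 - guard))
}

func main() {
	const gib = uint64(1) << 30
	total := 16 * gib
	fmt.Println(effectiveMaxMemory(0, total) / gib)      // 16
	fmt.Println(effectiveMaxMemory(32*gib, total) / gib) // 16
	fmt.Println(effectiveMaxMemory(8*gib, total) / gib)  // 8
	fmt.Println(guardThreshold(1000, 0.25))              // 750
}
```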

if memUsed > uint64(float64(l.quota)*(1.0-l.guard)) {
memUsed, err = memory.MemUsed()
if err != nil {
memUsed = math.MaxUint64
Member

Setting memUsed to math.MaxUint64 is equivalent to clearing all the cached plans. How about clearing the cache directly? That way we can break out of the for loop early.

Contributor Author

The updated PR should address the issue.

2. introduce kvcache.DeleteAll() to clear the cache early
zz-jason previously approved these changes Nov 22, 2018
Member

@zz-jason zz-jason left a comment

LGTM

@zz-jason zz-jason added status/LGT2 Indicates that a PR has LGTM 2. and removed status/LGT1 Indicates that a PR has LGTM 1. labels Nov 22, 2018
@dbjoa
Contributor Author

dbjoa commented Nov 22, 2018

/run-all-tests

Labels
contribution This PR is from a community contributor. sig/planner SIG: Planner status/LGT2 Indicates that a PR has LGTM 2.

5 participants