Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add disk fullness test #25051

Merged
merged 2 commits into from
May 21, 2018
Merged

Add disk fullness test #25051

merged 2 commits into from
May 21, 2018

Conversation

kaavee315
Copy link
Contributor

@kaavee315 kaavee315 commented Apr 24, 2018

Add a test to check the behaviour of cockroach cluster when disk is near full.
This test currently runs a workload on the cluster. Then it fills the cluster
using a ballast file to a given fraction. The test does an insert and delete
in loop to check if GC can keep in sync

Release note (cli change): add the cockroach debug ballast command. It can be used to create ballast file to fill the cockroach store directory

@kaavee315 kaavee315 requested a review from a team April 24, 2018 19:44
@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Contributor Author

@kaavee315 kaavee315 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will fix all the formatting issues

Makefile Outdated
@@ -1321,7 +1321,8 @@ bins := \
bin/teamcity-trigger \
bin/urlcheck \
bin/workload \
bin/zerosum
bin/zerosum \
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will change order

@kaavee315 kaavee315 force-pushed the try_disk_usage branch 9 times, most recently from d06d037 to cdba17f Compare April 26, 2018 20:15
@kaavee315 kaavee315 changed the title [Do not review] Add disk usage test [WIP] Add disk usage test Apr 27, 2018
@kaavee315 kaavee315 force-pushed the try_disk_usage branch 2 times, most recently from 26ac542 to 61771d8 Compare April 29, 2018 07:29
@kaavee315 kaavee315 force-pushed the try_disk_usage branch 2 times, most recently from 543b2b5 to 843c740 Compare May 7, 2018 11:02
@kaavee315 kaavee315 changed the title [WIP] Add disk usage test Add disk usage test May 7, 2018
@kaavee315 kaavee315 changed the title Add disk usage test Add disk fullness test May 7, 2018
@tbg
Copy link
Member

tbg commented May 8, 2018

This looks pretty good, thanks for the work so far! I have some orthogonal suggestion on the ballast file creation, would be curious if you're interested in picking that up.


Reviewed 6 of 6 files at r3.
Review status: all files reviewed at latest revision, 1 unresolved discussion, some commit checks failed.


pkg/cmd/fill_ballast/main.go, line 30 at r3 (raw file):

	if *fillRatio > 0 && *diskLeftInBytes > 0 {
		fmt.Printf(
			"fill_ratio or disk_left_bytes expected to be positive, " +

Prepend "exactly one of"

Do you also want to check the case in which neither is supplied and fail here?


pkg/cmd/fill_ballast/main.go, line 59 at r3 (raw file):

		os.Exit(1)
	}
	if used == toFill {

This is an odd check, why not fail on equality?


pkg/cmd/fill_ballast/main.go, line 63 at r3 (raw file):

	}
	ballastSize := toFill - used
	createBallastCommand := exec.Command("fallocate", "-l", fmt.Sprint(ballastSize), *ballastFile)

I discussed the idea of a ballast file with @bdarnell and we'd be interested in embedding this in the cockroach binary itself, under ./cockroach debug ballast. Ideally this would also work on OSX, windows, and on Linux filesystems that don't support fallocate or where fallocate is not available. This isn't exactly the use case you have in mind here and I understand if you don't want to get sidetracked, but if you have the capacity, I think this can work:

./cockroach debug ballast --size=X somefile

where X is either of

  • 80%, .8: create file of size 80% of total disk.
  • 5g, 500mb, 129135kb, ...: create file of specified size.
  • -20%, -.2: allocate a file so large that 20% of the disk is left empty (corresponds to fillRatio)
  • -60gb, ...: allocate a file so large that 60gb are left empty (corresponds to diskLeftInBytes).

(the code to parse all of these is already used by, I think, the --store flag).

I assume users usually would use something like 5gb.

For the "portable" implementation, I'd just blindly try to use fallocate and if that doesn't work, fall back to explicitly creating the file and writing to it (DIRECT_IO, where available, could be used but I don't particularly care about these performance optimizations initially, though they'd be nice).


pkg/cmd/roachtest/cluster.go, line 778 at r3 (raw file):

}

func (c *cluster) getDiskUsageInByte(ctx context.Context, node nodeListOption) (int, error) {

Could you make this a method that takes a cluster instead of putting it on the cluster?


pkg/cmd/roachtest/disk_space.go, line 88 at r3 (raw file):

		t.Fatal(err)
	}
	ballastFilePath := "data/ballast_file_to_fill_store"

Use something like {store-dir}/ballast_file


pkg/cmd/roachtest/disk_space.go, line 91 at r3 (raw file):

	for i := 1; i <= numNodes; i++ {
		fillBallastCommand := []string{fillBallastBinary, "--ballast_file", ballastFilePath}
		if tc.ratioDiskFilled !=nil {

nit: ratioDiskFilled looks like it should be a map from index to int, same for diskEmptyInBytes.


pkg/cmd/roachtest/disk_space.go, line 113 at r3 (raw file):

	diskUsages := func() []int {
		diskUsages := make([]int, numNodes)

Rename this variable, I got confused for a second.


pkg/cmd/roachtest/disk_space.go, line 124 at r3 (raw file):

	}

	const stmtZone = `ALTER RANGE default EXPERIMENTAL CONFIGURE ZONE '

You could also make this a one-liner with CONFIGURE ZONE 'gc: {ttlseconds: 10}'.


pkg/cmd/roachtest/disk_space.go, line 143 at r3 (raw file):

// Simluate hot row update loop.


pkg/cmd/roachtest/disk_space.go, line 239 at r3 (raw file):

	duration := 20 * time.Minute

	testCases := []diskUsageTestCase{

The testcases could use some general commentary on what the test exercises. It's not quite clear to me yet (though the commit message does a bit of explanation).


Comments from Reviewable

@kaavee315 kaavee315 force-pushed the try_disk_usage branch 2 times, most recently from d5f9f79 to 00e454c Compare May 8, 2018 11:06
@bdarnell
Copy link
Contributor

bdarnell commented May 8, 2018

Review status: 2 of 6 files reviewed at latest revision, 11 unresolved discussions, some commit checks failed.


pkg/cmd/fill_ballast/main.go, line 63 at r3 (raw file):

Previously, tschottdorf (Tobias Schottdorf) wrote…

I discussed the idea of a ballast file with @bdarnell and we'd be interested in embedding this in the cockroach binary itself, under ./cockroach debug ballast. Ideally this would also work on OSX, windows, and on Linux filesystems that don't support fallocate or where fallocate is not available. This isn't exactly the use case you have in mind here and I understand if you don't want to get sidetracked, but if you have the capacity, I think this can work:

./cockroach debug ballast --size=X somefile

where X is either of

  • 80%, .8: create file of size 80% of total disk.
  • 5g, 500mb, 129135kb, ...: create file of specified size.
  • -20%, -.2: allocate a file so large that 20% of the disk is left empty (corresponds to fillRatio)
  • -60gb, ...: allocate a file so large that 60gb are left empty (corresponds to diskLeftInBytes).

(the code to parse all of these is already used by, I think, the --store flag).

I assume users usually would use something like 5gb.

For the "portable" implementation, I'd just blindly try to use fallocate and if that doesn't work, fall back to explicitly creating the file and writing to it (DIRECT_IO, where available, could be used but I don't particularly care about these performance optimizations initially, though they'd be nice).

It would also be nice to use the fallocate(2) system call instead of the fallocate(1) binary. Our docker containers currently contain the fallocate(1) binary but we may move to a leaner base image that wouldn't have it.


Comments from Reviewable

@kaavee315 kaavee315 requested a review from a team May 9, 2018 13:55
@kaavee315 kaavee315 requested a review from a team as a code owner May 9, 2018 13:55
@kaavee315 kaavee315 requested a review from a team May 9, 2018 13:55
@kaavee315 kaavee315 force-pushed the try_disk_usage branch 2 times, most recently from 462e7c3 to ad7f598 Compare May 9, 2018 19:16
@kaavee315
Copy link
Contributor Author

kaavee315 commented May 9, 2018

@tschottdorf, I have updated the diff with incorporating the command in cockroach debug. The UT that failed is just due to a debug print, that I will remove. Can you take a look at the debug command changes?

@kaavee315 kaavee315 requested a review from a team May 15, 2018 19:51
@kaavee315 kaavee315 force-pushed the try_disk_usage branch 5 times, most recently from c3debda to 9e3c7c5 Compare May 16, 2018 12:20
@kaavee315
Copy link
Contributor Author

I am getting Error: failed to create google cloud client (You may need to setup the GCS application default credentials: 'gcloud auth application-default login --project=cockroach-shared'): dialing: google: could not find default credentials. See https://developers.google.com/accounts/docs/application-default-credentials for more information. when trying it on our bodega integration.


Comments from Reviewable

@tbg tbg mentioned this pull request May 17, 2018
@kaavee315 kaavee315 force-pushed the try_disk_usage branch 3 times, most recently from 5bea383 to 2ae351f Compare May 18, 2018 07:39
@kaavee315
Copy link
Contributor Author

kaavee315 commented May 18, 2018

@tschottdorf, I have amended on your comments. Please have another look.

@tbg
Copy link
Member

tbg commented May 21, 2018

:lgtm: mod the remaining comments, thanks!


Reviewed 19 of 21 files at r8, 10 of 19 files at r9, 3 of 7 files at r10, 3 of 5 files at r11.
Review status: all files reviewed at latest revision, 1 unresolved discussion.


pkg/base/store_spec.go, line 103 at r11 (raw file):

					"store size (%s) must be between %d%% and %d%%",
					value,
					int(*percentRange.min),

Looks like this should be printed as %f (or the message can be misleading if percentRange.max=99.8


pkg/cli/cliflags/flags.go, line 547 at r6 (raw file):

Previously, kaavee315 (Karan Vaidya) wrote…

Just noticed that the -ve things aren't working like this, as it treats it as a shorthand flag instead. I tried some escaping, but nothing useful and might not be good for user experience too. Better keep it flag?

👍 Good point.


pkg/cmd/roachtest/disk_space.go, line 210 at r11 (raw file):

				// when RocksDB memtable gets flushed, we keep a buffer of extra 10 secs MVCC data as
				// breathing space for GC. We are checking the whole cluster disk space usage to avoid
				// discrepancies due to range rebalancing.

Here too you're probably getting lucky. When a range is rebalanced away, the disk space is usually reclaimed after at least 2 minutes, if at all.


pkg/cmd/roachtest/disk_space.go, line 216 at r11 (raw file):

					currentDiskUsages := clusterDiskUsage()
					// Assuming lower bound of 1ms for inserts and deletes, 2KB of data per row, and
					// replication factor of 3, we expect 60MB of data per node per 10 second(TTL).

This also doesn't account for write amplification, which can be substantial. Since you've measured the numbers experimentally I'd leave this for now, but mention it in the comment.


pkg/util/sysutil/large_file_non_linux.go, line 31 at r11 (raw file):

// ..., which can take a long time.


Comments from Reviewable

@kaavee315 kaavee315 force-pushed the try_disk_usage branch 4 times, most recently from e99a27c to 3d684b7 Compare May 21, 2018 16:16
@kaavee315 kaavee315 requested review from a team May 21, 2018 16:16
Add a test to check the behaviour of cockroach cluster when disk is near full.
This test currently runs a workload on the cluster. Then it fills the cluster
using a ballast file to a given fraction. The test does an insert and delete
in loop to check if GC can keep in sync

Release note: None
@kaavee315
Copy link
Contributor Author

Review status: 2 of 19 files reviewed at latest revision, 6 unresolved discussions.


pkg/base/store_spec.go, line 103 at r11 (raw file):

Previously, tschottdorf (Tobias Schottdorf) wrote…

Looks like this should be printed as %f (or the message can be misleading if percentRange.max=99.8

Done.


pkg/cmd/roachtest/disk_space.go, line 210 at r11 (raw file):

Previously, tschottdorf (Tobias Schottdorf) wrote…

Here too you're probably getting lucky. When a range is rebalanced away, the disk space is usually reclaimed after at least 2 minutes, if at all.

I think the reason is that the data being generated is lesser than calculcated, and the buffer over here is enough to take this into account. Adding the same into comment below.


pkg/cmd/roachtest/disk_space.go, line 216 at r11 (raw file):

Previously, tschottdorf (Tobias Schottdorf) wrote…

This also doesn't account for write amplification, which can be substantial. Since you've measured the numbers experimentally I'd leave this for now, but mention it in the comment.

Done.


pkg/util/sysutil/large_file_non_linux.go, line 31 at r11 (raw file):

Previously, tschottdorf (Tobias Schottdorf) wrote…

// ..., which can take a long time.

Done.


Comments from Reviewable

@tbg
Copy link
Member

tbg commented May 21, 2018

Thanks! Your teamcity build seems weirdly stuck but it has made it past test and testrace, so it's either a teamcity or unrelated code problem.

bors r+


Reviewed 18 of 20 files at r12, 19 of 21 files at r13.
Review status: all files reviewed at latest revision, 1 unresolved discussion.


Comments from Reviewable

craig bot pushed a commit that referenced this pull request May 21, 2018
25051: Add disk fullness test r=tschottdorf a=kaavee315

Add a test to check the behaviour of cockroach cluster when disk is near full.
This test currently runs a workload on the cluster. Then it fills the cluster
using a ballast file to a given fraction. The test does an insert and delete
in loop to check if GC can keep in sync

Release note (cli change): add the cockroach debug ballast command. It can be used to create ballast file to fill the cockroach store directory

Co-authored-by: Karan Vaidya <kaavee315@gmail.com>
@craig
Copy link
Contributor

craig bot commented May 21, 2018

Build succeeded

@craig craig bot merged commit aba7d73 into cockroachdb:master May 21, 2018
@kaavee315 kaavee315 deleted the try_disk_usage branch May 22, 2018 04:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants