Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

storage: Enable Sublevel compactions, update PreIngestDelay and metrics #50371

Merged

Conversation

itsbilal
Copy link
Member

This change enables flush splits when writing to L0, by turning on
the relevant flags in pebble's Options structs. This should
deliver a noticeable improvement in read performance
with lots of L0 SSTables, as well as with import performance.

The calculation of PreIngestDelay has also been subsequently updated
to use the new L0 sublevel count as opposed to the count of files
in L0. Lastly, the reported read amplification metric is now the
L0 Sublevel count + non-L0 level count.

Second commit contains most of this change. The first commit is a simple
move of SSTableInfo and SSTableInfosByLevel to a new file.

Release note (performance improvement): Turn on an improved approach
of organizing sstable files in the Pebble storage engine, which
should significantly improve import preformance.

Given these structs are common across storage engines,
it made little sense to keep these in rocksdb.go.

Release note: None.
@itsbilal itsbilal self-assigned this Jun 17, 2020
@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Collaborator

@petermattis petermattis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exciting!

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @itsbilal, @jbowens, and @petermattis)


pkg/storage/engine.go, line 506 at r2 (raw file):

	PendingCompactionBytesEstimate int64
	L0FileCount                    int64
	L0SublevelCount                int64

It is somewhat odd that L0SublevelCount is zero for RocksDB. I think it would be better to fill this in as -1 for RocksDB to indicate that the value is unknown.


pkg/storage/engine.go, line 739 at r2 (raw file):

func calculatePreIngestDelay(settings *cluster.Settings, stats *Stats) time.Duration {
	maxDelay := ingestDelayTime.Get(&settings.SV)
	l0Filelimit := ingestDelayL0Threshold.Get(&settings.SV)

Nit: s/l0Filelimit/l0ReadAmpLimit/g


pkg/storage/engine.go, line 742 at r2 (raw file):

	const ramp = 10
	l0FileCount := stats.L0FileCount

Nit: s/l0FileCount/l0ReadAmp/g because that is what this variable is tracking.


pkg/storage/pebble.go, line 317 at r2 (raw file):

	}
	opts.Experimental.L0SublevelCompactions = true
	opts.Experimental.FlushSplitBytes = 10 << 20  // 10 MB

Let's add a comment here explaining the motivation behind this number, even if that motivation has weak justification. This will prevent our future selves from assuming some meaning that possibly wasn't present.


pkg/storage/sst_info.go, line 165 at r2 (raw file):

//
// This definition comes from here (minus the sublevel handling, which is a
// Pebble-specific optimization):

This reminds me of #41546. Looping over all of the sstables to reconstitute the level structure Pebble already provides is unfortunate.


pkg/storage/sst_info.go, line 167 at r2 (raw file):

// Pebble-specific optimization):
// https://github.com/facebook/rocksdb/wiki/RocksDB-Tuning-Guide#level-style-compaction
func (s SSTableInfos) ReadAmplification(sublevels int) int {

Nit: s/sublevels/l0Sublevels/g


pkg/storage/sst_info.go, line 178 at r2 (raw file):

				}
				continue
			}

This structure took me a bit to understand. I think it would be clearer to have a separate step that calculates the sublevels for RocksDB. Assuming that we pass sublevels == -1 for RocksDB, this would look something like:

if sublevels == -1 {
  sublevels = 0
  for i := range s {
    if s[i].Level == 0 {
      sublevels++
    }
  }
}

readAmp := sublevels
seenLevel := make(map[int]bool)
for _, t := range s {
  if t.Level > 0 && !seenLevel[t.level] {
    readAmp++
    seenLevel[t.Level] = true
  }
}

pkg/storage/metamorphic/generator.go, line 89 at r2 (raw file):

	rng := rand.New(rand.NewSource(seed))
	opts.BytesPerSync = 1 << rngIntRange(rng, 8, 30)
	opts.Experimental.FlushSplitBytes = 1 << rngIntRange(rng, 8, 24)

This is slightly different than the Pebble metamorphic test. Any reason for the difference?

@itsbilal itsbilal force-pushed the l0-sublevels-preingestdelay-compactions branch from c92dba2 to 4778f63 Compare June 18, 2020 15:10
Copy link
Member Author

@itsbilal itsbilal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTR!

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @jbowens and @petermattis)


pkg/storage/engine.go, line 506 at r2 (raw file):

Previously, petermattis (Peter Mattis) wrote…

It is somewhat odd that L0SublevelCount is zero for RocksDB. I think it would be better to fill this in as -1 for RocksDB to indicate that the value is unknown.

Done.


pkg/storage/pebble.go, line 317 at r2 (raw file):

Previously, petermattis (Peter Mattis) wrote…

Let's add a comment here explaining the motivation behind this number, even if that motivation has weak justification. This will prevent our future selves from assuming some meaning that possibly wasn't present.

Done.


pkg/storage/sst_info.go, line 165 at r2 (raw file):

Previously, petermattis (Peter Mattis) wrote…

This reminds me of #41546. Looping over all of the sstables to reconstitute the level structure Pebble already provides is unfortunate.

Yeah, the looping is pretty unfortunate.


pkg/storage/sst_info.go, line 178 at r2 (raw file):

Previously, petermattis (Peter Mattis) wrote…

This structure took me a bit to understand. I think it would be clearer to have a separate step that calculates the sublevels for RocksDB. Assuming that we pass sublevels == -1 for RocksDB, this would look something like:

if sublevels == -1 {
  sublevels = 0
  for i := range s {
    if s[i].Level == 0 {
      sublevels++
    }
  }
}

readAmp := sublevels
seenLevel := make(map[int]bool)
for _, t := range s {
  if t.Level > 0 && !seenLevel[t.level] {
    readAmp++
    seenLevel[t.Level] = true
  }
}

Done.


pkg/storage/metamorphic/generator.go, line 89 at r2 (raw file):

Previously, petermattis (Peter Mattis) wrote…

This is slightly different than the Pebble metamorphic test. Any reason for the difference?

Nope, wasn't intentional. I copied the same thing here. Thanks for pointing that out.

@itsbilal itsbilal force-pushed the l0-sublevels-preingestdelay-compactions branch from 4778f63 to c06e1bf Compare June 18, 2020 15:11
Copy link
Collaborator

@petermattis petermattis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @itsbilal and @jbowens)


pkg/storage/sst_info.go, line 179 at r3 (raw file):

	seenLevel := make(map[int]bool)
	for _, t := range s {
		if !seenLevel[t.Level] && t.Level > 0 {

This is the nittiest of nits: better to perform the t.Level > 0 check first as that is a quicker check than seenLevel[t.Level]. Alternately, perhaps initialize the map with seenLevel[0] = true.

@itsbilal itsbilal force-pushed the l0-sublevels-preingestdelay-compactions branch from c06e1bf to e389d02 Compare June 18, 2020 16:03
Copy link
Member Author

@itsbilal itsbilal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @jbowens and @petermattis)


pkg/storage/sst_info.go, line 179 at r3 (raw file):

Previously, petermattis (Peter Mattis) wrote…

This is the nittiest of nits: better to perform the t.Level > 0 check first as that is a quicker check than seenLevel[t.Level]. Alternately, perhaps initialize the map with seenLevel[0] = true.

Done.

This change enables flush splits when writing to L0, by turning on
the relevant flags in pebble's Options structs. This should
deliver a noticeable improvement in read performance
with lots of L0 SSTables, as well as with import performance.

The calculation of PreIngestDelay has also been subsequently updated
to use the new L0 sublevel count as opposed to the count of files
in L0. Lastly, the reported read amplification metric is now the
L0 Sublevel count + non-L0 level count.

Release note (performance improvement): Turn on an improved approach
of organizing sstable files in the Pebble storage engine, which
should significantly improve import preformance.
@itsbilal itsbilal force-pushed the l0-sublevels-preingestdelay-compactions branch from e389d02 to 5d5fd3c Compare June 18, 2020 17:27
Copy link
Collaborator

@petermattis petermattis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @jbowens)

@itsbilal
Copy link
Member Author

TFTR!

bors r+

@craig
Copy link
Contributor

craig bot commented Jun 18, 2020

Build succeeded

@craig craig bot merged commit 993ec11 into cockroachdb:master Jun 18, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants