Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

storage: GC system.rangelog #21260

Closed
bdarnell opened this issue Jan 5, 2018 · 6 comments
Closed

storage: GC system.rangelog #21260

bdarnell opened this issue Jan 5, 2018 · 6 comments
Assignees
Labels
A-sql-execution Relating to SQL execution. C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) O-community Originated from the community

Comments

@bdarnell
Copy link
Contributor

bdarnell commented Jan 5, 2018

We currently allow the system.rangelog table to grow without bound, on the assumption that its rate of growth will be negligible, but this isn't true. On long-lived clusters of many nodes, the rangelog table can grow to a significant fraction of the total database size. We need some sort of garbage collection here (and we should also take the opportunity to evaluate the usefulness of this log and whether we could be writing less data to it).

@bdarnell bdarnell added the O-community Originated from the community label Jan 5, 2018
a-robinson added a commit to a-robinson/cockroach that referenced this issue Jan 8, 2018
This shrinks the size of the info field in rangelog entries.

gogoproto automatically adds the omitempty json tag for all proto3
fields (that aren't explicitly marked as nullable with a gogoproto tag),
but it can't do so if the user specifies a custom jsontag. I don't know
for sure why we initially added these, but now I can't remove them
without messing up backwards compatibility with the old json field
names, so just add the necessary omitempty annotations.

Helps with range log size as related to cockroachdb#21260.

Release note (sql change): Reduced size of entries stored in the
system.rangelog table by not storing empty JSON fields.
@a-robinson
Copy link
Contributor

@BramGruneir, is there any legacy reason that we need to keep printing the RangeLogEvent_Info protos the way that we do? I don't see any existing readers of it other than tests. It would break existing readers, but we could save a ton of space if we printed them in a more compact format.

I'm separately going to shrink down the allocator details, because I'm most familiar with what's safe to change there, but there'll still be a bunch of space used in the first chunk of the info column. For example, this:

"UpdatedDesc":{"range_id":32,"start_key":"uw==","end_key":"u4mAoBdEMtxE94Q=","replicas":[{"node_id":1,"store_id":1,"replica_id":1},{"node_id":5,"store_id":5,"replica_id":2},{"node_id":2,"store_id":2,"replica_id":3},{"node_id":6,"store_id":6,"replica_id":4}],"next_replica_id":5},"AddReplica":{"node_id":6,"store_id":6,"replica_id":4}

Could just as easily be printed using the more compact RangeDescriptor.String() method, making it:

"UpdatedDesc":"r32:/Table/51{-/1/-6910…} [(n1,s1):1, (n5,s5):2, (n2,s2):3, (n6,s6):4,
 next=5]","AddReplica":"(n6,s6):4"

a-robinson added a commit to a-robinson/cockroach that referenced this issue Jan 11, 2018
There's a whole bunch of stuff that we typically don't need to print.
Omit all such fields when we can.

Up-replicate details go from:
"Details":"{\"Target\":\"s6, valid:true, constraint:0.00, converges:0, balance:0.00(ranges=0, bytes=0.00, writes=0.00), rangeCount:10, logicalBytes:1.1 KiB, writesPerSecond:1.08, details:(diversity=0.00, preferred=0)\",\"RangeBytes\":204200,\"RangeWritesPerSecond\":321.0712073249542}"

to:
"Details":"{\"Target\":\"s4, converges:0, balance:1, rangeCount:10\"}"}

Down-replicate details go from:
"Details":"{\"Target\":\"s1, valid:true, constraint:0.00, converges:0, balance:-1.00(ranges=-1, bytes=0.00, writes=0.00), rangeCount:15, logicalBytes:257 MiB, writesPerSecond:1205.47, details:(diversity=0.00, preferred=0)\",\"RangeBytes\":33938441,\"RangeWritesPerSecond\":150.7858082120157}"

to:
"Details":"{\"Target\":\"s2, converges:1, balance:0, rangeCount:20\"}"

Rebalance details go from:
"Details":"{\"Target\":\"s6, valid:true, constraint:0.00, converges:1, balance:1.00(ranges=1, bytes=0.00, writes=0.00), rangeCount:12, logicalBytes:23 MiB, writesPerSecond:254.01, details:(diversity=0.00, preferred=0)\",\"Existing\":\"[\\ns5, valid:true, constraint:0.00, converges:0, balance:0.00(ranges=0, bytes=0.00, writes=0.00), rangeCount:15, logicalBytes:256 MiB, writesPerSecond:1207.65, details:(diversity=0.00, preferred=0)\\ns1, valid:true, constraint:0.00, converges:0, balance:-1.00(ranges=-1, bytes=0.00, writes=0.00), rangeCount:16, logicalBytes:280 MiB, writesPerSecond:1458.49, details:(diversity=0.00, preferred=0)\\ns2, valid:true, constraint:0.00, converges:0, balance:-1.00(ranges=-1, bytes=0.00, writes=0.00), rangeCount:16, logicalBytes:280 MiB, writesPerSecond:1459.74, details:(diversity=0.00, preferred=0)]\",\"RangeBytes\":33629514,\"RangeWritesPerSecond\":150.8107606862368}"

To:
"Details":"{\"Target\":\"s2, converges:1, balance:1, rangeCount:14\",\"Existing\":\"[\\ns4, converges:1, balance:1, rangeCount:13\\ns3, converges:1, balance:1, rangeCount:13\\ns1, converges:0, balance:-1, rangeCount:20]\"}"}

Touches cockroachdb#21260, but there's still more that can likely be shrunk down,
as described on the issue.

Release note (sql change): Reduce size of system.rangelog entries to
save disk space.
@a-robinson
Copy link
Contributor

I looked into the above questions and found that the RangeLog admin endpoint relies on being able to parse the aforementioned fields, so we'd have to only start writing the more compact versions after all nodes have updated to v2.0 or later, and we'd have to still support parsing the old version. So it'd be a bit of work, perhaps more than it's worth. There aren't any other dependencies in our code base, though, so the only question mark is whether anyone externally is parsing these entries, which seems unlikely

@BramGruneir
Copy link
Member

I think the best of way doing this would be with a proper migration. It would be great to consider normalizing the table a bit, or changing the string columns to json and adding an index or two.

For now, it might be a lot easier to just delete all tail entires after some limit to stop the table from getting too big. Even better would be to limit the number of entries per range, but without an index, i worry about how slow that transaction might be.

a-robinson added a commit to a-robinson/cockroach that referenced this issue Jan 23, 2018
There's a whole bunch of stuff that we typically don't need to print.
Omit all such fields when we can.

Up-replicate details go from:
"Details":"{\"Target\":\"s6, valid:true, constraint:0.00, converges:0, balance:0.00(ranges=0, bytes=0.00, writes=0.00), rangeCount:10, logicalBytes:1.1 KiB, writesPerSecond:1.08, details:(diversity=0.00, preferred=0)\",\"RangeBytes\":204200,\"RangeWritesPerSecond\":321.0712073249542}"

to:
"Details":"{\"Target\":\"s4, converges:0, balance:1, rangeCount:10\"}"}

Down-replicate details go from:
"Details":"{\"Target\":\"s1, valid:true, constraint:0.00, converges:0, balance:-1.00(ranges=-1, bytes=0.00, writes=0.00), rangeCount:15, logicalBytes:257 MiB, writesPerSecond:1205.47, details:(diversity=0.00, preferred=0)\",\"RangeBytes\":33938441,\"RangeWritesPerSecond\":150.7858082120157}"

to:
"Details":"{\"Target\":\"s2, converges:1, balance:0, rangeCount:20\"}"

Rebalance details go from:
"Details":"{\"Target\":\"s6, valid:true, constraint:0.00, converges:1, balance:1.00(ranges=1, bytes=0.00, writes=0.00), rangeCount:12, logicalBytes:23 MiB, writesPerSecond:254.01, details:(diversity=0.00, preferred=0)\",\"Existing\":\"[\\ns5, valid:true, constraint:0.00, converges:0, balance:0.00(ranges=0, bytes=0.00, writes=0.00), rangeCount:15, logicalBytes:256 MiB, writesPerSecond:1207.65, details:(diversity=0.00, preferred=0)\\ns1, valid:true, constraint:0.00, converges:0, balance:-1.00(ranges=-1, bytes=0.00, writes=0.00), rangeCount:16, logicalBytes:280 MiB, writesPerSecond:1458.49, details:(diversity=0.00, preferred=0)\\ns2, valid:true, constraint:0.00, converges:0, balance:-1.00(ranges=-1, bytes=0.00, writes=0.00), rangeCount:16, logicalBytes:280 MiB, writesPerSecond:1459.74, details:(diversity=0.00, preferred=0)]\",\"RangeBytes\":33629514,\"RangeWritesPerSecond\":150.8107606862368}"

To:
"Details":"{\"Target\":\"s2, converges:1, balance:1, rangeCount:14\",\"Existing\":\"[\\ns4, converges:1, balance:1, rangeCount:13\\ns3, converges:1, balance:1, rangeCount:13\\ns1, converges:0, balance:-1, rangeCount:20]\"}"}

Touches cockroachdb#21260, but there's still more that can likely be shrunk down,
as described on the issue.

Release note (sql change): Reduce size of system.rangelog entries to
save disk space.
a-robinson added a commit to a-robinson/cockroach that referenced this issue Jan 23, 2018
There's a whole bunch of stuff that we typically don't need to print.
Omit all such fields when we can.

Up-replicate details go from:
"Details":"{\"Target\":\"s6, valid:true, constraint:0.00, converges:0, balance:0.00(ranges=0, bytes=0.00, writes=0.00), rangeCount:10, logicalBytes:1.1 KiB, writesPerSecond:1.08, details:(diversity=0.00, preferred=0)\",\"RangeBytes\":204200,\"RangeWritesPerSecond\":321.0712073249542}"

to:
"Details":"{\"Target\":\"s4, converges:0, balance:1, rangeCount:10\"}"}

Down-replicate details go from:
"Details":"{\"Target\":\"s1, valid:true, constraint:0.00, converges:0, balance:-1.00(ranges=-1, bytes=0.00, writes=0.00), rangeCount:15, logicalBytes:257 MiB, writesPerSecond:1205.47, details:(diversity=0.00, preferred=0)\",\"RangeBytes\":33938441,\"RangeWritesPerSecond\":150.7858082120157}"

to:
"Details":"{\"Target\":\"s2, converges:1, balance:0, rangeCount:20\"}"

Rebalance details go from:
"Details":"{\"Target\":\"s6, valid:true, constraint:0.00, converges:1, balance:1.00(ranges=1, bytes=0.00, writes=0.00), rangeCount:12, logicalBytes:23 MiB, writesPerSecond:254.01, details:(diversity=0.00, preferred=0)\",\"Existing\":\"[\\ns5, valid:true, constraint:0.00, converges:0, balance:0.00(ranges=0, bytes=0.00, writes=0.00), rangeCount:15, logicalBytes:256 MiB, writesPerSecond:1207.65, details:(diversity=0.00, preferred=0)\\ns1, valid:true, constraint:0.00, converges:0, balance:-1.00(ranges=-1, bytes=0.00, writes=0.00), rangeCount:16, logicalBytes:280 MiB, writesPerSecond:1458.49, details:(diversity=0.00, preferred=0)\\ns2, valid:true, constraint:0.00, converges:0, balance:-1.00(ranges=-1, bytes=0.00, writes=0.00), rangeCount:16, logicalBytes:280 MiB, writesPerSecond:1459.74, details:(diversity=0.00, preferred=0)]\",\"RangeBytes\":33629514,\"RangeWritesPerSecond\":150.8107606862368}"

To:
"Details":"{\"Target\":\"s2, converges:1, balance:1, rangeCount:14\",\"Existing\":\"[\\ns4, converges:1, balance:1, rangeCount:13\\ns3, converges:1, balance:1, rangeCount:13\\ns1, converges:0, balance:-1, rangeCount:20]\"}"}

Touches cockroachdb#21260, but there's still more that can likely be shrunk down,
as described on the issue.

Release note (sql change): Reduce size of system.rangelog entries to
save disk space.
@a-robinson a-robinson added this to the 2.1 milestone Feb 26, 2018
@bdarnell bdarnell added the C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) label Apr 26, 2018
@kaavee315
Copy link
Contributor

Is someone working on it actively? If not, we were planning to take this up. I have some ideas and questions regarding the same:-

  1. I see the various implementations of replicaQueues like GCQueue, raftLogQueue etc. which scan all the replicas and process the required parts. We can implement this as a similar rangeLog queue as part of the store. I think the issue over here is that this framework is at KV layer level and not at SQL level, so reading the data and finding what to be deleted might be a trouble.
  2. Is there a similar framework of processing data periodically at SQL layer level? I couldn't find any.
  3. What should be the criteria to decide if we want to delete a particular entry? Should it just be a limit per rangeId?

@bdarnell
Copy link
Contributor Author

No one is currently working on this.

I don't think the replica queues are a great fit for this because they work at the wrong level. I think we'd need something new at the SQL level to handle the GC of this data. Ideally it would be flexible enough to handle TTLs for any table instead of something specific to system.rangelog. This is a commonly-requested feature (#20239).

What should be the criteria to decide if we want to delete a particular entry? Should it just be a limit per rangeId?

Ranges are an implementation detail and should not be used in deciding when to GC a row. Either make it based on the timestamp in the row or the total number of rows in the table. (I'd generally prefer to make it time-based).

@tbg tbg added the A-kv-client Relating to the KV client and the KV interface. label May 29, 2018
@tbg
Copy link
Member

tbg commented May 29, 2018

I don't really see this in the CF&S area until we add row-level TTLs (#20239). The more likely option right now is some periodic job that deletes from the table (via SQL). @jordanlewis feel free to move this elsewhere. I'm leaving in CF&S mostly because I don't know where to put it.

@tbg tbg added A-sql-execution Relating to SQL execution. and removed A-kv-client Relating to the KV client and the KV interface. labels Jul 19, 2018
@petermattis petermattis modified the milestones: 2.1, 2.2 Sep 25, 2018
mvijaykarthik pushed a commit to mvijaykarthik/cockroach that referenced this issue Oct 3, 2018
system.rangelog table currently grows unboundedly. The rate of
growth is slow (as long as there is no replica rebalancing
thrashing), but it can still become a problem in long running
clusters.

This commit adds cluster settings to specify interval and TTL
for rows in system.rangelog.
By default, GC of system.rangelog is disabled.

Fixes cockroachdb#21260

Release note: Add configuration to enable GC of system.rangelog
mvijaykarthik pushed a commit to mvijaykarthik/cockroach that referenced this issue Oct 4, 2018
system.rangelog table currently grows unboundedly. The rate of
growth is slow (as long as there is no replica rebalancing
thrashing), but it can still become a problem in long running
clusters.

This commit adds cluster settings to specify interval and TTL
for rows in system.rangelog.
By default, TTL of system.rangelog is set to 30 days.

Fixes cockroachdb#21260

Release note: Add configuration to enable GC of system.rangelog
@petermattis petermattis removed this from the 2.2 milestone Oct 5, 2018
mvijaykarthik pushed a commit to mvijaykarthik/cockroach that referenced this issue Oct 9, 2018
system.rangelog table currently grows unboundedly. The rate of
growth is slow (as long as there is no replica rebalancing
thrashing), but it can still become a problem in long running
clusters.

This commit adds cluster settings to specify interval and TTL
for rows in system.rangelog.
By default, TTL of system.rangelog is set to 30 days.

Fixes cockroachdb#21260

Release note: Add configuration to enable GC of system.rangelog
mvijaykarthik pushed a commit to mvijaykarthik/cockroach that referenced this issue Oct 10, 2018
system.rangelog table currently grows unboundedly. The rate of
growth is slow (as long as there is no replica rebalancing
thrashing), but it can still become a problem in long running
clusters.

This commit adds cluster settings to specify interval and TTL
for rows in system.rangelog.
By default, TTL of system.rangelog is set to 30 days.

Fixes cockroachdb#21260

Release note: Add configuration to enable GC of system.rangelog
tbg pushed a commit to mvijaykarthik/cockroach that referenced this issue Oct 11, 2018
system.rangelog table currently grows unboundedly. The rate of
growth is slow (as long as there is no replica rebalancing
thrashing), but it can still become a problem in long running
clusters.

This commit adds cluster settings to specify interval and TTL
for rows in system.rangelog.
By default, TTL of system.rangelog is set to 30 days, and that
for system.eventlog to 90 days.

Fixes cockroachdb#21260.

Release note (sql change): the range log and system events logs will
automatically purge records older than 30 and 90 days, respectively.
This can be adjusted via the server.rangelog.ttl and server.eventlog.ttl
cluster settings.
tbg added a commit to mvijaykarthik/cockroach that referenced this issue Oct 11, 2018
system.rangelog table currently grows unboundedly. The rate of
growth is slow (as long as there is no replica rebalancing
thrashing), but it can still become a problem in long running
clusters.

This commit adds cluster settings to specify interval and TTL
for rows in system.rangelog.
By default, TTL of system.rangelog is set to 30 days, and that
for system.eventlog to 90 days.

Fixes cockroachdb#21260.

Release note (sql change): the range log and system events logs will
automatically purge records older than 30 and 90 days, respectively.
This can be adjusted via the server.rangelog.ttl and server.eventlog.ttl
cluster settings.
craig bot pushed a commit that referenced this issue Oct 11, 2018
30913: server: add a configuration to enable GC of system.rangelog r=tschottdorf a=mvijaykarthik

system.rangelog table currently grows unboundedly. The rate of
growth is slow (as long as there is no replica rebalancing
thrashing), but it can still become a problem in long running
clusters.

This commit adds cluster settings to specify interval and TTL
for rows in system.rangelog.
By default, TTL of system.rangelog is set to 30 days.

Fixes #21260

Release note: Add configuration to enable GC of system.rangelog

31239: sql: attempt to deflake distsql physical planner tests r=tschottdorf a=jordanlewis

Make sure the range cache is populated before verifying things about it.

This seems like a hack, but otherwise I think these will just keep flaking.

Closes #25808.
Closes #31235.

Release note: None

Co-authored-by: Tobias Schottdorf <tobias.schottdorf@gmail.com>
Co-authored-by: Jordan Lewis <jordanthelewis@gmail.com>
@craig craig bot closed this as completed in #30913 Oct 11, 2018
tbg added a commit to tbg/cockroach that referenced this issue Oct 17, 2018
system.rangelog table currently grows unboundedly. The rate of
growth is slow (as long as there is no replica rebalancing
thrashing), but it can still become a problem in long running
clusters.

This commit adds cluster settings to specify interval and TTL
for rows in system.rangelog.
By default, TTL of system.rangelog is set to 30 days, and that
for system.eventlog to 90 days.

Fixes cockroachdb#21260.

Release note (sql change): the range log and system events logs will
automatically purge records older than 30 and 90 days, respectively.
This can be adjusted via the server.rangelog.ttl and server.eventlog.ttl
cluster settings.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-sql-execution Relating to SQL execution. C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) O-community Originated from the community
Projects
None yet
Development

No branches or pull requests

7 participants