-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
spanconfig,kv: merge adjacent ranges with identical configs #79700
spanconfig,kv: merge adjacent ranges with identical configs #79700
Conversation
8d56409
to
0b02b96
Compare
This should be ready for review. There are some random out-of-package tests that need to be updated since they expect splits post-table creation (they were added after we introduced tenant range splits), and I also want to write a small benchmark for the read path. Like discussed offline, plan for this PR is to let it bake on master for a few days before backporting to 22.1, aiming for the .1 release. I also think this PR makes #78299 redundant (LMK if you disagree) -- so I'm just going to close that as well. |
6209b5f
to
dcce1db
Compare
bda1c2c
to
62ad12e
Compare
Done and done. |
62ad12e
to
28fde02
Compare
Would it be possible to backport this to 22.1? |
Yup.
For single-tenant clusters it won't be enabled by default; there'll be an opt-in cluster setting. |
Awesome! Thanks. |
@ajwerner, @nvanbenschoten and @arulajmani: Bump for reviews. Like discussed elsewhere, this PR should get baking time to build confidence. We're also aiming for this to be part of 22.1.1 when rolling out to multi-tenant clusters. |
Build succeeded: |
@irfansharif Thank you for your time and attention on this matter. I am chiming in just to confirm the timeline here as a quick sanity check. Again, I know you just said that this would be backported to v22.1.1 and you just said:
However just re-confirming, apologies for the redundancy. My understanding is that this fix, in the form of a new hidden config option, will be released as part of v22.1.1 that is scheduled for public release June 6, 2022. Please confirm or let me know if I am misunderstanding. Thank you |
The backport (#80811) was merged, so it'll make it to 22.1.1 whenever that goes out. That said, I don't know which customer you're referring to (DM?) and whether this would help. This PR wasn't intended as a fix for anything, so let's discuss elsewhere what we're expecting this to help with. |
and range count for a given span. The previous implementation relied on the assumption that 1 range can contain at most 1 table/index. Since cockroachdbGH-79700, this assumption is no longer true. This re-implementation has three components. The first is spanstats.Accessor, which provides an interface for accessing span stats from KV. The second is spanstatsaccessor.LocalAccessor, which provides an implementation of the interface for callers that are co-located on a KV node. The third is an implementation by kvtenantccl.Connector, which provides an implementation of the interface for non co-located callers, like SQL pods. Part of: https://cockroachlabs.atlassian.net/browse/CRDB-22711 Release note (backward-incompatible change): The SpanStatsRequest message field 'node_id' has changed from type 'string' to type 'int32'.
This commit provides a re-implementation of fetching the MVCC stats and range count for a given span. The previous implementation relied on the assumption that 1 range can contain at most 1 table/index. Since cockroachdbGH-79700, this assumption is no longer true. This re-implementation has three components. The first is spanstats.Accessor, which provides an interface for accessing span stats from KV. The second is spanstatsaccessor.LocalAccessor, which provides an implementation of the interface for callers that are co-located on a KV node. The third is an implementation by kvtenantccl.Connector, which provides an implementation of the interface for non co-located callers, like SQL pods. The interaction between these components is illustrated below: System Tenant +----------------------------------------------------+ | | | | | KV Node fan-out | | +----------+ | | | | | | +-------------------------------v----+ | | | | | | | | | serverpb.InternalSpanStatsServer +-----+ | | | | | | +----------------+-------------------+ | | | | | roachpb.InternalSpanStatsResponse | | | | | | | | v | | +----------------------------------+ | | | | | | | spanstatsaccessor.LocalAccessor | | | | | | | +-----------------+----------------+ | | | | | +---------------------+ | | roachpb.InternalSpanStatsResponse | | | | | | | v v | | +-------------------------+ +-----------+ | | | | | | | | | serverpb.StatusServer | | SQLServer | | | | | | | | | +------------+------------+ +-----------+ | | | | | | | +----------------------+-----------------------------+ | | serverpb.SpanStatsResponse | | | | | Secondary Tenant | +----------------------+------------------------------+ | | | | +------------v-------------+ | | | | | | | kvtenantccl.Connector | | | | | | | +------------+-------------+ | | | | | | | | +--------------------+ | | roachpb.InternalSpanStatsResponse| | | | | | | | | | | +-----------------v-------+ +------v------+ | | | | | | | | | serverpb.StatusServer | | SQLServer | | | | | | | | | +-------------------------+ +-------------+ | | | | | | | +-----------------------------------------------------+ Resolves cockroachdb#84105 Part of: https://cockroachlabs.atlassian.net/browse/CRDB-22711 Release note (backward-incompatible change): The SpanStatsRequest message field 'node_id' has changed from type 'string' to type 'int32'.
This commit provides a re-implementation of fetching the MVCC stats and range count for a given span. The previous implementation relied on the assumption that 1 range can contain at most 1 table/index. Since cockroachdbGH-79700, this assumption is no longer true. This re-implementation allows secondary tenants to access span stats via the serverpb.TenantStatusServer interface. If a roachpb.SpanStatsRequest has a node_id value of 0, instead of a specific node id, the response will yield cluster-wide values. To achieve this, the server does a fan-out to nodes that are known to have replicas for the span requested. The interaction between tenants is illustrated below: System Tenant +----------------------------------------------------+ | | | | | KV Node fan-out | | +----------+ | | | | | | +-------------------------------v----+ | | | | | | | | | server.systemStatusServer +-----+ | | | | | | +----------------+-------------------+ | | | | | | | | | via TenantStatusServer | | +--------------+ | | | | | | | | | | | v | | | +-----------+ | | | | | | | | | SQLServer | | | | | | | | | +-----------+ | | | | +----------------------v-----------------------------+ | | | | roachpb.SpanStatsResponse | | Secondary Tenant | +----------------------+------------------------------+ | | | | +------------v-------------+ | | | | | | | | kvtenantccl.Connector | | | | | | | +------------+-------------+ | | | | | | via TenantStatusServer | | +--------------------+ | | | | | | | | | | | | | | +-----------------v-------+ +------v------+ | | | | | | | | | server.statusServer | | SQLServer | | | | | | | | | +-------------------------+ +-------------+ | | | | | | | +-----------------------------------------------------+ Resolves cockroachdb#84105 Part of: https://cockroachlabs.atlassian.net/browse/CRDB-22711 Release note (backward-incompatible change): The SpanStatsRequest message field 'node_id' has changed from type 'string' to type 'int32'. refactor away from accessor interface remove print
This commit provides a re-implementation of fetching the MVCC stats and range count for a given span. The previous implementation relied on the assumption that 1 range can contain at most 1 table/index. Since cockroachdbGH-79700, this assumption is no longer true. This re-implementation allows secondary tenants to access span stats via the serverpb.TenantStatusServer interface. If a roachpb.SpanStatsRequest has a node_id value of 0, instead of a specific node id, the response will yield cluster-wide values. To achieve this, the server does a fan-out to nodes that are known to have replicas for the span requested. The interaction between tenants is illustrated below: System Tenant +----------------------------------------------------+ | | | | | KV Node fan-out | | +----------+ | | | | | | +-------------------------------v----+ | | | | | | | | | server.systemStatusServer +-----+ | | | | | | +----------------+-------------------+ | | | | | | | | | via TenantStatusServer | | +--------------+ | | | | | | | | | | | v | | | +-----------+ | | | | | | | | | SQLServer | | | | | | | | | +-----------+ | | | | +----------------------v-----------------------------+ | | | | roachpb.SpanStatsResponse | | Secondary Tenant | +----------------------+------------------------------+ | | | | +------------v-------------+ | | | | | | | | kvtenantccl.Connector | | | | | | | +------------+-------------+ | | | | | | via TenantStatusServer | | +--------------------+ | | | | | | | | | | | | | | +-----------------v-------+ +------v------+ | | | | | | | | | server.statusServer | | SQLServer | | | | | | | | | +-------------------------+ +-------------+ | | | | | | | +-----------------------------------------------------+ Resolves cockroachdb#84105 Part of: https://cockroachlabs.atlassian.net/browse/CRDB-22711 Release note (backward-incompatible change): The SpanStatsRequest message field 'node_id' has changed from type 'string' to type 'int32'. refactor away from accessor interface remove print
This commit provides a re-implementation of fetching the MVCC stats and range count for a given span. The previous implementation relied on the assumption that 1 range can contain at most 1 table/index. Since cockroachdbGH-79700, this assumption is no longer true. This re-implementation allows secondary tenants to access span stats via the serverpb.TenantStatusServer interface. If a roachpb.SpanStatsRequest has a node_id value of 0, instead of a specific node id, the response will yield cluster-wide values. To achieve this, the server does a fan-out to nodes that are known to have replicas for the span requested. The interaction between tenants is illustrated below: System Tenant +----------------------------------------------------+ | | | | | KV Node fan-out | | +----------+ | | | | | | +-------------------------------v----+ | | | | | | | | | server.systemStatusServer +-----+ | | | | | | +----------------+-------------------+ | | | | | | | | | via TenantStatusServer | | +--------------+ | | | | | | | | | | | v | | | +-----------+ | | | | | | | | | SQLServer | | | | | | | | | +-----------+ | | | | +----------------------v-----------------------------+ | | | | roachpb.SpanStatsResponse | | Secondary Tenant | +----------------------+------------------------------+ | | | | +------------v-------------+ | | | | | | | | kvtenantccl.Connector | | | | | | | +------------+-------------+ | | | | | | via TenantStatusServer | | +--------------------+ | | | | | | | | | | | | | | +-----------------v-------+ +------v------+ | | | | | | | | | server.statusServer | | SQLServer | | | | | | | | | +-------------------------+ +-------------+ | | | | | | | +-----------------------------------------------------+ Resolves cockroachdb#84105 Part of: https://cockroachlabs.atlassian.net/browse/CRDB-22711 Release note (backward-incompatible change): The SpanStatsRequest message field 'node_id' has changed from type 'string' to type 'int32'.
This commit provides a re-implementation of fetching the MVCC stats and range count for a given span. The previous implementation relied on the assumption that 1 range can contain at most 1 table/index. Since cockroachdbGH-79700, this assumption is no longer true. This re-implementation allows secondary tenants to access span stats via the serverpb.TenantStatusServer interface. If a roachpb.SpanStatsRequest has a node_id value of "0", instead of a specific node id, the response will yield cluster-wide values. To achieve this, the server does a fan-out to nodes that are known to have replicas for the span requested. The interaction between tenants is illustrated below: System Tenant +----------------------------------------------------+ | | | | | KV Node fan-out | | +----------+ | | | | | | +-------------------------------v----+ | | | | | | | | | server.systemStatusServer +-----+ | | | | | | +----------------+-------------------+ | | | | | | | | | via TenantStatusServer | | +--------------+ | | | | | | | | | | | v | | | +-----------+ | | | | | | | | | SQLServer | | | | | | | | | +-----------+ | | | | +----------------------v-----------------------------+ | | | | roachpb.SpanStatsResponse | | Secondary Tenant | +----------------------+------------------------------+ | | | | +------------v-------------+ | | | | | | | | kvtenantccl.Connector | | | | | | | +------------+-------------+ | | | | | | via TenantStatusServer | | +--------------------+ | | | | | | | | | | | | | | +-----------------v-------+ +------v------+ | | | | | | | | | server.statusServer | | SQLServer | | | | | | | | | +-------------------------+ +-------------+ | | | | | | | +-----------------------------------------------------+ Resolves cockroachdb#84105 Part of: https://cockroachlabs.atlassian.net/browse/CRDB-22711 Release note: None
96223: kvserver, server: new implementation of SpanStats suitable for use with coalesced ranges r=zachlite a=zachlite This commit provides a re-implementation of fetching the MVCC stats and range count for a given span. The previous implementation relied on the assumption that 1 range can contain at most 1 table/index. Since GH-79700, this assumption is no longer true. This re-implementation allows secondary tenants to access span stats via the serverpb.TenantStatusServer interface. If a roachpb.SpanStatsRequest has a node_id value of 0, instead of a specific node id, the response will yield cluster-wide values. To achieve this, the server does a fan-out to nodes that are known to have replicas for the span requested. The interaction between tenants is illustrated below: ``` System Tenant ┌────────────────────────────────────────────────────┐ │ │ │ │ │ KV Node fan-out │ │ ┌──────────┐ │ │ │ │ │ │ ┌───────────────────────────────▼────┐ │ │ │ │ │ │ │ │ │ server.systemStatusServer ├─────┘ │ │ │ │ │ │ └────────────────┬───────────────────┘ │ │ │ │ │ │ │ │ │ via TenantStatusServer │ │ ├──────────────┐ │ │ │ │ │ │ │ │ │ │ │ ▼ │ │ │ ┌───────────┐ │ │ │ │ │ │ │ │ │ SQLServer │ │ │ │ │ │ │ │ │ └───────────┘ │ │ │ │ └──────────────────────▼─────────────────────────────┘ │ │ │ │ roachpb.SpanStatsResponse │ │ Secondary Tenant │ ┌──────────────────────┼──────────────────────────────┐ │ │ │ │ ┌────────────▼─────────────┐ │ │ │ │ │ │ │ │ kvtenantccl.Connector │ │ │ │ │ │ │ └────────────┬─────────────┘ │ │ │ │ │ │ via TenantStatusServer │ │ ├────────────────────┐ │ │ │ │ │ │ │ │ │ │ │ │ │ │ ┌─────────────────▼───────┐ ┌──────▼──────┐ │ │ │ │ │ │ │ │ │ server.statusServer │ │ SQLServer │ │ │ │ │ │ │ │ │ └─────────────────────────┘ └─────────────┘ │ │ │ │ │ │ │ └─────────────────────────────────────────────────────┘ ``` Part of: https://cockroachlabs.atlassian.net/browse/CRDB-22711 Release note: None 97082: ui: add latency info to stmt pages r=maryliag a=maryliag Part Of: #72954 <img width="1373" alt="Screenshot 2023-02-13 at 5 53 57 PM" src="https://user-images.githubusercontent.com/1017486/218592825-922eed84-2559-4902-b291-852958a59ed6.png"> Add p50, p90, p99, max and min latency to Statement table on SQL Activity page. Release note (ui change): Add columns p50, p90, p99, max and min latency for Statement table on SQL Activity page. Co-authored-by: Zach Lite <zach@cockroachlabs.com> Co-authored-by: maryliag <marylia@cockroachlabs.com>
This is a stop-gap commit that enables the DataDistribution endpoint to handle the parts of the key space belonging to secondary tenants without error. Despite no error, the result returned for secondary tenants is not correct. The DataDistribution endpoint was written before cockroachdb#79700, and therefore doesn't know that multiple tables can exist within a range. Additionally, the results for the system tenant will be incorrect soon because cockroachdb#81008 is in progress. Fixes: cockroachdb#97993 Release note: None
98689: workload: jitter the teardown of connections to prevent thundering herd r=sean- a=sean- workload: jitter the teardown of connections to prevent thundering herd This change upgrades workload's use of pgx from v4 to v5 in order to allow jittering the teardown of connections. This change sets a max connection age of 5min and jitters the teardown by 30s. Upgrading to pgx v5 also adds non-blocking pgxpool connection acquisition. workload: add flags to manage the age and lifecycle of connection pool Add flags to all workload types to specify: * the max connection age: `--max-conn-lifetime duration` * the max connection age jitter: `--max-conn-lifetime-jitter duration` * the max connection idle time: `--max-conn-idle-time duration` * the connection health check interval: `--conn-healthcheck-period duration` * the min number of connections in the pool: `--min-conns int` workload: add support for remaining pgx query modes Add support for `pgx.QueryExecModeCacheDescribe` and `pgx.QueryExecModeDescribeExec`. Previously, only three of the five query modes were available. workload: fix race condition when recording histogram data Release note (cli change): workload jitters teardown of connections to prevent thundering herd impacting P99 latency results. Release note (cli change): workload utility now has flags to tune the connection pool used for testing. See `--conn-healthcheck-period`, `--min-conns`, and the `--max-conn-*` flags for details. Release note (cli change): workload now supports every [PostgreSQL query mode](https://github.com/jackc/pgx/blob/fa5fbed497bc75acee05c1667a8760ce0d634cba/conn.go#L167-L182) available via the underlying pgx driver. 99142: server: fix `DataDistribution` server error when creating a tenant r=zachlite a=zachlite This is a stop-gap commit that enables the DataDistribution endpoint to handle the parts of the key space belonging to secondary tenants without error. Despite no error, the result returned for secondary tenants is not correct. The DataDistribution endpoint was written before #79700, and therefore doesn't know that multiple tables can exist within a range. Additionally, the results for the system tenant will be incorrect soon because #81008 is in progress. Improvements are tracked by #97942 Fixes: #97993 Release note: None 99494: changefeedccl: Do not require rangefeed when running initial scan only. r=miretskiy a=miretskiy Fixes #99470 Release note: None Co-authored-by: Sean Chittenden <sean@chittenden.org> Co-authored-by: zachlite <zachlite@gmail.com> Co-authored-by: Yevgeniy Miretskiy <yevgeniy@cockroachlabs.com>
This is a stop-gap commit that enables the DataDistribution endpoint to handle the parts of the key space belonging to secondary tenants without error. Despite no error, the result returned for secondary tenants is not correct. The DataDistribution endpoint was written before #79700, and therefore doesn't know that multiple tables can exist within a range. Additionally, the results for the system tenant will be incorrect soon because #81008 is in progress. Fixes: #97993 Release note: None
spanconfig,kv: merge adjacent ranges with identical configs
Fixes #72389.
Fixes #66063 (gated behind a cluster setting).
This should drastically reduce the total number of ranges in the system,
especially when running with a large number of tables and/or tenants. To
understand what the new set of split points are, consider the following
test snippet:
This PR introduces two cluster settings to selectively opt into this
optimization: spanconfig.{tenant,host}_coalesce_adjacent.enabled
(defaulting to true and false respectively). We also don't coalesce
system table ranges on the host tenant. We had a few implementation
choices here:
(a) Down in KV, augment the spanconfig.Store to coalesce the in-memory
state when updating entries. On every update, we'd check if the span
we're writing to has the same config as the preceding and/or
succeeding one, and if so, write out a larger span instead. We
previously prototyped a form of this in #68491.
(b) Same as (a) but coalesce configs up in the spanconfig.Store
maintained in reconciler itself.
(c) Same as (a) but through another API on the spanconfig.Store
interface that accepts only a single update at a time and does not
generate deltas (not something we need down in KV). Removes the
implementation complexity.
(d) Keep the contents of
system.span_configurations
and the in-memorystate of spanconfig.Stores as it is today, uncoalesced. When
determining split points, iterate through adjacent configs within
the provided key bounds and see whether we could ignore certain
split keys.
This PR implements option (d). For a benchmark on how slow (d) is going
to be in practice with varying numbers of entries to be scanning over
(10k, 100k, 1m):
It's feasible that in the future we re-work this in favor of (c).
spanconfig/store: use templatized btree impl instead
spanconfig/store: intern span configs
Release note: None
Release justification: high benefit change to existing functionality
(affecting only multi-tenant clusters).