Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sqlstats: include regions in statement_statistics #95449

Merged
merged 1 commit into from
Mar 6, 2023
Merged

sqlstats: include regions in statement_statistics #95449

merged 1 commit into from
Mar 6, 2023

Conversation

matthewtodd
Copy link
Contributor

Part of #89949.

This completes the work abandoned in #92259, properly storing the regions in which a statement was executed, rather than waiting until UI render time to try to map (dangerously ephemeral) node IDs to their respective regions.

With this work in place, we will be able to remove both the UI mapping code (usages of this selector) and the stop-gap work of #93268.

Release note (sql change): A regions field was added to the statistics column of crdb_internal.statement_statistics, reporting the regions of the nodes on which the statement was executed.

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@matthewtodd
Copy link
Contributor Author

Test failure looks like #96149, possibly related to the unified SQL address resolver?

Part of #89949.

This completes the work abandoned in #92259, properly storing the
regions in which a statement was executed, rather than waiting until UI
render time to try to map (dangerously ephemeral) node IDs to their
respective regions.

With this work in place, we will be able to remove both the UI mapping
code (usages of this [selector][]) and the stop-gap work of #93268.

[selector]: https://github.com/cockroachdb/cockroach/blob/0f6333cbd6c78264a59dc435324c0c33d75ea148/pkg/ui/workspaces/cluster-ui/src/store/nodes/nodes.selectors.ts#L52-L64

Release note (sql change): A regions field was added to the statistics
column of crdb_internal.statement_statistics, reporting the regions of
the nodes on which the statement was executed.
@matthewtodd matthewtodd marked this pull request as ready for review March 4, 2023 00:16
@matthewtodd matthewtodd requested review from a team March 4, 2023 00:16
@matthewtodd matthewtodd requested review from a team as code owners March 4, 2023 00:16
Copy link
Contributor

@maryliag maryliag left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just a small request, otherwise :lgtm:

Reviewed 19 of 19 files at r1.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @matthewtodd and @yuzefovich)


-- commits line 12 at r1:
can you create an issue to track this?

Copy link
Contributor

@maryliag maryliag left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @matthewtodd and @yuzefovich)

Copy link
Contributor Author

@matthewtodd matthewtodd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTR!

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @maryliag and @yuzefovich)


-- commits line 12 at r1:

Previously, maryliag (Marylia Gutierrez) wrote…

can you create an issue to track this?

Done! #98056 and #98057. I'll tackle the first one early this week.

@matthewtodd
Copy link
Contributor Author

bors r+

@craig
Copy link
Contributor

craig bot commented Mar 6, 2023

Build succeeded:

@craig craig bot merged commit 353006b into cockroachdb:master Mar 6, 2023
@matthewtodd matthewtodd deleted the sqlstats-regions branch March 6, 2023 16:39
craig bot pushed a commit that referenced this pull request Mar 14, 2023
97138: ui: add error code to stmt and txn insights details pages r=gtr a=gtr

Part of: #87785.

Previously, the stmt and txn insights details pages did not show any
further information for failed executions. This commit adds an "error
code" column to the insights table for a failed execution in the stmt
and txn insights details pages. Additionally, a "status" column was
added to the stmt and txn workload insights tables which is either
"Completed" or "Failed".

Future work involves adding the error message string in addition to the
error code but it needs to be redacted first. Additionally, the txn
status is missing the implementation of a "Cancelled" status.

Note to reviewers: only consider the second commit, as the first is 
required to get the txn status.

- Loom [demo](https://www.loom.com/share/e82b97ff9f034d82b98640170eb54408).

Release note (ui change): Adds error code column to the insights table
for a failed execution in the stmt and txn insights details page. Adds
status column to the stmt and txn workload insights tables.

98410: cluster-ui: tenants use sqlstats-supplied regions r=matthewtodd a=matthewtodd

Fixes #98056.

As of #95449, the SQL Activity pages in the DB Console can draw regions information directly from the sqlstats tables, rather than having to translate node IDs to regions on page load.
    
Here, we make that switch, but for non-system tenants only, because:
    
1. The ephemeral nature of serverless nodes made this view-time mapping especially problematic in that context. (See further notes in #95449.)
    
2. The system-tenant views also include KV node IDs in a special Regions/Nodes column, which we are unable to recreate given the backend storage structure. (Future design work might suggest removing these node IDs altogether, for a unified UI.)

# Screenshots!
## Statements, with and without regions filter
<img width="1372" alt="statements" src="https://user-images.githubusercontent.com/5261/225033247-739df90a-9173-4aab-a666-a61a1ceeb579.png">
<img width="1372" alt="statements - filtered" src="https://user-images.githubusercontent.com/5261/225033271-1c0d0f82-3dd4-48ea-bdef-11f19af97a85.png">

## Statement details
<img width="1372" alt="statement details" src="https://user-images.githubusercontent.com/5261/225033338-6dff4a6e-a4a3-48c6-863a-84f1375b0a61.png">

## Transactions, with and without regions filter
<img width="1372" alt="transactions" src="https://user-images.githubusercontent.com/5261/225033366-65f44e95-3549-47cc-b0f2-67ad48a1a1fa.png">
<img width="1372" alt="transactions - filtered" src="https://user-images.githubusercontent.com/5261/225033391-50b9a2dc-e9a1-457b-84b1-837426eba35e.png">

## Transaction details
<img width="1372" alt="transaction details" src="https://user-images.githubusercontent.com/5261/225033505-3fdeceef-35dc-4e06-af25-ab4d0c53518f.png">

Release note: None

98464: jobs,upgrades: add migration to backfill job_info table r=dt a=adityamaru

This change adds a migration and corresponding cluster version
after which every job entry in the system.jobs table will have its
Payload and Progress written to two rows in the system.job_info table.

Informs: #97762

Release note: None

98510: backupccl: update restore/nodeshutdown tests to use new roachtest framework r=adityamaru a=msbutler

The restore/nodeshutdown tests have been using a very old workload that will not be restorable when #93804 lands. This patch changes the restore/nodeshutdown workload to a 80GB tpce restore and moves the tests to run on aws instead of gcp.

Release note: None

Epic: None

98579: upgrade/upgrades: skip TestUpgradeSchemaChangerElements r=smg260 a=smg260

Refs: #98062

Reason: flaky test

Generated by bin/skip-test.

Release justification: non-production code changes

Release note: None
Epic: None

Co-authored-by: gtr <gerardo@cockroachlabs.com>
Co-authored-by: Matthew Todd <todd@cockroachlabs.com>
Co-authored-by: adityamaru <adityamaru@gmail.com>
Co-authored-by: Michael Butler <butler@cockroachlabs.com>
Co-authored-by: Miral Gadani <miral@cockroachlabs.com>
craig bot pushed a commit that referenced this pull request Mar 14, 2023
96991: roachtest: update mixed-version backup to use new framework r=srosenberg a=renatolabs


This updates the `backup/mixed-version` roachtest to use the recently
introduced mixed-version roachtest framework (`mixedversion` package).

The main behavior exercised remains the same: backups are taken in
mixed-binary state, and those backups are restored and verified at the
end of the test. However, this commit also improves the coverage of
mixed-version backup testing in a few ways:

* **Randomization**. By virtue of using the new framework, most runs
will be different from one another since the order of actions taken by
the test will be different. Previously, backups would always be taken
with 2 nodes in the old version and 2 nodes in the new version. Now,
backups can be taken when an arbitrary number of nodes is running the
new version. As a consequence, it's also possible that some executions
will attempt backups when all nodes are running a new binary version,
but the cluster version itself has not been updated. Other points of
new randomization include the choice of the node's external dir where
backups are stored, which node to connect to when running certain
statements, and how much to wait between backups.

* **Backup Options**. Backups will randomly be created with
`revision_history` enabled, or with an `encryption_passphrase`.

* **Downgrades**. The cluster is also downgraded in mixed-version
tests. No downgrades happened in that test before this commit.

* **Workload**. Instead of using fixed call to `generate_series` to
generate data between backups, the test now runs the `bank` workload
continuously during the test. A random wait between backups allows the
workload to make changes to the underlying table during the test and
for the backups to be taken while writes are taking place.

* **Finalization**: the test _may_ attempt to create a backup as the
upgrade is finalizing (i.e., migrations are running and cluster
version is advancing).

In addition, this test will also see improved coverage as we make more
improvements to test plans generated by the `mixedversion` package.
These changes will create more backup scenarios in the future without
requiring any code changes to this test.

This test has already helped us uncover one backup bug (#97953).

Epic: CRDB-19321

Release note: None

98398: statusccl: stop serving /_status/nodes to tenants r=matthewtodd a=matthewtodd

Fixes #98057.

This reverts the work of #93268, which is no longer necessary now that we are eagerly capturing region information at execution time in #95449.

Release note: None

Co-authored-by: Renato Costa <renato@cockroachlabs.com>
Co-authored-by: Matthew Todd <todd@cockroachlabs.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants