Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Always capture and surface failed executions #94381

Closed
kevin-v-ngo opened this issue Dec 28, 2022 · 1 comment · Fixed by #97039
Closed

Always capture and surface failed executions #94381

kevin-v-ngo opened this issue Dec 28, 2022 · 1 comment · Fixed by #97039
Assignees
Labels
C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)

Comments

@kevin-v-ngo
Copy link

kevin-v-ngo commented Dec 28, 2022

This issue tracks improving the V1 experience for failed executions. Any failed execution is an insight in and of itself. We should capture all failed executions (regardless of whether they were ‘slow’) to arm users with observability information to reduce these errors.

Related issues:

Open question: Should we create a separate table for all failed executions? FYI @gtr

Jira issue: CRDB-22865

@kevin-v-ngo kevin-v-ngo added the C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) label Dec 28, 2022
@maryliag
Copy link
Contributor

maryliag commented Jan 9, 2023

cc @dongniwang for design feedback if we should change how we display it

@kevin-v-ngo kevin-v-ngo assigned gtr and unassigned dongniwang Jan 23, 2023
gtr added a commit to gtr/cockroach that referenced this issue Jan 30, 2023
crdb_internal.execution_insights

Part of: cockroachdb#87785, cockroachdb#94381.

Previously, the insights subsystem did not keep track of error code and
error messages for failed executions and only contained a
`FailedExecution` value for the `problem` field. This commit adds
the `last_error_code` and `last_error_msg` columns to the
`crdb_internal.cluster_execution_insights` virtual table. The next PR
will focus on displaying these values in the UI.

Release note (sql change): Added `last_error_code` and `last_error_msg`
columns to the `crdb_internal.cluster_execution_insights` virtual table
which contain the error code and error message for a failed statement
execution, respectively.
gtr added a commit to gtr/cockroach that referenced this issue Jan 31, 2023
crdb_internal.execution_insights

Part of: cockroachdb#87785, cockroachdb#94381.

Previously, the insights subsystem did not keep track of error code and
error messages for failed executions and only contained a
`FailedExecution` value for the `problem` field. This commit adds
the `last_error_code` and `last_error_msg` columns to the
`crdb_internal.cluster_execution_insights` virtual table. The next PR
will focus on displaying these values in the UI.

Release note (sql change): Added `last_error_code` and `last_error_msg`
columns to the `crdb_internal.cluster_execution_insights` virtual table
which contain the error code and error message for a failed statement
execution, respectively.
gtr added a commit to gtr/cockroach that referenced this issue Feb 2, 2023
Part of: cockroachdb#94381.

This commit adds the `last_error_code` column to the
`crdb_internal.node_statements_statistics` table, which keeps track of
the error code for failed statement executions. Together with the
`last_error` column, this gives a view of failed executions statements.

Release note (sql change): Adds `last_error_code` column to the
`crdb_internal.node_statements_statistics` table.
gtr added a commit to gtr/cockroach that referenced this issue Feb 6, 2023
Part of: cockroachdb#94381.

This commit adds the `last_error_code` column to the
`crdb_internal.node_statements_statistics` table, which keeps track of
the error code for failed statement executions. Together with the
`last_error` column, this gives a view of failed executions statements.

Release note (sql change): Adds `last_error_code` column to the
`crdb_internal.node_statements_statistics` table.
gtr added a commit to gtr/cockroach that referenced this issue Feb 7, 2023
Part of: cockroachdb#94381.

This commit adds the `last_error_code` column to the
`crdb_internal.node_statements_statistics` table, which keeps track of
the error code for failed statement executions.

Release note (sql change): Adds `last_error_code` column to the
`crdb_internal.node_statements_statistics` table. Adds `last_error_code`
field to the `statistics` JSON blob in the
`crdb_internal.statement_statistics` and `system.statement_statistics`
tables.
gtr added a commit to gtr/cockroach that referenced this issue Feb 7, 2023
Part of: cockroachdb#94381.

This commit adds the `last_error_code` column to the
`crdb_internal.node_statements_statistics` table, which keeps track of
the error code for failed statement executions.

Release note (sql change): Adds `last_error_code` column to the
`crdb_internal.node_statements_statistics` table. Adds `last_error_code`
field to the `statistics` JSON blob in the
`crdb_internal.statement_statistics` and `system.statement_statistics`
tables.
gtr added a commit to gtr/cockroach that referenced this issue Feb 8, 2023
Part of: cockroachdb#94381.

This commit adds the `last_error_code` column to the
`crdb_internal.node_statements_statistics` table, which keeps track of
the error code for failed statement executions.

Release note (sql change): Adds `last_error_code` column to the
`crdb_internal.node_statements_statistics` table. Adds `last_error_code`
field to the `statistics` JSON blob in the
`crdb_internal.statement_statistics` and `system.statement_statistics`
tables.
gtr added a commit to gtr/cockroach that referenced this issue Feb 8, 2023
Part of: cockroachdb#94381.

This commit adds the `last_error_code` column to the
`crdb_internal.node_statements_statistics` table, which keeps track of
the error code for failed statement executions.

Release note (sql change): Adds `last_error_code` column to the
`crdb_internal.node_statement_statistics` table. Adds `last_error_code`
field to the `statistics` JSON blob in the
`crdb_internal.statement_statistics` and `system.statement_statistics`
tables.
craig bot pushed a commit that referenced this issue Feb 9, 2023
96436: sql: add last error code to statements statistics tables r=gtr a=gtr

Part of: #94381.

This commit adds the `last_error_code` column to the
`crdb_internal.node_statements_statistics` table, which keeps track of
the error code for failed statement executions.

Tests in `pkg/sql/logictest/testdata/logic_test/statement_statistics_errors`.

Querying from systems table:
```sql
SELECT * FROM (
    SELECT
        encode(fingerprint_id, 'hex') AS fingerprint_id,
        metadata ->> 'querySummary' as querySummary,
        statistics -> 'statistics' ->> 'lastError' as lastError,
        statistics -> 'statistics' ->> 'errorCode' as errorCode,
        statistics -> 'statistics' ->> 'lastExecAt' as lastExecAt
    FROM system.statement_statistics
    ORDER BY statistics -> 'statistics' ->> 'lastExecAt'
) WHERE querySummary = 'SELECT _ / _';
```
Result:
```
  fingerprint_id  | querysummary |    lasterror     | errorcode |         lastexecat
------------------+--------------+------------------+-----------+------------------------------
 e030b633313c1280 | SELECT _ / _ | division by zero | 22012     | 2023-02-06T00:23:27.807099Z
```

Querying from cluster table:
```sql
SELECT * FROM (
    SELECT
        encode(fingerprint_id, 'hex') AS fingerprint_id,
        metadata ->> 'querySummary' as querySummary,
        statistics -> 'statistics' ->> 'lastError' as lastError,
        statistics -> 'statistics' ->> 'errorCode' as errorCode,
        statistics -> 'statistics' ->> 'lastExecAt' as lastExecAt
    FROM crdb_internal.statement_statistics
    ORDER BY statistics -> 'statistics' ->> 'lastExecAt'
) WHERE lastError != 'NULL' AND lastError != '';
```
Result:
```
  fingerprint_id  | querysummary |    lasterror     | errorcode |         lastexecat
------------------+--------------+------------------+-----------+------------------------------
 e030b633313c1280 | SELECT _ / _ | division by zero | 22012     | 2023-02-06T00:23:27.807099Z
```

Release note (sql change): Adds `last_error_code` column to the
`crdb_internal.node_statements_statistics` table. Adds `last_error_code`
field to the `statistics` JSON blob in the
`crdb_internal.statement_statistics` and `system.statement_statistics`
tables.

Co-authored-by: gtr <gerardo@cockroachlabs.com>
gtr added a commit to gtr/cockroach that referenced this issue Feb 9, 2023
Part of: cockroachdb#87785, cockroachdb#94381.

Previously, the insights subsystem did not keep track of `error code` for
failed executions and only contailed a `FailedExecution` value for the
problem field. This commit adds the `error_code` column to the
`crdb_internal.{cluster/node}_execution_insights` virtual tables.

Release note (sql change): Adds `error_code` column to the
`crdb_internal.{cluster/node}_execution_insights` virtual tables, which
contains the error code for a failed execution.
gtr added a commit to gtr/cockroach that referenced this issue Feb 10, 2023
Fixes: cockroachdb#94381.

Previously, the insights subsystem did not keep track of `error code` for
failed executions and only contailed a `FailedExecution` value for the
problem field. This commit adds the `error_code` column to the
`crdb_internal.{cluster/node}_execution_insights` virtual tables. This
commit also bubbles up that error code at the transaction level for
writing into the `crdb_internal.{cluster/node}_txn_execution_insights`
virtual table.

Release note (sql change): Adds `error_code` column to the
`crdb_internal.{cluster/node}_execution_insights` virtual tables, which
contains the error code for a failed execution. Adds `last_error_code`
column to the `crdb_internal.{cluster/node}_txn_execution_insights`
virtual tables, which is the error code of the last failed statement in
that transaction.
gtr added a commit to gtr/cockroach that referenced this issue Feb 13, 2023
Fixes: cockroachdb#94381.

Previously, the insights subsystem only kept track of failed executions
that were also slow executions. This commit enables the insights
subsytem to keep track of all failed executions, regardless if they were
slow or not. At the stmt level, the `error_code` field keeps track of
the error code if the stmt failed, and at the txn level, the
`last_error_code` field keeps track of the error code for the last
failed stmt in that txn.

Release note (sql change): The insights subsystem in `sqlstats` is able
to detect failed executions, regardless if they were slow or not. The
fields `error_code` and `last_error_code` were added to the `Statement`
and `Transaction` structs, respectively.
craig bot pushed a commit that referenced this issue Feb 13, 2023
96669: storage: set default of storage.value_blocks.enabled to true r=nicktrav a=sumeerbhola

This enables the use of sstable format TableFormatPebblev3 which improves read performance with MVCC keys with multiple versions.

Informs #1170

Epic: CRDB-20378

Release note: None

97039: insights: detect failed executions at the stmt and txn levels r=gtr a=gtr

Fixes: #94381.

Previously, the insights subsystem only kept track of failed executions
that were also slow executions. This commit enables the insights
subsystem to keep track of all failed executions, regardless if they were
slow or not.

Release note (sql change): The insights subsystem in `sqlstats` is able
to detect failed executions, regardless if they were slow or not.

Co-authored-by: sumeerbhola <sumeer@cockroachlabs.com>
Co-authored-by: gtr <gerardo@cockroachlabs.com>
@craig craig bot closed this as completed in 2d5449d Feb 13, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants