Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql/stats: after inject or restore of partial statistics, cannot create merged statistics #94101

Closed
Tracked by #125950
faizaanmadhani opened this issue Dec 21, 2022 · 3 comments · Fixed by #127252
Closed
Tracked by #125950
Assignees
Labels
A-sql-table-stats Table statistics (and their automatic refresh). C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-sql-queries SQL Queries Team

Comments

@faizaanmadhani
Copy link
Contributor

faizaanmadhani commented Dec 21, 2022

Now that users can collect partial statistics at the extremes of columns using CREATE STATISTICS ON <col_name> FROM <table_name> USING EXTREMES, ensure that statements bundles include both full statistics and partial statistics.

The other thing we need to be sure of is that merged statistics are correctly created when injecting statment bundle stats so that we can mimic customer behavior.

Jira issue: CRDB-22695

Epic CRDB-25383

@faizaanmadhani faizaanmadhani added the C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) label Dec 21, 2022
@blathers-crl blathers-crl bot added the T-sql-queries SQL Queries Team label Dec 21, 2022
@michae2 michae2 added the A-sql-table-stats Table statistics (and their automatic refresh). label Jan 11, 2023
@michae2
Copy link
Collaborator

michae2 commented Jan 13, 2023

Good news is that we are including partial statistics in statement bundles, so the first part is already done.

Bad news is that the second part still needs work. After ALTER TABLE INJECT STATISTICS (or a restore) we cannot produce the original merged statistics because the statisticID / fullStatisticID relationship is not set up correctly. Here's a demonstration:

CREATE TABLE uw (name TEXT PRIMARY KEY, location GEOGRAPHY(POINT)) WITH (sql_stats_automatic_collection_enabled = false);
INSERT INTO uw VALUES ('Washington', 'POINT(-122.30 47.66)'), ('Waterloo', 'POINT(-80.54 43.47)'), ('Wisconsin', 'POINT(-89.42 43.08)');
CREATE STATISTICS uwfull FROM uw;
INSERT INTO uw VALUES ('Warsaw', 'POINT(21.02 52.24)'), ('Winnipeg', 'POINT(-97.15 49.89)'), ('Wyoming', 'POINT(-105.57 41.31)');
CREATE STATISTICS uwpartial ON name FROM uw USING EXTREMES;
SHOW STATISTICS FOR TABLE uw WITH MERGE;
EXPLAIN SELECT * FROM uw;
EXPLAIN ANALYZE (DEBUG) SELECT * FROM uw;

-- Then download and unzip the bundle, and run `cockroach debug statement-bundle recreate`. In that session:

SHOW STATISTICS FOR TABLE uw WITH MERGE;
EXPLAIN SELECT * FROM uw;

Before recreating the bundle this looks like:

demo@127.0.0.1:26257/defaultdb> SHOW STATISTICS FOR TABLE uw WITH MERGE;
  statistics_name | column_names |          created           | row_count | distinct_count | null_count | avg_size |                                  partial_predicate                                  |    histogram_id    | full_histogram_id
------------------+--------------+----------------------------+-----------+----------------+------------+----------+-------------------------------------------------------------------------------------+--------------------+---------------------
  uwfull          | {name}       | 2023-01-13 19:05:17.304051 |         3 |              3 |          0 |       12 | NULL                                                                                | 830886108550234113 |                  0
  uwfull          | {location}   | 2023-01-13 19:05:17.304051 |         3 |              3 |          0 |       75 | NULL                                                                                |               NULL |                  0
  uwpartial       | {name}       | 2023-01-13 19:06:20.958132 |         2 |              2 |          0 |       10 | (name IS NULL) OR ((name < 'Washington':::STRING) OR (name > 'Wisconsin':::STRING)) | 830886317125959681 | 830886108550234113
  __merged__      | {name}       | 2023-01-13 19:06:20.958132 |         5 |              5 |          0 |       11 | NULL                                                                                |                  0 |               NULL
(4 rows)

Time: 6ms total (execution 6ms / network 0ms)

demo@127.0.0.1:26257/defaultdb> EXPLAIN SELECT * FROM uw;
                                     info
-------------------------------------------------------------------------------
  distribution: local
  vectorized: true

  • scan
    estimated row count: 5 (100% of the table; stats collected 3 minutes ago)
    table: uw@uw_pkey
    spans: FULL SCAN
(7 rows)

Time: 1ms total (execution 1ms / network 0ms)

After recreating the bundle we're missing the merged stats:

demo@127.0.0.1:26257/defaultdb> SHOW STATISTICS FOR TABLE uw WITH MERGE;
  statistics_name | column_names |          created           | row_count | distinct_count | null_count | avg_size |                                  partial_predicate                                  |    histogram_id    | full_histogram_id
------------------+--------------+----------------------------+-----------+----------------+------------+----------+-------------------------------------------------------------------------------------+--------------------+---------------------
  uwfull          | {name}       | 2023-01-13 19:05:17.304051 |         3 |              3 |          0 |       12 | NULL                                                                                | 830888138034216961 |               NULL
  uwfull          | {location}   | 2023-01-13 19:05:17.304051 |         3 |              3 |          0 |       75 | NULL                                                                                |               NULL |               NULL
  uwpartial       | {name}       | 2023-01-13 19:06:20.958132 |         2 |              2 |          0 |       10 | (name IS NULL) OR ((name < 'Washington':::STRING) OR (name > 'Wisconsin':::STRING)) | 830888138038083585 | 830886108550234113
(3 rows)

Time: 3ms total (execution 3ms / network 0ms)

demo@127.0.0.1:26257/defaultdb> EXPLAIN SELECT * FROM uw;
                                      info
--------------------------------------------------------------------------------
  distribution: local
  vectorized: true

  • scan
    estimated row count: 3 (100% of the table; stats collected 11 minutes ago)
    table: uw@uw_pkey
    spans: FULL SCAN
(7 rows)

Time: 4ms total (execution 4ms / network 0ms)

I'm going to change this issue to be about this second problem.

@michae2 michae2 changed the title sql: ensure statement bundles include both full statistics and partial statistics sql/stats: after inject or restore of partial statistics, cannot create merged statistics Jan 13, 2023
@michae2
Copy link
Collaborator

michae2 commented Jan 13, 2023

As a workaround we can fix the restored / injected partial statistics with something like:

UPDATE
  system.table_statistics AS p
SET
  "fullStatisticID"
    = (
      SELECT
        "statisticID"
      FROM
        system.table_statistics AS f
      WHERE
        f."tableID" = p."tableID"
        AND f."columnIDs" = p."columnIDs"
        AND f."createdAt" < p."createdAt"
        AND f."partialPredicate" IS NULL
      ORDER BY
        f."createdAt" DESC
      LIMIT
        1
    )
WHERE
  p."tableID" = 'uw'::REGCLASS::INT8
  AND p."createdAt" > '2023-01-13'
  AND p."partialPredicate" IS NOT NULL
RETURNING
  p."tableID",
  p."columnIDs",
  p.name,
  p."statisticID",
  p."fullStatisticID";

But this won't always be correct, so I think it would be better to really fix this. Probably by changing

_, err := executor.Exec(
ctx, "insert-statistic", txn,
`INSERT INTO system.table_statistics (
"tableID",
"name",
"columnIDs",
"rowCount",
"distinctCount",
"nullCount",
"avgSize",
histogram
) VALUES ($1, $2, $3, $4, $5, $6, $7, $8)`,
tableID,
nameVal,
columnIDsVal,
rowCount,
distinctCount,
nullCount,
avgSize,
histogramVal,
)
to not throw away the statisticID.

@mgartner
Copy link
Collaborator

@michae2 can we push this to 23.2 given that we'll disable partial statistics by default in 23.1 (assuming we do #95233).

Uzair5162 added a commit to Uzair5162/cockroach that referenced this issue Jul 16, 2024
Previously, JSON statistics would not include the statistic ID. This
meant that restoring statement bundles would recreate stats with new
IDs, breaking the statisticID/fullStatisticID relationship between full
& partial stats and fail to recreate merged stats as a result. This
commit adds the statistic ID to JSONStatistic and recreates stats with
the same ID if present when injected. Allows for merged stats to be
correctly recreated following inject/restore of full and partial stats.

Fixes: cockroachdb#94101

See also: cockroachdb#125950

Release note (bug fix): Fixed a bug that prevented merged stats from
being created after injecting stats or recreating statement bundles.
This would occur when the injected stats/statement bundle contained
related full and partial statistics.
craig bot pushed a commit that referenced this issue Jul 17, 2024
127155: github: fix check-pebble-dep action r=RaduBerinde a=RaduBerinde

The action is failing with
`invalid object name 'origin/release-23.1'`

This commit adds `fetch-depth: 0` which enables use of all branches.

Epic: none
Release note: None

127252: sql/stats: fix inject/restore of partial stats not creating merged stats r=Uzair5162 a=Uzair5162

Previously, JSON statistics would not include the statistic ID. This meant that restoring statement bundles would recreate stats with new IDs, breaking the statisticID/fullStatisticID relationship between full & partial stats and fail to recreate merged stats as a result. This commit adds the statistic ID to JSONStatistic and recreates stats with the same ID if present when injected. Allows for merged stats to be correctly recreated following inject/restore of full and partial stats.

Fixes: #94101

See also: #125950

Release note (bug fix): Fixed a bug that prevented merged stats from being created after injecting stats or recreating statement bundles. This would occur when the injected stats/statement bundle contained related full and partial statistics.

Co-authored-by: Radu Berinde <radu@cockroachlabs.com>
Co-authored-by: Uzair Ahmad <uzair.ahmad@cockroachlabs.com>
@craig craig bot closed this as completed in 4ece712 Jul 17, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-sql-table-stats Table statistics (and their automatic refresh). C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-sql-queries SQL Queries Team
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

4 participants