Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql/stats: fix inject/restore of partial stats not creating merged stats #127252

Conversation

Uzair5162
Copy link
Contributor

Previously, JSON statistics would not include the statistic ID. This meant that restoring statement bundles would recreate stats with new IDs, breaking the statisticID/fullStatisticID relationship between full & partial stats and fail to recreate merged stats as a result. This commit adds the statistic ID to JSONStatistic and recreates stats with the same ID if present when injected. Allows for merged stats to be correctly recreated following inject/restore of full and partial stats.

Fixes: #94101

See also: #125950

Release note (bug fix): Fixed a bug that prevented merged stats from being created after injecting stats or recreating statement bundles. This would occur when the injected stats/statement bundle contained related full and partial statistics.

@Uzair5162 Uzair5162 requested review from a team as code owners July 16, 2024 15:20
@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Collaborator

@mgartner mgartner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @DrewKimball, @michae2, and @Uzair5162)


pkg/sql/stats/json.go line 31 at r1 (raw file):

// See TableStatistic for a description of the fields.
type JSONStatistic struct {
	StatisticID   uint64   `json:"statistic_id,omitempty"`

nit: Should this be just ID? Or is StatisticID easier to understand because we use that naming elsewhere?


pkg/sql/opt/exec/execbuilder/testdata/partial_stats line 167 at r1 (raw file):


query IITTTIIII colnames
SELECT histogram_id, full_histogram_id, statistics_name, column_names, created, row_count, distinct_count, null_count, avg_size

What's the relationship between full_statistic_id and full_histogram_id, and histogram_id and statistic_id?


pkg/sql/opt/exec/execbuilder/testdata/partial_stats line 497 at r1 (raw file):

           "partial_predicate": "(h < 10:::INT8) OR ((h > 40:::INT8) OR (h IS NULL))",
           "row_count": 0,
           "full_statistic_id": 1

I don't see any tests where full_statistic_id is included in the stats output—I only see it being included in the injected stats. Should we add some tests for that?

Previously, JSON statistics would not include the statistic ID. This
meant that restoring statement bundles would recreate stats with new
IDs, breaking the statisticID/fullStatisticID relationship between full
& partial stats and fail to recreate merged stats as a result. This
commit adds the statistic ID to JSONStatistic and recreates stats with
the same ID if present when injected. Allows for merged stats to be
correctly recreated following inject/restore of full and partial stats.

Fixes: cockroachdb#94101

See also: cockroachdb#125950

Release note (bug fix): Fixed a bug that prevented merged stats from
being created after injecting stats or recreating statement bundles.
This would occur when the injected stats/statement bundle contained
related full and partial statistics.
@Uzair5162 Uzair5162 force-pushed the fix-inject-restore-partial-and-full-stat-id-relationship-gh-94101 branch from b41f0bc to 4ece712 Compare July 16, 2024 20:51
Copy link
Contributor Author

@Uzair5162 Uzair5162 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @DrewKimball, @mgartner, and @michae2)


pkg/sql/stats/json.go line 31 at r1 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

nit: Should this be just ID? Or is StatisticID easier to understand because we use that naming elsewhere?

Done, I think ID is better.


pkg/sql/opt/exec/execbuilder/testdata/partial_stats line 167 at r1 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

What's the relationship between full_statistic_id and full_histogram_id, and histogram_id and statistic_id?

(full_statistic_id/full_histogram_id) and (histogram_id/statistic_id) are interchangeable AFAIK. full_statistic_id/full_histogram_id is the ID of the full table statistic that the partial statistic was derived from and is null for full statistics (see aab61b6).


pkg/sql/opt/exec/execbuilder/testdata/partial_stats line 497 at r1 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

I don't see any tests where full_statistic_id is included in the stats output—I only see it being included in the injected stats. Should we add some tests for that?

Done, added checking stats json output for full_statistic_id to an existing test here.

Copy link
Collaborator

@DrewKimball DrewKimball left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: Once Marcus's comments are addressed.

Reviewed 6 of 6 files at r2, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @mgartner and @michae2)

Copy link
Collaborator

@mgartner mgartner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 4 of 6 files at r2, all commit messages.
Reviewable status: :shipit: complete! 2 of 0 LGTMs obtained (waiting on @michae2 and @Uzair5162)


pkg/sql/logictest/testdata/logic_test/distsql_stats line 2397 at r2 (raw file):

  FROM [SHOW STATISTICS USING JSON FOR TABLE xy]
)
WHERE stat->>'full_statistic_id' = '$statistics_id'

nit: does this need to change to id?


pkg/sql/opt/exec/execbuilder/testdata/partial_stats line 167 at r1 (raw file):

Previously, Uzair5162 (Uzair Ahmad) wrote…

(full_statistic_id/full_histogram_id) and (histogram_id/statistic_id) are interchangeable AFAIK. full_statistic_id/full_histogram_id is the ID of the full table statistic that the partial statistic was derived from and is null for full statistics (see aab61b6).

Something to think about, maybe outside of this PR: Should we make an effort to make the names consistent? Is it possible to do so without breaking stats during an upgrade of CRDB versions?

Copy link
Collaborator

@michae2 michae2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: Nice!

Reviewed 6 of 6 files at r2, all commit messages.
Reviewable status: :shipit: complete! 3 of 0 LGTMs obtained (waiting on @mgartner and @Uzair5162)


pkg/sql/opt/exec/execbuilder/testdata/partial_stats line 167 at r1 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

Something to think about, maybe outside of this PR: Should we make an effort to make the names consistent? Is it possible to do so without breaking stats during an upgrade of CRDB versions?

I believe histogram_id and full_histogram_id are only used in SHOW STATISTICS output, everywhere else uses "statistics ID". Maybe it was done this way to indicate a connection with SHOW HISTOGRAM (which also uses the statistics ID)?


pkg/sql/opt/exec/execbuilder/testdata/partial_stats line 56 at r2 (raw file):

    "null_count": 0,
    "row_count": 3,
    "statistics_id": 1

Huh, that's interesting. I wonder if we started down this route and forgot to remove this?

Copy link
Contributor Author

@Uzair5162 Uzair5162 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 3 of 0 LGTMs obtained (waiting on @mgartner and @michae2)


pkg/sql/logictest/testdata/logic_test/distsql_stats line 2397 at r2 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

nit: does this need to change to id?

Don't think so, statistics_id here is referring to the $statistics_id logic test variable and not the stat JSON field.


pkg/sql/opt/exec/execbuilder/testdata/partial_stats line 56 at r2 (raw file):

Previously, michae2 (Michael Erickson) wrote…

Huh, that's interesting. I wonder if we started down this route and forgot to remove this?

Likely that was the case since this was the only place that had statistics_id.

@Uzair5162
Copy link
Contributor Author

bors r+

@craig craig bot merged commit d6b590c into cockroachdb:master Jul 17, 2024
22 checks passed
@mgartner
Copy link
Collaborator

pkg/sql/logictest/testdata/logic_test/distsql_stats line 2397 at r2 (raw file):

Previously, Uzair5162 (Uzair Ahmad) wrote…

Don't think so, statistics_id here is referring to the $statistics_id logic test variable and not the stat JSON field.

Sorry, I was referring to full_statistic_id. Hasn't that been replaced with id?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

sql/stats: after inject or restore of partial statistics, cannot create merged statistics
5 participants