-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sql/stats: fix inject/restore of partial stats not creating merged stats #127252
sql/stats: fix inject/restore of partial stats not creating merged stats #127252
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @DrewKimball, @michae2, and @Uzair5162)
pkg/sql/stats/json.go
line 31 at r1 (raw file):
// See TableStatistic for a description of the fields. type JSONStatistic struct { StatisticID uint64 `json:"statistic_id,omitempty"`
nit: Should this be just ID
? Or is StatisticID
easier to understand because we use that naming elsewhere?
pkg/sql/opt/exec/execbuilder/testdata/partial_stats
line 167 at r1 (raw file):
query IITTTIIII colnames SELECT histogram_id, full_histogram_id, statistics_name, column_names, created, row_count, distinct_count, null_count, avg_size
What's the relationship between full_statistic_id
and full_histogram_id
, and histogram_id
and statistic_id
?
pkg/sql/opt/exec/execbuilder/testdata/partial_stats
line 497 at r1 (raw file):
"partial_predicate": "(h < 10:::INT8) OR ((h > 40:::INT8) OR (h IS NULL))", "row_count": 0, "full_statistic_id": 1
I don't see any tests where full_statistic_id
is included in the stats output—I only see it being included in the injected stats. Should we add some tests for that?
Previously, JSON statistics would not include the statistic ID. This meant that restoring statement bundles would recreate stats with new IDs, breaking the statisticID/fullStatisticID relationship between full & partial stats and fail to recreate merged stats as a result. This commit adds the statistic ID to JSONStatistic and recreates stats with the same ID if present when injected. Allows for merged stats to be correctly recreated following inject/restore of full and partial stats. Fixes: cockroachdb#94101 See also: cockroachdb#125950 Release note (bug fix): Fixed a bug that prevented merged stats from being created after injecting stats or recreating statement bundles. This would occur when the injected stats/statement bundle contained related full and partial statistics.
b41f0bc
to
4ece712
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @DrewKimball, @mgartner, and @michae2)
pkg/sql/stats/json.go
line 31 at r1 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
nit: Should this be just
ID
? Or isStatisticID
easier to understand because we use that naming elsewhere?
Done, I think ID
is better.
pkg/sql/opt/exec/execbuilder/testdata/partial_stats
line 167 at r1 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
What's the relationship between
full_statistic_id
andfull_histogram_id
, andhistogram_id
andstatistic_id
?
(full_statistic_id
/full_histogram_id
) and (histogram_id
/statistic_id
) are interchangeable AFAIK. full_statistic_id
/full_histogram_id
is the ID of the full table statistic that the partial statistic was derived from and is null for full statistics (see aab61b6).
pkg/sql/opt/exec/execbuilder/testdata/partial_stats
line 497 at r1 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
I don't see any tests where
full_statistic_id
is included in the stats output—I only see it being included in the injected stats. Should we add some tests for that?
Done, added checking stats json output for full_statistic_id
to an existing test here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Once Marcus's comments are addressed.
Reviewed 6 of 6 files at r2, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @mgartner and @michae2)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 4 of 6 files at r2, all commit messages.
Reviewable status: complete! 2 of 0 LGTMs obtained (waiting on @michae2 and @Uzair5162)
pkg/sql/logictest/testdata/logic_test/distsql_stats
line 2397 at r2 (raw file):
FROM [SHOW STATISTICS USING JSON FOR TABLE xy] ) WHERE stat->>'full_statistic_id' = '$statistics_id'
nit: does this need to change to id
?
pkg/sql/opt/exec/execbuilder/testdata/partial_stats
line 167 at r1 (raw file):
Previously, Uzair5162 (Uzair Ahmad) wrote…
(
full_statistic_id
/full_histogram_id
) and (histogram_id
/statistic_id
) are interchangeable AFAIK.full_statistic_id
/full_histogram_id
is the ID of the full table statistic that the partial statistic was derived from and is null for full statistics (see aab61b6).
Something to think about, maybe outside of this PR: Should we make an effort to make the names consistent? Is it possible to do so without breaking stats during an upgrade of CRDB versions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 6 of 6 files at r2, all commit messages.
Reviewable status: complete! 3 of 0 LGTMs obtained (waiting on @mgartner and @Uzair5162)
pkg/sql/opt/exec/execbuilder/testdata/partial_stats
line 167 at r1 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
Something to think about, maybe outside of this PR: Should we make an effort to make the names consistent? Is it possible to do so without breaking stats during an upgrade of CRDB versions?
I believe histogram_id
and full_histogram_id
are only used in SHOW STATISTICS
output, everywhere else uses "statistics ID". Maybe it was done this way to indicate a connection with SHOW HISTOGRAM
(which also uses the statistics ID)?
pkg/sql/opt/exec/execbuilder/testdata/partial_stats
line 56 at r2 (raw file):
"null_count": 0, "row_count": 3, "statistics_id": 1
Huh, that's interesting. I wonder if we started down this route and forgot to remove this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 3 of 0 LGTMs obtained (waiting on @mgartner and @michae2)
pkg/sql/logictest/testdata/logic_test/distsql_stats
line 2397 at r2 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
nit: does this need to change to
id
?
Don't think so, statistics_id here is referring to the $statistics_id
logic test variable and not the stat JSON field.
pkg/sql/opt/exec/execbuilder/testdata/partial_stats
line 56 at r2 (raw file):
Previously, michae2 (Michael Erickson) wrote…
Huh, that's interesting. I wonder if we started down this route and forgot to remove this?
Likely that was the case since this was the only place that had statistics_id
.
bors r+ |
Previously, Uzair5162 (Uzair Ahmad) wrote…
Sorry, I was referring to |
Previously, JSON statistics would not include the statistic ID. This meant that restoring statement bundles would recreate stats with new IDs, breaking the statisticID/fullStatisticID relationship between full & partial stats and fail to recreate merged stats as a result. This commit adds the statistic ID to JSONStatistic and recreates stats with the same ID if present when injected. Allows for merged stats to be correctly recreated following inject/restore of full and partial stats.
Fixes: #94101
See also: #125950
Release note (bug fix): Fixed a bug that prevented merged stats from being created after injecting stats or recreating statement bundles. This would occur when the injected stats/statement bundle contained related full and partial statistics.