Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

schemadiff: fix diffing of textual columns with implicit charsets #14930

Merged
Merged
87 changes: 84 additions & 3 deletions go/vt/schemadiff/column.go
Original file line number Diff line number Diff line change
Expand Up @@ -79,9 +79,90 @@

// ColumnDiff compares this table statement with another table statement, and sees what it takes to
// change this table to look like the other table.
// It returns an AlterTable statement if changes are found, or nil if not.
// the other table may be of different name; its name is ignored.
func (c *ColumnDefinitionEntity) ColumnDiff(other *ColumnDefinitionEntity, _ *DiffHints) *ModifyColumnDiff {
// It returns an ModifyColumnDiff statement if changes are found, or nil if not.
// The function also requires the charset/collate on the source & target tables. This is because the column's
// charset & collation, if undefined, are really defined by the table's charset & collation.
//
// Anecdotally, in CreateTableEntity.normalize() we actually actively strip away the charset/collate properties
// from the column definition, to get a cleaner table definition.
//
// Things get complicated when we consider hints.TableCharsetCollateStrategy. Consider this test case:
//
// from: "create table t (a varchar(64)) default charset=latin1",
// to: "create table t (a varchar(64) CHARACTER SET latin1 COLLATE latin1_bin)",
//
// In both cases, the column is really a latin1. But the tables themselves have different collations.
// In TableCharsetCollateStrict, the situation is handled correctly by converting the table charset as well as
// pinning the column's charset.
// But in TableCharsetCollateIgnoreEmpty or in TableCharsetCollateIgnoreAlways, schemadiff is not supposed to
// change the table's charset. In which case it does need to identify the fact that the two columns are really of
// the same charset, and not report a diff.
func (c *ColumnDefinitionEntity) ColumnDiff(
other *ColumnDefinitionEntity,
hints *DiffHints,
t1cc *charsetCollate,
t2cc *charsetCollate,
) *ModifyColumnDiff {
// We need to understand whether the table's charset is going to be ignored. If so, we need to denormalize
// the columns' charset/collate properties for the sake of this comparison.
// Denormalization means that if the column does not have charset/collate defined, then it must borrow those
// propeties from the table.
denormalizeCharsetCollate := false
if c.IsTextual() || other.IsTextual() {
switch hints.TableCharsetCollateStrategy {
case TableCharsetCollateStrict:
// No need to do anything
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is correct. Also in this case we can get an empty / column charset due to normalization if you change the table level charset and we shouldn't then generate a diff.

This is actually visible in a broken test that was added.

case TableCharsetCollateIgnoreEmpty:
if t1cc.charset != t2cc.charset {
if t1cc.charset == "" || t2cc.charset == "" {
denormalizeCharsetCollate = true
}

Check warning on line 119 in go/vt/schemadiff/column.go

View check run for this annotation

Codecov / codecov/patch

go/vt/schemadiff/column.go#L115-L119

Added lines #L115 - L119 were not covered by tests
}
if t1cc.collate != t2cc.collate {
if t1cc.collate == "" || t2cc.collate == "" {
denormalizeCharsetCollate = true
}

Check warning on line 124 in go/vt/schemadiff/column.go

View check run for this annotation

Codecov / codecov/patch

go/vt/schemadiff/column.go#L121-L124

Added lines #L121 - L124 were not covered by tests
}
case TableCharsetCollateIgnoreAlways:
if t1cc.charset != t2cc.charset {
denormalizeCharsetCollate = true
}
if t1cc.collate != t2cc.collate {
denormalizeCharsetCollate = true
}
}
}

if denormalizeCharsetCollate {
// We have concluded that the two tables have different charset/collate, and that the diff hints will cause
// that difference to be ignored.
// Now is the time to populate (denormalize) the column's charset/collate properties.
if c.columnDefinition.Type.Charset.Name == "" {
c.columnDefinition.Type.Charset.Name = t1cc.charset
defer func() {
c.columnDefinition.Type.Charset.Name = ""
}()
}
if c.columnDefinition.Type.Options.Collate == "" {
c.columnDefinition.Type.Options.Collate = t1cc.collate
defer func() {
c.columnDefinition.Type.Options.Collate = ""
}()

Check warning on line 150 in go/vt/schemadiff/column.go

View check run for this annotation

Codecov / codecov/patch

go/vt/schemadiff/column.go#L147-L150

Added lines #L147 - L150 were not covered by tests
}
if other.columnDefinition.Type.Charset.Name == "" {
other.columnDefinition.Type.Charset.Name = t2cc.charset
defer func() {
other.columnDefinition.Type.Charset.Name = ""
}()

Check warning on line 156 in go/vt/schemadiff/column.go

View check run for this annotation

Codecov / codecov/patch

go/vt/schemadiff/column.go#L153-L156

Added lines #L153 - L156 were not covered by tests
}
if other.columnDefinition.Type.Options.Collate == "" {
other.columnDefinition.Type.Options.Collate = t2cc.collate
defer func() {
other.columnDefinition.Type.Options.Collate = ""
}()

Check warning on line 162 in go/vt/schemadiff/column.go

View check run for this annotation

Codecov / codecov/patch

go/vt/schemadiff/column.go#L159-L162

Added lines #L159 - L162 were not covered by tests
}
}

if sqlparser.Equals.RefOfColumnDefinition(c.columnDefinition, other.columnDefinition) {
return nil
}
Expand Down
59 changes: 55 additions & 4 deletions go/vt/schemadiff/diff_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -189,6 +189,49 @@ func TestDiffTables(t *testing.T) {
TableQualifierHint: TableQualifierDeclared,
},
},
{
name: "changing table level defaults with column specific settings, ignore charset",
from: "create table t (a varchar(64) CHARACTER SET latin1 COLLATE latin1_bin) default charset=latin1",
to: "create table t (a varchar(64) CHARACTER SET latin1 COLLATE latin1_bin)",
hints: &DiffHints{
AlterTableAlgorithmStrategy: AlterTableAlgorithmStrategyCopy,
TableCharsetCollateStrategy: TableCharsetCollateIgnoreAlways,
},
},
{
name: "changing table level defaults with column specific settings",
from: "create table t (a varchar(64) CHARACTER SET latin1 COLLATE latin1_bin) default charset=latin1",
to: "create table t (a varchar(64) CHARACTER SET latin1 COLLATE latin1_bin)",
diff: "alter table t modify column a varchar(64) character set latin1 collate latin1_bin, charset utf8mb4, algorithm = COPY",
cdiff: "ALTER TABLE `t` MODIFY COLUMN `a` varchar(64) CHARACTER SET latin1 COLLATE latin1_bin, CHARSET utf8mb4, ALGORITHM = COPY",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test expectation is wrong. In this case there should not be a MODIFY COLUMN, since the charset / collation of the column doesn't actually change. This should only be alter table t charset utf8mb4 instead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. Looking into it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've fixed the expected output, so that now the PR should be failing until the logic is adapted.

action: "alter",
fromName: "t",
hints: &DiffHints{
AlterTableAlgorithmStrategy: AlterTableAlgorithmStrategyCopy,
TableCharsetCollateStrategy: TableCharsetCollateStrict,
},
},
{
name: "changing table level defaults with column specific settings, table already normalized",
from: "create table t (a varchar(64)) default charset=latin1",
to: "create table t (a varchar(64) CHARACTER SET latin1 COLLATE latin1_bin)",
diff: "alter table t modify column a varchar(64) character set latin1 collate latin1_bin, charset utf8mb4, algorithm = COPY",
cdiff: "ALTER TABLE `t` MODIFY COLUMN `a` varchar(64) CHARACTER SET latin1 COLLATE latin1_bin, CHARSET utf8mb4, ALGORITHM = COPY",
action: "alter",
fromName: "t",
hints: &DiffHints{
AlterTableAlgorithmStrategy: AlterTableAlgorithmStrategyCopy,
TableCharsetCollateStrategy: TableCharsetCollateStrict,
},
},
{
name: "changing table level charset to default",
from: `create table t (i int) default charset=latin1`,
to: `create table t (i int)`,
action: "alter",
diff: "alter table t charset utf8mb4",
cdiff: "ALTER TABLE `t` CHARSET utf8mb4",
},
}
parser := sqlparser.NewTestParser()
for _, ts := range tt {
Expand Down Expand Up @@ -228,8 +271,12 @@ func TestDiffTables(t *testing.T) {
case ts.diff == "":
assert.NoError(t, err)
assert.NoError(t, dqerr)
assert.Nil(t, d)
assert.Nil(t, dq)
if !assert.Nil(t, d) {
assert.Failf(t, "found unexpected diff", "%v", d.CanonicalStatementString())
}
if !assert.Nil(t, dq) {
assert.Failf(t, "found unexpected diff", "%v", dq.CanonicalStatementString())
}
default:
assert.NoError(t, err)
require.NotNil(t, d)
Expand Down Expand Up @@ -357,8 +404,12 @@ func TestDiffViews(t *testing.T) {
case ts.diff == "":
assert.NoError(t, err)
assert.NoError(t, dqerr)
assert.Nil(t, d)
assert.Nil(t, dq)
if !assert.Nil(t, d) {
assert.Failf(t, "found unexpected diff", "%v", d.CanonicalStatementString())
}
if !assert.Nil(t, dq) {
assert.Failf(t, "found unexpected diff", "%v", dq.CanonicalStatementString())
}
default:
assert.NoError(t, err)
require.NotNil(t, d)
Expand Down
87 changes: 45 additions & 42 deletions go/vt/schemadiff/table.go
Original file line number Diff line number Diff line change
Expand Up @@ -31,6 +31,15 @@
"vitess.io/vitess/go/vt/sqlparser"
)

type charsetCollate struct {
charset string
collate string
}

func (c *charsetCollate) defaultCollation() string {
return defaultCharsetCollation(c.charset)
}

type AlterTableEntityDiff struct {
from *CreateTableEntity
to *CreateTableEntity
Expand Down Expand Up @@ -436,21 +445,26 @@
return collationEnv.LookupName(collation)
}

func (c *CreateTableEntity) normalizeColumnOptions() {
tableCharset := defaultCharset()
tableCollation := ""
for _, option := range c.CreateTable.TableSpec.Options {
switch strings.ToUpper(option.Name) {
case "CHARSET":
tableCharset = option.String
case "COLLATE":
tableCollation = option.String
func getTableCharsetCollate(tableOptions *sqlparser.TableOptions) *charsetCollate {
cc := &charsetCollate{
charset: defaultCharset(),
}
for _, option := range *tableOptions {
if strings.EqualFold(option.Name, "charset") {
cc.charset = option.String
}
if strings.EqualFold(option.Name, "collate") {
cc.collate = option.String
}
}
defaultCollation := defaultCharsetCollation(tableCharset)
if tableCollation == "" {
tableCollation = defaultCollation
if cc.collate == "" {
cc.collate = cc.defaultCollation()
}
return cc
}

func (c *CreateTableEntity) normalizeColumnOptions() {
cc := getTableCharsetCollate(&c.CreateTable.TableSpec.Options)

for _, col := range c.CreateTable.TableSpec.Columns {
if col.Type.Options == nil {
Expand Down Expand Up @@ -571,13 +585,13 @@
if _, ok := charsetTypes[col.Type.Type]; ok {
// If the charset is explicitly configured and it mismatches, we don't normalize
// anything for charsets or collations and move on.
if col.Type.Charset.Name != "" && col.Type.Charset.Name != tableCharset {
if col.Type.Charset.Name != "" && col.Type.Charset.Name != cc.charset {
continue
}

// Alright, first check if both charset and collation are the same as
// the table level options, in that case we can remove both since that's equivalent.
if col.Type.Charset.Name == tableCharset && col.Type.Options.Collate == tableCollation {
if col.Type.Charset.Name == cc.charset && col.Type.Options.Collate == cc.collate {
col.Type.Charset.Name = ""
col.Type.Options.Collate = ""
}
Expand All @@ -595,13 +609,13 @@
if col.Type.Charset.Name != "" {
col.Type.Charset.Name = ""
if col.Type.Options.Collate == "" {
col.Type.Options.Collate = defaultCollation
col.Type.Options.Collate = cc.defaultCollation()
}
}

// We now have one case left, which is when we have set a collation but it's the same
// as the table level. In that case, we can clear it since that is equivalent.
if col.Type.Options.Collate == tableCollation {
if col.Type.Options.Collate == cc.collate {
col.Type.Options.Collate = ""
}
}
Expand Down Expand Up @@ -826,21 +840,19 @@
alterTable.Table.Qualifier = other.Table.Qualifier
}

diffedTableCharset := ""
t1cc := getTableCharsetCollate(&c.CreateTable.TableSpec.Options)
t2cc := getTableCharsetCollate(&other.CreateTable.TableSpec.Options)

var parentAlterTableEntityDiff *AlterTableEntityDiff
var partitionSpecs []*sqlparser.PartitionSpec
var superfluousFulltextKeys []*sqlparser.AddIndexDefinition
{
t1Options := c.CreateTable.TableSpec.Options
t2Options := other.CreateTable.TableSpec.Options
diffedTableCharset = c.diffTableCharset(t1Options, t2Options)
}
{
// diff columns
// ordered columns for both tables:

t1Columns := c.CreateTable.TableSpec.Columns
t2Columns := other.CreateTable.TableSpec.Columns
c.diffColumns(alterTable, t1Columns, t2Columns, hints, diffedTableCharset != "")
c.diffColumns(alterTable, t1Columns, t2Columns, hints, t1cc, t2cc)
}
{
// diff keys
Expand Down Expand Up @@ -927,21 +939,11 @@
}

func (c *CreateTableEntity) diffTableCharset(
t1Options sqlparser.TableOptions,
t2Options sqlparser.TableOptions,
t1cc *charsetCollate,
t2cc *charsetCollate,
) string {
getcharset := func(options sqlparser.TableOptions) string {
for _, option := range options {
if strings.EqualFold(option.Name, "CHARSET") {
return option.String
}
}
return ""
}
t1Charset := getcharset(t1Options)
t2Charset := getcharset(t2Options)
if t1Charset != t2Charset {
return t2Charset
if t1cc.charset != t2cc.charset {
return t2cc.charset

Check warning on line 946 in go/vt/schemadiff/table.go

View check run for this annotation

Codecov / codecov/patch

go/vt/schemadiff/table.go#L945-L946

Added lines #L945 - L946 were not covered by tests
}
return ""
}
Expand Down Expand Up @@ -1019,7 +1021,7 @@
case "CHARSET":
switch hints.TableCharsetCollateStrategy {
case TableCharsetCollateStrict:
tableOption = &sqlparser.TableOption{String: ""}
tableOption = &sqlparser.TableOption{Name: "CHARSET", String: defaultCharset(), CaseSensitive: true}
// in all other strategies we ignore the charset
}
case "CHECKSUM":
Expand Down Expand Up @@ -1536,7 +1538,8 @@
t1Columns []*sqlparser.ColumnDefinition,
t2Columns []*sqlparser.ColumnDefinition,
hints *DiffHints,
tableCharsetChanged bool,
t1cc *charsetCollate,
t2cc *charsetCollate,
) {
getColumnsMap := func(cols []*sqlparser.ColumnDefinition) map[string]*columnDetails {
var prevCol *columnDetails
Expand Down Expand Up @@ -1599,13 +1602,13 @@
t2ColEntity := NewColumnDefinitionEntity(t2Col)

// check diff between before/after columns:
modifyColumnDiff := t1ColEntity.ColumnDiff(t2ColEntity, hints)
modifyColumnDiff := t1ColEntity.ColumnDiff(t2ColEntity, hints, t1cc, t2cc)
if modifyColumnDiff == nil {
// even if there's no apparent change, there can still be implicit changes
// it is possible that the table charset is changed. the column may be some col1 TEXT NOT NULL, possibly in both versions 1 and 2,
// but implicitly the column has changed its characters set. So we need to explicitly ass a MODIFY COLUMN statement, so that
// but implicitly the column has changed its character set. So we need to explicitly add a MODIFY COLUMN statement, so that
// MySQL rebuilds it.
if tableCharsetChanged && t2ColEntity.IsTextual() && t2Col.Type.Charset.Name == "" {
if t1cc.charset != t2cc.charset && t2ColEntity.IsTextual() && t2Col.Type.Charset.Name == "" {
modifyColumnDiff = NewModifyColumnDiffByDefinition(t2Col)
}
}
Expand Down
Loading