VReplication: Handle SQL NULL and JSON 'null' correctly for JSON columns #13944

Merged: 16 commits, merged Sep 11, 2023
Changes from 11 commits
4 changes: 2 additions & 2 deletions go/mysql/binlog/binlog_json_test.go
@@ -265,7 +265,7 @@ func TestMarshalJSONToSQL(t *testing.T) {
{
name: "null",
data: []byte{},
-expected: "CAST(null as JSON)",
+expected: "CONVERT('null' using utf8mb4)",
},
{
name: `object {"a": "b"}`,
@@ -340,7 +340,7 @@ func TestMarshalJSONToSQL(t *testing.T) {
{
name: `null`,
data: []byte{4, 0},
-expected: `CAST(null as JSON)`,
+expected: `CONVERT('null' using utf8mb4)`,
},
{
name: `-1`,
7 changes: 5 additions & 2 deletions go/mysql/json/marshal.go
@@ -149,12 +149,15 @@ func (v *Value) marshalSQLInternal(top bool, dst []byte) []byte {
}
return dst
case TypeNull:
+// JSON null is a 'null' string literal so we cannot use
+// CAST(null as JSON) because that (surprisingly?) turns
+// into an SQL NULL.
if top {
-dst = append(dst, "CAST("...)
+dst = append(dst, "CONVERT('"...)
dbussink (Contributor) commented:

I think this is semantically incorrect. The goal is that JSON marshaling also returns a JSON type from the expression, but it does not for this. It kinda works though because on insert now MySQL casts, but that's an implementation side-effect for vreplication, not something generic.

I think this should be CAST('null' as JSON) (instead of CAST(null as JSON)), which should also work I think and returns a JSON type for the expression:

mysql> select CAST(null as JSON);
+--------------------+
| CAST(null as JSON) |
+--------------------+
| NULL               |
+--------------------+
1 row in set (0.00 sec)

mysql> select CAST('null' as JSON);
+----------------------+
| CAST('null' as JSON) |
+----------------------+
| null                 |
+----------------------+
1 row in set (0.00 sec)

Given the MySQL documentation, I think we also should change the serialization for true / false as a toplevel value, and also quote that with a single ' to match the behavior for how we implement null.

mattlord (Contributor, Author) replied:

@dbussink I agree. And that's what I tried first, but then it became clear that there was another issue (and CONVERT is what we were previously using here):

+----------+-------------+-------+-------+--------------------------+
| order_id | customer_id | sku   | price | json_data                |
+----------+-------------+-------+-------+--------------------------+
|        1 |           1 | sku1  |     1 | NULL                     |
|        2 |           2 | sku2  |     2 | {}                       |
|        3 |           3 | sku3  |     3 | "base64:type15:bnVsbA==" |
|        4 |           4 | sku4  |     4 | true                     |
|        5 |           5 | sku5  |     5 | false                    |
|        6 |          10 | sku10 |    10 | NULL                     |
|        7 |           6 | sku6  |     6 | true                     |
|        8 |           7 | sku7  |     7 | false                    |
|        9 |           8 | sku8  |     8 | "base64:type15:bnVsbA==" |
|       10 |           9 | sku9  |     9 | NULL                     |
+----------+-------------+-------+-------+--------------------------+

This seems to lead to it subsequently being treated as a binary blob. If you think that's another internal bug then I can track that down too. And if you might have any ideas... 🙂

Thanks!

mattlord (Contributor, Author), Sep 10, 2023:

Same thing for 'true' and 'false' as well — the issue I noted above is not specific to 'null'. I had initially tried to treat true/false/null the same way (as you suggested above).

dbussink (Contributor) replied:


Does that mean it needs CAST(CONVERT('null' using utf8mb4) as JSON), or, syntactically different but semantically the same, CAST(_utf8mb4'null' as JSON)? I know that's messy, but I think that's what's needed to be correct.

The binary output is because vreplication uses the binary collation, so we need to make sure inputs are encoding-agnostic / explicit about it.

mattlord (Contributor, Author) replied:

Thanks, @dbussink ! That makes sense. I can see that we use the _utf8mb4'literal' syntax throughout our JSON handling so I switched to that and it's working as expected.

}
dst = append(dst, "null"...)
if top {
-dst = append(dst, " as JSON)"...)
+dst = append(dst, "' using utf8mb4)"...)
}
return dst
default:
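The TypeNull branch above can be exercised in isolation. The following is a standalone sketch (not the actual vjson package API) of what the diff at this revision emits for a top-level JSON null; per the later review discussion the final form may instead be CAST(_utf8mb4'null' as JSON):

```go
package main

import "fmt"

// marshalJSONNull sketches the TypeNull branch: at the top level the JSON
// null literal is wrapped so it stays a 'null' string value, because
// CAST(null as JSON) would (surprisingly) produce an SQL NULL instead.
func marshalJSONNull(top bool, dst []byte) []byte {
	if top {
		dst = append(dst, "CONVERT('"...)
	}
	dst = append(dst, "null"...)
	if top {
		dst = append(dst, "' using utf8mb4)"...)
	}
	return dst
}

func main() {
	fmt.Println(string(marshalJSONNull(true, nil)))  // top-level JSON null
	fmt.Println(string(marshalJSONNull(false, nil))) // nested inside a larger document
}
```

Nested occurrences stay a bare null token because the enclosing JSON_OBJECT()/JSON_ARRAY() expression already yields a JSON type.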
17 changes: 9 additions & 8 deletions go/vt/sqlparser/parsed_query.go
@@ -129,15 +129,16 @@ func (pq *ParsedQuery) AppendFromRow(buf *bytes2.Buffer, fields []*querypb.Field
case querypb.Type_TUPLE:
return vterrors.Errorf(vtrpcpb.Code_INTERNAL, "unexpected Type_TUPLE for value %d", i)
case querypb.Type_JSON:
-buf2 := sqltypes.NullBytes
-if col.length >= 0 {
-buf2 = row.Values[col.offset : col.offset+col.length]
-}
-vv, err := vjson.MarshalSQLValue(buf2)
-if err != nil {
-return err
-}
-buf.WriteString(vv.RawStr())
+if col.length < 0 { // An SQL NULL and not an actual JSON value
+buf.WriteString(sqltypes.NullStr)
+} else { // A JSON value (which may be an JSON null literal value)
+buf2 := row.Values[col.offset : col.offset+col.length]
+vv, err := vjson.MarshalSQLValue(buf2)
+if err != nil {
+return err
+}
+buf.WriteString(vv.RawStr())
+}
default:
if col.length < 0 {
// -1 means a null variable; serialize it directly
@@ -708,6 +708,7 @@ func customExpectData(t *testing.T, table string, values [][]string, exec func(c
if err == nil {
return
}
+log.Errorf("data mismatch: %v, retrying", err)
time.Sleep(tick)
}
}
@@ -730,7 +731,7 @@ func compareQueryResults(t *testing.T, query string, values [][]string,
}
for j, val := range row {
if got := qr.Rows[i][j].ToString(); got != val {
-return fmt.Errorf("mismatch at (%d, %d): %v, want %s", i, j, qr.Rows[i][j], val)
+return fmt.Errorf("mismatch at (%d, %d): got '%s', want '%s'", i, j, qr.Rows[i][j].ToString(), val)
}
}
}
10 changes: 7 additions & 3 deletions go/vt/vttablet/tabletmanager/vreplication/replicator_plan.go
@@ -378,9 +378,13 @@ func (tp *TablePlan) applyChange(rowChange *binlogdatapb.RowChange, executor fun
var newVal *sqltypes.Value
var err error
if field.Type == querypb.Type_JSON {
-newVal, err = vjson.MarshalSQLValue(vals[i].Raw())
-if err != nil {
-return nil, err
+if vals[i].IsNull() { // An SQL NULL and not an actual JSON value
+newVal = &sqltypes.NULL
+} else { // A JSON value (which may be an JSON null literal value)
+newVal, err = vjson.MarshalSQLValue(vals[i].Raw())
+if err != nil {
+return nil, err
+}
}
bindVar, err = tp.bindFieldVal(field, newVal)
} else {
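The applyChange branch applies the same rule on the replay path: an SQL NULL binds directly, while any actual JSON value (including the JSON 'null' literal) is marshaled into a JSON-typed expression first. A sketch under the assumption of a simplified value type (jsonVal and bindJSONField are hypothetical, not Vitess APIs):

```go
package main

import "fmt"

// jsonVal models just enough of sqltypes.Value for this sketch: a nil
// byte slice stands for SQL NULL.
type jsonVal struct{ raw []byte }

func (v jsonVal) IsNull() bool { return v.raw == nil }

// bindJSONField returns the SQL expression to bind for a JSON column value.
func bindJSONField(v jsonVal) string {
	if v.IsNull() { // An SQL NULL and not an actual JSON value.
		return "null"
	}
	if string(v.raw) == "null" { // The JSON null literal.
		return "CONVERT('null' using utf8mb4)"
	}
	return fmt.Sprintf("CAST(_utf8mb4'%s' as JSON)", v.raw)
}

func main() {
	fmt.Println(bindJSONField(jsonVal{}))                    // SQL NULL
	fmt.Println(bindJSONField(jsonVal{raw: []byte("null")})) // JSON null literal
}
```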
11 changes: 7 additions & 4 deletions go/vt/vttablet/tabletmanager/vreplication/vcopier_test.go
@@ -563,7 +563,7 @@ func testPlayerCopyTables(t *testing.T) {

execStatements(t, []string{
"create table src1(id int, val varbinary(128), d decimal(8,0), j json, primary key(id))",
-"insert into src1 values(2, 'bbb', 1, '{\"foo\": \"bar\"}'), (1, 'aaa', 0, JSON_ARRAY(123456789012345678901234567890, \"abcd\"))",
+"insert into src1 values(2, 'bbb', 1, '{\"foo\": \"bar\"}'), (1, 'aaa', 0, JSON_ARRAY(123456789012345678901234567890, \"abcd\")), (3, 'ccc', 2, 'null'), (4, 'ddd', 3, '{\"name\": \"matt\", \"size\": null}'), (5, 'eee', 4, null)",
fmt.Sprintf("create table %s.dst1(id int, val varbinary(128), val2 varbinary(128), d decimal(8,0), j json, primary key(id))", vrepldb),
"create table yes(id int, val varbinary(128), primary key(id))",
fmt.Sprintf("create table %s.yes(id int, val varbinary(128), primary key(id))", vrepldb),
@@ -617,8 +617,8 @@
// The first fast-forward has no starting point. So, it just saves the current position.
"/update _vt.vreplication set pos=",
"begin",
-"insert into dst1(id,val,val2,d,j) values (1,'aaa','aaa',0,JSON_ARRAY(123456789012345678901234567890, _utf8mb4'abcd')), (2,'bbb','bbb',1,JSON_OBJECT(_utf8mb4'foo', _utf8mb4'bar'))",
-`/insert into _vt.copy_state \(lastpk, vrepl_id, table_name\) values \('fields:{name:\\"id\\" type:INT32 charset:63 flags:53251} rows:{lengths:1 values:\\"2\\"}'.*`,
+"insert into dst1(id,val,val2,d,j) values (1,'aaa','aaa',0,JSON_ARRAY(123456789012345678901234567890, _utf8mb4'abcd')), (2,'bbb','bbb',1,JSON_OBJECT(_utf8mb4'foo', _utf8mb4'bar')), (3,'ccc','ccc',2,CONVERT('null' using utf8mb4)), (4,'ddd','ddd',3,JSON_OBJECT(_utf8mb4'name', _utf8mb4'matt', _utf8mb4'size', null)), (5,'eee','eee',4,null)",
+`/insert into _vt.copy_state \(lastpk, vrepl_id, table_name\) values \('fields:{name:\\"id\\" type:INT32 charset:63 flags:53251} rows:{lengths:1 values:\\"5\\"}'.*`,
"commit",
// copy of dst1 is done: delete from copy_state.
"/delete cs, pca from _vt.copy_state as cs left join _vt.post_copy_action as pca on cs.vrepl_id=pca.vrepl_id and cs.table_name=pca.table_name.*dst1",
@@ -634,9 +634,12 @@
expectData(t, "dst1", [][]string{
{"1", "aaa", "aaa", "0", "[123456789012345678901234567890, \"abcd\"]"},
{"2", "bbb", "bbb", "1", "{\"foo\": \"bar\"}"},
+{"3", "ccc", "ccc", "2", "null"},
+{"4", "ddd", "ddd", "3", "{\"name\": \"matt\", \"size\": null}"},
+{"5", "eee", "eee", "4", ""},
})
expectData(t, "yes", [][]string{})
-validateCopyRowCountStat(t, 2)
+validateCopyRowCountStat(t, 5)
ctx, cancel := context.WithCancel(context.Background())

type logTestCase struct {
13 changes: 11 additions & 2 deletions go/vt/vttablet/tabletmanager/vreplication/vplayer_flaky_test.go
@@ -1641,17 +1641,26 @@ func TestPlayerTypes(t *testing.T) {
},
}, {
input: "insert into vitess_json(val1,val2,val3,val4,val5) values (null,'{}','123','{\"a\":[42,100]}','{\"foo\": \"bar\"}')",
-output: "insert into vitess_json(id,val1,val2,val3,val4,val5) values (1,CAST(null as JSON),JSON_OBJECT(),CAST(123 as JSON),JSON_OBJECT(_utf8mb4'a', JSON_ARRAY(42, 100)),JSON_OBJECT(_utf8mb4'foo', _utf8mb4'bar'))",
+output: "insert into vitess_json(id,val1,val2,val3,val4,val5) values (1,null,JSON_OBJECT(),CAST(123 as JSON),JSON_OBJECT(_utf8mb4'a', JSON_ARRAY(42, 100)),JSON_OBJECT(_utf8mb4'foo', _utf8mb4'bar'))",
table: "vitess_json",
data: [][]string{
{"1", "", "{}", "123", `{"a": [42, 100]}`, `{"foo": "bar"}`},
},
}, {
-input: "update vitess_json set val1 = '{\"bar\": \"foo\"}', val4 = '{\"a\": [98, 123]}', val5 = convert(x'7b7d' using utf8mb4)",
+input: "insert into vitess_json(val1,val2,val3,val4,val5) values ('null', '{\"name\":null}','123','{\"a\":[42,100]}','{\"foo\": \"bar\"}')",
+output: "insert into vitess_json(id,val1,val2,val3,val4,val5) values (2,CONVERT('null' using utf8mb4),JSON_OBJECT(_utf8mb4'name', null),CAST(123 as JSON),JSON_OBJECT(_utf8mb4'a', JSON_ARRAY(42, 100)),JSON_OBJECT(_utf8mb4'foo', _utf8mb4'bar'))",
+table: "vitess_json",
+data: [][]string{
+{"1", "", "{}", "123", `{"a": [42, 100]}`, `{"foo": "bar"}`},
+{"2", "null", `{"name": null}`, "123", `{"a": [42, 100]}`, `{"foo": "bar"}`},
+},
+}, {
+input: "update vitess_json set val1 = '{\"bar\": \"foo\"}', val4 = '{\"a\": [98, 123]}', val5 = convert(x'7b7d' using utf8mb4) where id=1",
output: "update vitess_json set val1=JSON_OBJECT(_utf8mb4'bar', _utf8mb4'foo'), val2=JSON_OBJECT(), val3=CAST(123 as JSON), val4=JSON_OBJECT(_utf8mb4'a', JSON_ARRAY(98, 123)), val5=JSON_OBJECT() where id=1",
table: "vitess_json",
data: [][]string{
{"1", `{"bar": "foo"}`, "{}", "123", `{"a": [98, 123]}`, `{}`},
+{"2", "null", `{"name": null}`, "123", `{"a": [42, 100]}`, `{"foo": "bar"}`},
},
}}

Expand Down