Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug Report: Cannot create a JSON value from a string with CHARACTER SET 'binary' in online ddl #13986

Open
olyazavr opened this issue Sep 14, 2023 · 18 comments
Assignees
Labels
Component: Online DDL Online DDL (vitess/native/gh-ost/pt-osc) Type: Bug

Comments

@olyazavr
Copy link
Contributor

olyazavr commented Sep 14, 2023

Overview of the Issue

When running a Vitess onlineddl schema migration with json generated columns, the migration fails in the copy phase with Cannot create a JSON value from a string with CHARACTER SET 'binary'

Reproduction Steps

Create table:

CREATE TABLE `orders` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `orderPaymentMethod` json DEFAULT NULL,
  `stripePaymentMethod` tinyint(4) GENERATED ALWAYS AS (json_unquote(json_extract(`orderPaymentMethod`,'$.paymentMethod'))) STORED,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB

Insert:

insert into orders(orderPaymentMethod) values (convert('{\"customerId\": \"cus_Il7WwbIODMoaE1\"}' using utf8mb4))

Note that if orderPaymentMethod is null here, the error does not surface, nor does it surface if there are no json generated columns

Run migration:

vtctl ApplySchema -- --ddl_strategy "vitess" --sql "ALTER TABLE orders ENGINE=INNODB;" TestKeyspace

See failures:

executor.go:4069] updateMigrationMessage: uuid=f3387b12_5323_11ee_8c7e_0a4b134e1d27, message=task error: failed inserting rows: Cannot create a JSON value from a string with CHARACTER SET 'binary'. (errno 3144) (sqlstate 22032) during query: insert into _f3387b12_5323_11ee_8c7e_0a4b134e1d27_20230914172717_vrepl(id,orderPaymentMethod) values (1,convert(JSON_OBJECT(_utf8mb4'customerId', _utf8mb4'cus_Il7WwbIODMoaE1') using utf8mb4))

executor.go:2082] cancelMigrations: cancelling f3387b12_5323_11ee_8c7e_0a4b134e1d27; reason: task error: failed inserting rows: Cannot create a JSON value from a string with CHARACTER SET 'binary'. (errno 3144) (sqlstate 22032) during query: insert into _f3387b12_5323_11ee_8c7e_0a4b134e1d27_20230914172717_vrepl(id,orderPaymentMethod) values (1,convert(JSON_OBJECT(_utf8mb4'customerId', _utf8mb4'cus_Il7WwbIODMoaE1') using utf8mb4))

controller.go:261] vreplication stream 5 going into error state due to Cannot create a JSON value from a string with CHARACTER SET 'binary'. (errno 3144) (sqlstate 22032) during query: insert into _f3387b12_5323_11ee_8c7e_0a4b134e1d27_20230914172717_vrepl(id,orderPaymentMethod) values (1,convert(JSON_OBJECT(_utf8mb4'customerId', _utf8mb4'cus_Il7WwbIODMoaE1') using utf8mb4))

Binary Version

v18

Operating System and Environment details

Centos8, Linux 5.4.141-hs22.el8.x86_64

Log Fragments

No response

@olyazavr olyazavr added Needs Triage This issue needs to be correctly labelled and triaged Type: Bug labels Sep 14, 2023
@frouioui frouioui added Component: Online DDL Online DDL (vitess/native/gh-ost/pt-osc) and removed Needs Triage This issue needs to be correctly labelled and triaged labels Sep 15, 2023
@frouioui
Copy link
Member

@vitessio/online-ddl

@dbussink
Copy link
Contributor

Binary Version

v18

@olyazavr There's no v18 yet, which specific commit / sha is this?

executor.go:2082] cancelMigrations: cancelling f3387b12_5323_11ee_8c7e_0a4b134e1d27; reason: task error: failed inserting rows: Cannot create a JSON value from a string with CHARACTER SET 'binary'. (errno 3144) (sqlstate 22032) during query: insert into _f3387b12_5323_11ee_8c7e_0a4b134e1d27_20230914172717_vrepl(id,orderPaymentMethod) values (1,convert(JSON_OBJECT(_utf8mb4'customerId', _utf8mb4'cus_Il7WwbIODMoaE1') using utf8mb4))

@mattlord It seems like there's an incorrect convert here around this statement? It's from here:

case sourceCol.Type == vrepl.JSONColumnType:
sb.WriteString(fmt.Sprintf("convert(%s using utf8mb4)", escapeName(name)))

I think that logic just needs to be dropped entirely, since the JSON marshaller already does the right thing and this wrapping breaks stuff.

@olyazavr
Copy link
Contributor Author

There's no v18 yet, which specific commit / sha is this?

Hi, this is from commit e7c0bb07ac, I pulled from main a couple of days ago

@dbussink
Copy link
Contributor

case sourceCol.Type == vrepl.JSONColumnType:
sb.WriteString(fmt.Sprintf("convert(%s using utf8mb4)", escapeName(name)))

I think that logic just needs to be dropped entirely, since the JSON marshaller already does the right thing and this wrapping breaks stuff.

Hmm, this is weird. This logic deals with column names from what I can see, not column values but for some reason we wrap a column value here 😕. No other usages of convert seem like what we're hitting here though 😕.

@olyazavr
Copy link
Contributor Author

olyazavr commented Sep 19, 2023

I'm encountering something very bizarre. I made the fix in v18 to not have that extra convert and it's a valid mysql query and yet:

failed inserting rows: Cannot create a JSON value from a string with CHARACTER SET \'binary\'. (errno 3144) (sqlstate 22032) during query: insert into _cabdd788_5729_11ee_94e0_0e0c58f9c8c3_20230919201911_vrepl(id,orderPaymentMethod) values (1,JSON_OBJECT(_utf8mb4\'customerId\', _utf8mb4\'cus_Il7WwbIODMoaE1\'))'

if you run insert into _cabdd788_5729_11ee_94e0_0e0c58f9c8c3_20230919201911_vrepl(id,orderPaymentMethod) values (1,JSON_OBJECT(_utf8mb4'customerId', _utf8mb4'cus_Il7WwbIODMoaE1')) it is valid mysql and it works

from the general log, that is exactly what it's doing, but rolling back:

2023-09-19T20:19:11.468574Z       679 Query     begin
2023-09-19T20:19:11.468764Z       679 Query     insert into _cabdd788_5729_11ee_94e0_0e0c58f9c8c3_20230919201911_vrepl(id,orderPaymentMethod) values (1,JSON_OBJECT(_utf8mb4'customerId', _utf8mb4'cus_Il7WwbIODMoaE1'))
2023-09-19T20:19:11.469323Z       679 Query     rollback

I even added a log line into ExecuteFetch, to ensure the query and the error matched what was being reported to me, and they did

as an aside, I can also repro this in v14, before the JSON_OBJECT overhaul, where it fails to run this migration and the query it fails on is valid: insert into _0bbf4258_5728_11ee_9699_12208b5a984b_20230919200641_vrepl(id,orderPaymentMethod) values (1,convert('{\"customerId\": \"cus_Il7WwbIODMoaE1\"}' using utf8mb4))

@shlomi-noach shlomi-noach self-assigned this Sep 20, 2023
@rohit-nayak-ps
Copy link
Contributor

Wondering if changing the generated column query to something like, will help, in case the issue is with how MySQL internally processes json for generated columns.

CREATE TABLE `orders` (
  `id` bigint unsigned NOT NULL AUTO_INCREMENT,
  `orderPaymentMethod` json DEFAULT NULL,
  `stripePaymentMethod` tinyint GENERATED ALWAYS AS (cast(json_unquote(json_extract(`orderPaymentMethod`,_utf8mb4'$.paymentMethod')) as char charset utf8mb4)) STORED,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB

@shlomi-noach
Copy link
Contributor

I haven't dug deeply into the discussion yet, but just off the tip of my head:

  • JSON is always utf8 and cannot be binary
  • Generated columns should be completely ignored in Online DDL -- the point is that MySQL populates them, not us. So I'd start with looking into that.

@shlomi-noach
Copy link
Contributor

On main branch, which is really equivalent to v18 when it comes to Online DDL, I am unable to reproduce, and the migration completes correctly:

$ mysql

mysql> CREATE TABLE `orders` (
    ->   `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
    ->   `orderPaymentMethod` json DEFAULT NULL,
    ->   `stripePaymentMethod` tinyint(4) GENERATED ALWAYS AS (json_unquote(json_extract(`orderPaymentMethod`,'$.paymentMethod'))) STORED,
    ->   PRIMARY KEY (`id`)
    -> ) ENGINE=InnoDB;
Query OK, 0 rows affected (0.06 sec)

mysql> insert into orders(orderPaymentMethod) values (convert('{\"customerId\": \"cus_Il7WwbIODMoaE1\"}' using utf8mb4))
    -> ;
Query OK, 1 row affected (0.01 sec)

mysql> ^DBye
$ vtctldclient ApplySchema --ddl-strategy="online --postpone-completion"  --sql "ALTER TABLE orders ENGINE=INNODB;" commerce
ee4facd6_57c6_11ee_a929_0a43f95f28a3

$ vtctldclient OnlineDDL complete commerce all
{
  "rows_affected_by_shard": {
    "0": "1"
  }
}
                             id: 1
                 migration_uuid: ee4facd6_57c6_11ee_a929_0a43f95f28a3
                       keyspace: commerce
                          shard: 0
                   mysql_schema: vt_commerce
                    mysql_table: orders
            migration_statement: alter table orders ENGINE INNODB
                       strategy: online
                        options: --postpone-completion
                added_timestamp: 2023-09-20 15:04:00
            requested_timestamp: 2023-09-20 15:04:01
                ready_timestamp: NULL
              started_timestamp: 2023-09-20 15:04:02
             liveness_timestamp: 2023-09-20 15:04:21
            completed_timestamp: 2023-09-20 15:04:22.112595
              cleanup_timestamp: NULL
               migration_status: complete
                       log_path:
                      artifacts: _ee4facd6_57c6_11ee_a929_0a43f95f28a3_20230920150401_vrepl,
                        retries: 0
                         tablet: zone1-0000000100
                 tablet_failure: 0
                       progress: 100
              migration_context: vtctl:ee4ecee2-57c6-11ee-a929-0a43f95f28a3
                     ddl_action: alter
                        message:
                    eta_seconds: 0
                    rows_copied: 1
                     table_rows: 1
              added_unique_keys: 0
            removed_unique_keys: 0
                       log_file:
       retain_artifacts_seconds: 86400
            postpone_completion: 0
       removed_unique_key_names:
dropped_no_default_column_names:
          expanded_column_names:
               revertible_notes:
               allow_concurrent: 0
                  reverted_uuid:
                        is_view: 0
              ready_to_complete: 1
      vitess_liveness_indicator: 1695222258
            user_throttle_ratio: 0
                   special_plan:

@olyazavr
Copy link
Contributor Author

Huh, I need to go back and retry my v18 migration then. I only have tablets in v18, not vtgates or vtctld, but that shouldn't matter because vreplication/onlineddl is all at the tablet level.

Out of curiosity, what MySQL version are you running? We're on 5.7 and from a few bug reports, it seems like some json-related pain is fixed in 8

@olyazavr
Copy link
Contributor Author

I'll also test out that workaround Rohit suggested

@shlomi-noach
Copy link
Contributor

I was testing with 8.0. Will next check with 5.7!

@olyazavr
Copy link
Contributor Author

Ok so I tested with mysql 8 and my migration worked, even the v14 version, and also Rohit's workaround works (with 5.7 and v14) - but would require us to change the schemas of everyone with json generated columns. I also tested changing the line that does set names binary to set names utf8mb4 in the table copy and that worked, but of course that probably has unintended side-effects (the comment explains that we need this because Tables may have varying character sets. To ship the bits without interpreting them we set the character set to be binary.).

So perhaps this is a mysql 5.7 + set names binary + json generated columns bug

@shlomi-noach
Copy link
Contributor

shlomi-noach commented Sep 21, 2023

I can confirm the issue reproduces on 5.7:

> select * from _vt.schema_migrations order by id desc limit 1\G

                             id: 1
                 migration_uuid: 31085363_583a_11ee_b258_0a43f95f28a3
                       keyspace: commerce
                          shard: 0
                   mysql_schema: vt_commerce
                    mysql_table: orders
            migration_statement: alter table orders ENGINE INNODB
                       strategy: online
                        options: --postpone-completion
                added_timestamp: 2023-09-21 04:49:04
            requested_timestamp: 2023-09-21 04:49:05
                ready_timestamp: NULL
              started_timestamp: 2023-09-21 04:49:06
             liveness_timestamp: 2023-09-21 04:49:06
            completed_timestamp: 2023-09-21 04:49:09.949121
              cleanup_timestamp: NULL
               migration_status: failed
                       log_path:
                      artifacts: _31085363_583a_11ee_b258_0a43f95f28a3_20230921044905_vrepl,
                        retries: 0
                         tablet: zone1-0000000101
                 tablet_failure: 0
                       progress: 0
              migration_context: vtctl:31076797-583a-11ee-b258-0a43f95f28a3
                     ddl_action: alter
                        message: task error: failed inserting rows: Cannot create a JSON value from a string with CHARACTER SET 'binary'. (errno 3144) (sqlstate
22032) during query: insert into _31085363_583a_11ee_b258_0a43f95f28a3_20230921044905_vrepl(id,orderPaymentMethod) values (1,convert(JSON_OBJECT(_utf8mb4'custome
rId', _utf8mb4'cus_Il7WwbIODMoaE1') using utf8mb4))
                    eta_seconds: -1
                    rows_copied: 0
                     table_rows: 1
              added_unique_keys: 0
            removed_unique_keys: 0
                       log_file:
       retain_artifacts_seconds: 86400
            postpone_completion: 1
       removed_unique_key_names:
dropped_no_default_column_names:
          expanded_column_names:
               revertible_notes:
               allow_concurrent: 0
                  reverted_uuid:
                        is_view: 0
              ready_to_complete: 0
      vitess_liveness_indicator: 1695271746

@shlomi-noach
Copy link
Contributor

It does look like vitess is doing all the necessary casting for the query to be successful:

convert(JSON_OBJECT(_utf8mb4'customerId', _utf8mb4'cus_Il7WwbIODMoaE1') using utf8mb4)

and the problem will not reproduce is there's no GENERATED column, and so it does look like a MySQL 5.7 issue rather than vreplication and I'd take @rohit-nayak-ps 's suggestion as the simplest workaround.

@olyazavr
Copy link
Contributor Author

I see, that would be quite difficult for us to implement because we'd have to run migrations (non-vitess) on every table that has a json column and a generated column (I don't know how many there are, but given we have over a thousand keyspaces, and teams can design schemas as they like, it's possible this is a large number).

Is there nothing else we could do? Perhaps a condition on the set names binary to be set names utf8mb4 if it's this json problem?

@shlomi-noach
Copy link
Contributor

Perhaps a condition on the set names binary to be set names utf8mb4 if it's this json problem?

Let me look into that, and build a test case.

@olyazavr
Copy link
Contributor Author

Heh I found another bug with JSON, though unrelated to this: #14089

@mattlord
Copy link
Contributor

Is this still an issue on main? I thought that we had fixed this, but perhaps not.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Component: Online DDL Online DDL (vitess/native/gh-ost/pt-osc) Type: Bug
Projects
Status: Backlog
Development

No branches or pull requests

6 participants