Overview of the Issue
We upgraded a large database to v16 recently. During the rollout, errors were served to the app for ~30 seconds.
The root cause seems to be that the upgrade of the _vt sidecar schema during PlannedReparentShard was blocked by semi-sync.
Reproduction Steps
Upgrade a 3+ tablet cluster with semi-sync enabled from v15 to v16.
Binary Version
16.0.0+
Operating System and Environment details
Any
Log Fragments
I0626 20:33:46.338969 1 replication.go:586] Setting semi-sync mode: primary=true, replica=true
I0626 20:33:46.339255 1 query.go:81] exec SET GLOBAL rpl_semi_sync_master_enabled = 1, GLOBAL rpl_semi_sync_slave_enabled = 1
I0626 20:33:46.339689 1 tm_state.go:186] Changing Tablet Type: PRIMARY for cell:"redacted" uid:redacted
I0626 20:33:46.357886 1 syslogger.go:129] <redacted> [tablet] updated
I0626 20:33:46.371122 1 sidecardb.go:408] Applying DDL for table views:
CREATE TABLE IF NOT EXISTS `_vt`.`views` (
    `TABLE_SCHEMA` varchar(64) NOT NULL,
    `TABLE_NAME` varchar(64) NOT NULL,
    `CREATE_STATEMENT` longtext NOT NULL,
    `UPDATED_AT` timestamp NOT NULL DEFAULT current_timestamp() ON UPDATE current_timestamp(),
    PRIMARY KEY (`TABLE_SCHEMA`, `TABLE_NAME`)
) ENGINE InnoDB
I0626 20:33:48.215033 1 state_manager.go:682] Going unhealthy due to replication error: no replication status (errno 100) (sqlstate HY000)
I0626 20:34:16.356183 1 sidecardb.go:357] createSidecarDB: _vt
There is a 30-second gap here. What seems to have happened: because we enable semi-sync before transitioning the tablet to primary, the creation of the _vt schema gets blocked waiting for a semi-sync ACK. Replicas are pointed at the new primary only after the transition to primary, so no tablet is available to ACK the write. In the meantime, vtorc detects that the replicas are pointing at the wrong (old) primary, but can't do anything because PRS holds the shard lock. After 30 seconds the lock times out, vtorc fixes replication, and the DDL can proceed.
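The blocking behavior described above can be reproduced on a bare MySQL instance, independent of Vitess: with the semi-sync source plugin enabled and no semi-sync replica connected, a committed write stalls until the semi-sync timeout expires. This is a hedged sketch for illustration only; the exact timeout and plugin setup Vitess configures may differ.

```sql
-- Illustration: semi-sync with no replica available to ACK.
-- rpl_semi_sync_master_wait_no_slave defaults to ON, so the source
-- waits for an ACK even when no semi-sync replica is connected.
INSTALL PLUGIN rpl_semi_sync_master SONAME 'semisync_master.so';
SET GLOBAL rpl_semi_sync_master_enabled = 1;
SET GLOBAL rpl_semi_sync_master_timeout = 30000;  -- 30 s, for illustration

-- This DDL now blocks at commit, waiting for a semi-sync ACK that
-- cannot arrive, until the timeout fires:
CREATE DATABASE IF NOT EXISTS `_vt`;
CREATE TABLE `_vt`.`example` (`id` int PRIMARY KEY);
```

In the incident, the role of the 30-second timeout was played by the shard lock timeout rather than rpl_semi_sync_master_timeout, since Vitess holds the write open until a replica is repointed and can ACK.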