don't commit GTID to GtidSet until we hit COMMIT #250

osheroff · 2018-12-06T00:45:59Z

Been working on zendesk/maxwell#1129, the tl;dr of which is that a maxwell user has a GTID-enabled connection to the mysql master that seems to drop fairly frequently. If this connection drops in the middle of processing a transaction, maxwell will attempt to reconnect the binlog-connector, but the binlog-connector's GTID set already includes that transaction's GTID, so the server skips the rest of the transaction and those events are lost.

I tried to do this all maxwell-side, but it got a little ugly; maxwell had to maintain the current GTID string and then splice it into the GTID string if we crash in the middle of processing a transaction. It's possible, but then I had a think and figured that this approach probably makes more sense for 99% of binlog-connector use cases. I think. LMK what you think.

cheers,
osheroff

otherwise, clients that disconnect in the middle of a GTID transaction will be pointing at an incorrect binlog position.
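
To make the idea concrete, here is a minimal sketch of deferring the GTID commit until the transaction ends. It is a self-contained toy, not the connector's code: `DeferredGtidTracker`, its method names, and the shortened `uuid:N` GTID strings are all hypothetical, and a plain `Set<String>` stands in for the library's `GtidSet`.

```java
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical stand-in for the connector's GTID bookkeeping, not its real classes. */
final class DeferredGtidTracker {
    private final Set<String> committed = new LinkedHashSet<>();
    private String currentGtid; // GTID of the transaction in flight, or null

    /** A GTID event opens a transaction: remember its GTID but do not commit it yet. */
    void onGtidEvent(String gtid) {
        currentGtid = gtid;
    }

    /** Call when the transaction ends (an XID event, or a QUERY event carrying COMMIT). */
    void onCommit() {
        if (currentGtid != null) {
            committed.add(currentGtid);
            currentGtid = null;
        }
    }

    /** Copy of the GTIDs a reconnect would report as already processed. */
    Set<String> committedGtids() {
        return new LinkedHashSet<>(committed);
    }
}

final class DeferredGtidDemo {
    public static void main(String[] args) {
        DeferredGtidTracker tracker = new DeferredGtidTracker();
        tracker.onGtidEvent("uuid:1");
        tracker.onCommit();
        tracker.onGtidEvent("uuid:2"); // connection drops here, before COMMIT
        System.out.println(tracker.committedGtids()); // prints [uuid:1]
    }
}
```

Because `uuid:2` never reached its commit, a reconnect that reports only `uuid:1` asks the server to send the interrupted transaction again from its start.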

shyiko

we'll need to ensureEventDataDeserializer(EventType.QUERY, ...) @ https://github.com/shyiko/mysql-binlog-connector-java/blob/master/src/main/java/com/github/shyiko/mysql/binlog/BinaryLogClient.java#L554
CI build failed
an integration test would be nice

shyiko · 2018-12-06T03:33:56Z

src/main/java/com/github/shyiko/mysql/binlog/BinaryLogClient.java

-                    gtidSet.add(gtidEventData.getGtid());
+            GtidEventData gtidEventData = (GtidEventData) unwrapEventData(event.getData());
+            currentGtid = gtidEventData.getGtid();
+        } else if ( gtidSet != null ) {


gtidSet is no longer protected by gtidSetAccessLock

osheroff · 2018-12-06

it is, in addGtidToSet
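
For readers without the diff at hand, the pattern under discussion can be sketched roughly as follows. Only the names `gtidSetAccessLock` and `addGtidToSet` come from the thread; the class `LockedGtidSet`, the `snapshot` method, and the use of a plain `Set<String>` in place of the library's `GtidSet` are assumptions made to keep the example self-contained.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical illustration of moving the locked write into one helper. */
final class LockedGtidSet {
    private final Object gtidSetAccessLock = new Object();
    private final Set<String> gtidSet = new HashSet<>(); // stand-in for GtidSet

    /** Every write goes through here, so the lock is taken in exactly one place. */
    void addGtidToSet(String gtid) {
        if (gtid == null) {
            return; // nothing in flight, nothing to commit
        }
        synchronized (gtidSetAccessLock) {
            gtidSet.add(gtid);
        }
    }

    /** Readers take the same lock so they never observe a half-applied update. */
    String snapshot() {
        synchronized (gtidSetAccessLock) {
            return String.join(",", new TreeSet<>(gtidSet));
        }
    }
}
```

The call site in the event handler no longer shows a `synchronized` block, which is why the lock appears to have gone missing when reading the diff hunk alone.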

shyiko · 2018-12-06T03:56:27Z

Hey Ben.
Just to make sure I understand this correctly.
If I try reconnecting in the middle of a transaction when GTID is on, I'm not going to get the remaining events in that transaction, only the next one? This sounds weird (inconsistent with the behavior when GTID is turned off) but believable (MariaDB case - #53).

osheroff · 2018-12-06T19:51:36Z

Stanley,
yeah, you've got it - it's more or less the same as the mariadb issue in that w/ GTID you're not allowed to index straight into the middle of a transaction, but instead are constrained to the boundaries.
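
The boundary constraint can be illustrated with a toy model of the server side. This is a sketch under stated assumptions, not MySQL's actual protocol: `GtidResumeModel`, the `uuid:N` GTIDs, and the single-letter events are hypothetical, and the model assumes the server sends every whole transaction whose GTID the client has not reported.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy model of a GTID-positioned server; all names here are hypothetical. */
final class GtidResumeModel {
    // GTID -> events of that transaction, kept in binlog order
    private final Map<String, List<String>> binlog = new LinkedHashMap<>();

    void addTransaction(String gtid, String... events) {
        binlog.put(gtid, Arrays.asList(events));
    }

    /** Sends whole transactions only: a reported GTID is skipped entirely. */
    List<String> resume(Set<String> executedGtids) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<String>> tx : binlog.entrySet()) {
            if (!executedGtids.contains(tx.getKey())) {
                out.addAll(tx.getValue());
            }
        }
        return out;
    }
}

final class GtidResumeDemo {
    public static void main(String[] args) {
        GtidResumeModel server = new GtidResumeModel();
        server.addTransaction("uuid:1", "a", "b");
        server.addTransaction("uuid:2", "c", "d");

        // The client saw a, b, c and then lost its connection before d.
        Set<String> eager = new HashSet<>(Arrays.asList("uuid:1", "uuid:2"));
        Set<String> deferred = Collections.singleton("uuid:1");

        System.out.println(server.resume(eager));    // prints []
        System.out.println(server.resume(deferred)); // prints [c, d]
    }
}
```

With the eager set, event `d` is never delivered. With the deferred set, `c` arrives a second time, so a client built this way has to tolerate replaying the start of an interrupted transaction.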

I'll make the fixes you ask for.

osheroff · 2018-12-08T01:14:53Z

bah. @shyiko I'm stuck.

have managed to get the vagrant VM into GTID mode, and even writing these tests has forced me to re-work the code, but now I'm stuck with the tests.

In the tests I'm calling

                eventListener.waitFor(XidEventData.class, 1, TimeUnit.SECONDS.toMillis(4));

but for some damn reason this always seems to return before the actual event gets logged, and before the action that I'm waiting for. My expectation is that this test code should return after updateGtidSet is called, but that's not what happens.

Am I mis-understanding how to use the test suite? I think I must be.

osheroff · 2018-12-08T11:23:47Z

never mind, I'm an idiot and didn't see there were two client instances. rubber duck debugging wins again.

osheroff · 2018-12-08T11:39:53Z

btw for my integration tests I use https://github.com/osheroff/onetimeserver, which can quickly bring up a mysql server in whatever configuration you want, and at the end of the test we throw the whole server away. It's been generally less headache for me than maintaining vagrant and trying to reset the server back into a good state before the next test. any interest in trying that?

was confused about client vs clientKeepAlive.

shyiko · 2018-12-10T05:09:41Z

Vagrant is being phased out by docker/docker-compose (specifically by #238). https://github.com/osheroff/onetimeserver does look interesting but I feel like switching to docker should be enough here.

Anyway, thank you for yet another PR, Ben!
Merging in!

osheroff · 2018-12-13T22:31:18Z

thanks for the merge! any chance of a release here?

shyiko · 2018-12-17T18:45:57Z

Absolutely. I'll try to publish a new release soon (over the next couple of days).

osheroff · 2019-01-02T05:31:04Z

hey @shyiko imma bugging you for a release in the new year

shyiko · 2019-01-07T09:13:11Z

0.17.0 is finally here 🐌

osheroff · 2019-01-11T04:09:58Z

that's a lovely snail you've got there, stanley. I've got my own snail-like PR going.

BTW, I ran into a weird gotcha while doing reconnect logic; check out
https://github.com/zendesk/maxwell/pull/1186/files#diff-6c0364725d8f72f2d70f68c6c6115747R343...

Basically in order to reconnect a GTID-enabled binlog connector using GTID positioning instead of file/offset positioning, I have to clear out the filename and position that's stored. I'm not utterly sure what the right thing to do when reconnecting a GTID client is... using file/offset could be more accurate, but if you're connecting to the master through a VIP and the reason your connection got dropped is that the master changed, using GTID positioning would be better.

In my case, a Maxwell user (going to the master via kubernetes networking) would get disconnects that happened right in the middle of a GTID transaction. When that happened and we tried to reconnect with file/offset positioning all hell would break loose.
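
A rough sketch of the reconnect rule described above, kept independent of the real client: `ReconnectPosition` and `forReconnect` are hypothetical names, and representing a "cleared" position as a null filename with offset 4 (the first offset after a binlog file's 4-byte magic header) is an assumption about what clearing should look like, not the connector's documented behavior.

```java
/** Hypothetical value object describing where a client would resume from. */
final class ReconnectPosition {
    final String binlogFilename; // null means no file/offset hint
    final long binlogPosition;
    final String gtidSet;        // null when GTID positioning is not in use

    ReconnectPosition(String binlogFilename, long binlogPosition, String gtidSet) {
        this.binlogFilename = binlogFilename;
        this.binlogPosition = binlogPosition;
        this.gtidSet = gtidSet;
    }

    /** With a GTID set, drop the stored file/offset so only the GTID set positions the stream. */
    ReconnectPosition forReconnect() {
        if (gtidSet == null) {
            return this; // file/offset positioning: keep what we have
        }
        return new ReconnectPosition(null, 4L, gtidSet);
    }
}
```

For example, `new ReconnectPosition("mysql-bin.000003", 1542L, "uuid:1-7").forReconnect()` keeps the GTID set and discards the file and offset, which matters when the reconnect may land on a different server whose binlog files do not line up with the old one's.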

Anyway, not sure if you want to do something about it but I thought I'd let ya know.

Happy new year!
-ben

shyiko · 2019-01-11T04:30:13Z

Heh :) Happy New Year to you too, Ben 🎄

Thanks for pointing that out. It kinda makes sense (if you have multiple MySQL pods accessible through the Service, the client might end up connecting to a different pod on "reconnect"). This needs to be fixed on the mysql-binlog-connector-java side (issue - #254).

don't commit GTID to GtidSet until we hit COMMIT

638b586

otherwise, clients that disconnect in the middle of a GTID transaction will be pointing at an incorrect binlog position.

shyiko requested changes Dec 6, 2018

osheroff added 3 commits December 6, 2018 11:54

ensure query event deserializer

0e3e607

add missing imports

4a87dc7

satisfy checkStyle

4bf7215

osheroff mentioned this pull request Dec 7, 2018

Unhandled QueryEvent inside transaction zendesk/maxwell#1129

Closed

checkpoint work around writing integration tests

7a645a0

osheroff added 2 commits December 8, 2018 03:47

fix integration tests

f6f608e

was confused about client vs clientKeepAlive.

satisfy checkstyle

7e42011

shyiko approved these changes Dec 10, 2018

shyiko merged commit ddaabfd into shyiko:master Dec 10, 2018

shyiko pushed a commit that referenced this pull request Jan 7, 2019

#250 follow up

cdc4a1f

osheroff mentioned this pull request Jan 10, 2019

gtid-reconnects zendesk/maxwell#1186

Merged

shyiko mentioned this pull request Jan 11, 2019

Do not track binlogFilename/binlogPosition when GtidSet is set #254

Closed
