Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fractional seconds are lost #103

Closed
rhauch opened this issue Jul 19, 2016 · 6 comments
Closed

Fractional seconds are lost #103

rhauch opened this issue Jul 19, 2016 · 6 comments

Comments

@rhauch
Copy link
Contributor

rhauch commented Jul 19, 2016

I'm using 0.3.1, and think there is an issue with how the library is deserializing fractional seconds. In particular, the AbstractRowsEventDataDeserializer.getFractionalSeconds(...) method is defined as follows:

   private int getFractionalSeconds(int meta, ByteArrayInputStream inputStream) throws IOException {
        int fractionalSecondsStorageSize = getFractionalSecondsStorageSize(meta);
        if (fractionalSecondsStorageSize > 0) {
            long fractionalSeconds = bigEndianLong(inputStream.read(fractionalSecondsStorageSize), 0,
                    fractionalSecondsStorageSize);
            if (meta % 2 == 1) {
                fractionalSeconds /= 10;
            }
            return (int) (fractionalSeconds / 1000);
        }
        return 0;
    }

Here, meta is the metadata for the column, which for temporal types is the fractional seconds part, or fsp, as described in the MySQL documentation.

Consider an example table with this definition (taken from the MySQL documentation):

CREATE TABLE fractest (
  c1 TIME(2),
  c2 DATETIME(2),
  c3 TIMESTAMP(2)
);

and the following insert statement:

INSERT INTO fractest VALUES ('17:51:04.777', '2014-09-08 17:51:04.777', '2014-09-08 17:51:04.777');

Here, the fsp -- and thus the value of meta -- for all three columns is 2. The value of the fractionalSeconds variable read from the stream is then 78 (since rounding occurs within the database). However, the following lines in the method:

            if (meta % 2 == 1) {
                fractionalSeconds /= 10;
            }
            return (int) (fractionalSeconds / 1000);

seem to not make any sense, because the result will always be 0 unless fsp is 4, 5 or 6, and the result will always be incorrect.

If the function is to return the number of milliseconds, then it seems like the code should simply multiply by the correct factor of 10 (based on the number of digits in the precision, or meta):

            int factor = (3 - meta) * 10;
            return fractionalSeconds * factor;

So for fractionalSeconds=78 and meta=2, then factor=10 and the method return 780, which is the correct number of milliseconds.

For an fsp of 1, meta=1' andfractionalSeconds=8` (assuming MySQL rounded), then the method would return 800, which is the correct number of milliseconds.

Now, MySQL fsp values can be integers between 1 and 6, so the above code needs to amended to handle meta values that are greater than 3. A simple way to do that would be to change the factor to use exponentials, but a more efficient way is to do the following:

            if ( meta < 3 ) {
                int factor = (3 - meta) * 10;
                return fractionalSeconds * factor;
            }
            if ( meta > 3 ) {
                int divisor = (meta - 3) * 10;
                return (int) (fractionalSeconds / divisor);
            }
            return fractionalSeconds;

I'm sure there are other ways this could be written, but the above code uses simple integer math and is at least correct.

Note that this same problem appears in the master branch, too.

@rhauch
Copy link
Contributor Author

rhauch commented Jul 19, 2016

Would you be willing to release a 0.3.2 with this fix? If not, when do you think you might release the library with this fix?

@rhauch
Copy link
Contributor Author

rhauch commented Jul 19, 2016

Also, if these methods were not private I would be able to fix this (at least temporarily) in Debezium by subclassing and overriding the method. Any chance of converting methods like these to protected?

@shyiko
Copy link
Owner

shyiko commented Jul 19, 2016

Hi @rhauch

Would you be willing to release a 0.3.2 with this fix?

Of course, I'll do it right after work.

Also, if these methods were not private I would be able to fix this (at least temporarily) in Debezium by subclassing and overriding the method. Any chance of converting methods like these to protected?

All deserialize* are protected in master (though getFractionalSeconds is not). You can always register your own *RowsEventDataDeserializer if something is terribly wrong in the upstream but I usually try fixing critical stuff like this within 24 hours (provided it's clear where the problem is) so a ticket is all that is needed for a new release to happen.

Anyway, thank you for the bug report (and all the details)! I'll let you know when 0.3.2 is in Maven Central.

@rhauch
Copy link
Contributor Author

rhauch commented Jul 19, 2016

@shyiko, sounds great. Thank you very much!

All deserialize* are protected in master (though getFractionalSeconds is not). You can always register your own *RowsEventDataDeserializer if something is terribly wrong in the upstream but I usually try fixing critical stuff like this within 24 hours (provided it's clear where the problem is) so a ticket is all that is needed for a new release to happen.

I actually looked at overriding, but because of the private method it would have required copying a lot of code. Making all the methods protected would be great for situations exactly like this, and would reduce the need for a patch release.

BTW, please double-check my logic, too.

@shyiko
Copy link
Owner

shyiko commented Jul 19, 2016

Making all the methods protected would be great for situations exactly like this, and would reduce the need for a patch release.

I understand. The reason I'm reluctant to marking all of them protected is that it's basically the same as public as far as backward-compatibility is concerned. Signatures have to stay the same, otherwise users may have to do a little bit more than just a simple version bump to upgrade. Anyhow, I agree that it could potentially help in situations like this and so I'll try to address this as part of 1.0.0.

BTW, please double-check my logic, too.

You can count on it :)

@shyiko
Copy link
Owner

shyiko commented Jul 20, 2016

0.3.2 deployed to Maven Central. Cheers 🍻

@shyiko shyiko closed this as completed Jul 20, 2016
rhauch added a commit to rhauch/debezium that referenced this issue Jul 20, 2016
Added an integration test case to diagnose the loss of the fractional seconds from MySQL temporal values. The problem appears to be a bug in the MySQL Binary Log Connector library that we used, and this bug was reported as shyiko/mysql-binlog-connector-java#103. That was fixed in version 0.3.2 of the library, which Stanley was kind enough to release for us.

During testing, though, several issues were discovered in how temporal values are handled and converted from the MySQL events, through the MySQL Binary Log client library, and through the Debezium MySQL connector to conform with Kafka Connect's various temporal logical schema types. Most of the issues involved converting most of the temporal values from local time zone (which is how they are created by the MySQL Binary Log client) into UTC (which is how Kafka Connect expects them). Really, java.util.Date doesn't have time zone information and instead tracks the number of milliseconds past epoch, but the conversion of normal timestamp information to the milliseconds past epoch in UTC depends on the time zone in which that conversion happens.
rhauch added a commit to rhauch/debezium that referenced this issue Jul 25, 2016
Added an integration test case to diagnose the loss of the fractional seconds from MySQL temporal values. The problem appears to be a bug in the MySQL Binary Log Connector library that we used, and this bug was reported as shyiko/mysql-binlog-connector-java#103. That was fixed in version 0.3.2 of the library, which Stanley was kind enough to release for us.

During testing, though, several issues were discovered in how temporal values are handled and converted from the MySQL events, through the MySQL Binary Log client library, and through the Debezium MySQL connector to conform with Kafka Connect's various temporal logical schema types. Most of the issues involved converting most of the temporal values from local time zone (which is how they are created by the MySQL Binary Log client) into UTC (which is how Kafka Connect expects them). Really, java.util.Date doesn't have time zone information and instead tracks the number of milliseconds past epoch, but the conversion of normal timestamp information to the milliseconds past epoch in UTC depends on the time zone in which that conversion happens.
aprakash-work pushed a commit to fivetran/mysql-binlog-connector-java-osheroff that referenced this issue Jul 17, 2023
Fix circleci build, in the integration test suite multiple events could be in the binary log
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants