-
Notifications
You must be signed in to change notification settings - Fork 3.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Map MySQL TIMESTAMP types to TIMESTAMP WITH TIME ZONE #18470
Map MySQL TIMESTAMP types to TIMESTAMP WITH TIME ZONE #18470
Conversation
411146f
to
062a6ed
Compare
return new LongWriteFunction() | ||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unrelated?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, formatter did its thing and I missed it on a rebase. I'll revert.
plugin/trino-mysql/src/test/java/io/trino/plugin/mysql/BaseMySqlConnectorTest.java
Show resolved
Hide resolved
/** | ||
* Default MySQL time zone is set to UTC. This is to test the date and time type mappings when the server has a different time zone. | ||
*/ | ||
public class TestMySqlTimeMappingsWithServerTimeZone |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see a way to combine this with existing test class anyway since the server zone needs to be changed and if done on the fly it might lead to test isolation issues.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@wendigo @dominikzalewski look at this specifically - I couldn't think of solutions but maybe there are.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@hashhar, is this a problem? Usually in other projects I was participating in, we tended to keep line numbers on 3 digits. It wasn't an error to have more than one test class for a feature.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The concern is that if you add additional cases to the other class how do you ensure it gets tested in this class as well?
A simple solution is adding a comment on the original class.
I'm just wondering if someone has other ideas. If no ideas then what we have right now is fine.
// after epoch (MySQL's timestamp type doesn't support values <= epoch) | ||
.addRoundTrip("timestamp(3) WITH TIME ZONE", "TIMESTAMP '2019-03-18 10:01:17.987 %s'".formatted(sessionZone), createTimestampWithTimeZoneType(3), "TIMESTAMP '2019-03-18 10:01:17.987 UTC'") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add a separate test method verifying that the failure happens as expected (e..g it's a good user facing error).
.addRoundTrip("timestamp(1) WITH TIME ZONE", "TIMESTAMP '2020-09-27 12:34:56.9 %s'".formatted(sessionZone), createTimestampWithTimeZoneType(1), "TIMESTAMP '2020-09-27 12:34:56.9 UTC'") | ||
.addRoundTrip("timestamp(3) WITH TIME ZONE", "TIMESTAMP '2020-09-27 12:34:56.123 %s'".formatted(sessionZone), createTimestampWithTimeZoneType(3), "TIMESTAMP '2020-09-27 12:34:56.123 UTC'") | ||
.addRoundTrip("timestamp(3) WITH TIME ZONE", "TIMESTAMP '2020-09-27 12:34:56.999 %s'".formatted(sessionZone), createTimestampWithTimeZoneType(3), "TIMESTAMP '2020-09-27 12:34:56.999 UTC'") | ||
.addRoundTrip("timestamp(6) WITH TIME ZONE", "TIMESTAMP '2020-09-27 12:34:56.123456 %s'".formatted(sessionZone), createTimestampWithTimeZoneType(6), "TIMESTAMP '2020-09-27 12:34:56.123456 UTC'") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also test values where rounding needs to be performed - important to test since write path is different for CTAS and INSERT.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added testTimestampWithTimeZoneCoercion
, a copy of testTimestampCoercion
but slightly modified to to use values that are within range of MySQL TIMESTAMP
062a6ed
to
90fe963
Compare
I added a test case to check the unsupported timestamptz values. It passes just fine when the server time zone is set to |
Looking into the failing test case more. I ran the test case with the MySQL TZ set to Trino executes:
When sent to the JDBC client, the On the server side, running the mysql CLI inside the Docker container:
MySQL seems to be acting appropriately and rejecting the values after converting the given values from the session time zone to UTC. This seems like possibly a JDBC driver issue (or how we are using the JDBC driver). |
well, let's add validation like the ClickHouse connector does within the connector code itself. We can follow-up to see if we're using the driver wrong or it's just a driver bug. |
90fe963
to
1996d18
Compare
FYI @findepi @anusudarsan since you were interested in this. |
public class TrinoToMySqlWriteChecker<T> | ||
{ | ||
// The range for TIMESTAMP values is '1970-01-01 00:00:01.000000' to '2038-01-19 03:14:07.499999' | ||
public static final TrinoToMySqlWriteChecker<Instant> TIMESTAMP = new TrinoToMySqlWriteChecker<>( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the name 'TIMESTAMP' something that has been overlooked, or a thought through variable name?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We also seem to be using guice modules. Wouldn't it be cleaner if we configured a binding 'toInstance' in guice and injected it, rather than rely on statically initialized fields?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The name is thought through - see the point of usage - TIMESTAMP.validate(value)
.
Note that this class is "derived" from https://github.com/trinodb/trino/blob/5d8824edf9101eab451128bbad53d8c6ce930ad4/plugin/trino-clickhouse/src/main/java/io/trino/plugin/clickhouse/TrinoToClickHouseWriteChecker.java.
And this class should go away if/when the MySQL JDBC driver bug gets fixed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@adamjshook Does mysql.md (the docs) need to be updated as well to mention the new mapping?
{ | ||
// The range for TIMESTAMP values is '1970-01-01 00:00:01.000000' to '2038-01-19 03:14:07.499999' | ||
public static final TrinoToMySqlWriteChecker<Instant> TIMESTAMP = new TrinoToMySqlWriteChecker<>( | ||
new TimestampWriteValueChecker(new Range<>(Instant.parse("1970-01-01T00:00:01.000000Z"), Instant.parse("2038-01-19T03:14:07.499999Z")))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not 1970-01-01T00:00:00.000000Z?
https://en.wikipedia.org/wiki/Unix_time
Unix time is currently defined as the number of seconds which have passed since 00:00:00 UTC on Thursday, 1 January 1970
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good catch, this should proably be 1970-01-01T00:00:00.000001Z
- i.e. epoch + min delta
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From the MySQL documentation:
The TIMESTAMP data type is used for values that contain both date and time parts. TIMESTAMP has a range of '1970-01-01 00:00:01' UTC to '2038-01-19 03:14:07' UTC.
mysql> insert into test_unsupported_timestamp values ('1970-01-01 00:00:00');
ERROR 1292 (22007): Incorrect datetime value: '1970-01-01 00:00:00' for column 'data' at row 1
mysql> insert into test_unsupported_timestamp values ('1970-01-01 00:00:01');
Query OK, 1 row affected (0.01 sec)
mysql> select * from test_unsupported_timestamp;
+---------------------+
| data |
+---------------------+
| 1970-01-01 00:00:01 |
+---------------------+
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure whether this is a problem or not, but I'll mention it 'just in case'.
Back a few years ago I was working with a JSON interface of an HTTP endpoint. I was surprised to see that I could convert a negative long passed in JSON into Javascript date to represent dates before epoch. We used that for birth dates just like described here: https://www.epochconverter.com/programming/mysql (paragraph 'Negative epochs').
Now, as the article reads, mysql has a problem representing timestamps before epoch. However, we have a 'converter' at hand that is translating trino datatype to particular database datatype. And that trino datatype just might support negative values. Currently the test throws an exception if the timestamp is out of range, for instance if it's before epoch. I'm just wondering whether that's the behaviour we want to have. Maybe we somehow want to emulate negatives (ie. before epoch) timestamps.
Treat it as a general comment that can be ignored.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No. That makes such "emulated" values only readable by Trino. It's a form of vendor lock in.
All connectors have tests (Test*TypeMapping) which ensure that values written by trino round trip when read directly from the db and the other way around as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@adamjshook please add a code comment about the min value to make it more obvious
@Override | ||
public void validate(Instant value) | ||
{ | ||
if (value.isBefore(range.getMin()) || value.isAfter(range.getMax())) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In other places in the project (like io.trino.plugin.jdbc.JdbcMetadataConfig), I see we seem to be using a validation framework, for instance jakarta.validation.constraints.Min. Is there a reason (performance?) we cannot use it here as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
performance, jakarta validation does too much work. It's more useful for user-facing validation than runtime checks.
} | ||
} | ||
|
||
private static class Range<T> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a range class in apache commons:
https://commons.apache.org/proper/commons-lang/apidocs/org/apache/commons/lang3/Range.html
That already has most of the logic introduced here. I see that we have commons in the master pom.xml:
<dependency>
<groupId>org.apache.commons</groupId>
<artifactId>commons-lang3</artifactId>
<version>3.13.0</version>
</dependency>
Is there a reason why we cannot use it in plugins?
import static java.lang.String.format; | ||
import static java.util.Objects.requireNonNull; | ||
|
||
public class TrinoToMySqlWriteChecker<T> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think I understand the complexity introduced with these abstractions. The only two production usages of the validate() method I can see is that we pass an Instance and the code checks under the covers if it's within epoch's min and max values. Do we really need to introduce private interface, private class, another private class for Range in order to do that?
It seems that the epoch min and epoch max are hardcoded anyway. Isn't it the case that the only thing we need is a
private static boolean validate(Instant i) {
return i >= Instant.parse("1970-01-01T00:00:00.000000Z") && Instant.parse("2038-01-19T03:14:07.499999Z")
}
(or something like above that compiles or throws an exception instead of returning a boolean) in MySqlClient class?
I would understand wrapping into classes, if the code was unit-tested. But it does not look like there's a unit test for this class?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good point, we only have the TIMESTAMP
validator here. ClickHouse on the other hand (where this class originates from) has 6 validators, each of them having different "rules" depending on the ClickHouse version we are connecting to.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed this could be simplified into a static validateMySqlTimestampValue
function in MySqlClient
.
throw new TrinoException(NOT_SUPPORTED, "This connector does not support creating tables with column comment"); | ||
} | ||
|
||
return "%s %s %s" .formatted( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the space after the string here is intentional?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, for some reason my IntelliJ formatter put that there. I'll fix it.
@@ -469,6 +501,13 @@ public Optional<ColumnMapping> toColumnMapping(ConnectorSession session, Connect | |||
return Optional.of(jsonColumnMapping()); | |||
case "enum": | |||
return Optional.of(defaultVarcharColumnMapping(typeHandle.getRequiredColumnSize(), false)); | |||
case "datetime": | |||
TimestampType timestampType = createTimestampType(getTimestampPrecision(typeHandle.getRequiredColumnSize())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
closing this logic into a separate private method that would be named like 'mysqlDateTimeToTrinoTimestamp' would make it more explicit for the reader that this is what's going on here.
timestampType, | ||
mySqlTimestampReadFunction(timestampType), | ||
timestampWriteFunction(timestampType))); | ||
TimestampWithTimeZoneType trinoType = createTimestampWithTimeZoneType(getTimestampPrecision(typeHandle.getRequiredColumnSize())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
similarly here 'mysqlTimestampToTrinoTimestampWithTZ'
OffsetDateTime offsetDateTime = resultSet.getObject(columnIndex, OffsetDateTime.class); | ||
return LongTimestampWithTimeZone.fromEpochSecondsAndFraction( | ||
offsetDateTime.toEpochSecond(), | ||
(long) offsetDateTime.getNano() * PICOSECONDS_PER_NANOSECOND, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for super-easy reading maybe wrap this computation into a private method called 'nanos2pico'?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -592,6 +636,50 @@ private static ColumnMapping mySqlCharColumnMapping(CharType charType, Optional< | |||
return ColumnMapping.sliceMapping(charType, charReadFunction(charType), charWriteFunction(), pushdownController); | |||
} | |||
|
|||
private static LongReadFunction shortTimestampWithTimeZoneReadFunction() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure whether the 'short' and 'long' prefix in the following 4 methods is not misleading. The reason being that the returned type has sometimes 'Long' in the name, so I had to read it 4 times and browse deep into the code to understand why is that. Maybe there is a better wording like for instance 'simpleTimestamp'/'complexTimestamp' or 'regularTimestamp'/'extendedTimestamp'?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See #18470 (comment) -- similar answer for why they are named the way they are.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason is even more than that. The trino classes which implement the type are called ShortTimestampType and LongTimestampType and that's where the name comes from.
return ObjectWriteFunction.of( | ||
LongTimestampWithTimeZone.class, | ||
(statement, index, value) -> { | ||
long epochSeconds = floorDiv(value.getEpochMillis(), MILLISECONDS_PER_SECOND); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would wrap it into a private method called 'epochMillis2Seconds'
and the below into 'timestamp2nanosDelta'.
Then I'd inline it:
Instant instantValue = Instant.ofEpochSecond(epochMillis2Seconds(value), timestamp2nanosDelta(value))
Yes! I'll add that as a separate commit when making other updates to this PR and we'll need a note in the release notes as well regarding the change. |
@Test(dataProvider = "sessionZonesDataProvider") | ||
public void testDate(ZoneId sessionZone) | ||
{ | ||
Session session = Session.builder(getSession()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this snippet is repeated in pratically every test. Do you think there's value in promoting it to an instance variable and initializing in a setup() method?
In the PR description it reads:
Due to the size of the change (on test side), it's hard for me to verify whether this condition is met, or whether we need the backward compatibility test for that, etc. Having written that I just want to make sure that this has been thought through, even if my ask is 'doubled effort' since this has already been addressed. |
I'm trying to dig in to why we need this checker in the first place. I find it a bit unsettling that you insert a
|
|
||
SqlDataTypeTest.create() | ||
.addRoundTrip("date", "DATE '0001-01-01'", DATE, "DATE '0001-01-01'") | ||
.addRoundTrip("date", "DATE '1582-10-04'", DATE, "DATE '1582-10-04'") // before julian->gregorian switch |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe it would be beneficial to have methods named 'beforeJulianGregorianSwitch' in SqlDataTypeTest class. This was the code would document itself, would be more readable and maintainable (as comment would be unnecessary)
} | ||
|
||
@Test(dataProvider = "sessionZonesDataProvider") | ||
public void testTimeFromMySql(ZoneId sessionZone) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
feels like this test should be named 'testPrecision'?
|
||
SqlDataTypeTest.create() | ||
// default precision in MySQL is 0 | ||
.addRoundTrip("TIME", "TIME '00:00:00'", createTimeType(0), "TIME '00:00:00'") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These literals seem to be repeated throughout the test. Do you think there's value extracting them to a constant and reusing?
From here solution 2b seems to be the one that is working best. Setting the below properties results in the value stored in MySQL to accurately reflect the value inserted via Trino. I've removed code that validates the timestamptz values since MySQL is returning an expected error now.
One thing to note is the |
1996d18
to
8cd1e31
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ebyhr / @findepi / @anusudarsan can one of you also please take a look?
@@ -208,6 +225,10 @@ public class MySqlClient | |||
// MySQL driver returns width of time types instead of precision, same as the above timestamp type. | |||
private static final int ZERO_PRECISION_TIME_COLUMN_SIZE = 8; | |||
|
|||
// MySQL TIMESTAMP has a range of '1970-01-01 00:00:01' UTC to '2038-01-19 03:14:07' UTC | |||
private static final Instant MYSQL_TIMESTAMP_MIN_VALUE = Instant.parse("1970-01-01T00:00:01.000000Z"); | |||
private static final Instant MYSQL_TIMESTAMP_MAX_VALUE = Instant.parse("2038-01-19T03:14:07.499999Z"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that .49999
is confusing me, can you explain me why? It's > 03:14:07 UTC
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The following paragraph from here (emphasis mine) refers to the fractional seconds. Kind of misleading as the previous paragraph says the maximum is '2038-01-19 03:14:07' UTC
A DATETIME or TIMESTAMP value can include a trailing fractional seconds part in up to microseconds (6 digits) precision. In particular, any fractional part in a value inserted into a DATETIME or TIMESTAMP column is stored rather than discarded. With the fractional part included, the format for these values is 'YYYY-MM-DD hh:mm:ss[.fraction]', the range for DATETIME values is '1000-01-01 00:00:00.000000' to '9999-12-31 23:59:59.499999', and the range for TIMESTAMP values is '1970-01-01 00:00:01.000000' to '2038-01-19 03:14:07.499999'. The fractional part should always be separated from the rest of the time by a decimal point; no other fractional seconds delimiter is recognized.
// TODO (https://github.com/trinodb/trino/issues/15668) rethink how timestamps are mapped. Also, probably worth adding tests | ||
// with MySQL server with a non-UTC system zone. | ||
connectionProperties.setProperty("connectionTimeZone", "UTC"); | ||
// Force the connection time zone of that to the server to preserve time zones |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sounds opposite of what MySQL docs say:
according to docs: connectionTimeZone = LOCAL
means that JDBC driver uses the JVM zone as the session zone.
forceConnectionTimeZoneToSession = true
means that the server side connection zone is changed to match local JVM zone.
Now in addition to this I started thinking about whether forceConnectionTimeZoneToSession = true
can lead to failure if the JVM zone doesn't exist on the server. This probably explains the existing comment about without relying on server time zone (which may be configured to be totally unusable).
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see #15670 as an example of an issue I'm thinking about but in reverse direction.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, you're right. Comment is now backwards; I didn't update it from a previous attempt to get this to work. I'll update it to reflect what is in the documentation.
8cd1e31
to
41cef73
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks.
We can address any post-merge comments from @anusudarsan/@findepi if they take a look.
Description
This PR changes the MySQL type mappings for MySQL's
TIMESTAMP
type to Trino'sTIMESTAMP WITH TIME ZONE
type. Note that MySQLDATETIME
types are read asTIMESTAMP
types and TrinoTIMESTAMP
types are inserted into MySQL asDATETIME
types -- this behavior is unchanged.In other words, for reads and writes:
Additional context and related issues
Fixes #15668
TIMESTAMP
type to map to TrinoTIMESTAMP WITH TIME ZONE
getColumnDefinitionSql
to explicitly declare a column asNULL
to satisfy anInvalid default value
failure in the data mapping smoke tests when creating a table using a MySQLTIMESTAMP
typePacific/Apia
(and therefore the time zone of the MySQL server)Release notes
( ) This is not user-visible or docs only and no release notes are required.
( ) Release notes are required, please propose a release note for me.
(X) Release notes are required, with the following suggested text: