TIMESTAMP type cannot represent seconds representable in Spark #9904

Open
NEUpanning opened this issue May 23, 2024 · 4 comments
Labels: bug (Something isn't working), triage (Newly created issue that needs attention)

Comments

@NEUpanning
Contributor

Bug description

Presto's Timestamp is stored in a single 64-bit signed integer of milliseconds, so the TIMESTAMP type limits the range of seconds to [INT64_MIN/1000 - 1, INT64_MAX/1000]. Spark's Timestamp is stored in a 64-bit signed integer of seconds, so its range of seconds is [INT64_MIN, INT64_MAX]. Therefore, the TIMESTAMP type cannot represent all seconds representable in Spark.
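
For concreteness, here is a small sketch of the seconds range implied by a milliseconds-backed 64-bit timestamp, matching the range quoted above; the constant names are illustrative and not taken from the Velox source:

#include <cstdint>
#include <cstdio>
#include <limits>

int main() {
  // One int64_t of milliseconds bounds the representable seconds.
  // The extra -1 on the lower bound mirrors the range quoted above:
  // integer division truncates toward zero, and the sub-second part is
  // presumably a non-negative offset, so the seconds value can go one lower.
  constexpr int64_t kMaxSeconds = std::numeric_limits<int64_t>::max() / 1'000;
  constexpr int64_t kMinSeconds = std::numeric_limits<int64_t>::min() / 1'000 - 1;

  std::printf("seconds range: [%lld, %lld]\n",
              static_cast<long long>(kMinSeconds),
              static_cast<long long>(kMaxSeconds));
  return 0;
}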

System information

Relevant logs

No response

NEUpanning added the bug and triage labels May 23, 2024
@rui-mo
Collaborator

rui-mo commented May 23, 2024

@NEUpanning Thanks for opening an issue. I believe Spark's timestamp is in microsecond units, see https://github.com/apache/spark/blob/master/sql/api/src/main/scala/org/apache/spark/sql/types/TimestampType.scala#L23. The issue in the from_unixtime test is that it creates a timestamp with INT64_MAX seconds. As for Spark, the existing check should be enough, since Spark stores the int64_t value as microseconds.

@rui-mo
Collaborator

rui-mo commented May 23, 2024

The maximum seconds of a valid timestamp in Spark is INT64_MAX / 10^6, which is below the Velox limit for maximum seconds (INT64_MAX / 10^3).

fromUnixTime(std::numeric_limits<int64_t>::max(), "yyyy-MM-dd HH:mm:ss")

But in the function from_unixtime, a timestamp could be created with a larger number of seconds, which can cause a check failure in debug mode. I wonder if we need to fix that in the function from_unixtime by avoiding creating a timestamp if the input exceeds INT64_MAX / 10^6, instead of removing the check.
@mbasmanova What do you think? Thanks!
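
A minimal sketch of the guard suggested above, assuming the input arrives as whole seconds; checkedFromUnixSeconds and the constants are hypothetical names rather than the actual Velox API:

#include <cstdint>
#include <cstdio>
#include <limits>
#include <optional>

// Largest/smallest seconds a Spark timestamp (int64_t of microseconds) can hold.
constexpr int64_t kMaxSparkSeconds = std::numeric_limits<int64_t>::max() / 1'000'000;
constexpr int64_t kMinSparkSeconds = std::numeric_limits<int64_t>::min() / 1'000'000;

// Hypothetical helper: return nullopt (a NULL result) instead of constructing
// a timestamp whose seconds would trip the debug-mode range check.
std::optional<int64_t> checkedFromUnixSeconds(int64_t unixTime) {
  if (unixTime < kMinSparkSeconds || unixTime > kMaxSparkSeconds) {
    return std::nullopt;
  }
  return unixTime;  // the real function would convert this to a timestamp
}

int main() {
  std::printf("INT64_MAX accepted: %d\n",
              static_cast<int>(checkedFromUnixSeconds(
                  std::numeric_limits<int64_t>::max()).has_value()));
  std::printf("0 accepted: %d\n",
              static_cast<int>(checkedFromUnixSeconds(0).has_value()));
  return 0;
}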

@NEUpanning
Contributor Author

NEUpanning commented May 23, 2024

@rui-mo

@NEUpanning Thanks for opening an issue. I believe Spark's timestamp is in microsecond units, see https://github.com/apache/spark/blob/master/sql/api/src/main/scala/org/apache/spark/sql/types/TimestampType.scala#L23. The issue in the from_unixtime test is that it creates a timestamp with INT64_MAX seconds. As for Spark, the existing check should be enough, since Spark stores the int64_t value as microseconds.

You are right. Spark's timestamp doesn't exceed the seconds range [INT64_MIN/1000 - 1, INT64_MAX/1000]. But in the function from_unixtime, the argument unix_time can be represented as LongType, DecimalType, etc. in seconds, so its range exceeds the range limit of Velox's timestamp type.
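
A small illustration of that mismatch (constant names are illustrative): a Spark timestamp itself stays within the milliseconds-backed limit, but a raw LongType unix_time argument given in seconds may not:

#include <cstdint>
#include <cstdio>
#include <limits>

int main() {
  constexpr int64_t kMaxVeloxSeconds = std::numeric_limits<int64_t>::max() / 1'000;      // millis-backed
  constexpr int64_t kMaxSparkSeconds = std::numeric_limits<int64_t>::max() / 1'000'000;  // micros-backed

  // A valid Spark timestamp never exceeds the Velox seconds limit...
  std::printf("Spark max seconds <= Velox max seconds: %d\n",
              static_cast<int>(kMaxSparkSeconds <= kMaxVeloxSeconds));

  // ...but a LongType unix_time argument can hold any int64_t seconds value.
  const int64_t unixTime = std::numeric_limits<int64_t>::max();
  std::printf("unix_time <= Velox max seconds: %d\n",
              static_cast<int>(unixTime <= kMaxVeloxSeconds));
  return 0;
}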

@mbasmanova
Contributor

I wonder if we need to fix that in the function from_unixtime by avoiding creating a timestamp if the input exceeds INT64_MAX / 10^6, instead of removing the check.

@rui-mo I'm thinking the same.
