[SPARK-48776] Fix timestamp formatting for json, xml and csv #47177

milastdbx · 2024-07-02T12:11:57Z

What changes were proposed in this pull request?

This pull request changes the default ISO pattern used to format timestamps when writing to JSON, XML, or CSV, and when to_(xml|json|csv) is used.

Older timestamps sometimes have zone offsets that include a seconds component. The current default format omits the offset seconds and therefore produces wrong results.

e.g.

sql("SET spark.sql.session.timeZone=America/Los_Angeles")
sql("SELECT to_json(struct(CAST('1800-01-01T00:00:00+00:00' AS TIMESTAMP) AS ts))").show(false)
{"ts":"1799-12-31T16:07:02.000-07:52"}
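The truncation can be reproduced outside Spark with plain java.time. The sketch below is only an illustration: the class name, method name, and both pattern strings are hypothetical and not taken from the patch. It assumes the JDK's bundled time zone data, in which America/Los_Angeles uses a local mean time offset of -07:52:58 for dates in 1800. The offset letters `XXX` print hours and minutes only, while `XXXXX` also prints seconds when they are non-zero.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

public class OffsetSecondsDemo {
    // Illustrative patterns (assumptions, not copied from the patch).
    // XXX   -> offset as +HH:mm (seconds dropped)
    // XXXXX -> offset as +HH:mm[:ss] (seconds kept when non-zero)
    static final String MINUTE_OFFSET = "yyyy-MM-dd'T'HH:mm:ss.SSSXXX";
    static final String SECOND_OFFSET = "yyyy-MM-dd'T'HH:mm:ss.SSSXXXXX";

    // Formats 1800-01-01T00:00:00Z in the America/Los_Angeles zone
    // using the given pattern.
    static String format(String pattern) {
        ZonedDateTime ts = Instant.parse("1800-01-01T00:00:00Z")
            .atZone(ZoneId.of("America/Los_Angeles"));
        return DateTimeFormatter.ofPattern(pattern, Locale.ROOT).format(ts);
    }

    public static void main(String[] args) {
        // Offset seconds are dropped, so the string no longer
        // identifies the original instant.
        System.out.println(format(MINUTE_OFFSET)); // 1799-12-31T16:07:02.000-07:52
        // Offset seconds are kept, so the string round-trips.
        System.out.println(format(SECOND_OFFSET)); // 1799-12-31T16:07:02.000-07:52:58
    }
}
```

With the minute-only offset, a reader that parses the string back gets an instant 58 seconds away from the original, which is why this is a correctness problem and not a cosmetic one.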

Why are the changes needed?

This is a correctness issue.

Does this PR introduce any user-facing change?

Yes. Users will now see different (correct) results for older timestamps.

How was this patch tested?

Tests

Was this patch authored or co-authored using generative AI tooling?

No

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVOptions.scala

cloud-fan · 2024-07-08T13:26:15Z

thanks, merging to master!

Closes apache#47177 from milastdbx/dev/milast/fixJsonTimestampHandling.

Authored-by: milastdbx <milan.stefanovic@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>

fix json timestamp handling

4befa07

github-actions bot added the SQL label Jul 2, 2024

milastdbx changed the title ~~Fix timestamp formatting for json, xml and csv~~ [SPARK-48776] Fix timestamp formatting for json, xml and csv Jul 2, 2024

cloud-fan approved these changes Jul 2, 2024

allisonwang-db reviewed Jul 2, 2024

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVOptions.scala (outdated, resolved)

HyukjinKwon approved these changes Jul 3, 2024

milastdbx added 3 commits July 3, 2024 17:05

scalastyle fixes

f7eee07

nitfixies

a35d8cd

fix tests, add forgotten files

2033b62

cloud-fan closed this in c4085f1 Jul 8, 2024
