SELECT ... ORDER BY query fails on data with int64 timestamp and timezone field #959
Comments
This is due to a limitation in arrow-rs, which does not support timestamps with timezones: every timestamp that enters arrow-rs and is transformed is output as a timestamp without a timezone. Because A solution in |
Thanks for the prompt reply @jorgecarleitao. I followed your advice to re-cast, and this query works:

SELECT
  CAST(system_time as TIMESTAMP) as system_time,
  CAST(reported_date as TIMESTAMP) as reported_date,
  province,
  total_daily
FROM test
ORDER BY reported_date

Linking the corresponding arrow issue: apache/arrow-rs#393

I wonder if the impact of not having timezone support in Arrow can be minimized somehow, e.g. by treating When reading parquet files produced by Spark the timestamps are being encoded as plain |
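To make the mismatch and the CAST workaround concrete, here is a minimal sketch in plain Rust. It is a toy model only: the `DataType` and `TimeUnit` enums, `validate`, and `strip_timezone` are hypothetical stand-ins written for illustration, not the arrow-rs or DataFusion API, and the `Utf8`/`Int64` types for `province` and `total_daily` are assumptions. It assumes, as described above, that a transformation such as a sort drops the timezone from timestamp columns.

```rust
// Toy model of the schema check; these types are illustrative, not the arrow-rs API.
#[derive(Debug, Clone, PartialEq)]
enum TimeUnit {
    Millisecond,
}

#[derive(Debug, Clone, PartialEq)]
enum DataType {
    // Unit plus an optional timezone name such as "UTC".
    Timestamp(TimeUnit, Option<String>),
    Utf8,
    Int64,
}

/// Compare the declared schema types with the types of the columns actually produced.
fn validate(schema: &[DataType], columns: &[DataType]) -> Result<(), String> {
    for (i, (expected, found)) in schema.iter().zip(columns).enumerate() {
        if expected != found {
            return Err(format!(
                "column types must match schema types, expected {:?} but found {:?} at column index {}",
                expected, found, i
            ));
        }
    }
    Ok(())
}

/// Hypothetical stand-in for a transformation that loses the timezone.
fn strip_timezone(dt: &DataType) -> DataType {
    match dt {
        DataType::Timestamp(unit, _) => DataType::Timestamp(unit.clone(), None),
        other => other.clone(),
    }
}

fn main() {
    let utc = || DataType::Timestamp(TimeUnit::Millisecond, Some("UTC".to_string()));
    // Declared schema: system_time, reported_date, province, total_daily (last two assumed).
    let schema = vec![utc(), utc(), DataType::Utf8, DataType::Int64];

    // Columns after the transformation: the timezone is gone, so the check fails.
    let sorted: Vec<DataType> = schema.iter().map(strip_timezone).collect();
    println!("{:?}", validate(&schema, &sorted));

    // With the CAST workaround, the declared types carry no timezone either.
    let cast_schema: Vec<DataType> = schema.iter().map(strip_timezone).collect();
    println!("{:?}", validate(&cast_schema, &sorted));
}
```

In this model the first check fails at column index 0 and the second passes, which is consistent with why casting both timestamp columns to a timezone-less TIMESTAMP lets the ORDER BY query run.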
Here is a related discussion on the mailing list: https://lists.apache.org/thread.html/ra4c0842067342056a64a8a13e03755d4be58e8dddfbb064ca92ed5a3%40%3Cdev.arrow.apache.org%3E |
Update: Latest
|
Thanks @sergiimk -- I wonder if we need to update the arrow pretty printing code to handle that better 🤔 |
Seems to be working as expected now on latest main. Via datafusion-cli:
The timezone is now displayed properly, with a Z suffix. |
Thanks @Jefffrey. I also verified that the query runs correctly using an explicitly created external table:
❯ create external table test stored as parquet location 'flink.parquet';
0 rows in set. Query took 0.003 seconds.
❯ select * from test order by reported_date desc;
+--------------------------+----------------------+----------+-------------+
| system_time | reported_date | province | total_daily |
+--------------------------+----------------------+----------+-------------+
| 2021-08-30T20:38:07.488Z | 2021-08-25T00:00:00Z | ON | 807 |
| 2021-08-30T20:38:07.488Z | 2021-08-25T00:00:00Z | BC | 719 |
| 2021-08-30T20:38:07.488Z | 2021-08-24T00:00:00Z | BC | 708 |
| 2021-08-30T20:38:07.488Z | 2021-08-24T00:00:00Z | ON | 634 |
| 2021-08-30T20:38:07.488Z | 2021-08-23T00:00:00Z | ON | 509 |
| 2021-08-30T20:38:07.488Z | 2021-08-23T00:00:00Z | BC | 559 |
| 2021-08-30T20:38:07.488Z | 2021-08-22T00:00:00Z | ON | 489 |
| 2021-08-30T20:38:07.488Z | 2021-08-22T00:00:00Z | BC | 465 |
| 2021-08-30T20:38:07.488Z | 2021-08-21T00:00:00Z | ON | 681 |
| 2021-08-30T20:38:07.488Z | 2021-08-21T00:00:00Z | BC | 563 |
| 2021-08-30T20:38:07.488Z | 2021-08-20T00:00:00Z | BC | 696 |
| 2021-08-30T20:38:07.488Z | 2021-08-20T00:00:00Z | ON | 710 |
| 2021-08-30T20:38:07.488Z | 2021-08-19T00:00:00Z | ON | 706 |
| 2021-08-30T20:38:07.488Z | 2021-08-19T00:00:00Z | BC | 672 |
| 2021-08-30T20:38:07.488Z | 2021-08-18T00:00:00Z | ON | 651 |
| 2021-08-30T20:38:07.488Z | 2021-08-18T00:00:00Z | BC | 774 |
| 2021-08-30T20:38:07.488Z | 2021-08-17T00:00:00Z | BC | 622 |
| 2021-08-30T20:38:07.488Z | 2021-08-17T00:00:00Z | ON | 511 |
| 2021-08-30T20:38:07.488Z | 2021-08-16T00:00:00Z | ON | 434 |
| 2021-08-30T20:38:07.488Z | 2021-08-16T00:00:00Z | BC | 388 |
| 2021-08-30T20:38:07.488Z | 2021-08-15T00:00:00Z | ON | 450 |
| 2021-08-30T20:38:07.488Z | 2021-08-15T00:00:00Z | BC | 452 |
| 2021-08-30T20:38:07.488Z | 2021-08-14T00:00:00Z | BC | 427 |
| 2021-08-30T20:38:07.488Z | 2021-08-14T00:00:00Z | ON | 506 |
| 2021-08-30T20:38:07.488Z | 2021-08-13T00:00:00Z | ON | 544 |
| 2021-08-30T20:38:07.488Z | 2021-08-13T00:00:00Z | BC | 546 |
| 2021-08-30T20:38:07.488Z | 2021-08-12T00:00:00Z | BC | 704 |
| 2021-08-30T20:38:07.488Z | 2021-08-12T00:00:00Z | ON | 530 |
| 2021-08-30T20:38:07.488Z | 2021-08-11T00:00:00Z | BC | 539 |
| 2021-08-30T20:38:07.488Z | 2021-08-11T00:00:00Z | ON | 535 |
| 2021-08-30T20:38:07.488Z | 2021-08-10T00:00:00Z | ON | 386 |
| 2021-08-30T20:38:07.488Z | 2021-08-10T00:00:00Z | BC | 540 |
| 2021-08-30T20:38:07.488Z | 2021-08-09T00:00:00Z | BC | 360 |
| 2021-08-30T20:38:07.488Z | 2021-08-09T00:00:00Z | ON | 335 |
| 2021-08-30T20:38:07.488Z | 2021-08-08T00:00:00Z | BC | 298 |
| 2021-08-30T20:38:07.488Z | 2021-08-08T00:00:00Z | ON | 337 |
| 2021-08-30T20:38:07.488Z | 2021-08-07T00:00:00Z | BC | 390 |
| 2021-08-30T20:38:07.488Z | 2021-08-07T00:00:00Z | ON | 343 |
| 2021-08-30T20:38:07.488Z | 2021-08-06T00:00:00Z | BC | 428 |
| 2021-08-30T20:38:07.488Z | 2021-08-06T00:00:00Z | ON | 446 |
| . |
| . |
| . |
+--------------------------+----------------------+----------+-------------+
1095 rows in set (40 shown). Query took 0.004 seconds. |
Describe the bug
When trying to query a Parquet file produced by Apache Flink I get an error:
ArrowError(InvalidArgumentError("column types must match schema types, expected Timestamp(Millisecond, Some(\"UTC\")) but found Timestamp(Millisecond, None) at column index 0"))
Output of Java parquet-schema:
To Reproduce
Download and extract the sample data: data.tar.gz.
Run:
Note that a simple SELECT works fine, but ORDER BY fails.
Expected behavior
Query executes without errors.