feat(rust, python): support timezone in csv writer #6722

MarcoGorelli · 2023-02-08T09:34:07Z

MarcoGorelli · 2023-02-08T09:41:25Z

polars/polars-io/Cargo.toml

@@ -24,7 +24,7 @@ decompress = ["flate2/miniz_oxide"]
 decompress-fast = ["flate2/zlib-ng"]
 dtype-categorical = ["polars-core/dtype-categorical"]
 dtype-date = ["polars-core/dtype-date", "polars-time/dtype-date"]
-dtype-datetime = ["polars-core/dtype-datetime", "polars-core/temporal", "polars-time/dtype-datetime"]
+dtype-datetime = ["polars-core/dtype-datetime", "polars-core/temporal", "polars-time/dtype-datetime", "chrono-tz", "chrono"]


I tried to do this instead:

Suggested change

-dtype-datetime = ["polars-core/dtype-datetime", "polars-core/temporal", "polars-time/dtype-datetime", "chrono-tz", "chrono"]
+dtype-datetime = ["polars-core/dtype-datetime", "polars-core/temporal", "polars-time/dtype-datetime"]
+timezones = ["chrono-tz", "chrono"]

and then, in write_impl.rs, applying the following:

diff --git a/polars/polars-io/src/csv/write_impl.rs b/polars/polars-io/src/csv/write_impl.rs
index d2aaab6b5..f855dc97c 100644
--- a/polars/polars-io/src/csv/write_impl.rs
+++ b/polars/polars-io/src/csv/write_impl.rs
@@ -1,9 +1,9 @@
 use std::io::Write;

 use arrow::temporal_conversions;
-#[cfg(feature = "dtype-datetime")]
+#[cfg(feature = "timezones")]
 use chrono::TimeZone;
-#[cfg(feature = "dtype-datetime")]
+#[cfg(feature = "timezones")]
 use chrono_tz::Tz;
 use lexical_core::{FormattedSize, ToLexical};
 use memchr::{memchr, memchr2};
@@ -95,6 +95,7 @@ fn write_anyvalue(f: &mut Vec<u8>, value: AnyValue, options: &SerializeOptions)
                 TimeUnit::Milliseconds => temporal_conversions::timestamp_ms_to_datetime(v),
             };
             match tz {
+                #[cfg(feature = "timezones")]
                 Some(tz) => match tz.parse::<Tz>() {
                     Ok(parsed_tz) => match &options.datetime_format {
                         None => write!(f, "{}", PlTzAware::new(ndt, tz)),

However, I then get

thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: Error { kind: Uncategorized, message: "formatter error" }', /home/marcogorelli/polars-dev/polars/polars-io/src/csv/write_impl.rs:132:6
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Traceback (most recent call last):
  File "/home/marcogorelli/polars-dev/py-polars/t.py", line 19, in <module>
    print(df.write_csv())
  File "/home/marcogorelli/polars-dev/py-polars/polars/internals/dataframe/frame.py", line 2237, in write_csv
    self._df.write_csv(
pyo3_runtime.PanicException: called `Result::unwrap()` on an `Err` value: Error { kind: Uncategorized, message: "formatter error" }

Looking into it, but am a bit confused

How does the panic relate to the dependencies?

I don't know, that's what I'm confused about - there's no panic currently, but it appears if I move chrono-tz and chrono behind the timezones feature

I can take a look later before merging.

thanks! 🙌 I'll take another look too, perhaps I can figure it out

OK, I see what happens (kinda): if I put the Some arm under the timezones feature, then it'll be skipped, and the other arm will be hit

let formatted = match tz {
    #[cfg(feature = "timezones")]
    Some(tz) => match tz.parse::<Tz>() {
        Ok(parsed_tz) => parsed_tz.from_utc_datetime(&ndt).format(datetime_format),
        Err(_) => match temporal_conversions::parse_offset(tz) {
            Ok(parsed_tz) => parsed_tz.from_utc_datetime(&ndt).format(datetime_format),
            Err(_) => unreachable!(),
        },
    },
    _ => ndt.format(datetime_format),
};
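To illustrate the fall-through described above, here is a minimal std-only sketch. `describe` is a hypothetical stand-in for the writer's branch, not code from this PR. It assumes the file is compiled without a `timezones` feature, in which case the gated arm is removed before type checking and every `Some(..)` value reaches the wildcard arm.

```rust
/// Hypothetical stand-in for the writer's branch on the optional time zone.
fn describe(tz: Option<&str>) -> String {
    match tz {
        // This arm only exists when the crate is built with the `timezones` feature.
        #[cfg(feature = "timezones")]
        Some(tz) => format!("aware ({tz})"),
        // Without the feature, `Some(..)` values are handled here as well.
        _ => String::from("naive"),
    }
}
```

Built without the feature, `describe(Some("Europe/London"))` takes the same path as `describe(None)`, which mirrors how a tz-aware value ends up in the naive arm.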

And then, if datetime_format contains '%z', ndt.format(datetime_format) fails, since ndt is a naive datetime and has no UTC offset to print

So, I need to get the timezones feature activated when match tz is reached with a tz-aware input
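As a rough model of why the naive arm fails, the sketch below uses only the standard library. `NaiveStamp`, `Formatted` and `render` are hypothetical names, and the `wants_offset` flag stands in for a format string containing '%z'. It assumes, as the panic message suggests, that the failure surfaces as a formatting error from the value's `Display` implementation. Returning the error instead of unwrapping it lets the caller report the problem.

```rust
use std::fmt::{self, Display, Write};

/// Hypothetical stand-in for a naive datetime: seconds since the epoch, no offset.
struct NaiveStamp {
    secs: i64,
}

/// Hypothetical stand-in for the lazily formatted value returned by `format(..)`.
struct Formatted<'a> {
    stamp: &'a NaiveStamp,
    /// Models a format string that asks for a UTC offset ('%z').
    wants_offset: bool,
}

impl Display for Formatted<'_> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        if self.wants_offset {
            // A naive value has no offset to print, so formatting fails.
            return Err(fmt::Error);
        }
        write!(f, "{}", self.stamp.secs)
    }
}

/// Render into a `String`, propagating the formatting error instead of unwrapping.
fn render(value: &impl Display) -> Result<String, fmt::Error> {
    let mut out = String::new();
    write!(out, "{value}")?;
    Ok(out)
}
```

With `wants_offset: false` the call returns the rendered text; with `wants_offset: true` it returns an `Err` that the caller can turn into a regular error rather than a panic.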

MarcoGorelli · 2023-02-08T12:46:03Z

polars/polars-io/src/csv/write_impl.rs

+            let formatted = match tz {
+                Some(tz) => match tz.parse::<Tz>() {
+                    Ok(parsed_tz) => parsed_tz.from_utc_datetime(&ndt).format(datetime_format),
+                    Err(_) => match temporal_conversions::parse_offset(tz) {
+                        Ok(parsed_tz) => parsed_tz.from_utc_datetime(&ndt).format(datetime_format),
+                        Err(_) => unreachable!(),
+                    },
+                },
+                _ => ndt.format(datetime_format),
+            };
+            write!(f, "{formatted}")


Hi @alexander-beedie - do you have any thoughts on this? On the one hand, parsing tz for each element slows things down for tz-aware columns, and could be done beforehand in a per-column fashion

On the other hand, in #4724 it looks like you intentionally tried to avoid per-column inference

Yup - though it wasn't about not wanting per-column inference (which would actually be great) so much as ensuring that any such inference was done outside the hot loop (per-element inference would be bad).

I took a minimal approach in the end as my initial attempt to reshuffle things on a per-column basis became unnecessarily convoluted - treat my earlier commit as a mere first step towards a more flexible/better per-column future... ;)
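The per-column idea discussed here can be sketched as follows. This is a simplified illustration, not the polars implementation: `parse_fixed_offset` and `shift_column` are hypothetical helpers, and only fixed offsets such as "+01:00" are handled. The point is that the time zone string is parsed once per column and the result is reused for every element.

```rust
/// Hypothetical parser for fixed offsets such as "+01:00".
/// Returns the offset in seconds east of UTC, or `None` if the input is not in that shape.
fn parse_fixed_offset(tz: &str) -> Option<i64> {
    let (sign, rest) = match tz.as_bytes().first()? {
        b'+' => (1, &tz[1..]),
        b'-' => (-1, &tz[1..]),
        _ => return None,
    };
    let (hours, minutes) = rest.split_once(':')?;
    let hours: i64 = hours.parse().ok()?;
    let minutes: i64 = minutes.parse().ok()?;
    Some(sign * (hours * 3600 + minutes * 60))
}

/// Shift every UTC timestamp (in seconds) of one column by the column's offset.
fn shift_column(utc_secs: &[i64], tz: &str) -> Option<Vec<i64>> {
    // Parse once per column, outside the per-element loop.
    let offset = parse_fixed_offset(tz)?;
    Some(utc_secs.iter().map(|ts| ts + offset).collect())
}
```

For example, `shift_column(&[0, 60], "+01:00")` parses the offset a single time and then applies it to both values.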


ritchie46 · 2023-02-10T06:18:02Z

@MarcoGorelli let me know when I can finish this!

MarcoGorelli · 2023-02-10T07:51:42Z

Sure, feel free to take over, thanks!


MarcoGorelli · 2023-02-10T10:16:00Z

polars/Cargo.toml

@@ -131,7 +131,7 @@ bigidx = ["polars-core/bigidx", "polars-lazy/bigidx", "polars-ops/big_idx"]
 list_to_struct = ["polars-ops/list_to_struct", "polars-lazy/list_to_struct"]
 list_take = ["polars-ops/list_take", "polars-lazy/list_take"]
 describe = ["polars-core/describe"]
-timezones = ["polars-core/timezones", "polars-lazy/timezones"]
+timezones = ["polars-core/timezones", "polars-lazy/timezones", "polars-io/timezones"]


ah looks like this is what I was missing - thanks for fixing it up!

looks like this needs a rebase?

Nope, a squash will do. 👍

github-actions bot added the enhancement, python, and rust labels Feb 8, 2023

MarcoGorelli commented Feb 8, 2023


MarcoGorelli mentioned this pull request Feb 8, 2023

chore(rust): remove unreachable path in write_anyvalue #6727

Merged

MarcoGorelli added 4 commits February 8, 2023 10:57

support timezones in csv writer

b8ada23

lint

1267e8e

simplify

3ae3f29

clippy

55d09a9

MarcoGorelli force-pushed the autodetect-aware branch from 1e4d83c to 1267e8e on February 8, 2023 11:18

MarcoGorelli commented Feb 8, 2023


ritchie46 and others added 19 commits February 8, 2023 15:06

fix(python): respect 'None' in from_dicts (pola-rs#6726)

2e786d2

fix(rust, python): arrow map dtype conversion (pola-rs#6732)

0532e03

feat(python): don't require pyarrow for utf8 -> numpy conversion (pola-rs#6733)

870a818

feat(python): scan_ds predicate pushdown for string cmp (pola-rs#6734)

91f765f

feat(rust, python): Support an ignore_nulls param for EWM calculations. (pola-rs#5749) (pola-rs#6742)

44a7c5b

fix(rust,python): Improve error message in DataFrame constructor (pola-rs#6715)

262114c

feat(python): Improved assert equal messages (pola-rs#6737)

d43500e

test(python): Reorganize benchmark test folder (pola-rs#6695)

0a1c1bc

feat(python): Improve numpy support: conversion of numpy arrays with … (pola-rs#6738)

d3633fb

feat(rust, python): add argmin/max for utf8 data (pola-rs#6746)

80cce18

chore(rust): update arrow to 0.16 (pola-rs#6748)

dd1dca7

docs(python): redirect tz_localize (pola-rs#6749)

aeb3a03

Co-authored-by: MarcoGorelli <>

test(python): integrate ignore_nulls into EWM parametric tests (pola-rs#6751)

b160f53

fix(rust, python): respect skip_rows in glob parsing csv (pola-rs#6754)

e103b34

feat(rust, python): formally support duration division (pola-rs#6758)

9de9316

chore(rust): propagate error in date_range with invalid time zone (pola-rs#6759)

7fbdb6c

Co-authored-by: MarcoGorelli <>

build(python): Update mypy to version 1.0.0 (pola-rs#6744)

11e4de2

feat(python): Add option to use PyArrow backed-extension arrays when … (pola-rs#6756)

0cf7d7f

feat(rust, python): parse timezone from Datetime (pola-rs#6766)

aad4aa3

Co-authored-by: MarcoGorelli <>

MarcoGorelli marked this pull request as ready for review February 10, 2023 07:50

alexander-beedie and others added 7 commits February 10, 2023 09:30

fix(rust,python): handle edge-case with string-literal replacement when the replace value looks like a capture group (pola-rs#6765)

4607eb6

feat(python): default to 1d interval in date_range (pola-rs#6771)

2d7d728

Co-authored-by: MarcoGorelli <>

fix(rust, python): don't set sorted flag if we reverse sort the left … (pola-rs#6772)

1a45830

fix(rust, python): use explicit drop function node (pola-rs#6769)

f61fa38

feat(rust): implement series abstractions for Int128Type (pola-rs#6679)

b3a7374

Merge branch 'autodetect-aware' of github.com:MarcoGorelli/polars into MarcoGorelli-autodetect-aware

afac817

add timezones feature to polars-io

7dbdc00

MarcoGorelli commented Feb 10, 2023


ritchie46 merged commit 9e298e2 into pola-rs:master Feb 10, 2023

MarcoGorelli mentioned this pull request Feb 12, 2023

Error when writing to csv when DataFrame contains both naive and aware datetimes and no datetime_format is specified #6827

Closed

2 tasks
