-
-
Notifications
You must be signed in to change notification settings - Fork 1.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(rust, python): support timezone in csv writer #6722
Conversation
polars/polars-io/Cargo.toml
Outdated
@@ -24,7 +24,7 @@ decompress = ["flate2/miniz_oxide"] | |||
decompress-fast = ["flate2/zlib-ng"] | |||
dtype-categorical = ["polars-core/dtype-categorical"] | |||
dtype-date = ["polars-core/dtype-date", "polars-time/dtype-date"] | |||
dtype-datetime = ["polars-core/dtype-datetime", "polars-core/temporal", "polars-time/dtype-datetime"] | |||
dtype-datetime = ["polars-core/dtype-datetime", "polars-core/temporal", "polars-time/dtype-datetime", "chrono-tz", "chrono"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried to, instead do
dtype-datetime = ["polars-core/dtype-datetime", "polars-core/temporal", "polars-time/dtype-datetime", "chrono-tz", "chrono"] | |
dtype-datetime = ["polars-core/dtype-datetime", "polars-core/temporal", "polars-time/dtype-datetime"] | |
timezones = ["chrono-tz", "chrono"] |
and then, in write_impl.rs
, applying the following:
diff --git a/polars/polars-io/src/csv/write_impl.rs b/polars/polars-io/src/csv/write_impl.rs
index d2aaab6b5..f855dc97c 100644
--- a/polars/polars-io/src/csv/write_impl.rs
+++ b/polars/polars-io/src/csv/write_impl.rs
@@ -1,9 +1,9 @@
use std::io::Write;
use arrow::temporal_conversions;
-#[cfg(feature = "dtype-datetime")]
+#[cfg(feature = "timezones")]
use chrono::TimeZone;
-#[cfg(feature = "dtype-datetime")]
+#[cfg(feature = "timezones")]
use chrono_tz::Tz;
use lexical_core::{FormattedSize, ToLexical};
use memchr::{memchr, memchr2};
@@ -95,6 +95,7 @@ fn write_anyvalue(f: &mut Vec<u8>, value: AnyValue, options: &SerializeOptions)
TimeUnit::Milliseconds => temporal_conversions::timestamp_ms_to_datetime(v),
};
match tz {
+ #[cfg(feature = "timezones")]
Some(tz) => match tz.parse::<Tz>() {
Ok(parsed_tz) => match &options.datetime_format {
None => write!(f, "{}", PlTzAware::new(ndt, tz)),
However, I then get
thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: Error { kind: Uncategorized, message: "formatter error" }', /home/marcogorelli/polars-dev/polars/polars-io/src/csv/write_impl.rs:132:6
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Traceback (most recent call last):
File "/home/marcogorelli/polars-dev/py-polars/t.py", line 19, in <module>
print(df.write_csv())
File "/home/marcogorelli/polars-dev/py-polars/polars/internals/dataframe/frame.py", line 2237, in write_csv
self._df.write_csv(
pyo3_runtime.PanicException: called `Result::unwrap()` on an `Err` value: Error { kind: Uncategorized, message: "formatter error" }
Looking into it, but am a bit confused
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does the panic relate with the dependencies?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know, that's what I'm confused about - there's no panic currently, but it appears if I move chrono-tz
and chrono
behind the timezones
feature
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can take a look later before merging.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks! 🙌 I'll take another look too, perhaps I can figure it out
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I see what happens (kinda): if I put the Some
arm under the timezones
feature, then it'll be skipped, and the other arm will be hit
let formatted = match tz {
#[cfg(feature = "timezones")]
Some(tz) => match tz.parse::<Tz>() {
Ok(parsed_tz) => parsed_tz.from_utc_datetime(&ndt).format(datetime_format),
Err(_) => match temporal_conversions::parse_offset(tz) {
Ok(parsed_tz) => parsed_tz.from_utc_datetime(&ndt).format(datetime_format),
Err(_) => unreachable!(),
},
},
_ => ndt.format(datetime_format),
};
And then, if date_format
contains '%z'
, then ndt.format(datetime_format)
fails
So, I need to get the timezones
feature activated when match tz
is reached with a tz-aware input
1e4d83c
to
1267e8e
Compare
let formatted = match tz { | ||
Some(tz) => match tz.parse::<Tz>() { | ||
Ok(parsed_tz) => parsed_tz.from_utc_datetime(&ndt).format(datetime_format), | ||
Err(_) => match temporal_conversions::parse_offset(tz) { | ||
Ok(parsed_tz) => parsed_tz.from_utc_datetime(&ndt).format(datetime_format), | ||
Err(_) => unreachable!(), | ||
}, | ||
}, | ||
_ => ndt.format(datetime_format), | ||
}; | ||
write!(f, "{formatted}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @alexander-beedie - do you have any thoughts on this? On the one hand, parsing tz
for each element slows things down for tz-aware columns, and could be done beforehand in a per-column fashion
On the other hand, in #4724 it looks like you intentionally tried to avoid per-column inference
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup - though it wasn't about not wanting per-column inference (which would actually be great) as ensuring that any such inference was done outside the hot loop (per-element inference would be bad).
I took a minimal approach in the end as my initial attempt to reshuffle things on a per-column basis became unnecessarily convoluted - treat my earlier commit as a mere first step towards a more flexible/better per-column future... ;)
Co-authored-by: MarcoGorelli <>
…la-rs#6759) Co-authored-by: MarcoGorelli <>
Co-authored-by: MarcoGorelli <>
@MarcoGorelli let me know when I can finish this! |
Sure, feel free to take over, thanks! |
…en the replace value looks like a capture group (pola-rs#6765)
Co-authored-by: MarcoGorelli <>
…o MarcoGorelli-autodetect-aware
@@ -131,7 +131,7 @@ bigidx = ["polars-core/bigidx", "polars-lazy/bigidx", "polars-ops/big_idx"] | |||
list_to_struct = ["polars-ops/list_to_struct", "polars-lazy/list_to_struct"] | |||
list_take = ["polars-ops/list_take", "polars-lazy/list_take"] | |||
describe = ["polars-core/describe"] | |||
timezones = ["polars-core/timezones", "polars-lazy/timezones"] | |||
timezones = ["polars-core/timezones", "polars-lazy/timezones", "polars-io/timezones"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah looks like this is what I was missing - thanks for fixing it up!
looks like this needs a rebase?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nope, a squash will do. 👍
closes #6718