Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(rust, python): support timezone in csv writer #6722

Merged
merged 30 commits into from
Feb 10, 2023

Conversation

MarcoGorelli
Copy link
Collaborator

closes #6718

@github-actions github-actions bot added enhancement New feature or an improvement of an existing feature python Related to Python Polars rust Related to Rust Polars labels Feb 8, 2023
@@ -24,7 +24,7 @@ decompress = ["flate2/miniz_oxide"]
decompress-fast = ["flate2/zlib-ng"]
dtype-categorical = ["polars-core/dtype-categorical"]
dtype-date = ["polars-core/dtype-date", "polars-time/dtype-date"]
dtype-datetime = ["polars-core/dtype-datetime", "polars-core/temporal", "polars-time/dtype-datetime"]
dtype-datetime = ["polars-core/dtype-datetime", "polars-core/temporal", "polars-time/dtype-datetime", "chrono-tz", "chrono"]
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to, instead do

Suggested change
dtype-datetime = ["polars-core/dtype-datetime", "polars-core/temporal", "polars-time/dtype-datetime", "chrono-tz", "chrono"]
dtype-datetime = ["polars-core/dtype-datetime", "polars-core/temporal", "polars-time/dtype-datetime"]
timezones = ["chrono-tz", "chrono"]

and then, in write_impl.rs, applying the following:

diff --git a/polars/polars-io/src/csv/write_impl.rs b/polars/polars-io/src/csv/write_impl.rs
index d2aaab6b5..f855dc97c 100644
--- a/polars/polars-io/src/csv/write_impl.rs
+++ b/polars/polars-io/src/csv/write_impl.rs
@@ -1,9 +1,9 @@
 use std::io::Write;
 
 use arrow::temporal_conversions;
-#[cfg(feature = "dtype-datetime")]
+#[cfg(feature = "timezones")]
 use chrono::TimeZone;
-#[cfg(feature = "dtype-datetime")]
+#[cfg(feature = "timezones")]
 use chrono_tz::Tz;
 use lexical_core::{FormattedSize, ToLexical};
 use memchr::{memchr, memchr2};
@@ -95,6 +95,7 @@ fn write_anyvalue(f: &mut Vec<u8>, value: AnyValue, options: &SerializeOptions)
                 TimeUnit::Milliseconds => temporal_conversions::timestamp_ms_to_datetime(v),
             };
             match tz {
+                #[cfg(feature = "timezones")]
                 Some(tz) => match tz.parse::<Tz>() {
                     Ok(parsed_tz) => match &options.datetime_format {
                         None => write!(f, "{}", PlTzAware::new(ndt, tz)),

However, I then get

thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: Error { kind: Uncategorized, message: "formatter error" }', /home/marcogorelli/polars-dev/polars/polars-io/src/csv/write_impl.rs:132:6
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Traceback (most recent call last):
  File "/home/marcogorelli/polars-dev/py-polars/t.py", line 19, in <module>
    print(df.write_csv())
  File "/home/marcogorelli/polars-dev/py-polars/polars/internals/dataframe/frame.py", line 2237, in write_csv
    self._df.write_csv(
pyo3_runtime.PanicException: called `Result::unwrap()` on an `Err` value: Error { kind: Uncategorized, message: "formatter error" }

Looking into it, but am a bit confused

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does the panic relate with the dependencies?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know, that's what I'm confused about - there's no panic currently, but it appears if I move chrono-tz and chrono behind the timezones feature

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can take a look later before merging.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks! 🙌 I'll take another look too, perhaps I can figure it out

Copy link
Collaborator Author

@MarcoGorelli MarcoGorelli Feb 8, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I see what happens (kinda): if I put the Some arm under the timezones feature, then it'll be skipped, and the other arm will be hit

            let formatted = match tz {
                #[cfg(feature = "timezones")]
                Some(tz) => match tz.parse::<Tz>() {
                    Ok(parsed_tz) => parsed_tz.from_utc_datetime(&ndt).format(datetime_format),
                    Err(_) => match temporal_conversions::parse_offset(tz) {
                        Ok(parsed_tz) => parsed_tz.from_utc_datetime(&ndt).format(datetime_format),
                        Err(_) => unreachable!(),
                    },
                },
                _ => ndt.format(datetime_format),
            };

And then, if date_format contains '%z', then ndt.format(datetime_format) fails

So, I need to get the timezones feature activated when match tz is reached with a tz-aware input

Comment on lines 101 to 111
let formatted = match tz {
Some(tz) => match tz.parse::<Tz>() {
Ok(parsed_tz) => parsed_tz.from_utc_datetime(&ndt).format(datetime_format),
Err(_) => match temporal_conversions::parse_offset(tz) {
Ok(parsed_tz) => parsed_tz.from_utc_datetime(&ndt).format(datetime_format),
Err(_) => unreachable!(),
},
},
_ => ndt.format(datetime_format),
};
write!(f, "{formatted}")
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @alexander-beedie - do you have any thoughts on this? On the one hand, parsing tz for each element slows things down for tz-aware columns, and could be done beforehand in a per-column fashion

On the other hand, in #4724 it looks like you intentionally tried to avoid per-column inference

Copy link
Collaborator

@alexander-beedie alexander-beedie Feb 10, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup - though it wasn't about not wanting per-column inference (which would actually be great) as ensuring that any such inference was done outside the hot loop (per-element inference would be bad).

I took a minimal approach in the end as my initial attempt to reshuffle things on a per-column basis became unnecessarily convoluted - treat my earlier commit as a mere first step towards a more flexible/better per-column future... ;)

ritchie46 and others added 19 commits February 8, 2023 15:06
Co-authored-by: MarcoGorelli <>
@ritchie46
Copy link
Member

@MarcoGorelli let me know when I can finish this!

@MarcoGorelli MarcoGorelli marked this pull request as ready for review February 10, 2023 07:50
@MarcoGorelli
Copy link
Collaborator Author

Sure, feel free to take over, thanks!

@@ -131,7 +131,7 @@ bigidx = ["polars-core/bigidx", "polars-lazy/bigidx", "polars-ops/big_idx"]
list_to_struct = ["polars-ops/list_to_struct", "polars-lazy/list_to_struct"]
list_take = ["polars-ops/list_take", "polars-lazy/list_take"]
describe = ["polars-core/describe"]
timezones = ["polars-core/timezones", "polars-lazy/timezones"]
timezones = ["polars-core/timezones", "polars-lazy/timezones", "polars-io/timezones"]
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah looks like this is what I was missing - thanks for fixing it up!

looks like this needs a rebase?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope, a squash will do. 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or an improvement of an existing feature python Related to Python Polars rust Related to Rust Polars
Projects
None yet
Development

Successfully merging this pull request may close these issues.

write_csv doesn't respect datetime_format
7 participants