-
-
Notifications
You must be signed in to change notification settings - Fork 1.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(rust, python)!: error on string <-> date cmp #6498
Conversation
Just checking @ritchie46, will this still work? import polars as pl
from datetime import datetime
pl.DataFrame({
'myDate': [datetime(2020,1,1), datetime(2021,1,1), datetime(2022,1,1), datetime(2023,1,1)]
}).filter(pl.col('myDate')>'2021-06-01')
>>>
shape: (2, 1)
┌─────────────────────┐
│ myDate │
│ --- │
│ datetime[μs] │
╞═════════════════════╡
│ 2022-01-01 00:00:00 │
│ 2023-01-01 00:00:00 │
└─────────────────────┘ |
That never worked as expected. It will cast the date to string and do a string compare. You don't want that l. |
Oh, not sure how I feel about this... This way of filtering time series seems to have accidentally been working, and it is the standard in Pandas - I think this change will break many codebases.
My proposal was more that the string should be cast into a date, and an error raised if the casting was not possible (that's the standard behavior in other libraries)
|
Those codebases would have been broken to begin with. Polars already printed a warning that this likely was not what you should do. Polars is not pandas and we choose to not support this as string as dates are ambigious. |
I can live with that (though I'll be sad). If I can argue my case, it is not just Pandas - it is the convention in SQL as well: SELECT * FROM table WHERE myDateCol >= '2022-01-01' This is also the case in PySpark and Snowpark among others. Moreover, it is not clear to me that this: df.filter(pl.col('myDate') > '2021-06-01') is any worse than df.filter(pl.col('myDate') > datetime(2021, 6, 1)) My preference would be that when comparing a |
Conversions of strings to dates have so many edge cases and gotcha's that this conversion should be explicit. Then you can specify the format, the timezone handling, etc. This explicitness will prevent bugs downstream. |
Yes, not allowing things is a feature. ;)
closes #6468