Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unify temporal expression functions #6445

Closed
romanovacca opened this issue Jan 25, 2023 · 1 comment · Fixed by #12179
Closed

Unify temporal expression functions #6445

romanovacca opened this issue Jan 25, 2023 · 1 comment · Fixed by #12179
Assignees
Labels
breaking Change that breaks backwards compatibility enhancement New feature or an improvement of an existing feature
Milestone

Comments

@romanovacca
Copy link
Contributor

romanovacca commented Jan 25, 2023

Problem description

A couple functions are called similar, while having different behaviour e.g. Expr.dt.microsecond() and Expr.dt.microseconds().

These functionality of these could be unified. See #6413 for more information

@romanovacca romanovacca added the enhancement New feature or an improvement of an existing feature label Jan 25, 2023
@stinodego stinodego added this to the Python Polars 0.17.0 milestone Mar 2, 2023
@zundertj zundertj added the breaking Change that breaks backwards compatibility label Mar 4, 2023
@MarcoGorelli MarcoGorelli self-assigned this Nov 1, 2023
@MarcoGorelli
Copy link
Collaborator

MarcoGorelli commented Nov 1, 2023

The current names were (probably?) taken from pandas / Python stdlib, but they behave differently:

In [31]: pd.Series([timedelta(days=1, hours=1)]).dt.seconds
Out[31]:
0    3600
dtype: int32

In [32]: pl.Series([timedelta(days=1, hours=1)]).dt.seconds()
Out[32]:
shape: (1,)
Series: '' [i64]
[
        90000
]

The pandas (and Python stdlib) equivalent of Polars .dt.seconds is actually .dt.total_seconds:

In [41]: pd.Series([timedelta(days=1, hours=1)]).dt.total_seconds()
Out[41]:
0    90000.0
dtype: float64

The Python standard library behaves in the same way:

In [49]: timedelta(days=1, hours=1).seconds
Out[49]: 3600

In [50]: timedelta(days=1, hours=1).total_seconds()
Out[50]: 90000.0

So, rather than unifiying temporal .dt.microseconds and datetime .dt.microsecond, maybe all that needs doing is:

  • .dt.seconds => .dt.total_seconds
  • .dt.microseconds => .dt.total_microseconds
  • ...
  • .dt.nanoseconds => .dt.total_nanoseconds
    ?

This should

  • reduce the surprise for users used to pandas / Python stdlib
  • reduce the risk of accidentally confusing them because of a single character difference (this is quite easily done, I've done it too 😄 )

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking Change that breaks backwards compatibility enhancement New feature or an improvement of an existing feature
Projects
None yet
4 participants