Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change Series.equals to ignore name by default #13231

Closed
2 tasks done
dctalbot opened this issue Dec 24, 2023 · 10 comments · Fixed by #16610
Closed
2 tasks done

Change Series.equals to ignore name by default #13231

dctalbot opened this issue Dec 24, 2023 · 10 comments · Fixed by #16610
Assignees
Labels
A-api Area: changes to the public API accepted Ready for implementation enhancement New feature or an improvement of an existing feature python Related to Python Polars
Milestone

Comments

@dctalbot
Copy link

Checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of Polars.

Reproducible example

import polars as pl
df = pl.DataFrame(
            {
                "a": [1, 2, 3],
                "b": [1, 2, 3]
            }
        )

s1 = df['a']
s2 = df['b']

# this fails unexpectedly
assert s1.equals(s2)

Log output

No response

Issue description

It looks like Series.equals requires the header names to be equal in order to return true. This surprised me, but I'm unsure what the intended behavior is. Pandas does not take the headers into account e.g.

import pandas as pd
df = pd.DataFrame(
            {
                "a": [1, 2, 3],
                "b": [1, 2, 3]
            }
        )

s1 = df['a']
s2 = df['b']

# this passes as expected
assert s1.equals(s2)

I would have expected Polars to behave the same way.

If intended, it's probably worth clarifying this in the documentation.

Expected behavior

import polars as pl
df = pl.DataFrame(
            {
                "a": [1, 2, 3],
                "b": [1, 2, 3]
            }
        )

s1 = df['a']
s2 = df['b']

# this should pass
assert s1.equals(s2)

Installed versions

--------Version info---------
Polars:               0.20.1
Index type:           UInt32
Platform:             macOS-14.1.1-arm64-arm-64bit
Python:               3.11.7 (main, Dec 19 2023, 23:11:51) [Clang 15.0.0 (clang-1500.0.40.1)]

----Optional dependencies----
adbc_driver_manager:  <not installed>
cloudpickle:          <not installed>
connectorx:           0.3.2
deltalake:            <not installed>
fsspec:               <not installed>
gevent:               <not installed>
matplotlib:           <not installed>
numpy:                1.26.2
openpyxl:             <not installed>
pandas:               2.1.4
pyarrow:              14.0.2
pydantic:             <not installed>
pyiceberg:            <not installed>
pyxlsb:               <not installed>
sqlalchemy:           2.0.23
xlsx2csv:             <not installed>
xlsxwriter:           <not installed>
@dctalbot dctalbot added bug Something isn't working python Related to Python Polars labels Dec 24, 2023
@Wainberg
Copy link
Contributor

Agreed - if you want to check whether the names are equal as well, you can test for a.name == b.name and a.equals(b).

@cmdlineluser
Copy link
Contributor

There is also the polars.testing module which has several parameters for customizing behaviour.

https://pola-rs.github.io/polars/py-polars/html/reference/testing.html#asserts

>>> pl.testing.assert_series_equal(s1, s2)
# AssertionError: Series are different (name mismatch)
# [left]:  a
# [right]: b
>>> pl.testing.assert_series_equal(s1, s2, check_names=False)
>>> print("Equal!")
# Equal!

@dctalbot
Copy link
Author

Thanks for the link! I wouldn't want to depend on testing utils in my application though.

@stinodego
Copy link
Member

stinodego commented Jan 4, 2024

It's not a bug. But I also think it makes more sense for equals to ignore the name by default. I think this is worth an update. We can include a check_name parameter that defaults to False. That way, we can deprecate before we break..

@stinodego stinodego added enhancement New feature or an improvement of an existing feature A-api Area: changes to the public API accepted Ready for implementation and removed bug Something isn't working labels Jan 4, 2024
@github-project-automation github-project-automation bot moved this to Ready in Backlog Jan 4, 2024
@stinodego stinodego moved this from Ready to Candidate in Backlog Jan 4, 2024
@stinodego stinodego changed the title Series.equals seems either broken or unintuitive Change Series.equals to ignore name by default Jan 4, 2024
@dctalbot
Copy link
Author

@stinodego To preserve existing behavior and avoid a breaking change, shouldn't the default check_name be True? Depends on your versioning strategy I suppose. I can probably put together a PR for this.

@stinodego
Copy link
Member

To do this properly, you would add a parameter check_name which defaults to None. And in the function body, you check for None inputs, throw a deprecation warning, and then set it to True.

Happy to review a PR for this 👍

@jjmarchewitz
Copy link

Hello! Long time lurker, (potentially) first time contributor. This type of change sounds like a good first issue to me, but I see it doesn't have that label. @stinodego do you think this is a good first issue? If so, would you be willing to assign me to it (since I can't on my own)? Or should I just make the changes without being assigned.

@jjmarchewitz
Copy link

jjmarchewitz commented Jan 27, 2024

I did some exploring and I have a question. Why is this operator's implementation located in a test utilities file? crates/polars-core/src/testing.rs equals_missing(). Also, what does _missing mean in the title?

@cmdlineluser
Copy link
Contributor

@jjmarchewitz

.eq_missing() behaves differently with regards to "missing values" i.e. None/null

pl.Series([1, None, 3]).eq( pl.Series([1, 2, 3]) )
# shape: (3,)
# Series: '' [bool]
# [
#     true
#     null # <- null is propagated
#     true
# ]
pl.Series([1, None, 3]).eq_missing( pl.Series([1, 2, 3]) )
# shape: (3,)
# Series: '' [bool]
# [
#     true
#     false 
#     true
# ]
pl.Series([1, None, 3]).eq_missing( pl.Series([1, None, 3]) )
# shape: (3,)
# Series: '' [bool]
# [
#     true
#     true 
#     true
# ]

There doesn't seem to be doc entries for the Series API, but: https://docs.pola.rs/docs/python/dev/reference/expressions/api/polars.Expr.eq_missing.html#polars.Expr.eq_missing should help.

@stinodego
Copy link
Member

It could be a good first issue, depends on your skillset!

Note that the change must be made in Rust. Deprecation must happen on the Python side.

@stinodego stinodego added this to the 1.0.0 milestone May 25, 2024
@stinodego stinodego assigned stinodego and unassigned jjmarchewitz May 26, 2024
@stinodego stinodego moved this from Candidate to Next in Backlog May 26, 2024
@stinodego stinodego added needs decision Awaiting decision by a maintainer accepted Ready for implementation and removed accepted Ready for implementation needs decision Awaiting decision by a maintainer labels May 27, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-api Area: changes to the public API accepted Ready for implementation enhancement New feature or an improvement of an existing feature python Related to Python Polars
Projects
Archived in project
5 participants