Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Expr.is_between() not implemented for str #6625

Closed
2-5 opened this issue Feb 1, 2023 · 5 comments · Fixed by #6627
Closed

Expr.is_between() not implemented for str #6625

2-5 opened this issue Feb 1, 2023 · 5 comments · Fixed by #6627
Labels
enhancement New feature or an improvement of an existing feature

Comments

@2-5
Copy link
Contributor

2-5 commented Feb 1, 2023

Problem description

Expr.is_between() is not implemented for str, only < and >:

import polars as pl

df = pl.DataFrame({
    "text": ["a", "b", "c", "d"],
})
print(df)

df2 = df.filter(
    pl.col("text") < "c",
)
print(df2)

df3 = df.filter(
    pl.col("text").is_between("b", "d"),
)
print(df3)

Output:

Traceback (most recent call last):
  File "test.py", line 13, in <module>
    df3 = df.filter(
  File ".venv\lib\site-packages\polars\internals\dataframe\frame.py", line 2702, in filter
    self.lazy()
  File ".venv\lib\site-packages\polars\internals\lazyframe\frame.py", line 1143, in collect
    return pli.wrap_df(ldf.collect())
exceptions.NotFoundError: b
@2-5 2-5 added the enhancement New feature or an improvement of an existing feature label Feb 1, 2023
@stinodego
Copy link
Member

stinodego commented Feb 1, 2023

It is implemented, but you should write this like:

df3 = df.filter(
    pl.col("text").is_between(pl.lit("b"), pl.lit("d")),
)

It interprets "b" as pl.col("b") (which does not exist). That happens in many places throughout Polars for ease of use, e.g. df.select(["a", "b"]).

So this is by design, although it may be poor design. It can definitely be confusing.

@2-5
Copy link
Contributor Author

2-5 commented Feb 1, 2023

Interesting, sounds like a useful feature.

Maybe an example could be added in the docs:

https://pola-rs.github.io/polars/py-polars/html/reference/expressions/api/polars.Expr.is_between.html

The error message could be improved, by mentioning that a column was not found, something like this:

exceptions.NotFoundError: b -> exceptions.ColumnNotFoundError: b

Or better exceptions.NotFoundError: No "b" column. Maybe you mean pl.lit("b")?

@alexander-beedie
Copy link
Collaborator

alexander-beedie commented Feb 2, 2023

Maybe an example could be added in the docs:

Good idea; done. Will also (separately) see if we can slightly enrich the NotFoundError to better-indicate what has not been found 👍

@stinodego
Copy link
Member

Maybe an example could be added in the docs:

While this is nice, it's a broader thing that affects many Polars functions. Every function should clearly state how it handles string inputs. We should do that, but it's quite cumbersome 😞

Better error messaging is really the best solution here.

@alexander-beedie
Copy link
Collaborator

alexander-beedie commented Feb 4, 2023

@2-5, @stinodego: FYI ,have just done a sweep through the NotFound exception classes, improving error context as suggested... (#6670)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or an improvement of an existing feature
Projects
None yet
3 participants