str.contains strict=False takes no effect #6901

Closed
2 tasks done
sorhawell opened this issue Feb 15, 2023 · 2 comments
Labels
bug (Something isn't working), python (Related to Python Polars)

Comments

@sorhawell
Contributor

Polars version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of Polars.

Issue description

expr.str.contains has a strict argument which, if False, should not raise an error for an invalid regex pattern. However, an error is always raised at query time, whether strict is True or False.

I think this is also a problem in Rust, as I see the same behavior in r-polars.

Reproducible example

In [1]: import polars as pl

In [2]: pl.version()
Out[2]: '0.16.5'

In [3]: pl.select(pl.lit("some_text").str.contains("(invalid_pattern",literal=False,strict=False))
---------------------------------------------------------------------------
ComputeError                              Traceback (most recent call last)
Cell In[3], line 1
----> 1 pl.select(pl.lit("some_text").str.contains("(invalid_pattern",literal=False,strict=False))

File /usr/local/Cellar/ipython/8.8.0/libexec/lib/python3.11/site-packages/polars/internals/lazy_functions.py:2335, in select(exprs, *more_exprs, **named_exprs)
   2287 def select(
   2288     exprs: (
   2289         str
   (...)
   2297     **named_exprs: str | PolarsExprType | PythonLiteral | pli.Series | None,
   2298 ) -> pli.DataFrame:
   2299     """
   2300     Run polars expressions without a context.
   2301 
   (...)
   2333 
   2334     """
-> 2335     return pli.DataFrame().select(exprs, *more_exprs, **named_exprs)

File /usr/local/Cellar/ipython/8.8.0/libexec/lib/python3.11/site-packages/polars/internals/dataframe/frame.py:5784, in DataFrame.select(self, exprs, *more_exprs, **named_exprs)
   5666 def select(
   5667     self,
   5668     exprs: (
   (...)
   5677     **named_exprs: str | PolarsExprType | PythonLiteral | pli.Series | None,
   5678 ) -> Self:
   5679     """
   5680     Select columns from this DataFrame.
   5681 
   (...)
   5779 
   5780     """
   5781     return self._from_pydf(
   5782         self.lazy()
   5783         .select(exprs, *more_exprs, **named_exprs)
-> 5784         .collect(no_optimization=True)
   5785         ._df
   5786     )

File /usr/local/Cellar/ipython/8.8.0/libexec/lib/python3.11/site-packages/polars/internals/lazyframe/frame.py:1146, in LazyFrame.collect(self, type_coercion, predicate_pushdown, projection_pushdown, simplify_expression, no_optimization, slice_pushdown, common_subplan_elimination, streaming)
   1135     common_subplan_elimination = False
   1137 ldf = self._ldf.optimization_toggle(
   1138     type_coercion,
   1139     predicate_pushdown,
   (...)
   1144     streaming,
   1145 )
-> 1146 return pli.wrap_df(ldf.collect())

ComputeError: regex error: Syntax(
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
regex parse error:
    (invalid_pattern
    ^
error: unclosed group
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
)

In [4]: pl.select(pl.lit("some_text").str.contains("(invalid_pattern",literal=False,strict=True))
---------------------------------------------------------------------------
ComputeError                              Traceback (most recent call last)
Cell In[4], line 1
----> 1 pl.select(pl.lit("some_text").str.contains("(invalid_pattern",literal=False,strict=True))

File /usr/local/Cellar/ipython/8.8.0/libexec/lib/python3.11/site-packages/polars/internals/lazy_functions.py:2335, in select(exprs, *more_exprs, **named_exprs)
   2287 def select(
   2288     exprs: (
   2289         str
   (...)
   2297     **named_exprs: str | PolarsExprType | PythonLiteral | pli.Series | None,
   2298 ) -> pli.DataFrame:
   2299     """
   2300     Run polars expressions without a context.
   2301 
   (...)
   2333 
   2334     """
-> 2335     return pli.DataFrame().select(exprs, *more_exprs, **named_exprs)

File /usr/local/Cellar/ipython/8.8.0/libexec/lib/python3.11/site-packages/polars/internals/dataframe/frame.py:5784, in DataFrame.select(self, exprs, *more_exprs, **named_exprs)
   5666 def select(
   5667     self,
   5668     exprs: (
   (...)
   5677     **named_exprs: str | PolarsExprType | PythonLiteral | pli.Series | None,
   5678 ) -> Self:
   5679     """
   5680     Select columns from this DataFrame.
   5681 
   (...)
   5779 
   5780     """
   5781     return self._from_pydf(
   5782         self.lazy()
   5783         .select(exprs, *more_exprs, **named_exprs)
-> 5784         .collect(no_optimization=True)
   5785         ._df
   5786     )

File /usr/local/Cellar/ipython/8.8.0/libexec/lib/python3.11/site-packages/polars/internals/lazyframe/frame.py:1146, in LazyFrame.collect(self, type_coercion, predicate_pushdown, projection_pushdown, simplify_expression, no_optimization, slice_pushdown, common_subplan_elimination, streaming)
   1135     common_subplan_elimination = False
   1137 ldf = self._ldf.optimization_toggle(
   1138     type_coercion,
   1139     predicate_pushdown,
   (...)
   1144     streaming,
   1145 )
-> 1146 return pli.wrap_df(ldf.collect())

ComputeError: regex error: Syntax(
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
regex parse error:
    (invalid_pattern
    ^
error: unclosed group
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
)

Expected behavior

When strict=False, contains should return null for invalid regex patterns.
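
To make the expected semantics concrete, here is a minimal pure-Python sketch. It is not the Polars implementation: the contains helper below is a hypothetical stand-in that uses Python's re module rather than the Rust regex engine, and it only illustrates how the strict flag is expected to decide between raising and returning null.

```python
import re
from typing import List, Optional


def contains(
    values: List[Optional[str]], pattern: str, strict: bool = True
) -> List[Optional[bool]]:
    """Hypothetical stand-in for str.contains (illustration only)."""
    try:
        compiled = re.compile(pattern)
    except re.error:
        if strict:
            # strict=True: surface the invalid pattern as an error.
            raise
        # strict=False: mask the invalid pattern as null for every row.
        return [None] * len(values)
    # Valid pattern: null inputs stay null, other rows get a boolean.
    return [None if v is None else compiled.search(v) is not None for v in values]


# Mirrors the reproducible example: an unclosed group is an invalid pattern.
print(contains(["some_text"], "(invalid_pattern", strict=False))
```

With strict=True the same call would raise instead, which is the only behavior the report observes in Polars 0.16.5 for both settings.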

Installed versions

---Version info---
Polars: 0.16.5
Index type: UInt32
Platform: macOS-12.6.3-x86_64-i386-64bit
Python: 3.11.1 (main, Dec 23 2022, 09:40:27) [Clang 14.0.0 (clang-1400.0.29.202)]
---Optional dependencies---
pyarrow: <not installed>
pandas: <not installed>
numpy: <not installed>
fsspec: <not installed>
connectorx: <not installed>
xlsx2csv: <not installed>
deltalake: <not installed>
matplotlib: <not installed>
sorhawell added the bug and python labels on Feb 15, 2023
@ritchie46
Member

Thanks for the report. Could you make a PR?

@sorhawell
Contributor Author

sorhawell commented Feb 16, 2023

I will give it a try
