Panic when filtering a dataframe with an object field #18665

Closed
fedyakov opened this issue Sep 10, 2024 · 5 comments · Fixed by #19811
Labels
accepted Ready for implementation bug Something isn't working needs triage Awaiting prioritization by a maintainer python Related to Python Polars

Comments

@fedyakov

fedyakov commented Sep 10, 2024

Checks

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of Polars.

Reproducible example

import polars as pl

df = pl.DataFrame([{f"c{i}": 0 for i in range(11)} | {"c": object()}] * 12)
df.filter((pl.col("c0") == 0) & (pl.col("c1") == 0))

Log output

thread '<unnamed>' panicked at /Users/runner/work/polars/polars/crates/polars-core/src/chunked_array/ops/chunkops.rs:146:17:
implementation error
stack backtrace:
   0: _rust_begin_unwind
   1: core::panicking::panic_fmt
   2: polars_core::chunked_array::ops::chunkops::<impl polars_core::chunked_array::ChunkedArray<T>>::rechunk
   3: polars_core::chunked_array::ops::gather::<impl polars_core::chunked_array::ops::ChunkTakeUnchecked<polars_core::chunked_array::ChunkedArray<polars_core::datatypes::UInt32Type>> for polars_core::chunked_array::ChunkedArray<T>>::take_unchecked
   4: polars_core::series::implementations::object::<impl polars_core::series::series_trait::SeriesTrait for polars_core::series::implementations::SeriesWrap<polars_core::chunked_array::ChunkedArray<polars_core::datatypes::ObjectType<T>>>>::take
   5: polars_core::series::Series::clear
   6: polars_core::series::Series::select_chunk
   7: <alloc::vec::Vec<T> as alloc::vec::spec_from_iter::SpecFromIter<T,I>>::from_iter
   8: <polars_mem_engine::executors::filter::FilterExec as polars_mem_engine::executors::executor::Executor>::execute::{{closure}}
   9: <polars_mem_engine::executors::filter::FilterExec as polars_mem_engine::executors::executor::Executor>::execute
  10: polars_lazy::frame::LazyFrame::collect
  11: polars_python::lazyframe::general::<impl polars_python::lazyframe::PyLazyFrame>::__pymethod_collect__
  12: pyo3::impl_::trampoline::trampoline
  13: polars_python::lazyframe::general::_::__INVENTORY::trampoline
  14: _method_vectorcall_VARARGS_KEYWORDS
  15: _call_function
  16: __PyEval_EvalFrameDefault
  17: __PyEval_Vector
  18: _method_vectorcall
  19: _call_function
  20: __PyEval_EvalFrameDefault
  21: __PyEval_Vector
  22: _call_function
  23: __PyEval_EvalFrameDefault
  24: __PyEval_Vector
  25: _PyEval_EvalCode
  26: _run_eval_code_obj
  27: _run_mod
  28: _pyrun_file
  29: __PyRun_SimpleFileObject
  30: __PyRun_AnyFileObject
  31: _pymain_run_file_obj
  32: _pymain_run_file
  33: _Py_RunMain
  34: _Py_BytesMain
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
Traceback (most recent call last):
  File "/Users/fedyakov/GitHub/apple/neutron/web-scraping-hours/apps/streamlits/golden_set/polars_bug.py", line 4, in <module>
    df.filter((pl.col("c0") == 0) & (pl.col("c1") == 0))
  File "/Users/fedyakov/GitHub/apple/neutron/web-scraping-hours/.venv/lib/python3.10/site-packages/polars/dataframe/frame.py", line 4701, in filter
    return self.lazy().filter(*predicates, **constraints).collect(_eager=True)
  File "/Users/fedyakov/GitHub/apple/neutron/web-scraping-hours/.venv/lib/python3.10/site-packages/polars/lazyframe/frame.py", line 2034, in collect
    return wrap_df(ldf.collect(callback))
pyo3_runtime.PanicException: implementation error

Issue description

Expected behavior

The filter should return the matching rows (here, all 12) without raising an exception.

Installed versions

--------Version info---------
Polars:              1.6.0
Index type:          UInt32
Platform:            macOS-14.6.1-arm64-arm-64bit
Python:              3.10.14 (main, Mar 19 2024, 21:46:16) [Clang 15.0.0 (clang-1500.3.9.4)]

----Optional dependencies----
adbc_driver_manager  <not installed>
altair               4.2.2
cloudpickle          3.0.0
connectorx           <not installed>
deltalake            <not installed>
fastexcel            <not installed>
fsspec               2024.3.1
gevent               <not installed>
great_tables         <not installed>
matplotlib           3.9.0
nest_asyncio         1.6.0
numpy                1.26.4
openpyxl             <not installed>
pandas               2.2.2
pyarrow              15.0.2
pydantic             2.7.3
pyiceberg            <not installed>
sqlalchemy           <not installed>
torch                2.2.2
xlsx2csv             <not installed>
xlsxwriter           <not installed>
@fedyakov fedyakov added bug Something isn't working needs triage Awaiting prioritization by a maintainer python Related to Python Polars labels Sep 10, 2024
@egaban

egaban commented Sep 27, 2024

I can confirm this issue on a 14-column dataframe WITHOUT any Object columns; the only dtypes are Float64, Int64, and String. Dropping any 3 columns makes it work.

Edit: more details: the dataframe was created with two cross joins.

@egaban

egaban commented Sep 27, 2024

Also, printing somehow works: print(df.filter(...)) prints the filtered dataframe as expected. It only breaks when the result is assigned to a variable.

@cmdlineluser
Contributor

@egaban If you can make a reproducible example (with synthetic data if needed), you should open a new issue.

Support for the Object dtype is very limited, so a case that reproduces without Object columns would get much higher priority from the devs.

@egaban
Copy link

egaban commented Sep 28, 2024

I just tried the exact same code with the same inputs on another machine and, somehow, it worked 🤯

Same Python and Polars versions, but the first machine runs Linux and the second macOS. I'll try to create a simple reproduction of the problem on the Linux machine and post it here.

@NXP-KetelsJ
Copy link

NXP-KetelsJ commented Oct 10, 2024

I see the same behavior with Polars 1.9.0 and Python 3.10 on a Windows machine.

Projects
Status: Done
6 participants