Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Invalid Struct contents with {:list, {:struct, %{...}} #847

Closed
maennchen opened this issue Feb 6, 2024 · 7 comments · Fixed by #849
Closed

Invalid Struct contents with {:list, {:struct, %{...}} #847

maennchen opened this issue Feb 6, 2024 · 7 comments · Fixed by #849

Comments

@maennchen
Copy link
Contributor

maennchen commented Feb 6, 2024

Versions

Explorer: 0.8.0
Elixir: 1.16.0
OTP: 26.2.1

Reproduction

dbg Series.from_list([[%{"a" => 1}]], dtype: {:list, {:struct, %{"a" => :integer}}})
#=> #Explorer.Series<
#   Polars[1]
#   list[struct[1]] [[%{"a" => 1}]]
# >
dbg Series.from_list([[], [%{"a" => 1}]], dtype: {:list, {:struct, %{"a" => :integer}}})
#=> #Explorer.Series<
#   Polars[2]
#   list[struct[1]] [[], [%{"" => nil}]]
# >

Expected

The series with an empty list at the start should have [%{"a" => 1}] as the value for the second item and not [%{"" => nil}].

@philss
Copy link

philss commented Feb 7, 2024

I'm investigating this. It looks like a problem when we try to create the series in the backend (Rust side). For some reason, the series seems to lose its data (or dtype). For what I could see, the data is arriving correctly when we pass down to Polars. I will try to reproduce it in an isolated project.

philss added a commit that referenced this issue Feb 7, 2024
philss added a commit that referenced this issue Feb 7, 2024
* Fix `from_list/2` of list of structs when fist is empty

Closes #847

* Update test/explorer/series/list_test.exs
@maennchen
Copy link
Contributor Author

maennchen commented Feb 7, 2024

@philss Thanks for your fix!

Unfortunately while testing the newest main, I now often get nif panics.

Code

Mix.install([{:explorer, github: "elixir-explorer/explorer"}])

[
  [],
  [%{"a" => "a", "b" => "b"}]
]
|> Explorer.Series.from_list(dtype: {:list, {:struct, %{"a" => :string, "b" => :string}}})
|> dbg

Running this script 100 times results in 73 panics and 27 successful runs on my machine.

Ok Output

[[], [%{"a" => "a", "b" => "b"}]] #=> [[], [%{"a" => "a", "b" => "b"}]]
|> Explorer.Series.from_list(dtype: {:list, {:struct, %{"a" => :string, "b" => :string}}}) #=> #Explorer.Series<
  Polars[2]
  list[struct[2]] [[], [%{"a" => "a", "b" => "b"}]]
>

Error Output

thread '<unnamed>' panicked at /.../.cargo/registry/src/index.crates.io-6f17d22bba15001f/polars-arrow-0.36.2/src/array/list/mod.rs:82:61:
called `Result::unwrap()` on an `Err` value: ComputeError(ErrString("ListArray's child's DataType must match. However, the expected DataType is Struct([Field { name: \"a\", data_type: LargeUtf8, is_nullable: true, metadata: {} }, Field { name: \"b\", data_type: LargeUtf8, is_nullable: true, metadata: {} }]) while it got Struct([Field { name: \"b\", data_type: LargeUtf8, is_nullable: true, metadata: {} }, Field { name: \"a\", data_type: LargeUtf8, is_nullable: true, metadata: {} }])."))
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
** (ErlangError) Erlang error: :nif_panicked
    (explorer 0.9.0-dev) Explorer.PolarsBackend.Native.s_from_list_of_series("", [#Explorer.PolarsBackend.Series<
    shape: (0,)
    Series: '' [struct[2]]
    [
    ]
  >, #Explorer.PolarsBackend.Series<
    shape: (1,)
    Series: '' [struct[2]]
    [
    	{"b","a"}
    ]
  >])
    (explorer 0.9.0-dev) lib/explorer/polars_backend/series.ex:24: Explorer.PolarsBackend.Series.from_list/2
    (explorer 0.9.0-dev) lib/explorer/series.ex:417: Explorer.Series.from_list/2
    ...: (file)
Full RUST_BACKTRACE
thread '<unnamed>' panicked at /.../.cargo/registry/src/index.crates.io-6f17d22bba15001f/polars-arrow-0.36.2/src/array/list/mod.rs:82:61:
called `Result::unwrap()` on an `Err` value: ComputeError(ErrString("ListArray's child's DataType must match. However, the expected DataType is Struct([Field { name: \"a\", data_type: LargeUtf8, is_nullable: true, metadata: {} }, Field { name: \"b\", data_type: LargeUtf8, is_nullable: true, metadata: {} }]) while it got Struct([Field { name: \"b\", data_type: LargeUtf8, is_nullable: true, metadata: {} }, Field { name: \"a\", data_type: LargeUtf8, is_nullable: true, metadata: {} }])."))
stack backtrace:
   0:     0x7f05c347c939 - std::backtrace_rs::backtrace::libunwind::trace::h4c39d39dfbe9783d
                               at /rustc/d6d7a93866f2ffcfb51828b8859bdad760b54ce0/library/std/src/../../backtrace/src/backtrace/libunwind.rs:104:5
   1:     0x7f05c347c939 - std::backtrace_rs::backtrace::trace_unsynchronized::h4dc8ace54ae311e5
                               at /rustc/d6d7a93866f2ffcfb51828b8859bdad760b54ce0/library/std/src/../../backtrace/src/backtrace/mod.rs:66:5
   2:     0x7f05c347c939 - std::sys_common::backtrace::_print_fmt::hb93bb30078f88f5e
                               at /rustc/d6d7a93866f2ffcfb51828b8859bdad760b54ce0/library/std/src/sys_common/backtrace.rs:68:5
   3:     0x7f05c347c939 - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::hc0a088b7e0825f6a
                               at /rustc/d6d7a93866f2ffcfb51828b8859bdad760b54ce0/library/std/src/sys_common/backtrace.rs:44:22
   4:     0x7f05c13f3f80 - core::fmt::rt::Argument::fmt::h3bf37701eb7e7bdb
                               at /rustc/d6d7a93866f2ffcfb51828b8859bdad760b54ce0/library/core/src/fmt/rt.rs:142:9
   5:     0x7f05c13f3f80 - core::fmt::write::h7e6ab5f6daf5c422
                               at /rustc/d6d7a93866f2ffcfb51828b8859bdad760b54ce0/library/core/src/fmt/mod.rs:1120:17
   6:     0x7f05c344e642 - std::io::Write::write_fmt::h08c3306cb5bf723d
                               at /rustc/d6d7a93866f2ffcfb51828b8859bdad760b54ce0/library/std/src/io/mod.rs:1810:15
   7:     0x7f05c347e76e - std::sys_common::backtrace::_print::hafed43d2fa285a46
                               at /rustc/d6d7a93866f2ffcfb51828b8859bdad760b54ce0/library/std/src/sys_common/backtrace.rs:47:5
   8:     0x7f05c347e76e - std::sys_common::backtrace::print::h39c23a58558ca2e2
                               at /rustc/d6d7a93866f2ffcfb51828b8859bdad760b54ce0/library/std/src/sys_common/backtrace.rs:34:9
   9:     0x7f05c347def0 - std::panicking::default_hook::{{closure}}::h33423deafbc8e615
                               at /rustc/d6d7a93866f2ffcfb51828b8859bdad760b54ce0/library/std/src/panicking.rs:272:22
  10:     0x7f05c347f0bc - std::panicking::default_hook::h50ae27bf0009ee3b
                               at /rustc/d6d7a93866f2ffcfb51828b8859bdad760b54ce0/library/std/src/panicking.rs:292:9
  11:     0x7f05c347f0bc - std::panicking::rust_panic_with_hook::hbc586ac9df6b73a8
                               at /rustc/d6d7a93866f2ffcfb51828b8859bdad760b54ce0/library/std/src/panicking.rs:779:13
  12:     0x7f05c347eac0 - std::panicking::begin_panic_handler::{{closure}}::hf311b40154e75f12
                               at /rustc/d6d7a93866f2ffcfb51828b8859bdad760b54ce0/library/std/src/panicking.rs:657:13
  13:     0x7f05c347ea16 - std::sys_common::backtrace::__rust_end_short_backtrace::he9e1eb231bad02f7
                               at /rustc/d6d7a93866f2ffcfb51828b8859bdad760b54ce0/library/std/src/sys_common/backtrace.rs:171:18
  14:     0x7f05c347ea0f - rust_begin_unwind
                               at /rustc/d6d7a93866f2ffcfb51828b8859bdad760b54ce0/library/std/src/panicking.rs:645:5
  15:     0x7f05c111fa14 - core::panicking::panic_fmt::hefa1490703b8b929
                               at /rustc/d6d7a93866f2ffcfb51828b8859bdad760b54ce0/library/core/src/panicking.rs:72:14
  16:     0x7f05c111ffc2 - core::result::unwrap_failed::h878cb6d8da1ac1b0
                               at /rustc/d6d7a93866f2ffcfb51828b8859bdad760b54ce0/library/core/src/result.rs:1649:5
  17:     0x7f05c19f938e - polars_arrow::legacy::array::list::AnonymousBuilder::finish::h9feaad5861eb20e0
  18:     0x7f05c2339b5b - <polars_core::chunked_array::builder::list::anonymous::AnonymousOwnedListBuilder as polars_core::chunked_array::builder::list::ListBuilderTrait>::finish::hdaad3d20f84bbcca
  19:     0x7f05c1612394 - <explorer::series::s_from_list_of_series as rustler::nif::Nif>::RAW_FUNC::nif_func::hdafda331b76b1983
  20:     0x55d19e4b79a6 - erts_call_dirty_nif
                               at /home/maennchen/.asdf/plugins/erlang/kerl-home/builds/asdf_26.2.1/otp_src_26.2.1/erts/emulator/beam/erl_nif.c:466:19
  21:     0x55d19e32d61b - erts_dirty_process_main
                               at /home/maennchen/.asdf/plugins/erlang/kerl-home/builds/asdf_26.2.1/otp_src_26.2.1/erts/emulator/beam/beam_common.c:280:23
  22:     0x55d19e25d647 - sched_dirty_cpu_thread_func
                               at /home/maennchen/.asdf/plugins/erlang/kerl-home/builds/asdf_26.2.1/otp_src_26.2.1/erts/emulator/beam/erl_process.c:8718:5
  23:     0x55d19e585313 - thr_wrapper
                               at /home/maennchen/.asdf/plugins/erlang/kerl-home/builds/asdf_26.2.1/otp_src_26.2.1/erts/lib_src/pthread/ethread.c:116:26
  24:     0x7f065f4c6ac3 - start_thread
                               at ./nptl/pthread_create.c:442:8
  25:     0x7f065f558850 - __GI___clone3
                               at ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
  26:                0x0 - <unknown>
** (ErlangError) Erlang error: :nif_panicked
    (explorer 0.9.0-dev) Explorer.PolarsBackend.Native.s_from_list_of_series("", [#Explorer.PolarsBackend.Series<
    shape: (0,)
    Series: '' [struct[2]]
    [
    ]
  >, #Explorer.PolarsBackend.Series<
    shape: (1,)
    Series: '' [struct[2]]
    [
    	{"b","a"}
    ]
  >])
    (explorer 0.9.0-dev) lib/explorer/polars_backend/series.ex:24: Explorer.PolarsBackend.Series.from_list/2
    (explorer 0.9.0-dev) lib/explorer/series.ex:417: Explorer.Series.from_list/2
    ...: (file)

@maennchen
Copy link
Contributor Author

maennchen commented Feb 7, 2024

I think this might have to do with map order based on the error message.

@philss
Copy link

philss commented Feb 7, 2024

I think this might have to do with map order based on the error message.

@maennchen makes sense! Not sure yet how to solve this, but I'm going to take a look. Thanks for the reports!

@maennchen
Copy link
Contributor Author

@philss To acurately represent the dtype, we probably have to switch from {:struct, %{"a" => :type, "b" => :type}} to {:struct, a: :type, b: :type}.

Otherwise it is impossible to make sure that the structure is represented correctly.

Doing that will allow to sort the columns correctly when reading them from elixir.

@josevalim
Copy link
Member

I think that's a solid idea that will make it look more elegant too.

@philss
Copy link

philss commented Feb 8, 2024

@josevalim @maennchen please take a look: #851. I think it will fix the problem :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants