Implement Property Tests for DataFrame.new #1012

Merged
merged 9 commits into from
Nov 14, 2024

@maennchen maennchen commented Oct 29, 2024

Follow up from #1011 (comment)

Property tests for DataFrame.new seem to be helpful. I have already found a panic with them, along with lots of other issues.

This PR is not finished / ready to merge. I just wanted to give it a go.

Feel free to:

  • Tell me to add / change things
  • Appropriate the PR and do things yourself (it probably makes more sense for someone else to take over and directly get rid of the detected issues).

Example uncovered issue:

mix test --exclude test --include property              
Running ExUnit with seed: 393713, max_cases: 16
Excluding tags: [:test, :cloud_integration]
Including tags: [:property]

thread '<unnamed>' panicked at /home/vscode/.asdf/installs/rust/1.82.0/registry/src/index.crates.io-6f17d22bba15001f/polars-core-0.43.1/src/named_from.rs:170:56:
called `Result::unwrap()` on an `Err` value: SchemaMismatch(ErrString("cannot unpack series, data types don't match"))
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
thread '<unnamed>' panicked at /home/vscode/.asdf/installs/rust/1.82.0/registry/src/index.crates.io-6f17d22bba15001f/polars-core-0.43.1/src/named_from.rs:170:56:
called `Result::unwrap()` on an `Err` value: SchemaMismatch(ErrString("cannot unpack series, data types don't match"))
thread '<unnamed>' panicked at /home/vscode/.asdf/installs/rust/1.82.0/registry/src/index.crates.io-6f17d22bba15001f/polars-core-0.43.1/src/named_from.rs:170:56:
called `Result::unwrap()` on an `Err` value: SchemaMismatch(ErrString("cannot unpack series, data types don't match"))
thread '<unnamed>' panicked at /home/vscode/.asdf/installs/rust/1.82.0/registry/src/index.crates.io-6f17d22bba15001f/polars-core-0.43.1/src/named_from.rs:170:56:
called `Result::unwrap()` on an `Err` value: SchemaMismatch(ErrString("cannot unpack series, data types don't match"))
thread '<unnamed>' panicked at /home/vscode/.asdf/installs/rust/1.82.0/registry/src/index.crates.io-6f17d22bba15001f/polars-core-0.43.1/src/named_from.rs:170:56:
called `Result::unwrap()` on an `Err` value: SchemaMismatch(ErrString("cannot unpack series, data types don't match"))
thread '<unnamed>' panicked at /home/vscode/.asdf/installs/rust/1.82.0/registry/src/index.crates.io-6f17d22bba15001f/polars-core-0.43.1/src/named_from.rs:170:56:
called `Result::unwrap()` on an `Err` value: SchemaMismatch(ErrString("cannot unpack series, data types don't match"))
thread '<unnamed>' panicked at /home/vscode/.asdf/installs/rust/1.82.0/registry/src/index.crates.io-6f17d22bba15001f/polars-core-0.43.1/src/named_from.rs:170:56:
called `Result::unwrap()` on an `Err` value: SchemaMismatch(ErrString("cannot unpack series, data types don't match"))
thread '<unnamed>' panicked at /home/vscode/.asdf/installs/rust/1.82.0/registry/src/index.crates.io-6f17d22bba15001f/polars-core-0.43.1/src/named_from.rs:170:56:
called `Result::unwrap()` on an `Err` value: SchemaMismatch(ErrString("cannot unpack series, data types don't match"))


  1) property should be able to create a DataFrame from any valid data / dtype combination (Explorer.DataFrameTest)
     test/explorer/data_frame_test.exs:4714
     ** (ExUnitProperties.Error) failed with generated values (after 8 successful runs):

         * Clause:    dtype <- Explorer.Generator.dtype()
           Generated: {:list, {:list, {:decimal, 2, 1}}}
         
         * Clause:    data <- list_of(fixed_map(%{"field" => Explorer.Generator.data_for_dtype(dtype)}))
           Generated: [%{"field" => [[Decimal.new("50.2")], nil, [nil, Decimal.new("73.1"), Decimal.new("14.4"), nil, Decimal.new("43.3")], nil, nil, nil, [nil, nil, nil], [Decimal.new("9.4"), nil, Decimal.new("72.5"), nil, Decimal.new("94.8"), nil], [Decimal.new("65.7"), Decimal.new("52.3"), nil, Decimal.new("41.1"), Decimal.new("3.6")]]}]

     got exception:

         ** (ArgumentError) cannot create series "field": Erlang error: :nif_panicked
     code: check all(
     stacktrace:
       (explorer 0.11.0-dev) lib/explorer/polars_backend/data_frame.ex:578: Explorer.PolarsBackend.DataFrame.series_from_list!/3
       (explorer 0.11.0-dev) lib/explorer/polars_backend/data_frame.ex:529: anonymous fn/3 in Explorer.PolarsBackend.DataFrame.from_tabular/2
       (elixir 1.17.3) lib/enum.ex:1703: Enum."-map/2-lists^map/1-1-"/2
       (explorer 0.11.0-dev) lib/explorer/polars_backend/data_frame.ex:523: Explorer.PolarsBackend.DataFrame.from_tabular/2
       test/explorer/data_frame_test.exs:4721: anonymous fn/3 in Explorer.DataFrameTest."property should be able to create a DataFrame from any valid data / dtype combination"/1
       (stream_data 1.1.1) lib/stream_data.ex:2367: StreamData.shrink_failure/6
       (stream_data 1.1.1) lib/stream_data.ex:2327: StreamData.check_all/7
       test/explorer/data_frame_test.exs:4715: (test)

.
Finished in 7.7 seconds (7.7s async, 0.00s sync)
617 doctests, 2 properties, 1524 tests, 1 failure, 2141 excluded
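
The two failing clauses above first draw a dtype and then draw data shaped by that dtype. A minimal sketch of that pattern with StreamData follows. The `GeneratorSketch` module, its function bodies, and the three leaf dtypes are hypothetical; this is not the contents of `test/support/generator.ex`.

```elixir
# Assumption: run as a standalone script; stream_data is fetched on the fly.
Mix.install([{:stream_data, "~> 1.1"}])

defmodule GeneratorSketch do
  import StreamData

  # Generates a leaf dtype or an arbitrarily nested {:list, inner} dtype.
  def dtype do
    tree(member_of([:boolean, :string, {:s, 64}]), fn inner ->
      map(inner, &{:list, &1})
    end)
  end

  # Generates a single (nullable) value matching the given dtype.
  def data_for_dtype(:boolean), do: nullable(boolean())
  def data_for_dtype(:string), do: nullable(string(:alphanumeric))
  def data_for_dtype({:s, 64}), do: nullable(integer())
  def data_for_dtype({:list, inner}), do: nullable(list_of(data_for_dtype(inner)))

  # Any value may also be nil, as in the generated data shown above.
  defp nullable(gen), do: one_of([constant(nil), gen])
end

# Draw a dtype first, then a value that depends on it.
pairs =
  StreamData.bind(GeneratorSketch.dtype(), fn dtype ->
    StreamData.map(GeneratorSketch.data_for_dtype(dtype), &{dtype, &1})
  end)

pairs |> Enum.take(3) |> IO.inspect()
```

Making the data generator a function of the dtype keeps every generated value valid by construction, so a failure points at the code under test rather than at a mismatched input.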

@billylanchantin billylanchantin left a comment

I really like this PR, thanks for jumping on it!

Sorry, I know it's still in draft, but I've added a few suggestions. Also, please note that we have our own home-grown Explorer.Duration struct (until we require Elixir 1.17), so Duration.new! won't work.

test/support/generator.ex (3 review threads, outdated and resolved)
Co-authored-by: Billy Lanchantin <william.lanchantin@cargosense.com>

maennchen commented Oct 30, 2024

@billylanchantin Thanks for the feedback. All should be fixed / applied.

I have marked this PR as a draft since I don't intend to fix the bugs it uncovers or to define more meaningful tests based on it (somebody more familiar with the internals will do a much better job at that).

I'll happily get the generator ready, however, so that someone else can use it to root out all the bugs.

@billylanchantin

@maennchen The plan of keeping this as a draft for now makes sense to me.

Related: above, you said we (the team) could take over the PR and start making tests based on what we find. Would you mind if I do that? I've made some tweaks to the generator locally and I've already found out a few things.

Findings so far:

  • Decimal has some issues. We'll need to fix.
  • After tweaking the generator, I've not found any non-decimal problems with DF.new (woo!).
  • I have, however, found a ton of problems with our printing/inspecting capabilities. That's actually the root of your original issue, "NIF panic with list of struct" (#1011). It's not the DF.new call that panics, it's the dbg call.

Mind if I push my changes?

@maennchen

@billylanchantin Feel free to take over 😊

We should probably also property test the data export formats. #1011 manifests for me both when inspecting and when storing as a JSON or Parquet file.

The existing property test used a special case of the new generator logic. This replaces that special case with the new generator.

We should be able to serialize any DataFrame or document the cases where we can't. None of these are working at the moment.
@billylanchantin billylanchantin marked this pull request as ready for review November 12, 2024 16:16

@billylanchantin billylanchantin left a comment

The team has decided that we'll try to merge a version of this PR with skipped properties. Then we'll try to address the various issues later. We think this approach will be easier to maintain.

I'll also create an issue to track the problems that these property tests have uncovered.

@josevalim

Btw, I think we should have property-based testing disabled altogether by default and enable it on CI only. There is no need to spend CPU cycles on properties for most of the test runs that the core team and contributors do. They are very valuable on CI, though.
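
One way to wire this up, as a sketch and not a description of this repository's actual setup: `ExUnitProperties` tags each `property` with `:property`, so `test/test_helper.exs` can exclude that tag unless an environment variable is present. The `CI` variable name is an assumption (GitHub Actions sets it to `true`).

```elixir
# test/test_helper.exs (sketch; the CI variable name is an assumption)
# Run properties only when CI is set; skip them on local runs.
exclude = if System.get_env("CI"), do: [], else: [:property]

ExUnit.start(exclude: exclude)
```

With this in place, a local `mix test --include property` still opts back in to the properties.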

@billylanchantin

@josevalim I agree. I actually thought it was already like that but it's not working. Will fix.

Before, we weren't actually skipping property tests. We were excluding the `property` tag, but that tag wasn't actually being set.

Turns out we _were_ setting the `property` tag.
@billylanchantin

@josevalim All is good. Turns out we were excluding property tests outside of CI all along.

Related: TIL that if you use the file:line syntax when running tests, your other --include/--exclude flags are ignored...

@philss philss left a comment

LGTM!

@billylanchantin billylanchantin merged commit 7c5a087 into elixir-explorer:main Nov 14, 2024
3 checks passed
@maennchen maennchen deleted the property_tests branch November 14, 2024 07:02