Systematic fuzz testing for parquet predicate pushdown #12115

Open
Tracked by #13648
alamb opened this issue Aug 22, 2024 · 1 comment
Labels
enhancement New feature or request

Comments

@alamb
Contributor

alamb commented Aug 22, 2024

Is your feature request related to a problem or challenge?

We have several forms of predicate pushdown in DataFusion's Parquet reader. The code path taken depends on the exact data layout and on the predicates used.

@itsjunetime is working on #4028 to improve performance by being more clever about some of these predicates.

The code paths currently taken depend on:

  1. Row group size
  2. Sort order of the data within the file
  3. File repartitioning size (how many partitions are read)
  4. Number of row groups
  5. Data page size
  6. Whether predicate pushdown is enabled
  7. Whether predicate reordering is enabled

Describe the solution you'd like

I would like additional correctness test coverage for reading Parquet files with the various forms of pushdown enabled, since each combination of settings can take a different code path.

Describe alternatives you've considered

I would like to have a test that

  1. Creates multiple Parquet files that hold the same data with different orderings, row group distributions, etc.
  2. Runs the same query against each file
  3. Compares the results of the different runs and ensures they are the same
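The three steps above can be sketched in miniature without any Parquet machinery. The following is only an illustration under stated assumptions, not DataFusion code: `FakeFile`, `write_layout`, `scan_full`, `scan_pruned`, and `check_layouts` are hypothetical names, a "file" is just a list of in-memory row groups, and the only predicate is `value > threshold`. The pruned scan skips a row group when its maximum shows that no row can match, which stands in for the real pushdown paths.

```rust
/// Hypothetical stand-in for a Parquet file: the rows split into row groups.
#[derive(Debug, Clone)]
struct FakeFile {
    row_groups: Vec<Vec<i64>>,
}

/// Step 1: lay out the same logical rows with a given row group size,
/// optionally sorting them first.
fn write_layout(rows: &[i64], row_group_size: usize, sorted: bool) -> FakeFile {
    let mut rows = rows.to_vec();
    if sorted {
        rows.sort();
    }
    FakeFile {
        row_groups: rows
            .chunks(row_group_size.max(1))
            .map(|chunk| chunk.to_vec())
            .collect(),
    }
}

/// Reference scan: evaluate `value > threshold` on every row.
fn scan_full(file: &FakeFile, threshold: i64) -> Vec<i64> {
    file.row_groups
        .iter()
        .flatten()
        .copied()
        .filter(|v| *v > threshold)
        .collect()
}

/// Scan with pruning: skip any row group whose maximum is not above the
/// threshold, then filter the remaining rows.
fn scan_pruned(file: &FakeFile, threshold: i64) -> Vec<i64> {
    file.row_groups
        .iter()
        .filter(|rg| rg.iter().max().map_or(false, |max| *max > threshold))
        .flatten()
        .copied()
        .filter(|v| *v > threshold)
        .collect()
}

/// Sort the output so the comparison ignores row order.
fn normalized(mut rows: Vec<i64>) -> Vec<i64> {
    rows.sort();
    rows
}

/// Steps 2 and 3: run the same query on every layout with both scans and
/// compare each result with a baseline computed directly from the input.
/// Returns the number of comparisons made, or a description of the mismatch.
fn check_layouts(rows: &[i64], threshold: i64) -> Result<usize, String> {
    let baseline = normalized(rows.iter().copied().filter(|v| *v > threshold).collect());
    let mut checked = 0;
    for row_group_size in [1, 3, 1000] {
        for sorted in [false, true] {
            let file = write_layout(rows, row_group_size, sorted);
            let results = [
                ("full", scan_full(&file, threshold)),
                ("pruned", scan_pruned(&file, threshold)),
            ];
            for (name, got) in results {
                let got = normalized(got);
                if got != baseline {
                    return Err(format!(
                        "{} scan mismatch (row_group_size={}, sorted={}): {:?} != {:?}",
                        name, row_group_size, sorted, got, baseline
                    ));
                }
                checked += 1;
            }
        }
    }
    Ok(checked)
}
```

Calling `check_layouts(&[5, 1, 9, 3, 7, 2], 4)` exercises six layouts with two scans each. Results are sorted before comparison because row order may legitimately differ between layouts; a real test would need the same care unless the query has an `ORDER BY`.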

Parameters to check

  1. Row group size
  2. Sort order
  3. Number of row groups
  4. Data page size
  5. Use predicate pushdown
  6. Use predicate reordering
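One way to cover these parameters systematically is to enumerate every combination as a test case. The sketch below assumes a hypothetical `FuzzCase` struct and `all_cases` helper (neither exists in DataFusion) and only builds the grid; mapping each field onto the actual writer and reader settings is left out.

```rust
/// Hypothetical description of one test case: one value per parameter
/// in the list above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct FuzzCase {
    row_group_size: usize,
    sorted: bool,
    num_row_groups: usize,
    data_page_size: usize,
    pushdown: bool,
    reorder: bool,
}

/// Build the full Cartesian product of the candidate values.
/// The three boolean parameters are always tried both ways.
fn all_cases(rg_sizes: &[usize], rg_counts: &[usize], page_sizes: &[usize]) -> Vec<FuzzCase> {
    let mut cases = Vec::new();
    for &row_group_size in rg_sizes {
        for &num_row_groups in rg_counts {
            for &data_page_size in page_sizes {
                for sorted in [false, true] {
                    for pushdown in [false, true] {
                        for reorder in [false, true] {
                            cases.push(FuzzCase {
                                row_group_size,
                                sorted,
                                num_row_groups,
                                data_page_size,
                                pushdown,
                                reorder,
                            });
                        }
                    }
                }
            }
        }
    }
    cases
}
```

The grid grows multiplicatively, so `all_cases(&[2, 1024], &[1, 4], &[64])` already yields 32 cases. If the full product becomes too slow, a fuzz test could instead sample a random subset of cases on each run.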

Additional context

No response

@alamb
Contributor Author

alamb commented Aug 22, 2024

I would also like to get coverage for when the "force string view" feature is enabled.
