Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reorganize the GroupColumn implementations into more coherent modules #13262

Closed
Tracked by #12680
alamb opened this issue Nov 5, 2024 · 6 comments · Fixed by #13352
Closed
Tracked by #12680

Reorganize the GroupColumn implementations into more coherent modules #13262

alamb opened this issue Nov 5, 2024 · 6 comments · Fixed by #13352
Assignees
Labels
enhancement New feature or request good first issue Good for newcomers

Comments

@alamb
Copy link
Contributor

alamb commented Nov 5, 2024

Is your feature request related to a problem or challenge?

While reviewing #12996 I noticed that all the implementations of GroupColumn for different types were in datafusion/physical-plan/src/aggregates/group_values/group_column.rs which is:

  1. a single large file
  2. Not obvious that it is like an implementation detail of GroupValuesColumn: https://github.com/apache/datafusion/blob/main/datafusion/physical-plan/src/aggregates/group_values/column.rs#L94-L93

Describe the solution you'd like

I would like to propose rearrainging the code:

  1. Rename https://github.com/apache/datafusion/blob/main/datafusion/physical-plan/src/aggregates/group_values/column.rs to https://github.com/apache/datafusion/blob/main/datafusion/physical-plan/src/aggregates/group_values/multi_column/mod.rs
  2. Move the implementations in datafusion/physical-plan/src/aggregates/group_values/group_column.rs to different modules within https://github.com/apache/datafusion/blob/main/datafusion/physical-plan/src/aggregates/group_values/multi_column (like primitive.rs, bytes.rs, etc)

Describe alternatives you've considered

No response

Additional context

No response

@alamb alamb added the enhancement New feature or request label Nov 5, 2024
@alamb alamb added the good first issue Good for newcomers label Nov 5, 2024
@alamb
Copy link
Contributor Author

alamb commented Nov 5, 2024

Once #12996 is merged, this would be a good first issue I think as it is just code movement and somewhat mechanical It would be a good way to get introduced to the code

@Chen-Yuan-Lai
Copy link
Contributor

take

@alamb
Copy link
Contributor Author

alamb commented Nov 6, 2024

#12996 has now merged so this issue should now be unblocked

@jayzhan211
Copy link
Contributor

@Chen-Yuan-Lai Have you started on this?

@Chen-Yuan-Lai
Copy link
Contributor

Chen-Yuan-Lai commented Nov 11, 2024

Yes, I'm working on this, but have some questions:

  1. I trying to rename /column.rs to /multi_column/mod.rs, and move /group_column.rs to different module. simultaneously, I've also moved some shared trait and function to /multi_column/mod.rs. However, /multi_column/mod.rs itself also defines and implements some data structures (e.g.,GroupValuesColumn, GroupIndexView). It feels too complex as a mod.rs file. Should I split it into other files as well?

  2. In the original /group_column.rs, there were some test cases. After splitting /group_column.rs into three modules (/multi_column/byte.rs, /multi_column/byte_view.rs, and /multi_column/primitive.rs), I move the corresponding tests to their respective files. I'm not sure if this is reasonable, or if it would be better to keep all tests together in multi_column/test.rs?"

@jayzhan211
Copy link
Contributor

Or you can name it as multi_column.rs and add the submodule in this this file. Structure is similar to datafusion/expr-common/src/type_coercion.rs and you can place trait in multi_column.rs.

I think it is fine to place test in separated file.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request good first issue Good for newcomers
Projects
None yet
3 participants