Commit
Support vectorized append and compare for multi group by (#12996)
* simple support vectorized append.
* fix tests.
* some logs.
* add `append_n` in `MaybeNullBufferBuilder`.
* impl basic append_batch
* fix equal to.
* define `GroupIndexContext`.
* define the structs useful in vectorizing.
* re-define some structs for vectorized operations.
* impl some vectorized logic.
* impl checking hashmap stage.
* fix compile.
* tmp
* define and impl `vectorized_compare`.
* fix compile.
* impl `vectorized_equal_to`.
* impl `vectorized_append`.
* finish the basic vectorized ops logic.
* impl `take_n`.
* fix `renaming clear` and `groups fill`.
* fix infinite loop due to rehashing.
* fix vectorized append.
* add counter.
* use extend rather than resize.
* remove dbg!.
* remove reserve.
* refactor the code to make it simpler and more performant.
* clear `scalarized_indices` in `intern` to avoid some corner cases.
* fix `scalarized_equal_to`.
* fall back to the fully scalarized `GroupValuesColumn` in streaming aggregation.
* add unit test for `VectorizedGroupValuesColumn`.
* add unit test for emitting first n in `VectorizedGroupValuesColumn`.
* sort out test code for group columns and add vectorized tests for primitives.
* add vectorized test for byte builder.
* add vectorized test for byte view builder.
* add test for the all nulls or not nulls branches in vectorized.
* fix clippy.
* fix fmt.
* fix compile in Rust 1.79.
* improve comments.
* fix doc.
* add more comments to explain the really complex vectorized intern process.
* add comments to explain why we still need the original `GroupValuesColumn`.
* remove some stale comments.
* fix clippy.
* add comments for `vectorized_equal_to` and `vectorized_append`.
* fix clippy.
* use zip to simplify code.
* use izip to simplify code.
* Update datafusion/physical-plan/src/aggregates/group_values/group_column.rs

  Co-authored-by: Jay Zhan <jayzhan211@gmail.com>
* first_n attempt

  Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
* add test

  Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
* improve hashtable modifying in emit first n test.
* add `emit_group_index_list_buffer` to avoid allocating new `Vec` to store the remaining group indices.
* make comments in VectorizedGroupValuesColumn::intern simpler and clearer.
* define `VectorizedOperationBuffers` to hold buffers used in vectorized operations to make code clearer.
* unify `VectorizedGroupValuesColumn` and `GroupValuesColumn`.
* fix fmt.
* fix comments.
* fix clippy.

---------

Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
Co-authored-by: Jay Zhan <jayzhan211@gmail.com>
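
For readers unfamiliar with the idea, the following is a minimal sketch of what "vectorized append and compare" can look like for a single nullable group column. It is not the code from this commit: `PrimitiveColumn`, `demo`, and the method signatures are hypothetical, and only the names `vectorized_append` and `vectorized_equal_to` are borrowed from the message above. It assumes that nulls compare equal to nulls when grouping.

```rust
/// Hypothetical, simplified stand-in for one group-by column of nullable i64
/// values. This is an illustration only, not DataFusion's implementation.
#[derive(Default)]
struct PrimitiveColumn {
    /// One stored value per group, indexed by group index.
    values: Vec<Option<i64>>,
}

impl PrimitiveColumn {
    /// Append `input[row]` for every row in `rows` with a single call,
    /// instead of one call per row.
    fn vectorized_append(&mut self, input: &[Option<i64>], rows: &[usize]) {
        self.values.extend(rows.iter().map(|&row| input[row]));
    }

    /// For each pair `(lhs_rows[i], rhs_rows[i])`, keep `equal_to_results[i]`
    /// true only if the stored group value equals the input row.
    /// Entries that are already false are skipped, so the same result buffer
    /// can be passed through several columns in turn.
    fn vectorized_equal_to(
        &self,
        lhs_rows: &[usize],
        input: &[Option<i64>],
        rhs_rows: &[usize],
        equal_to_results: &mut [bool],
    ) {
        let pairs = lhs_rows.iter().zip(rhs_rows);
        for ((&lhs, &rhs), result) in pairs.zip(equal_to_results.iter_mut()) {
            if *result {
                // Assumption: `None == None` counts as a match when grouping.
                *result = self.values[lhs] == input[rhs];
            }
        }
    }
}

/// Hypothetical driver: store one batch, compare a second batch against the
/// stored groups, then append the rows that did not match.
fn demo() -> (Vec<bool>, Vec<Option<i64>>) {
    let mut column = PrimitiveColumn::default();

    let first_batch: [Option<i64>; 3] = [Some(1), None, Some(3)];
    column.vectorized_append(&first_batch, &[0, 1, 2]);

    let second_batch: [Option<i64>; 3] = [Some(3), None, Some(7)];
    // Candidate pairs: group 2 vs row 0, group 1 vs row 1, group 0 vs row 2.
    let mut results = vec![true; 3];
    column.vectorized_equal_to(&[2, 1, 0], &second_batch, &[0, 1, 2], &mut results);

    // Rows whose comparison failed become new groups.
    let new_rows: Vec<usize> = results
        .iter()
        .enumerate()
        .filter_map(|(row, &equal)| (!equal).then_some(row))
        .collect();
    column.vectorized_append(&second_batch, &new_rows);

    (results, column.values)
}
```

Calling `demo()` returns the per-row comparison results together with the stored group values after the second append. In a real multi-column group by, a shared result buffer of this kind would be refined by each column in turn, which is why the sketch skips entries that are already false.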