Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixed Dremel to properly shred/assemble nested structures with nullable elements #778

Merged
merged 2 commits into from
Nov 11, 2023

Conversation

norberttech
Copy link
Member

Change Log

Added

  • Dremel to properly shred/assemble nested structures with nullable elements

Fixed

Changed

  • Dremel algorithms are no longer working as Generators

Removed

Deprecated

Security


Description

While working on Dremel algorithms implementation I was mostly focused on non nullable nested structures, that's how this bug was missed in the first implementation.
After this fix, it will not only properly handle maps/lists with nulls but also empty maps/lists.
Dremel is no longer working as a generator because all data needs to sit in the memory anyway and since we are operating at small integers, this was bringing very little value compared to complexity of debugging it.

@@ -103,25 +121,4 @@ public function test_dremel_shred_on_repeated_columns() : void
]
);
}

public function test_dremel_with_rows_with_combined_scalar_values_and_array_values() : void
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This constraint was incorrect since there is a scenario where arrays also contain nulls, not only scalar values.

Copy link
Contributor

github-actions bot commented Nov 11, 2023

Flow PHP - Benchmarks

Results of the benchmarks from this PR are compared with the results from 1.x branch.

Extractors
+-----------------------+-------------------+------+-----+------------------+-------------------+-----------------+
| benchmark             | subject           | revs | its | mem_peak         | mode              | rstdev          |
+-----------------------+-------------------+------+-----+------------------+-------------------+-----------------+
| AvroExtractorBench    | bench_extract_10k | 1    | 3   | 35.271mb +0.00%  | 538.754ms +46.99% | ±2.45% +189.05% |
| CSVExtractorBench     | bench_extract_10k | 1    | 3   | 4.837mb +0.01%   | 438.088ms +62.60% | ±0.99% +409.14% |
| JsonExtractorBench    | bench_extract_10k | 1    | 3   | 5.007mb +0.01%   | 857.833ms +44.48% | ±3.21% +172.24% |
| ParquetExtractorBench | bench_extract_10k | 1    | 3   | 233.597mb +2.61% | 1.209s +54.30%    | ±1.61% +288.61% |
| TextExtractorBench    | bench_extract_10k | 1    | 3   | 4.831mb +0.01%   | 25.594ms +32.32%  | ±1.04% -28.38%  |
| XmlExtractorBench     | bench_extract_10k | 1    | 3   | 4.831mb +0.01%   | 639.734ms +58.02% | ±0.46% -62.15%  |
+-----------------------+-------------------+------+-----+------------------+-------------------+-----------------+
Transformers
+-----------------------------+--------------------------+------+-----+-----------------+------------------+---------------+
| benchmark                   | subject                  | revs | its | mem_peak        | mode             | rstdev        |
+-----------------------------+--------------------------+------+-----+-----------------+------------------+---------------+
| RenameEntryTransformerBench | bench_transform_10k_rows | 1    | 3   | 87.098mb +0.00% | 82.279ms +66.81% | ±1.14% -6.46% |
+-----------------------------+--------------------------+------+-----+-----------------+------------------+---------------+
Loaders
+--------------------+----------------+------+-----+------------------+-------------------+-----------------+
| benchmark          | subject        | revs | its | mem_peak         | mode              | rstdev          |
+--------------------+----------------+------+-----+------------------+-------------------+-----------------+
| AvroLoaderBench    | bench_load_10k | 1    | 3   | 95.281mb +0.00%  | 857.591ms +48.70% | ±2.04% +104.61% |
| CSVLoaderBench     | bench_load_10k | 1    | 3   | 46.114mb +0.00%  | 82.650ms +17.13%  | ±0.61% -65.91%  |
| JsonLoaderBench    | bench_load_10k | 1    | 3   | 90.558mb +0.00%  | 93.111ms +46.61%  | ±1.19% +9.18%   |
| ParquetLoaderBench | bench_load_10k | 1    | 3   | 286.030mb +2.23% | 2.463s +87.11%    | ±1.21% +52.68%  |
| TextLoaderBench    | bench_load_10k | 1    | 3   | 16.608mb +0.00%  | 43.523ms +7.33%   | ±1.22% +141.73% |
+--------------------+----------------+------+-----+------------------+-------------------+-----------------+
Building Blocks
+-------------------------+----------------------------+------+-----+-----------------+-------------------+-------------------------------+
| benchmark               | subject                    | revs | its | mem_peak        | mode              | rstdev                        |
+-------------------------+----------------------------+------+-----+-----------------+-------------------+-------------------------------+
| RowsBench               | bench_chunk_10_on_10k      | 2    | 3   | 60.707mb +0.00% | 5.616ms +151.11%  | ±1.25% -39.90%                |
| RowsBench               | bench_diff_left_1k_on_10k  | 2    | 3   | 80.499mb +0.00% | 225.495ms +48.52% | ±0.62% -71.82%                |
| RowsBench               | bench_diff_right_1k_on_10k | 2    | 3   | 59.025mb +0.00% | 20.509ms +35.79%  | ±1.28% +139.97%               |
| RowsBench               | bench_drop_1k_on_10k       | 2    | 3   | 59.846mb +0.00% | 3.071ms +74.09%   | ±2.95% +163.78%               |
| RowsBench               | bench_drop_right_1k_on_10k | 2    | 3   | 59.846mb +0.00% | 3.197ms +66.44%   | ±3.17% +92.95%                |
| RowsBench               | bench_entries_on_10k       | 2    | 3   | 59.059mb +0.00% | 4.767ms +73.34%   | ±1.95% +47.36%                |
| RowsBench               | bench_filter_on_10k        | 2    | 3   | 59.588mb +0.00% | 24.109ms +71.44%  | ±3.39% +173.25%               |
| RowsBench               | bench_find_on_10k          | 2    | 3   | 59.588mb +0.00% | 23.871ms +68.72%  | ±2.92% +604.03%               |
| RowsBench               | bench_find_one_on_10k      | 10   | 3   | 57.659mb +0.00% | 2.206μs +16.11%   | ±2.11% +18061489817610000.00% |
| RowsBench               | bench_first_on_10k         | 10   | 3   | 57.659mb +0.00% | 0.400μs 0.00%     | ±0.00% 0.00%                  |
| RowsBench               | bench_flat_map_on_1k       | 2    | 3   | 65.892mb +0.00% | 16.885ms +61.12%  | ±3.42% +67.57%                |
| RowsBench               | bench_map_on_10k           | 2    | 3   | 91.412mb +0.00% | 73.053ms +54.02%  | ±2.54% +506.37%               |
| RowsBench               | bench_merge_1k_on_10k      | 2    | 3   | 60.109mb +0.00% | 3.261ms +66.17%   | ±1.09% -26.66%                |
| RowsBench               | bench_partition_by_on_10k  | 2    | 3   | 62.379mb +0.00% | 59.537ms +77.61%  | ±2.45% +3531.50%              |
| RowsBench               | bench_remove_on_10k        | 2    | 3   | 62.209mb +0.00% | 9.349ms +96.04%   | ±2.84% +83.45%                |
| RowsBench               | bench_sort_asc_on_1k       | 2    | 3   | 57.659mb +0.00% | 61.812ms +63.11%  | ±0.82% +16.28%                |
| RowsBench               | bench_sort_by_on_1k        | 2    | 3   | 57.659mb +0.00% | 64.372ms +70.50%  | ±0.93% -21.45%                |
| RowsBench               | bench_sort_desc_on_1k      | 2    | 3   | 57.659mb +0.00% | 63.206ms +65.30%  | ±1.91% -28.01%                |
| RowsBench               | bench_sort_entries_on_1k   | 2    | 3   | 59.933mb +0.00% | 12.215ms +67.36%  | ±3.52% +369.21%               |
| RowsBench               | bench_sort_on_1k           | 2    | 3   | 57.658mb +0.00% | 42.951ms +50.60%  | ±2.90% +99.45%                |
| RowsBench               | bench_take_1k_on_10k       | 10   | 3   | 57.659mb +0.00% | 25.596μs +97.20%  | ±2.41% +85.34%                |
| RowsBench               | bench_take_right_1k_on_10k | 10   | 3   | 57.659mb +0.00% | 37.688μs +144.63% | ±0.25% -17.98%                |
| RowsBench               | bench_unique_on_1k         | 2    | 3   | 80.500mb +0.00% | 223.549ms +42.78% | ±1.20% +307.85%               |
| NativeEntryFactoryBench | bench_entry_factory        | 1    | 3   | 93.715mb +0.00% | 203.846ms +64.64% | ±1.07% +166.39%               |
| NativeEntryFactoryBench | bench_entry_factory        | 1    | 3   | 48.632mb +0.00% | 109.705ms +75.31% | ±3.51% +98.71%                |
| NativeEntryFactoryBench | bench_entry_factory        | 1    | 3   | 12.653mb +0.01% | 24.297ms +62.36%  | ±0.88% -67.86%                |
+-------------------------+----------------------------+------+-----+-----------------+-------------------+-------------------------------+

@norberttech norberttech merged commit b11906c into flow-php:1.x Nov 11, 2023
14 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant