Improve performance of ArrayContentDetector #824

Merged: 1 commit into flow-php:1.x from chore/perf-array-detect, Nov 25, 2023

Conversation

@stloyd (Member) commented Nov 22, 2023

Change Log

Added

Fixed

Changed

  • Improve performance of `ArrayContentDetector`

Removed

Deprecated

Security


Description

Blackfire profile comparison: https://blackfire.io/profiles/compare/3bb42ece-7de3-489f-b640-44428c15380a/graph

Summary by CodeRabbit

  • Refactor
    • Improved the internal logic for content type detection in arrays to enhance performance and accuracy.
    • Removed an obsolete function related to array key validation, streamlining the class's public interface.
  • Bug Fixes
    • Adjusted array handling methods to prevent potential misclassification of array types.

Flow PHP - Benchmarks

Benchmark results from this PR are compared with results from the `1.x` branch.

Extractors
+-----------------------+-------------------+------+-----+------------------+------------------+-----------------+
| benchmark             | subject           | revs | its | mem_peak         | mode             | rstdev          |
+-----------------------+-------------------+------+-----+------------------+------------------+-----------------+
| AvroExtractorBench    | bench_extract_10k | 1    | 3   | 34.759mb -0.00%  | 659.298ms -3.71% | ±0.68% -55.95%  |
| CSVExtractorBench     | bench_extract_10k | 1    | 3   | 4.624mb -0.02%   | 298.394ms -0.70% | ±1.83% +322.47% |
| JsonExtractorBench    | bench_extract_10k | 1    | 3   | 4.783mb -0.03%   | 904.474ms -2.29% | ±1.03% +67.63%  |
| ParquetExtractorBench | bench_extract_10k | 1    | 3   | 239.493mb -0.00% | 1.104s +0.70%    | ±1.19% +966.02% |
| TextExtractorBench    | bench_extract_10k | 1    | 3   | 4.562mb +0.01%   | 25.123ms +3.04%  | ±0.82% +5.48%   |
| XmlExtractorBench     | bench_extract_10k | 1    | 3   | 4.563mb +0.01%   | 406.756ms -0.42% | ±0.18% -0.95%   |
+-----------------------+-------------------+------+-----+------------------+------------------+-----------------+
Transformers
+-----------------------------+--------------------------+------+-----+------------------+-----------------+----------------+
| benchmark                   | subject                  | revs | its | mem_peak         | mode            | rstdev         |
+-----------------------------+--------------------------+------+-----+------------------+-----------------+----------------+
| RenameEntryTransformerBench | bench_transform_10k_rows | 1    | 3   | 110.247mb -0.00% | 63.124ms +1.56% | ±0.70% -68.14% |
+-----------------------------+--------------------------+------+-----+------------------+-----------------+----------------+
Loaders
+--------------------+----------------+------+-----+------------------+------------------+----------------+
| benchmark          | subject        | revs | its | mem_peak         | mode             | rstdev         |
+--------------------+----------------+------+-----+------------------+------------------+----------------+
| AvroLoaderBench    | bench_load_10k | 1    | 3   | 94.420mb -0.00%  | 441.167ms -0.58% | ±1.13% +81.66% |
| CSVLoaderBench     | bench_load_10k | 1    | 3   | 54.729mb -0.00%  | 70.343ms -1.14%  | ±0.42% -60.95% |
| JsonLoaderBench    | bench_load_10k | 1    | 3   | 105.002mb -0.00% | 56.829ms +4.48%  | ±0.62% -63.91% |
| ParquetLoaderBench | bench_load_10k | 1    | 3   | 320.480mb -0.00% | 1.477s +0.13%    | ±0.64% -55.72% |
| TextLoaderBench    | bench_load_10k | 1    | 3   | 17.604mb -0.01%  | 41.634ms +1.51%  | ±0.33% -28.05% |
+--------------------+----------------+------+-----+------------------+------------------+----------------+
Building Blocks
+-------------------------+----------------------------+------+-----+------------------+------------------+-----------------+
| benchmark               | subject                    | revs | its | mem_peak         | mode             | rstdev          |
+-------------------------+----------------------------+------+-----+------------------+------------------+-----------------+
| RowsBench               | bench_chunk_10_on_10k      | 2    | 3   | 76.301mb -0.00%  | 2.850ms +5.88%   | ±1.95% +77.43%  |
| RowsBench               | bench_diff_left_1k_on_10k  | 2    | 3   | 96.092mb -0.00%  | 175.253ms -1.44% | ±2.75% +333.99% |
| RowsBench               | bench_diff_right_1k_on_10k | 2    | 3   | 74.618mb -0.00%  | 17.583ms -0.35%  | ±0.88% +216.93% |
| RowsBench               | bench_drop_1k_on_10k       | 2    | 3   | 77.541mb -0.00%  | 1.836ms +12.31%  | ±2.41% +23.22%  |
| RowsBench               | bench_drop_right_1k_on_10k | 2    | 3   | 77.541mb -0.00%  | 1.775ms +7.79%   | ±2.80% -19.38%  |
| RowsBench               | bench_entries_on_10k       | 2    | 3   | 74.653mb -0.00%  | 2.919ms +13.44%  | ±1.63% +292.56% |
| RowsBench               | bench_filter_on_10k        | 2    | 3   | 75.182mb -0.00%  | 15.283ms +9.31%  | ±2.27% +130.76% |
| RowsBench               | bench_find_on_10k          | 2    | 3   | 75.182mb -0.00%  | 14.982ms +6.51%  | ±0.87% +10.62%  |
| RowsBench               | bench_find_one_on_10k      | 10   | 3   | 73.085mb -0.00%  | 2.000μs +11.49%  | ±0.00% -100.00% |
| RowsBench               | bench_first_on_10k         | 10   | 3   | 73.085mb -0.00%  | 0.400μs 0.00%    | ±0.00% 0.00%    |
| RowsBench               | bench_flat_map_on_1k       | 2    | 3   | 86.642mb -0.00%  | 13.193ms +7.89%  | ±2.86% +167.14% |
| RowsBench               | bench_map_on_10k           | 2    | 3   | 116.001mb -0.00% | 62.711ms +2.81%  | ±2.04% -6.32%   |
| RowsBench               | bench_merge_1k_on_10k      | 2    | 3   | 75.703mb -0.00%  | 2.054ms +16.75%  | ±0.62% +114.04% |
| RowsBench               | bench_partition_by_on_10k  | 2    | 3   | 77.969mb -0.00%  | 33.700ms +2.53%  | ±2.67% +273.35% |
| RowsBench               | bench_remove_on_10k        | 2    | 3   | 77.803mb -0.00%  | 4.216ms +10.06%  | ±3.22% +115.78% |
| RowsBench               | bench_sort_asc_on_1k       | 2    | 3   | 73.228mb -0.00%  | 38.582ms -2.16%  | ±0.53% -73.60%  |
| RowsBench               | bench_sort_by_on_1k        | 2    | 3   | 73.229mb -0.00%  | 39.250ms -0.77%  | ±0.60% +49.60%  |
| RowsBench               | bench_sort_desc_on_1k      | 2    | 3   | 73.228mb -0.00%  | 38.957ms -0.88%  | ±0.26% -40.09%  |
| RowsBench               | bench_sort_entries_on_1k   | 2    | 3   | 75.527mb -0.00%  | 7.446ms +0.57%   | ±1.44% +239.73% |
| RowsBench               | bench_sort_on_1k           | 2    | 3   | 73.085mb -0.00%  | 28.902ms +1.92%  | ±1.13% +348.83% |
| RowsBench               | bench_take_1k_on_10k       | 10   | 3   | 73.085mb -0.00%  | 13.666μs +4.23%  | ±1.50% +108.95% |
| RowsBench               | bench_take_right_1k_on_10k | 10   | 3   | 73.085mb -0.00%  | 16.334μs +1.38%  | ±1.26% -23.16%  |
| RowsBench               | bench_unique_on_1k         | 2    | 3   | 96.093mb -0.00%  | 180.852ms +0.43% | ±1.24% +324.88% |
| TypeDetectorBench       | bench_type_detector        | 1    | 3   | 59.174mb -0.00%  | 318.958ms -2.52% | ±0.07% -93.33%  |
| TypeDetectorBench       | bench_type_detector        | 1    | 3   | 14.097mb -0.01%  | 62.847ms -1.99%  | ±0.47% +177.42% |
| NativeEntryFactoryBench | bench_entry_factory        | 1    | 3   | 115.844mb -0.00% | 357.014ms -4.54% | ±0.71% +108.76% |
| NativeEntryFactoryBench | bench_entry_factory        | 1    | 3   | 59.562mb -0.00%  | 184.208ms -2.05% | ±1.30% +180.48% |
| NativeEntryFactoryBench | bench_entry_factory        | 1    | 3   | 14.687mb -0.01%  | 40.235ms +2.99%  | ±2.56% +181.70% |
+-------------------------+----------------------------+------+-----+------------------+------------------+-----------------+
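
To help read the percentage columns above, here is a minimal sketch that recovers the baseline value from a reported figure. It assumes each percentage is the relative change against the `1.x` baseline, i.e. `(current - baseline) / baseline * 100`; the helper name `baseline_from_diff` is hypothetical and not part of any benchmarking tool.

```python
def baseline_from_diff(current: float, diff_percent: float) -> float:
    """Recover the baseline value from a current value and its relative change.

    Assumption: diff_percent = (current - baseline) / baseline * 100.
    """
    return current / (1 + diff_percent / 100)


# First TypeDetectorBench row from the table: mode 318.958ms, -2.52%
current_ms = 318.958
baseline_ms = baseline_from_diff(current_ms, -2.52)
saved_ms = baseline_ms - current_ms

print(f"baseline: {baseline_ms:.3f} ms, saved: {saved_ms:.3f} ms")
```

Under that assumption, the first `TypeDetectorBench` row corresponds to a baseline of roughly 327 ms, so the change saves about 8 ms per run. With only 3 iterations per subject, differences of a few percent elsewhere in the table may simply be measurement noise.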

@flow-php deleted a comment from coderabbitai[bot] on Nov 23, 2023
@norberttech merged commit 7f0a6b9 into flow-php:1.x on Nov 25, 2023
20 checks passed
@stloyd deleted the chore/perf-array-detect branch on Nov 25, 2023, 14:41