Add PHPBench tool and first benchmark example #581

stloyd · 2023-10-13T16:23:07Z

Change Log

Added

Add PHPBench tool and first benchmark example

Fixed

Changed

Removed

Deprecated

Security

Description

Docs: https://phpbench.readthedocs.io/en/latest/quick-start.html

Refs: #560

Report:

composer run-script test:benchmark
> tools/phpbench/vendor/bin/phpbench run --report=aggregate --retry-threshold=5
PHPBench (1.2.14) running benchmarks... #standwithukraine
with configuration file: /Users/stloyd/Documents/flow/phpbench.json
with PHP version 8.1.24, xdebug ❌, opcache ❌

.......... 

Subjects: 10, Assertions: 0, Failures: 0, Errors: 0
+-------------------------------------+----------------------------+-----+------+-----+----------+----------+--------+
| benchmark                           | subject                    | set | revs | its | mem_peak | mode     | rstdev |
+-------------------------------------+----------------------------+-----+------+-----+----------+----------+--------+
| AvroExtractorBench                  | bench_extract              |     | 1000 | 5   | 3.627mb  | 3.816μs  | ±1.03% |
| CSVExtractorBench                   | bench_extract              |     | 1000 | 5   | 3.627mb  | 4.139μs  | ±2.73% |
| JsonExtractorBench                  | bench_extract              |     | 1000 | 5   | 3.627mb  | 4.242μs  | ±2.88% |
| ParquetExtractorBench               | bench_extract              |     | 1000 | 5   | 3.627mb  | 4.039μs  | ±2.38% |
| TextExtractorBench                  | bench_extract              |     | 1000 | 5   | 3.627mb  | 3.713μs  | ±2.49% |
| XmlExtractorBench                   | bench_extract              |     | 1000 | 5   | 3.627mb  | 3.053μs  | ±2.32% |
| RenameEntryTransformerBench         | bench_transform            |     | 1000 | 5   | 3.627mb  | 23.397μs | ±2.78% |
| EntryExpressionEvalTransformerBench | bench_transform_json_row   |     | 1000 | 5   | 3.627mb  | 14.574μs | ±1.08% |
| EntryExpressionEvalTransformerBench | bench_transform_string_row |     | 1000 | 5   | 3.627mb  | 14.349μs | ±0.62% |
| EntryExpressionEvalTransformerBench | bench_transform_xml_row    |     | 1000 | 5   | 3.627mb  | 40.931μs | ±1.04% |
+-------------------------------------+----------------------------+-----+------+-----+----------+----------+--------+

norberttech · 2023-10-14T10:59:31Z

This looks great!
Now, we need to think about what we would like to monitor.
Your example looks nice, but it does not say anything about what is tested there. It can help to notice some memory leaks and maybe even a performance degradation, but still without any details on what is leaking or where the bottleneck is.

I was thinking about creating benchmarks for specific building blocks separately, for example:

Extractors - we could come up with a dataset schema, save it in all supported file formats, and benchmark just the extraction, without performing any operations on the dataset.
Transformers - since we reduced the number of transformers, keeping only the critical ones, we might want to start with those most frequently used, like the one that evaluates expressions. Here, I think we can take a similar approach, but instead of using extractors, we can pass prepared Rows directly and measure the performance of the transformations themselves.
Expressions - just like with Transformers, but here we don't even need Rows; a single Row should be enough.
Loaders - similarly to Transformers, prepare Rows and load them directly into the destination.
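A minimal sketch of what such a granular transformer benchmark could look like, using PHPBench's attribute syntax. The class name `RenameColumnBench`, the `transformer` group, and the array-based rows are hypothetical stand-ins for illustration; a real benchmark would build Flow's own Rows and call an actual transformer instead.

```php
<?php

declare(strict_types=1);

use PhpBench\Attributes\BeforeMethods;
use PhpBench\Attributes\Groups;
use PhpBench\Attributes\Iterations;
use PhpBench\Attributes\Revs;

// Hypothetical benchmark: plain arrays stand in for Flow Rows, and rename()
// stands in for a real transformer. Only the PHPBench attributes are real API.
#[Groups(['transformer'])]
#[BeforeMethods(['setUp'])]
final class RenameColumnBench
{
    /** @var list<array<string, mixed>> */
    private array $rows = [];

    // Registered via BeforeMethods, so preparing the dataset is not
    // part of the measured time.
    public function setUp(): void
    {
        $this->rows = [];

        for ($i = 0; $i < 1_000; $i++) {
            $this->rows[] = ['id' => $i, 'name' => 'name_' . $i];
        }
    }

    // The subject: only the transformation itself is timed.
    #[Revs(1000)]
    #[Iterations(5)]
    public function bench_transform(): void
    {
        self::rename($this->rows, 'name', 'full_name');
    }

    /**
     * Stand-in transformation: renames one key in every row,
     * keeping the original key order.
     *
     * @param list<array<string, mixed>> $rows
     *
     * @return list<array<string, mixed>>
     */
    public static function rename(array $rows, string $from, string $to): array
    {
        $result = [];

        foreach ($rows as $row) {
            $renamed = [];

            foreach ($row as $key => $value) {
                $renamed[$key === $from ? $to : $key] = $value;
            }

            $result[] = $renamed;
        }

        return $result;
    }
}
```

Tagging each building block with its own group should make it possible to run one block at a time, for example by adding `--group=transformer` to the `phpbench run` command used in the report above.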

Those are very granular benchmarks that test each building block in isolation and provide clear insights about each element. On top of that, I would probably still try to benchmark entire Pipelines on a selected subset of the most frequently used extractors/loaders/transformers (we would need to develop a few scenarios here).

So, to summarize, in order to finish this initial setup, I would probably start by preparing benchmarks for each of the elements I described above, except the global scenarios for now. This will not only be a good starting point for us but also a pretty nice template for anyone who would like to contribute, even without a full understanding of how the entire project works.

I'm not sure what the right numbers are for revs (in PHPBench this stands for revolutions, i.e. how many times the subject runs within one iteration) and iterations; we would need to find a sweet spot between run time and value. We might need different values for different building blocks: for example, Expressions would show a performance degradation only after a couple hundred iterations, whereas extractors might need only a few, with a bigger input.
If we could run benchmarks of each building block in parallel, that would be even more amazing since it could reduce the time of all benchmarks significantly.
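To illustrate tuning these values per building block, here is a rough sketch with two benchmark classes. The class names, the stand-in `evaluate()` and `extract()` helpers, and all numbers are assumptions chosen for illustration, not recommended settings; only the `Revs`, `Iterations`, and `ParamProviders` attributes come from PHPBench.

```php
<?php

declare(strict_types=1);

use PhpBench\Attributes\Iterations;
use PhpBench\Attributes\ParamProviders;
use PhpBench\Attributes\Revs;

// Hypothetical expression benchmark: one call is very cheap, so many
// revolutions are averaged within each iteration to get a stable reading.
final class UpperCaseExpressionBench
{
    #[Revs(1000)]
    #[Iterations(10)]
    public function bench_eval(): void
    {
        self::evaluate(['name' => 'flow']);
    }

    // Stand-in for evaluating an expression against a single row.
    public static function evaluate(array $row): string
    {
        return strtoupper($row['name']);
    }
}

// Hypothetical extractor benchmark: one call is expensive, so it uses
// few revolutions and scales the input size instead.
final class LinesExtractorBench
{
    #[Revs(5)]
    #[Iterations(3)]
    #[ParamProviders(['provideInputSizes'])]
    public function bench_extract(array $params): void
    {
        foreach (self::extract($params['lines']) as $line) {
            // Drain the generator; reading the data is the measured work.
        }
    }

    // Each yielded parameter set is reported as a separate row in the "set" column.
    public function provideInputSizes(): \Generator
    {
        yield '10k lines' => ['lines' => 10_000];
        yield '100k lines' => ['lines' => 100_000];
    }

    // Stand-in for an extractor that streams rows from a file.
    public static function extract(int $lines): \Generator
    {
        for ($i = 0; $i < $lines; $i++) {
            yield 'line ' . $i;
        }
    }
}
```

The reported time is averaged over the revolutions within an iteration, while `rstdev` describes the spread between iterations, so cheap subjects tend to benefit from more revolutions and expensive ones from larger inputs.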

Those are my thoughts about adding PHPBench to the project. In the past I made a few attempts to use it in other projects, and what I wrote here is pretty much a summary of those experiences. I would love to hear your thoughts or alternative proposals.

github-actions bot added core size: S labels Oct 13, 2023

stloyd requested a review from norberttech October 13, 2023 16:23

stloyd changed the title ~~Add note about rework of transformers into UPGRADE.md file~~ Add PHPBench tool and first benchmark example Oct 13, 2023

norberttech reviewed Oct 13, 2023

src/core/etl/tests/Flow/ETL/Tests/Benchmark/Row/Reference/Expression/AddJsonBench.php (outdated, resolved)

norberttech reviewed Oct 13, 2023

composer.json (resolved)

norberttech reviewed Oct 13, 2023

src/core/etl/tests/Flow/ETL/Tests/Benchmark/Row/Reference/Expression/AddJsonBench.php (outdated, resolved)

stloyd force-pushed the feature/phpbench-intro branch 2 times, most recently from 3e2f98d to 637bdd9 Compare October 14, 2023 07:31

github-actions bot added ci/cd size: M and removed size: S labels Oct 14, 2023

stloyd force-pushed the feature/phpbench-intro branch 4 times, most recently from ef2dca0 to 2075d7c Compare October 14, 2023 07:39

stloyd requested a review from norberttech October 14, 2023 07:40

stloyd force-pushed the feature/phpbench-intro branch 2 times, most recently from 3132377 to 341bf2c Compare October 15, 2023 09:43

github-actions bot added size: S and removed size: M labels Oct 15, 2023

stloyd force-pushed the feature/phpbench-intro branch from 341bf2c to 97c3346 Compare October 15, 2023 10:00

github-actions bot added adapter-csv adapter-json adapter-text adapter-xml size: M and removed size: S labels Oct 15, 2023

Add PHPBench tool and first benchmark example

c0a8258

stloyd force-pushed the feature/phpbench-intro branch from 97c3346 to c0a8258 Compare October 15, 2023 10:40

github-actions bot added adapter-ampavrohp adapter-parquet labels Oct 15, 2023

norberttech merged commit 68916ab into flow-php:1.x Oct 16, 2023
17 checks passed

stloyd deleted the feature/phpbench-intro branch October 16, 2023 08:37
