Add PHPBench tool and first benchmark example #581

Merged
merged 1 commit into from
Oct 16, 2023

Conversation

stloyd
Member

@stloyd stloyd commented Oct 13, 2023

Change Log

Added

  • Add PHPBench tool and first benchmark example

Fixed

Changed

Removed

Deprecated

Security


Description

Docs: https://phpbench.readthedocs.io/en/latest/quick-start.html

Refs: #560

Report:

composer run-script test:benchmark
> tools/phpbench/vendor/bin/phpbench run --report=aggregate --retry-threshold=5
PHPBench (1.2.14) running benchmarks... #standwithukraine
with configuration file: /Users/stloyd/Documents/flow/phpbench.json
with PHP version 8.1.24, xdebug ❌, opcache ❌

.......... 

Subjects: 10, Assertions: 0, Failures: 0, Errors: 0
+-------------------------------------+----------------------------+-----+------+-----+----------+----------+--------+
| benchmark                           | subject                    | set | revs | its | mem_peak | mode     | rstdev |
+-------------------------------------+----------------------------+-----+------+-----+----------+----------+--------+
| AvroExtractorBench                  | bench_extract              |     | 1000 | 5   | 3.627mb  | 3.816μs  | ±1.03% |
| CSVExtractorBench                   | bench_extract              |     | 1000 | 5   | 3.627mb  | 4.139μs  | ±2.73% |
| JsonExtractorBench                  | bench_extract              |     | 1000 | 5   | 3.627mb  | 4.242μs  | ±2.88% |
| ParquetExtractorBench               | bench_extract              |     | 1000 | 5   | 3.627mb  | 4.039μs  | ±2.38% |
| TextExtractorBench                  | bench_extract              |     | 1000 | 5   | 3.627mb  | 3.713μs  | ±2.49% |
| XmlExtractorBench                   | bench_extract              |     | 1000 | 5   | 3.627mb  | 3.053μs  | ±2.32% |
| RenameEntryTransformerBench         | bench_transform            |     | 1000 | 5   | 3.627mb  | 23.397μs | ±2.78% |
| EntryExpressionEvalTransformerBench | bench_transform_json_row   |     | 1000 | 5   | 3.627mb  | 14.574μs | ±1.08% |
| EntryExpressionEvalTransformerBench | bench_transform_string_row |     | 1000 | 5   | 3.627mb  | 14.349μs | ±0.62% |
| EntryExpressionEvalTransformerBench | bench_transform_xml_row    |     | 1000 | 5   | 3.627mb  | 40.931μs | ±1.04% |
+-------------------------------------+----------------------------+-----+------+-----+----------+----------+--------+

@stloyd stloyd changed the title Add note about rework of transformers into UPGRADE.md file Add PHPBench tool and first benchmark example Oct 13, 2023
@stloyd stloyd force-pushed the feature/phpbench-intro branch 2 times, most recently from 3e2f98d to 637bdd9 Compare October 14, 2023 07:31
@stloyd stloyd force-pushed the feature/phpbench-intro branch 4 times, most recently from ef2dca0 to 2075d7c Compare October 14, 2023 07:39
@norberttech
Member

This looks great!
Now, we need to think about what we would like to monitor.
Your example looks nice, but it does not say anything about what is tested there. It can help us notice memory leaks and maybe even performance degradation, but it gives no details on what is leaking or where the bottleneck is.

I was thinking about creating benchmarks for specific building blocks separately, for example:

  • Extractors - we could come up with some dataset schema, save it as all supported file types, and just benchmark extraction without doing any operations on the dataset.
  • Transformers - since we reduced the number of transformers, keeping only critical ones, we might want to start at least from those most frequently used, like the one that evaluates expressions. Here, I think we can take a similar approach, but instead of using extractors, we can directly pass prepared Rows to it and measure the performance of transformations themselves.
  • Expressions - just like with Transformers, but here we don't even need Rows. A single Row should be enough.
  • Loaders - similar to Transformers: prepare Rows and load them directly into the destination.
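
A building-block benchmark of the Transformers kind described above could look roughly like the sketch below. It is only an illustration: `rename_entry()` and the plain-array rows are hypothetical stand-ins rather than Flow's actual Rows or transformer API, and the class name is made up. The only PHPBench pieces used are the `Revs` and `Iterations` attributes. The input is built in the constructor, assuming PHPBench instantiates the class before the timed loop, so that only the transformation itself is measured.

```php
<?php

declare(strict_types=1);

use PhpBench\Attributes\Iterations;
use PhpBench\Attributes\Revs;

/**
 * Hypothetical stand-in for a transformer: renames one key in every row.
 * Rows are plain arrays here; a real benchmark would pass the project's own Rows.
 *
 * @param list<array<string, mixed>> $rows
 * @return list<array<string, mixed>>
 */
function rename_entry(array $rows, string $from, string $to): array
{
    $out = [];
    foreach ($rows as $row) {
        if (\array_key_exists($from, $row)) {
            // Append the value under the new key, then drop the old key.
            $row[$to] = $row[$from];
            unset($row[$from]);
        }
        $out[] = $row;
    }

    return $out;
}

final class RenameEntryTransformerSketchBench
{
    /** @var list<array<string, mixed>> */
    private array $rows = [];

    public function __construct()
    {
        // Prepare the input once, outside the measured method.
        for ($i = 0; $i < 100; $i++) {
            $this->rows[] = ['id' => $i, 'name' => 'name_' . $i];
        }
    }

    // Same values as in the report above; they are a starting point, not a recommendation.
    #[Revs(1000)]
    #[Iterations(5)]
    public function bench_transform(): void
    {
        rename_entry($this->rows, 'name', 'full_name');
    }
}
```

Extractor and Loader benchmarks could follow the same shape, with the prepared input replaced by a fixture file or an in-memory destination.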

Those are very granular benchmarks, which test each building block in isolation and provide clear insight into each element. However, on top of that, I would probably still try to benchmark entire Pipelines on a selected subset of the most frequently used extractors/loaders/transformers (we would need to develop a few scenarios here).

So, to summarize, in order to finish this initial setup, I would probably start by preparing benchmarks for each of the elements I described above, except the global scenarios for now. This will not only be a good starting point for us but also a pretty nice template for anyone who would like to contribute, even without a full understanding of how the entire project works.

I'm not sure what the right numbers are for revisions (is that what revs stands for?) and iterations; we need to find a sweet spot between run time and value. We might need different values for different building blocks: for example, Expressions would show a performance degradation only after a couple hundred iterations, whereas extractors might need only a few iterations and a bigger input.
If we could run the benchmarks of each building block in parallel, that would be even better, since it could significantly reduce the total benchmark time.
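
For reference, in PHPBench `revs` is short for revolutions: the number of times the subject is executed consecutively within one time measurement, while iterations is how many times that measurement is repeated. The sketch below is a simplified model of that relationship, not PHPBench's actual implementation (PHPBench, for instance, runs iterations in separate processes). The `measure()` helper and the per-building-block numbers in `$settings` are assumptions for illustration only.

```php
<?php

declare(strict_types=1);

/**
 * Simplified model: each iteration is one time measurement wrapped around
 * $revs consecutive calls of the subject.
 *
 * @return list<float> average time per revolution in microseconds, one entry per iteration
 */
function measure(callable $subject, int $revs, int $iterations): array
{
    $results = [];
    for ($i = 0; $i < $iterations; $i++) {
        $start = \hrtime(true); // monotonic clock, nanoseconds
        for ($r = 0; $r < $revs; $r++) {
            $subject();
        }
        $elapsedNs = \hrtime(true) - $start;
        // Divide by the revolution count, then convert ns to microseconds.
        $results[] = $elapsedNs / $revs / 1000;
    }

    return $results;
}

// Hypothetical settings: cheap subjects get many revolutions,
// expensive ones get few revolutions and a bigger input.
$settings = [
    'expression' => ['revs' => 1000, 'iterations' => 5],
    'extractor'  => ['revs' => 10, 'iterations' => 3],
];

$samples = measure(
    static fn (): string => \strtoupper('flow'),
    $settings['expression']['revs'],
    $settings['expression']['iterations']
);

foreach ($samples as $index => $microseconds) {
    \printf("iteration %d: %.3f us per revolution\n", $index + 1, $microseconds);
}
```

More revolutions smooth out timer noise for very fast subjects, while more iterations give a better picture of the variation between measurements (the `rstdev` column in the report).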

Those are my thoughts about adding phpbench to the project. In the past I made a few attempts to use it in other projects, and what I wrote here is pretty much a summary of those experiences. I would love to hear some thoughts about it or alternative proposals.

@norberttech norberttech merged commit 68916ab into flow-php:1.x Oct 16, 2023
17 checks passed
@stloyd stloyd deleted the feature/phpbench-intro branch October 16, 2023 08:37
2 participants