
Add Regex perf tests from industry benchmarks #2125

Merged 4 commits on Nov 17, 2021

Conversation

stephentoub
@danmoseley left a comment


LGTM assuming the run time is acceptable to the perf folks.

How long does all this take to run locally?

@stephentoub

> BTW, I tried recompressing with maximum-compression gzip, and it only saves about 10%, FWIW.

I generated these locally with `new GZipStream(..., CompressionLevel.SmallestSize)`.

@stephentoub

> How long does all this take to run locally?

A while. It's ~200 benchmarks per platform target; e.g., when I run them comparing main vs. a PR, that's ~400 benchmarks to run.

@danmoseley

> I generated these locally with `new GZipStream(..., CompressionLevel.SmallestSize)`.

I used 7zip with gzip selected and compression level "Ultra": 3200.txt goes from 6.21 MB to 5.93 MB. The 7z and bz2 formats get it to about 4.9 MB, but of course we can't read those.
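For reference, a minimal sketch of the maximum-effort gzip recompression being compared here. Python's `gzip` module at `compresslevel=9` plays the role of .NET's `GZipStream` with `CompressionLevel.SmallestSize`; the repetitive sample payload is illustrative, not the actual benchmark corpus:

```python
import gzip

# Sketch of maximum-effort gzip compression, analogous to
# new GZipStream(..., CompressionLevel.SmallestSize) in .NET.
# compresslevel=9 is gzip's slowest/smallest setting.
data = b"the quick brown fox jumps over the lazy dog " * 1000

compressed = gzip.compress(data, compresslevel=9)
print(f"{len(data)} bytes -> {len(compressed)} bytes")

# Round-trip to confirm the payload is intact.
assert gzip.decompress(compressed) == data
```

As the numbers above suggest, going from the default level to the maximum typically buys only a modest size reduction on text corpora; the format itself (deflate vs. 7z/bz2) matters more.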

@danmoseley

> A while. It's ~200 benchmarks per platform target; e.g., when I run them comparing main vs. a PR, that's ~400 benchmarks to run.

@DrewScoggins how do you feel about run time here? @kunalspathak if we add 200 more scenarios, does that significantly affect triage (e.g., if we regress 50 of them together)?

@kunalspathak

> @DrewScoggins how do you feel about run time here?

Something @DrewScoggins would know exactly, but we have fewer arm64 machines and are currently backlogged with the existing benchmarks. That said, I don't think we should stop ourselves from adding more benchmarks; I think we should just increase the machine capacity.

> @kunalspathak if we add 200 more scenarios, does that significantly affect triage (e.g., if we regress 50 of them together)?

It depends on how flaky these are. Eventually, when we improve the noise-filtering logic, it shouldn't be a problem.

@DrewScoggins

> @DrewScoggins how do you feel about run time here?
>
> Something @DrewScoggins would know exactly, but we have fewer arm64 machines and are currently backlogged with the existing benchmarks. That said, I don't think we should stop ourselves from adding more benchmarks; I think we should just increase the machine capacity.

Believe me, I would also love to get more machines! In the meantime, I am running the tests locally to get an idea of the total time they will add (~26 minutes). As @kunalspathak said, the only place where we are really resource-constrained is the Arm64 machines, but I don't believe this will be prohibitive, and if it is we can revisit which tests we run on Arm64.

> @kunalspathak if we add 200 more scenarios, does that significantly affect triage (e.g., if we regress 50 of them together)?
>
> It depends on how flaky these are. Eventually, when we improve the noise-filtering logic, it shouldn't be a problem.

Having a bunch regress all at once from a product change will not be a big deal. We can already handle that scenario.

@danmoseley

Thanks @DrewScoggins. Also, after this is in, is it possible to get a 6.0 (and ideally 5.0) baseline number that shows up in the graphs, or does that happen automatically?

@DrewScoggins

If we want baseline numbers for 5.0 and 6.0, we will need to backport these tests. That is easy for 6.0: just make a PR to the release/6.0 branch and it will get picked up. For 5.0 it will require more work, as the release-branch system we have now didn't exist when we forked for 5.0. I have filed #2129 to track that work. In the meantime, maybe we could do a one-off run on some lab machines to get comparison numbers for 5.0 vs. 6.0 vs. today for these tests?


@adamsitnik left a comment


LGTM, thank you @stephentoub

Now let's make sure that .NET is the best for all of the test cases ;)
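As an illustration of what these industry benchmarks measure, the `Perf_Regex_Industry_Leipzig.Count` entries in the logs below count all matches of a pattern over a large text corpus. A minimal sketch using Python's `re` (the real benchmarks use .NET `Regex`, and the tiny corpus here is made up for illustration):

```python
import re

# Sketch of a "count all matches" regex benchmark over a corpus,
# using two of the patterns visible in the CI logs.
corpus = "Tom told Huckleberry that Twain wrote about Tom and the river. Twain, again."

patterns = [
    r"Twain",                             # simple literal
    r"(?i)Tom|Sawyer|Huckleberry|Finn",   # case-insensitive alternation
]

for pattern in patterns:
    count = sum(1 for _ in re.finditer(pattern, corpus))
    print(pattern, count)
```

The real suite runs each pattern against a multi-megabyte corpus (e.g., the Leipzig sentence collection) under several `RegexOptions`, which is why the benchmark count multiplies so quickly.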

@adamsitnik

One of the CI legs failed with a mysterious Python error:

[2021/11/12 02:46:47][INFO] // Found 5 benchmarks:
[2021/11/12 02:46:47][INFO] //   Perf_Regex_Industry_Leipzig.Count: Job-RCHHKQ(PowerPlanMode=00000000-0000-0000-0000-000000000000, Arguments=/p:DebugType=portable,-bl:benchmarkdotnet.binlog, InvocationCount=1, IterationCount=1, IterationTime=250.0000 ms, MaxIterationCount=20, MinIterationCount=15, RunStrategy=ColdStart, UnrollFactor=1, WarmupCount=0) [Pattern=(?i)Tom|Sawyer|Huckleberry|Finn, Options=None]
[2021/11/12 02:46:47][INFO] //   Perf_Regex_Industry_Leipzig.Count: Job-RCHHKQ(PowerPlanMode=00000000-0000-0000-0000-000000000000, Arguments=/p:DebugType=portable,-bl:benchmarkdotnet.binlog, InvocationCount=1, IterationCount=1, IterationTime=250.0000 ms, MaxIterationCount=20, MinIterationCount=15, RunStrategy=ColdStart, UnrollFactor=1, WarmupCount=0) [Pattern=.{2,4}(Tom|Sawyer|Huckleberry|Finn), Options=Compiled]
[2021/11/12 02:46:47][INFO] //   Perf_Regex_Industry_Leipzig.Count: Job-RCHHKQ(PowerPlanMode=00000000-0000-0000-0000-000000000000, Arguments=/p:DebugType=portable,-bl:benchmarkdotnet.binlog, InvocationCount=1, IterationCount=1, IterationTime=250.0000 ms, MaxIterationCount=20, MinIterationCount=15, RunStrategy=ColdStart, UnrollFactor=1, WarmupCount=0) [Pattern=Twain, Options=None]
[2021/11/12 02:46:47][INFO] //   Perf_Regex_Industry_Leipzig.Count: Job-RCHHKQ(PowerPlanMode=00000000-0000-0000-0000-000000000000, Arguments=/p:DebugType=portable,-bl:benchmarkdotnet.binlog, InvocationCount=1, IterationCount=1, IterationTime=250.0000 ms, MaxIterationCount=20, MinIterationCount=15, RunStrategy=ColdStart, UnrollFactor=1, WarmupCount=0) [Pattern=[a-z]shing, Options=Compiled]
[2021/11/12 02:47:34][INFO] $ popd
Traceback (most recent call last):
  File "C:\h\w\A568097A\p\scripts\benchmarks_ci.py", line 250, in <module>
    __main(sys.argv[1:])
  File "C:\h\w\A568097A\p\scripts\benchmarks_ci.py", line 226, in __main
    micro_benchmarks.run(
  File "C:\h\w\A568097A\p\scripts\micro_benchmarks.py", line 310, in run
    BENCHMARKS_CSPROJ.run(
  File "C:\h\w\A568097A\p\scripts\dotnet.py", line 467, in run
    RunCommand(cmdline, verbose=verbose).run(
  File "C:\h\w\A568097A\p\scripts\performance\common.py", line 211, in run
    (returncode, quoted_cmdline) = self.__runinternal(working_directory)
  File "C:\h\w\A568097A\p\scripts\performance\common.py", line 200, in __runinternal
    for line in iter(proc.stdout.readline, ''):
  File "C:\python3.9.1\lib\codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xec in position 334: invalid continuation byte

I'll re-run it.
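The traceback shows the Python harness decoding the benchmark process's stdout as strict UTF-8 and hitting a byte (0xEC) that isn't a valid continuation sequence. A minimal repro, plus a tolerant-decoding workaround (`errors="replace"` here is illustrative; the actual fix that landed in #2108 may differ):

```python
# Minimal repro of the CI failure: a byte stream containing a stray 0xEC
# that is not part of a valid UTF-8 sequence fails strict decoding.
bad_output = b"benchmark line \xec more text"

try:
    bad_output.decode("utf-8")
except UnicodeDecodeError as e:
    print("strict decode failed:", e.reason)

# A tolerant decode keeps the harness alive; invalid bytes become U+FFFD.
print(bad_output.decode("utf-8", errors="replace"))
```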

@adamsitnik

I was unable to repro the CI failure locally:

git clone https://github.com/stephentoub/performance.git && cd performance && git checkout regextests
py .\scripts\benchmarks_ci.py -f net461 --filter *Regex* --bdn-arguments="--iterationCount 1 --warmupCount 0 --invocationCount 1 --unrollFactor 1 --strategy ColdStart --stopOnFirstError true" 

Co-authored-by: Adam Sitnik <adam.sitnik@gmail.com>
@danmoseley

Another failure:

MinIterationCount=15, RunStrategy=ColdStart, UnrollFactor=1, WarmupCount=0) [Pattern=Tom.{10,25}river|river.{10,25}Tom, Options=None]
[2021/11/17 15:39:15][INFO] //   Perf_Regex_Industry_Leipzig.Count: Job-RCHHKQ(PowerPlanMode=00000000-0000-0000-0000-000000000000, Arguments=/p:DebugType=portable,-bl:benchmarkdotnet.binlog, InvocationCount=1, IterationCount=1, IterationTime=250.0000 ms, MaxIterationCount=20, MinIterationCount=15, RunStrategy=ColdStart, UnrollFactor=1, WarmupCount=0) [Pattern=Twain, Options=Compiled]
[2021/11/17 15:40:03][INFO] $ popd
Traceback (most recent call last):
  File "C:\h\w\A6DB097E\p\scripts\benchmarks_ci.py", line 250, in <module>
    __main(sys.argv[1:])
  File "C:\h\w\A6DB097E\p\scripts\benchmarks_ci.py", line 226, in __main
    micro_benchmarks.run(
  File "C:\h\w\A6DB097E\p\scripts\micro_benchmarks.py", line 310, in run
    BENCHMARKS_CSPROJ.run(
  File "C:\h\w\A6DB097E\p\scripts\dotnet.py", line 467, in run
    RunCommand(cmdline, verbose=verbose).run(
  File "C:\h\w\A6DB097E\p\scripts\performance\common.py", line 211, in run
    (returncode, quoted_cmdline) = self.__runinternal(working_directory)
  File "C:\h\w\A6DB097E\p\scripts\performance\common.py", line 200, in __runinternal
    for line in iter(proc.stdout.readline, ''):
  File "C:\python3.9.1\lib\codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xec in position 78: invalid continuation byte

@DrewScoggins have we seen this before?

@danmoseley

BTW, surely if you just run this test on .NET Framework locally and it passes, that's good enough to check in.

@DrewScoggins

Yes, we were seeing failures like this, and it was fixed with #2108.

> BTW, surely if you just run this test on .NET Framework locally and it passes, that's good enough to check in.

Hard to say, because our lab machines are not identical to the CI machines. We have had situations where something works on local machines but fails on the CI VMs.
