Expose an eeIsJitIntrinsic method for quickly determining if a method handle represents an intrinsic method #51124

tannergooding · 2021-04-12T16:08:05Z

This should resolve #51006

tannergooding · 2021-04-12T16:09:54Z

src/coreclr/tools/Common/JitInterface/CorInfoImpl.cs

+        private bool isJitIntrinsic(CORINFO_METHOD_STRUCT_* ftn)
+        {
+            MethodDesc method = HandleToObject(ftn);
+            return method.IsIntrinsic;


I think the only particular thing of note is this (as per https://github.com/dotnet/runtime/compare/main...tannergooding:isJitIntrinsic?expand=1#diff-132a77bcd3f74cf0e0b04fbccda246c97c91e40562d78cb01fff61cf69403573L872-L873) isn't fine grained enough to determine just CORINFO_FLG_INTRINSIC or CORINFO_FLG_JIT_INTRINSIC

AndyAyersMS · 2021-04-12T16:19:59Z

Did you measure crossgen2 throughput with this? You could probably also use crossgen1, it might be simpler.

tannergooding · 2021-04-12T16:27:16Z

Did you measure crossgen2 throughput with this? You could probably also use crossgen1, it might be simpler.

Working on doing that now, I was just planning on running the same benchmark from #51006

tannergooding · 2021-04-12T18:03:46Z

@AndyAyersMS, @jkotas, on my own box, I see the below.

Crossgen2

Base:

    Crossgen2 Throughput - Single - System.Private.Corelib
    Metric               |Average        |Min            |Max
    ---------------------|---------------|---------------|---------------
    Process Time         |2195.994 ms    |2152.472 ms    |2241.161 ms
    Loading Interval     |478.729 ms     |473.132 ms     |488.765 ms
    Emitting Interval    |585.857 ms     |568.642 ms     |601.387 ms
    Jit Interval         |831.327 ms     |810.561 ms     |864.635 ms
    Compilation Interval |2143.333 ms    |2097.013 ms    |2188.326 ms
    Time on Thread       |11848.433 ms   |11634.819 ms   |12010.432 ms

Resolve tokens (+6.03% from Base):

    Crossgen2 Throughput - Single - System.Private.Corelib
    Metric               |Average        |Min            |Max
    ---------------------|---------------|---------------|---------------
    Process Time         |2328.495 ms    |2242.713 ms    |2375.665 ms
    Loading Interval     |494.870 ms     |474.765 ms     |510.811 ms
    Emitting Interval    |582.028 ms     |570.934 ms     |595.669 ms
    Jit Interval         |937.977 ms     |903.238 ms     |985.727 ms
    Compilation Interval |2274.020 ms    |2188.263 ms    |2322.155 ms
    Time on Thread       |13275.996 ms   |12798.085 ms   |13556.782 m

Jan's change (+2.24% from Base, -3.58% from Resolve tokens):

    Crossgen2 Throughput - Single - System.Private.Corelib
    Metric               |Average        |Min            |Max
    ---------------------|---------------|---------------|---------------
    Process Time         |2245.198 ms    |2207.725 ms    |2283.867 ms
    Loading Interval     |484.585 ms     |477.682 ms     |502.124 ms
    Emitting Interval    |569.878 ms     |557.664 ms     |585.911 ms
    Jit Interval         |905.695 ms     |872.238 ms     |917.978 ms
    Compilation Interval |2192.197 ms    |2155.402 ms    |2230.611 ms
    Time on Thread       |13092.984 ms   |12916.836 ms   |13276.031 ms

This change (+1.94% from Base, -3.86% from Resolve tokens, -0.29% from Jan's change):

    Crossgen2 Throughput - Single - System.Private.Corelib
    Metric               |Average        |Min            |Max
    ---------------------|---------------|---------------|---------------
    Process Time         |2238.630 ms    |2178.115 ms    |2301.655 ms
    Loading Interval     |479.965 ms     |473.764 ms     |485.390 ms
    Emitting Interval    |568.050 ms     |564.785 ms     |574.498 ms
    Jit Interval         |894.025 ms     |854.444 ms     |941.489 ms
    Compilation Interval |2186.992 ms    |2126.187 ms    |2249.631 ms
    Time on Thread       |11828.096 ms   |11491.262 ms   |12034.082 ms

Notably, its not clear why this change, which is doing a lot less work than eeGetCallInfo is such a small difference (nor why "maxTime" is worse).
Would the JITEEVersionIdentifer change impact this at all?

jkotas · 2021-04-12T18:13:37Z

The "Time on Thread" has improved significantly with this change. Is it a sign that this change works well?

AndyAyersMS · 2021-04-12T18:19:37Z

Error bars in lab measurements suggest there is a few % variation in run times for crossgen2.

How many runs are you averaging?

tannergooding · 2021-04-12T18:20:43Z

Is it a sign that this change works well?

@DrewScoggins might be able to give more insight into what each value is for.
My hypothesis is Jit Interval is the main one we want to be looking at here, given that these changes are in the JIT and should likely impact that.

There is a good bit of noise here as well, and multiple independent runs can result in some pretty variable results.

For example, in the change I just pushed which drops the impResolveToken call for CEE_CONSTRAINED (since we don't need it given we aren't calling eeGetCallInfo anymore), has higher process time and higher time on thread, but less Jit interval:

    Crossgen2 Throughput - Single - System.Private.Corelib
    Metric               |Average        |Min            |Max
    ---------------------|---------------|---------------|---------------
    Process Time         |2247.620 ms    |2192.448 ms    |2386.135 ms
    Loading Interval     |480.200 ms     |475.420 ms     |485.793 ms
    Emitting Interval    |587.992 ms     |562.316 ms     |660.547 ms
    Jit Interval         |870.072 ms     |846.114 ms     |920.709 ms
    Compilation Interval |2195.054 ms    |2140.109 ms    |2334.002 ms
    Time on Thread       |12126.867 ms   |11708.378 ms   |12998.084 ms

tannergooding · 2021-04-12T18:23:15Z

How many runs are you averaging?

Console output indicates 1 warm up and 5 iterations, for all 4 (now 5) shared results.
The results for the crossgen tests are a bit different from the regular benchmarks.

AndyAyersMS · 2021-04-12T18:32:52Z

Given that we're making measurements where we don't have a good model for the underlying distribution, median may be a more useful thing to report than mean.

Historically for TP measurements the jit team would use crossgen1 running under PIN (or some other instruction counting technique), and there the run to run repeatability was quite good. So you might try doing that instead.

tannergooding · 2021-04-12T18:41:45Z

I can run some more benchmarks later this afternoon. I can collect some instruction counts using AMD uProf (pin largely only works on Intel).

I'll need to figure the command line the benchmark is passing to crossgen as part of that.

DrewScoggins · 2021-04-12T18:50:06Z

if compiletype == const.CROSSGEN2_SINGLEFILE:
            referencefilenames = ['System.*.dll', 'Microsoft.*.dll', 'netstandard.dll', 'mscorlib.dll']
            # single assembly filename: example.dll
            filename, ext = os.path.splitext(self.singlefile)
            outputdir = os.path.join(os.getcwd(), const.CROSSGENDIR)
            if not os.path.exists(outputdir):
                os.mkdir(outputdir)
            outputfile = os.path.join(outputdir, filename+'.ni'+ext )
            
            crossgen2args = [
                os.path.join(self.coreroot, self.singlefile),
                '-o', outputfile,
                '-O'
            ]

            for reffile in referencefilenames:
                crossgen2args.extend(['-r', os.path.join(self.coreroot, reffile)])

Here is the code that we use to generate the command line. Looks like we call crossgen2 with the path to the file we are crossgening, give an output file name and use the -O flag as well as -r for the different ref files.

MichalStrehovsky · 2021-04-12T20:14:39Z

Error bars in lab measurements suggest there is a few % variation in run times for crossgen2.

For the Native AOT compiler (which is pretty much crossgen2), turning off Tiered JIT stabilized throughput by a lot. We actually also had a compilation throughput improvement with that so it's just off by default there.

tannergooding · 2021-04-13T17:30:14Z

@AndyAyersMS, @jkotas

I'm seeing an average of:

22 retired instructions in "Base"
234 retired instructions in "Resolve tokens"
190 retired instructions in "Jan's change"
224 retired instructions in "This change"

It isn't clear to me why this is the case initially, since this should be doing less work now.
Are we aware of anything that might impact this, like changing the JITEEVersionIdentifier or modifying the JIT/EE interface invalidating something (PGO data, some cache, etc)

…te error

…tions

tannergooding · 2021-04-13T18:20:36Z

Rebased to resolve merge conflict

jkotas · 2021-04-13T20:23:40Z

What does the retired instructions number mean?

tannergooding · 2021-04-13T20:30:24Z

What does the retired instructions number mean?

The number of instructions that were executed and correct (meaning this does not include speculatively executed instructions).
This is roughly equivalent to the instruction count output by the pin tool Andy mentioned.

In general, it isn't clear why calling eeIsJitIntrinsic is not cheaper than calling eeGetCallInfo, considering the latter does everything eeIsJitIntrinsic does, plus more.
Hence my wondering if this is potentially due to something like PGO data or otherwise.

AndyAyersMS · 2021-04-13T21:24:56Z

We usually just look at the overall numbers for the whole process, not try and attribute them to particular methods.

tannergooding · 2021-04-13T21:55:55Z

We usually just look at the overall numbers for the whole process, not try and attribute them to particular methods.

Its lost in the noise of the overall process. The average for all 4 variations, measured across 5 runs each, is about 92000 retired instructions with not enough variation between them to see anything discernable in terms of hardware metrics that ETL or uProf collect.

AndyAyersMS · 2021-04-13T22:11:04Z

This small variation in instructions retired is seen across all 4 variants? (base, regression, fix1, fix2...)? If so then something is off in the measurement process.

I can try looking with PIN later and post what I see.

AndyAyersMS · 2021-04-14T03:00:33Z

Here's the full process pin icount data for crossgen SPC. Seems to track your crossgen2 data above fairly well.

Looks like we're maybe 1.5% slower than we were before all this. I think we can live with that.

Baseline
Count 19254296972
Count 19256217769
Count 19253906826

Resolve Tokens
Count 20138271494
Count 20142288269
Count 20140098703

Jan's fix
Count 19988303504
Count 19988649133
Count 19990292972

This change
Count 19566027492
Count 19567077283
Count 19564959405

tannergooding · 2021-04-14T03:07:49Z

Thanks! I'll need to figure out why uProf wasn't reporting the right numbers on my end

tannergooding requested review from jkotas and AndyAyersMS April 12, 2021 16:08

dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Apr 12, 2021

tannergooding force-pushed the isJitIntrinsic branch from 72642d8 to daa19d0 Compare April 12, 2021 16:08

tannergooding commented Apr 12, 2021

View reviewed changes

tannergooding added 6 commits April 13, 2021 11:19

Updating the JIT/EE interface to expose isJitIntrinsic

f898d36

Update JITEEVersionIdentifier

69c2d03

Update fgFindJumpTargets to use the much cheaper eeIsJitIntrinsic

031e140

Ensure IsJitIntrinsic exists in the LightWeightMap and fix a copy/pas…

679ec7e

…te error

Run ThunkGenerator

4eaeb57

Don't resolve tokens for CEE_CONSTRAINED when making inlining observa…

d467326

…tions

tannergooding force-pushed the isJitIntrinsic branch from 03c0999 to d467326 Compare April 13, 2021 18:20

AndyAyersMS approved these changes Apr 14, 2021

View reviewed changes

jkotas approved these changes Apr 14, 2021

View reviewed changes

jkotas merged commit f24879f into dotnet:main Apr 14, 2021

JulieLeeMSFT assigned tannergooding Apr 15, 2021

JulieLeeMSFT added this to the 6.0.0 milestone Apr 15, 2021

ghost locked as resolved and limited conversation to collaborators May 15, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Expose an eeIsJitIntrinsic method for quickly determining if a method handle represents an intrinsic method #51124

Expose an eeIsJitIntrinsic method for quickly determining if a method handle represents an intrinsic method #51124

tannergooding commented Apr 12, 2021

tannergooding Apr 12, 2021

AndyAyersMS commented Apr 12, 2021

tannergooding commented Apr 12, 2021

tannergooding commented Apr 12, 2021 •

edited

Loading

jkotas commented Apr 12, 2021

AndyAyersMS commented Apr 12, 2021

tannergooding commented Apr 12, 2021

tannergooding commented Apr 12, 2021

AndyAyersMS commented Apr 12, 2021

tannergooding commented Apr 12, 2021

DrewScoggins commented Apr 12, 2021

MichalStrehovsky commented Apr 12, 2021

tannergooding commented Apr 13, 2021

tannergooding commented Apr 13, 2021

jkotas commented Apr 13, 2021

tannergooding commented Apr 13, 2021 •

edited

Loading

AndyAyersMS commented Apr 13, 2021

tannergooding commented Apr 13, 2021

AndyAyersMS commented Apr 13, 2021

AndyAyersMS commented Apr 14, 2021

tannergooding commented Apr 14, 2021

Expose an eeIsJitIntrinsic method for quickly determining if a method handle represents an intrinsic method #51124

Expose an eeIsJitIntrinsic method for quickly determining if a method handle represents an intrinsic method #51124

Conversation

tannergooding commented Apr 12, 2021

tannergooding Apr 12, 2021

Choose a reason for hiding this comment

AndyAyersMS commented Apr 12, 2021

tannergooding commented Apr 12, 2021

tannergooding commented Apr 12, 2021 • edited Loading

Crossgen2

jkotas commented Apr 12, 2021

AndyAyersMS commented Apr 12, 2021

tannergooding commented Apr 12, 2021

tannergooding commented Apr 12, 2021

AndyAyersMS commented Apr 12, 2021

tannergooding commented Apr 12, 2021

DrewScoggins commented Apr 12, 2021

MichalStrehovsky commented Apr 12, 2021

tannergooding commented Apr 13, 2021

tannergooding commented Apr 13, 2021

jkotas commented Apr 13, 2021

tannergooding commented Apr 13, 2021 • edited Loading

AndyAyersMS commented Apr 13, 2021

tannergooding commented Apr 13, 2021

AndyAyersMS commented Apr 13, 2021

AndyAyersMS commented Apr 14, 2021

tannergooding commented Apr 14, 2021

tannergooding commented Apr 12, 2021 •

edited

Loading

tannergooding commented Apr 13, 2021 •

edited

Loading