JIT: initial support for reinforcement learning of CSE heuristic #96880
Conversation
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch

Issue Details

Initial support for a reinforcement-learning based CSE heuristic.
How feasible would it be to have a general infrastructure driven by features/parameters, so that any optimization can plug into it? I want to do something similar for LSRA.
Adds special CSE heuristic modes to the JIT to support learning a good CSE heuristic via Policy Gradient, a form of reinforcement learning. The learning must be orchestrated by an external process, but the JIT does all of the actual gradient computations. The orchestration program will be added to jitutils. The overall process also relies on SPMI, and the goal is to minimize perf score.

Introduce two new CSE heuristic policies:
* Replay: simply perform the indicated sequence of CSEs
* RL: used for the Policy Gradient, with 3 modes:
  * Stochastic: based on current parameters, but allows random variation
  * Greedy: based on current parameters, deterministic
  * Update: compute updated parameters per Policy Gradient

Also rework the Random policy to be a bit more random: it now alters both the CSEs performed and the order in which they are performed.

Add the ability to have jit config options that specify sequences of ints or doubles.

Add the ability to just dump metric info for a jitted method, and add more details (perhaps considerably more) for CSEs. This is all still simple text format.

Also factor out a common check for "non-viable" candidates -- these are CSE candidates that won't actually be CSEs. This leads to some minor diffs, as the check is now slightly different for CSEs with zero uses and/or zero weighted uses.

Contributes to dotnet#92915.
6ef9785 to 14ba4ea (Compare)
@dotnet/jit-contrib FYI. Not sure who wants to review this one. Any volunteers?
Somewhat? The basic structure is common to lots of problems; the tricky bits are figuring out the right state/action model, and either handling this across a jit/host/orchestrator boundary or externalizing all the info from the jit so it can be processed entirely by outside code. Let me describe briefly how this all works and maybe we can brainstorm about how to leverage it for your case. The "RL" mode for CSEs has 3 behaviors:

* Stochastic: choose CSEs based on the current parameters, but allow random variation (exploration)
* Greedy: choose CSEs deterministically from the current parameters (evaluation)
* Update: compute updated parameters per Policy Gradient
The orchestration process repeatedly cycles through evaluation/exploration + update steps. This process should converge to a set of parameters that (via the greedy policy) obtains the optimal perf score for that method (or scores for sets of methods). In the background the orchestrator also computes "V" and "Q" estimates using the data from each run; these are used to compute increasingly accurate per-step rewards.
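To make the update step concrete, here is a minimal sketch of the softmax policy gradient update this kind of scheme relies on. Everything here is illustrative, not the JIT's actual implementation: the feature vectors, parameter layout, and learning rate are invented for the example, and the `advantage` argument stands in for the per-step reward derived from the V/Q estimates mentioned above.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Preference for a candidate action: dot product of parameters and features.
double preference(const std::vector<double>& params, const std::vector<double>& features) {
    double p = 0.0;
    for (size_t i = 0; i < params.size(); i++) p += params[i] * features[i];
    return p;
}

// Softmax probability of choosing candidate `chosen` among `candidates`.
double softmaxProb(const std::vector<double>& params,
                   const std::vector<std::vector<double>>& candidates, size_t chosen) {
    double sum = 0.0;
    for (const auto& f : candidates) sum += std::exp(preference(params, f));
    return std::exp(preference(params, candidates[chosen])) / sum;
}

// One REINFORCE-style step: theta += alpha * advantage * grad(log pi(chosen)).
// For a softmax policy, grad(log pi(chosen)) = features(chosen) - E[features].
void policyGradientStep(std::vector<double>& params,
                        const std::vector<std::vector<double>>& candidates,
                        size_t chosen, double advantage, double alpha) {
    std::vector<double> expected(params.size(), 0.0);
    for (size_t a = 0; a < candidates.size(); a++) {
        double p = softmaxProb(params, candidates, a);
        for (size_t i = 0; i < params.size(); i++) expected[i] += p * candidates[a][i];
    }
    for (size_t i = 0; i < params.size(); i++) {
        params[i] += alpha * advantage * (candidates[chosen][i] - expected[i]);
    }
}
```

With a positive advantage, one step shifts the parameters so the chosen candidate becomes more probable under the greedy policy on the next run.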
Diff results for #96880

Assembly diffs

Assembly diffs for linux/arm64 ran on windows/x64
Diffs are based on 2,501,661 contexts (1,003,806 MinOpts, 1,497,855 FullOpts). MISSED contexts: base: 3,546 (0.14%), diff: 3,556 (0.14%)
Overall (+1,884 bytes)
FullOpts (+1,884 bytes)

Assembly diffs for linux/x64 ran on windows/x64
Diffs are based on 2,595,039 contexts (1,052,329 MinOpts, 1,542,710 FullOpts). MISSED contexts: 3,596 (0.14%)
Overall (-236 bytes)
FullOpts (-236 bytes)

Assembly diffs for osx/arm64 ran on windows/x64
Diffs are based on 2,263,032 contexts (930,876 MinOpts, 1,332,156 FullOpts). MISSED contexts: base: 2,925 (0.13%), diff: 2,933 (0.13%)
Overall (+1,512 bytes)
FullOpts (+1,512 bytes)

Assembly diffs for windows/arm64 ran on windows/x64
Diffs are based on 2,318,296 contexts (931,543 MinOpts, 1,386,753 FullOpts). MISSED contexts: base: 2,587 (0.11%), diff: 2,598 (0.11%)
Overall (-952 bytes)
FullOpts (-952 bytes)

Assembly diffs for windows/x64 ran on windows/x64
Diffs are based on 2,492,949 contexts (983,689 MinOpts, 1,509,260 FullOpts). MISSED contexts: base: 3,859 (0.15%), diff: 3,862 (0.15%)
Overall (-2,082 bytes)
FullOpts (-2,082 bytes)

Details here

Assembly diffs for linux/arm ran on windows/x86
Diffs are based on 2,238,212 contexts (827,812 MinOpts, 1,410,400 FullOpts). MISSED contexts: base: 74,052 (3.20%), diff: 74,066 (3.20%)
Overall (-2,444 bytes)
FullOpts (-2,444 bytes)

Assembly diffs for windows/x86 ran on windows/x86
Diffs are based on 2,299,277 contexts (841,817 MinOpts, 1,457,460 FullOpts). MISSED contexts: base: 2,090 (0.09%), diff: 2,093 (0.09%)
Overall (-81 bytes)
FullOpts (-81 bytes)

Details here

Throughput diffs

Throughput diffs for linux/arm64 ran on windows/x64
FullOpts (-0.01% to +0.00%)

Throughput diffs for windows/x64 ran on windows/x64
Overall (-0.01% to +0.00%)
FullOpts (-0.01% to +0.00%)

Details here

Throughput diffs for linux/arm64 ran on linux/x64
FullOpts (-0.01% to -0.00%)

Details here
@EgorBo can you take a look?

Sure, need to rewatch your internal talk that I missed first 🙂
Diff results for #96880

Throughput diffs

Throughput diffs for linux/arm64 ran on windows/x64
FullOpts (-0.01% to +0.00%)

Throughput diffs for windows/x64 ran on windows/x64
Overall (-0.01% to +0.00%)
FullOpts (-0.01% to +0.00%)

Details here
Diff results for #96880

Assembly diffs

Assembly diffs for linux/arm64 ran on windows/x64
Diffs are based on 2,501,147 contexts (1,003,806 MinOpts, 1,497,341 FullOpts). MISSED contexts: base: 4,060 (0.16%), diff: 4,070 (0.16%)
Overall (+3,884 bytes)
FullOpts (+3,884 bytes)

Assembly diffs for linux/x64 ran on windows/x64
Diffs are based on 2,595,007 contexts (1,052,329 MinOpts, 1,542,678 FullOpts). MISSED contexts: 3,628 (0.14%)
Overall (-365 bytes)
FullOpts (-365 bytes)

Assembly diffs for osx/arm64 ran on windows/x64
Diffs are based on 2,262,701 contexts (930,876 MinOpts, 1,331,825 FullOpts). MISSED contexts: base: 3,256 (0.14%), diff: 3,264 (0.14%)
Overall (+2,152 bytes)
FullOpts (+2,152 bytes)

Assembly diffs for windows/arm64 ran on windows/x64
Diffs are based on 2,318,196 contexts (931,543 MinOpts, 1,386,653 FullOpts). MISSED contexts: base: 2,687 (0.12%), diff: 2,698 (0.12%)
Overall (-216 bytes)
FullOpts (-216 bytes)

Assembly diffs for windows/x64 ran on windows/x64
Diffs are based on 2,492,909 contexts (983,689 MinOpts, 1,509,220 FullOpts). MISSED contexts: base: 3,899 (0.16%), diff: 3,902 (0.16%)
Overall (-2,169 bytes)
FullOpts (-2,169 bytes)

Details here

Assembly diffs for linux/arm ran on windows/x86
Diffs are based on 2,237,676 contexts (827,812 MinOpts, 1,409,864 FullOpts). MISSED contexts: base: 74,588 (3.23%), diff: 74,602 (3.23%)
Overall (-2,188 bytes)
FullOpts (-2,188 bytes)

Assembly diffs for windows/x86 ran on windows/x86
Diffs are based on 2,296,274 contexts (841,817 MinOpts, 1,454,457 FullOpts). MISSED contexts: base: 5,093 (0.22%), diff: 5,096 (0.22%)
Overall (-79 bytes)
FullOpts (-79 bytes)

Details here

Throughput diffs

Throughput diffs for linux/arm64 ran on windows/x64
FullOpts (-0.01% to +0.00%)

Throughput diffs for windows/x64 ran on windows/x64
Overall (-0.01% to +0.00%)
FullOpts (-0.01% to +0.00%)

Details here

Throughput diffs for linux/arm64 ran on linux/x64
FullOpts (-0.01% to -0.00%)

Details here
Add a tool that can use ML techniques to explore the JIT's CSE heuristic. Some parts of this are very specific to CSEs, others are general and could be repurposed for use with other heuristics. This is still work in progress. Depends on jit changes in dotnet/runtime#96880
@EgorBo ping
printf("\n");
}

printf("Total bytes of code %d, prolog size %d, PerfScore %.2f, instruction count %d, allocated bytes for "
nit: looks like "Total bytes of code" is no longer prefixed with ; (comments in asm)
Will fix in a subsequent change.
jit-analyze is looking for this string (https://github.com/dotnet/jitutils/blob/e30e004fee30f6da62e2ddf856e31e677cec2955/src/jit-analyze/Program.cs#L294), so diffs are semi-broken now.
Added that back in #97677
// 10. cse costEx is <= MIN_CSE_COST (0/1)
// 11. cse is a constant and live across call (0/1)
// 12. cse is a constant and min cost (0/1)
// 13. cse is a constant and NOT min cost (0/1)
Just wondering: are you going to take platform features into account, such as the number of callee-saved registers (for GPRs and floats)?
Yes, we will need to add something like this -- right now the mechanisms to decide not to do a CSE are too weak.
I have follow-on changes that add some, but I'm not happy with them yet.
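For illustration, a register-pressure feature along the lines discussed might look like the sketch below. The struct, register counts, and normalization are all hypothetical, invented for this example; they are not the JIT's actual target tables or the follow-on changes mentioned above.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical per-target register info; the values used by callers are
// illustrative, not the JIT's actual tables.
struct TargetInfo {
    int calleeSavedGpr;   // callee-saved general-purpose registers
    int calleeSavedFloat; // callee-saved float/SIMD registers
};

// Normalize live CSE candidates against the callee-saved budget into a
// [0,1] pressure feature, so targets with more callee-saved registers can
// bias the policy toward accepting more CSEs.
double regPressureFeature(const TargetInfo& t, int liveCseCandidates) {
    int budget = t.calleeSavedGpr + t.calleeSavedFloat;
    if (budget <= 0) return 1.0; // no budget: treat as maximum pressure
    double ratio = static_cast<double>(liveCseCandidates) / budget;
    return ratio > 1.0 ? 1.0 : ratio;
}
```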
LGTM, looking forward to seeing the actual changes! Sorry for the delayed review
Adds special CSE heuristic modes to the JIT to support learning a good CSE
heuristic via Policy Gradient, a form of reinforcement learning. The learning
must be orchestrated by an external process, but the JIT does all of the
actual gradient computations.
The orchestration program will be added to jitutils. The overall process
also relies on SPMI and the goal is to minimize perf score.
Introduce two new CSE heuristic policies:
* Replay: simply perform the indicated sequence of CSEs
* RL: used for the Policy Gradient, with 3 modes:
  * Stochastic: based on current parameters, but allows random variation
  * Greedy: based on current parameters, deterministic
  * Update: compute updated parameters per Policy Gradient
Also rework the Random policy to be a bit more random: it now alters
both the CSEs performed and the order in which they are performed.
Add the ability to have jit config options that specify sequences of ints
or doubles.
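As a sketch of what a sequence-valued config option might carry, the following parses a comma-separated list of doubles. The comma format and function name are assumptions for illustration; the actual JIT plumbs such values through its own config machinery.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Parse a config value like "0.5,-1.25,3" into a sequence of doubles.
// Illustrative only; not the JIT's actual JitConfig parsing code.
std::vector<double> parseDoubleSequence(const std::string& value) {
    std::vector<double> result;
    std::stringstream ss(value);
    std::string item;
    while (std::getline(ss, item, ',')) {
        if (!item.empty()) result.push_back(std::stod(item));
    }
    return result;
}
```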
Add the ability to just dump metric info for a jitted method, and add
more details (perhaps considerably more) for CSEs. This is all still
simple text format.
Also factor out a common check for "non-viable" candidates -- these are
CSE candidates that won't actually be CSEs. This leads to some minor
diffs as the check is now slightly different for CSEs with zero uses
and/or zero weighted uses.
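A minimal sketch of such a viability predicate, under the conditions the paragraph above describes; the struct and field names are illustrative, not the JIT's actual CSE descriptor layout.

```cpp
#include <cassert>

// Illustrative stand-in for a CSE candidate's use counts.
struct CseCandidate {
    int    useCount;     // number of uses of the expression
    double weightedUses; // block-weight-scaled use count
};

// A candidate with zero uses or zero weighted uses can never pay for
// itself as a CSE, so it is "non-viable".
bool isNonViable(const CseCandidate& c) {
    return c.useCount == 0 || c.weightedUses <= 0.0;
}
```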
Contributes to #92915.