Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[mono][Perf] TensorPrimitive MaxMagnitude and Truncate Regressions on 7/10/2024 10:56:31 PM #106475

Open
performanceautofiler bot opened this issue Jul 16, 2024 · 7 comments
Assignees
Labels
arch-arm64 arch-x64 area-System.Numerics.Tensors investigate os-linux Linux OS (any supported distro) runtime-mono specific to the Mono runtime tenet-performance Performance related issue tenet-performance-benchmarks Issue from performance benchmark
Milestone

Comments

@performanceautofiler
Copy link

Run Information

Name Value
Architecture arm64
OS ubuntu 22.04
Queue AmpereUbuntu
Baseline e733c2f22710b6382c9aadb8139833fbf395c332
Compare 0d426df37c6ac2c1d22f952a2e359ffd4281d47b
Diff Diff
Configs CompilationMode:tiered, LLVM:true, MonoAOT:true, MonoInterpreter:false, RunKind:micro_mono

Regressions in System.Numerics.Tensors.Tests.Perf_FloatingPointTensorPrimitives<Double>

Benchmark Baseline Test Test/Base Test Quality Edge Detector Baseline IR Compare IR IR Ratio
433.31 ns 827.64 ns 1.91 0.01 True
8.55 μs 16.94 μs 1.98 0.04 True

graph
graph
Test Report

Repro

General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md

Repro Steps

Prerequisites (Files either built locally (with build.(sh/cmd) or downloaded from payload above (if same system setup) (in this order))

  • Libraries build extracted to runtime/artifacts or build instructions: Libraries README args: -subset libs+libs.tests -rc release -configuration Release -arch $RunArch -framework net8.0
  • CoreCLR product build extracted to runtime/artifacts/bin/coreclr/$RunOS.$RunArch.Release, build instructions: CoreCLR README args: -subset clr+libs -rc release -configuration Release -arch $RunArch -framework net8.0
  • AOT MONO build extracted to runtime/artifacts/bin/mono/$RunOS.$RunArch.Release, build instructions: MONO README args: -arch $RunArch -os $RunOS -s mono+libs+host+packs -c Release /p:CrossBuild=false /p:MonoLLVMUseCxx11Abi=false
  • Dotnet SDK installed for dotnet commands
  • Running commands from the runtime folder

Linux

# Set $RunDir to the runtime directory
RunDir=`pwd`

# Set the OS, arch, and OSId
RunOS='linux'
RunOSId='linux'
RunArch='x64'

# Create aot directory 
mkdir -p $RunDir/artifacts/bin/aot/sgen
mkdir -p $RunDir/artifacts/bin/aot/pack
cp -r $RunDir/artifacts/obj/mono/$RunOS.$RunArch.Release/mono/* $RunDir/artifacts/bin/aot/sgen
cp -r $RunDir/artifacts/bin/microsoft.netcore.app.runtime.$RunOS-$RunArch/Release/* $RunDir/artifacts/bin/aot/pack

# Create Core Root
$RunDir/src/tests/build.sh release $RunArch generatelayoutonly /p:LibrariesConfiguration=Release

# Clone performance 
git clone --branch main --depth 1 --quiet https://github.com/dotnet/performance.git $RunDir/performance

# One line run:
python3 $RunDir/performance/scripts/benchmarks_ci.py --csproj $RunDir/performance/src/benchmarks/micro/MicroBenchmarks.csproj --incremental no --architecture $RunArch -f net8.0 --filter 'System.Numerics.Tensors.Tests.Perf_FloatingPointTensorPrimitives&lt;Double&gt;*' --bdn-artifacts $RunDir/artifacts/BenchmarkDotNet.Artifacts --bdn-arguments="--anyCategories Libraries Runtime  --category-exclusion-filter NoAOT NoWASM --runtimes monoaotllvm --aotcompilerpath $RunDir/artifacts/bin/aot/sgen/mini/mono-sgen --customruntimepack $RunDir/artifacts/bin/aot/pack --aotcompilermode llvm --logBuildOutput --generateBinLog"

# Individual Commands:
# Restore 
dotnet restore $RunDir/performance/src/benchmarks/micro/MicroBenchmarks.csproj --packages $RunDir/performance/artifacts/packages /p:UseSharedCompilation=false /p:BuildInParallel=false /m:1

# Build
dotnet build $RunDir/performance/src/benchmarks/micro/MicroBenchmarks.csproj --configuration Release --framework net8.0 --no-restore /p:NuGetPackageRoot=$RunDir/performance/artifacts/packages /p:UseSharedCompilation=false /p:BuildInParallel=false /m:1

# Run
dotnet run --project $RunDir/performance/src/benchmarks/micro/MicroBenchmarks.csproj --configuration Release --framework net8.0 --no-restore --no-build -- --filter System.Numerics.Tensors.Tests.Perf_FloatingPointTensorPrimitives&lt;Double&gt;* --anyCategories Libraries Runtime " --category-exclusion-filter NoAOT NoWASM --runtimes monoaotllvm --aotcompilerpath $RunDir/artifacts/bin/aot/sgen/mini/mono-sgen --customruntimepack $RunDir/artifacts/bin/aot/pack --aotcompilermode llvm --logBuildOutput --generateBinLog " --artifacts $RunDir/artifacts/BenchmarkDotNet.Artifacts --packages $RunDir/performance/artifacts/packages --buildTimeout 1200

Windows

# Set $RunDir to the runtime directory
$RunDir="FullPathHere"

# Set the OS, arch, and OSId
RunOS='windows'
RunOSId='win'
RunArch='x64'

# Create aot directory
mkdir $RunDir\artifacts\bin\aot\sgen
mkdir $RunDir\artifacts\bin\aot\pack
xcopy $RunDir\artifacts\obj\mono\$RunOS.$RunArch.Release\mono $RunDir\artifacts\bin\aot\sgen\ /e /y
xcopy $RunDir\artifacts\bin\microsoft.netcore.app.runtime.$RunOSId-$RunArch\Release $RunDir\artifacts\bin\aot\pack\ /e /y

# Create Core Root
$RunDir\src\tests\build.cmd release $RunArch generatelayoutonly /p:LibrariesConfiguration=Release

# Clone performance 
git clone --branch main --depth 1 --quiet https://github.com/dotnet/performance.git $RunDir\performance

# One line run:
python3 $RunDir\performance\scripts\benchmarks_ci.py --csproj $RunDir\performance\src\benchmarks\micro\MicroBenchmarks.csproj --incremental no --architecture $RunArch -f net8.0 --filter 'System.Numerics.Tensors.Tests.Perf_FloatingPointTensorPrimitives&lt;Double&gt;*' --bdn-artifacts $RunDir\artifacts\BenchmarkDotNet.Artifacts --bdn-arguments="--anyCategories Libraries Runtime  --category-exclusion-filter NoAOT NoWASM --runtimes monoaotllvm --aotcompilerpath $RunDir\artifacts\bin\aot\sgen\mini\mono-sgen.exe --customruntimepack $RunDir\artifacts\bin\aot\pack --aotcompilermode llvm --logBuildOutput --generateBinLog"

# Individual Commands:
# Restore 
dotnet restore $RunDir\performance\src\benchmarks\micro\MicroBenchmarks.csproj --packages $RunDir\performance\artifacts\packages /p:UseSharedCompilation=false /p:BuildInParallel=false /m:1

# Build
dotnet build $RunDir\performance\src\benchmarks\micro\MicroBenchmarks.csproj --configuration Release --framework net8.0 --no-restore /p:NuGetPackageRoot=$RunDir\performance\artifacts\packages /p:UseSharedCompilation=false /p:BuildInParallel=false /m:1

# Run
dotnet run --project $RunDir\performance\src\benchmarks\micro\MicroBenchmarks.csproj --configuration Release --framework net8.0 --no-restore --no-build -- --filter System.Numerics.Tensors.Tests.Perf_FloatingPointTensorPrimitives&lt;Double&gt;* --anyCategories Libraries Runtime " --category-exclusion-filter NoAOT NoWASM --runtimes monoaotllvm --aotcompilerpath $RunDir\artifacts\bin\aot\sgen\mini\mono-sgen.exe --customruntimepack $RunDir\artifacts\bin\aot\pack -aotcompilermode llvm --logBuildOutput --generateBinLog " --artifacts $RunDir\artifacts\BenchmarkDotNet.Artifacts --packages $RunDir\performance\artifacts\packages --buildTimeout 1200

System.Numerics.Tensors.Tests.Perf_FloatingPointTensorPrimitives<Double>.Truncate(BufferLength: 128)

ETL Files

Histogram

JIT Disasms

System.Numerics.Tensors.Tests.Perf_FloatingPointTensorPrimitives<Double>.Truncate(BufferLength: 3079)

ETL Files

Histogram

JIT Disasms

Docs

Profiling workflow for dotnet/runtime repository
Benchmarking workflow for dotnet/runtime repository


Run Information

Name Value
Architecture arm64
OS ubuntu 22.04
Queue AmpereUbuntu
Baseline 6c553628dfaec02e665a4ade3deabe3a91281024
Compare 0d426df37c6ac2c1d22f952a2e359ffd4281d47b
Diff Diff
Configs CompilationMode:tiered, LLVM:true, MonoAOT:true, MonoInterpreter:false, RunKind:micro_mono

Regressions in System.Numerics.Tensors.Tests.Perf_NumberTensorPrimitives<Single>

Benchmark Baseline Test Test/Base Test Quality Edge Detector Baseline IR Compare IR IR Ratio
4.44 μs 5.84 μs 1.32 0.00 True
233.79 ns 296.04 ns 1.27 0.00 True
279.04 ns 340.78 ns 1.22 0.00 True
4.85 μs 6.18 μs 1.27 0.00 True

graph
graph
graph
graph
Test Report

Repro

General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md

Repro Steps

Prerequisites (Files either built locally (with build.(sh/cmd) or downloaded from payload above (if same system setup) (in this order))

  • Libraries build extracted to runtime/artifacts or build instructions: Libraries README args: -subset libs+libs.tests -rc release -configuration Release -arch $RunArch -framework net8.0
  • CoreCLR product build extracted to runtime/artifacts/bin/coreclr/$RunOS.$RunArch.Release, build instructions: CoreCLR README args: -subset clr+libs -rc release -configuration Release -arch $RunArch -framework net8.0
  • AOT MONO build extracted to runtime/artifacts/bin/mono/$RunOS.$RunArch.Release, build instructions: MONO README args: -arch $RunArch -os $RunOS -s mono+libs+host+packs -c Release /p:CrossBuild=false /p:MonoLLVMUseCxx11Abi=false
  • Dotnet SDK installed for dotnet commands
  • Running commands from the runtime folder

Linux

# Set $RunDir to the runtime directory
RunDir=`pwd`

# Set the OS, arch, and OSId
RunOS='linux'
RunOSId='linux'
RunArch='x64'

# Create aot directory 
mkdir -p $RunDir/artifacts/bin/aot/sgen
mkdir -p $RunDir/artifacts/bin/aot/pack
cp -r $RunDir/artifacts/obj/mono/$RunOS.$RunArch.Release/mono/* $RunDir/artifacts/bin/aot/sgen
cp -r $RunDir/artifacts/bin/microsoft.netcore.app.runtime.$RunOS-$RunArch/Release/* $RunDir/artifacts/bin/aot/pack

# Create Core Root
$RunDir/src/tests/build.sh release $RunArch generatelayoutonly /p:LibrariesConfiguration=Release

# Clone performance 
git clone --branch main --depth 1 --quiet https://github.com/dotnet/performance.git $RunDir/performance

# One line run:
python3 $RunDir/performance/scripts/benchmarks_ci.py --csproj $RunDir/performance/src/benchmarks/micro/MicroBenchmarks.csproj --incremental no --architecture $RunArch -f net8.0 --filter 'System.Numerics.Tensors.Tests.Perf_NumberTensorPrimitives&lt;Single&gt;*' --bdn-artifacts $RunDir/artifacts/BenchmarkDotNet.Artifacts --bdn-arguments="--anyCategories Libraries Runtime  --category-exclusion-filter NoAOT NoWASM --runtimes monoaotllvm --aotcompilerpath $RunDir/artifacts/bin/aot/sgen/mini/mono-sgen --customruntimepack $RunDir/artifacts/bin/aot/pack --aotcompilermode llvm --logBuildOutput --generateBinLog"

# Individual Commands:
# Restore 
dotnet restore $RunDir/performance/src/benchmarks/micro/MicroBenchmarks.csproj --packages $RunDir/performance/artifacts/packages /p:UseSharedCompilation=false /p:BuildInParallel=false /m:1

# Build
dotnet build $RunDir/performance/src/benchmarks/micro/MicroBenchmarks.csproj --configuration Release --framework net8.0 --no-restore /p:NuGetPackageRoot=$RunDir/performance/artifacts/packages /p:UseSharedCompilation=false /p:BuildInParallel=false /m:1

# Run
dotnet run --project $RunDir/performance/src/benchmarks/micro/MicroBenchmarks.csproj --configuration Release --framework net8.0 --no-restore --no-build -- --filter System.Numerics.Tensors.Tests.Perf_NumberTensorPrimitives&lt;Single&gt;* --anyCategories Libraries Runtime " --category-exclusion-filter NoAOT NoWASM --runtimes monoaotllvm --aotcompilerpath $RunDir/artifacts/bin/aot/sgen/mini/mono-sgen --customruntimepack $RunDir/artifacts/bin/aot/pack --aotcompilermode llvm --logBuildOutput --generateBinLog " --artifacts $RunDir/artifacts/BenchmarkDotNet.Artifacts --packages $RunDir/performance/artifacts/packages --buildTimeout 1200

Windows

# Set $RunDir to the runtime directory
$RunDir="FullPathHere"

# Set the OS, arch, and OSId
RunOS='windows'
RunOSId='win'
RunArch='x64'

# Create aot directory
mkdir $RunDir\artifacts\bin\aot\sgen
mkdir $RunDir\artifacts\bin\aot\pack
xcopy $RunDir\artifacts\obj\mono\$RunOS.$RunArch.Release\mono $RunDir\artifacts\bin\aot\sgen\ /e /y
xcopy $RunDir\artifacts\bin\microsoft.netcore.app.runtime.$RunOSId-$RunArch\Release $RunDir\artifacts\bin\aot\pack\ /e /y

# Create Core Root
$RunDir\src\tests\build.cmd release $RunArch generatelayoutonly /p:LibrariesConfiguration=Release

# Clone performance 
git clone --branch main --depth 1 --quiet https://github.com/dotnet/performance.git $RunDir\performance

# One line run:
python3 $RunDir\performance\scripts\benchmarks_ci.py --csproj $RunDir\performance\src\benchmarks\micro\MicroBenchmarks.csproj --incremental no --architecture $RunArch -f net8.0 --filter 'System.Numerics.Tensors.Tests.Perf_NumberTensorPrimitives&lt;Single&gt;*' --bdn-artifacts $RunDir\artifacts\BenchmarkDotNet.Artifacts --bdn-arguments="--anyCategories Libraries Runtime  --category-exclusion-filter NoAOT NoWASM --runtimes monoaotllvm --aotcompilerpath $RunDir\artifacts\bin\aot\sgen\mini\mono-sgen.exe --customruntimepack $RunDir\artifacts\bin\aot\pack --aotcompilermode llvm --logBuildOutput --generateBinLog"

# Individual Commands:
# Restore 
dotnet restore $RunDir\performance\src\benchmarks\micro\MicroBenchmarks.csproj --packages $RunDir\performance\artifacts\packages /p:UseSharedCompilation=false /p:BuildInParallel=false /m:1

# Build
dotnet build $RunDir\performance\src\benchmarks\micro\MicroBenchmarks.csproj --configuration Release --framework net8.0 --no-restore /p:NuGetPackageRoot=$RunDir\performance\artifacts\packages /p:UseSharedCompilation=false /p:BuildInParallel=false /m:1

# Run
dotnet run --project $RunDir\performance\src\benchmarks\micro\MicroBenchmarks.csproj --configuration Release --framework net8.0 --no-restore --no-build -- --filter System.Numerics.Tensors.Tests.Perf_NumberTensorPrimitives&lt;Single&gt;* --anyCategories Libraries Runtime " --category-exclusion-filter NoAOT NoWASM --runtimes monoaotllvm --aotcompilerpath $RunDir\artifacts\bin\aot\sgen\mini\mono-sgen.exe --customruntimepack $RunDir\artifacts\bin\aot\pack -aotcompilermode llvm --logBuildOutput --generateBinLog " --artifacts $RunDir\artifacts\BenchmarkDotNet.Artifacts --packages $RunDir\performance\artifacts\packages --buildTimeout 1200

System.Numerics.Tensors.Tests.Perf_NumberTensorPrimitives<Single>.MaxMagnitude_Scalar(BufferLength: 3079)

ETL Files

Histogram

JIT Disasms

System.Numerics.Tensors.Tests.Perf_NumberTensorPrimitives<Single>.MaxMagnitude_Scalar(BufferLength: 128)

ETL Files

Histogram

JIT Disasms

System.Numerics.Tensors.Tests.Perf_NumberTensorPrimitives<Single>.MaxMagnitude_Vector(BufferLength: 128)

ETL Files

Histogram

JIT Disasms

System.Numerics.Tensors.Tests.Perf_NumberTensorPrimitives<Single>.MaxMagnitude_Vector(BufferLength: 3079)

ETL Files

Histogram

JIT Disasms

Docs

Profiling workflow for dotnet/runtime repository
Benchmarking workflow for dotnet/runtime repository


Run Information

Name Value
Architecture arm64
OS ubuntu 22.04
Queue AmpereUbuntu
Baseline 6c553628dfaec02e665a4ade3deabe3a91281024
Compare 0d426df37c6ac2c1d22f952a2e359ffd4281d47b
Diff Diff
Configs CompilationMode:tiered, LLVM:true, MonoAOT:true, MonoInterpreter:false, RunKind:micro_mono

Regressions in System.Numerics.Tensors.Tests.Perf_FloatingPointTensorPrimitives<Single>

Benchmark Baseline Test Test/Base Test Quality Edge Detector Baseline IR Compare IR IR Ratio
265.46 ns 494.60 ns 1.86 0.03 True
4.37 μs 8.83 μs 2.02 0.01 True

graph
graph
Test Report

Repro

General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md

Repro Steps

Prerequisites (Files either built locally (with build.(sh/cmd) or downloaded from payload above (if same system setup) (in this order))

  • Libraries build extracted to runtime/artifacts or build instructions: Libraries README args: -subset libs+libs.tests -rc release -configuration Release -arch $RunArch -framework net8.0
  • CoreCLR product build extracted to runtime/artifacts/bin/coreclr/$RunOS.$RunArch.Release, build instructions: CoreCLR README args: -subset clr+libs -rc release -configuration Release -arch $RunArch -framework net8.0
  • AOT MONO build extracted to runtime/artifacts/bin/mono/$RunOS.$RunArch.Release, build instructions: MONO README args: -arch $RunArch -os $RunOS -s mono+libs+host+packs -c Release /p:CrossBuild=false /p:MonoLLVMUseCxx11Abi=false
  • Dotnet SDK installed for dotnet commands
  • Running commands from the runtime folder

Linux

# Set $RunDir to the runtime directory
RunDir=`pwd`

# Set the OS, arch, and OSId
RunOS='linux'
RunOSId='linux'
RunArch='x64'

# Create aot directory 
mkdir -p $RunDir/artifacts/bin/aot/sgen
mkdir -p $RunDir/artifacts/bin/aot/pack
cp -r $RunDir/artifacts/obj/mono/$RunOS.$RunArch.Release/mono/* $RunDir/artifacts/bin/aot/sgen
cp -r $RunDir/artifacts/bin/microsoft.netcore.app.runtime.$RunOS-$RunArch/Release/* $RunDir/artifacts/bin/aot/pack

# Create Core Root
$RunDir/src/tests/build.sh release $RunArch generatelayoutonly /p:LibrariesConfiguration=Release

# Clone performance 
git clone --branch main --depth 1 --quiet https://github.com/dotnet/performance.git $RunDir/performance

# One line run:
python3 $RunDir/performance/scripts/benchmarks_ci.py --csproj $RunDir/performance/src/benchmarks/micro/MicroBenchmarks.csproj --incremental no --architecture $RunArch -f net8.0 --filter 'System.Numerics.Tensors.Tests.Perf_FloatingPointTensorPrimitives&lt;Single&gt;*' --bdn-artifacts $RunDir/artifacts/BenchmarkDotNet.Artifacts --bdn-arguments="--anyCategories Libraries Runtime  --category-exclusion-filter NoAOT NoWASM --runtimes monoaotllvm --aotcompilerpath $RunDir/artifacts/bin/aot/sgen/mini/mono-sgen --customruntimepack $RunDir/artifacts/bin/aot/pack --aotcompilermode llvm --logBuildOutput --generateBinLog"

# Individual Commands:
# Restore 
dotnet restore $RunDir/performance/src/benchmarks/micro/MicroBenchmarks.csproj --packages $RunDir/performance/artifacts/packages /p:UseSharedCompilation=false /p:BuildInParallel=false /m:1

# Build
dotnet build $RunDir/performance/src/benchmarks/micro/MicroBenchmarks.csproj --configuration Release --framework net8.0 --no-restore /p:NuGetPackageRoot=$RunDir/performance/artifacts/packages /p:UseSharedCompilation=false /p:BuildInParallel=false /m:1

# Run
dotnet run --project $RunDir/performance/src/benchmarks/micro/MicroBenchmarks.csproj --configuration Release --framework net8.0 --no-restore --no-build -- --filter System.Numerics.Tensors.Tests.Perf_FloatingPointTensorPrimitives&lt;Single&gt;* --anyCategories Libraries Runtime " --category-exclusion-filter NoAOT NoWASM --runtimes monoaotllvm --aotcompilerpath $RunDir/artifacts/bin/aot/sgen/mini/mono-sgen --customruntimepack $RunDir/artifacts/bin/aot/pack --aotcompilermode llvm --logBuildOutput --generateBinLog " --artifacts $RunDir/artifacts/BenchmarkDotNet.Artifacts --packages $RunDir/performance/artifacts/packages --buildTimeout 1200

Windows

# Set $RunDir to the runtime directory
$RunDir="FullPathHere"

# Set the OS, arch, and OSId
RunOS='windows'
RunOSId='win'
RunArch='x64'

# Create aot directory
mkdir $RunDir\artifacts\bin\aot\sgen
mkdir $RunDir\artifacts\bin\aot\pack
xcopy $RunDir\artifacts\obj\mono\$RunOS.$RunArch.Release\mono $RunDir\artifacts\bin\aot\sgen\ /e /y
xcopy $RunDir\artifacts\bin\microsoft.netcore.app.runtime.$RunOSId-$RunArch\Release $RunDir\artifacts\bin\aot\pack\ /e /y

# Create Core Root
$RunDir\src\tests\build.cmd release $RunArch generatelayoutonly /p:LibrariesConfiguration=Release

# Clone performance 
git clone --branch main --depth 1 --quiet https://github.com/dotnet/performance.git $RunDir\performance

# One line run:
python3 $RunDir\performance\scripts\benchmarks_ci.py --csproj $RunDir\performance\src\benchmarks\micro\MicroBenchmarks.csproj --incremental no --architecture $RunArch -f net8.0 --filter 'System.Numerics.Tensors.Tests.Perf_FloatingPointTensorPrimitives&lt;Single&gt;*' --bdn-artifacts $RunDir\artifacts\BenchmarkDotNet.Artifacts --bdn-arguments="--anyCategories Libraries Runtime  --category-exclusion-filter NoAOT NoWASM --runtimes monoaotllvm --aotcompilerpath $RunDir\artifacts\bin\aot\sgen\mini\mono-sgen.exe --customruntimepack $RunDir\artifacts\bin\aot\pack --aotcompilermode llvm --logBuildOutput --generateBinLog"

# Individual Commands:
# Restore 
dotnet restore $RunDir\performance\src\benchmarks\micro\MicroBenchmarks.csproj --packages $RunDir\performance\artifacts\packages /p:UseSharedCompilation=false /p:BuildInParallel=false /m:1

# Build
dotnet build $RunDir\performance\src\benchmarks\micro\MicroBenchmarks.csproj --configuration Release --framework net8.0 --no-restore /p:NuGetPackageRoot=$RunDir\performance\artifacts\packages /p:UseSharedCompilation=false /p:BuildInParallel=false /m:1

# Run
dotnet run --project $RunDir\performance\src\benchmarks\micro\MicroBenchmarks.csproj --configuration Release --framework net8.0 --no-restore --no-build -- --filter System.Numerics.Tensors.Tests.Perf_FloatingPointTensorPrimitives&lt;Single&gt;* --anyCategories Libraries Runtime " --category-exclusion-filter NoAOT NoWASM --runtimes monoaotllvm --aotcompilerpath $RunDir\artifacts\bin\aot\sgen\mini\mono-sgen.exe --customruntimepack $RunDir\artifacts\bin\aot\pack -aotcompilermode llvm --logBuildOutput --generateBinLog " --artifacts $RunDir\artifacts\BenchmarkDotNet.Artifacts --packages $RunDir\performance\artifacts\packages --buildTimeout 1200

System.Numerics.Tensors.Tests.Perf_FloatingPointTensorPrimitives<Single>.Truncate(BufferLength: 128)

ETL Files

Histogram

JIT Disasms

System.Numerics.Tensors.Tests.Perf_FloatingPointTensorPrimitives<Single>.Truncate(BufferLength: 3079)

ETL Files

Histogram

JIT Disasms

Docs

Profiling workflow for dotnet/runtime repository
Benchmarking workflow for dotnet/runtime repository


Run Information

Name Value
Architecture arm64
OS ubuntu 22.04
Queue AmpereUbuntu
Baseline e733c2f22710b6382c9aadb8139833fbf395c332
Compare 0d426df37c6ac2c1d22f952a2e359ffd4281d47b
Diff Diff
Configs CompilationMode:tiered, LLVM:true, MonoAOT:true, MonoInterpreter:false, RunKind:micro_mono

Regressions in System.Numerics.Tensors.Tests.Perf_NumberTensorPrimitives<Int32>

Benchmark Baseline Test Test/Base Test Quality Edge Detector Baseline IR Compare IR IR Ratio
4.63 μs 6.38 μs 1.38 0.01 True
236.12 ns 297.83 ns 1.26 0.00 True
269.42 ns 348.38 ns 1.29 0.00 True
4.52 μs 5.92 μs 1.31 0.00 True

graph
graph
graph
graph
Test Report

Repro

General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md

Repro Steps

Prerequisites (Files either built locally (with build.(sh/cmd) or downloaded from payload above (if same system setup) (in this order))

  • Libraries build extracted to runtime/artifacts or build instructions: Libraries README args: -subset libs+libs.tests -rc release -configuration Release -arch $RunArch -framework net8.0
  • CoreCLR product build extracted to runtime/artifacts/bin/coreclr/$RunOS.$RunArch.Release, build instructions: CoreCLR README args: -subset clr+libs -rc release -configuration Release -arch $RunArch -framework net8.0
  • AOT MONO build extracted to runtime/artifacts/bin/mono/$RunOS.$RunArch.Release, build instructions: MONO README args: -arch $RunArch -os $RunOS -s mono+libs+host+packs -c Release /p:CrossBuild=false /p:MonoLLVMUseCxx11Abi=false
  • Dotnet SDK installed for dotnet commands
  • Running commands from the runtime folder

Linux

# Set $RunDir to the runtime directory
RunDir=`pwd`

# Set the OS, arch, and OSId
RunOS='linux'
RunOSId='linux'
RunArch='x64'

# Create aot directory 
mkdir -p $RunDir/artifacts/bin/aot/sgen
mkdir -p $RunDir/artifacts/bin/aot/pack
cp -r $RunDir/artifacts/obj/mono/$RunOS.$RunArch.Release/mono/* $RunDir/artifacts/bin/aot/sgen
cp -r $RunDir/artifacts/bin/microsoft.netcore.app.runtime.$RunOS-$RunArch/Release/* $RunDir/artifacts/bin/aot/pack

# Create Core Root
$RunDir/src/tests/build.sh release $RunArch generatelayoutonly /p:LibrariesConfiguration=Release

# Clone performance 
git clone --branch main --depth 1 --quiet https://github.com/dotnet/performance.git $RunDir/performance

# One line run:
python3 $RunDir/performance/scripts/benchmarks_ci.py --csproj $RunDir/performance/src/benchmarks/micro/MicroBenchmarks.csproj --incremental no --architecture $RunArch -f net8.0 --filter 'System.Numerics.Tensors.Tests.Perf_NumberTensorPrimitives&lt;Int32&gt;*' --bdn-artifacts $RunDir/artifacts/BenchmarkDotNet.Artifacts --bdn-arguments="--anyCategories Libraries Runtime  --category-exclusion-filter NoAOT NoWASM --runtimes monoaotllvm --aotcompilerpath $RunDir/artifacts/bin/aot/sgen/mini/mono-sgen --customruntimepack $RunDir/artifacts/bin/aot/pack --aotcompilermode llvm --logBuildOutput --generateBinLog"

# Individual Commands:
# Restore 
dotnet restore $RunDir/performance/src/benchmarks/micro/MicroBenchmarks.csproj --packages $RunDir/performance/artifacts/packages /p:UseSharedCompilation=false /p:BuildInParallel=false /m:1

# Build
dotnet build $RunDir/performance/src/benchmarks/micro/MicroBenchmarks.csproj --configuration Release --framework net8.0 --no-restore /p:NuGetPackageRoot=$RunDir/performance/artifacts/packages /p:UseSharedCompilation=false /p:BuildInParallel=false /m:1

# Run
dotnet run --project $RunDir/performance/src/benchmarks/micro/MicroBenchmarks.csproj --configuration Release --framework net8.0 --no-restore --no-build -- --filter System.Numerics.Tensors.Tests.Perf_NumberTensorPrimitives&lt;Int32&gt;* --anyCategories Libraries Runtime " --category-exclusion-filter NoAOT NoWASM --runtimes monoaotllvm --aotcompilerpath $RunDir/artifacts/bin/aot/sgen/mini/mono-sgen --customruntimepack $RunDir/artifacts/bin/aot/pack --aotcompilermode llvm --logBuildOutput --generateBinLog " --artifacts $RunDir/artifacts/BenchmarkDotNet.Artifacts --packages $RunDir/performance/artifacts/packages --buildTimeout 1200

Windows

# Set $RunDir to the runtime directory
$RunDir="FullPathHere"

# Set the OS, arch, and OSId
RunOS='windows'
RunOSId='win'
RunArch='x64'

# Create aot directory
mkdir $RunDir\artifacts\bin\aot\sgen
mkdir $RunDir\artifacts\bin\aot\pack
xcopy $RunDir\artifacts\obj\mono\$RunOS.$RunArch.Release\mono $RunDir\artifacts\bin\aot\sgen\ /e /y
xcopy $RunDir\artifacts\bin\microsoft.netcore.app.runtime.$RunOSId-$RunArch\Release $RunDir\artifacts\bin\aot\pack\ /e /y

# Create Core Root
$RunDir\src\tests\build.cmd release $RunArch generatelayoutonly /p:LibrariesConfiguration=Release

# Clone performance 
git clone --branch main --depth 1 --quiet https://github.com/dotnet/performance.git $RunDir\performance

# One line run:
python3 $RunDir\performance\scripts\benchmarks_ci.py --csproj $RunDir\performance\src\benchmarks\micro\MicroBenchmarks.csproj --incremental no --architecture $RunArch -f net8.0 --filter 'System.Numerics.Tensors.Tests.Perf_NumberTensorPrimitives&lt;Int32&gt;*' --bdn-artifacts $RunDir\artifacts\BenchmarkDotNet.Artifacts --bdn-arguments="--anyCategories Libraries Runtime  --category-exclusion-filter NoAOT NoWASM --runtimes monoaotllvm --aotcompilerpath $RunDir\artifacts\bin\aot\sgen\mini\mono-sgen.exe --customruntimepack $RunDir\artifacts\bin\aot\pack --aotcompilermode llvm --logBuildOutput --generateBinLog"

# Individual Commands:
# Restore 
dotnet restore $RunDir\performance\src\benchmarks\micro\MicroBenchmarks.csproj --packages $RunDir\performance\artifacts\packages /p:UseSharedCompilation=false /p:BuildInParallel=false /m:1

# Build
dotnet build $RunDir\performance\src\benchmarks\micro\MicroBenchmarks.csproj --configuration Release --framework net8.0 --no-restore /p:NuGetPackageRoot=$RunDir\performance\artifacts\packages /p:UseSharedCompilation=false /p:BuildInParallel=false /m:1

# Run
dotnet run --project $RunDir\performance\src\benchmarks\micro\MicroBenchmarks.csproj --configuration Release --framework net8.0 --no-restore --no-build -- --filter System.Numerics.Tensors.Tests.Perf_NumberTensorPrimitives&lt;Int32&gt;* --anyCategories Libraries Runtime " --category-exclusion-filter NoAOT NoWASM --runtimes monoaotllvm --aotcompilerpath $RunDir\artifacts\bin\aot\sgen\mini\mono-sgen.exe --customruntimepack $RunDir\artifacts\bin\aot\pack -aotcompilermode llvm --logBuildOutput --generateBinLog " --artifacts $RunDir\artifacts\BenchmarkDotNet.Artifacts --packages $RunDir\performance\artifacts\packages --buildTimeout 1200

System.Numerics.Tensors.Tests.Perf_NumberTensorPrimitives<Int32>.MaxMagnitude_Vector(BufferLength: 3079)

ETL Files

Histogram

JIT Disasms

System.Numerics.Tensors.Tests.Perf_NumberTensorPrimitives<Int32>.MaxMagnitude_Scalar(BufferLength: 128)

ETL Files

Histogram

JIT Disasms

System.Numerics.Tensors.Tests.Perf_NumberTensorPrimitives<Int32>.MaxMagnitude_Vector(BufferLength: 128)

ETL Files

Histogram

JIT Disasms

System.Numerics.Tensors.Tests.Perf_NumberTensorPrimitives<Int32>.MaxMagnitude_Scalar(BufferLength: 3079)

ETL Files

Histogram

JIT Disasms

Docs

Profiling workflow for dotnet/runtime repository
Benchmarking workflow for dotnet/runtime repository

@performanceautofiler performanceautofiler bot added arch-arm64 os-linux Linux OS (any supported distro) runtime-mono specific to the Mono runtime untriaged New issue has not been triaged by the area owner labels Jul 16, 2024
Copy link
Author

Run Information

Name Value
Architecture arm64
OS ubuntu 22.04
Queue AmpereUbuntu
Baseline 6c553628dfaec02e665a4ade3deabe3a91281024
Compare 0d426df37c6ac2c1d22f952a2e359ffd4281d47b
Diff Diff
Configs CompilationMode:tiered, LLVM:true, MonoAOT:true, MonoInterpreter:false, RunKind:micro_mono

Regressions in System.Numerics.Tensors.Tests.Perf_NumberTensorPrimitives<Double>

Benchmark Baseline Test Test/Base Test Quality Edge Detector Baseline IR Compare IR IR Ratio
8.92 μs 11.73 μs 1.31 0.17 True
414.59 ns 539.07 ns 1.30 0.00 True
468.36 ns 587.35 ns 1.25 0.00 True
10.09 μs 13.41 μs 1.33 0.24 True

graph
graph
graph
graph
Test Report

Repro

General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md

Repro Steps

Prerequisites (Files either built locally (with build.(sh/cmd) or downloaded from payload above (if same system setup) (in this order))

  • Libraries build extracted to runtime/artifacts or build instructions: Libraries README args: -subset libs+libs.tests -rc release -configuration Release -arch $RunArch -framework net8.0
  • CoreCLR product build extracted to runtime/artifacts/bin/coreclr/$RunOS.$RunArch.Release, build instructions: CoreCLR README args: -subset clr+libs -rc release -configuration Release -arch $RunArch -framework net8.0
  • AOT MONO build extracted to runtime/artifacts/bin/mono/$RunOS.$RunArch.Release, build instructions: MONO README args: -arch $RunArch -os $RunOS -s mono+libs+host+packs -c Release /p:CrossBuild=false /p:MonoLLVMUseCxx11Abi=false
  • Dotnet SDK installed for dotnet commands
  • Running commands from the runtime folder

Linux

# Set $RunDir to the runtime directory
RunDir=`pwd`

# Set the OS, arch, and OSId
RunOS='linux'
RunOSId='linux'
RunArch='x64'

# Create aot directory 
mkdir -p $RunDir/artifacts/bin/aot/sgen
mkdir -p $RunDir/artifacts/bin/aot/pack
cp -r $RunDir/artifacts/obj/mono/$RunOS.$RunArch.Release/mono/* $RunDir/artifacts/bin/aot/sgen
cp -r $RunDir/artifacts/bin/microsoft.netcore.app.runtime.$RunOS-$RunArch/Release/* $RunDir/artifacts/bin/aot/pack

# Create Core Root
$RunDir/src/tests/build.sh release $RunArch generatelayoutonly /p:LibrariesConfiguration=Release

# Clone performance 
git clone --branch main --depth 1 --quiet https://github.com/dotnet/performance.git $RunDir/performance

# One line run:
python3 $RunDir/performance/scripts/benchmarks_ci.py --csproj $RunDir/performance/src/benchmarks/micro/MicroBenchmarks.csproj --incremental no --architecture $RunArch -f net8.0 --filter 'System.Numerics.Tensors.Tests.Perf_NumberTensorPrimitives&lt;Double&gt;*' --bdn-artifacts $RunDir/artifacts/BenchmarkDotNet.Artifacts --bdn-arguments="--anyCategories Libraries Runtime  --category-exclusion-filter NoAOT NoWASM --runtimes monoaotllvm --aotcompilerpath $RunDir/artifacts/bin/aot/sgen/mini/mono-sgen --customruntimepack $RunDir/artifacts/bin/aot/pack --aotcompilermode llvm --logBuildOutput --generateBinLog"

# Individual Commands:
# Restore 
dotnet restore $RunDir/performance/src/benchmarks/micro/MicroBenchmarks.csproj --packages $RunDir/performance/artifacts/packages /p:UseSharedCompilation=false /p:BuildInParallel=false /m:1

# Build
dotnet build $RunDir/performance/src/benchmarks/micro/MicroBenchmarks.csproj --configuration Release --framework net8.0 --no-restore /p:NuGetPackageRoot=$RunDir/performance/artifacts/packages /p:UseSharedCompilation=false /p:BuildInParallel=false /m:1

# Run
dotnet run --project $RunDir/performance/src/benchmarks/micro/MicroBenchmarks.csproj --configuration Release --framework net8.0 --no-restore --no-build -- --filter System.Numerics.Tensors.Tests.Perf_NumberTensorPrimitives&lt;Double&gt;* --anyCategories Libraries Runtime " --category-exclusion-filter NoAOT NoWASM --runtimes monoaotllvm --aotcompilerpath $RunDir/artifacts/bin/aot/sgen/mini/mono-sgen --customruntimepack $RunDir/artifacts/bin/aot/pack --aotcompilermode llvm --logBuildOutput --generateBinLog " --artifacts $RunDir/artifacts/BenchmarkDotNet.Artifacts --packages $RunDir/performance/artifacts/packages --buildTimeout 1200

Windows

# Set $RunDir to the runtime directory
$RunDir="FullPathHere"

# Set the OS, arch, and OSId
RunOS='windows'
RunOSId='win'
RunArch='x64'

# Create aot directory
mkdir $RunDir\artifacts\bin\aot\sgen
mkdir $RunDir\artifacts\bin\aot\pack
xcopy $RunDir\artifacts\obj\mono\$RunOS.$RunArch.Release\mono $RunDir\artifacts\bin\aot\sgen\ /e /y
xcopy $RunDir\artifacts\bin\microsoft.netcore.app.runtime.$RunOSId-$RunArch\Release $RunDir\artifacts\bin\aot\pack\ /e /y

# Create Core Root
$RunDir\src\tests\build.cmd release $RunArch generatelayoutonly /p:LibrariesConfiguration=Release

# Clone performance 
git clone --branch main --depth 1 --quiet https://github.com/dotnet/performance.git $RunDir\performance

# One line run:
python3 $RunDir\performance\scripts\benchmarks_ci.py --csproj $RunDir\performance\src\benchmarks\micro\MicroBenchmarks.csproj --incremental no --architecture $RunArch -f net8.0 --filter 'System.Numerics.Tensors.Tests.Perf_NumberTensorPrimitives&lt;Double&gt;*' --bdn-artifacts $RunDir\artifacts\BenchmarkDotNet.Artifacts --bdn-arguments="--anyCategories Libraries Runtime  --category-exclusion-filter NoAOT NoWASM --runtimes monoaotllvm --aotcompilerpath $RunDir\artifacts\bin\aot\sgen\mini\mono-sgen.exe --customruntimepack $RunDir\artifacts\bin\aot\pack --aotcompilermode llvm --logBuildOutput --generateBinLog"

# Individual Commands:
# Restore 
dotnet restore $RunDir\performance\src\benchmarks\micro\MicroBenchmarks.csproj --packages $RunDir\performance\artifacts\packages /p:UseSharedCompilation=false /p:BuildInParallel=false /m:1

# Build
dotnet build $RunDir\performance\src\benchmarks\micro\MicroBenchmarks.csproj --configuration Release --framework net8.0 --no-restore /p:NuGetPackageRoot=$RunDir\performance\artifacts\packages /p:UseSharedCompilation=false /p:BuildInParallel=false /m:1

# Run
dotnet run --project $RunDir\performance\src\benchmarks\micro\MicroBenchmarks.csproj --configuration Release --framework net8.0 --no-restore --no-build -- --filter System.Numerics.Tensors.Tests.Perf_NumberTensorPrimitives&lt;Double&gt;* --anyCategories Libraries Runtime " --category-exclusion-filter NoAOT NoWASM --runtimes monoaotllvm --aotcompilerpath $RunDir\artifacts\bin\aot\sgen\mini\mono-sgen.exe --customruntimepack $RunDir\artifacts\bin\aot\pack -aotcompilermode llvm --logBuildOutput --generateBinLog " --artifacts $RunDir\artifacts\BenchmarkDotNet.Artifacts --packages $RunDir\performance\artifacts\packages --buildTimeout 1200

System.Numerics.Tensors.Tests.Perf_NumberTensorPrimitives<Double>.MaxMagnitude_Scalar(BufferLength: 3079)

ETL Files

Histogram

JIT Disasms

System.Numerics.Tensors.Tests.Perf_NumberTensorPrimitives<Double>.MaxMagnitude_Scalar(BufferLength: 128)

ETL Files

Histogram

JIT Disasms

System.Numerics.Tensors.Tests.Perf_NumberTensorPrimitives<Double>.MaxMagnitude_Vector(BufferLength: 128)

ETL Files

Histogram

JIT Disasms

System.Numerics.Tensors.Tests.Perf_NumberTensorPrimitives<Double>.MaxMagnitude_Vector(BufferLength: 3079)

ETL Files

Histogram

JIT Disasms

Docs

Profiling workflow for dotnet/runtime repository
Benchmarking workflow for dotnet/runtime repository

@matouskozak
Copy link
Member

matouskozak commented Jul 16, 2024

Possible related to #104651 @stephentoub ?

@stephentoub
Copy link
Member

Possible related to dotnet/runtime#104651 @stephentoub ?

No, that only added new API and didn't touch any existing implementation. More likely to be #103837 @tannergooding

@matouskozak
Copy link
Member

Possible related to dotnet/runtime#104651 @stephentoub ?

No, that only added new API and didn't touch any existing implementation. More likely to be dotnet/runtime#103837 @tannergooding

Looks like you're right. Interestingly that PR is not in the regression range 6c55362...635f9af. Wonder if there is some scenario where it used take a different code path before your PR.

Another possibility is that the commit range calculator hit a bug and is not displaying the range correctly...

@matouskozak matouskozak transferred this issue from dotnet/perf-autofiling-issues Aug 15, 2024
@dotnet-policy-service dotnet-policy-service bot added the untriaged New issue has not been triaged by the area owner label Aug 15, 2024
Copy link
Contributor

Tagging subscribers to this area: @directhex, @matouskozak
See info in area-owners.md if you want to be subscribed.

Copy link
Contributor

Tagging subscribers to this area: @dotnet/area-system-numerics-tensors
See info in area-owners.md if you want to be subscribed.

@matouskozak matouskozak changed the title [Perf] Linux/arm64: 16 Regressions on 7/10/2024 10:56:31 PM [mono][Perf] TensorPrimitive MaxMagnitude and Truncate Regressions on 7/10/2024 10:56:31 PM Aug 15, 2024
@matouskozak
Copy link
Member

Affecting both x64 and arm64 Mono AOT
image

@matouskozak matouskozak removed the untriaged New issue has not been triaged by the area owner label Aug 15, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arch-arm64 arch-x64 area-System.Numerics.Tensors investigate os-linux Linux OS (any supported distro) runtime-mono specific to the Mono runtime tenet-performance Performance related issue tenet-performance-benchmarks Issue from performance benchmark
Projects
None yet
Development

No branches or pull requests

4 participants