Codegen regression with llvmcall/inlining #28078
Comments
Basically a duplicate of #27694.
- Instead of always inlining functions marked at-inline, increase the cost threshold 20x
- Don't inline functions inferred not to return
- `statement_cost` no longer needs to look at nested `Expr`s in general
- Fix cost of `:copyast`
Here's an example of another regression, this time without using llvmcall:

julia> using StaticArrays, BenchmarkTools
julia> A = @SMatrix randn(8,8);
julia> B = @SMatrix randn(8,8);
julia> @benchmark $A * $B
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 0
--------------
minimum time: 16.310 ns (0.00% GC)
median time: 17.290 ns (0.00% GC)
mean time: 17.885 ns (0.00% GC)
maximum time: 63.709 ns (0.00% GC)
--------------
samples: 10000
evals/sample: 998

vs

julia> @benchmark $A * $B
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 0
--------------
minimum time: 28.014 ns (0.00% GC)
median time: 28.112 ns (0.00% GC)
mean time: 28.664 ns (0.00% GC)
maximum time: 59.865 ns (0.00% GC)
--------------
samples: 10000
evals/sample: 995

I think it might be copying big tuples when it doesn't inline.

@code_native A * B
.text
; Function * {
; Location: matrix_multiply.jl:9
pushq %r14
pushq %rbx
subq $520, %rsp # imm = 0x208
movq %rdi, %rbx
; Function _mul; {
; Location: matrix_multiply.jl:75
; Function macro expansion; {
; Location: matrix_multiply.jl:78
movabsq $"*;", %rax
leaq 8(%rsp), %r14
movq %r14, %rdi
callq *%rax
movabsq $__memcpy_avx_unaligned_erms, %rax
;}}
movl $512, %edx # imm = 0x200
movq %rbx, %rdi
movq %r14, %rsi
callq *%rax
movq %rbx, %rax
addq $520, %rsp # imm = 0x208
popq %rbx
popq %r14
retq
nopw %cs:(%rax,%rax)
;}
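
One way to probe the "copying big tuples" guess without StaticArrays is a small sketch like the one below. All names in it (`Big`, `scale_noinline`, `scale_inline`, `wrap_noinline`, `wrap_inline`) are hypothetical stand-ins, not code from the issue: a 64-element `Float64` tuple is used because it is 512 bytes, the same size as the `memcpy` in the listing above. Whether a call followed by a `memcpy` actually appears in the non-inlined wrapper depends on the Julia version and target, so this is only a way to look, not a confirmed reproduction.

```julia
using InteractiveUtils  # provides code_native outside the REPL

# Hypothetical stand-in for a large immutable value (64 * 8 = 512 bytes).
const Big = NTuple{64,Float64}

# Same body twice; only the inlining annotation differs.
@noinline scale_noinline(t::Big) = ntuple(i -> 2.0 * t[i], Val(64))
@inline   scale_inline(t::Big)   = ntuple(i -> 2.0 * t[i], Val(64))

# Thin wrappers, loosely analogous to `*` forwarding to `_mul`.
wrap_noinline(t::Big) = scale_noinline(t)
wrap_inline(t::Big)   = scale_inline(t)

t = ntuple(i -> Float64(i), Val(64))

# Both paths must compute the same result; only the codegen may differ.
@assert wrap_noinline(t) == wrap_inline(t)

# Compare the generated code of the two wrappers by eye.
code_native(wrap_noinline, (Big,))
code_native(wrap_inline, (Big,))
```

If the non-inlined wrapper shows the callee writing into a stack slot that is then copied to the return slot, that would be consistent with the guess above; timing both wrappers with `@benchmark` is a natural follow-up.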
Ok, I think something can be done about that.
along with the llvmcall fix, this fixes #28078
MWE:
Before #27857:
After:
Bisected to 9277d3a.