
Benchmark regressions of 1.3 vs 1.2 #32985

Closed
KristofferC opened this issue Aug 20, 2019 · 4 comments

Labels: regression (Regression in behavior compared to a previous version)
Milestone: 1.3

Opening this issue so as not to pollute #32973. The benchmark run in that PR showed some regressions vs 1.2 (https://github.com/JuliaCI/BaseBenchmarkReports/blob/24d2f8dc38422bf7c21691c2fdfa243d894ec38f/c181990_vs_9392955/report.md).

I'll go through the results and post the ones I believe are worth looking into further.

@KristofferC KristofferC added the regression Regression in behavior compared to a previous version label Aug 20, 2019
@KristofferC KristofferC added this to the 1.3 milestone Aug 20, 2019

KristofferC commented Aug 20, 2019

Some stuff to look through (columns are the benchmark ID, time ratio, and memory ratio relative to 1.2, so >1 means slower or more memory; the tolerance is in parentheses and ❌ flags a change outside it):

Array indexing

["array", "index", "(\"sumvector_view\", \"Array{Float32,2}\")"]	1.72 (50%) ❌	1.00 (1%)

Looks like a fluke


Broadcasting memory allocations

["broadcast", "dotop", "(\"Float64\", (1000, 1000), 2)"]	1.01 (5%)	Inf (1%) 

Fixed by #33007.
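
For anyone wanting to check the fix locally: if I'm reading the benchmark name right, the "dotop" cases exercise fused in-place dot operators, so an allocation count on something like the following sketch (not the exact BaseBenchmarks kernel) should drop back to zero:

using BenchmarkTools

A = rand(1000, 1000); B = rand(1000, 1000)

function dotop!(A, B)
    @. A = A + 2 * B    # fused in-place dot ops; should not allocate
    return nothing
end

@btime dotop!($A, $B)   # a memory ratio of Inf means this went from 0 bytes to >0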


IdDict stuff

["collection", "deletion", "(\"IdDict\", \"Int\", \"filter\")"]	1.61 (25%) ❌	1.08 (1%) ❌
["collection", "initialization", "(\"IdDict\", \"Int\", \"iterator\")"]	1.90 (25%) ❌	1.19 (1%) ❌
["collection", "initialization", "(\"IdDict\", \"Int\", \"loop\")"]	1.84 (25%) ❌	1.19 (1%) ❌
["collection", "initialization", "(\"IdDict\", \"Int\", \"loop\", \"sizehint!\")"]	2.04 (25%) ❌	1.24 (1%) ❌
["collection", "queries & updates", "(\"IdDict\", \"Int\", \"push!\", \"new\")"]	2.38 (25%) ❌	1.50 (1%) 

etc

Maybe fixed by: #33009
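
A rough way to reproduce the IdDict numbers locally (a sketch; the real definitions live in BaseBenchmarks.jl and may differ in sizes and details):

using BenchmarkTools

n = 10^4
@btime IdDict(i => i for i in 1:$n)        # "initialization", "iterator"
@btime begin                               # "initialization", "loop"
    d = IdDict{Int,Int}()
    for i in 1:$n
        d[i] = i
    end
end
d0 = IdDict(i => i for i in 1:n)
@btime filter(p -> iseven(p.first), $d0)   # "deletion", "filter"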


IO

Some of these are expected to slow down due to new locks in IO code

["io", "serialization", "(\"deserialize\", \"Vector{String}\")"]	1.23 (5%) ❌	1.00 (1%)
["io", "skipchars"]	4.97 (5%) ❌	1.10 (1%) ❌
["micro", "printfd"]	2.61 (5%) ❌	1.20 (1%) ❌

Arithmetic

["linalg", "arithmetic", "(\"+\", \"Bidiagonal\", \"Bidiagonal\", 1024)"]	3.13 (45%) ❌	1.01 (1%) ❌
["linalg", "arithmetic", "(\"+\", \"Bidiagonal\", \"Bidiagonal\", 256)"]	4.18 (45%) ❌	1.04 (1%) ❌
["linalg", "arithmetic", "(\"+\", \"Diagonal\", \"Diagonal\", 1024)"]	2.83 (45%) ❌	1.01 (1%) ❌
["linalg", "arithmetic", "(\"+\", \"Vector\", \"Vector\", 1024)"]	2.90 (45%) ❌	1.01 (1%) ❌
["linalg", "arithmetic", "(\"+\", \"Vector\", \"Vector\", 256)"]	4.98 (45%) ❌	1.04 (1%) ❌
["linalg", "arithmetic", "(\"-\", \"Vector\", \"Vector\", 256)"]	5.49 (45%) ❌	1.04 (1%) ❌

etc.

Analyzed in #32985 (comment).

Fixed in #33079.


Random

Probably due to thread-local RNGs.

["random", "collections", "(\"rand\", \"RandomDevice\", \"small Set\")"]	1.70 (25%) ❌	1.00 (1%)
["random", "collections", "(\"rand\", \"RandomDevice\", \"small String\")"]	1.50 (25%) ❌	1.00 (1%)
["random", "collections", "(\"rand\", \"RandomDevice\", \"small Vector\")"]	1.46 (25%) ❌	1.00 (1%)

Problem

["problem", "monte carlo", "euro_option_devec"]	1.50 (5%) ❌	1.00 (1%)

This is because randn() got slower.
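
If so, it should be visible directly in a scalar microbenchmark, independent of the monte carlo kernel; comparing the default (thread-local) RNG against an explicit one narrows it down (a sketch):

using BenchmarkTools, Random

@btime randn()                        # default RNG, goes through the thread-local lookup
@btime randn($(MersenneTwister(0)))   # explicit RNG for comparison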


Scalar with big numbers

This is #31759

["scalar", "arithmetic", "(\"add\", \"Complex{Float32}\", \"BigFloat\")"]	

Sparse constructors ❓

Constructor regression might be #31724

["sparse", "constructors", "(\"IJV\", 100)"]	3.58 (5%) ❌	1.02 (1%) ❌
["sparse", "index", "(\"spmat\", \"row\", \"array\", 1000)"]	1.49 (30%) ❌	1.00 (1%)
["sparse", "index", "(\"spmat\", \"row\", \"logical\", 1000)"]	1.57 (30%) ❌	1.00 (1%)

etc
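
The IJV constructor case can be reproduced along these lines (a sketch; index/value vectors chosen arbitrarily):

using BenchmarkTools, SparseArrays

n = 100
Is = rand(1:n, 10n); Js = rand(1:n, 10n); Vs = rand(10n)
@btime sparse($Is, $Js, $Vs, $n, $n)   # the "IJV" constructor path touched by #31724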

Others

["linalg", "small exp #29116"]	3.21 (5%) ❌	1.10 (1%) ❌

#32985 (comment)

["micro", "mandel"]	2.04 (5%) ❌	1.00 (1%)

This is likely due to the new hypot implementation.
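
For context on why hypot shows up in mandel at all: the inner loop bails out on abs(z) <= 2, and abs on a Complex is implemented via hypot, so a slower hypot is paid on every iteration. A sketch of the hot call:

using BenchmarkTools

z = 0.3 + 0.6im
x, y = reim(z)
@btime abs($z)        # abs(::Complex) = hypot(real(z), imag(z))
@btime hypot($x, $y)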

KristofferC commented Aug 21, 2019

The arithmetic regression can be reproduced (after loading BenchmarkTools for @btime) with e.g.

# 1.3
julia> n = 1024; a = rand(n); b = rand(n); @btime a+b;
  2.083 μs (5 allocations: 8.22 KiB)

# 1.2
julia> n = 1024; a = rand(n); b = rand(n); @btime a+b;
  711.142 ns (1 allocation: 8.13 KiB)

Comparing profile traces vs 1.2 (the 1.3 one is shown below):

julia> Profile.print()
1155 ./task.jl:333; (::REPL.var"##26#27"{REPL.REPLBackend})()
 1155 ...ia/stdlib/v1.4/REPL/src/REPL.jl:118; macro expansion
  1155 ...ia/stdlib/v1.4/REPL/src/REPL.jl:86; eval_user_input(::Any, ::REPL.REPLBackend)
   1155 ./boot.jl:330; eval(::Module, ::Any)
    1155 ./REPL[17]:3; measure(::Array{Float64,1}, ::Array{Float64,1},...
     1155 ./arraymath.jl:47; +
      107 ./broadcast.jl:802; broadcast_preserving_zero_d(::Function, ::Arr...
       24 ./broadcast.jl:1232; broadcasted(::Function, ::Array{Float64,1}, :...
        23 ./broadcast.jl:1234; broadcasted
      996 ./broadcast.jl:803; broadcast_preserving_zero_d(::Function, ::Arr...
       984 ./broadcast.jl:814; materialize(::Base.Broadcast.Broadcasted{Base...
        983 ./broadcast.jl:834; copy
         843 ./broadcast.jl:858; copyto!
          843 ./broadcast.jl:903; copyto!
           843 ./simdloop.jl:77; macro expansion
            324 ./array.jl:780; macro expansion
            519 ./broadcast.jl:904; macro expansion
             519 ./broadcast.jl:558; getindex
              244 ./broadcast.jl:597; _broadcast_getindex
               244 ./broadcast.jl:621; _getindex
                244 ./broadcast.jl:591; _broadcast_getindex
                 244 ./array.jl:742; getindex
              275 ./float.jl:401; _broadcast_getindex
         140 ./broadcast.jl:196; similar
          140 ./abstractarray.jl:671; similar
           140 ./abstractarray.jl:672; similar
            140 ./boot.jl:421; Array
             140 ./boot.jl:413; Array
              140 ./boot.jl:404; Array
       1   ./simdloop.jl:0; materialize(::Base.Broadcast.Broadcasted{Base...
       6   ./simdloop.jl:75; materialize(::Base.Broadcast.Broadcasted{Base...
      51  ./broadcast.jl:804; broadcast_preserving_zero_d(::Function, ::Arr...
       3 ./tuple.jl:19; length(::Tuple)

It seems this might be related to the whole broadcast_preserving_zero_d stuff added in #32122 (but I haven't actually verified that). cc @mbauman
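
A quick way to narrow that down (a sketch): array + dispatches through the broadcast_preserving_zero_d wrapper seen in the profile, while the explicit dot call does not, so if only the first line regresses the wrapper is the likely culprit:

using BenchmarkTools

a = rand(1024); b = rand(1024)
@btime $a + $b    # dispatches through broadcast_preserving_zero_d (see profile above)
@btime $a .+ $b   # plain fused broadcast for comparison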

KristofferC commented Aug 21, 2019

I believe the

["linalg", "small exp #29116"]	3.21 (5%) ❌	1.10 (1%) ❌

is also due to the array arithmetic regression:

       554 ./arraymath.jl:39; -
        367 ./broadcast.jl:802; broadcast_preserving_zero_d(::Function, ::Ar...
         68 ./broadcast.jl:1232; broadcasted(::Function, ::Array{Float64,2},...
          58 ./broadcast.jl:1234; broadcasted
        60  ./broadcast.jl:803; broadcast_preserving_zero_d(::Function, ::Ar...
         46 ./broadcast.jl:814; materialize(::Base.Broadcast.Broadcasted{Bas...
          40 ./broadcast.jl:834; copy
           17 ./broadcast.jl:858; copyto!
            8 ./broadcast.jl:902; copyto!
             8 ./broadcast.jl:885; preprocess
              6 ./promotion.jl:399; preprocess_args
            9 ./broadcast.jl:903; copyto!
           23 ./broadcast.jl:196; similar
            23 ./abstractarray.jl:671; similar
             23 ./abstractarray.jl:672; similar
              23 ./boot.jl:421; Array
               23 ./boot.jl:414; Array
                23 ./boot.jl:406; Array

comes up on 1.3, while on 1.2 that section is just:

       74  ./arraymath.jl:39; -
        64 ./broadcast.jl:752; broadcast(::typeof(-), ::Array{Float64,2}, ::...
         45 ./boot.jl:406; materialize
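
If that's right, the same slowdown should show up on small dense matrix arithmetic without involving exp at all (a sketch):

using BenchmarkTools

A = rand(4, 4); B = rand(4, 4)
@btime $A - $B    # same arraymath.jl subtraction path as in the trace above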

KristofferC commented Aug 26, 2019

For the skipchars regression, for example: would it make sense to have an API where someone who is going to make a bunch of tight calls to read can hoist the lock toggling out of the loop? (A hypothetical sketch of what that could look like is below.)
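
Purely as a strawman (hypothetical API, nothing like this exists today), the suggestion would amount to letting the caller hold the stream's lock across the whole loop:

# Hypothetical sketch only: withlock is an imagined helper, not an existing API.
# The idea is to acquire the stream's internal lock once around a tight read loop,
# so the per-call lock/unlock toggle disappears from e.g. skipchars-style code.
function count_bytes_locked(io::IO)
    n = 0
    withlock(io) do              # imaginary: take the IO lock here, release after the loop
        while !eof(io)
            read(io, UInt8)      # inner reads would then skip their own lock handling
            n += 1
        end
    end
    return n
end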

KristofferC referenced this issue Aug 26, 2019
* reconstruct PR #31118

* reconstruct PR 31118 2

* Check arguments of SparseMatrixCSC #31024 #31435

* fix SuiteSparse test

* added NEWS, fixed tests

* loosen restrictions - resize to useful length

* cleaned up NEWS, revert minor change

* add non-checking and checking constructor - improve check performance
mbauman added a commit that referenced this issue Aug 26, 2019
Followup to https://github.com/JuliaLang/julia/pull/31724/files#r317686891; instead of widening the index type dynamically based upon the index vector length, just throw an error in the case where the index type cannot hold all the indices in a CSC format. This previously was an OOB access (and likely segfault) in 1.2, so throwing an error here is not a breaking change -- and throwing an error restores type stability, addressing the performance regression flagged in #32985.
KristofferC pushed two commits that referenced this issue Aug 27, 2019, with the same message as above (cherry picked from commit 9725fb4)