
@test / @testset can have a significant overhead on 0.5 compared to 0.4 #18077

Closed
KristofferC opened this issue Aug 17, 2016 · 16 comments
Labels
compiler:inference Type inference compiler:latency Compiler latency testsystem The unit testing framework and Test stdlib

Comments

@KristofferC
Member

KristofferC commented Aug 17, 2016

We noticed that the tests in https://github.com/KristofferC/ContMechTensors.jl got a lot slower on 0.5 than on 0.4.

On 0.4 the first run takes ~50 seconds and the second takes 1.2 seconds.
On 0.5-rc1+1 the first run takes ~140 seconds and the second takes ~20 seconds.

Removing all the @testset blocks and changing @test to @assert brings the second run on 0.5 down to ~2 seconds. If I only remove the @testset blocks, the second run takes ~14 seconds on 0.5. Profiling shows that most of the time in type inference is spent in the call chain typeinf_frame -> stupdate! -> issubstate. It seems that both @test and @testset now cause type inference to do much more work.
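A minimal sketch of the kind of comparison involved (illustrative workload only, not the actual ContMechTensors test suite):

```julia
# Illustrative comparison (hypothetical workload): the same checks run once
# through @testset/@test and once as bare @assert, so the difference in the
# second timing is mostly Test-machinery overhead.
using Test  # `using Base.Test` on 0.5/0.6

@time @testset "with Test machinery" begin
    for i in 1:1000
        @test i + 1 == 1 + i
    end
end

@time for i in 1:1000       # plain assertions, no Test machinery
    @assert i + 1 == 1 + i
end
```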

@JeffBezanson
Member

JeffBezanson commented Aug 17, 2016

This seems to be due to @test in 0.5 generating many in-line try-catch blocks, which drive inference a bit crazy. It's slow the second time because we're running inference on the top-level blocks containing the test statements. Here are some options I see:

  • In 25c3659 I implemented a heuristic to bail out of type inference if there are too many try blocks. We could try reinstating that.
  • We could try running inference less often (or not at all) on top-level blocks. That will make top-level code even slower (when it does a lot of actual computation), but it tends to be slow already (e.g. global variables) so this might not be a big deal.
  • Make inference of try blocks smarter somehow, e.g. sparser.
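Schematically, the problem is that each @test at top level expands to its own inline try/catch in the enclosing block. A simplified sketch of the try/catch shape involved (illustrative names, not the real Base.Test expansion):

```julia
# Simplified sketch of why a file full of `@test`s is hard on inference:
# each test becomes a try/catch. The names here are illustrative, not the
# actual Base.Test internals; the real 0.5 expansion inlined this at top
# level rather than isolating it in a small function like this.
function run_one_test(f)
    try
        f() === true ? :pass : :fail
    catch
        :error
    end
end

results = [run_one_test(() -> 1 + 1 == 2),
           run_one_test(() -> error("boom"))]
# results is [:pass, :error]
```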

@StefanKarpinski
Member

Accepted as a "real issue" that should be addressed somehow. Not sure if we can target 0.5.x or if this needs to be fixed on 0.6, but optimistically tagging as 0.5.x.

@ViralBShah
Member

We do need to see if we can either fix this in 0.5.x or at least provide a reasonable recipe to work around it.

@KristofferC
Member Author

KristofferC commented Oct 6, 2016

FWIW, putting the expression from the @test macro in an anonymous function made the tests in the OP go from 105 seconds to 80 seconds. 25c3659 did nothing as far as I could see.
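The workaround can be sketched as a macro that evaluates the tested expression inside a closure instead of splicing it inline (hypothetical macro and names; the real change was inside the Base.Test @test expansion):

```julia
# Hypothetical sketch of the workaround: evaluate the tested expression
# inside an anonymous function, so inference sees a small function body
# instead of an inline expression in a top-level block full of try/catch.
macro test_wrapped(ex)
    quote
        f = () -> $(esc(ex))
        f() || error("test failed: ", $(string(ex)))
        true
    end
end

@test_wrapped 1 + 1 == 2   # passes, returns true
```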

@maleadt
Member

maleadt commented Oct 6, 2016

Does this have anything to do with it, and/or is it known, or am I doing something silly?

using Base.Test

let
    @noinline child1() = return nothing
    parent1() = child1()
    println(code_typed(parent1, ()))
    code_llvm(parent1, ())
end

@testset "foobar" begin
    @noinline child2() = return nothing
    parent2() = child2()
    println(code_typed(parent2, ()))
    code_llvm(parent2, ())
end
Any[CodeInfo(:(begin 
        return $(Expr(:invoke, MethodInstance for (::#child1#1)(), :((Core.getfield)(#self#,:child1)::#child1#1)))
    end))=>Void]

define void @julia_parent1_63282() #0 !dbg !5 {
top:
  call void @julia_child1_63283() #0
  ret void
}


Any[CodeInfo(:(begin 
        return ((Core.getfield)((Core.getfield)(#self#,:child2),:contents))()
    end))=>Any]

define %jl_value_t* @julia_parent2_63305(%jl_value_t*, %jl_value_t**, i32) #0 !dbg !5 {
top:
  ...
  store %jl_value_t* inttoptr (i64 139710928829064 to %jl_value_t*), %jl_value_t** %13, align 8
  %17 = call %jl_value_t* @jl_f_getfield(%jl_value_t* null, %jl_value_t** %5, i32 2)
  store %jl_value_t* %17, %jl_value_t** %6, align 8
  %18 = call %jl_value_t* @jl_apply_generic(%jl_value_t** %6, i32 1)
  ...
}

cc @andreasnoack

@maleadt
Member

maleadt commented Oct 20, 2016

So I reduced the above to a try...end block breaking static dispatch, but the lowered code is different, so I'm not sure whether this should be an inference or a lowering optimization (I couldn't find an existing issue).

let
    @noinline child1() = return nothing
    parent1() = child1()    # static dispatch
    println(code_lowered(parent1, ()))
end

try
    @noinline child2() = return nothing
    parent2() = child2()    # dynamic dispatch
    println(code_lowered(parent2, ()))
end

Anyway, for a quick test similar to @KristofferC's, I wrapped the testset's code block in an anonymous function, which reduced e.g. the math tests' runtime and memory usage significantly:

Test: | Total time (s): | GC time (s): | Percent in gc: | Allocated (MB): | RSS (MB):
math  |   41.19         |    0.77      |    1.87        | 1729.61         |  374.57  
math  |   42.87         |    0.76      |    1.78        | 1729.53         |  376.22  

Test: | Total time (s): | GC time (s): | Percent in gc: | Allocated (MB): | RSS (MB):
math  |   28.57         |    0.76      |    2.68        | 1669.47         |  336.85  
math  |   28.88         |    0.75      |    2.61        | 1669.52         |  337.71  

Maybe we should just go with something like this for now?

diff --git a/base/test.jl b/base/test.jl
index 9b37e87..0a481b4 100644
--- a/base/test.jl
+++ b/base/test.jl
@@ -705,6 +705,7 @@ function testset_beginend(args, tests)
     # finally removing the testset and giving it a chance to take
     # action (such as reporting the results)
     quote
+        (()->begin
         ts = $(testsettype)($desc; $options...)
         push_testset(ts)
         try
@@ -716,6 +717,7 @@ function testset_beginend(args, tests)
         end
         pop_testset()
         finish(ts)
+        end)()
     end
 end

@@ -777,6 +779,7 @@ function testset_forloop(args, testloop)
         end
     end
     quote
+        (()->begin
         arr = Array{Any,1}(0)
         first_iteration = true
         local ts
@@ -790,6 +793,7 @@ function testset_forloop(args, testloop)
             end
         end
         arr
+        end)()
     end
 end

@StefanKarpinski StefanKarpinski added help wanted Indicates that a maintainer wants help on an issue or pull request and removed help wanted Indicates that a maintainer wants help on an issue or pull request labels Oct 27, 2016
@aviks
Member

aviks commented Jul 10, 2017

This is much better on 0.6, still terrible on 0.5.

@JeffBezanson
Member

Ok to close this now?

@KristofferC
Member Author

Let me run a comparison on 0.7 with @test vs @assert.

@KristofferC
Member Author

KristofferC commented Jan 17, 2018

Yes, there is no longer any significant slowdown of @testset + @test compared to just @assert.

Although not related to this issue, there is a significant compilation regression on 0.7 compared to 0.6. These are timings from testsets on 0.6 and 0.7:

0.6

 ─────────────────────────────────────────────────────────────────────────────────────
                                              Time                   Allocations
                                      ──────────────────────   ───────────────────────
           Tot / % measured:               60.8s / 89.9%           1.98GiB / 90.5%

 Section                      ncalls     time   %tot     avg     alloc   %tot      avg
 ─────────────────────────────────────────────────────────────────────────────────────
 simple math                       1    11.3s  20.7%   11.3s    361MiB  19.7%   361MiB
 constructors                      1    11.1s  20.2%   11.1s    331MiB  18.0%   331MiB
 tensor ops                        1    7.31s  13.4%   7.31s    217MiB  11.8%   217MiB
   double contraction              9    7.29s  13.3%   810ms    217MiB  11.8%  24.1MiB
 constrct Arr                      1    7.24s  13.2%   7.24s    306MiB  16.7%   306MiB
 norm, trace, det, inv, eig        1    6.83s  12.5%   6.83s    197MiB  10.7%   197MiB
 tensor identities                 1    3.52s  6.43%   3.52s    213MiB  11.6%   213MiB
 promotion/conversion              1    2.43s  4.45%   2.43s    112MiB  6.12%   112MiB
 constrct func                     1    1.96s  3.58%   1.96s   21.7MiB  1.19%  21.7MiB
 base vectors                      1    1.08s  1.98%   1.08s   32.5MiB  1.77%  32.5MiB
 diagm, one                        1    977ms  1.79%   977ms   25.2MiB  1.37%  25.2MiB
 indexing                          1    959ms  1.75%   959ms   16.4MiB  0.89%  16.4MiB
 exceptions                        1   1.16ms  0.00%  1.16ms   8.22KiB  0.00%  8.22KiB
 ─────────────────────────────────────────────────────────────────────────────────────

0.7:

 ─────────────────────────────────────────────────────────────────────────────────────
                                              Time                   Allocations
                                      ──────────────────────   ───────────────────────
           Tot / % measured:                102s / 93.2%           6.91GiB / 98.4%

 Section                      ncalls     time   %tot     avg     alloc   %tot      avg
 ─────────────────────────────────────────────────────────────────────────────────────
 simple math                       1    17.8s  18.6%   17.8s   1.22GiB  18.0%  1.22GiB
 constructors                      1    16.4s  17.2%   16.4s   1.11GiB  16.3%  1.11GiB
 tensor ops                        1    15.0s  15.7%   15.0s    901MiB  12.9%   901MiB
   double contraction              9    14.9s  15.6%   1.66s    898MiB  12.9%   100MiB
 constrct Arr                      1    12.9s  13.5%   12.9s   1.19GiB  17.4%  1.19GiB
 norm, trace, det, inv, eig        1    9.74s  10.2%   9.74s    635MiB  9.12%   635MiB
 promotion/conversion              1    7.74s  8.11%   7.74s    719MiB  10.3%   719MiB
 tensor identities                 1    5.31s  5.56%   5.31s    478MiB  6.87%   478MiB
 constrct func                     1    3.95s  4.14%   3.95s    307MiB  4.41%   307MiB
 diagm, one                        1    2.82s  2.95%   2.82s    134MiB  1.93%   134MiB
 indexing                          1    2.32s  2.43%   2.32s    102MiB  1.46%   102MiB
 base vectors                      1    1.51s  1.58%   1.51s   83.0MiB  1.19%  83.0MiB
 exceptions                        1   50.2ms  0.05%  50.2ms    889KiB  0.01%   889KiB

So the time to run the tests is up by ~80% and allocations are up ~3.5×.

@helgee
Contributor

helgee commented Jan 28, 2018

I just tried adding @testset blocks to ERFA.jl. The file contains 1537 tests in 226 testsets and the performance overhead is huge.

~14x slower on v0.6.2 vs. ~21x slower on master (although I have not fixed all deprecations yet).

0.6.2

               _
   _       _ _(_)_     |  A fresh approach to technical computing
  (_)     | (_) (_)    |  Documentation: https://docs.julialang.org
   _ _   _| |_  __ _   |  Type "?help" for help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 0.6.2 (2017-12-13 18:08 UTC)
 _/ |\__'_|_|_|\__'_|  |
|__/                   |  x86_64-apple-darwin17.3.0

julia> @time Pkg.test("ERFA")
INFO: Testing ERFA
WARNING: year outside range(1000:3000)
INFO: ERFA tests passed
 12.574542 seconds (4.92 M allocations: 258.743 MiB, 1.19% gc time)

julia> @time Pkg.test("ERFA")
INFO: Testing ERFA
        WARNING: year outside range(1000:3000)
Test Summary: | Pass  Total
ERFA          | 1311   1311
INFO: ERFA tests passed
178.497327 seconds (413 allocations: 22.969 KiB)

Master

               _
   _       _ _(_)_     |  A fresh approach to technical computing
  (_)     | (_) (_)    |  Documentation: https://docs.julialang.org
   _ _   _| |_  __ _   |  Type "?help" for help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 0.7.0-DEV.3590 (2018-01-27 08:16 UTC)
 _/ |\__'_|_|_|\__'_|  |  Commit eea727c0c (1 day old master)
|__/                   |  x86_64-apple-darwin17.4.0

julia> @time Pkg.test("ERFA")
[ Info: Testing ERFA
WARNING: year outside range(1000:3000)
[ Info: ERFA tests passed
 15.921993 seconds (8.73 M allocations: 486.941 MiB, 2.40% gc time)

julia> @time Pkg.test("ERFA")
[ Info: Testing ERFA
WARNING: year outside range(1000:3000)
Test Summary: | Pass  Total
ERFA          | 1311   1311
[ Info: ERFA tests passed
348.921708 seconds (760 allocations: 35.781 KiB)

@maleadt
Member

maleadt commented Jan 28, 2018

FWIW, the repro I posted above still generates horrible code:

   _       _ _(_)_     |  A fresh approach to technical computing
  (_)     | (_) (_)    |  Documentation: https://docs.julialang.org
   _ _   _| |_  __ _   |  Type "?help" for help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 0.7.0-DEV.3413 (2018-01-15 17:38 UTC)
 _/ |\__'_|_|_|\__'_|  |  master/5e2ff127f8* (fork: 2372 commits, 193 days)
|__/                   |  x86_64-pc-linux-gnu

julia> using Test


julia> let
           @noinline child1() = return nothing
           parent1() = child1()
           println(code_typed(parent1, ()))
           code_llvm(parent1, ())
       end

Any[CodeInfo(:(begin
      return
  end))=>Nothing]

; Function parent1
; Location: REPL[2]:3
define void @julia_parent1_57576() {
top:
  ret void
}


julia> @testset "foobar" begin
           @noinline child2() = return nothing
           parent2() = child2()
           println(code_typed(parent2, ()))
           code_llvm(parent2, ())
       end

Any[CodeInfo(:(begin
      SSAValue(0) = (Core.getfield)(#self#, :child2)::Core.Box
      SSAValue(1) = (Core.isdefined)(SSAValue(0), :contents)::Bool
      unless SSAValue(1) goto 5
      goto 8
      5: 
      NewvarNode(:(child2@_3))
      child2@_3
      8: 
      SSAValue(2) = (Core.getfield)(SSAValue(0), :contents)
      SSAValue(3) = (SSAValue(2))()
      return SSAValue(3)
  end))=>Any]

; Function parent2
; Location: REPL[3]:3
define nonnull %jl_value_t addrspace(10)* @japi1_parent2_57626(%jl_value_t addrspace(10)*, %jl_value_t addrspace(10)**, i32) #0 {
top:
  ...
  br i1 %19, label %err, label %pass

pass:                                             ; preds = %top
  ...
  %21 = call nonnull %jl_value_t addrspace(10)* @jl_apply_generic(%jl_value_t addrspace(10)** nonnull %3, i32 1)
  ...
  ret %jl_value_t addrspace(10)* %21

err:                                              ; preds = %top
  call void @jl_undefined_var_error(%jl_value_t addrspace(12)* addrspacecast (%jl_value_t* inttoptr (i64 140091673833888 to %jl_value_t*) to %jl_value_t addrspace(12)*))
  unreachable
}

Or, using BenchmarkTools:

julia> let
           @noinline child1() = return nothing
           parent1() = child1()
           display(@benchmark $parent1())
       end

BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     0.015 ns (0.00% GC)
  median time:      0.020 ns (0.00% GC)
  mean time:        0.020 ns (0.00% GC)
  maximum time:     0.073 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1000


julia> @testset "foobar" begin
           @noinline child2() = return nothing
           parent2() = child2()
           display(@benchmark $parent2())
       end

BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     13.524 ns (0.00% GC)
  median time:      13.663 ns (0.00% GC)
  mean time:        13.732 ns (0.00% GC)
  maximum time:     28.546 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     998

@JeffBezanson JeffBezanson modified the milestones: 0.5.x, 1.0.x May 21, 2018
@JeffBezanson
Member

Hoping to target compiler performance for 1.0 and 1.0.x.

@JeffBezanson JeffBezanson added compiler:latency Compiler latency and removed help wanted Indicates that a maintainer wants help on an issue or pull request labels Sep 6, 2018
@KristofferC KristofferC removed the performance Must go faster label Oct 23, 2018
@vtjnash
Member

vtjnash commented Apr 18, 2019

I'm not sure what's actionable here. @maleadt's example generates good code now, and the original package (ContMechTensors.jl) doesn't load for me on master.

@KristofferC
Member Author

I'll try it again at some time.

@helgee
Contributor

helgee commented Apr 19, 2019

I retested my example above on the current master and got:

  • 3.7 seconds (without @testset)
  • 5.0 seconds (with @testset)

LGTM 🎉 Thanks y'all!
