Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Creating binary variables is slow #1255

Closed
mlubin opened this issue Nov 12, 2018 · 8 comments · Fixed by #1256
Closed

Creating binary variables is slow #1255

mlubin opened this issue Nov 12, 2018 · 8 comments · Fixed by #1256

Comments

@mlubin
Copy link
Member

mlubin commented Nov 12, 2018

using JuMP
using BenchmarkTools

function with_bin(N)
    model = Model()
    x = @variable(model, [1:N], Bin)
    return x
end

function without_bin(N)
    model = Model()
    x = @variable(model, [1:N])
    return x
end

@btime with_bin(1_000_000)
@btime without_bin(1_000_000)

JuMP master, Julia 1.0.1:

  930.520 ms (6000340 allocations: 950.56 MiB)
  184.530 ms (2000252 allocations: 87.93 MiB)

JuMP 0.18, Julia 1.0.1:

  60.050 ms (1000168 allocations: 79.79 MiB)
  102.381 ms (1000168 allocations: 79.79 MiB)

(don't ask me to explain why binary variables are created faster than nonbinary variables on 0.18, it's very weird).

@mlubin
Copy link
Member Author

mlubin commented Nov 13, 2018

With the recent fixes, the numbers on 0.19 are now:

  262.428 ms (4000287 allocations: 161.12 MiB)
  59.681 ms (2000199 allocations: 53.43 MiB)

That's a huge improvement but there's still an underlying issue that creating SingleVariable constraints is way more expensive than it should be:

function with_int_and_bounds(N)
    model = Model()
    x = @variable(model, [1:N], Int, lower_bound=-1, upper_bound=1)
    return x
end
@btime with_int_and_bounds(1_000_000)

gives

742.001 ms (10000427 allocations: 396.01 MiB)

So creating an integer variable with bounds is more than 10x more expensive than creating a plain variable.

@blegat
Copy link
Member

blegat commented Nov 13, 2018

The time seems to be primarily taken by these dictionaries
https://github.com/JuliaOpt/JuMP.jl/blob/a0af39df82fce30cce0821f4f08ec7b13d710b01/src/JuMP.jl#L155-L159
Indeed, while it takes around 20 ns to execute haskey or getindex for a dictionary almost empty, it quickly tends to 150 ns when it has more entries.

julia> d = Dict{Int, Int}()
Dict{Int64,Int64} with 0 entries

julia> using BenchmarkTools

julia> @btime rand(Int) # ~ 3 ns
  3.044 ns (0 allocations: 0 bytes)
7542753156883198371

julia> @btime haskey(d, rand(Int)); # fast because dict is empty
  21.092 ns (1 allocation: 16 bytes)

julia> d[rand(Int)] = rand(Int);

julia> @btime haskey(d, rand(Int)); # still fast
  21.053 ns (1 allocation: 16 bytes)

julia> @btime d[rand(Int)] = rand(Int); # dict get filled up
  118.021 ns (2 allocations: 32 bytes)

julia> length(d) # ~ 10M entries
10410503

julia> @btime haskey(d, rand(Int)); # ~ 130 ns
  137.340 ns (1 allocation: 16 bytes)

julia> @btime d[rand(Int)] = rand(Int) # even more entries
  169.626 ns (2 allocations: 32 bytes)

julia> @btime haskey(d, rand(Int)); # ~ 140 ns
  142.391 ns (1 allocation: 16 bytes)

julia> @btime d[rand(Int)] = rand(Int); # even more !
  156.648 ns (2 allocations: 32 bytes)

julia> @btime haskey(d, rand(Int)); # let's say it converges to ~ 150 ns
  144.789 ns (1 allocation: 16 bytes)

We could avoid calling haskey when creating a variable because we know that it would always return true but it turns out that there is some optimization done in Base.Dict so that calling haskey before setting it is fast:

julia> @btime (i = rand(Int); haskey(d, i); d[i] = i); # If the same key is used, still ~150 ns
  172.634 ns (3 allocations: 48 bytes)

julia> @btime (haskey(d, rand(Int)); d[rand(Int)] = rand(Int)); # different key -> ~300 ns
  301.921 ns (3 allocations: 48 bytes)

It seems a bit unfortunate that we have to use dictionaries while we know that the variable indices are 1, 2, 3, ... with the MOIU Model. Even if some variables are deleted, it seems to be more efficient to store it as a vector (with some useless entries if variables are deleted) than a dictionary. The issue is that in Direct mode, the model could be something that numbers its variables differently.
So we have a few choices here:

  1. Stick to dictionaries, this overhead is acceptable
  2. Forces all model to number its variables 1, 2, ... or at least use small indices (MockOptimizer currently violates this)
  3. Use Union{Vector{MOILB}, Dict{MOIVar, MOILB}} instead of Dict{MOIVar, MOILB}. In the constructor, we need to decide whether we use a vector of a dictionary. I see two choices:
    • Use Dict only with direct_model. This is a bit unfortunate since direct_model is targeted to the users that are looking for more efficiency.
    • Add an MOI attribute like MOI.SmallVariableIndices that returns a Bool. I would expected all model to returns true except the MockOptimizer.

For solution 2) and 3) we could create an optimized container like

struct VariableIndicesDict{T}
    values::Vector{T}
end
function Base.setindex!(d::VariableIndicesDict, val, vi::VariableIndex)
    idx = vi.value
    if idx > length(values)
        resize!(values, idx)
    end
    values[idx] = val
end

It would behave just like a dictionary so in JuMP, we won't have to modify any code that use the dictionaries.
This VariableIndicesDict container could also be used by IndexMap (https://github.com/JuliaOpt/MathOptInterface.jl/blob/b42f03cf5584da8fce247ee089df9841726b42fe/src/Utilities/copy.jl#L67) and for storing variable names in the MOIU model (https://github.com/JuliaOpt/MathOptInterface.jl/blob/b42f03cf5584da8fce247ee089df9841726b42fe/src/Utilities/model.jl#L566)

What do you think ?

@mlubin
Copy link
Member Author

mlubin commented Nov 14, 2018

Removing the 0.19 milestone. We have a plan in #570 that requires a breaking MOI change.

@blegat
Copy link
Member

blegat commented Nov 18, 2018

I think none of the solutions I suggested in https://github.com/JuliaOpt/JuMP.jl/issues/1607#issuecomment-438219901 are acceptable.
We should stick to using Dict because of its small memory footprint (see #570 for more details).
I think we should do a change in JuMP similar to #535, #568.
That is, as most users won't need them, we just need to create them only the first time the user queries lower_bound_index, upper_bound_index, ...
This means we will have

@variable(model, x >= 0) # fast, dictionary fields are `nothing`
lbref = JuMP.lower_bound_index(x) # creates dictionary field `variable_to_lower_bound`
@variable(model, y <= 0) # fast, `variable_to_upper_bound` is `nothing`
@variable(model, z >= 0) # slow, `variable_to_lower_bound` is not `nothing`

A user that would like to access the SingleVariable constraints without the time overhead of dictionaries can replace

@variable(model, x >= 0.0)
lbref = JuMP.lower_bound_index(x)

by

@variable(model, x)
lbref = @constraint(model, x in MOI.GreaterThan(0.0))

@mlubin
Copy link
Member Author

mlubin commented Nov 18, 2018

Users expect that getting and setting bounds should be fast O(1) operations. Your suggestion leads to a very weird performance model where things are fast until after your run a seemingly harmless and unrelated operation.

For name lookup and variable deletion I think it's less surprising that these operations have setup costs.

@mlubin
Copy link
Member Author

mlubin commented Aug 20, 2019

As of jump-dev/JuMP.jl#2032:

using JuMP
using BenchmarkTools

function plain(N)
    model = Model()
    x = @variable(model, [1:N])
    return x
end
@btime plain(1_000_000)

function with_int_and_bounds(N)
    model = Model()
    x = @variable(model, [1:N], Int, lower_bound=-1, upper_bound=1)
    return x
end
@btime with_int_and_bounds(1_000_000)

yields

  100.329 ms (2000220 allocations: 126.46 MiB)
  291.198 ms (4000220 allocations: 154.98 MiB)

It's now 3x slower to create integer variable with bounds than plain variables (compared with 10x earlier). There's still room for improvement and unexplained allocations, so I'll leave this issue open.

@odow
Copy link
Member

odow commented Mar 2, 2021

I'm not sure what there is to do on the JuMP-side anymore. The 3x appears to come from adding bounds to the caching optimizer

using JuMP
using BenchmarkTools

function plain(N)
    model = MOIU.CachingOptimizer(
        MOIU.UniversalFallback(MOIU.Model{Float64}()),
        MOIU.AUTOMATIC,
    )
    x = MOI.add_variables(model, N)
    return x
end

function with_bounds(N)
    model = MOIU.CachingOptimizer(
        MOIU.UniversalFallback(MOIU.Model{Float64}()),
        MOIU.AUTOMATIC,
    )
    x = MOI.add_variables(model, N)
    for xi in x
        MOI.add_constraint(model, MOI.SingleVariable(xi), MOI.GreaterThan(0.0))
    end
    return x
end

@btime plain(1_000_000);
@btime with_bounds(1_000_000);

julia> @btime plain(1_000_000);
  36.684 ms (227 allocations: 27.65 MiB)

julia> @btime with_bounds(1_000_000);
  121.941 ms (1000227 allocations: 42.91 MiB)

Transferring this to MOI.

@odow odow transferred this issue from jump-dev/JuMP.jl Mar 2, 2021
@odow
Copy link
Member

odow commented Mar 2, 2021

Interestingly, the allocations are all for the caching optimizer (probably the try-catch?), but the time is spent in Model.

julia> function plain(N)
           model = MOIU.UniversalFallback(MOIU.Model{Float64}())
           x = MOI.add_variables(model, N)
           return x
       end
plain (generic function with 1 method)

julia> function with_bounds(N)
           model = MOIU.UniversalFallback(MOIU.Model{Float64}())
           x = MOI.add_variables(model, N)
           for xi in x
               MOI.add_constraint(model, MOI.SingleVariable(xi), MOI.GreaterThan(0.0))
           end
           return x
       end
with_bounds (generic function with 1 method)

julia> @btime plain(1_000_000);
  36.479 ms (200 allocations: 27.65 MiB)

julia> @btime with_bounds(1_000_000);
  94.173 ms (200 allocations: 27.65 MiB)

julia> function plain(N)
           model = MOIU.Model{Float64}()
           x = MOI.add_variables(model, N)
           return x
       end
plain (generic function with 1 method)

julia> function with_bounds(N)
           model = MOIU.Model{Float64}()
           x = MOI.add_variables(model, N)
           for xi in x
               MOI.add_constraint(model, MOI.SingleVariable(xi), MOI.GreaterThan(0.0))
           end
           return x
       end
with_bounds (generic function with 1 method)

julia> @btime plain(1_000_000);
  37.242 ms (175 allocations: 27.64 MiB)

julia> @btime with_bounds(1_000_000);
  95.231 ms (175 allocations: 27.64 MiB)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging a pull request may close this issue.

3 participants