Inference improvements #2691

Merged: 25 commits into main, Apr 10, 2021
Conversation

bkamins (Member) commented Mar 30, 2021

Fixes #2516

There is still more work of a similar kind to do, but at least I feel I have a handle on what kinds of things need to be done. In particular, I think that instead of @nospecialize it is better to use Ref{Any}, and to use a sentinel of the target type instead of nothing.
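
A minimal sketch of the Ref{Any} pattern described above (the function names here are hypothetical, not the actual DataFrames internals): the caller wraps the user function in a Ref{Any} so the helper has a single concrete signature, and the helper unwraps it where it is actually needed.

function outer(df, fun)
    # wrap once: the helper below only sees a Base.RefValue{Any} for the
    # function, so it is not recompiled for every distinct user function
    return _helper(df, Ref{Any}(fun))
end

function _helper(df, wfun::Ref{Any})
    fun = only(wfun)   # unwrap; one dynamic dispatch happens at the call below
    return fun(df)
end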

Timings:

This PR:

julia> using DataFrames

julia> df = DataFrame(a = [1, 1, 1, 2, 2, 2], b = [1, 2, 3, 100, 200, 300]);

julia> gd = groupby(df, :b);

julia> transform(gd, (d -> (fb = first(d.b),))); # lacking precompile statements

julia> @time transform(gd, (d -> (fb = first(d.b),)));
  0.075093 seconds (135.93 k allocations: 8.132 MiB, 102.09% compilation time)

julia> @time transform(gd, (d -> (fb = first(d.b),)));
  0.145006 seconds (134.96 k allocations: 8.062 MiB, 42.46% gc time, 99.74% compilation time)

julia> combine(gd, (d -> (fb = first(d.b),))); # lacking precompile statements

julia> @time combine(gd, (d -> (fb = first(d.b),)));
  0.081810 seconds (186.94 k allocations: 11.426 MiB, 99.63% compilation time)

julia> @time combine(gd, (d -> (fb = first(d.b),)));
  0.081787 seconds (186.94 k allocations: 11.426 MiB, 99.64% compilation time)

julia> select(gd, :b => (x -> x)); # lacking precompile statements

julia> @time select(gd, :b => (x -> x));
  0.100346 seconds (184.56 k allocations: 11.188 MiB, 7.14% gc time, 99.55% compilation time)

julia> @time select(gd, :b => (x -> x));
  0.089548 seconds (184.56 k allocations: 11.195 MiB, 99.49% compilation time)

main:

julia> using DataFrames

julia> df = DataFrame(a = [1, 1, 1, 2, 2, 2], b = [1, 2, 3, 100, 200, 300]);

julia> gd = groupby(df, :b);

julia> transform(gd, (d -> (fb = first(d.b),))); # lacking precompile statements

julia> @time transform(gd, (d -> (fb = first(d.b),)));
  0.138058 seconds (452.35 k allocations: 27.775 MiB, 6.92% gc time, 101.14% compilation time)

julia> @time transform(gd, (d -> (fb = first(d.b),)));
  0.139831 seconds (451.38 k allocations: 27.706 MiB, 7.19% gc time, 99.74% compilation time)

julia> combine(gd, (d -> (fb = first(d.b),))); # lacking precompile statements

julia> @time combine(gd, (d -> (fb = first(d.b),)));
  0.168574 seconds (725.69 k allocations: 44.179 MiB, 5.62% gc time, 99.82% compilation time)

julia> @time combine(gd, (d -> (fb = first(d.b),)));
  0.169533 seconds (725.67 k allocations: 44.177 MiB, 5.50% gc time, 99.82% compilation time)

julia> select(gd, :b => (x -> x)); # lacking precompile statements

julia> @time select(gd, :b => (x -> x));
  0.179172 seconds (727.31 k allocations: 44.211 MiB, 5.53% gc time, 99.74% compilation time)

julia> @time select(gd, :b => (x -> x));
  0.180989 seconds (727.33 k allocations: 44.226 MiB, 5.64% gc time, 99.74% compilation time)

0.22.6 release (which shows there is still more work to do, as we significantly increased the complexity of the methods by adding multithreading):

julia> using DataFrames

julia> df = DataFrame(a = [1, 1, 1, 2, 2, 2], b = [1, 2, 3, 100, 200, 300]);

julia> gd = groupby(df, :b);

julia> transform(gd, (d -> (fb = first(d.b),))); # lacking precompile statements

julia> @time transform(gd, (d -> (fb = first(d.b),)));
  0.026336 seconds (28.17 k allocations: 1.457 MiB, 106.20% compilation time)

julia> @time transform(gd, (d -> (fb = first(d.b),)));
  0.029306 seconds (27.20 k allocations: 1.386 MiB, 99.09% compilation time)

julia> combine(gd, (d -> (fb = first(d.b),))); # lacking precompile statements

julia> @time combine(gd, (d -> (fb = first(d.b),)));
  0.066670 seconds (215.26 k allocations: 12.759 MiB, 13.24% gc time, 99.66% compilation time)

julia> @time combine(gd, (d -> (fb = first(d.b),)));
  0.054908 seconds (215.26 k allocations: 12.758 MiB, 99.58% compilation time)

julia> select(gd, :b => (x -> x)); # lacking precompile statements

julia> @time select(gd, :b => (x -> x));
  0.103483 seconds (258.25 k allocations: 15.512 MiB, 99.60% compilation time)

julia> @time select(gd, :b => (x -> x));
  0.103862 seconds (258.25 k allocations: 15.518 MiB, 99.59% compilation time)

bkamins linked an issue on Mar 30, 2021 that may be closed by this pull request
bkamins added this to the 1.0 milestone on Mar 30, 2021
bkamins (Member, Author) commented Mar 30, 2021

@nalimilan - if you are OK with this, I propose to review this PR as is and then merge it. Next, I would make a series of small PRs, each focusing on one area of the code base. Otherwise there is a risk that one big PR will be very hard to review. The point is that these code transformations seem simple, but they can be quite tricky to get right because our code base is complex.

OK?

nalimilan (Member)

Maybe let's wait a bit for Tim to comment at #2597 (comment) in case he has simpler solutions to suggest?

bkamins (Member, Author) commented Mar 30, 2021

OK.

But let me comment on timings. Running test/grouping.jl gives:

this PR

500.453600 seconds (1.17 G allocations: 66.418 GiB, 5.62% gc time) # first run
190.938172 seconds (604.51 M allocations: 33.569 GiB, 6.05% gc time, 0.09% compilation time) # second run

main

625.085487 seconds (1.74 G allocations: 101.231 GiB, 6.68% gc time) # first run
313.050710 seconds (1.03 G allocations: 60.104 GiB, 8.35% gc time, 0.05% compilation time) # second run

(this is before generating new precompile statements adjusted to this PR)

In general this is a much bigger improvement than what @timholy reported in #2563:

Together these changes reduce the time needed to run the "grouping.jl" test file by about 15%, from 283s to 248s.

bkamins (Member, Author) commented Apr 1, 2021

After the additional changes this is the timing of grouping tests on this PR (first and second run in a clean session):

julia> @time include("/home/bkamins/.julia/dev/DataFrames/test/grouping.jl");
442.631665 seconds (1.13 G allocations: 63.886 GiB, 5.60% gc time)
julia> @time include("/home/bkamins/.julia/dev/DataFrames/test/grouping.jl");
175.880084 seconds (574.53 M allocations: 31.706 GiB, 5.66% gc time, 0.10% compilation time)

and this is the timing of the select tests:

this PR

@time include("/home/bkamins/.julia/dev/DataFrames/test/select.jl");
40.685876 seconds (84.69 M allocations: 4.990 GiB, 3.31% gc time, 0.01% compilation time)
@time include("/home/bkamins/.julia/dev/DataFrames/test/select.jl");
17.656625 seconds (18.14 M allocations: 1.080 GiB, 2.39% gc time, 0.00% compilation time)

vs
main

@time include("/home/bkamins/.julia/dev/DataFrames/test/select.jl");
77.544648 seconds (135.29 M allocations: 8.021 GiB, 3.42% gc time)
@time include("/home/bkamins/.julia/dev/DataFrames/test/select.jl");
44.855485 seconds (53.96 M allocations: 3.255 GiB, 3.22% gc time, 0.00% compilation time)

bkamins (Member, Author) commented Apr 1, 2021

I have decided to add some more Ref{Any} and to despecialize more selectively. The consequences (as usual, first and second run):

  1. for grouping tests:
408.044835 seconds (1.09 G allocations: 61.093 GiB, 4.14% gc time)
172.221172 seconds (574.78 M allocations: 31.720 GiB, 6.02% gc time, 0.10% compilation time)

(here we have a uniform improvement)

  2. for select tests:
43.842888 seconds (93.69 M allocations: 5.518 GiB, 3.42% gc time)
14.952440 seconds (19.92 M allocations: 1.184 GiB, 2.36% gc time, 0.00% compilation time)

(here the first run is a bit slower, but subsequent runs are faster because we specialize more)

bkamins (Member, Author) commented Apr 1, 2021

@nalimilan - I would stop making changes at this point. Otherwise it would be days of work.

I have also investigated whether we can improve filter, but there the highest cost is in broadcasting, over which we have no influence.

In the original case (the topmost example) currently we have:

julia> x = @snoopi_deep transform(gd, (d -> (fb = first(d.b),)))
InferenceTimingNode: 0.079086/0.096312 on InferenceFrameInfo for Core.Compiler.Timings.ROOT() with 4 direct children

so we spend ~0.015 seconds on inference, but the majority of this time is spent in _combine_rows_with_first_task!, which we need to specialize.

The problem is that on the 0.22 branch the same call is much faster:

julia> x = @snoopi_deep transform(gd, (d -> (fb = first(d.b),)))
InferenceTimingNode: 0.023545/0.026405 on InferenceFrameInfo for Core.Compiler.Timings.ROOT() with 2 direct children

But the major reason for this is the addition of threading support (note that the major cost is not inference but code generation, which I do not know how to reduce).

So my proposal is:

  1. could you please review this PR, and then I would merge it.
  2. then we finalize the remaining PRs (in particular the last performance-related PR you mentioned you would want to make)
  3. then I hope you can make a PR updating the precompile statements
  4. then we freeze for 1-2 weeks and make a release (I assume all of this could happen in April 2021, so that we have some time before JuliaCon 2021 to fix bugs in patch releases if anything is found)

Thank you!

nalimilan (Member) left a comment

Timings are impressive. It's a shame that we cannot achieve the same with @nospecialize.

Comment on lines 455 to +457
function _add_multicol_res(res::AbstractDataFrame, newdf::DataFrame, df::AbstractDataFrame,
colnames::AbstractVector{Symbol},
allow_resizing_newdf::Ref{Bool}, @nospecialize(fun),
allow_resizing_newdf::Ref{Bool}, wfun::Ref{Any},
nalimilan (Member)

For functions like this one that don't call fun directly but just pass it to other functions, the compiler should only specialize on Function, not on the particular passed function. Are you sure this change makes a difference? (Same for other _add_multicol_res methods.)
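
For reference, a minimal stand-alone illustration of the heuristic referred to here (toy functions with hypothetical names): Julia avoids specializing on a Function-typed argument that is only forwarded to another call, but does specialize where the function is actually called.

# `f` is only forwarded, so this method should be compiled once, for ::Function
forward(f::Function, x) = call_it(f, x)

# `f` is called here, so an instance is compiled per concrete function type
call_it(f::Function, x) = f(x)

forward(sin, 1.0)
forward(cos, 1.0)

# using MethodAnalysis
# methodinstances(forward)   # expected: a single ::Function instance
# methodinstances(call_it)   # expected: one instance per passed function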

bkamins (Member, Author)

the compiler should only specialize on Function, not on the particular passed function.

No, I am not sure - but are you sure of the opposite? I.e. is the behavior you describe documented and guaranteed by the compiler?

Also note the following. This is on the 0.22 branch:

julia> using MethodAnalysis

julia> using DataFrames

julia> combine(DataFrame(), x->(a=1,));

julia> combine(DataFrame(), x->(a=1,));

julia> combine(DataFrame(), x->(a=1,));

julia> methodinstances(DataFrames._add_multicol_res)
11-element Vector{Core.MethodInstance}:
 MethodInstance for _add_multicol_res(::AbstractDataFrame, ::DataFrame, ::DataFrame, ::AbstractVector{Symbol}, ::Base.RefValue{Bool}, ::Type, ::Union{Nothing, Int64, AsTable, AbstractVector{Int64}}, ::Bool, ::Union{Nothing, Type{AsTable}, AbstractVector{Symbol}})
 MethodInstance for _add_multicol_res(::AbstractDataFrame, ::DataFrame, ::DataFrame, ::AbstractVector{Symbol}, ::Base.RefValue{Bool}, ::Function, ::Union{Nothing, Int64, AsTable, AbstractVector{Int64}}, ::Bool, ::Union{Nothing, Type{AsTable}, AbstractVector{Symbol}})
 MethodInstance for _add_multicol_res(::AbstractMatrix{T} where T, ::DataFrame, ::DataFrame, ::AbstractVector{Symbol}, ::Base.RefValue{Bool}, ::Type, ::Union{Nothing, Int64, AsTable, AbstractVector{Int64}}, ::Bool, ::Union{Nothing, Type{AsTable}, AbstractVector{Symbol}})
 MethodInstance for _add_multicol_res(::AbstractMatrix{T} where T, ::DataFrame, ::DataFrame, ::AbstractVector{Symbol}, ::Base.RefValue{Bool}, ::Function, ::Union{Nothing, Int64, AsTable, AbstractVector{Int64}}, ::Bool, ::Union{Nothing, Type{AsTable}, AbstractVector{Symbol}})
 MethodInstance for _add_multicol_res(::NamedTuple{var"#s267", var"#s266"} where {var"#s267", var"#s266"<:Tuple{Vararg{AbstractVector{T} where T, N} where N}}, ::DataFrame, ::DataFrame, ::AbstractVector{Symbol}, ::Base.RefValue{Bool}, ::Type, ::Union{Nothing, Int64, AsTable, AbstractVector{Int64}}, ::Bool, ::Union{Nothing, Type{AsTable}, AbstractVector{Symbol}})
 MethodInstance for _add_multicol_res(::NamedTuple{var"#s267", var"#s266"} where {var"#s267", var"#s266"<:Tuple{Vararg{AbstractVector{T} where T, N} where N}}, ::DataFrame, ::DataFrame, ::AbstractVector{Symbol}, ::Base.RefValue{Bool}, ::Function, ::Union{Nothing, Int64, AsTable, AbstractVector{Int64}}, ::Bool, ::Union{Nothing, Type{AsTable}, AbstractVector{Symbol}})
 MethodInstance for _add_multicol_res(::NamedTuple{(:a,), Tuple{Int64}}, ::DataFrame, ::DataFrame, ::Vector{Symbol}, ::Base.RefValue{Bool}, ::Any, ::Nothing, ::Bool, ::Nothing)
 MethodInstance for _add_multicol_res(::NamedTuple, ::DataFrame, ::DataFrame, ::AbstractVector{Symbol}, ::Base.RefValue{Bool}, ::Type, ::Union{Nothing, Int64, AsTable, AbstractVector{Int64}}, ::Bool, ::Union{Nothing, Type{AsTable}, AbstractVector{Symbol}})
 MethodInstance for _add_multicol_res(::NamedTuple, ::DataFrame, ::DataFrame, ::AbstractVector{Symbol}, ::Base.RefValue{Bool}, ::Function, ::Union{Nothing, Int64, AsTable, AbstractVector{Int64}}, ::Bool, ::Union{Nothing, Type{AsTable}, AbstractVector{Symbol}})
 MethodInstance for _add_multicol_res(::DataFrameRow, ::DataFrame, ::DataFrame, ::AbstractVector{Symbol}, ::Base.RefValue{Bool}, ::Type, ::Union{Nothing, Int64, AsTable, AbstractVector{Int64}}, ::Bool, ::Union{Nothing, Type{AsTable}, AbstractVector{Symbol}})
 MethodInstance for _add_multicol_res(::DataFrameRow, ::DataFrame, ::DataFrame, ::AbstractVector{Symbol}, ::Base.RefValue{Bool}, ::Function, ::Union{Nothing, Int64, AsTable, AbstractVector{Int64}}, ::Bool, ::Union{Nothing, Type{AsTable}, AbstractVector{Symbol}})

and this is on this PR:

julia> using MethodAnalysis

julia> using DataFrames

julia> combine(DataFrame(), x->(a=1,));

julia> combine(DataFrame(), x->(a=1,));

julia> combine(DataFrame(), x->(a=1,));

julia> methodinstances(DataFrames._add_multicol_res)
6-element Vector{Core.MethodInstance}:
 MethodInstance for _add_multicol_res(::AbstractDataFrame, ::DataFrame, ::DataFrame, ::AbstractVector{Symbol}, ::Base.RefValue{Bool}, ::Base.RefValue{Any}, ::Union{Nothing, Int64, AsTable, AbstractVector{Int64}}, ::Bool, ::Union{Nothing, Type{AsTable}, AbstractVector{Symbol}})
 MethodInstance for _add_multicol_res(::AbstractMatrix{T} where T, ::DataFrame, ::DataFrame, ::AbstractVector{Symbol}, ::Base.RefValue{Bool}, ::Base.RefValue{Any}, ::Union{Nothing, Int64, AsTable, AbstractVector{Int64}}, ::Bool, ::Union{Nothing, Type{AsTable}, AbstractVector{Symbol}})
 MethodInstance for _add_multicol_res(::NamedTuple{var"#s281", var"#s280"} where {var"#s281", var"#s280"<:Tuple{Vararg{AbstractVector{T} where T, N} where N}}, ::DataFrame, ::DataFrame, ::AbstractVector{Symbol}, ::Base.RefValue{Bool}, ::Base.RefValue{Any}, ::Union{Nothing, Int64, AsTable, AbstractVector{Int64}}, ::Bool, ::Union{Nothing, Type{AsTable}, AbstractVector{Symbol}})
 MethodInstance for _add_multicol_res(::NamedTuple, ::DataFrame, ::DataFrame, ::Vector{Symbol}, ::Base.RefValue{Bool}, ::Base.RefValue{Any}, ::Nothing, ::Bool, ::Nothing)
 MethodInstance for _add_multicol_res(::NamedTuple, ::DataFrame, ::DataFrame, ::AbstractVector{Symbol}, ::Base.RefValue{Bool}, ::Base.RefValue{Any}, ::Union{Nothing, Int64, AsTable, AbstractVector{Int64}}, ::Bool, ::Union{Nothing, Type{AsTable}, AbstractVector{Symbol}})
 MethodInstance for _add_multicol_res(::DataFrameRow, ::DataFrame, ::DataFrame, ::AbstractVector{Symbol}, ::Base.RefValue{Bool}, ::Base.RefValue{Any}, ::Union{Nothing, Int64, AsTable, AbstractVector{Int64}}, ::Bool, ::Union{Nothing, Type{AsTable}, AbstractVector{Symbol}})

So, as you can see, we generate half as many method instances for _add_multicol_res. Under Julia 1.6 the method is indeed not specialized for a specific function (but what will the rules be in Julia 1.7, or what were the rules in e.g. Julia 1.2?), yet since we allow Base.Callable it is specialized both for Function and for Type.

nalimilan (Member)

AFAIK that's documented behavior. But yeah, if both Function and Type are passed then we will get two different method instances. I hadn't realized Type was used there in practice.

bkamins (Member, Author)

No - Type is not passed. These method instances get generated even if a Type is never passed; they get generated just because one may be passed.

copycols, keeprows)
end

function _manipulate(df::AbstractDataFrame, @nospecialize(normalized_cs), copycols::Bool, keeprows::Bool)
function _manipulate(df::AbstractDataFrame, normalized_cs::Vector{Any}, copycols::Bool, keeprows::Bool)
nalimilan (Member)

Suggested change
function _manipulate(df::AbstractDataFrame, normalized_cs::Vector{Any}, copycols::Bool, keeprows::Bool)
function _manipulate(df::AbstractDataFrame, normalized_cs::AbstractVector, copycols::Bool, keeprows::Bool)

bkamins (Member, Author)

I prefer Vector{Any} to make sure that in the future we have only one method instance of _manipulate with respect to this argument (so that we do not accidentally introduce compilation latency).
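
A small illustration of the trade-off mentioned above (toy functions with hypothetical names): a concrete Vector{Any} annotation pins the argument to a single method instance, while an AbstractVector annotation lets a new instance be compiled for every concrete vector type that is passed.

_consume_concrete(xs::Vector{Any}) = length(xs)
_consume_abstract(xs::AbstractVector) = length(xs)

_consume_concrete(Any[1, "a"])   # always hits the same method instance
_consume_abstract([1, 2])        # compiles an instance for Vector{Int}
_consume_abstract(["a"])         # compiles another instance, for Vector{String}

# using MethodAnalysis
# methodinstances(_consume_concrete)   # expected: 1
# methodinstances(_consume_abstract)   # expected: 2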

src/abstractdataframe/selection.jl (outdated, resolved review thread)
Comment on lines 39 to 42
conditions = Any[]

# subset allows a transformation specification without a target column name or a column
for (i, a) in enumerate(args)
nalimilan (Member)

Why not use a comprehension?

bkamins (Member, Author)

I wanted to eliminate the _process_subset_pair function, which would get specialized on the types of the arguments passed. However, I can do additional timings if this is a significant compilation overhead (probably not, as it is a short function).

nalimilan (Member)

But you can use a comprehension without _process_subset_pair, right?

bkamins (Member, Author)

But it would be very ugly then, as this whole block:

        if a isa ColumnIndex
            a => Symbol(:x, i)
        elseif a isa Pair{<:Any, <:Base.Callable}
            first(a) => last(a) => Symbol(:x, i)
        else
            throw(ArgumentError("condition specifier $a is not supported by `subset`"))
        end

would have to go into the body of the comprehension (roughly as in the sketch below).
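
For concreteness, a hedged sketch of what such a comprehension could look like, reusing the snippet above together with the args and conditions names from the diff context:

conditions = Any[if a isa ColumnIndex
                     a => Symbol(:x, i)
                 elseif a isa Pair{<:Any, <:Base.Callable}
                     first(a) => last(a) => Symbol(:x, i)
                 else
                     throw(ArgumentError("condition specifier $a is not supported by `subset`"))
                 end
                 for (i, a) in enumerate(args)]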

nalimilan (Member)

Matter of taste I guess. :-)

bkamins (Member, Author)

Changed - please have a look at how it reads now.

src/groupeddataframe/splitapplycombine.jl (resolved review thread)
src/groupeddataframe/splitapplycombine.jl (resolved review thread)
src/abstractdataframe/selection.jl (resolved review thread)
Comment on lines 415 to 420
@nospecialize(fun))
wfun::Ref{Any})
fun = only(wfun)
nalimilan (Member)

It just occurred to me that this syntax works: (fun,)::Ref. That avoids adding one line for each argument and playing with argument names. The Any doesn't add anything AFAICT.

bkamins (Member, Author)

I want to use Ref{Any} to ensure that exactly one method instance is generated (same comment as in several places above). This is mostly for future development.

Regarding (fun,)::Ref{Any}, this is smart indeed. Do you feel it will be more readable than fun = only(wfun)? (I wanted to keep a consistent style, as sometimes we just pass wfun through without unwrapping.) But I am OK with both - I am just wondering which approach is better.
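
For comparison, a small sketch of the two unwrapping styles being discussed (helper names are hypothetical); both receive the function wrapped in a Ref{Any}:

# style 1: explicit unwrap in the body, keeping `wfun` in the signature
function _apply_a(df, wfun::Ref{Any})
    fun = only(wfun)
    return fun(df)
end

# style 2: destructure in the signature, so `fun` keeps its name everywhere
function _apply_b(df, (fun,)::Ref{Any})
    return fun(df)
end

# both are called the same way, e.g. _apply_a(df, Ref{Any}(sum))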

nalimilan (Member)

I find it a bit more readable, especially since it allows keeping fun as the name everywhere.

bkamins (Member, Author)

OK - I will change it then. But when the Ref{Any} is only passed around and not accessed, I will keep wfun to avoid an unwrap-rewrap operation.

Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>
bkamins (Member, Author) commented Apr 2, 2021

@nalimilan - we should think about whether functions like:

  • _combine_rows_with_first
  • _combine_tables_with_first
  • groupreduce!
  • _combine_rows_with_first_task!

could be split into parts, as described in #2563, to reduce their compilation cost. I have looked at them, but I feel that you know this part of the code better.

All these functions are quite long, but we have to specialize them (so Ref{Any} is not something we should use there). Maybe you will see how they could be split to extract the parts that do not depend on the changing arguments? Thank you!

nalimilan (Member)

Maybe we could avoid specializing _combine_rows_with_first! now that it only dispatches work to _combine_rows_with_first_task!. It could even be worth passing a tuple of values and a tuple of names to _combine_rows_with_first_task! rather than a NamedTuple, to avoid specializing on column names (wrap_row would also have to do that in a small @noinline helper function; see the sketch below). I think I intended to do that in the original implementation but I'm not sure what happened in the end.
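
One hedged reading of the suggestion above (a sketch with hypothetical names, not the actual DataFrames internals): keep column names and values as plain tuples in the generic code and only materialize a NamedTuple behind a small @noinline barrier, so callers do not get recompiled for every distinct set of column names.

# only this helper specializes on the concrete names tuple
@noinline function _build_row(names::Tuple{Vararg{Symbol}}, values::Tuple)
    return NamedTuple{names}(values)
end

# callers pass (names, values) around and build the NamedTuple only at the end
row = _build_row((:a, :b), (1, 2.0))   # (a = 1, b = 2.0)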

The same reasoning applies to _combine_tables_with_first!, probably even more so, as the overhead per group is already higher anyway. But first we should probably rework it to use multithreading with a system similar to _combine_rows_with_first!. The gain from multithreading should be more limited in general, since we need to take a lock after processing each group before calling append_rows!, but if the user-provided transformation is costly the gain could be large (e.g. if you fit a model for each group).

For groupreduce!, we could move the core loop to a separate function and prevent specialization of the caller.
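
A generic sketch of that pattern (toy functions with hypothetical names, not the actual groupreduce! code): the outer function does only setup and is not specialized on the reduction operator, while the tight loop lives behind a function barrier that is the only thing compiled per concrete operator.

# outer function: @nospecialize keeps it from being compiled per operator
function reduce_by_group(@nospecialize(op), values::Vector{Float64},
                         groups::Vector{Int}, ngroups::Int)
    out = zeros(Float64, ngroups)   # zero as the neutral element, which fits `+`
    _reduce_core!(out, op, values, groups)   # function barrier
    return out
end

# core loop: compiled once per concrete `op`, so the loop itself stays fast
function _reduce_core!(out, op, values, groups)
    for i in eachindex(values, groups)
        g = groups[i]
        out[g] = op(out[g], values[i])
    end
    return out
end

# reduce_by_group(+, [1.0, 2.0, 3.0], [1, 2, 1], 2)  # -> [4.0, 2.0]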

Comment on lines 354 to 357
function _gen_colnames(@nospecialize(res), newname::Union{AbstractVector{Symbol},
Type{AsTable}, Nothing})
function _gen_colnames(res, newname::Union{AbstractVector{Symbol},
Type{AsTable}, Nothing})
nalimilan (Member)

So here you think it's better to specialize? :-D

bkamins (Member, Author)

The benchmarks show that it is minimally better not to specialize given our test set :) - these things are super strange. However, the difference is almost imperceptible, so I reverted @nospecialize, as in practical workloads not specializing here might be better.

src/groupeddataframe/complextransforms.jl (outdated, resolved review thread)
src/groupeddataframe/complextransforms.jl (outdated, resolved review thread)
src/groupeddataframe/complextransforms.jl (outdated, resolved review thread)
src/groupeddataframe/splitapplycombine.jl (outdated, resolved review thread)
src/groupeddataframe/splitapplycombine.jl (outdated, resolved review thread)
bkamins (Member, Author) commented Apr 3, 2021

Maybe we could avoid specializing _combine_rows_with_first! now that it only dispatches work to _combine_rows_with_first_task!.

But there is still a loop in this function (over the batches), so I was afraid to do this.

It could even be worth passing a tuple of values and a tuple of names to _combine_rows_with_first_task!

This would indeed be a good change. In general, passing NamedTuples around is unfortunately not a good idea.
(The same applies to the next comment.)

For groupreduce!, we could move the core loop to a separate function and prevent specialization of the caller.

Agreed


In summary: I have a feeling that it would be best to merge this PR and then do these actions in a separate PR, or even several PRs. Do you think the same?

bkamins and others added 4 commits April 3, 2021 14:53
Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>
Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>
bkamins (Member, Author) commented Apr 9, 2021

@nalimilan - do you have any more comments on this PR? Of course no rush, but I am asking because I would separate the method splitting into a different PR (I can do it later) and here just concentrate on despecialization and merge this one.

nalimilan (Member)

I've already approved. :-)

bkamins (Member, Author) commented Apr 10, 2021

This is a big change so I wanted to double check. Thank you!

bkamins merged commit 50af82e into main Apr 10, 2021
bkamins deleted the bk/inference_improvements branch April 10, 2021 22:03