make recode! type stable #407

tiemvanderdeure · 2024-11-01T14:22:58Z

This PR improves the type-stability of recode!, providing a huge performance improvement, in particular when recoding normal (not categorical) arrays.

On this MWE

A = (rand('a':'c', 100_000))
@profview recode(A, 'b' => 1, 'a' => 2, 'c' => 3)

Before:

After:

For a ~200 times speed-up

tiemvanderdeure · 2024-11-01T15:42:02Z

Test failures are surely unrelated

tiemvanderdeure · 2024-11-18T10:56:39Z

Bump. This is a big performance improvement at no cost - could anyone take a look?

src/recode.jl

bkamins · 2024-11-19T14:38:00Z

@nalimilan should probably approve it.

I looked at the code and I left one comment where I was not 100% sure that the change is OK. In other places it looks OK.

nalimilan · 2024-11-20T10:19:21Z

src/recode.jl

-        for j in 1:length(opt_pairs)
-            p = opt_pairs[j]
+        for j in 1:length(pairs)
+            p = optimize_pair(pairs[j])


This looks problematic when pairs[j] is an array as it will create a Set for each item. Have you investigated why map(optimize_pair, pairs) isn't inferred correctly? In theory it should, and I think I had carefully checked performance when I wrote that function so I'm surprised it's not the case.

pairs[j] is never an array, it is always a Pair. I just got rid of the map and moved it inside the loop, but it still iterates over the exact same thing, so I don't think that should ever be problematic.

I was also surprised that map(optimize_pair, pairs) wasn't inferred, some change in the compiler must have triggered this. I stumbled into this easy fix, so then I didn't investigate further.

I mean when the first element is an array to that the call hits optimize_pair(pair::Pair{<:AbstractArray}).

The problem is that currently we create a Set once for each pair, and with this change we would create it for each pair for each entry in the input array.

You're right, it's in the nested loop. I don't know how I missed that.

tiemvanderdeure · 2024-11-20T15:17:50Z

Okay I dove into this a little more and was able to solve some problems but not all.

I now moved bound checking and converting arrays to sets into a seperate function which then calls _recode! which has the rest of the code. I also added some more dispatches to recode_in as this was type unstable for scalars even after this change.

This already does a lot, but is still type unstable if the pairs have different types (e.g. both scalars and arrays). I'm not really sure how to solve that.

EDIT: It doesn't seem to be type stable now anyway. I'm running this

using CategoricalArrays
A = rand(1:100, 1000, 1000)
dest = copy(A)
pairs_range = (((i*10-9):(i*10)) => i for i in 1:9)
pairs_scalar = collect(i+90 => i for i in 1:9)
@time recode!(dest, A, pairs_range...);
@time recode!(dest, A, pairs_scalar...);
@time recode!(dest, A, pairs_scalar..., pairs_range...);

and getting

  0.312937 seconds (192.78 k allocations: 10.059 MiB, 72.57% compilation time)
  0.091321 seconds (43.72 k allocations: 2.161 MiB, 73.41% compilation time)
  1.531255 seconds (5.70 M allocations: 92.121 MiB, 6.26% gc time, 11.30% compilation time)

on the first run and

  0.053362 seconds (78 allocations: 4.766 KiB)
  0.392593 seconds (89.91 k allocations: 1.373 MiB)
  0.792653 seconds (5.58 M allocations: 85.128 MiB, 1.24% gc time)

on the second. I for the life of me can't understand why the performance of the second call get worse after compilation.

tiemvanderdeure · 2024-11-20T15:21:35Z

The tests fail because the changes to recode_in mean that recode_in(missing, missing) is now true. I don't know if recode_in is considered internal? It's not an ideal solution, in any case.

nalimilan · 2024-11-20T15:26:33Z

The tests fail because the changes to recode_in mean that recode_in(missing, missing) is now true. I don't know if recode_in is considered internal? It's not an ideal solution, in any case.

Why do you need it to return true rather than false as before?

nalimilan · 2024-11-20T15:31:07Z

Regarding the origin of the issue, what surprises me is that I don't see any type instability using @descend or a direct invocation:

@code_warntype recode!(similar(A, Int), A, nothing, 'b' => 1, 'a' => 2, 'c' => 3)

So it's really hard to understand what's going on.

tiemvanderdeure · 2024-11-20T18:06:23Z

Why do you need it to return true rather than false as before?

Because I tried to collapse this x ≅ p.first || recode_in(x, p.first) into just one function, since we don't always need to do both checks anyway and this was the line that was marked type unstable. But it doesn't seem to work anyway.

So it's really hard to understand what's going on.

I tried the same things and am also pretty clueless at this point. Sometimes it seems to be type stable, and then when I run the same code again it allocates.

nalimilan · 2024-11-21T09:37:01Z

OK. And does the approach you tried at ef10966 work? That sounds like the most acceptable fix.

tiemvanderdeure · 2024-11-22T12:09:52Z

And does the approach you tried at ef10966 work?

It does not, unfortunately.

The only thing I've found that makes this type stable is to pass first.(pairs) and last.(pairs each as their own Tuple. This even works for mixed types. The downside is that this scales poorly with the number of pairs, but I don't know how big of an issue that is in practice?

I've also been playing around with a different implementation that uses more multiple dispatch and no @goto, but I can't quite figure out how to make it work with the categorical arrays.

This already works nicely.

function _myrecode!(dest::AbstractArray{T}, src::AbstractArray, default, pairsfirst::Tuple, pairlast::Tuple) where {T}
    @inbounds for i in eachindex(dest)
        dest[i] = recode_element(T, src[i], default, pairsfirst, T.(pairlast))
    end
end

function recode_element(::Type{T}, x, default, recode_from, recode_to) where T
    j = findfirst(y -> isequal(x, y) || recode_in(x,y), recode_from)
    isnothing(j) ? recode_fallback(T, x, default) : T(recode_to[j])
end

recode_fallback(::Type{T}, x::Missing, default) where T = 
    throw(MissingException("missing value found, but dest does not support them: recode them to a supported value"))
recode_fallback(::Type{T}, x::Missing, default) where T >: Missing = missing
recode_fallback(::Type{T}, x, default) where T = T(default)
function recode_fallback(::Type{T}, x, ::Nothing) where T
    try
        T(x)
    catch err
        isa(err, MethodError) || rethrow(err)
        throw(ArgumentError("cannot `convert` value $(repr(x)) (of type $(typeof(x))) to type of recoded levels ($T). " *
                            "This will happen with recode() when not all original levels are recoded " *
                            "(i.e. some are preserved) and their type is incompatible with that of recoded levels."))
    end
    return T(x)
end

nalimilan · 2024-11-22T13:02:46Z

The only thing I've found that makes this type stable is to pass first.(pairs) and last.(pairs each as their own Tuple. This even works for mixed types. The downside is that this scales poorly with the number of pairs, but I don't know how big of an issue that is in practice?

I'd say it's OK, anyway it's not more costly for the compiler than a single tuple of pairs.

I've also been playing around with a different implementation that uses more multiple dispatch and no @goto, but I can't quite figure out how to make it work with the categorical arrays.

I'm not sure what's the advantage of using multiple dispatch rather than the current code TBH. @goto is fine in simple cases like this one.

tiemvanderdeure · 2024-11-22T13:22:50Z

I'd say it's OK, anyway it's not more costly for the compiler than a single tuple of pairs.

True.

Okay, this is already much better, but still not totally fixed if src is categorical and dest is not.

One thing I found out is that recode_in wasn't type stable with the dispatch on Set if it is defined as y in set. I also added some dispatches for identical types, as we then don't have to loop (the isequal check should always be enough) and it also allocated for some types (such as Char).

using CategoricalArrays
src = rand(1:20, 1000, 1000)
dest = copy(src)
destcat = CategoricalArray{Int}(undef, size(src))
srccat = categorical(src)
pairs = (i => i+1 for i in 1:8)
pairs_array = ([11,12] => 2, [13,14] => 3)

@time recode!(dest, src, nothing, pairs..., pairs_array...);
@time recode!(destcat, src, nothing, pairs..., pairs_array...);
@time recode!(dest, srccat, nothing, pairs..., pairs_array...);

After compilation gives

  0.033817 seconds (34 allocations: 2.266 KiB)

  0.049040 seconds (80 allocations: 8.812 KiB)

  0.340654 seconds (2.95 M allocations: 118.264 MiB, 59.14% gc time)

tiemvanderdeure · 2024-11-22T14:17:44Z

If we want to avoid breaking any tests we can define recode_in(x::T, y::T) where T = x == y instead, even though it adds some unnecessary overhead.

nalimilan · 2024-11-22T14:37:36Z

One thing I found out is that recode_in wasn't type stable with the dispatch on Set if it is defined as y in set.

What do you mean? It always returns a Bool AFAICT.

I also added some dispatches for identical types, as we then don't have to loop (the isequal check should always be enough) and it also allocated for some types (such as Char).

Unfortunately I don't think this is correct in general I think, for example Any[1] in Any[[1]]. Why does the current code allocate for Char?

tiemvanderdeure · 2024-11-22T15:46:11Z

Honestly I'm just going off trial-and-error here, I don't understand the compiler well enough to make sense of this. Some of these results I'm getting seem inconsistent so there might be some heuristics at play that I'm not so familiar with (the compiler giving up when Tuples become bigger than a certain number or something).

There are other things I don't understand like findfirst being much faster than the for loop.

Any[1] in Any[[1]]

This won't be a problem currently because Any[[1]] will be converted to a Set. I can't think of any cases where this would be a problem and the only test that fails is the one that explicitly tests that recode_in(1,1) == true. That said I'm sure there must be a better solution out there, it's just been a surprisingly though nut to crack and this is just the solution I found.

To reproduce the issues that I was running into:

using CategoricalArrays
A = (rand('a':'z', 1000, 1000))
recode(A, nothing, 'e' => 3, ['a', 'b', 'c'] => 1, ['x', 'y'] => 2)
@profview recode(A, nothing, 'e' => 3, ['a', 'b', 'c'] => 1, ['x', 'y'] => 2)

With @inline recode_in(x, set::Set) = x in set and/or without @inline recode_in(::T, ::T) where T = false, this gives a flame graph like this:

With both of those changes it's completely blue.

nalimilan · 2024-11-28T09:50:40Z

Actually after asking on Slack this is a known issue (probably a combination of JuliaLang/julia#32834 and JuliaLang/julia#54390). This is due to to use of varargs, which are inferred but not fully specialized on.

Regarding the issue with recode_in (which is related to JuliaLang/julia#56587), it seems that allocations can be fixed using this pattern:

j = @inline findfirst(y -> isequal(x, y) || recode_in(x, y), recode_from)

Could you try this instead of changing the definition of recode_in?

tiemvanderdeure · 2024-11-28T10:06:42Z

This works. I wonder if this whole PR can be replaced by just that one @inline

tiemvanderdeure · 2024-11-28T10:36:06Z

Okay, it seems that this works, but not for julia 1.0. Do you know how to make it work there? Or maybe it would be okay to require 1.6?

nalimilan · 2024-11-28T10:41:00Z

Yes I think at this point we can require Julia 1.6.

tiemvanderdeure · 2024-11-28T11:01:34Z

This use of @inline requires 1.8, actually. So we need to define a separate function

https://docs.julialang.org/en/v1/base/base/#Base.@inline

tiemvanderdeure · 2024-11-28T11:47:36Z

So I though that this

@inline findfirstrecode(x, recode_from) = findfirst(y -> isequal(x, y) || recode_in(x,y), recode_from)

was going to be the same as

j = @inline findfirst(y -> isequal(x, y) || recode_in(x, y), recode_from)

But apparently the former allocates whereas the latter doesn't. But using @inline as in the latter requires julia 1.8. Would requiring julia 1.8 be okay with you @nalimilan ?

nalimilan · 2024-12-06T13:52:04Z

Sorry for the delay. I'm not too happy about requiring Julia 1.8, but you can use Compat.jl >= 3.37 for that.

make recode! type stable

1db6279

tiemvanderdeure mentioned this pull request Nov 1, 2024

performance improvements FluxML/MLJFlux.jl#281

Merged

bkamins reviewed Nov 19, 2024

View reviewed changes

src/recode.jl Outdated Show resolved Hide resolved

nalimilan reviewed Nov 20, 2024

View reviewed changes

tiemvanderdeure added 2 commits November 20, 2024 15:46

move optimize_pair away from main function

ef10966

improve type stability of recode_in

1688fb2

un-inline recode_in

b412ce4

pass first and last pairs as tuples

0581607

recode_in is always false for identical types

06b9265

tiemvanderdeure added 2 commits November 28, 2024 11:20

revert changes to recode_in, inline findfirst

74123a6

do not splat pairs passed to _recode!

15a09db

require julia 1.6

c419cf5

tiemvanderdeure added 2 commits November 28, 2024 12:02

Revert "require julia 1.6"

8e2db47

define inlined findfirstrecode

2a446da

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

make recode! type stable #407

make recode! type stable #407

tiemvanderdeure commented Nov 1, 2024

tiemvanderdeure commented Nov 1, 2024

tiemvanderdeure commented Nov 18, 2024

bkamins commented Nov 19, 2024

nalimilan Nov 20, 2024

tiemvanderdeure Nov 20, 2024 •

edited

Loading

nalimilan Nov 20, 2024

nalimilan Nov 20, 2024 •

edited

Loading

tiemvanderdeure Nov 20, 2024

tiemvanderdeure commented Nov 20, 2024 •

edited

Loading

tiemvanderdeure commented Nov 20, 2024

nalimilan commented Nov 20, 2024

nalimilan commented Nov 20, 2024

tiemvanderdeure commented Nov 20, 2024

nalimilan commented Nov 21, 2024

tiemvanderdeure commented Nov 22, 2024

nalimilan commented Nov 22, 2024

tiemvanderdeure commented Nov 22, 2024 •

edited

Loading

tiemvanderdeure commented Nov 22, 2024

nalimilan commented Nov 22, 2024

tiemvanderdeure commented Nov 22, 2024

nalimilan commented Nov 28, 2024

tiemvanderdeure commented Nov 28, 2024

tiemvanderdeure commented Nov 28, 2024

nalimilan commented Nov 28, 2024

tiemvanderdeure commented Nov 28, 2024

tiemvanderdeure commented Nov 28, 2024

nalimilan commented Dec 6, 2024

make recode! type stable #407

Are you sure you want to change the base?

make recode! type stable #407

Conversation

tiemvanderdeure commented Nov 1, 2024

tiemvanderdeure commented Nov 1, 2024

tiemvanderdeure commented Nov 18, 2024

bkamins commented Nov 19, 2024

nalimilan Nov 20, 2024

Choose a reason for hiding this comment

tiemvanderdeure Nov 20, 2024 • edited Loading

Choose a reason for hiding this comment

nalimilan Nov 20, 2024

Choose a reason for hiding this comment

nalimilan Nov 20, 2024 • edited Loading

Choose a reason for hiding this comment

tiemvanderdeure Nov 20, 2024

Choose a reason for hiding this comment

tiemvanderdeure commented Nov 20, 2024 • edited Loading

tiemvanderdeure commented Nov 20, 2024

nalimilan commented Nov 20, 2024

nalimilan commented Nov 20, 2024

tiemvanderdeure commented Nov 20, 2024

nalimilan commented Nov 21, 2024

tiemvanderdeure commented Nov 22, 2024

nalimilan commented Nov 22, 2024

tiemvanderdeure commented Nov 22, 2024 • edited Loading

tiemvanderdeure commented Nov 22, 2024

nalimilan commented Nov 22, 2024

tiemvanderdeure commented Nov 22, 2024

nalimilan commented Nov 28, 2024

tiemvanderdeure commented Nov 28, 2024

tiemvanderdeure commented Nov 28, 2024

nalimilan commented Nov 28, 2024

tiemvanderdeure commented Nov 28, 2024

tiemvanderdeure commented Nov 28, 2024

nalimilan commented Dec 6, 2024

tiemvanderdeure Nov 20, 2024 •

edited

Loading

nalimilan Nov 20, 2024 •

edited

Loading

tiemvanderdeure commented Nov 20, 2024 •

edited

Loading

tiemvanderdeure commented Nov 22, 2024 •

edited

Loading