Add modify! function for lookup/update/insert/delete in one go #33758

Open
wants to merge 6 commits into
base: master

tkf (Member) commented Nov 4, 2019

This implements the set! function I proposed in #31367 (comment).

See #31367 for discussion of alternative APIs (e.g., token-based).

Quoting the docstring:

modify!(f, dict::AbstractDict, key)

Lookup and then update, insert or delete in one go without re-computing the hash.

f is a callable object that must accept Union{Some{V}, Nothing} and return Union{T, Some{T}, Nothing}, where T is a type convertible to the value type V. The value Some(dict[key]) is passed to f if haskey(dict, key); otherwise nothing is passed. If f returns nothing, the corresponding entry in the dictionary dict is removed. If f returns a non-nothing value x, something(x) is inserted into dict.

modify! returns whatever f returns as-is.

Examples

julia> dict = Dict("a" => 1);

julia> modify!(dict, "a") do val
           Some(val === nothing ? 1 : something(val) + 1)
       end
Some(2)

julia> dict
Dict{String,Int64} with 1 entry:
  "a" => 2

julia> dict = Dict();

julia> modify!(dict, "a") do val
           Some(val === nothing ? 1 : something(val) + 1)
       end
Some(1)

julia> dict
Dict{Any,Any} with 1 entry:
  "a" => 1

julia> modify!(_ -> nothing, dict, "a")

julia> dict
Dict{Any,Any} with 0 entries
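
The semantics described in the docstring can be approximated with existing Base functions. The following is only a minimal sketch, not the PR's implementation: the name `modify_sketch!` is hypothetical, and unlike the proposed `modify!` it performs several hash lookups instead of one.

```julia
# Hypothetical generic fallback; `modify_sketch!` is not part of the PR.
function modify_sketch!(f, dict::AbstractDict{K,V}, key) where {K,V}
    # Wrap an existing value in `Some` so that a stored `nothing` stays
    # distinguishable from "key is absent".
    old = haskey(dict, key) ? Some{V}(dict[key]) : nothing
    new = f(old)
    if new === nothing
        # `f` asked for deletion; deleting an absent key is a no-op.
        delete!(dict, key)
    else
        # `something` unwraps `Some(x)` and passes other values through.
        dict[key] = something(new)
    end
    return new
end

counts = Dict("a" => 1)
increment(old) = Some(old === nothing ? 1 : something(old) + 1)

modify_sketch!(increment, counts, "a")      # updates the existing entry
modify_sketch!(increment, counts, "b")      # inserts a new entry
modify_sketch!(_ -> nothing, counts, "a")   # removes the entry
```

The point of the PR is to get the same behavior from a single slot lookup, which this sketch does not attempt.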

else
if idx > 0
h.age += 1
@inbounds h.keys[idx] = key
Member Author

I copied this line h.keys[idx] = key from get!. But why is this required? (I'm new to Dict code.)

Member

What’s going on here is that WeakKeyDict uses Dict internally and attaches a finalizer to the keys that mutates the Dict. Unfortunately this usage has leaked into the implementation of Dict.

Any time the GC runs (e.g., on allocation) it may call finalizers, which might mutate the dictionary. To protect itself, Dict checks age whenever an allocation might occur (this depends on the dictionary key and element types, etc.).

This works with single-threaded concurrency, but I have no idea whether the implementation is valid under multithreading.

(@vtjnash did I get those details right?)

Member Author

Thanks! So, is it effectively GC.@preserve key? And is h.keys[idx] = key used because of a bootstrapping issue or something? Or maybe it's just cheaper this way?

Member

Sorry, I didn't actually address your original question. (I was thinking of the h.age += 1 line... and actually, now that I look at it, I'm not sure whether that is necessary here, since I don't see where the GC might run after ht_keyindex2! or how changing h.vals[idx] should affect the operation of WeakKeyDict, but I would defer to Jameson on that.)

So the ht_keyindex2! function prepares a slot where a key (and value) might reside but doesn't actually populate them with anything. A positive token means the slot already exists, so I think you just need to populate the new value (delete this line).

In the idx < 0 case the _setindex! function populates h.slots[-idx], h.keys[-idx] and h.vals[-idx].
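
To make the sign convention concrete, here is a toy open-addressing table. It is only an illustration under simplifying assumptions (integer keys, linear probing, no resizing or deletion) and is not how Base.Dict is implemented; `ToyTable`, `toy_keyindex` and `toy_set!` are hypothetical names.

```julia
# Toy table, NOT Base.Dict. All names here are hypothetical.
mutable struct ToyTable
    used::Vector{Bool}
    keys::Vector{Int}
    vals::Vector{Int}
end
ToyTable(n::Int) = ToyTable(fill(false, n), zeros(Int, n), zeros(Int, n))

# Returns +i if `key` already lives in slot i,
# or -i if slot i is free and is where `key` would be placed.
function toy_keyindex(t::ToyTable, key::Int)
    n = length(t.used)
    i = mod(key, n) + 1
    for _ in 1:n
        t.used[i] || return -i
        t.keys[i] == key && return i
        i = i == n ? 1 : i + 1   # linear probing with wrap-around
    end
    error("table full")
end

function toy_set!(t::ToyTable, val::Int, key::Int)
    idx = toy_keyindex(t, key)
    if idx > 0
        t.vals[idx] = val        # slot exists: only the value changes
    else
        t.used[-idx] = true      # new slot: populate flag, key and value
        t.keys[-idx] = key
        t.vals[-idx] = val
    end
    return t
end

t = ToyTable(8)
toy_set!(t, 10, 3)    # inserts into a free slot
toy_set!(t, 20, 11)   # collides with key 3 and probes to the next slot
toy_set!(t, 30, 3)    # overwrites the value of the existing key
```

In this toy model the positive branch never touches the stored key, which is the simplification suggested in the comment above.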

Member Author

FYI, the existing get! method I mentioned was this:

julia/base/dict.jl

Lines 446 to 464 in 6eebbbe

function get!(default::Callable, h::Dict{K,V}, key::K) where V where K
    index = ht_keyindex2!(h, key)
    index > 0 && return h.vals[index]
    age0 = h.age
    v = convert(V, default())
    if h.age != age0
        index = ht_keyindex2!(h, key)
    end
    if index > 0
        h.age += 1
        @inbounds h.keys[index] = key
        @inbounds h.vals[index] = v
    else
        @inbounds _setindex!(h, v, key, -index)
    end
    return v
end

h.keys[index] = key was added in ff4706b (#9595)

Member

Hmm interesting. Still not sure what it protects against... it would be ideal to hear from Jeff or Jameson, but leaving it in doesn’t hurt (except maybe performance slightly).

@@ -465,6 +465,63 @@ function hash(a::AbstractDict, h::UInt)
hash(hv, h)
end

"""
modify!(f, d::AbstractDict{K, V}, key)
Member

modify! sounds a bit like a weird name to me. update! sounds better.

Member Author

FYI, update! and set! were suggested to be simpler shorthands of what I call modify! in this PR. I don't mind renaming to something else, though.

Member

@clarkevans clarkevans Jul 13, 2020

Coming from SQL nomenclature, I think modify! is superior to update! since this operation can also mean removing an entry... not just updating an entry's value. Perhaps set! is the best choice.

Member Author

Re-visiting this naming and reading @clarkevans's comment, I actually think modify! is still the best name so far. I think it's better at conveying the point that this API could be used for updating, inserting, or removing an entry. In particular, update! (and, to a lesser extent, set!) sounds a bit weak at reminding the reader of the code about this. So, my preference is modify! > set! >> update!.

@nalimilan What do you think?

Member

Both "modify" and "update" refer to the dict rather than the value AFAICT, so they both cover the case where you remove the value.

set! could be interesting due to the parallel with get!. Not sure whether it's a good parallel or not...

Member

@clarkevans clarkevans Jul 25, 2020

I prefer set! over modify! aesthetically, and modify! over update!. What is the resistance to calling it set!? It's a very interesting function. Given that it'll most likely be used in a do val .... end block, a few more characters don't matter that much; perhaps, as andyferris suggests, set_or_delete!. Anyway, in my mind, set! given a function returning nothing seems perfectly compatible with removing a key.

Member Author

@clarkevans I think @nalimilan made a good point that set! sounds like a counterpart of get. However, I don't think it's a good parallel. If you consider set! as a counterpart of get(::Container, ::Key) ::Union{Nothing,Some{Value}}, the signature of set! should be set!(::Container, ::Union{Nothing,Some{Value}}, ::Key). This is how getindex and setindex! work.

It is conceivable to have this set! API since it's very useful when you want to "rollback" a container:

original = get(dict, key)  # works even if `key` does not exist
dict[key] = tmp
try
    ... do something with dict ...
finally
    # remove the `key` if it didn't exist; or insert the `original` value if it did.
    set!(dict, original, key)
end

So, I prefer using another verb for the API for this PR.
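
For illustration, the rollback pattern above can be written today with existing Base functions, at the cost of extra lookups. This is only a sketch: `get_some`, `set_sketch!` and `with_temporary` are hypothetical helpers standing in for the two-argument get and the set! discussed here, neither of which exists in Base.

```julia
# Hypothetical helpers; none of these are Base functions.
get_some(d::AbstractDict, key) = haskey(d, key) ? Some(d[key]) : nothing

function set_sketch!(d::AbstractDict, value::Union{Nothing,Some}, key)
    if value === nothing
        delete!(d, key)            # restore "key absent"
    else
        d[key] = something(value)  # restore the saved value
    end
    return d
end

# Run `body(d)` with `d[key] = tmp`, then roll the slot back.
function with_temporary(body, d::AbstractDict, tmp, key)
    original = get_some(d, key)    # works even if `key` does not exist
    d[key] = tmp
    try
        return body(d)
    finally
        set_sketch!(d, original, key)
    end
end

d = Dict("a" => 1)
seen_b = with_temporary(x -> x["b"], d, 5, "b")   # "b" is removed afterwards
seen_a = with_temporary(x -> x["a"], d, 7, "a")   # "a" is reset afterwards
```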

Both "modify" and "update" refer to the dict rather than the value AFAICT

@nalimilan But doesn't get refer to a "slot" rather than the dict? So, isn't it strange that the mutation API refers to the whole dict rather than a slot? Also, other mutation verbs like setindex! and setproperty! seem to refer to a slot.

Member

@clarkevans I think @nalimilan made a good point that set! sounds like a counterpart of get. However, I don't think it's a good parallel. If you consider set! as a counterpart of get(::Container, ::Key) ::Union{Nothing,Some{Value}}, the signature of set! should be set!(::Container, ::Union{Nothing,Some{Value}}, ::Key). This is how getindex and setindex! work.

Well, currently we have get(f::Function, collection, key), whose signature is the same as that of modify!(f, d, key) in this PR. If we added get(collection, key)::Union{Nothing,Some}, it could make sense to add set!(collection, value::Union{Nothing,Some}, key) too.

@nalimilan But doesn't get refer to a "slot" rather than the dict? So, isn't it strange that the mutation API refers to the whole dict rather than a slot? Also, other mutation verbs like setindex! and setproperty! seem to refer to a slot.

I find it hard to tell TBH. It's hard to argue about these things, and if we wanted a fully consistent naming scheme maybe get should be called getkey...

Member

From a readability standpoint, if I naively read set!, I'd probably expect the key to exist after the operation, just like with get!. I'd feel surprised if an operation named set! deleted something, to be honest!

it could make sense to add

I suppose a simpler modify!(d, key, v::Union{Nothing, Some{Value}}) could be nice. I note that in other places in Base we do seem to be struggling with whether dispatching on ::Function is desirable, and these arguments seem to be getting widened to ::Any, partly because some callable things are not Function or Callable (like Python objects). (I was also sensing that there are some more general arguments that we should be able to know the semantics of a method from the function name and number of arguments only.)

But doesn't get refer to a "slot" rather than the dict?

I agree - I have always read these verbs as referring to the "slot" on the container.

Member

How about crud!? It does create, update, delete... and retrieve. There is also merge!, aka upsert! from SQL. I think crud! is a new kind of word that is fun to say and perhaps reflects these hybrid semantics?

Co-Authored-By: Milan Bouchet-Valat <nalimilan@club.fr>

quinnj (Member) commented Jul 19, 2020

Bump; I run into wanting this functionality all the time.

Quick question though: the docs mention being able to return T from f; does that just mean you don't have to wrap the return value in Some if you don't want? Or if you're not worried about conflicting with nothing?

Also, looks like this PR needs a quick rebase w/ exports.jl

tkf (Member Author) commented Jul 19, 2020

I've created and been using the mutate-or-widen version modify!!, so I kind of forgot to keep pinging about this.

I think the only blocker is the naming although it'd be nice if someone who knows the interaction between Dict and WeakKeyDict well can have a look at the implementation (see above discussion with @andyferris).

the docs mention being able to return T from f; does that just mean you don't have to wrap the return value in Some if you don't want? Or if you're not worried about conflicting with nothing?

Yeah, I thought about requiring Some{T} but I wasn't sure about people's preference. Sometimes you know the result won't be nothing (e.g., writing literal return 1). What do you think? Is it better to have a stricter API that requires Union{Some{T},Nothing}? Or is it fine as-is and the problem is more in the documentation?

@tkf tkf requested review from vtjnash and JeffBezanson July 19, 2020 23:53
tkf (Member Author) commented Jul 19, 2020

@vtjnash @JeffBezanson It'd be great if you guys can have a look at it, especially the interaction with WeakKeyDict.

@andyferris (Member)

I think the only blocker is the naming

I also think update! is a poor choice for something that might delete an element because of prior meanings (SQL etc). Not that modify is much better in an English-language sense (but at least it doesn't come with the baggage that words like update and mutate have). The other direction to go is to be more explicit - set_or_delete! or whatever.

although it'd be nice if someone who knows the interaction between Dict and WeakKeyDict well can have a look at the implementation

@vtjnash @JeffBezanson To clarify my earlier point, I didn't understand why the key would be written in the case index > 0; and if it need not be, then since we are only changing a value in an existing slot, is it actually required to increment age?

@andyferris (Member)

I still think having some functionality like this would be great. I just wanted to note there's a good discussion (FAQ) at the end of a JavaScript proposal for adding emplace to JS Map: https://github.com/tc39/proposal-upsert.

Compared with this PR, that API doesn't include deletion.

Compared with #12157, it uses a closure-passing style rather than a reference-returning style.

@LilithHafner (Member)

Bump: I'm messing around with Base.ht_keyindex2 and I'd really rather be using update!.

julia> modify!(dict, "a") do val
           Some(val === nothing ? 1 : something(val) + 1)
       end
Some(2)
Member

I wonder if we should make this result Some(1) => Some(2), so you get back both old and new, instead of just the new value (similar to replacefield!).

Member Author

This is now implemented in 28900ad.

tkf and others added 3 commits November 12, 2021 16:13
Co-authored-by: Jameson Nash <vtjnash@gmail.com>
Co-authored-by: Joaquim Dias Garcia <joaquimdgarcia@gmail.com>
dictionary `d` is removed. If `f` returns non-`nothing` value `x`, `key => something(x)`
is inserted or updated in `d` (equivalent to `d[key] = something(x)` but more efficient).

Whether `Some{V}(value)` or `Some{typeof(value)}(value)` is returned is an implementation
Member

It's not clear what value is referred to here, as no single value is "returned". If it's about old, then better to name it explicitly, e.g.

Suggested change
Whether `Some{V}(value)` or `Some{typeof(value)}(value)` is returned is an implementation
Whether `old` has type `Some{V}(value)` or `Some{typeof(value)}(value)` is an implementation

Whether `Some{V}(value)` or `Some{typeof(value)}(value)` is returned is an implementation
defined behavior. The callback function `f` must use `old === nothing` or `old isa Some`
instead of `old isa Some{valtype(d)}` unless the type of the dictionary `d` is known to
define a certain behavior.
Member

It seems like a very rare event to know about a certain specific behavior of a dict type. I would just remove "unless ...", and change "must use" to

The callback function f should generally use old === nothing or old isa Some instead ...

(just a suggestion)

Member

I prefer requiring the dict to return Some{V}.

@nalimilan (Member)

the docs mention being able to return T from f; does that just mean you don't have to wrap the return value in Some if you don't want? Or if you're not worried about conflicting with nothing?

Yeah, I thought about requiring Some{T} but I wasn't sure about people's preference. Sometimes you know the result won't be nothing (e.g., writing literal return 1). What do you think? Is it better to have a stricter API that requires Union{Some{T},Nothing}? Or is it fine as-is and the problem is more in the documentation?

Maybe it's better to be strict for now and require returning either nothing or a Some{T} value? Then, if experience shows that it would be convenient to be able to return a T value, we can easily become more flexible. If we're too flexible now, I'm afraid we might regret it (forcing people to handle the case where T >: Nothing is good for safety).
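
The T >: Nothing case can be illustrated with Base's `something` and `Some` alone. This is a small sketch of why the wrapper matters, not code from the PR; the helper `wants_delete` is a hypothetical name for the rule stated in the docstring.

```julia
# A value type that admits `nothing`, so "store nothing" and "delete"
# must be distinguishable in the callback's return value.
d = Dict{String,Union{Int,Nothing}}("a" => 1)

# `something` unwraps one layer of `Some` and returns other
# non-nothing values unchanged.
plain   = something(1)              # a bare value passes through
wrapped = something(Some(1))        # one `Some` layer is removed
stored  = something(Some(nothing))  # yields `nothing` as a value to store

# Under the docstring's rule, a callback returning `nothing` requests
# deletion, while `Some(nothing)` requests storing `nothing`.
wants_delete(ret) = ret === nothing

# Whether a dictionary can hold `nothing` at all can be checked up front.
can_hold_nothing = nothing isa valtype(d)
```

With a relaxed API, a callback that returns a bare value is only safe when that value can never be `nothing`, which is the situation the strict variant rules out.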

@LilithHafner (Member)

Perhaps require Some only when nothing isa eltype(values(dict))?

Comment on lines +580 to +586
if haskey(dict, key)
    old = Some{V}(dict[key])
    val = f(old)
else
    old = nothing
    val = f(old)
end
Member

Does the compiler do this union splitting automatically?

Comment on lines +409 to +411
@inbounds vold = h.vals[idx]
vold = Some{V}(vold)
vnew = f(vold)
Member

In the abstract dict case we use old and val; here we use vold and vnew; and in the documentation we use old and new. We should probably be consistent, and I prefer old and new.

@inbounds _setindex!(h, something(vnew), key, -idx)
end
end
return vold => vnew
Contributor

@matthias314 matthias314 Jan 8, 2023

While playing around with the code, I noticed that the last line above allocates unless I define modify! with @inline. EDIT: Even with @inline it allocates if one uses the return value.

I also noticed that the code runs a bit slower on master compared to 1.8.2 and 1.9.0-beta2 (9.0ns vs 7.7ns, always using @inline and discarding the return value).

Code used
using BenchmarkTools  # provides @btime

# negate an existing entry
function neg!(h, key)
    x, y = modify!(h, key) do sx
        sx === nothing ? nothing : -something(sx)
    end
    # allocation:
    # return x == y
    # no allocation:
    return nothing
end

h = Dict('a' => 4, 'b' => 3)

@btime neg!($h, 'b')

Dict{Any,Any} with 1 entry:
"a" => 1

julia> modify!(_ -> nothing, dict, "a")
Contributor

Suggested change
julia> modify!(_ -> nothing, dict, "a")
julia> modify!(Returns(nothing), dict, "a")

I think that Returns is now the preferred way. Or would this make the example too complicated?

Comment on lines +539 to +540
otherwise `nothing` is passed. If `f` returns `nothing`, corresponding entry in the
dictionary `d` is removed. If `f` returns non-`nothing` value `x`, `key => something(x)`
Contributor

Suggested change
otherwise `nothing` is passed. If `f` returns `nothing`, corresponding entry in the
dictionary `d` is removed. If `f` returns non-`nothing` value `x`, `key => something(x)`
otherwise `nothing` is passed. If `f` returns `nothing`, the corresponding entry in the
dictionary `d` is removed. If `f` returns a non-`nothing` value `x`, `key => something(x)`

@matthias314 (Contributor)

With this PR the method mergewith!(combine, d1::AbstractDict, d2::AbstractDict) in abstractdict.jl could be replaced by

function mergewith!(combine, d1::AbstractDict{K,V}, d2::AbstractDict) where {K,V}
    for (k, v2) in d2
        modify!(d1, k) do v1
            v1 === nothing ? Some{V}(v2) : Some{V}(combine(something(v1), v2))
        end
    end
    d1
end

and the dedicated method in dict.jl could be deleted. This would be cleaner and (in my tests) equally fast.

@LilithHafner (Member)

@tkf, are you still interested in pushing this through? Is there anything you need to make this happen?

@oscardssmith (Member)

I think this probably has bitrotted enough that we should just remake the PR.

stevengj (Member) commented Apr 25, 2023

An alternate API I suggested in #31199 (comment) doesn't support deletion, but simply extends get! with an additional optional argument, which has the advantage of being discoverable as another method of get! and not forcing us to bikeshed yet another name.

@matthias314 (Contributor)

I'm not sure if modify! is the best name, but I would like to keep the possibility of deleting an entry. The application I have in mind is dictionaries with default values. If the values are numeric (with default value zero), then I would like to have an efficient implementation of h[key] += val that deletes the entry if h[key] + val is zero. Using the syntax h[key] += val would of course require changes to the parser, but the essence is a documented method in Base to hook into.

BTW, I think I have solved the problem with the spurious allocations that I mentioned in a previous post: If one changes modify! to return a Tuple instead of a Pair, then the allocations disappear. Apparently, Julia is better at union-splitting tuples than pairs.

stevengj (Member) commented Apr 25, 2023

We could still do deletions with get!(f, collection, key, default) if f(collection[key]) returns nothing, I suppose. But perhaps modify! starts to be a better name in this case. I dunno.

@matthias314 (Contributor)

I don't know a use case where the old value matters, and I wouldn't mind using nothing as a deletion request. But maybe other people see it differently. On the other hand, if one changes the return type of modify! to Tuple, then there doesn't seem to be a performance penalty for returning the old value. For instance,

function setindex!(h::Dict, v, key)
    modify!(Returns(Some(v)), h, key)
    h
end

would be as fast as the current implementation, and so would be the mergewith! code that I posted earlier.

Regarding the name, another option would be something like update!, inspired by UpdateIndex.jl. I don't have a favorite.

matthias314 (Contributor) commented Apr 26, 2023

There have been several proposals already, but after thinking about it for some time, I would like to add another one. My apologies if it's not new. When I speak of the current modify!, I include the change of return type from Pair to Tuple I mentioned earlier.

The problem

With the current version of modify!(f, h, key), the function f cannot return a value and request deletion of the key at the same time. Let me explain why I think this matters. As before, I'm thinking of dictionaries with default values, in particular dictionaries with numeric values and default value zero. I would like to have an efficient implementation for updating values as in h[key] += val. With the current modify!, one would say:

function addindex!(h::Dict{K,V}, key, w) where {K,V}
    _, vnew = modify!(h, key) do v
        if v === nothing
            iszero(w) ? nothing : w
        else
            u = something(v)+w
            iszero(u) ? nothing : u
        end
    end
    vnew === nothing ? zero(V) : vnew
end

I see two shortcomings of this approach:

  • If h[key] + val is zero, then the inner function returns nothing. Consequently, addindex! receives nothing for vnew and therefore has to recreate the zero value. For integers this is not a problem, but for instance BigInt(0) allocates. If the zero value h[key] + val were passed to the outer function, then this allocation would not be necessary.
  • Ideally, I would like the statement h[key] += val to be parsed in a way that automatically calls addindex!. This is not part of this PR, but I think we should leave the door open for such a change. Now in Julia, the return type of h[key] += val is that of h[key] + val. However, if this value is zero, then addindex! always returns zero(V), no matter what the type of val is.

My proposal

I want to propose the following for modify!(f, h, key): The function f is called as before, but it returns a tuple (action, vnew). (I'm ignoring the old value, but it could be added.) Here vnew is again the new value. In cases where there may be no new value, it would be either nothing or Some(newval). The value of action can be either "set", "delete" or "ignore" and determines how modify! changes the dictionary: it sets h[key] to vnew, deletes the key or doesn't change h at all.

For efficiency one could define three new types DictSet, DictDelete and DictIgnore with common supertype DictAction, which would be the required type for action. This way the compiler might be able to optimize the code if f never returns one or two of the possible values. (If one day we have compile-time information about the return type, we could even avoid calling ht_keyindex2_shorthash! if f never returns DictSet.)

I've played around with the code. It seems that the new proposed_modify! function is about as fast as (sometimes even slightly faster than) the current modify!.

Code
using BenchmarkTools

import Base: _delete!, _setindex!, ht_keyindex2_shorthash!

function modify!(f, h::Dict{K, V}, key0) where {K, V}
    if key0 isa K
        key = key0
    else
        key = convert(K, key0)::K
        if !isequal(key, key0)
            throw(ArgumentError("$(limitrepr(key0)) is not a valid key for type $K"))
        end
    end

    index, sh = ht_keyindex2_shorthash!(h, key)

    age0 = h.age
    
    vold = index > 0 ? Some(@inbounds h.vals[index]) : nothing
    vnew = f(vold)
    
    if h.age != age0
        index, sh = ht_keyindex2_shorthash!(h, key)
    end

    if vnew === nothing
        if index > 0
            _delete!(h, index)
        end
    elseif index > 0
        h.age += 1
        @inbounds h.keys[index] = key
        @inbounds h.vals[index] = something(vnew)
    else
        @inbounds _setindex!(h, something(vnew), key, -index, sh)
    end
    
    return (vold, vnew)
end

abstract type DictAction end
struct DictSet <: DictAction end
struct DictDelete <: DictAction end
struct DictIgnore <: DictAction end

function proposed_modify!(f, h::Dict{K, V}, key0) where {K, V}
    if key0 isa K
        key = key0
    else
        key = convert(K, key0)::K
        if !isequal(key, key0)
            throw(ArgumentError("$(limitrepr(key0)) is not a valid key for type $K"))
        end
    end

    index, sh = ht_keyindex2_shorthash!(h, key)

    age0 = h.age
    
    vold = index > 0 ? Some(@inbounds h.vals[index]) : nothing
    action::DictAction, vnew = f(vold)

    if h.age != age0
        index, sh = ht_keyindex2_shorthash!(h, key)
    end

    if action isa DictSet
        if index > 0
            h.age += 1
            @inbounds h.keys[index] = key
            @inbounds h.vals[index] = vnew
        else
            @inbounds _setindex!(h, vnew, key, -index, sh)
        end
    elseif action isa DictDelete
        if index > 0
            _delete!(h, index)
        end
    end
    
    return vnew
end

# addindex! example

# current Julia

function current_addindex!(h::Dict, key, w)
    u = haskey(h, key) ? h[key] + w : w
    if iszero(u)
        delete!(h, key)
    else
        h[key] = u
    end
    u
end

# current modify!

function modify_addindex!(h::Dict{K,V}, key, w) where {K,V}
    _, vnew = modify!(h, key) do v
        if v === nothing
            iszero(w) ? nothing : w
        else
            u = something(v)+w
            iszero(u) ? nothing : u
        end
    end
    vnew === nothing ? zero(V) : vnew
end

# new proposal

function proposed_addindex!(h::Dict, key, w)
    proposed_modify!(h, key) do v
        if v === nothing
            iszero(w) ? DictDelete() : DictSet(), w
            # DictDelete() is faster than DictIgnore()
        else
            u = something(v)+w
            iszero(u) ? DictDelete() : DictSet(), u
        end
    end
end

# test

K = Int
T = Int
h = Dict{K,T}(k => k for k in 1:100000)

println("non-existing key, non-zero value")
key = K(-1)
w = T(5)
# note that @btime evaluates several times in a row
# after the first time the key exists!
@btime current_addindex!(hh, $key, $w) setup = (hh = copy($h))
@btime modify_addindex!(hh, $key, $w) setup = (hh = copy($h))
@btime proposed_addindex!(hh, $key, $w) setup = (hh = copy($h))

println("non-existing key, zero value")
key = K(-1)
w = T(0)
@btime current_addindex!(hh, $key, $w) setup = (hh = copy($h))
@btime modify_addindex!(hh, $key, $w) setup = (hh = copy($h))
@btime proposed_addindex!(hh, $key, $w) setup = (hh = copy($h))

println("existing key, non-zero value")
key = K(55555)
w = T(10)
@btime current_addindex!(hh, $key, $w) setup = (hh = copy($h))
@btime modify_addindex!(hh, $key, $w) setup = (hh = copy($h))
@btime proposed_addindex!(hh, $key, $w) setup = (hh = copy($h))

Any thoughts?

@martinholters (Member)

Instead of a tuple, f could return DictSet(vnew), DictDelete(), or DictIgnore(), i.e. DictSet could carry the new value.
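
A rough sketch of that variant follows, written against generic AbstractDict operations rather than Dict internals, so it does not have the single-lookup property. `modify_action!` and `apply_action!` are hypothetical names, and the types deliberately reuse the names from the proposal above with DictSet now carrying a field, so this should be tried in a fresh session.

```julia
abstract type DictAction end
struct DictSet{T} <: DictAction
    value::T                       # the new value travels with the action
end
struct DictDelete <: DictAction end
struct DictIgnore <: DictAction end

# Dispatch on the action type instead of branching on a tuple element.
apply_action!(d, key, a::DictSet) = (d[key] = a.value; d)
apply_action!(d, key, ::DictDelete) = (delete!(d, key); d)
apply_action!(d, key, ::DictIgnore) = d

# Hypothetical driver; uses haskey/getindex and therefore hashes more than once.
function modify_action!(f, d::AbstractDict, key)
    old = haskey(d, key) ? Some(d[key]) : nothing
    action = f(old)::DictAction
    apply_action!(d, key, action)
    return action
end

h = Dict(:a => 1)
modify_action!(v -> DictSet(something(v) + 1), h, :a)   # sets h[:a]
set_result = h[:a]
modify_action!(_ -> DictIgnore(), h, :b)                # leaves h unchanged
modify_action!(_ -> DictDelete(), h, :a)                # removes :a
```

Because the value lives inside DictSet, a callback can no longer return a value together with DictDelete; whether that matters for the addindex! use case discussed above is left open here.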
