julep/rfc: proposal for unified Dict get/insert/has interface #12157
Comments
Is this the one you briefly mentioned on Saturday night during JuliaCon?
Yep. And yes please (again)! (btw s/in/haskey/ in your first example)
Also would this still have a benefit over the
yes. i figured i should finally try to see what it would look like
It is more general since you can use the hash table before the insertion. Like

    if isempty(bp)
        println("Inserting into ", d)
        bp[] = x
    end

We could also still have the lambda version for compactness I guess.
(well in this example I guess you could capture
Good idea! I think my main concern would be allocating an extra object for each dictionary operation. It could get quite expensive. Also unfortunately the code for the common idiom is not shorter.
Was just going to say that.
I see. (not sure if it is good to bring up but I guess it should be as useful as a C++ reference...)
Related: #2108
I'll point out that @StephenVavasis implemented a (He even started to use
We might need to be careful about asynchronous updates to the hash table causing it to be resized, especially if we ever get threading.
I think in the example above There might be one additional race possibility but IMHO it shouldn't need more care than the dict itself.
Yes, Jameson's proposal handles the single-thread rehash case by graceful performance degradation. Multi-threaded modification of the same data structure will likely be defined as a race condition => undefined behavior. We'll have to implement a family of concurrent data structures. (the Java std lib has state-of-the-art implementations of those things, probably good inspiration).
Since Kevin pinged me on this topic, I guess I can add my two cents about tokens and Ref.
(1) The primary purpose of tokens in my code is to provide a mechanism for stepping through a SortedDict (or SortedSet or SortedMultiSet) in the sort-order of the keys. Obviously, this is not an issue for Dict.
(2) A second purpose of a token is to provide a handle to an entry that can be dereferenced in O(1) time. Again, this is not so much an issue for Dict because the hash-lookup is already supposed to be O(1).
(3) Using a Ref object to point to a spot in the structure where an item would go if it would be inserted later seems like a reasonable idea; as mentioned earlier, there is a performance hit because the Ref object is heap-allocated. However, I am thinking of changing my balanced-tree implementation so that the most recent look-up is cached because often the next lookup is nearby. Couldn't Dict do the same with caching?
(4) Going off on a tangent, I have an issue with the equivalence between haskey(d,k) and in(k, keys(d)) ... see my recent posting here: JuliaCollections/DataStructures.jl#103
(5) Finally, with regard to the problem of the awkward if-block versus the need to evaluate 'initial_value' in a get! function, wouldn't it be OK to use a macro and to ask anyone who writes an Associative type to support a macro version of get! that solves this problem?
I don't really like the syntax much. I've proposed adding a syntax for
I have an alternate proposed solution to this which I (unwittingly) opened in a separate ticket #18282. TLDR: What about building a function that handles the "update this value in a dictionary" case, as follows:

    update!(f::Function, a::Associative, key, default) = (a[key] = f(get(a, key, default)))

(except that it would be specialized to avoid recomputation of the hash/slot.) This would allow code like:

    update!(x -> x+1, my_counters, key, 0) # where 0 is the default value for a counter, or
    update!(x -> x+value, my_totals, key, 0.0) # where value is accumulated
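For readers on current Julia, the one-liner above can be tried roughly as follows. This is only a sketch under assumptions: `update!` is the hypothetical function proposed in this comment (it is not part of Base), and `AbstractDict` is used because that is the present-day name of the `Associative` type discussed in this thread. It still performs two lookups, so it shows the interface, not the specialized single-lookup version.

```julia
# Hypothetical helper following the one-line definition proposed above.
# `AbstractDict` is the current name for what this thread calls `Associative`.
function update!(f, d::AbstractDict, key, default)
    # Two lookups happen here (get, then setindex!); a specialized
    # implementation could reuse the slot found by the first one.
    d[key] = f(get(d, key, default))
end

# Count occurrences, starting each counter from the default 0.
counters = Dict{String,Int}()
for word in ["a", "b", "a"]
    update!(x -> x + 1, counters, word, 0)
end
```

Because the function comes first, the same call can also be written with a `do` block: `update!(counters, "a", 0) do x; x + 1; end`.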
Sounds like #15630
@KristofferC Pardon my ignorance. Do LinearSlow AbstractArrays and Associatives share a common interface / behavior pattern? EDIT: I think I understand a bit better now from seeing UpdateIndex.jl
It is similar since you want to "hold on" to the result of the lookup that has to be made when indexing.
@KristofferC Thanks for sharing these links. FWIW, I agree that Julia would benefit from a way to directly overload the various ?= operators. (Python has this capability, and numpy leverages it heavily. Why would Julia want to skip over this functionality?) However, I suppose that discussion belongs on #15630. I think the dictionary case differs slightly because, unlike arrays, dictionaries are not guaranteed to have values over any particular range of keys, and there is no default value that can be assumed for missing keys, at least not in the general case. In your proposal:

    updateindex!(A, op, s, i...)

Could these be combined into a single function,

    updateindex!(f, A, i...)
    update!(f, dict, key, default)

Not sure how to deal with the fact that arrays need multiple indices and dicts have just one. Can the array indices be grouped into a tuple, or will this create overhead?
In general, creating the new entry (i.e. what is called update!(op, mkdefault, coll, ids...) where both
@eschnett That's a good point.
Current proposal would look like this:

    update!(init_counter, counters, key) do x
        x + 1
    end

What about moving the default function to the end?

    update!(counters, key, init_counter) do x
        x + 1
    end

Edit: I like the way
Edit 2: Could we support both cases where the default is an object or a function? i.e.

    update!(f::Function, coll::Associative, key, default)
    update!(f::Function, coll::Associative, key, default::Function)

...or is that an abuse of overloading?
For a dict holding functions as values, that might give surprising/undesired behavior.
@martinholters Thank you for speaking up, I was looking for someone who might disagree. Does that use case compel us to move the default function to another argument position? (arguably less clear and less consistent?) Of course, if someone was keeping a dict of functions, they could still write:

    update!(mydict, key, () -> my_default_value_which_is_a_function) do x
        decorate_function(x)
    end
Hi Kristoffer, are there any follow-up improvements on this? I am hitting a performance barrier now. The bottleneck is counting the number of occurrences of various patterns in a "string". I used a Dict{pattern_type,Int} to do the job, and I wish to make it faster. What is the state-of-the-art technique for that? Otherwise I'll just patch my code with this one.
problem statement:
a common idiom when working with Associative objects is to do a set of has / insert / lookup operations:
this syntax has the advantage of only using basic operations, for clarity, but it's also 3x slower than necessary since the dict lookup gets repeated three times. that concern has led to the introduction of a get! method that accepts either an object or function, to insert a value if the lookup fails, so that the above code can be rewritten as:
or
however, this implementation is still not necessarily faster, since it involves precomputing initial_value (fine if it's just a constant, but bad if it is expensive to compute or creates unnecessary garbage) or invoking a lambda.
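The three variants discussed in the problem statement can be sketched as below. This is an illustration written for this passage, not the code from the original post: `make_default` and the other names are hypothetical, and only `haskey`, indexing, and the two `get!` methods from Base are used.

```julia
# `make_default` is a hypothetical stand-in for an expensive initializer.
make_default() = Int[]

d = Dict{String,Vector{Int}}()
key = "k"

# Basic-operations idiom: up to three hash lookups
# (haskey, setindex!, getindex).
if !haskey(d, key)
    d[key] = make_default()
end
v1 = d[key]

# get! with a value: the default is always computed,
# even when the key is already present.
v2 = get!(d, key, make_default())

# get! with a function: the default is computed only on a miss,
# at the cost of calling a closure.
v3 = get!(d, key) do
    make_default()
end
```

All three return the same stored vector here, because the first block has already inserted the key.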
the code to support this one code pattern isn't exactly short, requires a fair amount of code duplication, and even a macro definition is provided (https://github.com/JuliaLang/julia/blob/master/base/dict.jl#L632-L690)
proposal
define the Ref operation on a Dict to return an intelligent indexer object. the above code pattern could then be written as:
implementation sketch
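A minimal sketch of how such an indexer could behave, under stated assumptions: `DictRef` is a hypothetical wrapper type standing in for whatever `Ref(d, key)` would return, and its three methods are modelled on the `isempty(bp)` / `bp[] = x` usage that appears in the comments. It simply stores the dict and the key, so every operation repeats the hash lookup; the point of the proposal is that a real implementation would remember the slot instead.

```julia
# Hypothetical type: stands in for the object `Ref(d, key)` would return.
# It does NOT cache the hash slot, so it shows the interface only.
struct DictRef{D<:AbstractDict,K}
    dict::D
    key::K
end

# True when the key has no entry yet.
Base.isempty(r::DictRef) = !haskey(r.dict, r.key)
# `r[]` reads the stored value (throws KeyError if the key is absent).
Base.getindex(r::DictRef) = r.dict[r.key]
# `r[] = v` inserts or overwrites the value for the key.
Base.setindex!(r::DictRef, v) = (r.dict[r.key] = v; r)

d = Dict{String,Int}()
bp = DictRef(d, "x")
if isempty(bp)
    bp[] = 1      # insert on a miss
end
bp[] += 1         # read-modify-write through the same handle
```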
ref #12035 (comment)