Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hash#put and Hash#replace #8660

Closed
straight-shoota opened this issue Jan 7, 2020 · 14 comments · Fixed by #13590
Closed

Hash#put and Hash#replace #8660

straight-shoota opened this issue Jan 7, 2020 · 14 comments · Fixed by #13590

Comments

@straight-shoota
Copy link
Member

straight-shoota commented Jan 7, 2020

#8116 added Hash#put which sets the value, returns either the previous value (if exists) or the block result (invoked with the key).

This behaviour seems counter-intuitive as expressed in #8116 (comment) f.

I'm proposing:

  • Rename #put to #replace keeping the same semantics.
  • Add an overload to #replace without a block which returns nil if the value doesn't exit.
  • Add a new #put method which sets the value only if it does not exits.

Maybe there could also be an overload to #put which accepts a block. The block would be called when the value exists and receive the value. This would be an efficient method for the use case when a value needs to be either added or the existing value updated. For example when a hash collects multiple values per key in an array:

# currently:
hash[key] ||= [] of Foo
hash[key] << value

# more efficient: (thanks @bcardiff)
array = (hash[key] ||= [] of Foo)
array << value

# with #put:
hash.put(key, [value]) { |array| array << value }
@KarthikMAM
Copy link
Contributor

KarthikMAM commented Jan 9, 2020

@straight-shoota Mind if I take a stab at this (if the feature is finalized) and is there any ETA on this feature.

@straight-shoota
Copy link
Member Author

I'd wait for some more approval, maybe there'll be suggestions to change the proposed API.

@RX14
Copy link
Contributor

RX14 commented Jan 9, 2020

I'm fine with this change.

@bcardiff
Copy link
Member

bcardiff commented Jan 9, 2020

I am not convinced

  • Rename #put to #replace keeping the same semantics.

But the current semantics of #put allow us to set the value, even if the key wasn't set. Having a #replace with the current semantic is confusing to me

  • Add a new #put method which sets the value only if it does not exits.

That sounds to me more like an explicit replacement (if there is something to replace, do it).

Finally, the use case in the OC seems more like a vivification (Ref: #4376 (comment))

Or it can be coded in a way that there is no double mention of the key per initialize & append:

hash = {} of Int32 => Array(String)
 
array = (hash[0] ||= [] of String)
array << "A"
 
array2 = (hash[0] ||= [] of String)
array2 << "B"
 
hash # => {0 => ["A", "B"]}

@straight-shoota
Copy link
Member Author

Yeah, maybe the names are not ideal. But the current semantics for #put are not great either. The block behaviour seems unexpected.

Agreed, the example for what currently works could also be written more efficiently, I've added your suggestion to the OC.
But this is still fundamentally different from the behaviour my propsal provides where the block is only executed when a value exists. There's no way to distinguish this only with conditional assignment.
When adding an item to an array there is not much difference between initialize & append vs. direct initialization. Other use cases might have a more relevant performance impact when update is more costly than initialization.
I'm not sure anymore whether it's worth optimizing for that, though. When the overall cost for these operations is high, it might just be preferable to just do separate hash lookups. As a bonus this also allows skipping the potentially unnecessary initialization.

@asterite
Copy link
Member

I don't understand this snippet in the original comment:

# with #put:
hash.put(key, value) { |array| array << value }

That doesn't compile in my head, value is an element or is it an array? It can't be both in those two locations.

I think put is fine. The docs are clear. I don't understand what's the problem. And what you can't do right now with Hash... maybe a method is missing, but put does not need to change because of that.

@straight-shoota
Copy link
Member Author

@asterite Ooops, sry I forgot the brackets, it's supposed to be hash.put(key, [value]) { |array| array << value }. Fixed that in the OP.

The documentation for #put is clear, that's true. But I think it is not intuitive because it combines two different mechanics: While the main purpose is to put a value in the hash, the block argument actually has nothing to do with that. Instead it behaves exactly as the block argument to #fetch.
That seems very confusing to me. Judging by its behaviour, the purpose of this method is more shifted towards fetching a value with a default provided by the block (exactly as #fetch) and in the process also setting a new value.
To illustrate: This is a non-optimized implementation of #put:

def put(key : K, value : V)
  return_value = fetch(key) { yield }
  self[key] = value
  return_value
end

I think my main issue with the method is the mixture of concerns does not fit with the name.

@RX14
Copy link
Contributor

RX14 commented Jan 17, 2020

I'd like something like the below, but obviously only looking up the entry once, to be the building block for Hash lookups, deletions, and inserts.

def lookup(key : K, & : V?, Bool -> V | Hash::Delete)
  if exists?(key)
    new_value = yield self[key], true
  else
    new_value = yield nil, false
  end

  if new_value.is_a? Delete
    delete(key)
  else
    self[key] = new_value
  end
end

This method is basically a sanitized, public, interface to performing an arbitrary option on the Hash::Entry. The second parameter is only needed to distinguish between nil and not present, and could likely be ignored in many external invocations. Hash::Delete is a sigil, like Iterator::Stop.

Everything else can be composed efficiently on top of this primitive:

def [](key)
  lookup(key) do |value|
    return value
  end
end

def []=(key, value)
  lookup(key) { value }
end

def exists?(key)
  lookup(key) do |val, present|
    return present
  end
end

def delete(key)
  lookup(key) { Hash.delete }
end

def put(key, value)
  lookup(key) do |old_value, present|
    if present
      return_value = old_value
    else
      return_value = yield
    end

    value
  end
  return_value
end

This would allow Set to have efficient semantics without put. I actually consider put semantics to be a bit convoluted for the reasons @straight-shoota put above.

@jhass
Copy link
Member

jhass commented Jan 18, 2020

I can agree to put's semantics being a bit odd. But either it has to go completly or keep its alias for []= semantic in all cases, that is always setting the given value to the given key. This is what a method named put on a map does in every single language I've seen that features it. Doing anything else is handing a foot gun to our users.

@HertzDevil
Copy link
Contributor

Array#replace replaces all values in-place, as does Hash#replace on Ruby. It would be weird if ours does a different thing.

@glhrmv
Copy link

glhrmv commented Dec 26, 2021

As a recent user of the language, I was looking for a method that behaves similar to #=, something like #insert(key, value). When i found #put, I was confused for reasons mentioned already in this thread. I think #put without any block argument (i.e. do nothing if value missing) would be useful.

@straight-shoota
Copy link
Member Author

We agree that the initial suggestions form the OP are not good: the semantics of #put are established and we must not change them. #replace is also reserved for different semantics.

I still think we're missing some additional behaviour for specific insertion conditions.

As a very generic example, a[i] ||= b is concise but inefficient. It expands to a[i] = a[i]? || b which performs the lookup twice. It would be great if it could be done in one. This would require a method that looks up a key and assigns the value unless it already exists. Getting that is actually the primary goal of this issue.
We'd have to find a good name for this.

@HertzDevil
Copy link
Contributor

HertzDevil commented Jun 20, 2023

@bcardiff
Copy link
Member

I think put_if_absent is the more clear one. put_new is concise, but it only works if you already know that put does the opposite when the element is missing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants