Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Hash#put_if_absent #13590

Merged

Conversation

HertzDevil
Copy link
Contributor

@HertzDevil HertzDevil commented Jun 21, 2023

Resolves #8660.

This is faster than hash[key] ||= value for two reasons:

  • If the given key cannot be found, Hash#insert_new will not attempt to match the key against any existing keys at all. (This suggests that find_entry_with_index followed by set_entry or insert_new might be faster than upsert too, which makes #upsert completely unnecessary, but I haven't verified that.)
  • A missing key goes directly to insert_new, whereas ||= means any existing value must be upcast to a nilable value and evaluated for truthiness first.

Instead of #insert_new, perhaps a copy of #upsert that traverses existing keys but doesn't update them could be defined?

Hash-like types such as ENV.class and URI::Params do not have this. For those types it might be better if a generic implementation is offered via #10886.

@kostya
Copy link
Contributor

kostya commented Jun 21, 2023

is not put doing the same?

@HertzDevil
Copy link
Contributor Author

#put always updates the given key like #[]=, only that it returns a different thing

@@ -8,8 +8,7 @@ module Spec
def self.read_line(file, line)
return nil unless File.file?(file)

lines = lines_cache[file] ||= File.read_lines(file)
lines[line - 1]?
lines_cache.put_if_absent(file) { File.read_lines(file) }[line - 1]?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Local variable makes it more readable

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean this?

Suggested change
lines_cache.put_if_absent(file) { File.read_lines(file) }[line - 1]?
lines = lines_cache.put_if_absent(file) { File.read_lines(file) }
lines[line - 1]?

Then I agree =)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yup :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once could argue that the .put_if_absent(file) { File.read_lines(file) } << item constructs are bit overly complicated as well.

@asterite
Copy link
Member

Too many API options. Why not keep it simple? What's a case where the existing way is too slow?

@beta-ziliani
Copy link
Member

A little benchmark shows some non-negligible improvement when keys are absent:

          ||=   1.88M (531.95ns) (± 2.12%)  448B/op   1.22× slower
put_if_absent   2.29M (436.61ns) (± 1.55%)  448B/op        fastest

When keys aren't absent, then it's negligible.

@straight-shoota
Copy link
Member

The performance impact depends on the hash complexity. But calculating the hash only once instead of twice is always a good improvement, even for simple keys.

As #8660 (comment) shows, this is a very common API method for hash map implementations. Maybe not as concise as in Crystal but surely these languages have other ways to implement that with an added overhead of double hash calculation.

Copy link
Member

@straight-shoota straight-shoota left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer the above code suggestion, but I'm happy to merge with or without it.

@straight-shoota straight-shoota added this to the 1.9.0 milestone Jun 26, 2023
@asterite
Copy link
Member

Why is there no equivalent API in Ruby, I wonder?

@straight-shoota
Copy link
Member

straight-shoota commented Jun 27, 2023

I can only speculate on that... Ruby's Hash API generally seems to be a bit bare. I believe methods like #transform_values and #transform_keys are even relatively recent additions (in Ruby's timeline).

Anyway, regardless what Ruby is doing, this method seems to be very reasonable to me. Its optimization effect is basically a no-brainer.
Sure, it adds another tool in the toolcase that you have to know about. But the name is pretty self-explanatory.
I don't think this is bloating the API with unneccessary stuff. It's a useful enhancement.

@HertzDevil
Copy link
Contributor Author

HertzDevil commented Jun 28, 2023

It is indeed proposed to Ruby in relatively recent times, called #fetch_set: https://bugs.ruby-lang.org/issues/17342

@straight-shoota straight-shoota modified the milestones: 1.9.0, 1.10.0 Jul 2, 2023
@straight-shoota straight-shoota merged commit c3bf074 into crystal-lang:master Jul 21, 2023
49 of 50 checks passed
@HertzDevil HertzDevil deleted the feature/hash-put_if_absent branch July 24, 2023 11:50
Blacksmoke16 pushed a commit to Blacksmoke16/crystal that referenced this pull request Dec 11, 2023
This pull request was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Hash#put and Hash#replace
6 participants