Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add sub/superset checking methods to Hash #7500

Merged

Conversation

Sija
Copy link
Contributor

@Sija Sija commented Mar 3, 2019

Adds sub/superset checking methods to the Hash class (matching the ones in Set):

  • Hash#subset_of?(other : Hash)
  • Hash#proper_subset_of?(other : Hash)
  • Hash#superset_of?(other : Hash)
  • Hash#proper_superset_of?(other : Hash)

spec/std/hash_spec.cr Outdated Show resolved Hide resolved
@Sija
Copy link
Contributor Author

Sija commented Mar 6, 2019

Is it GTG / anything else to do here? :shipit:

src/hash.cr Outdated Show resolved Hide resolved
src/hash.cr Outdated Show resolved Hide resolved
src/hash.cr Outdated Show resolved Hide resolved
spec/std/hash_spec.cr Show resolved Hide resolved
@Sija Sija force-pushed the feature/hash-sub-superset-operators branch 2 times, most recently from aa4beb4 to de59f20 Compare March 6, 2019 20:46
@straight-shoota
Copy link
Member

I have some doubts about this change. These methods (<,<=,>,>=) are usually used as comparison operators. Is it wise to overload them with a completely different meaning?
In math, set relationships are expressed with ⊂ and ⊃, whereas < and > are comparison operators.

Set relationship methods are also relevant for other collection types besides Hash. Maybe even every Enumerable? I could very well picture this for Array, but the thing is that that's also a Comparable. And we can't use the comparison operators simultaneously for both.

@Sija
Copy link
Contributor Author

Sija commented Apr 1, 2019

I have some doubts about this change. These methods (<,<=,>,>=) are usually used as comparison operators. Is it wise to overload them with a completely different meaning?
In math, set relationships are expressed with ⊂ and ⊃, whereas < and > are comparison operators.

Why are you doubting? ;) These operators are usually overloaded based on the context, thus I see nuffin' wrong in using them according to the most common scenario - they're being overloaded for Class to check for ancestry for instance (same as in Ruby btw). And I doubt we'd like to use / as method names, we ain't writing math here. Having those available in specs is also a boon I'd say.

Set relationship methods are also relevant for other collection types besides Hash. Maybe even every Enumerable? I could very well picture this for Array, but the thing is that that's also a Comparable. And we can't use the comparison operators simultaneously for both.

They're most needed IMO in places where you'd expect overlapping set of values, but only for types holding distinct value sets, like Set or Hash.

@straight-shoota
Copy link
Member

I don't think the usefulness of {sub,super}set is limited to types which exclude duplicate values by design. You might as well use an Array to hold distinct values, which is totally fine if the algorithm ensures there won't be duplicates. That would likely be much more performant than using a Set.

@Sija
Copy link
Contributor Author

Sija commented Apr 7, 2019

You might as well use an Array to hold distinct values, which is totally fine if the algorithm ensures there won't be duplicates. That would likely be much more performant than using a Set.

I don't get it, Array is not designed to hold unique values, Set OTOH is, with this exact use-case in mind.

@straight-shoota
Copy link
Member

Set ensures by its design that there can't be duplicate values. Array doesn't have this inherent property, but it can nevertheless be used to store distinct values. The uniqueness guarantee can be provided by how it's used. A very simple example for this would be arr << elem unless array.index?(elem).
Using an Array for this can be preferable over Set for example for performance or other characteristics such as sortability.

@Sija
Copy link
Contributor Author

Sija commented Apr 8, 2019

Array doesn't have this inherent property, but it can nevertheless be used to store distinct values. The uniqueness guarantee can be provided by how it's used. A very simple example for this would be arr << elem unless array.index?(elem).

Yeah, but why would you do that given that there's Set?

Using an Array for this can be preferable over Set for example for performance or other characteristics such as sortability.

I'd say it's pretty weird to use Array over Set for the exact use-case Set is designed for. If performance is sub-par it should be improved if anything. Sortability for the time being is the same, since Set uses Hash under-the-hood - which is ordered.

@straight-shoota
Copy link
Member

A Set is per definition unordered and shouldn't be treated as ordered. Anyway, it's always insertion ordered. An array can be sorted.

But I don't think this discussion about unrelated details leads us much further.

IMHO we simply shouldn't apply different semantics to comparison operators on different collection types.
Types like Set or Hash (or think of a potential OrderedSet for example) could as well be comparable and provide sub/superset operations at the same time.

@straight-shoota
Copy link
Member

We should just assign predicate names #superset?, #proper_superset?, #subset?, #proper_subset? which are already used by Set.

@Sija
Copy link
Contributor Author

Sija commented Nov 27, 2019

@straight-shoota I'd like to have ability to use it in specs with .should be > {"foo" => "bar"} notation, which would be impossible if we'd choose to make 'em a regular methods.

@straight-shoota
Copy link
Member

That's a valid point. But I wouldn't want to risk conflicts in API methods just for improving spec usage.

Anyway, I've been thinking about improving the expectations API to allow making a matcher from arbitrary methods, not just comparators. This would be helpful for many other use cases as well. For example Set{1, 2, 3}.should &.superset?(Set{1, 2}) could work. But it's really tricky and I'm not sure if there can even be an easy-to-use generic solution.

@asterite
Copy link
Member

It seems this is actually pretty easy to implement:

require "spec"

module Spec
  module ObjectExtensions
    def should_be(file = __FILE__, line = __LINE__)
      value = yield self
      unless value
        fail("Assertion failed", file, line)
      end
    end
  end
end

describe Array do
  it "is empty" do
    a = [1, 2, 3]
    a.should_be &.empty?
  end
end

https://play.crystal-lang.org/#/r/830q

I think a.should_be &.empty? is a bit more readable than a.should &.empty?, but both could work. Then we should have a.should_not_be &.empty?.

The only problem I see with this is that it can be confused with a.should be (note the space). I don't think there's a way to implement it with a space.

Another problem is that we can't print a friendly error message, like "expected to be empty", but I don't think that's a big problem because the line where the error happened is still shown so one can deduce what the failure is.

An advantage of using should is that one could theoretically do a.should { |obj| some_assertion_using obj } but I would expect this new feature to be used mostly, or only, with straight bang methods.

Thoughts?

This could also be discussed in a separate issue, but I think this would be fantastic to have. Writing a.should_be &.empty? is so much easier and readable than a.empty?.should be_true.

@straight-shoota
Copy link
Member

I've experimented with that used an expectation type wrapping a proc. It works, but it's not as nice. Didn't figure that the block could just be executed directly and you don't need to worry about proc generics 🤦‍♂

should_be could technically be a macro and capture the block's source for display in the failure message.

@asterite
Copy link
Member

I've experimented with that used an expectation type wrapping a proc. It works, but it's not as nice. Didn't figure that the block could just be executed directly and you don't need to worry about proc generics 🤦‍♂

I started writing it like that too, and later realized it could be executed directly! 😅

should_be could technically be a macro and capture the block's source for display in the failure message.

I'm not sure how, macros don't work as instance methods. Meaning that a.should_be can never me a macro call, only should_be or Foo.should_be.

Well, we could think about what an instance macro is, but it's an entirely different, bigger subject.

@asterite
Copy link
Member

I wanted to implement this but in some spec files we have:

File.exists?(some_file).should be_true

With this new feature it would be written like this:

File.should_be &.exists?(some_file)

Shorter, but uglier.

Maybe we could name it assert? So we'd have:

a.assert &.empty?
File.assert &.exists?(...)
a.assert &.includes?(3)
ch.assert &.closed?
reader.assert &.has_next?
'a'.assert &.lowercase?

I think everything reads fine with assert.

@asterite
Copy link
Member

Then also add .refute. Plus the confusion with .should be is gone.

@straight-shoota
Copy link
Member

Let's discuss this in a dedicated issue.

@Sija
Copy link
Contributor Author

Sija commented Nov 22, 2020

I'd like to revisit this PR, as its main purpose is to add support for the sub/superset checking to Hash - either via overloading equality operators or dedicated methods. fwiw, Ruby uses equality operators for this.

Could we settle on some course of action?

I see couple of options:

  1. Use equality operators (i.e. leave the PR as-is)
  2. Same as above + do the same for Set
  3. Use #sub/superset? and #proper_sub/superset?
  4. Drop this PR altogether

Any thoughts?

@straight-shoota
Copy link
Member

I still wouldn't want to compromise API integrity, so number 3.

@Sija Sija force-pushed the feature/hash-sub-superset-operators branch from de59f20 to 955fae9 Compare December 3, 2020 13:58
@Sija
Copy link
Contributor Author

Sija commented Dec 3, 2020

@straight-shoota I've renamed the methods and now they're 1:1 with the ones found in Set.

@Sija Sija changed the title Implement Hash#{<,<=,>,>=} operators for sub/superset checking Add sub/superset checking methods to Hash Dec 3, 2020
@straight-shoota
Copy link
Member

I just realized these methods apply in exactly the opposite direction to what I might expect.
I would probably read a predicate method subset?(b) as "is b a subset? (of the receiver)", i.e. b ⊂ a But instead a.subset?(b) returns true if a ⊂ b.

These semantics are not new. They are the same as in Set and based on Ruby's. I suspect they have it that way to have the same operand relation as with the comparison operators. But it's really confusing. When reading the code, you can't beyond doubt recognize which way it is.

So maybe we should rather introduce these methods here under a differnt, less ambiguous name. And rename those in Set.
Adding a simple _of suffix would fix this. a.subset_of?(b) clearly means a ⊂ b.

@Sija
Copy link
Contributor Author

Sija commented Dec 3, 2020

@straight-shoota Makes sense but I'd start with renaming (and deprecating) methods in Set in a separate PR, followed by changes in here.

@straight-shoota
Copy link
Member

I've just opened a separate issue for this: #10020 And ideally we'd agree on that, then change this PR + add one for Set.

@Sija Sija force-pushed the feature/hash-sub-superset-operators branch from 955fae9 to 465e548 Compare January 18, 2021 12:20
src/hash.cr Outdated Show resolved Hide resolved
@Sija Sija force-pushed the feature/hash-sub-superset-operators branch from 398b611 to 5d8c8c6 Compare January 18, 2021 13:33
@Sija
Copy link
Contributor Author

Sija commented Feb 12, 2021

Can this get a 2nd approval?

@Sija
Copy link
Contributor Author

Sija commented Apr 10, 2021

🏓 /cc @sdogruyol

Copy link
Member

@sdogruyol sdogruyol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @Sija 🙏

@straight-shoota straight-shoota added this to the 1.1.0 milestone Apr 13, 2021
@straight-shoota straight-shoota merged commit 9105f58 into crystal-lang:master Apr 16, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants