Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix the return type of Enumerable#sum/product for union elements #15314

Open
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

rvprasad
Copy link

Using the first type of a union type as the type of the result of Enumerable#sum/product() call can cause runtime failures, e.g. [1, 10000000000_u64].sum/product will result in an OverflowError. A safer alternative is to flag/disallow the use of union types with Enumerable#sum/product() and recommend the use of Enumerable#sum/product(initial) with an initial value of the expected type of the sum/product call.

@Blacksmoke16 Blacksmoke16 added breaking-change topic:stdlib:numeric kind:bug A bug in the code. Does not apply to documentation, specs, etc. labels Dec 26, 2024
@Blacksmoke16 Blacksmoke16 linked an issue Dec 26, 2024 that may be closed by this pull request
@rvprasad
Copy link
Author

This PR expands on the (accidentally closed) PR 15308.

Tagging @Blacksmoke16, @straight-shoota, and @BlobCodes to continue the discussion here.

Using the first type of a union type as the type of the result of
`Enumerable#sum/product()` call can cause runtime failures, e.g. `[1,
10000000000_u64].sum/product` will result in an ``OverflowError``. A
safer alternative is to flag/disallow the use of union types with
`Enumerable#sum/product()` and recommend the use of
`Enumerable#sum/product(initial)` with an initial value of the
expected type of the sum/product call.
Comment on lines 2302 to 2304
raise("Enumerable#sum/product() does support Union types. " +
"Instead, use Enumerable#sum/product(initial) with an " +
"initial value of the expected type of the sum/product call.")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion: Polish the error message:

Suggested change
raise("Enumerable#sum/product() does support Union types. " +
"Instead, use Enumerable#sum/product(initial) with an " +
"initial value of the expected type of the sum/product call.")
raise("`Enumerable#sum` and `#product` cannot determine the initial value from a union type. " +
"Please pass an initial value of the intended type to the call.")

Copy link
Author

@rvprasad rvprasad Dec 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After reading your fix, I realized a missing negation in the first sentence of error message I committed :)

Also, how about we update the error message as follows?

raise("Enumerable#sum` and #product do not support union types. Instead, use Enumerable#sum/product(initial) with an initial value of the expected type of the sum/product call.")

Following is my reasoning for the above change.

  1. The existing logic fails to identify an appropriate/safe type that can hold the final value of the call.
  2. Using the identity is a way of determining the appropriate/safe type. Since it is an implementation detail, I think surface such details in documentation while mentioning what (not how) is wrong and its fix in the error message may be better.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"do not support union types" isn't entirely accurate. They do support them. Just without automatically determining the return type, hence the need to specify explicitly.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With this change, "sum()" and "product()" will not support union types (but "sum(initial)" and "product(initial)" will support union types). So, do you mean that we should be precise in the error message and the doc by referring to the no-arg variant of the methods?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we could differentiate between the different overloads by signature.

Enumerable#sum() and #product() do not support union types.`

It's a bit of a subtle detail though. Hence my suggestion to write out the issue explicitly:

Enumerable#sum and #product cannot determine the initial value from a union type.

I'm fine either way though.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have updated the PR such that

  1. error messages and doc refer to specific overloads and
  2. error messages focus only on the what is wrong and how to fix it.

The error message and documentation specifically refers to the variants
of Enumerable#sum/product that do not support union types.
Comment on lines 2302 to 2303
raise("Enumerable#sum() and #product() do not support Union types. " +
"Instead, use Enumerable#sum(initial) and #product(initial), " +
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stylistic suggestion:

Suggested change
raise("Enumerable#sum() and #product() do not support Union types. " +
"Instead, use Enumerable#sum(initial) and #product(initial), " +
raise("`Enumerable#sum` and `#product` do not support Union types. " +
"Instead, use `Enumerable#sum(initial)` and `#product(initial)`, " +

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can make this change.

Out of curiosity, assuming the back ticks are Markdown's code syntax, what is the effect of including them in in error messages (which I assume are processed and rendered as plain text)?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my eyes it makes them more standout and a clear sign we're referencing the method/type definitions, as in the other, doc-related places.

@Sija
Copy link
Contributor

Sija commented Dec 28, 2024

This change is rather ill-placed, given it modifies the internal Enumerable::Reflect(X).first method.

@rvprasad
Copy link
Author

@Sija can you please elaborate on "this change is rather ill-placed'?

@Sija
Copy link
Contributor

Sija commented Dec 29, 2024

@rvprasad As I've already mentioned, this change modifies the internal Enumerable::Reflect(X).first method, defying its sole purpose, since now it doesn't return the first element of the type X anymore.

@rvprasad
Copy link
Author

@Sija I see what you mean. Here are a few thoughts and observations.

  1. I agree that the member named first becomes redundant with this change. So, we could rename it (e.g., as type) and update the comments in the source.
  2. As is, Reflect seems to be a common reusable unit of code that is used at different program points to check the availability of identity . So, we might not want to do away with Reflect. However, since the type argument to Reflect is always the type parameter of the enclosing enumerable, we could replace Reflect struct by a class method of Enumerable but this class method will have to have public visibility, which will be undesirable.
  3. Looking at sum(&: T -> _) and product(&: T -> _), they fail with OverFlowError when the block argument yields values of type different from the element type of the enumerable, e.g., puts [1, 2].sum {|i| i * 10000000000_u64} and puts [1_u64, 2].sum {|i| i * 10000000000_u64} fails but puts [1_u64, 2_u64].sum {|i| i * 10000000000_u64} works fine. I think the cause is that the type parameter of the enclosing enumerable is provided as the type argument to Reflect instead of the type of the value yielded by the block argument. We can solve this issue by using the return type of the block argument or by removing the block variants of sum and product that do not accept initial values.
  4. While all of the below tests fail when I execute make std_spec, the backtrace in the failing product tests does not have a line from enumerable_spec.cr, which seems strange.
    it { [1, 2].sum { |x| x * 20000000000_u64 }.should eq(60000000000_u64) }
    it { [1, 2].sum(1) { |x| x * 20000000000_u64 }.should eq(60000000001_u64) }

    it { [1, 2].product { |x| x * 100000_u64 }.should eq(20000000000_u64) }
    it { [1, 2].product(1) { |x| x * 100000_u64 }.should eq(20000000000_u64) }

While I realize surfacing more issues is not ideal, I hope they can help clear up a small corner of the language.

@straight-shoota
Copy link
Member

I don't think this change is ill-placed. The purpose of Reflect.first is to resolve a union type to a single type for #sum and #product. Of course that isn't the case anymore, so we may want to rename the method. And of course, the comment needs to be updated.
Reflect is a reusable component, but it's only used internally for this specific use case. There's no sense in preserving its original behaviour. If that's no longer used, the type is obsolete.
That could actually be an alternative, to drop Reflect entirely. We would end up calling .multiplicative_identity directly on the element type. In case of a union type, that would result in an error message about the missing method.

I believe the approach taken here to handle this case explicitly with a custom error message is better. The specific implementation such as the method name of the internal helper can be adjusted easily.

Any issue with the yielding overloads sum(&: T -> _) and product(&: T -> _) should be discussed separately. But I believe they are entirely correct though. The overflow error comes from the multiplication in the block in your test code (e.g. 1 * 10000000000_u64, the result is Int32) not the aggregation method itself. It works perfectly fine when you adjust the product type to Int64: [1, 2].sum { |i| 1000000000_i64 * i } # => 3000000000.

With this PR, Reflect is not used to determine the first type of a union type.
This commit aligns Reflect's method and related comments with the class'
purpose.
@rvprasad
Copy link
Author

rvprasad commented Dec 31, 2024

I have pushed a revision that updates the implementation and comments in Reflect.

Regarding the yielding overloads, I did not realize that, during type coercion, Crystal uses the type of expression based on the type of the first sub-expression of the expression, i.e., typeof(1 * 1u64) is Int32 while typeof(1u64 * 1) is UInt64.

On a related note, I believe typeof(yield Enumerable.element_type(self)) will return the element type of the Enumerable and not the type of valued yielded by the block. Is this correct? If so, why can't we instead use the type parameter T of the Enumerable?

Furthermore, when I tried to use T instead of Reflect(T), I encountered the following errors.

  1. Tuple(String, String, String)#sum() at string.cr@5717 (string interpolation) failed as additive_identity was undefined for String.class.
  2. Compiling Array(Array(Char) | Array(Int32))#product() at indexable.cr@226 (cartesian product) failed as multiplicative_identity was undefined for (Array(Char) | Array(Int32)).class.

@rvprasad
Copy link
Author

rvprasad commented Jan 1, 2025

With regards to typeof(yield Enumerable.element_type(self)), I think the use of yield executes the block and gets the type of the returned value. If so, I get it now.

@rvprasad
Copy link
Author

rvprasad commented Jan 1, 2025

While Reflect can be extended to flag heterogeneous tuples, {"a", "b", 3}.sum(&.to_s) fails. Interestingly, this code snippet fails with v1.14.0 with the error message Error: undefined method 'zero' for String.class.

Also, this observation is true of heterogeneous arrays, i.e., arrays whose element type is a union type.

Edit1: I have captured this observation in issue #15317.

@rvprasad
Copy link
Author

rvprasad commented Jan 1, 2025

Assuming we can pursue other issues in separate dedicated tickets, are there any objections to the proposed change to address the primary issue captured in this ticket?

@rvprasad
Copy link
Author

rvprasad commented Jan 2, 2025

@straight-shoota thanks for the approval!

Do I need to do anything more for these changes to be merged?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking-change kind:bug A bug in the code. Does not apply to documentation, specs, etc. topic:stdlib:collection
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Return type of Enumerable#sum and #product for union elements
4 participants