Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Does collect always return an Array or can it return AbstractArrays? #50051

Open
FelixBenning opened this issue Jun 3, 2023 · 6 comments
Open
Labels
docs This change adds or pertains to documentation

Comments

@FelixBenning
Copy link

The documentation says Array

help?> collect

...
  collect(collection)

  Return an Array of all items in a collection or iterator. For dictionaries, returns Pair{KeyType, ValType}. If the argument is array-like or is an iterator with the HasShape trait, the result will have the same shape and number of dimensions as the argument.

but collect of Iterators.product breaks this promise:

julia> Iterators.product(OffsetArray(rand(3), -1:1), 1:2) |> collect
3×2 OffsetArray(::Matrix{Tuple{Float64, Int64}}, -1:1, 1:2) with eltype Tuple{Float64, Int64} with indices -1:1×1:2:
 (0.892917, 1)  (0.892917, 2)
 (0.398366, 1)  (0.398366, 2)
 (0.550785, 1)  (0.550785, 2)

I think returning an AbstractArray is fine if the documentation is updated. Interestingly collect(collect()) results in a normal Array. So OffsetArrays is following the documentation.

@jishnub
Copy link
Contributor

jishnub commented Jun 3, 2023

Collect for AbstractArrays follows a different path from that for iterators. The latter seems to use similar to generate the destination, which it probably shouldn't

@jakobnissen
Copy link
Contributor

I think we need to be a little careful generalizing this more. See Issue 47777. Briefly:

  • If change collect to allow AbstractArray, then it may be breaking because people have read the documentation, and have written code that assumes it will return an Array, e.g. by feeding the result into a function annotated f(x::Array), which now would error
  • This change would make it harder to write code to "just give me an Array". Sometimes you really just do need an Array specifically.

@FelixBenning
Copy link
Author

FelixBenning commented Jun 4, 2023

Most people probably collect for the getindex feature. This means that an AbstractArray would be fine. Maybe the solution would be to have:

  • collect! do something to this data to allow me to use getindex on it - i.e. turn it into an AbstractArray. Try to avoid copying (take ownership), so no guarantees about the integrity of the original data.
  • collect copy the entire thing into a standard Array.

collect! would be a no-op for any AbstractArray and collect would always be copy.

@jishnub
Copy link
Contributor

jishnub commented Jun 4, 2023

Reading the documentation again,

If the argument is array-like or is an iterator with the HasShape trait, the result will have the same shape and number of dimensions as the argument.

Perhaps "shape" should be replaced either by size or axes, as an iterator that HasShape also defines axes for itself, and size may be interpreted as length.(axes) which is defined by default in this case. In the original iterator example, the axes are preserved, which might have been the intended meaning of shape in this case. Collecting an AbstractArray, on the other hand, preserves size and discards axes.

@LilithHafner LilithHafner added the triage This should be discussed on a triage call label Jun 4, 2023
@andrewjradcliffe
Copy link
Contributor

If I may propose behavior which should not be the case, illustrated on Base types, but which becomes permissible if AbstractArray returns are permissible:

julia> v = collect(Float64, 1:2:5)
1.0:2.0:5.0

julia> typeof(v) <: AbstractArray
true

One must be careful that collect actually produce concrete values, rather than another (implied) iterable entity which will be used to form a generator; or, even more surprising, an abstract representation from a concrete. In other words, let's not facilitate more abuse of AbstractArray on operations which should not be surprising. A non-so-julian solution is to introduce something such as collect_into(container_type, iter); this already exists in Julia as map due to its eager evaluation semantics.

@JeffBezanson
Copy link
Member

JeffBezanson commented Jul 6, 2023

Triage thinks collect means "return a new container where all the values are stored explicitly and contiguously" --- perhaps a DenseArray? I think it is currently always mutable as well. So I do think it can return something other than Array as appropriate. And ideally, if you really want an Array you should be able to get that via Array(itr) (currently, not always implemented unfortunately).

Docs should be updated.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
docs This change adds or pertains to documentation
Projects
None yet
Development

No branches or pull requests

6 participants