Also support BlockArrays #16
Comments
It's probably time packages like BlockArrays.jl start depending on ArrayInterface.jl, instead of us defining the methods for them here.
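For concreteness, here is a hedged sketch of what that could look like from the BlockArrays.jl side. `can_setindex` and `parent_type` are real ArrayInterface functions, but these particular method definitions are illustrative assumptions, not either package's actual code:

```julia
using ArrayInterface, BlockArrays

# Illustrative only: if BlockArrays.jl depended on ArrayInterface.jl,
# it could own trait definitions like these itself.
ArrayInterface.can_setindex(::Type{<:BlockArray}) = true

# A BlockArray wraps an N-dimensional array holding its blocks
# (the third type parameter, `R`); expose that wrapper type.
ArrayInterface.parent_type(::Type{<:BlockArray{T,N,R}}) where {T,N,R} = R
```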
I'll generalize.
The more interesting change will be having this work with […]. This will be a breaking change, but I don't believe anyone else has used them yet. (I'll still bump the major version number; I'm just saying that it won't cause problems for anyone using ArrayInterface, AFAIK.)
The last thing I need in order to get all of the stuff in "stridelayout.jl" to work with the stuff in "indexing.jl" is replacing this internal method.
I need to walk back my earlier comment. I'd want equally sized blocks.
So, would the block size have to divide the array size, or could the last blocks along each axis be shorter? The second option would be needed for chunked data on disk, for example.
The last block can be shorter.
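To make that arithmetic concrete, here is a minimal sketch in plain Julia (the helper name is illustrative): partitioning an axis of length `n` into blocks of size `b`, where the last block is shorter whenever `b` does not divide `n`.

```julia
# Partition 1:n into cld(n, b) ranges of length b; the final range is
# clamped to the axis length, so the last block may be shorter.
block_ranges(n::Integer, b::Integer) = [(i - 1) * b + 1 : min(i * b, n) for i in 1:cld(n, b)]

block_ranges(10, 4)  # => [1:4, 5:8, 9:10] (trailing block has length 2)
```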
But I also need to ask, to clarify: what is the actual memory layout you have in mind? Do you only intend the array to be iterated in a certain way, or do you intend the memory layout itself to be in blocks?
So, for my use case, namely HDF5Arrays, I have both chunked and unchunked datasets. Chunked datasets are stored in separate blocks on disk (each internally contiguous). Unchunked datasets are stored fully contiguously, but it may still be worthwhile to access them chunk by chunk, trading many small disk reads for fewer, larger ones.

There's a further wrinkle: HDF5 is actually row-major, but HDF5.jl reverses the column/row order when reading and writing, so that in memory and on disk a Julia array and an HDF5 dataset have the same memory layout.

One more issue: caching. HDF5 internally keeps a chunk cache of recently accessed chunks, allowing faster reads if you keep working with the same chunks. I may also do some benchmarking and find that an explicit in-memory buffer in Julia is worth it. The buffer may be the size of one chunk, or it may be larger. In the case of an unchunked dataset on disk there wouldn't be any chunks, of course, so the size of the buffer could be anything.

Footnote: this is why I want to avoid subtyping. Ideally, I'd like one HDF5Array supertype with a number of different implementations as subtypes, but subtyping AbstractBlockArray would force non-chunked arrays to also be BlockArrays. Sure, there are workarounds: I can avoid a supertype and just have a Union of the different implemented types and call it AnyHDF5Array, or I can subtype BlockArray and simply have unchunked arrays consist of a single block, effectively making them non-blocked.
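To illustrate the "fewer, larger reads" access pattern described above, here is a hedged sketch; the function name `foreach_chunk` and the assumption that the chunk dimensions are known up front are mine, not an existing API. It requires only that the dataset support `size` and range indexing, which both plain Julia arrays and HDF5.jl datasets do:

```julia
# Hedged sketch of chunk-aligned access: visit a 2-D dataset one chunk
# at a time, so a disk-backed source sees one large read per chunk
# instead of many small ones. Trailing chunks may be shorter.
function foreach_chunk(f, dset, chunkdims::NTuple{2,Integer})
    m, n = size(dset)
    bm, bn = chunkdims
    for j in 1:bn:n, i in 1:bm:m
        rows = i:min(i + bm - 1, m)
        cols = j:min(j + bn - 1, n)
        f(dset[rows, cols], rows, cols)  # materializes one chunk
    end
end

# Example: total a matrix, one chunk-sized read at a time.
A = rand(10, 7)
total = Ref(0.0)
foreach_chunk(A, (4, 3)) do block, _, _
    total[] += sum(block)
end
total[] ≈ sum(A)  # true
```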
In light of this discourse discussion, could we also support BlockArrays in ArrayInterface.jl? I have time the weekend after next to take a stab at implementing the suggested changes in that post.

Ideally, efficient BlockArrays would be possible through traits, rather than having to subtype AbstractBlockArray. Note that I'm not familiar with the details here at all. Rerouting reduce operations, broadcasting, etc. through some set of traits might already be basically implemented or trivially easy to add, or it might not even make sense. Tagging @chriselrod.
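As a purely hypothetical illustration of the trait route (none of `BlockLayout`, `HasBlocks`, or `block_layout` exist in ArrayInterface.jl; only BlockArrays' `blocks` accessor is real): a reduction dispatches on the trait, so a disk-backed array type could opt in by defining the trait for itself, without subtyping AbstractBlockArray.

```julia
using BlockArrays

# Hypothetical trait (not part of ArrayInterface.jl): does this
# array type expose a blocked layout?
abstract type BlockLayout end
struct HasBlocks <: BlockLayout end
struct NoBlocks <: BlockLayout end

block_layout(::Type) = NoBlocks()
block_layout(::Type{<:AbstractBlockArray}) = HasBlocks()
# An HDF5-backed array could likewise define block_layout for itself.

# A reduction routed through the trait instead of the supertype,
# using BlockArrays' `blocks` accessor on the blocked path.
blocksum(A) = blocksum(block_layout(typeof(A)), A)
blocksum(::NoBlocks, A) = sum(A)
blocksum(::HasBlocks, A) = sum(sum, blocks(A))

B = BlockArray(rand(6, 6), [2, 4], [3, 3])
blocksum(B) ≈ blocksum(Array(B))  # same result, different code path
```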