Also support BlockArrays #16
Comments
It's probably time packages like BlockArrays.jl start depending on ArrayInterface.jl, instead of us defining the methods for them here.
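For concreteness, here is a hedged sketch of what that could look like from the BlockArrays.jl side. `can_setindex` and `parent_type` are real ArrayInterface functions, but these particular method definitions are illustrative assumptions, not either package's actual code:

```julia
using ArrayInterface, BlockArrays

# Illustrative only: if BlockArrays.jl depended on ArrayInterface.jl,
# it could own trait definitions like these itself.
ArrayInterface.can_setindex(::Type{<:BlockArray}) = true

# A BlockArray wraps an N-dimensional array holding its blocks
# (the third type parameter, `R`); expose that wrapper type.
ArrayInterface.parent_type(::Type{<:BlockArray{T,N,R}}) where {T,N,R} = R
```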
I'll generalize.
The more interesting change will be having this work with […]. This will be a breaking change, but I don't believe anyone else has used them yet. (I'll still bump the major version number; I'm just saying that it won't cause problems for anyone using ArrayInterface, AFAIK.)
The last thing I need in order to get all of the stuff in "stridelayout.jl" to work with the stuff in "indexing.jl" is replacing this internal method.
I need to walk back my earlier comment. I'd want equally sized blocks.
So, would the block size have to divide the array size, or could the last blocks along each axis be shorter? The second option would be needed for chunked data on disk, for example.
The last block can be shorter.
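To make that arithmetic concrete, here is a minimal sketch in plain Julia (the helper name is illustrative): partitioning an axis of length `n` into blocks of size `b`, where the last block is shorter whenever `b` does not divide `n`.

```julia
# Partition 1:n into cld(n, b) ranges of length b; the final range is
# clamped to the axis length, so the last block may be shorter.
block_ranges(n::Integer, b::Integer) = [(i - 1) * b + 1 : min(i * b, n) for i in 1:cld(n, b)]

block_ranges(10, 4)  # => [1:4, 5:8, 9:10] (trailing block has length 2)
```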
But I also need to ask, to clarify: what is the actual memory layout you have in mind? Do you only intend the array to be iterated in a certain way, or do you intend the memory layout itself to be in blocks?
So, for my use case, namely HDF5Arrays, I have both chunked and unchunked datasets. Chunked datasets are stored in separate blocks on disk (each internally contiguous). Unchunked datasets are stored fully contiguously, but it may still be worthwhile to access them chunk by chunk, trading many small disk reads for fewer, larger ones.

There's a further wrinkle: HDF5 is actually row-major, but HDF5.jl reverses the column/row order when reading and writing, so that in memory and on disk a Julia array and an HDF5 dataset have the same memory layout.

One more issue: caching. HDF5 internally keeps a chunk cache of recently accessed chunks, allowing faster reads if you keep working with the same chunks. I may also do some benchmarking and find that an explicit in-memory buffer in Julia is worth it. The buffer may be the size of one chunk, or it may be larger. In the case of an unchunked dataset on disk there wouldn't be any chunks, of course, so the size of the buffer could be anything.

Footnote: this is why I want to avoid subtyping. Ideally, I'd like one HDF5Array supertype with a number of different implementations as subtypes, but subtyping AbstractBlockArray would force non-chunked arrays to also be BlockArrays. Sure, there are workarounds: I can avoid a supertype and just have a Union of the different implemented types and call it AnyHDF5Array, or I can subtype BlockArray and simply have unchunked arrays consist of a single block, effectively making them non-blocked.
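To illustrate the "fewer, larger reads" access pattern described above, here is a hedged sketch; the function name `foreach_chunk` and the assumption that the chunk dimensions are known up front are mine, not an existing API. It requires only that the dataset support `size` and range indexing, which both plain Julia arrays and HDF5.jl datasets do:

```julia
# Hedged sketch of chunk-aligned access: visit a 2-D dataset one chunk
# at a time, so a disk-backed source sees one large read per chunk
# instead of many small ones. Trailing chunks may be shorter.
function foreach_chunk(f, dset, chunkdims::NTuple{2,Integer})
    m, n = size(dset)
    bm, bn = chunkdims
    for j in 1:bn:n, i in 1:bm:m
        rows = i:min(i + bm - 1, m)
        cols = j:min(j + bn - 1, n)
        f(dset[rows, cols], rows, cols)  # materializes one chunk
    end
end

# Example: total a matrix, one chunk-sized read at a time.
A = rand(10, 7)
total = Ref(0.0)
foreach_chunk(A, (4, 3)) do block, _, _
    total[] += sum(block)
end
total[] ≈ sum(A)  # true
```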
In light of this discourse discussion, could we also support BlockArrays in ArrayInterface.jl? I have time the weekend after next to take a stab at implementing the suggested changes in that post.

Ideally, efficient BlockArrays would be possible through traits, rather than having to subtype AbstractBlockArray. Note that I'm not familiar with the details here at all. Rerouting reduce operations, broadcasting, etc. through some set of traits might already be basically implemented or trivially easy to add, or it might not even make sense. Tagging @chriselrod.
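As a purely hypothetical illustration of the trait route (none of `BlockLayout`, `HasBlocks`, or `block_layout` exist in ArrayInterface.jl; only BlockArrays' `blocks` accessor is real): a reduction dispatches on the trait, so a disk-backed array type could opt in by defining the trait for itself, without subtyping AbstractBlockArray.

```julia
using BlockArrays

# Hypothetical trait (not part of ArrayInterface.jl): does this
# array type expose a blocked layout?
abstract type BlockLayout end
struct HasBlocks <: BlockLayout end
struct NoBlocks <: BlockLayout end

block_layout(::Type) = NoBlocks()
block_layout(::Type{<:AbstractBlockArray}) = HasBlocks()
# An HDF5-backed array could likewise define block_layout for itself.

# A reduction routed through the trait instead of the supertype,
# using BlockArrays' `blocks` accessor on the blocked path.
blocksum(A) = blocksum(block_layout(typeof(A)), A)
blocksum(::NoBlocks, A) = sum(A)
blocksum(::HasBlocks, A) = sum(sum, blocks(A))

B = BlockArray(rand(6, 6), [2, 4], [3, 3])
blocksum(B) ≈ blocksum(Array(B))  # same result, different code path
```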