-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ArrayViews (A systematic array view framework) #5556
Conversation
Please review this. To me, this is ready to merge. The rationale of the design is here: https://github.com/lindahua/ArrayViews.jl |
It looks like I can't take a view of a view using linear indexing. Not sure if that's intentional, but it surprised me: x=randn(100,10)
y=view(x,3:5,3:5)
view(y,1:2)
ERROR: no method roffset(StridedView{Float64,2,1,Array{Float64,2}}, Range1{Int64})
in view at arrayviews.jl:568 |
@malmaud: view(view(x, 3:5, 3:5), a:b) is in a irregular pattern (the distance between adjacent elements vary) |
To be more specific, * * * *
* * * *
* * * *
. . . . If you take
Such a pattern cannot be represented as a strided view. On the other hand, you can perfectly do |
Quite correct, thanks for the explanation. Perhaps a better error message would be in order. Great work, btw. On Jan 27, 2014, at 5:35 PM, Dahua Lin notifications@github.com wrote:
|
I noticed that out-of-bounds views result in #undefs instead of throwing an error. Is that by design? julia> x=randn(2,2)
2x2 Array{Float64,2}:
-1.76904 1.32337
0.2108 1.11225
julia> view(x,:,1:3)
2x3 ContiguousView{Float64,2,Array{Float64,2}}:
-1.76904 1.32337 #undef
0.2108 1.11225 #undef |
I simply follow the behavior of julia> x = randn(2,2)
2x2 Array{Float64,2}:
0.678922 0.953143
-0.112102 0.174408
julia> sub(x, :, 1:3)
2x3 SubArray{Float64,2,Array{Float64,2},(Range1{Int64},Range1{Int64})}:
0.678922 0.953143 #undef
-0.112102 0.174408 #undef I am open to implement different behaviors if people agree that other behaviors are more desirable. |
Whereas I am using a quite different internal implementation & representation, I try to follow what |
I want to note that despite the large size of this PR, it is actually non-breaking. The function In my mind, the path forward:
|
I've looked at this briefly. I think it's a clever approach: come up with a type that works efficiently for special cases (that are common in practice) by placing more smarts in the constructor. In many practical applications, this is clearly an effective approach. Am I right in understanding that Interestingly, your "2D View with contiguous rank 0" case could have efficient linear indexing: because the number of rows is even, this example is equivalent to a single linear array of memory where you keep every other element. So there may be ways to extend this line of reasoning even further, albeit only in special cases (e.g., if that array had an odd number of rows, it couldn't be efficiently linearly-indexed). As far as eventually deprecating |
|
Is this going to be 0.3 stuff? If yes this might better merged sooner than later to get more testing. It would be great if |
I think this is ready for review. |
+1, I would love to see this merged and have |
This is really awesome. We had decided not to make array views the default in 0.3, but this PR is starting to make me change my mind. |
I think if it all possible it would be good to get it in. As @johnmyleswhite pointed out in another issue (I think it was the 0.3 umbrella issue), it's among the biggest breaking changes currently planned. |
@JeffBezanson: I can absolutely understand if you want to freeze the features at some point to get 0.3 out. But IMHO this is a really important design change that should happen rather sooner than later. + the code is of high quality code. I have ported some code from Numpy yesterday and it was 5 times slower because |
@lindahua, it's been a while since I looked at this. I remember that column-slices are contiguous and therefore fast, but what's the status of row-slices? |
@tknopp Yes, it is clear that this implementation is very high quality, which makes me seriously consider adding it to 0.3. @StefanKarpinski @ViralBShah |
@tknopp Could you try your code with |
+1 for merging this sooner than later |
@timholy The code try to detect most contiguous cases whenever it is possible to do so statically. For cases like However, it is completely possible and not too difficult to introduce a specific |
Copy on write is certainly an interesting idea to consider. |
I actually think that view semantics are often quite useful as a way to modify the original array via a view.
|
Copy on write is problematic since it means an array's data pointer can
|
I agree with @JeffBezanson. IMHO, copy-on-write introduces more problems than it solves. |
Maybe we could implement "copy on maybe write" semantics by determining if a given function could change an array's data pointer (i.e., if it might call |
"here's my pointer, so copy me maybe"? |
I get dereference. |
@carlyraejepsen should weigh in |
I nominate this as Jiahao's best pun yet. |
@simonster – with dynamic dispatch, it's amazingly hard to know if arrayset will get called or not. Even worse, whether the compiler thinks it will or not could depend on subtle changes to type inference. Personally, I think it's crazy for something as major as copying an array to happen or not based on something that's implicit and almost impossible to predict. That's my feeling about copy-on-write in general, not just your proposal. |
I guess it is difficult to determine which functions could touch which array because of aliasing: multiple arguments to the function could be or contain the same array. But in a potentially non-existent world where it could be implemented without slowing down array operations, I don't think copy-on-write would be such a bad idea. Subtle changes to type inference can already lead to very large changes in performance. |
I still think that allowing users to explicitly articulate whether they want to make a copy or just creating a shallow view is the clearest way ahead. In a lot of practical applications, I just want a view to write to that part of array. |
In my mind it would be quite natural if array indexing would return views. If we currently think of the following Julia code:
it already behaves like a view when the array is an lvalue. This is the only sensible thing in that situation but still it does something different then when used as an rvalue currently. |
This is great stuff. I've been working on a custom implementation of a strided indexer for my abstract array class, and this kind of thing might render it obsolete. (And I like that.) One thing I ask, though: what about the various functions that can (at least, sometimes) manipulate views in an efficient manner? I'm specifically thinking of I'm happy to share my work if it will help facilitate. I've actually implemented this as a sort of mutlidimensional range class: that is, it's an |
6c7c7e3
to
1a4c02f
Compare
Close since #8501 was merged? |
Yes, this can be closed |
Efficient array views have been a long standing issue in Julia. Recently, I decided to tackle this problem again, and thus made this Pull Request.
This PR is actually migrated from ArrayViews.jl, which I originally created to explore this.
An important feature of this approach is that it has two view types ContiguousView and StridedView. Each strided view is associated with a static number called contiguous rank, which can be used to determine (statically) the contiguousness of a subview thereof.
A contiguous view (with more compact representation and faster indexing) will be returned whenever the contiguousness of a view can be determined statically, thus substantially improving indexing efficiency in a lot of common situations.
Here is a brief summary of my benchmark results obtained so far: