RFC: Simplify bounds checks for multi-dimensional array accesses #18064

MatthiasJReisinger · 2016-08-16T20:02:59Z

This simplifies the array bounds check code that is emitted for
multi-dimensional array accesses that use "regular" indexing, i.e.
accesses of the form A[i1,i2,...,iN] to some N-dimensional array
A.

For example, with this change, the access A[i,j,k] to an array
with the three dimensions m, n and o now leads to bounds checks
that correspond to the following pseudo code:

if (i >= m)
  out_of_bounds_error();
else if (j >= n)
  out_of_bounds_error();
else if (k >= o)
  out_of_bounds_error();

So far, the following more complicated bounds checks would have been
emitted:

if (i >= m)
  out_of_bounds_error();
else if (j >= n)
  out_of_bounds_error();
else if (((k * n + j) * m + i) >= m * n * o)
  out_of_bounds_error();

Julia also allows one-dimensional and "partial" linear indexing
(see #14770), i.e. the number of indices used to access an array does
not have to match the actual number of dimensions of the accessed array.
For this case we still have use this old scheme.

One motivation for this change was the following: expressions like
((k * n + j) * m + i) are non-affine and Polly would not be able
to analyze them. This change therefore also facilitates Polly's bounds
check elimination logic, which would hoist such checks out of loops
or may remove them entirely where possible.

blakejohnson · 2016-08-17T02:29:36Z

The existing code is kind of strange. What do we get by checking the total linearized index instead of just checking the last dimension?

MatthiasJReisinger · 2016-08-17T08:29:07Z

I think that we could save a load with the linearized check because we do not need to access the last array dimension (m * n * o isn't actually computed but loaded from memory). But I don't know if that's the real reason. Sorry, that thought was contradictory, since m * n * o has to be loaded too we don't save a load.

timholy · 2016-08-17T14:44:00Z

@blakejohnson, that's probably "partial linear indexing" (see #14770).

tobiasgrosser · 2016-08-17T15:16:45Z

Thinking about this partial indexing, we should be able to handle this case without special-casing Polly by generating the last check as:

if (last_index >= product_of_remaining_dimensions())
  out_of_bound_error();

In case dimensions and indexes match, we get the simple checks Matthias is looking for. In case of partial linear indexing, we still get checks that ensure we do not access outside of allocated memory. This would make Polly work in the common case without partial indexing, but partial indexing would continue to work.

timholy · 2016-08-17T15:34:32Z

Since there seems to be a consensus about getting rid of partial indexing (just needs someone to actually implement the change), another option would be to say you can't use it in @polly-annotated blocks.

But if it's easy to keep, that would be fine, too.

MatthiasJReisinger · 2016-08-17T19:36:14Z

Thank you for your comments! @TobiG With this approach we would have one load for each remaining dimension, whereas with the existing linearized check we only have a single one to retrieve the linearized array length (I hope I didn't mix that up again). Would it be okay to accept this? I just wanted to ask to be one the safe side.

tobiasgrosser · 2016-08-17T19:40:17Z

On Wed, Aug 17, 2016, at 09:36 PM, Matthias J. Reisinger wrote:

Thank you for your comments! @TobiG With this approach we would have one
load for each remaining dimension, whereas with the existing linearized
check we only have a single one to retrieve the linearized array length
(I hope I didn't mix that up again). Would it be okay to accept this? I
just wanted to ask to be one the safe side.

Right. I personally would be OK with this, but I am not part of the
Yulia side. However, I realized that this might not work. Do we actually
have the remaining dimensions? AFAIK the array only saves the full
length.

Best,
Tobias

blakejohnson · 2016-08-18T02:43:35Z

Do we actually have the remaining dimensions?

Yes, emit_arraysize can be used to look up the dimensions.

MatthiasJReisinger · 2016-08-18T13:13:58Z

Thank you for your thoughts again! I now updated the PR in the following way: As soon as we have an array access that uses (1D or partial) linear indexing we still use the existing bounds checking, but in the case of regular indexing the simpler check is used (to be on the safe side here: is the check nidxs < nd sufficient to determine the presence of partial indexing?). That also allows us to remove the special-handling for Polly and still gives us the simpler bounds checks. Then, for code parts that use @polly it would be sufficient to require the absence of partial indexing as @timholy suggested.

Do you think this is the direction we could choose?

tobiasgrosser · 2016-08-18T13:27:58Z

src/cgutils.cpp

+            builder.CreateCondBr(builder.CreateICmpULT(i, alen), endBB, failBB);
+        } else {
+            // We have already emitted a bounds check for each index except for
+            // the last one which we therefore emit here, i.e.


This sentence matches both branches and should be moved before the if (bc). Please also add a comment discussing the difference of the two checks and why each is used. Something like:

For the last dimension there are two options of how to emit bounds checks. In case only a single dimension remains (the number of subscript expressions in the array matches the number of array dimensions), we check if the outermost subscript is smaller than the size of the outermost dimension. In case more than one array dimension is left when we reach the last subscript (partial indexing), we verify if the computed element address is within the bounds of the allocated memory range.

tobiasgrosser · 2016-08-18T13:34:47Z

Hi Matthias,

thanks for the update. I like this change a lot. Two remarks:

The first commit in this PR seems now unnecessary.
The commit message in the second commit is not fully up-to-date. It currently states we use simpler checks when Polly is used, but we now generate them as soon as no partial indexing is used. This means, Polly is now only a motivating use for this change.

MatthiasJReisinger · 2016-08-18T20:41:05Z

Thank you for the feedback! I addressed it in the updated PR.

Best,
Matthias

tobiasgrosser · 2016-08-18T20:43:40Z

This looks great from my side. What do Julia people think?

This simplifies the array bounds check code that is emitted for multi-dimensional array accesses that use "regular" indexing, i.e. accesses of the form `A[i1,i2,...,iN]` to some `N`-dimensional array `A`. For example, with this change, the access `A[i,j,k]` to an array with the three dimensions `m`, `n` and `o` now leads to bounds checks that correspond to the following pseudo code: ``` if (i >= m) out_of_bounds_error(); else if (j >= n) out_of_bounds_error(); else if (k >= o) out_of_bounds_error(); ``` So far, the following more complicated bounds checks would have been emitted: ``` if (i >= m) out_of_bounds_error(); else if (j >= n) out_of_bounds_error(); else if (((k * n + j) * m + i) >= m * n * o) out_of_bounds_error(); ``` Julia also allows one-dimensional and "partial" linear indexing (see JuliaLang#14770), i.e. the number of indices used to access an array does not have to match the actual number of dimensions of the accessed array. For this case we still have use this old scheme. One motivation for this change was the following: expressions like `((k * n + j) * m + i)` are non-affine and Polly would not be able to analyze them. This change therefore also facilitates Polly's bounds check elimination logic, which would hoist such checks out of loops or may remove them entirely where possible.

vtjnash · 2016-08-22T16:02:32Z

lgtm. this seems like a good find and a good fix. I'll merge if nanosoldier agrees (which I expect it will).

@nanosoldier runbenchmarks(ALL, vs = ":master")

nanosoldier · 2016-08-22T16:49:09Z

Something went wrong when running your job: RemoteException(3,CapturedException(NanosoldierError: failed to run benchmarks against primary commit: ErrorException("failed process: Process(make, ProcessExited(2)) [2]"),Any[( in pipeline_error at process.jl:612,1),( in run at process.jl:588,1),( in build_julia! at build.jl:54,1),( in execute_benchmarks! at BenchmarkJob.jl:251,1),( in run at BenchmarkJob.jl:178,1),( in #503 at multi.jl:1410,1),( in run_work_thunk at multi.jl:996,1),( in macro expansion at multi.jl:1410 [inlined],1),( in #502 at event.jl:46,1)]))
cc @jrevels

blakejohnson · 2016-08-22T17:34:30Z

LGTM, too.

jrevels · 2016-08-22T17:57:56Z

Looks like doing a clean build from master is recently broken (is that correct @tkelman?), so Nanosoldier isn't going to be able to build Julia until that gets resolved.

tkelman · 2016-08-23T12:02:39Z

the nightly build hit a different failure, which may or may not happen again...

@nanosoldier runbenchmarks(ALL, vs = ":master")

nanosoldier · 2016-08-23T14:38:45Z

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @jrevels

ViralBShah added the compiler:codegen Generation of LLVM IR and native code label Aug 18, 2016

MatthiasJReisinger force-pushed the mjr/polly/boundchecks branch from 32d9e28 to 1fc813d Compare August 18, 2016 13:11

tobiasgrosser reviewed Aug 18, 2016
View reviewed changes

MatthiasJReisinger force-pushed the mjr/polly/boundchecks branch from 1fc813d to 39e1dc5 Compare August 18, 2016 20:26

MatthiasJReisinger changed the title ~~WIP: Emit simpler array bounds checks when using Polly~~ WIP: Simplify bounds checks for multi-dimensional array accesses Aug 18, 2016

MatthiasJReisinger force-pushed the mjr/polly/boundchecks branch from 39e1dc5 to ecd4a37 Compare August 18, 2016 20:35

MatthiasJReisinger force-pushed the mjr/polly/boundchecks branch from ecd4a37 to 9f531d9 Compare August 19, 2016 18:04

MatthiasJReisinger changed the title ~~WIP: Simplify bounds checks for multi-dimensional array accesses~~ RFC: Simplify bounds checks for multi-dimensional array accesses Aug 22, 2016

kshyatt added the arrays [a, r, r, a, y, s] label Aug 22, 2016

vtjnash merged commit d474ca3 into JuliaLang:master Aug 23, 2016

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

RFC: Simplify bounds checks for multi-dimensional array accesses #18064

RFC: Simplify bounds checks for multi-dimensional array accesses #18064

MatthiasJReisinger commented Aug 16, 2016 •

edited

Loading

blakejohnson commented Aug 17, 2016

MatthiasJReisinger commented Aug 17, 2016 •

edited

Loading

timholy commented Aug 17, 2016

tobiasgrosser commented Aug 17, 2016

timholy commented Aug 17, 2016

MatthiasJReisinger commented Aug 17, 2016

tobiasgrosser commented Aug 17, 2016

blakejohnson commented Aug 18, 2016

MatthiasJReisinger commented Aug 18, 2016

tobiasgrosser Aug 18, 2016 •

edited

Loading

tobiasgrosser commented Aug 18, 2016

MatthiasJReisinger commented Aug 18, 2016

tobiasgrosser commented Aug 18, 2016

vtjnash commented Aug 22, 2016

nanosoldier commented Aug 22, 2016

blakejohnson commented Aug 22, 2016

jrevels commented Aug 22, 2016

tkelman commented Aug 23, 2016

nanosoldier commented Aug 23, 2016

RFC: Simplify bounds checks for multi-dimensional array accesses #18064

RFC: Simplify bounds checks for multi-dimensional array accesses #18064

Conversation

MatthiasJReisinger commented Aug 16, 2016 • edited Loading

blakejohnson commented Aug 17, 2016

MatthiasJReisinger commented Aug 17, 2016 • edited Loading

timholy commented Aug 17, 2016

tobiasgrosser commented Aug 17, 2016

timholy commented Aug 17, 2016

MatthiasJReisinger commented Aug 17, 2016

tobiasgrosser commented Aug 17, 2016

blakejohnson commented Aug 18, 2016

MatthiasJReisinger commented Aug 18, 2016

tobiasgrosser Aug 18, 2016 • edited Loading

Choose a reason for hiding this comment

tobiasgrosser commented Aug 18, 2016

MatthiasJReisinger commented Aug 18, 2016

tobiasgrosser commented Aug 18, 2016

vtjnash commented Aug 22, 2016

nanosoldier commented Aug 22, 2016

blakejohnson commented Aug 22, 2016

jrevels commented Aug 22, 2016

tkelman commented Aug 23, 2016

nanosoldier commented Aug 23, 2016

MatthiasJReisinger commented Aug 16, 2016 •

edited

Loading

MatthiasJReisinger commented Aug 17, 2016 •

edited

Loading

tobiasgrosser Aug 18, 2016 •

edited

Loading