[docs] add performance tip concerning overly-fused broadcast loops (#…
adienes authored Sep 15, 2023
1 parent d9a9be8 commit b5feb45
Showing 2 changed files with 29 additions and 0 deletions.
2 changes: 2 additions & 0 deletions doc/src/manual/functions.md
@@ -1162,6 +1162,8 @@ julia> 1:5 .|> [x->x^2, inv, x->2*x, -, isodd]
true
```

Every function in a fused broadcast is called once for each element of the result. Thus `X .+ σ .* randn.()` adds an independent, identically distributed random sample to each element of the array `X`, whereas `X .+ σ .* randn()` adds the *same* random sample to every element. When part of the fused computation is constant along one or more axes of the broadcast iteration, it may be possible to trade space for time by allocating intermediate values, which reduces the number of computations. See more at [performance tips](@ref man-performance-unfuse).
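
The difference between the two forms can be checked directly. The following is a minimal sketch; the array size and the value of `σ` are arbitrary choices made for illustration and are not part of the original text.

```julia
X = zeros(4, 4)
σ = 0.5                       # arbitrary scale, chosen for illustration

# `randn.()` takes part in the fused loop, so it is called once per element.
A = X .+ σ .* randn.()

# `randn()` is an ordinary call evaluated once, before the broadcast runs.
B = X .+ σ .* randn()

all(==(first(B)), B)          # every entry of B holds the same sample
length(unique(A)) > 1         # entries of A were drawn independently
```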

## Further Reading

We should mention here that this is far from a complete picture of defining functions. Julia has
27 changes: 27 additions & 0 deletions doc/src/manual/performance-tips.md
@@ -1095,6 +1095,33 @@ a new temporary array and executes in a separate loop. In this example
convenient to sprinkle some dots in your expressions than to
define a separate function for each vectorized operation.

## [Fewer dots: Unfuse certain intermediate broadcasts](@id man-performance-unfuse)

The dot loop fusion mentioned above enables concise and idiomatic code to express highly performant operations. However, it is important to remember that the fused operation will be computed at every iteration of the broadcast. This means that in some situations, particularly in the presence of composed or multidimensional broadcasts, an expression with dot calls may be computing a function more times than intended. As an example, say we want to build a random matrix whose rows have Euclidean norm one. We might write something like the following:
```
julia> x = rand(1000, 1000);
julia> d = sum(abs2, x; dims=2);
julia> @time x ./= sqrt.(d);
0.002049 seconds (4 allocations: 96 bytes)
```
This will work. However, this expression will actually recompute `sqrt(d[i])` for *every* element in the row `x[i, :]`, meaning that many more square roots are computed than necessary. To see precisely over which indices the broadcast will iterate, we can call `Broadcast.combine_axes` on the arguments of the fused expression. This will return a tuple of ranges whose entries correspond to the axes of iteration; the product of lengths of these ranges will be the total number of calls to the fused operation.
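
As a rough sketch of that check, using the same `x` and `d` as in the example (the variable names `ax` and `ncalls` are introduced here only for illustration):

```julia
x = rand(1000, 1000)
d = sum(abs2, x; dims=2)             # 1000×1 matrix of squared row norms

# Axes over which `x ./ sqrt.(d)` iterates: the singleton second axis of `d`
# is expanded to match the second axis of `x`.
ax = Broadcast.combine_axes(x, d)    # (Base.OneTo(1000), Base.OneTo(1000))

# Total number of calls to the fused operation, and hence to `sqrt`.
ncalls = prod(length, ax)            # 1_000_000
```

Here `d` has only 1000 distinct entries, so the fused loop evaluates `sqrt` a thousand times more often than strictly required.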

It follows that when some components of the broadcast expression are constant along an axis (like the `sqrt` along the second dimension in the preceding example), there is potential for a performance improvement by forcibly "unfusing" those components, i.e. allocating the result of the broadcasted operation in advance and reusing the cached value along its constant axis. Possible approaches include storing the intermediate result in a temporary variable, wrapping a component of the dot expression in `identity`, or using an equivalent function that is intrinsically vectorized but does not take part in fusion, such as `map`.
```
julia> @time let s = sqrt.(d); x ./= s end;
0.000809 seconds (5 allocations: 8.031 KiB)
julia> @time x ./= identity(sqrt.(d));
0.000608 seconds (5 allocations: 8.031 KiB)
julia> @time x ./= map(sqrt, d);
0.000611 seconds (4 allocations: 8.016 KiB)
```

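Any of these options yields roughly a three-fold speedup in this example, at the cost of one allocation for the intermediate array. Because the fused version calls `sqrt` once per element of `x` while the unfused versions call it only once per row, the number of calls saved grows with the length of the axis along which the unfused component is constant.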
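
The number of function calls in the fused and unfused forms can also be counted directly. This is only an illustrative sketch: `calls` and `counted_sqrt` are hypothetical helpers that do not appear in the manual, and a smaller matrix is used to keep the numbers easy to read.

```julia
# Hypothetical instrumentation: count every evaluation of the square root.
calls = Ref(0)
counted_sqrt(v) = (calls[] += 1; sqrt(v))

x = rand(100, 50)
d = sum(abs2, x; dims=2)            # 100×1 matrix

calls[] = 0
y_fused = x ./ counted_sqrt.(d)     # one fused loop over all 100×50 elements
fused_calls = calls[]               # 5000

calls[] = 0
s = counted_sqrt.(d)                # separate loop over the 100 rows only
y_unfused = x ./ s                  # reuses the cached values in `s`
unfused_calls = calls[]             # 100

y_fused ≈ y_unfused                 # both forms compute the same matrix
```

The unfused form pays for the temporary array `s` but evaluates the square root only once per row.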

## [Consider using views for slices](@id man-performance-views)

In Julia, an array "slice" expression like `array[1:5, :]` creates