This is a summary of the current state of antialiased lines in Datashader, and possible future changes.
Datashader supports antialiasing in all `Canvas.line` calls regardless of the type of data source, for all reductions except `std` and `var`, on CPU but not GPU, with and without Dask. There are some good examples of the output in issue #1148. Since then the antialiased `mean` reduction has improved (#1300) so that its output is now better.
It is necessary to understand some of the implementation details to appreciate the limitations and the impact of possible future changes. When using antialiasing, all aggregations are floating point, as this is necessary to support the fractional antialiased edges. For example, a non-antialiased `count` reduction uses `uint32` whereas an antialiased `count` reduction uses `float32`.
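As a concrete illustration, here is a minimal sketch of that dtype difference, assuming a recent Datashader where antialiasing is requested by passing a non-zero `line_width` to `Canvas.line` (the data here is made up):

```python
import datashader as ds
import pandas as pd

df = pd.DataFrame({"x": [0.0, 1.0, 2.0], "y": [0.0, 1.0, 0.0]})
cvs = ds.Canvas(plot_width=100, plot_height=100)

agg = cvs.line(df, "x", "y", agg=ds.count())                   # not antialiased
agg_aa = cvs.line(df, "x", "y", agg=ds.count(), line_width=1)  # antialiased

print(agg.dtype)     # uint32
print(agg_aa.dtype)  # float32
```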
Some reductions can be calculated in a single pass, just like non-antialiased reductions, but some need two passes. This is referred to internally as the "2nd stage aggregation". It can have a large performance impact but is necessary to produce accurate results. This is closely related to the concept of self-intersection, and indeed some reductions (`count` and `sum`) have a `self_intersect` boolean kwarg to control whether self-intersection should be considered or ignored.
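For reference, the kwarg is used like this (reusing `cvs` and `df` from the sketch above; the exact defaults may differ):

```python
# Single stage: the aggregate accumulates where the line crosses itself.
agg_si = cvs.line(df, "x", "y", agg=ds.count(self_intersect=True), line_width=2)

# Two stages: self-intersections are ignored so crossings are not double-counted.
agg_no_si = cvs.line(df, "x", "y", agg=ds.count(self_intersect=False), line_width=2)
```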
Consider drawing a 10-pixel wide line in Bokeh with a 50% alpha blue line (RGBA of `#00f8`) where the line crosses over itself. The pixels displayed are blue wherever the line goes, but the alpha value varies from 0 (no line at all) to 50%. Even where the line crosses over itself, the alpha is not additive; it is the maximum of the alphas of the contributing line segments, so it is never more than 50% anywhere. This also applies to the antialiased line edges: where two edges coincide we don't add the alphas, we take the max. So the Datashader equivalent of this 2D graphics renderer approach for an `any` or `count` reduction (i.e. the really simple ones that don't use some other `value`) is actually a `max` reduction. This is the essence of why we need two-stage aggregations. For `any` and `max` a single stage will always suffice. For a `min` we need two stages: the first is a `max` stage and the second is a `min`. A `count` or `sum` ignoring self-intersections also uses two stages, otherwise we would double-count those intersection locations. A `count` or `sum` with self-intersections can be done in a single stage, but we have to be careful: at a join between adjacent line segments the finite width of the line means that we can touch a particular pixel twice, e.g. once in the edge of the end of the first segment and a second time in the middle of the second segment.
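A toy illustration of that max-not-add behaviour for a single pixel touched by two antialiased edges of the same line (the coverage numbers are made up):

```python
edge1, edge2 = 0.3, 0.45  # fractional coverage from two overlapping edges

additive = edge1 + edge2      # 0.75 -- would brighten where the line crosses itself
renderer = max(edge1, edge2)  # 0.45 -- what a 2D renderer (and Bokeh) effectively does
```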
In a conventional 2D graphics renderer, pixels where alpha is less than one mix in some of the color below. We fundamentally cannot do this: there is no color below. DataFrame rows are processed in a different order depending on whether we are using Dask and, if so, how the DataFrame is partitioned. So we can only ever use our alpha to mean the fraction of the value we would be using if we weren't antialiasing; we cannot mix this with `(1-alpha)` of the color/value beneath. This means that antialiased lines that go over each other have jagged edges where we transition from a very small alpha of the best value to a very high alpha of the second-best value.
That is a brief summary of the complexity of the internals. Let's now take a step back and identify what we would like to do better.
1. Until recently the `mean` reduction gave bad results. It is implemented as a `sum` divided by a `count` reduction, and so the antialiased edges of both would cancel out, giving a result that is not antialiased. This has now been fixed by adding an extra private reduction that is a count that ignores antialiasing (a toy numeric illustration follows this list).
2. Antialiased `where` reductions can give non-antialiased results. The `selector` reduction of a `where` returns a row index, which is an integer, so it is not antialiased. If this row index is returned to the user then it is fine. But if the row index is used to look up a different column to return to the user, that result cannot be antialiased as we have already thrown away that information. This has a wider impact than is immediately obvious, as `first` and `last` reductions using Dask are implemented internally as `where` reductions, thus they can give non-antialiased results when using Dask but antialiased results when not.
3. It would be great to get rid of the second-stage aggregation as it is both complicated code and potentially slow.
4. Using antialiasing with colorful colormaps can give unusual results. If you use a monochrome or nearly monochrome colormap the results are fine, but consider e.g. a `fire` colormap used with an `any` reduction. The middle of lines will be yellow, but the edges will have half of the value, meaning they are displayed red. Really we want to display them still as yellow but with half alpha.
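To make item 1 concrete, here is a toy numeric illustration of why dividing by the antialiased count cancels the edges, and why dividing by a count that ignores antialiasing does not (all values are made up; this is not the actual implementation):

```python
import numpy as np

# One edge pixel and one interior pixel, with a single row of value 5 contributing.
sum_aa    = np.array([2.5, 5.0])  # antialiased sum (fractional at the edge)
count_aa  = np.array([0.5, 1.0])  # antialiased count (fractional at the edge)
count_raw = np.array([1.0, 1.0])  # count that ignores antialiasing

naive_mean = sum_aa / count_aa    # [5.0, 5.0] -- the edge is no longer antialiased
fixed_mean = sum_aa / count_raw   # [2.5, 5.0] -- the edge keeps its fractional value
```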
Let me say now that I have no answer for these! But I do have some thoughts on possibilities.
To deal with item 2 we could switch the internal workings of an antialiased reduction to use two separate aggregations: the original non-antialiased one, and a second floating-point one that is effectively the corresponding alpha. This fixes item 2 straight away, as the `selector` for the `where` would calculate both the row-index agg and the alpha agg, and we would use the row-index agg to look up the required column to return to the user and multiply it by the alpha. There would be a performance impact here as each `append` function would be updating two separate agg arrays, but we would still only be looping through the data source once.
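A rough sketch of what that final lookup might look like with the two proposed aggs (all names and values here are hypothetical, not existing Datashader internals):

```python
import numpy as np

row_index_agg = np.array([[3, 7], [2, 5]])           # which row "won" each pixel
alpha_agg     = np.array([[0.4, 1.0], [1.0, 0.25]])  # antialiased coverage per pixel

other_column = np.array([10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0])

# Look up the requested column by row index, then re-apply the antialiasing
# that the integer row index could not carry.
result = other_column[row_index_agg] * alpha_agg
```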
It starts to look like we can address some of item 3 with this. Take a `min` reduction, which is currently 2-stage. Now we can store the min value in the first agg, and the alpha in the second agg is the maximum of all alphas that visited this pixel with the min value. I am not yet sure if this works for more complex aggregating (rather than selecting) reductions. Dropping the second-stage aggregation here means we lose the ability to have non-self-intersecting reductions, which might be fine but is a big change. It is also not clear how this affects 3D aggregations like `max_n`.
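A possible shape for such a single-stage `min` append, sketched with hypothetical names and signature (not the actual Datashader implementation):

```python
def append_min(x, y, value, alpha, value_agg, alpha_agg):
    """Single-stage antialiased min using a value agg plus an alpha agg."""
    if value < value_agg[y, x]:
        # New minimum for this pixel: replace both the value and its coverage.
        value_agg[y, x] = value
        alpha_agg[y, x] = alpha
    elif value == value_agg[y, x]:
        # Same minimum seen again (e.g. at a segment join): keep the largest coverage.
        alpha_agg[y, x] = max(alpha_agg[y, x], alpha)
```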
So far this new extra alpha agg has stayed internal to each individual `Canvas.line` call. For item 4, if this alpha could be kept around and reused for e.g. `transfer_functions.shade`, then we would be able to color pixels by their non-antialiased value and then apply the alpha after that. The edges where we transition from one line to another will be abruptly jagged, and I don't see how this would interact with the current alpha mixing of categorical aggregations. For this to work, the `xr.DataArray` or `xr.Dataset` returned from the `Canvas.line` call would have to be different, which is a pretty major API change.
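Conceptually the post-shading step might look something like this, treating the shaded image as a plain RGBA array (only a sketch of the idea, not a proposed API):

```python
import numpy as np

def apply_line_alpha(rgba: np.ndarray, alpha_agg: np.ndarray) -> np.ndarray:
    """Scale the alpha channel of an already-shaded uint8 RGBA image by the
    per-pixel antialiasing alpha in [0, 1]."""
    out = rgba.copy()
    out[..., 3] = (out[..., 3] * alpha_agg).astype(np.uint8)
    return out
```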