
[RFC] Algorithm for identifying the cases where dynamic indexing can be turned into pointer + offset #3810

Closed
strongoier opened this issue Dec 15, 2021 · 0 comments · Fixed by #3854
Labels
discussion, enhancement, RFC


strongoier commented Dec 15, 2021

In #2590, there is an important step (here, dynamic indexing refers to indexing the elements of a vector/matrix field with a variable):

Develop an algorithm to identify the cases where the dynamic indexing can be turned into the form of pointer + offset at compile time, and use that form for these cases.

In this issue, I propose an algorithm, and would like to get some feedback on the design. Discussions are highly welcome! If the design looks good to most of you, I will go ahead and implement it.

I will illustrate the algorithm using the following example, which is somewhat complex but demonstrates the capability boundary of the algorithm.

import taichi as ti

ti.init(arch=ti.cuda, debug=True)

# placeholders
temp_a = ti.field(ti.f32)
temp_b = ti.field(ti.f32)
temp_c = ti.field(ti.f32)
# our focus
v = ti.Vector.field(3, ti.i32)
x = v.get_scalar_field(0)
y = v.get_scalar_field(1)
z = v.get_scalar_field(2)

S0 = ti.root
S1 = S0.pointer(ti.i, 4)
S2 = S1.dense(ti.i, 2)
S3 = S2.pointer(ti.i, 8)
S3.place(temp_a)
S4 = S2.dense(ti.i, 16)
S4.place(x)
S5 = S1.dense(ti.i, 2)
S6 = S5.pointer(ti.i, 8)
S6.place(temp_b)
S7 = S5.dense(ti.i, 16)
S7.place(y)
S8 = S1.dense(ti.i, 2)
S9 = S8.dense(ti.i, 32)
S9.place(temp_c)
S10 = S8.dense(ti.i, 16)
S10.place(z)

Previously, to access v[field_index, vector_index], vector_index had to be a compile-time constant. Supporting dynamic indexing means that vector_index can be a variable. The main idea for enabling this is to figure out addr(v[field_index, vector_index]) = addr(v[field_index, 0]) + f(v, vector_index) whenever possible. As addr(v[field_index, 0]) is what we already have, our focus here is to design the function f(v, vector_index).

Such a function exists only in the following case: for any field_index, addr(y[field_index]) - addr(x[field_index]) == addr(z[field_index]) - addr(y[field_index]) == stride. We then have f(v, vector_index) = vector_index * stride.
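As a pure-Python illustration (not Taichi API; the function name below is hypothetical), once a uniform stride is known the decomposition is plain pointer arithmetic:

```python
# Hypothetical illustration of the decomposition, not actual Taichi API.
# addr(v[field_index, vector_index]) = addr(v[field_index, 0]) + f(v, vector_index)
# with f(v, vector_index) = vector_index * stride.
def dynamic_index_addr(base_addr: int, vector_index: int, stride: int) -> int:
    """Address of v[field_index, vector_index], given the address of
    v[field_index, 0] (base_addr) and the byte stride between components."""
    return base_addr + vector_index * stride
```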

Now the problem becomes how to verify the above condition and calculate stride. My algorithm is as follows:

  1. Get the LCA (lowest common ancestor) of x, y, z, which is S1, and check if the paths from the LCA to x, y, z are all dense (not including the LCA):
#    -> S2 -> S4 -> x
#   /
# S1 -> S5 -> S7 -> y
#   \
#    -> S8 -> S10 -> z

assert S2.ptr.type == ti._lib.core.SNodeType.dense
assert S4.ptr.type == ti._lib.core.SNodeType.dense
assert S5.ptr.type == ti._lib.core.SNodeType.dense
assert S7.ptr.type == ti._lib.core.SNodeType.dense
assert S8.ptr.type == ti._lib.core.SNodeType.dense
assert S10.ptr.type == ti._lib.core.SNodeType.dense
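Step 1 can be sketched in plain Python over a minimal SNode-like tree. The Node class and helper functions below are hypothetical stand-ins for illustration, not the actual SNode API:

```python
class Node:
    """Minimal stand-in for an SNode: a parent pointer plus a type tag."""
    def __init__(self, parent=None, kind="dense"):
        self.parent = parent
        self.kind = kind

def ancestors(node):
    """Chain from the root down to (and including) node."""
    chain = []
    while node is not None:
        chain.append(node)
        node = node.parent
    return chain[::-1]

def lca(nodes):
    """Lowest common ancestor: longest shared prefix of root-to-node chains."""
    chains = [ancestors(n) for n in nodes]
    common = None
    for level in zip(*chains):
        if all(n is level[0] for n in level):
            common = level[0]
        else:
            break
    return common

def paths_below_lca_all_dense(nodes):
    """Check that every node on each path below the LCA (excluding the
    LCA itself) is dense."""
    top = lca(nodes)
    for n in nodes:
        cur = n
        while cur is not top:
            if cur.kind != "dense":
                return False
            cur = cur.parent
    return True

# Mirror the example layout's paths below S1 (node kinds only):
S1 = Node(kind="pointer")
S2, S5, S8 = Node(S1), Node(S1), Node(S1)
S4, S7, S10 = Node(S2), Node(S5), Node(S8)
assert lca([S4, S7, S10]) is S1
assert paths_below_lca_all_dense([S4, S7, S10])
```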
  2. Check if the internal access patterns below the LCA are exactly the same:
# S2 -> S4 level
assert S2.cell_size_bytes == S5.cell_size_bytes == S8.cell_size_bytes
assert S4.offset_bytes_in_parent_cell == S7.offset_bytes_in_parent_cell == S10.offset_bytes_in_parent_cell
# S4 -> x level
assert S4.cell_size_bytes == S7.cell_size_bytes == S10.cell_size_bytes
assert x.snode.offset_bytes_in_parent_cell == y.snode.offset_bytes_in_parent_cell == z.snode.offset_bytes_in_parent_cell
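This check generalizes to comparing (cell size, child offset) pairs level by level along each path below the LCA. A hypothetical sketch, with illustrative byte values:

```python
def same_access_pattern(paths):
    """paths: one list per field, each containing
    (cell_size_bytes, offset_bytes_in_parent_cell) tuples for every
    level below the LCA. All paths must match element-wise."""
    return all(p == paths[0] for p in paths[1:])

# Illustrative values only (not the actual layout's numbers):
ok = same_access_pattern([
    [(768, 0), (48, 16)],  # path through S2 -> S4
    [(768, 0), (48, 16)],  # path through S5 -> S7
    [(768, 0), (48, 16)],  # path through S8 -> S10
])
assert ok
```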
  3. Check if there is a fixed stride:
if S5.offset_bytes_in_parent_cell - S2.offset_bytes_in_parent_cell == S8.offset_bytes_in_parent_cell - S5.offset_bytes_in_parent_cell:
    stride = S5.offset_bytes_in_parent_cell - S2.offset_bytes_in_parent_cell
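The final check reduces to verifying that the sibling subtrees' offsets form an arithmetic progression. A small hypothetical sketch:

```python
def uniform_stride(offsets):
    """Given the offset_bytes_in_parent_cell of each sibling subtree
    (e.g. S2, S5, S8) in LCA-cell order, return the common difference
    if the offsets are evenly spaced, else None."""
    diffs = [b - a for a, b in zip(offsets, offsets[1:])]
    return diffs[0] if len(set(diffs)) == 1 else None
```

This generalizes the two-difference comparison above to vectors of any width: with n components there are n - 1 consecutive differences, and all of them must be equal.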

Running this code yields stride == 384 on my local machine. Its correctness can be verified with:

@ti.kernel
def check_stride():
    for i in range(128):
        assert ti.get_addr(y, i) - ti.get_addr(x, i) == stride
        assert ti.get_addr(z, i) - ti.get_addr(y, i) == stride

check_stride()