You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Now they're treated at runtime information.
Repro script:
import taichi as ti
import numpy as np
ti.init(arch=ti.opengl, log_level=ti.TRACE)
def _test_compiled_functions():
@ti.kernel
def func(a: ti.any_arr(element_dim=1)):
for i in range(5):
for j in range(4):
a[i][j * j] = j * j
v = ti.Vector.ndarray(10, ti.i32, 5)
func(v)
assert ti.get_runtime().get_num_compiled_functions() == 1
v = ti.Vector.ndarray(11, ti.i32, 5)
func(v)
print(ti.get_runtime().get_num_compiled_functions())
_test_compiled_functions()
Note that although we compiled twice for the above program, the generated opengl kernels are indeed identical (shown below). In other words, we ought to have used element_shape as compile time information but we didn't.
This can also be verified in the code snippet below that _args_i32_[16 + 0 * 8 + 1] (element_shape=10 or 11), which should have been a constant, but now it's loaded as a runtime variable.
Side note: the reason we recompile for different element_shape now is that we force that to happen in python. https://github.com/taichi-dev/taichi/blob/master/python/taichi/lang/kernel_impl.py#L271-L289 Unfortunately we only enforce recompile but didn't make use of the compile time information at all.
#version 430 core
layout(local_size_x = 128, local_size_y = 1, local_size_z = 1) in;
precision highp float;
layout(std430, binding = 2) buffer args_i32 { int _args_i32_[];};
layout(std430, binding = 4) buffer arr0_i32 { int _arr0_i32_[];};
const float inf = 1.0f / 0.0f;
const float nan = 0.0f / 0.0f;
void func_c44_00()
{ // range for
// range known at compile time
int _sid0 = int(gl_GlobalInvocationID.x);
for (int _sid = _sid0; _sid < (5); _sid += int(gl_WorkGroupSize.x * gl_NumWorkGroups.x)) {
int _itv = 0 + _sid;
int B = _itv;
int C = int(0);
int D = int(4);
for (int E_ = C; E_ < D; E_ += 1) {
int E = E_;
int F = E;
int G = F * F;
int _li_I = 0;
int _s0_arr0 = _args_i32_[16 + 0 * 8 + 0];
int _s1_arr0 = _args_i32_[16 + 0 * 8 + 1];
{ // linear seek
_li_I *= _s0_arr0;
_li_I += B;
_li_I *= _s1_arr0;
_li_I += G;
}
int I = _li_I << 2;
_arr0_i32_[I >> 2] = G;
}
}
}
void main()
{
func_c44_00();
}
class Ndarray {
// this should be separate field_shape (runtime info, copied to args)
// and static element_shape (compile time info, hard-coded in shader)
std::vector<int> shape;
}
The text was updated successfully, but these errors were encountered:
k-ye
changed the title
element_shape should be treated at compile time information for ndarrays
element_shape should be treated as compile time information for Ndarrays
Dec 29, 2021
element_shape is not used in compiled results simply because we use ExternalPtrStmt for quick implementation at its birth. We may design a new statement in CHI IR if we want to change the way to access external pointer elements.
Now they're treated at runtime information.
Repro script:
Note that although we compiled twice for the above program, the generated opengl kernels are indeed identical (shown below). In other words, we ought to have used element_shape as compile time information but we didn't.
This can also be verified in the code snippet below that
_args_i32_[16 + 0 * 8 + 1]
(element_shape=10 or 11), which should have been a constant, but now it's loaded as a runtime variable.Side note: the reason we recompile for different element_shape now is that we force that to happen in python. https://github.com/taichi-dev/taichi/blob/master/python/taichi/lang/kernel_impl.py#L271-L289 Unfortunately we only enforce recompile but didn't make use of the compile time information at all.
A potential fix is https://github.com/taichi-dev/taichi/blob/master/taichi/program/ndarray.h#L30,
The text was updated successfully, but these errors were encountered: