-
Notifications
You must be signed in to change notification settings - Fork 2.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[opengl] Optimize range_for for ndarrays #3884
Conversation
Note this PR reduces number of generated shaders for mpm88 from 10 to 6. There's still one gtmp related shader remaining for temporary values used across multiple shaders. Whether to further remove that shader need more benchmark so let's just get rid of addtional shaders introduced by ndarray range_for for now. [ghstack-poisoned]
Note this PR reduces number of generated shaders for mpm88 from 10 to 6. There's still one gtmp related shader remaining for temporary values used across multiple shaders. Whether to further remove that shader need more benchmark so let's just get rid of addtional shaders introduced by ndarray range_for for now. ghstack-source-id: ae80e9c8d92ee131a37f54bff10a710dd21e79de Pull Request resolved: #3884
Note this PR reduces number of generated shaders for mpm88 from 10 to 6. There's still one gtmp related shader remaining for temporary values used across multiple shaders. Whether to further remove that shader need more benchmark so let's just get rid of addtional shaders introduced by ndarray range_for for now. [ghstack-poisoned]
Note this PR reduces number of generated shaders for mpm88 from 10 to 6. There's still one gtmp related shader remaining for temporary values used across multiple shaders. Whether to further remove that shader need more benchmark so let's just get rid of addtional shaders introduced by ndarray range_for for now. ghstack-source-id: bdb9feba35fa8cd6cc99b826679dde353d706e07 Pull Request resolved: #3884
taichi/ir/statements.h
Outdated
@@ -756,7 +762,8 @@ class RangeForStmt : public Stmt { | |||
bit_vectorize, | |||
num_cpu_threads, | |||
block_dim, | |||
strictly_serialized); | |||
strictly_serialized, | |||
range_of_array); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This parameter name seems slightly confusing?
@@ -85,19 +85,35 @@ class Offloader { | |||
} else { | |||
offloaded->block_dim = s->block_dim; | |||
} | |||
if (auto val = s->begin->cast<ConstStmt>()) { | |||
|
|||
// TODO: We need to update codegen for each backend gradually so let's |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When we merge this in let's also notify people working on the vulkan ndarray this change
@@ -202,7 +202,7 @@ def init(d: ti.i32, density1: ti.any_arr(), density2: ti.any_arr(), | |||
|
|||
@ti.test(arch=ti.opengl) | |||
def test_opengl_exceed_max_ssbo(): | |||
# 7 ndarrays + gtmp + args > 8 (maximum allowed) | |||
# 8 ndarrays + args > 8 (maximum allowed) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we really want this test? This seems a bit arch specific & a lot of devices support more
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea this test was mainly a self reminder to double check how many ssbos we create in normal cases - we can remove it later :D
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM other than a few minor nitpicks
Note this PR reduces number of generated shaders for mpm88 from 10 to 6. There's still one gtmp related shader remaining for temporary values used across multiple shaders. Whether to further remove that shader need more benchmark so let's just get rid of addtional shaders introduced by ndarray range_for for now. [ghstack-poisoned]
Note this PR reduces number of generated shaders for mpm88 from 10 to 6. There's still one gtmp related shader remaining for temporary values used across multiple shaders. Whether to further remove that shader need more benchmark so let's just get rid of addtional shaders introduced by ndarray range_for for now. [ghstack-poisoned]
Note this PR reduces number of generated shaders for mpm88 from 10 to 6. There's still one gtmp related shader remaining for temporary values used across multiple shaders. Whether to further remove that shader need more benchmark so let's just get rid of addtional shaders introduced by ndarray range_for for now. ghstack-source-id: 3e2c2827bf752d6397cfc7780f2a79f9f2853a3d Pull Request resolved: #3884
Note this PR reduces number of generated shaders for mpm88 from 10 to 6. There's still one gtmp related shader remaining for temporary values used across multiple shaders. Whether to further remove that shader need more benchmark so let's just get rid of addtional shaders introduced by ndarray range_for for now. ghstack-source-id: 3e2c2827bf752d6397cfc7780f2a79f9f2853a3d Pull Request resolved: taichi-dev#3884
Stack from ghstack:
Note this PR reduces number of generated shaders for mpm88 from 10 to 6.
There's still one gtmp related shader remaining for temporary values
used across multiple shaders. Whether to further remove that shader need
more benchmark so let's just get rid of addtional shaders introduced by
ndarray range_for for now.