Matrix indexing #534
Replies: 1 comment
FWIW we are aware of the perf limitations for metamethods. It's of course difficult/impossible to beat table indexing; after all, if you could have a general mechanism that's as fast as table indexing, it would mean that table indexing doesn't run as fast as it can ;) That said, there are generally three main bottlenecks in metamethod dispatch:
1. The metamethod lookup itself
2. The overhead of calling a C function (the extra call frame)
3. The cost of accessing arguments through the Lua stack
We sort of have long-term plans for these:

For 1, we plan to look into creating special per-table accelerated metamethod caches that use indexes (TM_X). The issue here is that it's not fully clear when to construct this cache; ideally we'd like this mechanism to also apply to script-based metatables, but that means we need to decide when to create the cache.

For 2, we plan to look into the possibility that C functions can be called without an extra frame, maybe only as part of metatable dispatch. The issue here is that it effectively removes the call frame from the call stack, which affects error messages (more minor) and some Lua APIs (more major) such as access to C upvalues. The latter is particularly problematic since Roblox uses these :D

For 3, there's a general question of faster bindings / faster stack access that we plan to explore, with various possibilities (e.g. maybe a batch call where you fetch all arguments at once, using a binding table precomputed ahead of time that maps arguments to expected stack locations and types; if we had a JIT we could synthesize a hyper-efficient thunk for each, but we'd need some solution for the interpreter as well).

2 also interacts with a different R&D path for us to explore, which is a different design for all call frames; we may need to explore this first, because it may affect what we do for 2 and how.

None of these are fully ready to be developed yet for various reasons, but I wanted to acknowledge that there are still some performance advances to be made on this front and note that we sort of have these on the roadmap, just not in a very concrete form.
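To make the idea behind point 1 concrete, here is a small standalone C++ sketch of a per-table metamethod cache. This is a toy model, not Luau's actual implementation: the names (`Metatable`, `attachMetatable`, `tmcache`) are hypothetical, and the cache is built eagerly when the metatable is attached, which sidesteps exactly the "when to construct it" question raised above.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// TM_X-style indexes: one slot per metamethod of interest.
enum TM { TM_INDEX, TM_NEWINDEX, TM_N };

using CFunction = int (*)(void*); // stand-in for a C metamethod

struct Metatable {
    // Generic path: a string-keyed hash lookup on every access.
    std::unordered_map<std::string, CFunction> fields;
};

struct Table {
    Metatable* mt = nullptr;
    // Accelerated path: a flat array indexed by the TM enum.
    CFunction tmcache[TM_N] = {};
};

// Hypothetical: fill the cache once when the metatable is attached.
// For script-created metatables the hard part is choosing this moment.
void attachMetatable(Table& t, Metatable* mt) {
    t.mt = mt;
    auto it = mt->fields.find("__index");
    t.tmcache[TM_INDEX] = (it != mt->fields.end()) ? it->second : nullptr;
}

// Fast path at dispatch time: one array load instead of a
// string hash + bucket probe into the metatable.
CFunction getIndexMetamethod(const Table& t) {
    return t.tmcache[TM_INDEX];
}
```

The design trade-off is the one the comment above calls out: the cache must be invalidated or rebuilt whenever the metatable's fields change, so the savings on the read path are paid for with bookkeeping on the (much rarer) write path.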
As a quick weekend project I tried different ways to implement 4x4 matrices efficiently. The best options seem to be:
- Tables with fields x, y, z, w containing 4-wide vectors
- Tagged userdata with a metatable for indexing with x, y, z, w
All matrix operations (transpose, multiplication, inversion etc.) are implemented in C++ (typically 2-5x faster than Lua versions).
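As a rough illustration of the layout described above, here is a minimal C++ sketch of a 4x4 matrix stored as four 4-wide row vectors named x, y, z, w, mirroring the fields exposed on the Lua side. The names and the scalar multiply are illustrative, not the actual binding code; a real implementation would use SIMD vectors so each result row is a handful of wide multiply-adds.

```cpp
#include <cassert>

struct Vec4 { float v[4]; };

// Four named rows, matching the x/y/z/w fields exposed to Lua.
struct Mat4 { Vec4 x, y, z, w; };

// Helpers to address the named rows by index.
const Vec4& row(const Mat4& m, int i) {
    switch (i) {
        case 0: return m.x;
        case 1: return m.y;
        case 2: return m.z;
        default: return m.w;
    }
}

Vec4& row(Mat4& m, int i) {
    switch (i) {
        case 0: return m.x;
        case 1: return m.y;
        case 2: return m.z;
        default: return m.w;
    }
}

// Plain row-by-column product; with real SIMD each iteration of the
// outer loop collapses into a few vector multiply-add instructions.
Mat4 mul(const Mat4& a, const Mat4& b) {
    Mat4 r{};
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j) {
            float s = 0.0f;
            for (int k = 0; k < 4; ++k)
                s += row(a, i).v[k] * row(b, k).v[j];
            row(r, i).v[j] = s;
        }
    return r;
}

Mat4 identity() {
    Mat4 m{};
    for (int i = 0; i < 4; ++i)
        row(m, i).v[i] = 1.0f;
    return m;
}
```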
In the benchmark results, everything is in favor of tagged userdata except indexing. I briefly profiled the code, and the bottleneck is calling the C++ indexing metamethod. I verified that the VM fast path for userdata indexing is triggering. I also tried both atom and non-atom versions of the index metamethod, but there was no significant difference.
One way to speed up indexing is to use direct function calls, e.g. `get_x(mat)` instead of `mat.x`. The indexing benchmark result improved from 170 ms to 93 ms with this change. This bypasses the metamethod lookup, but it's inconvenient to use function calls everywhere for indexing. I wonder if there's anything we could do to improve userdata indexing in general; I guess some fastcall-like mechanism for C functions in general (or just metamethods for userdata?) would be what I'm hoping for.
Anyway, I wanted to share my findings in case they could be useful when tuning Luau VM performance.