Matrix indexing #534
Replies: 1 comment
FWIW we are aware of the perf limitations for metamethods. It's of course difficult/impossible to beat table indexing; after all, if you could have a general mechanism that's as fast as table indexing, it would mean that table indexing doesn't run as fast as it can ;) That said, there are generally three main bottlenecks in metamethod dispatch:
1. The metamethod lookup itself
2. The overhead of calling a C function (the extra call frame)
3. The cost of accessing arguments through the Lua stack
We sort of have long-term plans for these:

For 1, we plan to look into creating special per-table accelerated metamethod caches that use indexes (TM_X). The issue here is that it's not fully clear when to construct this cache; ideally we'd like this mechanism to also apply to script-based metatables, but that means we need to decide when to create the cache.

For 2, we plan to look into the possibility that C functions can be called without an extra frame, maybe only as part of metatable dispatch. The issue here is that it effectively removes the call frame from the call stack, which affects error messages (more minor) and some Lua APIs (more major) such as access to C upvalues. The latter is particularly problematic since Roblox uses these :D

For 3, there's a general question of faster bindings / faster stack access that we plan to explore, with various possibilities (e.g. maybe a batch call where you fetch all arguments at once, using a binding table precomputed ahead of time that maps arguments to expected stack locations and types; if we had a JIT we could synthesize a hyper-efficient thunk for each, but we'd need some solution for the interpreter as well).

2 also interacts with a different R&D path for us to explore, which is a different design for all call frames; we may need to explore this first, because it may affect what we do for 2 and how.

None of these are fully ready to be developed yet for various reasons, but I wanted to acknowledge that there are still some performance advances to be made on this front and note that we sort of have these on the roadmap, just not in a very concrete form.
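To make the idea behind point 1 concrete, here is a small standalone C++ sketch of a per-table metamethod cache. This is a toy model, not Luau's actual implementation: the names (`Metatable`, `attachMetatable`, `tmcache`) are hypothetical, and the cache is built eagerly when the metatable is attached, which sidesteps exactly the "when to construct it" question raised above.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// TM_X-style indexes: one slot per metamethod of interest.
enum TM { TM_INDEX, TM_NEWINDEX, TM_N };

using CFunction = int (*)(void*); // stand-in for a C metamethod

struct Metatable {
    // Generic path: a string-keyed hash lookup on every access.
    std::unordered_map<std::string, CFunction> fields;
};

struct Table {
    Metatable* mt = nullptr;
    // Accelerated path: a flat array indexed by the TM enum.
    CFunction tmcache[TM_N] = {};
};

// Hypothetical: fill the cache once when the metatable is attached.
// For script-created metatables the hard part is choosing this moment.
void attachMetatable(Table& t, Metatable* mt) {
    t.mt = mt;
    auto it = mt->fields.find("__index");
    t.tmcache[TM_INDEX] = (it != mt->fields.end()) ? it->second : nullptr;
}

// Fast path at dispatch time: one array load instead of a
// string hash + bucket probe into the metatable.
CFunction getIndexMetamethod(const Table& t) {
    return t.tmcache[TM_INDEX];
}
```

The design trade-off is the one the comment above calls out: the cache must be invalidated or rebuilt whenever the metatable's fields change, so the savings on the read path are paid for with bookkeeping on the (much rarer) write path.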
As a quick weekend project I tried different ways to implement 4x4 matrices efficiently. The best options seem to be:
- Tables with fields x, y, z, w containing 4-wide vectors
- Tagged userdata with a metatable for indexing with x, y, z, w
All matrix operations (transpose, multiplication, inversion etc.) are implemented in C++ (typically 2-5x faster than Lua versions).
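As a rough illustration of the layout described above, here is a minimal C++ sketch of a 4x4 matrix stored as four 4-wide row vectors named x, y, z, w, mirroring the fields exposed on the Lua side. The names and the scalar multiply are illustrative, not the actual binding code; a real implementation would use SIMD vectors so each result row is a handful of wide multiply-adds.

```cpp
#include <cassert>

struct Vec4 { float v[4]; };

// Four named rows, matching the x/y/z/w fields exposed to Lua.
struct Mat4 { Vec4 x, y, z, w; };

// Helpers to address the named rows by index.
const Vec4& row(const Mat4& m, int i) {
    switch (i) {
        case 0: return m.x;
        case 1: return m.y;
        case 2: return m.z;
        default: return m.w;
    }
}

Vec4& row(Mat4& m, int i) {
    switch (i) {
        case 0: return m.x;
        case 1: return m.y;
        case 2: return m.z;
        default: return m.w;
    }
}

// Plain row-by-column product; with real SIMD each iteration of the
// outer loop collapses into a few vector multiply-add instructions.
Mat4 mul(const Mat4& a, const Mat4& b) {
    Mat4 r{};
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j) {
            float s = 0.0f;
            for (int k = 0; k < 4; ++k)
                s += row(a, i).v[k] * row(b, k).v[j];
            row(r, i).v[j] = s;
        }
    return r;
}

Mat4 identity() {
    Mat4 m{};
    for (int i = 0; i < 4; ++i)
        row(m, i).v[i] = 1.0f;
    return m;
}
```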
In the benchmark results, everything is in favor of tagged userdata except indexing. I briefly profiled the code, and the bottleneck is calling the C++ indexing metamethod. I verified that the VM fast path for userdata indexing is triggering. I also tried both atom and non-atom versions of the index metamethod, but there was no significant difference.
One way to speed up indexing is to use direct function calls, e.g. `get_x(mat)` instead of `mat.x`. The indexing benchmark result improved from 170 ms to 93 ms with this change. This bypasses the metamethod lookup, but it's inconvenient to use function calls everywhere for indexing. I wonder if there's anything we could do to improve userdata indexing in general; I guess some fastcall-like mechanism for C functions in general (or just metamethods for userdata?) would be what I'm hoping for.
Anyway, I wanted to share my findings in case they could be useful when tuning Luau VM performance.