Possible future plans for glam post 0.13 #159
This didn't really pan out. It was generally slower than sticking with scalar math. I think that, in particular for types whose stride differs from their alignment, writing back to the scalar values carries a large performance penalty. So I'm abandoning this idea.
glam 0.13.0 was a significant release in that it added support for different primitive types, e.g. `f32`, `f64`, `i32` and `u32`, and behind the scenes it was heavily refactored to support a lot of different types, plus SIMD implementations for some of them, all while avoiding a lot of code duplication. The internals did get a lot more complex as a result, but this offers some benefits as well. In general, the library has been split into user-facing types which present a simple interface. Internally these types use either scalar or SIMD storage types. Then there is a `core` module which provides implementations against the different storage types used. This is a bit analogous to DirectXMath, which is a low-level library, and libraries like DXTK's SimpleMath which are built on top of it. The user-facing types use a lot of macros for code generation of the different primitive implementations, e.g. `Vec3`, `DVec3`, `IVec3` and `UVec3`.

There are a few things I am thinking of trying that I think the internal changes will make more possible than before.
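As an aside, the macro-based generation mentioned above can be sketched roughly like this (an illustrative toy, not glam's actual macro or API surface):

```rust
// Hypothetical sketch: one macro stamps out a 3-component vector type
// for each primitive, so the implementation is written only once.
macro_rules! impl_vec3 {
    ($name:ident, $t:ty) => {
        #[derive(Clone, Copy, Debug, PartialEq)]
        pub struct $name {
            pub x: $t,
            pub y: $t,
            pub z: $t,
        }

        impl $name {
            pub fn new(x: $t, y: $t, z: $t) -> Self {
                Self { x, y, z }
            }

            pub fn dot(self, rhs: Self) -> $t {
                self.x * rhs.x + self.y * rhs.y + self.z * rhs.z
            }
        }
    };
}

// One invocation per primitive variant.
impl_vec3!(Vec3, f32);
impl_vec3!(DVec3, f64);
impl_vec3!(IVec3, i32);
impl_vec3!(UVec3, u32);
```

The real crate layers storage types and a `core` module underneath this, but the macro idea is the same: the method bodies exist once, and the primitive type is a parameter.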
At the moment we have some types that use SIMD for storage and implementation if available, notably `Vec3A`, `Vec4`, `Quat` and `Mat4`. The other types, which use scalars for storage, generally do not have any SIMD-optimised implementation. This is a bit of a missed opportunity. Because the `core` module operates on a storage type, it should be relatively straightforward to convert, say, a `Mat3` to `[__m128; 3]` or similar and perform SIMD-optimised operations on that instead of on the scalar values. Depending on the operation, this could be faster than the scalar implementation; each method would need to be measured.

Taking this idea further, I think it could be worth considering storing
`Vec4`, `Quat` and `Mat4` as normal floats, i.e. not SIMD. The main reason is that the 16 byte alignment might be a problem for real world projects. For example, the Bevy transform is currently `Vec3`, `Quat`, `Vec3` for scale, rotation and translation. Because `Quat` is 16 byte aligned, the Bevy transform is also 16 byte aligned and thus contains 8 bytes of padding. This may not be optimal, and the main problem is that Bevy doesn't have any control over it other than disabling SIMD entirely in glam. As an alternative, I am thinking of storing these types as scalars and then loading them into SIMD registers as required. I think it would be slower than the current SIMD implementation but faster than scalar code. Additionally, it should be relatively easy to provide `Vec4A`, `QuatA`, `Mat4A` etc. which are still 16 byte aligned for the best performance (assuming they are actually faster). The macros used to generate the other types can help here as well.

Those are the main things I'm thinking about trying next with glam.
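To make the `Mat3` idea above concrete, here is a portable sketch of the load/operate/store pattern it implies (plain arrays stand in for `__m128`; the real version would use SIMD intrinsics, and names here are illustrative, not glam's API):

```rust
// Sketch: treat a column-major scalar Mat3 ([f32; 9]) as three padded
// 4-lane columns, operate on whole lanes, then write back only the
// three meaningful scalars per column. The store step is where the
// stride-vs-alignment write-back cost shows up.
fn mat3_add(a: &[f32; 9], b: &[f32; 9], out: &mut [f32; 9]) {
    for col in 0..3 {
        // Load: widen each 3-float column to 4 lanes (4th lane unused).
        let mut va = [0.0f32; 4];
        let mut vb = [0.0f32; 4];
        va[..3].copy_from_slice(&a[col * 3..col * 3 + 3]);
        vb[..3].copy_from_slice(&b[col * 3..col * 3 + 3]);

        // Operate on all 4 lanes at once (a real implementation would
        // do this with a single `_mm_add_ps` or equivalent).
        let mut vr = [0.0f32; 4];
        for lane in 0..4 {
            vr[lane] = va[lane] + vb[lane];
        }

        // Store: write back only the 3 meaningful lanes.
        out[col * 3..col * 3 + 3].copy_from_slice(&vr[..3]);
    }
}
```

Whether the lane-parallel middle step pays for the load and store around it is exactly the per-method measurement question raised above.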
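The alignment point can also be demonstrated directly. Below is a sketch using stand-in types (not Bevy's or glam's actual definitions): a 16-byte-aligned quaternion forces the enclosing transform to 16-byte alignment and introduces 8 bytes of padding, while a scalar-stored quaternion does not.

```rust
// Stand-in for a SIMD-backed Quat: 16 bytes, 16-byte aligned.
#[repr(C, align(16))]
#[allow(dead_code)]
struct QuatA([f32; 4]);

// Stand-in for a scalar-stored Quat: 16 bytes, 4-byte aligned.
#[repr(C)]
#[allow(dead_code)]
struct Quat([f32; 4]);

// Transform with the aligned quaternion: 12 + 4 (padding) + 16 + 12
// + 4 (tail padding to the 16-byte alignment) = 48 bytes.
#[repr(C)]
#[allow(dead_code)]
struct TransformA {
    translation: [f32; 3],
    rotation: QuatA,
    scale: [f32; 3],
}

// Transform with the scalar quaternion: 12 + 16 + 12 = 40 bytes,
// no padding at all.
#[repr(C)]
#[allow(dead_code)]
struct Transform {
    translation: [f32; 3],
    rotation: Quat,
    scale: [f32; 3],
}
```

With `repr(C)` the layout is deterministic, so the 8 bytes of padding described above can be checked with `std::mem::size_of`.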
Separate to that, I'm still not entirely happy with the internals of glam. It is quite complicated now, which causes problems for debugging, docs (the rustdoc src view is no longer useful) and contributing. However, I haven't thought of a less complicated implementation that doesn't involve a lot of code duplication (e.g. copy and paste). For example, I could just use generics internally, but specialization, which I would need for the way SIMD is used in glam, isn't really sufficient in Rust. I think I've restructured glam 3 times now; there might be a 4th if I can come up with a way to simplify what I have without duplicating a lot of code.