Possible future plans for glam post 0.13 #159
This didn't really pan out. It was generally slower than sticking with scalar math. I think that, in particular for types whose stride differs from their alignment, writing back to the scalar values carries a large performance penalty. So I'm abandoning this idea.
glam 0.13.0 was a significant release in that it added support for different primitive types, e.g. `f32`, `f64`, `i32` and `u32`, and behind the scenes it was heavily refactored to support a lot of different types, plus SIMD implementations for some of them, all while avoiding a lot of code duplication. The internals did get a lot more complex as a result, but this offers some benefits as well. In general, the library has been split into user-facing types which present a simple interface. Internally these types use either scalar or SIMD storage types. Then there is a `core` module which provides implementations against the different storage types used. This is a bit analogous to DirectXMath, which is a low-level library, and libraries like DXTK's SimpleMath which are built on top of it. The user-facing types use a lot of macros for code generation of the different primitive implementations, e.g. `Vec3`, `DVec3`, `IVec3` and `UVec3`.

There are a few things I am thinking of trying that I think the internal changes will make more possible than before.
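As an aside, the macro-based generation mentioned above can be sketched roughly like this (an illustrative toy, not glam's actual macro or API surface):

```rust
// Hypothetical sketch: one macro stamps out a 3-component vector type
// for each primitive, so the implementation is written only once.
macro_rules! impl_vec3 {
    ($name:ident, $t:ty) => {
        #[derive(Clone, Copy, Debug, PartialEq)]
        pub struct $name {
            pub x: $t,
            pub y: $t,
            pub z: $t,
        }

        impl $name {
            pub fn new(x: $t, y: $t, z: $t) -> Self {
                Self { x, y, z }
            }

            pub fn dot(self, rhs: Self) -> $t {
                self.x * rhs.x + self.y * rhs.y + self.z * rhs.z
            }
        }
    };
}

// One invocation per primitive variant.
impl_vec3!(Vec3, f32);
impl_vec3!(DVec3, f64);
impl_vec3!(IVec3, i32);
impl_vec3!(UVec3, u32);
```

The real crate layers storage types and a `core` module underneath this, but the macro idea is the same: the method bodies exist once, and the primitive type is a parameter.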
At the moment we have some types that use SIMD for storage and implementation if available, notably `Vec3A`, `Vec4`, `Quat` and `Mat4`. The other types, which use scalars for storage, generally do not have any SIMD-optimised implementation. This is a bit of a missed opportunity. Because the `core` module operates on a storage type, it should be relatively straightforward to convert, say, a `Mat3` to `[__m128; 3]` or similar and perform SIMD-optimised operations on that instead of on the scalar values. Depending on the operation, this could be faster than the scalar implementation; each method would need to be measured.

Taking this idea further, I think it could be worth considering storing
`Vec4`, `Quat` and `Mat4` as normal floats, i.e. not SIMD. The main reason is that the 16 byte alignment might be a problem for real world projects. For example, the Bevy transform is currently `Vec3`, `Quat`, `Vec3` for scale, rotation and translation. Because `Quat` is 16 byte aligned, the Bevy transform is also 16 byte aligned and thus contains 8 bytes of padding. This may not be optimal, and the main problem is that Bevy doesn't have any control over it other than disabling SIMD entirely in glam. As an alternative, I am thinking of storing these types as scalars and then loading them into SIMD registers as required. I think it would be slower than the current SIMD implementation but faster than scalar code. Additionally, it should be relatively easy to provide `Vec4A`, `QuatA`, `Mat4A` etc. which are still 16 byte aligned for the best performance (assuming they are actually faster). The macros used to generate the other types can help here as well.

Those are the main things I'm thinking about trying next with glam.
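To make the `Mat3` idea above concrete, here is a portable sketch of the load/operate/store pattern it implies (plain arrays stand in for `__m128`; the real version would use SIMD intrinsics, and names here are illustrative, not glam's API):

```rust
// Sketch: treat a column-major scalar Mat3 ([f32; 9]) as three padded
// 4-lane columns, operate on whole lanes, then write back only the
// three meaningful scalars per column. The store step is where the
// stride-vs-alignment write-back cost shows up.
fn mat3_add(a: &[f32; 9], b: &[f32; 9], out: &mut [f32; 9]) {
    for col in 0..3 {
        // Load: widen each 3-float column to 4 lanes (4th lane unused).
        let mut va = [0.0f32; 4];
        let mut vb = [0.0f32; 4];
        va[..3].copy_from_slice(&a[col * 3..col * 3 + 3]);
        vb[..3].copy_from_slice(&b[col * 3..col * 3 + 3]);

        // Operate on all 4 lanes at once (a real implementation would
        // do this with a single `_mm_add_ps` or equivalent).
        let mut vr = [0.0f32; 4];
        for lane in 0..4 {
            vr[lane] = va[lane] + vb[lane];
        }

        // Store: write back only the 3 meaningful lanes.
        out[col * 3..col * 3 + 3].copy_from_slice(&vr[..3]);
    }
}
```

Whether the lane-parallel middle step pays for the load and store around it is exactly the per-method measurement question raised above.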
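The alignment point can also be demonstrated directly. Below is a sketch using stand-in types (not Bevy's or glam's actual definitions): a 16-byte-aligned quaternion forces the enclosing transform to 16-byte alignment and introduces 8 bytes of padding, while a scalar-stored quaternion does not.

```rust
// Stand-in for a SIMD-backed Quat: 16 bytes, 16-byte aligned.
#[repr(C, align(16))]
#[allow(dead_code)]
struct QuatA([f32; 4]);

// Stand-in for a scalar-stored Quat: 16 bytes, 4-byte aligned.
#[repr(C)]
#[allow(dead_code)]
struct Quat([f32; 4]);

// Transform with the aligned quaternion: 12 + 4 (padding) + 16 + 12
// + 4 (tail padding to the 16-byte alignment) = 48 bytes.
#[repr(C)]
#[allow(dead_code)]
struct TransformA {
    translation: [f32; 3],
    rotation: QuatA,
    scale: [f32; 3],
}

// Transform with the scalar quaternion: 12 + 16 + 12 = 40 bytes,
// no padding at all.
#[repr(C)]
#[allow(dead_code)]
struct Transform {
    translation: [f32; 3],
    rotation: Quat,
    scale: [f32; 3],
}
```

With `repr(C)` the layout is deterministic, so the 8 bytes of padding described above can be checked with `std::mem::size_of`.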
Separate to that, I'm still not entirely happy with the internals of glam. It is quite complicated now, which causes problems for debugging, docs (the rustdoc src view is no longer useful) and contributing. However, I haven't thought of a less complicated implementation that doesn't involve a lot of code duplication (e.g. copy and paste). For example, I could just use generics internally, but specialization, which I would need for the way SIMD is used in glam, isn't really sufficient in Rust. I think I've restructured glam 3 times now; there might be a 4th if I can come up with a way to simplify what I have without duplicating a lot of code.