Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Why are matrices 2-d arrays instead of 1-d? #81

Closed
moon-chilled opened this issue Apr 4, 2019 · 9 comments
Closed

Why are matrices 2-d arrays instead of 1-d? #81

moon-chilled opened this issue Apr 4, 2019 · 9 comments

Comments

@moon-chilled
Copy link

Hi, this is not an issue so much as just a question I have. Why are matrices represented as an array[m][n] instead of a flat array[m*n]? It seems like doing it the second way would be better for performance since it takes up less memory and doesn't require as many dereferences to access, and you can still trivially get individual vectors out of flat-array-represented matrices.

@recp
Copy link
Owner

recp commented Apr 4, 2019

Hi @Elronnd

Thank you for your feedbacks/contributions

Representing matrices as [m][n] makes things easier to manage; you can access column vectors as matrix[0], matrix[1], matrix[2] and matrix[3].

It seems like doing it the second way would be better for performance

this is not true. Why do you think it would give better performance?

since it takes up less memory and doesn't require as many dereferences to access

matrices are two dimensional arrays but still continuous arrays (check type definitions: https://github.com/recp/cglm/blob/master/include/cglm/types.h). Two dimensional arrays do not take more memory, why do you think that? Also why do you think flat arrays don't require "many dereferences to access"? What is the difference to compiler to use flat array or two-dimensional array? Probably compiler will generate similar (probably same) assembly or cpu instructions... Am I wrong?

To multiply two matrices cglm do not access matrix elements randomly, it loads columns to SSE/AVX registers then shuffles them if required.

I see no benefit to use flat arrays for now, but if you convince us then things may change 🤔

if you can make matrix flat array then we can compare results, for instance for glm_mat_mul()

@moon-chilled
Copy link
Author

I thought -- for instance, mat3 is vec3[3] which is float[3][3]. Which is like float**, so if I want to get a mat[1][2], that's like (*(mat+1))[2] which is *(*(mat + 1) + 2), which is two dereferences, as opposed to *(mat + 1*3 + 2) which does include an extra mul, but only one dereference.

@recp
Copy link
Owner

recp commented Apr 4, 2019

Thank you very much for explanation,

typedef float vec3[3];
typedef vec3  mat3[3];
typedef float mat3_flat[9];

float
get1(mat3 mat, int m, int n) {
  return mat[m][n];
}

float
get2(mat3_flat mat, int m, int n) {
  return mat[m * 3 + n];
}

float
get3(mat3 mat, int m, int n) {
  return ((float *)mat)[m * 3 + n];
}

float
get1_literal(mat3 mat, int m, int n) {
  return mat[1][2];
}

float
get2_literal(mat3_flat mat) {
  return mat[1 * 3 + 2];
}

Output (https://godbolt.org/z/YXn4Ms):

get1:
        movsx   rsi, esi
        movsx   rdx, edx
        lea     rax, [rsi+rsi*2]
        lea     rax, [rdi+rax*4]
        movss   xmm0, DWORD PTR [rax+rdx*4]
        ret
get2:
        lea     eax, [rsi+rsi*2]
        add     eax, edx
        cdqe
        movss   xmm0, DWORD PTR [rdi+rax*4]
        ret
get3:
        lea     eax, [rsi+rsi*2]
        add     eax, edx
        cdqe
        movss   xmm0, DWORD PTR [rdi+rax*4]
        ret
get1_literal:
        movss   xmm0, DWORD PTR [rdi+20]
        ret
get2_literal:
        movss   xmm0, DWORD PTR [rdi+20]
        ret

I got same result if m and n are known at compile time. But got different result if they unkown at compile time (which makes your right)

On the other hand I got same result for get2 and get3

So maybe instead of changing all design, we could provide a macro (or inline function) like, to access matrix elements at runtime:

#define GLM_MAT3_GET(mat, m, n) (((float *)mat)[m * 3 + n])

What are your suggestions?

@moon-chilled
Copy link
Author

moon-chilled commented Apr 4, 2019

So maybe instead of changing all design, we could provide a macro (or inline function) like, to access matrix elements at runtime:

#define GLM_MAT3_GET(mat, m, n) (((float *)mat)[m * 3 + n])

That sounds good.

Although, now that I think about it further, I'm not sure if it's such a good idea. Disassembly is good but I should probably do an actual benchmark. Also, I'm pretty sure that that relies on undefined behaviour--I don't think that multidimensional arrays are guaranteed to be laid out in memory like that.

EDIT: that definitely wouldn't work. Since mat is a float[][] (or float**, depending on how you look at it), if you dereference it, you get a float*, not a float.

@recp
Copy link
Owner

recp commented Apr 4, 2019

I should probably do an actual benchmark.

That would be really perfect!

Also, I'm pretty sure that that relies on undefined behaviour--I don't think that multidimensional arrays are guaranteed to be laid out in memory like that.

Okay you are right, let's ignore GLM_MAT3_GET then.

EDIT:

-I don't think that multidimensional arrays are guaranteed to be laid out in memory like that.

Added to TODOs, I should do some research for this.

@recp recp mentioned this issue May 2, 2019
@Akaricchi
Copy link
Contributor

You're both very confused about how arrays in C work.

A two-dimensional array is not the same as an array of pointers! The reason why you seemingly get a pointer from dereferencing a 2D array once is because you actually get an array, but in C arrays decays into pointers to their first element in most context. Here's an attempt at an explanation: https://godbolt.org/z/W_R-b9

Also, I'm pretty sure that that relies on undefined behaviour--I don't think that multidimensional arrays are guaranteed to be laid out in memory like that.

They are guaranteed to be contiguous. Any array of objects is contiguous in memory. A 2D array is just an array of arrays, and is therefore also contiguous. There is no special exception here, and this is not undefined behavior.

that definitely wouldn't work. Since mat is a float[][] (or float**, depending on how you look at it), if you dereference it, you get a float*, not a float.

This is wrong. Dereferencing a float[][]does NOT give you a float*, it gives you a float[]. Just like dereferencing a float[] gives you a float. The float[] is only implicitly converted into a float* (a pointer to the 0th element in the array) when you actually try to use it for almost anything, which is where the confusion comes from. Non-evaluating contexts like sizeof are immune to this, see https://godbolt.org/z/_metqv

@recp
Copy link
Owner

recp commented Mar 6, 2020

@Akaricchi thanks for your comments, I'll check godbolt results later.

They are guaranteed to be contiguous. Any array of objects is contiguous in memory. A 2D array is just an array of arrays, and is therefore also contiguous. There is no special exception here, and this is not undefined behavior.

One thing here; in cglm mat4 looks like below (see cglm/types.h):

typedef CGLM_ALIGN_IF(16) float vec4[4];
typedef CGLM_ALIGN_MAT    vec4  mat4[4];

since vec4 is aligned and it is base element of mat4, I'm not sure it is contiguous 100% percent. This makes me confused a bit. There was alignment on vec3 and mat3 types but we removed the alignment for them (probably because of this).

If 2d array is not contiguous at all, then we should update the statements in the docs (https://cglm.readthedocs.io/en/latest/opengl.html), because it suggests to pass cglm's types to OpenGL directly.

What do you think about the design of cglm? (1-d array vs 2d?) 2d array makes things easier to manage I think (for me)


PS: Users can disable alignment of course, by defining CGLM_ALL_UNALIGNED macro before cglm header[s].

@Akaricchi
Copy link
Contributor

since vec4 is aligned and it is base element of mat4, I'm not sure it is contiguous 100% percent. This makes me confused a bit. There was alignment on vec3 and mat3 types but we removed the alignment for them (probably because of this).

Alignment can only introduce padding when the size isn't an integer multiple of the alignment. sizeof(float[4]) is 16, which is, of course, an integer multiple of 16. So there's no issue here.

vec3/mat3 is a curious case, as an alignment of 16 would exceed the array's size. First of all GCC wouldn't even let such code compile:

<source>:14:1: error: alignment of array elements is greater than element size
   14 | typedef vec3_aligned mat3_aligned[3];
      | ^~~~~~~

Clang does compile it, but does not actually add any padding to the inner vec3 arrays. It pads the mat3 array however. But the floats are still contiguous. I'm not sure whether this is conformant behavior; I wouldn't rely on it. https://godbolt.org/z/634GDG

If 2d array is not contiguous at all, then we should update the statements in the docs (https://cglm.readthedocs.io/en/latest/opengl.html), because it suggests to pass cglm's types to OpenGL directly.

They are contiguous. I use cglm with OpenGL myself, and it's working fine.

What do you think about the design of cglm? (1-d array vs 2d?) 2d array makes things easier to manage I think (for me)

I think 2D arrays are perfectly fine.

If you need alignment for vec3/mat3 types for whatever reason, you need to ask yourself what exactly do you want to align there - the whole matrix or each individual vector in it? If it's the later, that's not OpenGL-compatible. If it's the former, that is what clang seems to do, but it's not reliable - probably better to resort to a 1D array in this case.

@recp
Copy link
Owner

recp commented Mar 6, 2020

I think 2D arrays are perfectly fine.

Awesome 🎉

If you need alignment for vec3/mat3 types for whatever reason

Currently we don't.

simd/sse2/mat3.h is already used SIMD with _mm_loadu_ps which is for unaligned location.

IIRC, I was used alignment for glmm_load3(), glmm_store3() to load vec3 fast enough to SIMD register. There was an another option to load without alignment if I remember correctly... My goal is to try SIMD on vec3 operations to compare speed between scalar and SIMD. Currently glmm_load3() and glmm_store3() are not used in vec3 operations internally.

It seems using 2d array and alignment configuration seem good for now. Nothing to change.

PS: #124 introduces new way to multiply 3 or 4 general and special matrices which probably will make things faster and more readable. Any feedback is also welcome for that too (in there).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants