remove extra word from object header (better approach to alignment) #10898

JeffBezanson · 2015-04-19T16:16:52Z

Currently every object has an extra word (added by 0d8cec3) to make the data area 16-byte aligned. This is not ok. The extra word should be removed, and alignment should instead be done by offsetting the first object in a page, and putting extra space between objects as necessary.
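The page-offset idea can be sketched with a small address model. This is only an illustration in Python, not the runtime's C code: the names `TAG_SIZE`, `first_tag_offset`, and `data_addresses` are hypothetical, and it assumes one 8-byte tag word directly in front of each object's data and a pool cell size that is a multiple of 16.

```python
TAG_SIZE = 8   # assumption: one 64-bit type-tag word precedes each object's data
ALIGN = 16     # alignment wanted for the data area


def first_tag_offset(page_base: int) -> int:
    """Bytes to skip at the start of a page so the first data area is 16-aligned."""
    return (-(page_base + TAG_SIZE)) % ALIGN


def data_addresses(page_base: int, cell_size: int, count: int) -> list[int]:
    """Data-area addresses of the first `count` cells in a page.

    `cell_size` includes the tag; it must be a multiple of ALIGN so that the
    one-time offset at the start of the page carries over to every cell.
    """
    if cell_size % ALIGN != 0:
        raise ValueError("cell size must be a multiple of the alignment")
    start = page_base + first_tag_offset(page_base)
    return [start + i * cell_size + TAG_SIZE for i in range(count)]
```

With a 16-aligned page base the first tag lands at offset 8, so the padding is paid once per page rather than once per object.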

I believe the alignment rules should be (sizes not including tag):

64-bit platforms: objects < 16 bytes aligned 8, all others aligned 16
32-bit platforms: objects < 8 bytes aligned 4, otherwise same as 64-bit
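Read literally, the two rules above amount to the following function. It is a sketch in Python for illustration; the function name and the `word_bytes` parameter are assumptions, and sizes exclude the tag as stated above.

```python
def required_alignment(size: int, word_bytes: int) -> int:
    """Alignment in bytes for an object of `size` bytes (tag not included).

    word_bytes is 8 on 64-bit platforms and 4 on 32-bit platforms.
    """
    if word_bytes not in (4, 8):
        raise ValueError("word_bytes must be 4 or 8")
    if word_bytes == 4 and size < 8:
        return 4
    # 64-bit rule, also used on 32-bit platforms for objects of 8 bytes or more
    return 8 if size < 16 else 16
```

For example, a boxed 8-byte value would need only 8-byte alignment on either platform, while a 16-byte value would need 16.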

cc @vtjnash @carnaval

Clarification: this issue asks that we fix the win64 issue without adding a word to every object. Implementing the above alignment scheme can be done if convenient, but is not itself required to close this.

pao · 2015-04-19T16:57:43Z

If this plays into the C interop of structures, then there's an exception for x86 Linux--doubles are only aligned 4 by default on that platform & architecture.

If you need to reliably match native alignment, jl_native_alignment() accesses the appropriate LLVM API.

vtjnash · 2015-04-19T18:25:51Z

the C-interop story here would be based on the semantics of malloc, which are as Jeff described above.

vtjnash · 2015-04-19T18:26:34Z

i disagree that this is a priority or a v0.4 target. v0.4.x seems more reasonable to me

JeffBezanson · 2015-04-19T18:30:01Z

This has to be fixed immediately. You can't just add an extra word to every object and then shrug and say we don't have time to fix it.

JeffBezanson · 2015-04-19T18:38:07Z

It's particularly egregious that boxed Float64s and Int64s (etc.) are now 50% bigger for no reason, because they don't end up 16 aligned anyway, and they don't need to be. This is a major performance regression that affects a large amount of code, to fix a relatively narrow issue.
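The 50% figure and the alignment observation can be checked with a little arithmetic. The sketch below is a Python model under stated assumptions: 8-byte words, an 8-byte boxed value, cells packed back to back with no other padding, and a 16-aligned pool start. The helper names are hypothetical.

```python
WORD = 8  # bytes per word on a 64-bit platform


def cell_size(data_bytes: int, header_words: int) -> int:
    """Bytes occupied by one object: header words plus data."""
    return header_words * WORD + data_bytes


def growth_percent(data_bytes: int) -> int:
    """Size increase when the header grows from one word to two."""
    one_word = cell_size(data_bytes, 1)
    two_words = cell_size(data_bytes, 2)
    return 100 * (two_words - one_word) // one_word


def data_offsets(data_bytes: int, header_words: int, count: int) -> list[int]:
    """Offsets of each data area from a 16-aligned pool start, cells packed tightly."""
    stride = cell_size(data_bytes, header_words)
    return [i * stride + header_words * WORD for i in range(count)]
```

Under these assumptions a boxed 8-byte value goes from 16 to 24 bytes, and because 24 is not a multiple of 16, every other data area in a tightly packed pool is only 8-byte aligned.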

carnaval · 2015-04-19T18:42:01Z

So all those ABI considerations are still pretty vague to me. Given a 64-bit arch, and the fact that type tags and data have to be contiguous, we are in fact forced to waste 8 bytes per 16-byte object to satisfy those constraints, right?
But this is only needed for vector type loads/stores and setjmp, right?

carnaval · 2015-04-19T18:43:03Z

(By 16-byte objects I meant 16 bytes without the tag.)

JeffBezanson · 2015-04-19T18:48:40Z

Yes, we would still waste 8 bytes for some objects, but that's a lot better than wasting 8 bytes for all objects.

It would be fine with me to fix this as narrowly as possible, and only ensure alignment for jmp_buf where it is needed. We can leave vector type alignment for another day.

tkelman · 2015-04-19T18:49:24Z

this probably should've been fixed before #10579 (comment) was merged at all

carnaval · 2015-04-19T19:54:21Z

So for example, on my linux/x64 box, a jmp_buf is 200 bytes, which brings a jl_task_t to 320 bytes (including tag). 320 is both a multiple of 16 and (conveniently) an available pool size, so even if we apparently don't need it on this ABI, we are actually guaranteed to have task->ctx 16-byte aligned (since its offset inside the struct is 80 bytes).
(All of this is without the current "fat tag" hack, if I'm not mistaken.)
If we could just arrange for the same situation to arise on Windows (by tweaking pool sizes and task_t padding, maybe reordering the fields), aren't we done? At least until we want SIMD aligned loads and stores.
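The argument above is plain modular arithmetic, which a few lines of Python can confirm. This is an illustrative model only: it assumes the pool page starts on a 16-byte boundary and that cells are laid out back to back, and the function names are hypothetical. The 320 and 80 are the numbers quoted in the comment.

```python
def field_addresses(page_base: int, cell_size: int, field_offset: int, count: int) -> list[int]:
    """Addresses of one struct field across `count` consecutive pool cells."""
    return [page_base + i * cell_size + field_offset for i in range(count)]


def field_always_aligned(cell_size: int, field_offset: int, align: int = 16) -> bool:
    """True if the field is aligned in every cell of an `align`-aligned page."""
    return cell_size % align == 0 and field_offset % align == 0
```

With a 320-byte cell and an 80-byte field offset the check passes; a cell size that is not a multiple of 16 would break it for every other cell, which is why tweaking pool sizes and padding matters.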

vtjnash · 2015-04-19T20:02:29Z

the compiler will not make that easy, since it is already guaranteeing that jmp_buf is at a 16-byte offset from the start of the struct

carnaval · 2015-04-19T20:11:46Z

Oh. Now I get it. So with a careful packing pragma and manual padding this should be doable, right?

ScottPJones · 2015-04-24T23:20:05Z

I would also agree with @JeffBezanson that this should be fixed for 0.4.0, not some 0.4.x.
Memory usage has a huge effect on performance, esp. when you are dealing with lots and lots of processes on a machine...

vtjnash · 2015-04-25T02:55:29Z

> Implementing the above alignment scheme can be done if convenient, but is not itself required to close this.

as it turns out, this was required by one of the dsp tests. i'm just waiting for travis to greenlight the i686 code to merge this.

ScottPJones · 2015-06-19T01:41:59Z

I've been having some crazy ideas, about alignment and allocation in Julia... may be all wet...
Say the box takes 8 bytes, and you have 8 bytes for either a pointer or data value. Those only need 8-byte alignment. But if you have a UInt128/Int128 (i.e. like ByteVec), it is better if you can keep that part always aligned. If you have a structure that is <= 56 bytes, it would be good to try to ensure that it is always in the same cache line as its box.
You could intertwine pools, to get best alignment, without wasting space for example, for one cache line:
[box] [box] [16-bytes aligned] [box] [box] [16-bytes aligned]
[box] [16-bytes unaligned] [box] [32-bytes, 32-byte aligned]
[box][box] [48-bytes 16-byte aligned]
etc.
It doesn't seem like julia currently is very cache line aware... but I very well may be mistaken...

vtjnash · 2015-06-19T01:58:20Z

what is a box? why does your example not seem to have the data in the box?

modern allocators have generally found that it is more efficient to segregate allocations by size. this wastes some space on odd-sized allocations, but is generally much faster at allocation and at walking the pool (since every cell in a pool is the same size). it also saves the byte on each allocation that would otherwise store the size of the subsequent data field.
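A toy model of size-segregated pools makes the trade-off concrete. This is a sketch in Python, not Julia's allocator: the size classes, the 4096-byte page, and all names are assumptions chosen for illustration. Addresses are plain integers.

```python
import bisect

SIZE_CLASSES = [16, 32, 48, 64, 96, 128]  # hypothetical size classes, in bytes


class SegregatedPools:
    """Each page holds cells of a single size class, so no per-object size is stored."""

    def __init__(self, page_bytes: int = 4096):
        self.page_bytes = page_bytes
        self.free = {cls: [] for cls in SIZE_CLASSES}  # free cell addresses per class
        self.page_class = {}                           # page base -> size class
        self.next_page = 0

    def _refill(self, cls: int) -> None:
        """Carve a fresh page into equal cells of size `cls`."""
        base = self.next_page
        self.next_page += self.page_bytes
        self.page_class[base] = cls
        cells = self.page_bytes // cls
        # Push in reverse so the lowest address is handed out first.
        self.free[cls].extend(base + i * cls for i in reversed(range(cells)))

    def alloc(self, nbytes: int) -> int:
        """Round the request up to its size class and pop a free cell."""
        i = bisect.bisect_left(SIZE_CLASSES, nbytes)
        if i == len(SIZE_CLASSES):
            raise ValueError("request too large for the pools")
        cls = SIZE_CLASSES[i]
        if not self.free[cls]:
            self._refill(cls)
        return self.free[cls].pop()

    def size_of(self, addr: int) -> int:
        """Recover the cell size from the page alone."""
        return self.page_class[addr - addr % self.page_bytes]

    def release(self, addr: int) -> None:
        self.free[self.size_of(addr)].append(addr)
```

A 20-byte request is rounded up to a 32-byte cell (the wasted space mentioned above), but allocation is a single pop and the size is recovered from the page, so nothing has to be stored next to the object.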

if you've hit the memory allocator, you've already missed the fast path of staying entirely in registers / on the stack, and you are paying for a couple of extra function calls and data copies.

ScottPJones · 2015-06-19T02:45:19Z

OK, I'm still learning about julia's allocator, but it looked like for many things like strings at least, there was 8 bytes of "box"ing, and then either a value or 8-byte pointer.

With my idea for intertwined pools, depending on how they are allocated, the pool still looks like it is a constant size, it simply has an offset to the next one larger than the element size.
In the above example, you'd have aligned 16-byte values, at a 64-byte cache-line offset + 48, with the tag information at offset 40, and the next aligned one in the pool would be in the next cache line.
There is no extra "byte" needed.
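The layout described here can be written down as address arithmetic. The sketch below is a Python illustration under the numbers given in this comment (64-byte cache lines, tag at offset 40, 16-byte value at offset 48, one such slot per line); the function name and the assumption that the pool base is cache-line aligned are mine.

```python
CACHE_LINE = 64  # bytes


def slot(pool_base: int, index: int, tag_offset: int = 40, value_offset: int = 48):
    """Return (tag_address, value_address) of the index-th 16-byte slot.

    The stride between slots is a whole cache line, larger than the element
    itself, so the pool still looks constant-size to the allocator.
    """
    line = pool_base + index * CACHE_LINE
    return line + tag_offset, line + value_offset
```

Every value address is 16-byte aligned, the tag sits in the 8 bytes directly before it, and both fall in the same cache line; the bytes before offset 40 would be left for the other, interleaved pools.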

Staying in registers/stack is great, for the objects you are currently working on, yes... but if you've got lots of 128 bit or 256 bit fields or even 512 bit fields, you really want those to be optimally aligned
(i.e. for AVX, AVX-256, AVX-512 instructions).

JeffBezanson added the performance (Must go faster), priority (This should be addressed urgently), and GC (Garbage collector) labels Apr 19, 2015

JeffBezanson added this to the 0.4.0 milestone Apr 19, 2015

JeffBezanson changed the title ~~better approach to object alignment~~ remove extra word from object header (better approach to alignment) Apr 19, 2015

JeffBezanson mentioned this issue Apr 22, 2015

Poor performance of readdlm on master #10428

Closed

simonster mentioned this issue Apr 23, 2015

Regression: Higher order function memory usage #10954

Closed

vtjnash closed this as completed in 8b8b261 Apr 25, 2015

vtjnash added a commit that referenced this issue Apr 27, 2015

move the 16-byte realignment offset to the start of the type pool rather than every object (fix #10898)

461f440

mbauman pushed a commit to mbauman/julia that referenced this issue Jun 6, 2015

move the 16-byte realignment offset to the start of the type pool rather than every object (fix JuliaLang#10898)

ef7ef77
