Proposal: Support nested anonymous struct and unions. #985
Comments
How would you access that struct, though, without adding confusion to the call site? The following accomplishes what you are looking for, and more:
const ARGBColor = packed union {
    bytes: u32,
    argb: packed struct {
        a: u8,
        r: u8,
        g: u8,
        b: u8,
    },
}; |
In C the parent struct/union basically adopts the members of the anonymous child struct/union. https://stackoverflow.com/questions/8932707/what-are-anonymous-structs-and-unions-useful-for-in-c11 |
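As a minimal sketch of that "adoption" in standard C11 (the type name `argb_color` and the helper `double_red` are made up for illustration and do not come from this thread), an anonymous struct inside a union lets the call site reach the channels directly:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical color type; the names are illustrative only. */
typedef union {
    uint32_t bytes;          /* whole pixel as one 32-bit value */
    struct {                 /* anonymous: its members are adopted by the union */
        uint8_t a, r, g, b;
    };
} argb_color;

/* Four one-byte channels are expected to overlay the 32-bit value exactly. */
static_assert(sizeof(argb_color) == 4, "unexpected padding in argb_color");

/* Doubles the red channel, written as c->r rather than c->argb.r. */
void double_red(argb_color *c) {
    c->r = (uint8_t)(c->r * 2);
}
```

Which byte of `bytes` each channel lands in depends on the target's endianness, so the sketch only shows the access syntax, not a portable pixel format.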
Yes, but my point was that it adds obfuscation to the call site (clarified original comment, a little); when you call Like the following call; |
The idea is c.argb.a doesn't add any new information. In this case I care about the elements of the color, and need to pass the entire color to say gl. I don't think this code is hard to understand or that naming the struct makes a difference.
|
Can someone give an example that is not solved by Zig's other features?
Ofc, you would probably have As for another example from the stack overflow linked earlier:
Here, you should really use tagged unions in Zig:
Before we can consider nested anonymous struct/unions we need to know the exact use-case. Arguing over example code that is solved by Zig's other features is not very productive. Link some Zig code that could be improved by this feature, or link some C code that is not easily mapped to Zig because of nested struct/unions. We want real world examples, not "made up on the spot to get a feature into a language" examples. |
Recently I used it for color and vector representations. Zig solves tagged unions nicely, but this is for packed structs. I understand this is a QoL enhancement, but casting, bit shifting and functions to do something trivial like accessing the bytes in the color value seems like a bad solution. Here is a better example (typing code on a phone sucks). Here the two example color formats both work with the foo function without having to know what order the format is in.
And yes, the component struct could have a common name like bits or something, but I don't find this adds more information, just more typing, since in most cases you're manipulating the components and passing an array of them off to be rendered. |
There are two glaring issues with the example code you gave; the first one being that the order is different, therefore the value of bytes will be different, yet you are passing it into the same function, which presumably wants them in the same order? I just can't think of a function that wouldn't care about the order of those 4 components without you telling it the specific order. That is, "both work with the foo function without having to know what order the format is in" makes no sense to me. It's a colour; it matters what order they are in if you are setting a standard, and I don't see how Regardless, if we look past this and pretend, let's say, that the order doesn't matter, you could also just write:
const ARGB = packed struct {
    a: u8, r: u8, g: u8, b: u8,
    pub fn bytes(self: &this) u32 {
        return @ptrCast(&u8, self)[0..4];
    }
};
const RGBA = packed struct {
    r: u8, g: u8, b: u8, a: u8,
    pub fn bytes(self: &this) u32 {
        return @ptrCast(&u8, self)[0..4];
    }
};
fn foo(c: var) void {
    c.r *= 2;
    glFoobar(c.bytes());
}
Keeping in mind that I'm presuming that your code is correct! Maybe double check glFoobar, because from my knowledge as I've said I can't think of one that was order invariant :). |
Yes, foo doesn't care about the byte order in this example. All it cares about is doubling the red color and passing it to a fake OpenGL function. That function would care about the order, but that is not foo's concern (foo assumes you passed the correct struct for the format of the OpenGL device). This is the beauty of generic code, i.e. why implement foo for each possible color format (which could include 24-bit colors too!)? This is also assuming lots of interactions with C functions. Having to cast seems wrong, and the function unnecessary in that case. Does the compiler ensure it's inlined? If not, then you won't use the function in critical paths of code and will end up always casting, and code with lots of casting == bugs. Without this feature I would just name the nested struct; the union is too useful, but naming adds more to be remembered that doesn't add anything. Here is another version of the color object (note: I am not familiar with aligning in Zig yet, so this might be wrong):
const ColorARGB = packed union align(4) {
    bytes: u32,
    packed struct {
        a: u8,
        packed union align(1) {
            pure: u24,
            packed struct {
                r: u8,
                g: u8,
                b: u8
            }
        }
    },
    array: [4]u8,
};
Vectors are also common:
|
I suggest you run some C code that does what you want; you'll find that the bytes value is NOT the same (you may have to mark the structs as volatile else C may optimise the order, though I sincerely doubt it). This is talking about the ARGB and RGBA example of course. Your function doesn't 'care about order' as all it does is access Now onto some of your comments and your examples; I'll split it up so I can clarify a few things.
Maybe have a look at what casting actually is in code to understand; raw data has no type, so when you cast typically you are indicating to the compiler what type you want it to pretend it is. So even if you use a union you are still casting; it is just up to whether or not you see the cast in your code or if it is behind the scenes. Unnecessary enough to add an entire new construct? I would say not, I would say it is even extra necessary to indicate what is actually going on.
Different issue, not dependent on this; and yes it would/should.
Why wouldn't you? I think you're overvaluing the benefit of inlining. A function call is pretty minimal as all things go. I don't understand how this amounts to lots of casting. You're performing a single cast per 'bytes' call, the same as if you had a flattened nested struct. I'll cover the fact that casting is often not actually carried out in assembly/machine code a little further below :). As with your other examples, I don't see any real benefit to having them be non-nested. For example, let's break down the vector one;
Of course, in some cases casting will add more instructions, for example when casting to a higher promoted type such as f128, and definitely when you cast across bounds like floating point to integer (though this isn't relevant when talking about pointer casting). So in some cases it'll add an instruction or two, but in reality those instructions would exist even if you just used a flattened nested struct. Basically this issue seems to be at a standstill; to convince me (not speaking for everyone of course, but I would presume @Hejsil too, based on his comments), I would have to see a significant difference in some areas;
Maybe instead we should add a small std module that helps you perform these casts safer such as |
Huh? If C unions don't keep the order, then unions are not safe to use. Maybe you're talking about packing; with align, all of these unions would work in C as intended. Also, C/C++ don't rearrange the order of members. They can add padding between members based on alignment, but that's it. Without that you couldn't pass unions/structs around. So yes, without align and packed the struct of u8 may not align (though I don't know of any modern compiler that wouldn't pack that correctly), but anyone doing this kind of data packing optimization would understand this, or their code wouldn't work. Function calls are expensive in critical path code (16ms is not a lot of time to update 1000s of vectors and color arrays): registers are saved to the stack, and you risk a cache miss while calling the function. In C/C++ your example functions would all be macros, or you would use a compiler that inlines your code or errors if it cannot. The purpose of the union is to not cast and not have the data modified at all. In the vector example you want that data to fit into a SIMD register and you want to avoid calling the unaligned asm; you also want to access each axis without bit shifting and casting all the time. So currently just naming the components is really the alternative; this ask is a QoL enhancement. Some examples from GitHub projects: |
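The ordering claim can be checked with `offsetof`. The following is only a sketch with made-up types (`struct rgba`, `struct mixed`): standard C lays out struct members at increasing addresses in declaration order, and only the amount of padding between them is left to the implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical types for illustration. */
struct rgba { uint8_t r, g, b, a; };

struct mixed {
    uint8_t  tag;    /* always at offset 0 */
    uint32_t value;  /* after tag; padding in between is implementation-defined */
};

/* Returns 1 if the members of struct rgba appear in declaration order. */
int rgba_in_declaration_order(void) {
    return offsetof(struct rgba, r) < offsetof(struct rgba, g)
        && offsetof(struct rgba, g) < offsetof(struct rgba, b)
        && offsetof(struct rgba, b) < offsetof(struct rgba, a);
}

/* Number of padding bytes the compiler inserted between tag and value. */
size_t mixed_padding(void) {
    return offsetof(struct mixed, value) - sizeof(uint8_t);
}
```

On common ABIs `mixed_padding()` is 3, but that number is a property of the target, not of the language.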
Fairly certain the functions in question will in fact be inlined. You can enforce this by marking them as
is functionally identical to
|
Overall; maybe the IRC is more of a place to talk about inlining and non-relevant points; as stated below, let's keep this about just nested unions. |
I haven't read this issue yet, I'll try to clear up the confusion when I do. But let's remember our community motto
@BraedonWooding No need to repeat yourself. Even if someone is nitpicking, don't accuse them of nitpicking. Just ignore it and move on. Meta-conversation is off topic. That means this comment (the one I am typing right now) is off topic and replying to this comment is off topic. |
Back to the topic at hand...
I do use something like this in C where you have a sort of fake inheritance
using anonymous structs. You have a base struct. Other structures embed
the base struct at the beginning of themselves. You then upcast pointers
from the base type to the actual type. It was a pattern I first saw in
Amiga programming more decades ago than I want to think about. Often you
had a member "base" at the beginning of each derived struct. However, it
became a little nicer and cleaner with anonymous structs.
```
struct base {
int type_id;
struct base *next;
};
struct derived {
struct base; /* anonymous */
int a_new_field;
...
};
```
Now if I have a variable of type `derived`, then it appears to have fields
`type_id` and `next`. If I change the definition of `base`, `derived`
automatically gets the new fields.
I have not had much use for anonymous unions but where I have seen them
used was for some form of variant record:
```
struct {
int variant_type;
union {
struct variant_A a_var;
struct variant_B b_var;
struct variant_C c_var;
....
}; /* anonymous */
} variant_struct;
```
Obviously, this variant record type is better handled in Zig. However, the first example with
the anonymous structs is pretty handy. While enum structs in Zig cover
most of this, they are not possible to extend after the fact. The C
version is.
Grr. I replied via email and now the formatting is gone... Anyone know how to fix that?
|
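For comparison, here is a sketch of the older form of this pattern in standard C, with a named `base` member instead of an anonymous one. The helper names and `DERIVED_ID` are hypothetical. It relies on the rule that a pointer to a struct, suitably converted, points to its first member and vice versa.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical types; standard C with a named first member. */
struct base {
    int type_id;
    struct base *next;
};

enum { DERIVED_ID = 1 };

struct derived {
    struct base base;   /* must be the first member for the casts below */
    int a_new_field;
};

/* Upcast: take the address of the embedded base. */
struct base *derived_as_base(struct derived *d) {
    return &d->base;
}

/* Downcast: only valid if b really is the base of a struct derived,
   which this sketch decides by checking type_id. */
struct derived *base_as_derived(struct base *b) {
    return b->type_id == DERIVED_ID ? (struct derived *)b : NULL;
}
```

With the anonymous form the only change at the call site is that `d->base.type_id` becomes `d->type_id`.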
@kyle-github Your first example is pretty neat, actually. I didn't know you could "inline" an existing struct into another in C. This C OOP is also a thing that is done in Zig code, though the pattern is a little different. Because Zig is allowed to rearrange fields of a non-packed struct, casting from For
Also, on another note: aren't unions that represent the data in two or more forms only really a thing because they avoid C strict aliasing bugs, since casting between pointer types violates strict aliasing? |
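To make the aliasing point concrete, here is a small sketch (function names are made up) of the two ways C lets you reinterpret a value's bytes without a pointer cast such as `*(uint32_t *)&f`, which would run into the strict aliasing rule. It assumes `float` is 32 bits wide.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(float) == sizeof(uint32_t), "sketch assumes 32-bit float");

/* Reads a float's bit pattern through a union. C (unlike C++) permits
   reading a member other than the one last written; the bytes are
   reinterpreted as the new type. */
uint32_t float_bits_union(float f) {
    union { float f; uint32_t u; } pun;
    pun.f = f;
    return pun.u;
}

/* Same result via memcpy, which also sidesteps the aliasing rule. */
uint32_t float_bits_memcpy(float f) {
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}
```

Both functions should return the same value for any input; compilers typically reduce either one to a plain register move.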
Hi @Hejsil, yes the "OOP" method is heavily used in a lot of C. Often the "base" object will actually be the vtable plus some sort of type ID or something simple. I am not sure I am tracking your point about The point about extensibility is that using something like |
const Base = struct {
    const Self = this;
    type_id: u64,
    next: ?&Self,
    // Could use a vtable instead.
    fooFn: fn(&Self) usize,
    fn cast(self: &Self, comptime D: type) ?&D {
        if (type_hash(D) != self.type_id)
            return null;
        return @fieldParentPtr(D, "base", self);
    }
    fn foo(self: &Self) usize {
        return self.fooFn(self);
    }
};
const Derived = struct {
    const Self = this;
    base: Base,
    data: usize,
    fn init(data: usize) Self {
        // The struct literal needs its type name.
        return Self { .base = Base { .type_id = type_hash(Self), .next = null, .fooFn = foo }, .data = data };
    }
    fn foo(base: &Base) usize {
        const self = @fieldParentPtr(Self, "base", base);
        return self.data;
    }
};
|
Struct embedding has a long history before Go made it popular. The origin seems to be the Plan 9 C compiler (GCC has it today, see https://gcc.gnu.org/onlinedocs/gcc/Unnamed-Fields.html). If it is "standard enough" for a flavor of C to have it, it might be worthwhile to do in Zig. Then again, Zig has much better support for inheritance. |
Came here to write this very proposal. I've used it in both C and Go, and it has been very useful for code clarity, in my opinion. I hope it makes it in. 😄 |
The problem is that there is no way to specify which union tag to fill, and filling that in is well defined but feels like it defeats the goals of the initializer field checks. A couple examples:
const X = extern struct {
    union {
        b: u32,
        f: f32 = 1.0,
    },
};
X{ }; // this is weird, we don't allow this in normal extern unions.
const Y = extern union {
    struct {
        a: u32,
        b: f32 = 1.0,
    }, struct {
        c: f32 = 1.0,
        d: u32,
    },
};
Y{} // combines defaults from multiple union members, but they don't overlap. |
Small nit about the approved syntax, I find the lack of field name a bit disturbing since it breaks the
The following code is IMO more readable and less foreign-looking (less chances of being asked what is this stray
const Foo = extern struct {
    a: u32,
    _: union {
        b: u32,
        _: struct {
            c: u16,
            d: f16,
        },
        e: f32,
    },
    f: u64,
}; |
The case against the underscore syntax is that it makes the scope look like a type literal, which it is not. These are all compile errors:
const T = union { ... };
const Foo = extern struct {
    a: u32,
    _: T, // reference to existing type, rejected in #5561 and #1214
};

const Foo = extern struct {
    a: u32,
    _: union(enum) { ... }, // doesn't create a type, this makes no sense
};

const Foo = extern struct {
    a: u32,
    _: union {
        // cannot contain decls, this is not a type.
        pub const Self = @This();
    },
};
And this one compiles but is misleading:
const Foo = extern union {
    _: struct {
        // @This() here returns Foo, not the anonymous scope
        x: [*:0]const u8 = @typeName(@This()),
    },
}; |
That's the idea, treat the RHS as we already do and parse it as a type literal. Forbidding it from having any declaration can be safely done during the semantic analysis phase with ease. |
I don't mean to imply that it's difficult, I just mean that it's a surprising limitation. Everywhere else in the language, when you have an expression of a certain type, you can replace that with a comptime known variable of that type. This feature doesn't work like that. The other benefit to doing this with a new syntax is that errors are caught in the parser, rather than in the compiler, so you will see these problems in code that is not semantically analyzed. |
By using
Meh, I'm all for letting the parser... Parse? And analyze the code somewhere else. |
Yes but in this case the syntax is unique. When you see the bare |
One thing to note, we may want to disallow an embedded struct/union if we are not moving from one to the other. So we may want to make the following a compile error:
struct {
    a: u32,
    // it's nonsensical to embed an anonymous struct inside another struct
    struct {
        b: u32,
    },
}
union {
    a: u32,
    // it's nonsensical to embed an anonymous union inside another union
    union {
        b: u32,
    },
} |
That's the same trap you fall into with the proposed syntax, if I see
Beside my subjective opinion on the ugliness of this special case I think |
@marler8997 Er... but isn't that exactly what C does? It is a very common idiom in C to embed an anonymous struct within a struct as a form of "subclassing". Translating that into Zig means handling that. I see structs within structs all the time (and use it somewhat often myself). I cannot think of a time when I saw a union within a union. But unions within structs? All the time. Especially with embedded or memory-tight systems. My understanding from what @SpexGuy wrote above is that the main driver here is interop with C. Again, thank you @SpexGuy, @andrewrk and all for having such an open conversation. I learn so much from reading about the reasoning for these decisions! |
The new syntax conflicts with accepted #4335. |
@kyle-github...I see structs inside structs all the time as well (and unions within structs) but I don't recall ever seeing an anonymous struct inside another struct. I'm also not familiar with what you mean by "subclassing" here. Could you show me an example of where you've seen this and clarify what you mean by subclassing? Or at least create a made up example that shows how an anonymous struct directly inside another struct provides some sort of benefit? As far as I can tell, inserting/removing an anonymous struct that is directly within another struct is completely equivalent. Thanks, would be interesting to learn something new. |
@marler8997 This is a contrived example.
struct base {
    struct base *next;
    int id;
};
struct derived {
    struct base; /* anonymous */
    struct window_position pos;
};
The struct
struct derived *dp = ...;
dp->pos.x = 42;
if(dp->id == MY_NIFTY_ID) {...} /* field 'id' is in the base struct */
dp = list;
while(dp) {
    draw_window(dp);
    dp = dp->next; /* field 'next' is in base */
}
By using anonymous types like this you avoid some casting (so it is a bit safer) and your code is a bit more generic, since access to common elements is done the same way regardless of the type of the derived struct. The difference between anonymous structs and named structs is not great, but it is used in situations like this. Another example: variant records in C using anonymous unions:
struct variant {
    int variant_type;
    union {
        int i_val;
        float f_val;
        void *p_val;
    }; /* anonymous */
};
struct variant *var = ...;
if(var->variant_type == INTEGER) {
    var->i_val = 1337; /* anonymous access to sub-field */
}
Here the Zig's facilities are much, much nicer here! The only reason I see for accepting this is for translation of C code. The resulting Zig code will be closer to 1-to-1 with the original C code. I hope that gives you some ideas of how this is used in C. |
In what version of C is this valid? I know it isn't valid in C11, as it gives a |
@kyle-github, I'm very confused by your example. I've never heard of a C feature that would allow you to embed a struct "anonymously" inside another struct that is declared elsewhere, nor does this accepted Zig proposal allow that. If either language supported this then you would have a use case for supporting structs inside structs, but I've never heard of such a feature. Where did these semantics come from? I tried compiling this and I get semantic errors that I would expect. |
This is Plan9 style struct embedding, already discussed and rejected in #1214. |
Oh wow I've never heard of this feature before. Given this new knowledge, the question that comes to mind is what is the plan for handling this feature in translate C? Will Zig just assert an error if it encounters C code using this Plan9 style of anonymous struct embedding? |
@marler8997, @zigazeljko there are two things being conflated here. IIUC, this embedding is NOT the Plan 9 version. It is the MS version. See the GCC docs on this. The first part of that describes how anonymous structs and unions work and then describes the Plan 9 extension on top of that (which is considerably more complicated and approaches C++ multiple inheritance for performance impact). I am only talking about the MS extension, definitely not the Plan 9 extension! I do not know where the extension idea came from originally, but even GCC bundles it under I do see the MS extension used with some frequency. I almost never see the Plan 9 extra extensions used. I probably muddied the waters with my examples because I was following a pattern from old Amiga coding where the |
According to GCC docs you linked, it says the
struct {
    struct my_tag {
        int a;
    };
};
I don't see where it says the
struct foo { int a; };
struct bar { struct foo; };
struct bar b;
b.a; |
The Plan 9 compiler lists it as their extension. Section 3.3 |
@Nairou thanks for looking that up. @marler8997 here is a sample:
#include <stdio.h>
struct foo {
    int bar;
};
struct blah {
    int snork;
    struct foo;
};
int main() {
    struct blah blah_var;
    blah_var.bar = 42;
    printf("blah.var=%d\n", blah_var.bar);
    return 0;
}
Compile and run it:
Here is what happens when you modify the C code to use the Plan 9 extensions. Here's the code:
#include <stdio.h>
struct foo {
    int bar;
};
struct blah {
    int snork;
    struct foo;
};
int main() {
    struct foo *foo_ptr;
    struct blah blah_var;
    blah_var.bar = 42;
    foo_ptr = &blah_var;
    printf("foo_ptr->bar=%d\n", foo_ptr->bar);
    return 0;
}
Try to compile with just the MS extensions:
Try again with the Plan 9 extensions:
I definitely do not see the Plan 9 extensions used often. Thankfully. Being able to cast the pointer like that seems to have the performance impact of C++ multiple inheritance with few of the advantages. |
Ok thanks @kyle-github, I see where I missed that from the GCC docs:
So now that we've established where these semantics come from, the question is what will Zig do with these when it encounters them during translate C? What's the plan? |
Since this was accepted, some good concerns have been raised. We revisited this at a recent design meeting.
Most notably, the previously accepted version of this proposal did not play well with type info. Type info is built for reflection, and makes a separation between unions and structures. The idea behind this separation is that reflection code should be able to assume that all members of a struct are simultaneously valid and do not overlap, and that only one member of a union is valid at a time. The implementation of this idea that plays well with type info is #5561. However, after examining the use cases for this feature, we no longer believe it is worth the complexity.
cImport should not decide the direction of the language. We make an effort to support C, but the C features are meant to be an optional extension to the language. We should be able to cleanly remove the C features from the language, and be left with something that is complete. #5561 can’t be reasonably restricted to only extern structs. If we implemented that, it would be expected to work on bare structs as well. But we’ve rejected all use cases of this feature that are not C interop, so it’s not worth including this in the non-C part of the language. And since there would be dissonance in having this feature work only on extern data types, this should not be included in the language. While we don’t necessarily reject the validity of the C interop use case, we do not see a path to supporting it that is worth the complexity it would introduce.
Another important feature of the design of Zig is that it is meant to have a “normal form”, in the mathematical sense. There shouldn’t be multiple styles of Zig code, and there shouldn’t be features that some people decide to use and others don’t. Zig should look pretty much the same regardless of the problem domain it’s tackling, and regardless of the person writing the code. This is a stronger argument against this feature, which would create a new tradeoff to consider that is purely stylistic. We can completely remove this mental overhead by not having this feature.
Zig's stance on syntactic sugar is this: if there is a common pattern, where the best way to do it is unappealing, and the appealing way has footguns or drawbacks, that’s a good argument that we should introduce sugar to make the better way easier. A classic example here is that we used to have an unwrapping operator, like .?, for error unions. We removed it because it made the “bad” thing of transforming error codes into UB much easier than handling errors. It was replaced with
Applying this reasoning to field inclusion, the behavior it encourages is not necessarily better or safer. It’s more complex, and it introduces the footgun of accidentally clobbering fields of a struct which overlap.
So we've decided to reject this proposal. We apologize for the whiplash, and we'll make a stronger effort going forward to avoid accepting and then rejecting issues. |
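The clobbering footgun mentioned in the rejection can be illustrated in C11, where anonymous members already exist. This is only a sketch with a made-up `struct reg`; nothing at the access site shows that `word` overlaps `lo` and `hi`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register type: lo/hi and word share storage, but the
   field names alone (r->lo, r->word) give no hint of the overlap. */
struct reg {
    int id;
    union {                            /* anonymous union */
        uint16_t word;
        struct { uint8_t lo, hi; };    /* anonymous struct */
    };
};

/* Reads like it sets three independent fields; the last store
   overwrites the bytes just written through lo and hi. */
void reg_init(struct reg *r) {
    r->lo = 0x12;
    r->hi = 0x34;
    r->word = 0;   /* silently clobbers lo and hi */
}
```

With named members the same code would read `r->u.parts.lo` and `r->u.word`, which at least makes the shared union visible.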
TL;DR: basically #19660 :)
Just as an additional data point, I'm currently also missing unnamed nested packed structs and unions in my experiments to map a computer emulator system bus to a packed struct where each bit represents a connection between system chips, but where the same bits should be accessible under different names, or groups of bits either as an unsigned integer or individual bits. Currently I'm doing this with the system bus being an unsigned integer and regular bit twiddling. Technically I can mostly achieve what I want with packed structs, but when each nested packed struct or union requires a name it leads to very long chained access code, like:
const d0 = bus.cpu.dbus.bits.d0;
const dbus = bus.ctc.dbus.val;
...for what could be:
const d0 = bus.d0;
const dbus = bus.db;
...and sometimes I also need to come up with pointless placeholder names just because I want to have the same bit accessible under two different names. IMHO the
const Bus = packed union {
    _: packed struct {
        _: packed union {
            db: u8,
            _: packed struct {
                d0: u1,
                d1: u1,
                // ...
            },
        },
        _: packed union {
            m1: u1,
            cs: u1,
        },
        mreq: u1,
        iorq: u1,
    },
};
...at the same time this
...I think the general idea of having differently named and typed 'views' onto the same bits in a packed struct is a very useful thing to have. Maybe unnamed nested structs and unions are just the wrong approach because of their 'C/C++ baggage', and a different language feature makes more sense for that same idea. For the same reason it would be nice if packed structs could contain arrays, like
Of course I could also just always fall back to regular bit twiddling on unsigned integers, but packed structs are so close to replacing bit twiddling code with something a lot more convenient and readable. |
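For readers who know the C side of this idea, a heavily reduced sketch of such "views" using C11 anonymous members follows. The type `bus_t` and the helper `set_d0` are hypothetical, and the sketch departs from the Zig example in two ways: `m1`/`cs` are whole bytes rather than single bits, and C leaves bit-field layout implementation-defined, so which bit of `db` holds `d0` can differ between targets.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, heavily reduced bus type. */
typedef struct {
    union {                      /* data bus: whole byte or single bits */
        uint8_t db;
        struct {
            unsigned int d0 : 1;
            unsigned int d1 : 1;
        };
    };
    union {                      /* one pin reachable under two names */
        uint8_t m1;
        uint8_t cs;
    };
} bus_t;

/* Clears the data bus, sets d0, and reports whether the byte view changed. */
int set_d0(bus_t *bus) {
    bus->db = 0;
    bus->d0 = 1;
    return bus->db != 0;
}
```

Access then reads `bus.d0` and `bus.db` with no intermediate names, which is the ergonomics the comment above asks for.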
Details of Exactly What was once Accepted
Details of why we went back on that and rejected this issue
Proposal: Support nested anonymous struct and unions
Abstract
A useful and common pattern found in a few other languages is the ability for structs and unions to contain anonymous structs and unions.
Example: Views into packed unions
Example: Struct OOP
Pros
Cons