Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

support for proto3 'optional' is coming in protobuf 3.12 #311

Closed
haberman opened this issue Apr 24, 2020 · 11 comments · Fixed by #327
Closed

support for proto3 'optional' is coming in protobuf 3.12 #311

haberman opened this issue Apr 24, 2020 · 11 comments · Fixed by #327

Comments

@haberman
Copy link

Hi there, I wanted to give you a heads-up that protobuf 3.12 is adding experimental support for optional in proto3, with a goal of making it generally available in protobuf 3.13:

protocolbuffers/protobuf#1606 (comment)

This may require some changes in protoreflect to fully support. Here is a guide to the changes that are generally required. I'm not super-familiar with protoreflect so I can't make any specific recommendations, but hopefully this doc will help:

https://github.com/protocolbuffers/protobuf/blob/master/docs/implementing_proto3_presence.md

Please let me know if I can answer any questions. Thanks!

@jhump
Copy link
Owner

jhump commented Apr 24, 2020

@haberman, how will this be represented in descriptors? I assume (hope) that the "label" field in FieldDescriptorProto will simply be absent/unset if no "optional" keyword was present. So code generators would distinguish between "optional - explicit presence" and "no presence - implicitly optional" based on whether the label is explicitly set to OPTIONAL or left unset.

Does that make sense?

Luckily, I think this will be a minor change to support. Changing protoparse to support this should be trivial. The less trivial (but still do-able) change is adding this functionality to dynamic messages in the dynamic package.

Do you know in what version of the Go protobuf runtime this is expected to be implemented?

@haberman
Copy link
Author

I assume (hope) that the "label" field in FieldDescriptorProto will simply be absent/unset if no "optional" keyword was present.

While that would be a clear way of expressing this in descriptors, it is unfortunately incompatible with the existing ecosystem on both the producing and consuming side:

  • since proto3 was introduced in 2015, the protocol compiler has explicitly set the label for proto3 fields to LABEL_OPTIONAL even when there was no optional keyword in the .proto file. So if we did what you suggested, all new software would mis-interpret all old descriptors by giving them all explicit presence.
  • as a result, none of the software out there uses the presence of the FieldDescriptorProto.label field as a signal for whether proto3 fields have presence. So if we started giving it that meaning, old software would misinterpret these new descriptors that support proto3 optional.

Instead we put each proto3 optional field inside a one-field oneof. You can read more about this in the doc I linked above. While this adds cruft to the descriptor, it also is compatible with existing proto3 reflection-based algorithms, since fields in a oneof always have presence. For example, the C++ TextFormat class uses reflection to parse/serialize protos to text format, and this class did not require any changes to support proto3 presence properly.

Do you know in what version of the Go protobuf runtime this is expected to be implemented?

@dsnet could answer this question.

@jhump
Copy link
Owner

jhump commented Apr 26, 2020

Instead we put each proto3 optional field inside a one-field oneof.

Oy. I guess I didn't read thoroughly enough. That is unfortunate. So, to recap, if the optional keyword is present on a field with proto3 syntax the field is automatically wrapped in a oneof. How is the name of the oneof chosen (since it must be unique across the whole namespace of the enclosing message)?

Do you know in what version of the Go protobuf runtime this is expected to be implemented?

It sounds like no new runtime support is necessary if the change is in how the descriptor is generated. The existing Go runtime would continue to work fine, but just expose these optional fields precisely as if they were one-field oneofs (since that is what the descriptor will look like that is fed to protoc-gen-go).

In that case, I was wrong about where the impact will be. I think the dynamic package will be fine, and it will just be a (still small-ish) change in protoparse.

@haberman
Copy link
Author

How is the name of the oneof chosen (since it must be unique across the whole namespace of the enclosing message)?

First we prepend just a _. If that isn't enough we prepend XXXXXX_ until we get something unique. You can see the algorithm here in C++: https://github.com/protocolbuffers/protobuf/blob/master/src/google/protobuf/compiler/parser.cc#L785-L803

The existing Go runtime would continue to work fine, but just expose these optional fields precisely as if they were one-field oneofs

We don't want the generated code to emit a generated API for the synthetic oneof. The oneof is visible to reflection, but it should not be visible to generated code.

For a user who is writing a .proto file and compiling to a generated API, proto2 and proto3 optional fields should result in the same generated code, both in API and implementation. The user should not be aware that the synthetic oneof is present in proto3.

@jhump
Copy link
Owner

jhump commented Apr 26, 2020

We don't want the generated code to emit a generated API for the synthetic oneof. The oneof is visible to reflection, but it should not be visible to generated code.

I guess that means that the generated code will have to use a field mask to know which optional fields were set?

More importantly, in order to distinguish synthetic oneofs from user-written ones, I think that means they must have a new option, to tell plugins/generators how to treat it, right? Or is this how all single-field oneofs are to be treated during code generation? (That seems like it would be an incompatible change in the generated code.)

@jhump
Copy link
Owner

jhump commented Apr 26, 2020

I see that there is indeed a new option: proto3_optional. And, like default and json_name, it is a weird pseudo-option that ends up on the descriptor itself instead of in FieldOptions.

@jhump
Copy link
Owner

jhump commented Apr 26, 2020

Out of curiosity, @haberman, do you know why it was chosen to be an attribute of the field instead, for example, a new field in OneofOptions? That would have been simpler IMO for plugins/generators to handle.

@haberman
Copy link
Author

I guess that means that the generated code will have to use a field mask to know which optional fields were set?

Each language will do whatever it does for proto2 optional fields. In C++ there is a bitmask with one bit for each optional field which tracks presence information.

Out of curiosity, @haberman, do you know why it was chosen to be an attribute of the field instead, for example, a new field in OneofOptions?

The FieldDescriptorProto.proto3_optional field is intended to be an encoding detail of descriptors, and not something that users interact with directly. We don't want code generators to be paying attention to proto2 vs proto3 when they are generating code. For example, a code generator shouldn't have to write code like this:

void GenerateCodeForField(const FieldDescriptor* field) {
  if (!field->is_repeated() &&
      (field->file()->syntax() == SYNTAX_PROTO2) ||
       field->options().proto3_optional())) {
    // Field has presence (optional or oneof).
  }
}

That code is complicated and brittle, and will have to be updated if we ever created anything like a "proto4". Instead we want code generators to use functions that describe the semantic meaning of each field:

void GenerateCodeForField(const FieldDescriptor* field) {
  if (field->has_presence()) {
    // Field has presence (optional or oneof).
  }
}

This code is more robust, because it is not directly paying attention to the difference between proto2 and proto3. Whoever owns the descriptor API can define the has_presence() method to do the right thing, and if a new version of proto ever came out, the method could be updated appropriately without affecting users.

Also, field options are generally something that users can set directly in .proto files. But we don't want to let users write this:

syntax = "proto3"

message Foo {
  int32 bar [proto3_optional = true];
}

@jhump
Copy link
Owner

jhump commented Apr 27, 2020

Also, field options are generally something that users can set directly in .proto files. But we don't want to let users write this:

Yeah, after I left that comment, I realized this was better and my comment/question was misguided.

It just goes against the precedent of map_entry -- which is an option on a message (though it should just be a field on the message, like this one; in fact, due to that strange choice, protoc has to explicitly disallow a user-configured value for it). And it's really not a case like default and json_name (which IMO should have been options instead of attributes of the field, since that is how they are defined syntactically).

Anyhow, I think I've got enough to start. @dsnet, how often are generated Go files kept in sync with the proto sources in the main protobuf repo? In other words, how long do you think it will be before the Go file is updated to reflect latest changes to the proto source.

@dsnet
Copy link

dsnet commented Apr 27, 2020

Do you know in what version of the Go protobuf runtime this is expected to be implemented?

@dsnet could answer this question.

I hope to get to it this week. Past few weeks have been notably distracting for me as a number of unexpected issues have required my investigation and time.

@dsnet
Copy link

dsnet commented Apr 28, 2020

https://go-review.googlesource.com/c/protobuf/+/230698 is the change that will add proto3 optional support to the google.golang.org/protobuf module.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants