Missing value/null support for scalar value types in proto 3 #1606

Closed
lostindark opened this issue May 26, 2016 · 154 comments

Comments

@lostindark

From the protobuf wire format we can tell whether a specific field exists or not. In proto2 generated code, every field has a "HasXXX" method that reports whether the field is present. In proto3, however, we lost that ability. For example, we can no longer tell whether an int32 field is missing or has the value 0 (the default for int32).

In many scenarios, we need to differentiate a missing value from the default value (basically nullable support for scalar types).
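To make the wire-level point concrete, here is a minimal sketch of a decoder for messages that contain only varint fields. It is a hand-written illustration, not the protobuf library's parser, and the names `read_varint` and `decode_varint_fields` are made up for this example. It shows that an empty payload and a payload carrying an explicit 0 are different byte strings, so presence is recoverable from the wire.

```python
from typing import Dict, Tuple


def read_varint(data: bytes, pos: int) -> Tuple[int, int]:
    """Decode one base-128 varint starting at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear: last byte of this varint
            return result, pos
        shift += 7


def decode_varint_fields(data: bytes) -> Dict[int, int]:
    """Map field number -> value for a message made only of varint fields."""
    fields: Dict[int, int] = {}
    pos = 0
    while pos < len(data):
        tag, pos = read_varint(data, pos)
        field_number, wire_type = tag >> 3, tag & 0x07
        if wire_type != 0:
            raise ValueError("this sketch only handles varint fields")
        value, pos = read_varint(data, pos)
        fields[field_number] = value
    return fields


absent = decode_varint_fields(b"")                 # field 1 never written
explicit_zero = decode_varint_fields(b"\x08\x00")  # field 1 written as 0

print(1 in absent)         # False
print(1 in explicit_zero)  # True
print(explicit_zero[1])    # 0
```

The information is there in the bytes; what proto3 generated code drops is the accessor that exposes it for scalar fields.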

Feng suggested two workarounds:

  1. Use a wrapper message, such as google.protobuf.Int32Value. In proto3, message fields still have has-bits.
  2. Use a oneof. For example:
    message Test1 {
      oneof a_oneof {
        int32 a = 1;
      }
    }
    Then you can check test.getAOneofCase().

However, this requires changing the message definition, which is a no-go if you need to keep the message compatible while upgrading from proto2 to proto3.

We need to add back support for detecting whether a scalar field exists.

More discussion here: https://groups.google.com/forum/#!topic/protobuf/6eJKPXXoJ88

@ngbrown
Contributor

ngbrown commented May 27, 2016

While proto3 mentions using proto2 Message Types

It's possible to import proto2 message types and use them in your proto3 messages, and vice versa. However, proto2 enums cannot be used in proto3 syntax.

It might not be good application design to expect the .proto files and binary format to stay the same after switching to proto3.

Even though wrappers.proto appears to be how proto3 addresses nullable fields, and the C#/.NET library has convenient nullable formatting for them, they end up consuming an extra byte per value because of the additional message layer.

@xfxyjwf
Contributor

xfxyjwf commented May 27, 2016

@lostindark, workaround (2) using an oneof is a wire compatible change.
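A rough way to see why the oneof route is wire compatible while the wrapper route is not: a oneof member is serialized exactly like an ordinary field with the same number, whereas a wrapper nests the value inside a length-delimited sub-message. The sketch below hand-encodes both shapes for small non-negative values. The helper names are hypothetical and this is not the library's serializer.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_int32_field(field_number: int, value: int) -> bytes:
    """Tag with wire type 0 (varint), then the value."""
    return encode_varint((field_number << 3) | 0) + encode_varint(value)


def encode_wrapped_int32_field(field_number: int, value: int) -> bytes:
    """Wire type 2 (length-delimited) field holding an Int32Value-like
    message whose own field 1 carries the value."""
    inner = encode_int32_field(1, value)
    return (encode_varint((field_number << 3) | 2)
            + encode_varint(len(inner))
            + inner)


# `int32 a = 1;` and `oneof a_oneof { int32 a = 1; }` share this encoding.
plain = encode_int32_field(1, 5)
wrapped = encode_wrapped_int32_field(1, 5)

print(plain.hex())    # 0805
print(wrapped.hex())  # 0a020805
```

A reader expecting `int32 a = 1` would see wire type 2 where it expects wire type 0, which is why switching an existing field to a wrapper changes the format.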

@lostindark
Author

lostindark commented Jun 3, 2016

> @lostindark, workaround (2) using an oneof is a wire compatible change.

The issue is that it doesn't produce nice code, and it needs boxing/unboxing when used (which hurts performance).

@xfxyjwf
Contributor

xfxyjwf commented Jun 3, 2016

@lostindark what if we optimize that case (one primitive field in an oneof) to eliminate the box/unbox cost?

@lostindark
Author

@xfxyjwf That seems like a good solution with minimal changes required. However, I do wish I didn't need to name the oneof, as in:
message Test1 { oneof { int32 a = 1; } }

@xfxyjwf
Contributor

xfxyjwf commented Jun 4, 2016

I am thinking of a dedicated syntax for that. Something like:

message Foo {
  nullable int32 value = 1;
}

which is basically syntactic sugar for:

message Foo {
  oneof value_oneof {
    int32 value = 1;
  }
}

I'll bring this up with the team and see how others think about this.

@skorokhod

@xfxyjwf

message Foo
{
int x = 1;
oneof v1 { int32 value = 1;}
oneof v2 { string value = 1;}
}

gets error on v2: "value" is already defined in "Foo".
Is it possible to have several independent oneofs in one message?
Moreover it's not clear to me which tag is assigned for v1 and v2 fields?

@xfxyjwf
Contributor

xfxyjwf commented Jun 8, 2016

@skorokhod

Oneof fields are just like regular fields and are in the parent message's naming scope. You need to define your message as:

message Foo {
  int32 x = 1;
  oneof v1 { int32 value1 = 2; }
  oneof v2 { int32 value2 = 3; }
}

All these fields need different names and field numbers.

@ngbrown
Contributor

ngbrown commented Jun 8, 2016

The C#/.NET library already optimizes for the wrappers.proto case and doesn't box/unbox those nullable values. Looping in @jskeet to ask if it can/should optimize both cases?

Even in my own application which used protobuf2, I relied on the HasXXX functions for field presence. This was used to broadcast updates, only filling in the fields that had changed. Not present didn't mean that it was the default value, but that it hadn't changed from the previous value.

I think though, this was probably viewed as a mistake in the original design/spec since it conflicts with the eliding of fields with default values. Going forward, I don't see many realistic ways to say that v2 and v3 are binary compatible for applications/messages that relied on this behavior.

@jskeet
Contributor

jskeet commented Jun 8, 2016

@ngbrown: I don't want to start special-casing the situation where there's a oneof with only a single field in it, no. That feels like complexity for little benefit. The way proto3 has been designed to work is that if you want a primitive field with presence, you use the wrapper type. I wouldn't want to subvert that.

@christian-storm

In struggling (#1655) with this 'bug' I realized that I don't understand the rationale behind having default values at all when you can't set the default value or make the field required, i.e., trust that you should use the default value if the field isn't set on the wire. Why not do away with defaults altogether? If a field is set, put it on the wire even if it is a zero-length string or a zero. Otherwise one can assume it is null.

This problem is killing me because I have a situation where an int32 field may or may not be set and one of the valid data values is zero. As it stands I have to use oneof or do something hackish like adding 1 to all values zero or greater before serialization and subtracting 1 after parsing the message.
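For readers who have not seen the "+1" hack, a minimal sketch of it follows. The function names `to_wire` and `from_wire` are invented for illustration. Values zero or greater are shifted up by one so that a stored 0 can only mean "not set", and negative values pass through unchanged.

```python
from typing import Optional


def to_wire(value: Optional[int]) -> int:
    """Shift values >= 0 up by one so a stored 0 is free to mean 'not set'."""
    if value is None:
        return 0
    return value + 1 if value >= 0 else value


def from_wire(stored: int) -> Optional[int]:
    """Undo the shift; a stored 0 decodes to None."""
    if stored == 0:
        return None
    return stored - 1 if stored > 0 else stored


print(to_wire(0))             # 1
print(from_wire(to_wire(0)))  # 0
print(from_wire(0))           # None
```

The cost is visible right away: every reader and writer must know the convention, and the largest int32 value can no longer be represented because the shift would overflow.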

In the current incarnation enums are particularly problematic. Given that "every enum definition must contain a constant that maps to zero as its first element" and fields are inherently optional, it is impossible to use the zero value meaningfully in an application. Was the zeroth element explicitly set before serialization? Was it not set? There should be a big warning: don't use the zeroth element in an enum, because you won't know whether it has been set. It is best to set the first element to something that will never be selected, e.g., NOT_SET = 0;
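On the application side, the NOT_SET convention might look roughly like the sketch below. The `Status` enum and `read_status` helper are hypothetical and stand in for whatever enum a generated module would provide.

```python
import enum
from typing import Optional


class Status(enum.IntEnum):
    STATUS_NOT_SET = 0  # reserved sentinel: never chosen deliberately
    ACTIVE = 1
    DISABLED = 2


def read_status(raw: int) -> Optional[Status]:
    """Return None when the field holds the sentinel, i.e. was never set."""
    status = Status(raw)
    return None if status is Status.STATUS_NOT_SET else status


print(read_status(0))       # None
print(read_status(1).name)  # ACTIVE
```

This only works if no writer ever assigns the zero constant on purpose, which is a convention the schema itself cannot enforce.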

@christian-storm

Ran across this in python_message.py and thought it may be pertinent. Just trying to help...

def _AddPropertiesForNonRepeatedScalarField(field, cls):
  """Adds a public property for a nonrepeated, scalar protocol message field.
  Clients can use this property to get and directly set the value of the field.
  Note that when the client sets the value of a field by using this property,
  all necessary "has" bits are set as a side-effect, and we also perform
  type-checking.
 <snip>
  def getter(self):
    # TODO(protobuf-team): This may be broken since there may not be
    # default_value.  Combine with has_default_value somehow.
    return self._fields.get(field, default_value)
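
The getter above hints at how presence and defaults can coexist: a dictionary lookup with a fallback returns the default, while a membership test on the same dictionary answers "was it set?". The class below is a simplified stand-in written for this thread, not the code in python_message.py. `Message`, `DEFAULTS`, `set_field`, `get_field`, and `has_field` are all made-up names.

```python
class Message:
    """Toy message that keeps only explicitly set fields in a dict."""

    DEFAULTS = {"count": 0, "name": ""}

    def __init__(self):
        self._fields = {}

    def set_field(self, field, value):
        if field not in self.DEFAULTS:
            raise KeyError(field)
        self._fields[field] = value  # storing the value is the "has" bit

    def get_field(self, field):
        # Same shape as the getter quoted above: fall back to the default.
        return self._fields.get(field, self.DEFAULTS[field])

    def has_field(self, field):
        return field in self._fields


msg = Message()
print(msg.get_field("count"), msg.has_field("count"))  # 0 False
msg.set_field("count", 0)
print(msg.get_field("count"), msg.has_field("count"))  # 0 True
```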

@lostindark
Author

@jskeet The problem with the wrapper types is that they are not wire-format compatible with existing proto2 messages. Assume you want to upgrade some services in a large project to proto3: what are the options if the project already relies on the HasXXX feature in proto2? Upgrade to a different wire format? The cost might be too high.
I know proto3 is not 100% compatible with proto2. But I think it shouldn't break important functionality that people may frequently depend on. Maybe the proto3 design chose to get rid of this support for simplicity. However, this breaks an important feature and makes the upgrade from proto2 painful.

@lostindark
Author

@xfxyjwf Is there any update on the discussion?

@christian-storm

Thanks for replying @lostindark. I thought my comment may have fallen on deaf ears. I'm actually coming to this from a new project perspective. I was attracted to v3 vs v2 because of its tight integration with JSON. Since v3 doesn't support HasXXX I've so far worked around this issue by wrapping fields that may be set to the default value in oneof statements. This is very clumsy and brittle to me. Enums take the cake though. All of my enums have the necessary 0th enum element set to NOT_SET. I still don't get why defaults are needed at all in v3 since they are effectively not supported anymore.

I know defaults aren't set over the wire. One open question that I have is whether fields set to a default should have some sort of has_been_set_bit flipped to true so that the receiver knows to set the value of a seemingly valueless field to the default? That way when you try to access it you get the default value instead of nothing.

@lostindark
Author

@christian-storm From my perspective, proto3 has some design issues.

In the doc it says:

Message fields can be one of the following:
•singular:
a well-formed message can have zero or one of this field (but not more than one).
•repeated:

This makes people think all fields are optional now.
However, the generated code cannot tell whether a field exists, because it can't tell the difference between a missing value and the default value. This means those fields are not optional; they're actually "required". The fields are always there. If you don't set them, they have the default value. The default value is stored as a "nonexistent field" in the wire format, though.

Do people need optional fields? Yes, we do. We don't want wrapper fields (why the extra space? and they are not compatible with proto2), and we don't want the ugly oneof approach.

I hope this will be fixed in proto 3, or else it will be a big problem for people who have similar requirements.

@dopuskh3

We also need this. Adding a oneof to every field that should be nullable is pretty verbose, and a syntactic-sugar version would be fantastic. What do you think?

@anentropic

anentropic commented Aug 29, 2016

message Foo {
  oneof value_oneof {
    int32 value = 1;
  }
}

so, this is the recommended/only way to handle a nullable field?

this is quite a non-obvious usage for oneof, the docs barely mention that it actually means 'one or none of'
https://developers.google.com/protocol-buffers/docs/proto3#oneof

surely nearly everyone designing their first protobuf messages has this question, the docs could give more guidance I think

@anentropic

another possibility I've stumbled on is using a wrapper type, since the default value for message fields is null

so you can:

import "google/protobuf/wrappers.proto";

message Foo {
  google.protobuf.Int32Value value = 1;
}

the available wrappers correspond to the "Well-Known Types" listed in the docs https://developers.google.com/protocol-buffers/docs/reference/google.protobuf although I didn't see it explained anywhere what they're intended to be used for or how to import them

but it seems a bit nicer than the oneof way, because you can't do this with oneof:

message Foo {
  oneof bar {
    int32 value = 1;
  }
  oneof baz {
    int32 value = 2;
  }
}

...the compiler complains about re-use of value label, so you'd want to come up with your own convention like oneof bar { int32 bar_value = 1; } and you have to get/set like bar.bar_value = x

whereas with the wrappers, the field is always just called value so you can get/set like bar.value = x which is a bit nicer

it's annoying that, either way, your nullable fields therefore have different get/set code to other fields

to be honest, if message fields can be unset and effectively have null value I don't understand why all the fields weren't designed that way. default values should be responsibility of the application. it seems dangerous to have eg int32 defaults to 0 when in many domains that is a meaningful value

I am wondering if it makes sense to just use the wrappers for all my fields.

@ckamel

ckamel commented Sep 27, 2016

A question for the protobuf team, if they're around: why not add a new wire type for null values? There are 3 bits reserved in the format, allowing for 8 values, and only 6 are used (2 of which are deprecated).
I'm running into the same issue and the alternative of wrapping every nullable field in a *Value adds a byte to the wire packet for each. In my case, I'm reading from a DB where (almost) any field can be null.

@acozzette
Member

@ckamel, adding a new wire type could work, but I think there are a few big downsides to doing that:

  • We only have two unused wire types left (6 and 7), so if we want to add any new ones in the future we have to be very thrifty with them.
  • Older parsers wouldn't know how to interpret the new wire type, so it would be tricky to roll out the change in a way that doesn't break anything. We would possibly have to do something like update all parsers to understand the new type, wait a year or so for them to be deployed, and only then start serializing in the new format.
  • The proto3 semantics is already specified and being used, and so reintroducing nullability could break a lot of existing code and be disruptive.
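
For context on the wire-type budget discussed above: each field's tag is a varint whose low 3 bits hold the wire type and whose remaining bits hold the field number. The sketch below splits and builds single tags and lists which of the 8 possible wire-type values are unassigned. The helper names are illustrative, and the labels follow the names used in the protobuf encoding documentation.

```python
WIRE_TYPES = {
    0: "VARINT",
    1: "I64",
    2: "LEN",
    3: "SGROUP (deprecated)",
    4: "EGROUP (deprecated)",
    5: "I32",
}


def make_tag(field_number: int, wire_type: int) -> int:
    """Combine a field number and a 3-bit wire type into a tag value."""
    if not 0 <= wire_type <= 7:
        raise ValueError("wire type must fit in 3 bits")
    return (field_number << 3) | wire_type


def split_tag(tag: int) -> tuple:
    """Return (field_number, wire_type) for a decoded tag value."""
    return tag >> 3, tag & 0x07


unused = [wt for wt in range(8) if wt not in WIRE_TYPES]

print(split_tag(0x08))  # (1, 0)
print(split_tag(0x12))  # (2, 2)
print(unused)           # [6, 7]
```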

@ckamel

ckamel commented Sep 29, 2016

That makes sense. I think null would be a worthy cause for one of those unused wire types left :) But if the older parsers can't ignore unknown wire types then rolling this out would be complicated.
I worked around this by wrapping every field in a message, the downside is an extra byte per field whenever it's not null (which is the predominant case).
Thanks, @acozzette!

@dopuskh3

dopuskh3 commented Jan 14, 2017

Hey,

Right now, not having nullable field is a very annoying issue:

  • switching from an encoding that has nullable values to proto can be extra hard and results in wrapping everything in Google wrappers or adding a huge number of boolean fields.
  • using messages for machine learning applications, where you just throw a set of dimensions (the message) into a black box. Not having a null type in this case breaks a lot of such applications.

You will probably argue that this kind of use case should clearly end in adding extra fields to explicitly encode the meaning of a null value, which is the cleanest way to go.
Unfortunately, this type of change can be extra hard to achieve in legacy code when trying to migrate from an encoding like JSON to protobuf. This is clearly a blocker.

@acozzette I understand the above point but let me point out a few things:

  • Adding a nullable type may not break other implementations if the unknown type is simply ignored, which is the most common case.
  • In the current implementation, I propose adding a generated isXXXNull method that returns true when the field is null. Getters would still return the type's default value, remaining backward compatible.
  • Regarding the fact that only 2 free slots are left: I think the lack of nullable support is such a pain for users that the question is worth asking. Plus, the deprecated group start/end values may be reusable at some point.

@qinghui-xu

I think @dopuskh3's suggestion is interesting: adding an isXXXDefined (or isXXXNull) method will not hurt existing code using protobuf 3.

@natbraun

I concur with @dopuskh3, having nullable types in ProtoBuf is a very interesting option. I think people migrating from JSON to ProtoBuf are faced with that problem sooner or later, and I'd love to see a clean way to make the transition smoother.

In some applications, null actually carries information. Consider for instance the time since the last event: if the event actually occurred, then you have an int value, otherwise null.

@acozzette
Member

@dopuskh3 Have you considered using proto2? Proto2 already has generated methods for checking presence like the one you suggested--if you have a field x you can call has_x() to see if it's set (this is C++ but it's similar in other languages). Proto2 is still fully supported and we're still working on new improvements to it with every release.

For proto3 I think currently the best approach is to either use the wrapper types or oneof fields in cases where you need nullability. The oneof option is especially nice because it is wire-compatible with proto2. I would say that if the oneof trick proves to be too awkward, we should spend more time looking into @xfxyjwf's idea above, which would basically be to add some nice syntactic sugar around the oneof trick. This would probably be the easiest way forward because it would be very backward-compatible (both with the wire format and API) and wouldn't require much work since the oneof functionality already exists. Does that sound reasonable?

@nilsonsfj

Five years of pain coming to an end!

@marshallma21

Huzzah, get ready for required optionals again!

// Required
optional int32 foo = 1;

Frustrating but working. I hope this is not required in 3.13.

@yulrizka

Thanks for the effort on this! Has this been released as non-experimental in v3.13.0 (https://github.com/protocolbuffers/protobuf/releases/tag/v3.13.0)? I did not see it on the release page and wanted to double-check.

@aardappel

Funny, we just added very similar support for null scalars (not present, not default) in FlatBuffers: google/flatbuffers#6014

@kriskowal

Will encoders be obliged to write every field they know about, regardless of whether it is equivalent to a zero? That is, will checking for presence of a field indicate whether the write-side knew of the existence of the field?

@acozzette
Member

@yulrizka Proto3 optional is still guarded by an experimental flag in the 3.13 release.

@kriskowal No, encoders with proto3 optional fields will work the same way as optional fields in proto2. The field will be written only if it was present, and the notion of presence is unrelated to whether the value is 0 or not.
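
The two serialization rules described here can be contrasted in a few lines. The sketch below is a hand-rolled illustration for a single non-negative varint field, not the library's encoder. `encode_implicit` and `encode_explicit` are hypothetical names, and `None` is used to model "not present".

```python
from typing import Optional


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def encode_implicit(field_number: int, value: int) -> bytes:
    """No-presence rule: a field equal to its default (0) is not written."""
    if value == 0:
        return b""
    return encode_varint(field_number << 3) + encode_varint(value)


def encode_explicit(field_number: int, value: Optional[int]) -> bytes:
    """Presence rule: write the field whenever it was set, even to 0."""
    if value is None:
        return b""
    return encode_varint(field_number << 3) + encode_varint(value)


print(encode_implicit(1, 0).hex())     # (empty line)
print(encode_explicit(1, 0).hex())     # 0800
print(encode_explicit(1, None).hex())  # (empty line)
```

Under the presence rule the writer decides whether the field appears, so a reader that sees the field knows it was set, whatever its value.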

@smund01

smund01 commented Nov 19, 2020

Is there any targeted future release version for field presence feature to ship without the experimental flag?

@mattwelke

Well this issue was a rollercoaster to read. I'm glad that in the end, we got this feature. I can understand some of the pros of the original proto3 approach that were expressed, notably that it simplifies language generator implementations for languages like Go, and that it makes the wire protocol more efficient for folks who don't need to check for field presence.

I like that the new optional feature, if I'm understanding it correctly, would let folks who want the ability to check for field presence use a clearer definition of the field in their schema. To them, optional makes more sense than some.wrapper.type, and they're willing to take the hit of the few extra bytes that would be needed to convey this information over the wire (it sounds like both optional and existing techniques like wrapper types would both incur extra bytes).

I can't tell yet if I'll take advantage of optional because I think we'll have use cases where we need to check for presence of repeated fields, and the new proto3 optional design document says this won't be supported still. For our use cases, an int array not provided vs. an empty int array have different meaning. So we'll end up having to use wrapper types or something for that. For that reason, we might end up using wrapper types for all our nullable fields for consistency, but I'm still happy to see people have this option instead.

@shwuhk

shwuhk commented Feb 21, 2021

Is the Has* function only generated for scalar value types and not for nested messages?

@mattwelke

@shwuhk As far as I know, there needs to be a way to check for presence of nested messages. Most often with generated code, I've seen this with "has" functions, but I've noticed that it isn't always given to you. For Go, the nested message type ends up being a pointer, which means you can check for presence without a "has" function. You'd check whether the pointer is nil:

message MyMessage {
  google.protobuf.BoolValue my_bool_value = 1;
}

if myMessage.MyBoolValue != nil {
	fmt.Printf("Data provided. It is %v.\n", myMessage.MyBoolValue.Value)
} else {
	fmt.Printf("Data not provided.\n")
}

I'd double check your language's Protobuf documentation and raise an issue in its repo if the ability to check for field presence for non-scalar fields is lacking.

@shwuhk

shwuhk commented Feb 22, 2021

@mattwelke I think the generated code is not consistent across languages. "has" functions are generated in Java code but not in C#, and I am now checking the presence of nested messages just as you said, with == null.

Thanks for your help, you made it very clear!

@lilorsarry

JSON string: [{"Type":"String","tag":1,"value":"root"}]
Language: C++
An error occurred when using JsonStringToMessage. Error message: Root element must be a message.
What can I do?

@jskeet
Contributor

jskeet commented Apr 10, 2023

@lilorsarry: That appears to be unrelated to this issue. Please open a new issue, providing significantly more detail (ideally a short but complete program to reproduce the problem).
