Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Map<_,type> appears to fail for zero types #99

Closed
UserAB1236872 opened this issue May 2, 2018 · 14 comments
Closed

Map<_,type> appears to fail for zero types #99

UserAB1236872 opened this issue May 2, 2018 · 14 comments

Comments

@UserAB1236872
Copy link

UserAB1236872 commented May 2, 2018

I'll be honest, I'm not 100% certain if this is a prost issue of an issue with the Javascript library, but since I'm using the one included in Google's own tree:

https://github.com/google/protobuf/tree/master/js

I'm inclined to try here first. The issue is with the following code:

message Foo {
    map<string, int> bar = 1;
}

In Rust

let mut x = Foo::default();
x.bar.insert("val", 0);

Now encode and send it somewhere. Prost appears to only write the key, and not the subsequent value of 0. At least in Javascript, this causes a panic due to receiving a key with a missing value, leading me to have to use string types and conversions back and forth from numbers in order to bypass this.

This applies to all zero types, floats/doubles do this on 0.0, bools do this on false.

It's possible the specification allows this and the Javascript is in error, but again, I'm trying here first since I think this is more likely a prost bug.

@danburkert
Copy link
Collaborator

The language guide says that map fields are treated as repeated key/value message types, so I believe prost is at least not wrong to be making this optimization. Have you tried reproducing this against one of the more mature implementations like C++ or Java?

@danburkert
Copy link
Collaborator

The map encoding code goes way out of it's way to do this optimization, so at some point in the past I decided this was an important optimization, although I can't remember why now. Perhaps it's checked in the conformance tests.

@per-gron
Copy link

per-gron commented Jul 5, 2018

With equivalent C++ code, C++ protobuf generates 0a070a0376616c1000 which does include the default value, so I do not think this is required for conformance (I assume C++ protobuf is conformant).

C++ protobuf also accepts 0a070a0368656a as input (without the encoded zero). I think prost is not wrong here.

@UserAB1236872
Copy link
Author

UserAB1236872 commented Jul 7, 2018

I'll close then

(Also, I don't believe C++ can be wrong. It's the reference implementation if I'm not mistaken?)

@danburkert
Copy link
Collaborator

FWIW I'm still not convinced prost should be doing this optimization. It makes the code a bit uglier, and if it causes compat issues then I think we should consider removing it.

@per-gron
Copy link

A guess as to why you could have thought of this optimization (or whatever you'd like to call it) as important: Because of Prost's way to represent messages as plain Rust structs, if this optimization wouldn't be there, a proto message with many int fields would take noticeable amount of space when serialized even when the object is just Default::default(). This is not the case for Java or C++ default constructed messages.

Not saying that it has to be kept, but that seems like a somewhat useful feature to me.

@danburkert danburkert reopened this Aug 2, 2018
danburkert added a commit that referenced this issue Aug 2, 2018
Removes an optimization that skipped encoding default map values.

fixes #99
@danburkert
Copy link
Collaborator

I looked into what it would look like to change the behavior so that map keys are always encoded. It ends up making some of the most complicated code in prost simpler: 0753974. I'm somewhat inclined to merge it given the simplifications and compat issues. I'd really like to see this corner case added to the upstream conformance test suite, since merging this means prost will no longer be able to test that it can correctly decode such maps (unless some examples are hardcoded).

I'm curious if y'all have thoughts.

@danburkert
Copy link
Collaborator

A guess as to why you could have thought of this optimization (or whatever you'd like to call it) as important: Because of Prost's way to represent messages as plain Rust structs, if this optimization wouldn't be there, a proto message with many int fields would take noticeable amount of space when serialized even when the object is just Default::default(). This is not the case for Java or C++ default constructed messages.

That's definitely the case for proto3 default visibility fields, and prost does the exact same optimization, but the code that does that is distinct from map encoding/decoding, so the optimization can be backed out for maps without affecting proto3 messages. My gut is that this optimization really doesn't matter for maps much, since it will be relatively rare to have default values. It's definitely important for proto3 default visibility fields, though, since those will be the default much more often.

@brentnk
Copy link

brentnk commented Apr 25, 2021

Fwiw, this breaks with the the following configuration: [server: tonic/prost] -> [envoy] -> [dart/flutter using grpc-web]. Dart experiences a Code 15 Unrecoverable data loss or corruption.

A go server implementing the same grpc service responds with the encoded 0 (confirmed with wireshark) for my map<int32, OtherMessage> field and is decoded properly. Using index +/- 1 on both sides is a good enough workaround for now.

@osa1
Copy link

osa1 commented Aug 26, 2022

Prost appears to only write the key, and not the subsequent value of 0.

This is the right thing to do according to proto spec: https://github.com/protocolbuffers/protobuf/blob/main/docs/field_presence.md#presence-in-tag-value-stream-wire-format-serialization

When serializing, fields with no presence are not serialized if they contain their default value.

Encoding of map fields are specified in https://developers.google.com/protocol-buffers/docs/proto#maps

In proto syntax, if you have a map field like

map<int32, MyMapValue> my_map_field = 123;

It's encoded as if it was

repeated MyMapFieldEntry my_map_field = 123;

where

message MyMapFieldEntry {
    optional int32 key = 1;
    optional MyMapValue value = 2;
}

In the original example, map value type is int (which is not a valid proto type, I'm guessing the OP meant one of the valid integer types like int32 or int64), and because the default value for integer types is 0, prost is doing the right thing by omitting the value. If the receiving end fails to read the message then it's a bug in the protobuf implementation running on the receiving end.

For Dart protobuf implementation, we fix the decoding for map fields to handle this case in google/protobuf.dart#745.

Note that since both key and value fields of map entries are optional, it's also possible to omit the key field. For example

message Foo {
    map<int32, int32> bar = 1;
}
let mut x = Foo::default();
x.bar.insert(0, 0);

A spec compliant implementation should encode x as

[
    (1 << 3) | 2, // tag = 1, wire type = length delimited
    0, // length = 0, i.e. empty entry message
]

@LucioFranco
Copy link
Member

@osa1 thanks for the explanation, so I guess we can close this now since this isn't really a bug with prost?

@osa1
Copy link

osa1 commented Aug 26, 2022

Thinking more about this, I'm not as confident now that this is not a bug.

In the spec I quoted in my comment above:

When serializing, fields with no presence are not serialized if they contain their default value.

Here I missed the "fields with no presence" part. optional enables presence. So in this encoding shown in https://developers.google.com/protocol-buffers/docs/proto#maps:

message MapFieldEntry {
  optional key_type key = 1;
  optional value_type value = 2;
}

repeated MapFieldEntry map_field = N;

Not encoding a default value (in key or value fields) is actually different than encoding it, because key and value fields are optional, so they have presence checks. (i.e. in addition to reading their values, you can check whether they exist in a message)

Now, the question is what happens on the receiving end when you get a map entry with missing key, value, or both key and value fields, and you represent proto map fields as actual an actual map (rather than a list of entries) in your language. Protobuf spec does not mention this. However, the behavior should be consistent with the "desugaring" shown above (where map is represented as repeated MapFieldEntry), and for that we need to use the default values of key and value fields when they are missing, because that's what we would do if the map field was actually repeated MapFieldEntry.

Unfortunately, protobuf spec does not specify what the default values should be for optional Message fields. I did a survey of this a while ago: google/protobuf.dart#309 (comment) you can see that different implementations return different values.

What this means is, if you have a map<int32, MyMessage> field, and omit a MyMessage value because it's the same as default message, on the receiving end, some implementations will read the map entry as 123 => null, others read it as 123 => MyMessage() (i.e. value is empty/default message).

To avoid this problem prost may want to serialize the value fields always. I'm not sure if not doing it is a bug or not.

@osa1
Copy link

osa1 commented Sep 1, 2022

I asked this to protobuf team (in context of google/protobuf.dart#309).

It's fine if an implementation omits key or value of a map entry when the value is the default for the field type.

So prost's behavior seems fine, and if an implementation cannot handle it then it's a bug in that implementation. (this is what we fix in the PR linked above)

That said, official protobuf implementation will never omit key or value fields, even when they have the default value for the type: https://github.com/protocolbuffers/protobuf/blob/8d3a7327606b115dda871be54c8b8cb66d41ff90/src/google/protobuf/wire_format.cc#L1184-L1203

I'm guessing other protobuf implementation by the protobuf team (e.g. the JS one linked in the bug report) also do the same, and that's why they cannot handle when key or value fields are missing. When sending they always serialize those fields so the case where one or both of the fields are missing is not considered and tested.

@LucioFranco
Copy link
Member

Ok thanks for the update, I will close this issue since the implementation seems correct.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants