Parsing from `proto` should keep field ID. #7645

dbaileychess · 2022-11-16T03:18:10Z

When parsing from a proto that have explicit field identifiers, flatbuffers need to keep those identifiers stable by explicitly using the id attribute. This is because protos allow new field placement anywhere in their message, even before other fields, as long as they use a new field ID.

Example:

message Foo {
  optional int32 baz = 2;
  optional int32 bar = 1;
}

The text was updated successfully, but these errors were encountered:

enum-class · 2022-11-18T03:58:41Z

would you mind if I do it ?

dbaileychess · 2022-11-18T04:23:27Z

@enum-class Sure, I always welcome PRs

dbaileychess · 2022-12-05T18:03:22Z

Ok, I have given this a bit more thought.

We should map proto-field-ids to flatbuffer-ids in the following way:

Sort all the proto-field-ids by their numerical value, and stack from low-to-high.
Generate a flatbuffer-id for each of those fields, starting at 0 and going up consecutively
Be mindful of union types that have a hidden type field that needs to be placed.
We should (if possible) emit the flatbuffer fields in the same order that the proto fields are ordered within the file, disregarding the field-id. This keeps the logical grouping of fields consistent.

We will support gaps in proto-field-ids by just ignoring the skipped values.

For conformity-sake, if someone adds a proto-field-id in an existing gap, flatc will generate a valid .fbs schema with the above rules. However, this schema won't be a proper evolved schema from its prior history. The user is expected to check this with the --conform test of flatc on the two generated .fbs files. No explicit support for protos within--conform is needed; users are expected to compare the generated .fbs schema instead.

To aid the user at this possibility of non-conformity, a new flag should be added: --disallow-proto-field-gaps that defaults to true. When this flag is true, if there are any gaps in the proto-field-ids an error will be emitted and no .fbs will be generated. When this flag is false, only an warning is generated and the .fbs will be generated. (I'm open to making this flag an enum with warn, error, no-op or something, if we want more control).

@enum-class Please see above for edits to your PR.

@aardappel for any opinions.

aardappel · 2022-12-05T18:21:36Z

This generally seems ok, though I wonder if it is worth it to (by default) require the previous .fbs schema, i.e. it is effectively going to run --conform for you (with a way to disable that of course). I fear if you don't, this idea of having to run --conform separately is going to get forgotten, and people will end up with subtle bugs of old serialized data being read with an incompatible schema.

Alternatively, some clear message saying .proto converted to .fbs, please check with --conform before using this schema or similar?

Or, if you specify --proto, that it works with you specifying --conform in the same command-line, and if you don't, big shouty warning --proto ran without --conform, new .fbs not guaranteed to be compatible with past conversions! or similar.

enum-class · 2022-12-06T00:22:30Z

Sort all the proto-field-ids by their numerical value, and stack from low-to-high.

What about oneof (union) that does not have id ?

enum-class · 2022-12-06T00:25:20Z

--disallow-proto-field-gaps

As we sort and map the ids, what is the point of this config flag ?

dbaileychess · 2022-12-06T00:48:28Z

Sort all the proto-field-ids by their numerical value, and stack from low-to-high.
What about oneof (union) that does not have id ?

I think we should just ignore that type. We don't have a similar concept in flatbuffers other than unions. But they are not the same thing.

--disallow-proto-field-gaps
As we sort and map the ids, what is the point of this config flag ?

This is to alert the user that they could introduce issues in the future. If you have:

field_a = 1;
field_b = 1000;

And later do

field_a = 1;
field_b = 1000;

field_c = 10

That would lead to the following flatbuffer schema:

a (id = 0)
b (id = 1)

and

a (id = 0)
c (id = 1)
b (id = 2)

Those two schema are "valid" on their own, but are not conforming to each other. So the flag is alerting the user that this potential could happen in the future.

dbaileychess · 2022-12-06T00:54:44Z

though I wonder if it is worth it to (by default) require the previous .fbs schema, i.e. it is effectively going to run --conform for you (with a way to disable that of course). I fear if you don't, this idea of having to run --conform separately is going to get forgotten, and people will end up with subtle bugs of old serialized data being read with an incompatible schema.

Well, on the first conversion from proto->fbs, you wouldn't have a previous schema. That could probably be fixed with a flag to explicitly state "--first-proto-conversion" or whatnot.

But, the later about forgetting to include --conform, is that we don't currently put this burden on normal users of flatbuffers. I.e., someone could evolve a fbs poorly and never run --conform .

Alternatively, some clear message saying .proto converted to .fbs, please check with --conform before using this schema or similar?

Yes, but this is kind of the purpose of the --disallow-proto-field-gaps flag. We only have to warn in situations where there is a potential for making incompatible schema changes.

enum-class · 2022-12-06T01:31:38Z

I think we should just ignore that type. We don't have a similar concept in flatbuffers other than unions. But they are not the same thing.

What do you mean by ignore them ? I mean what id should we assign to union (oneof) type ?
As they do not have id I already consider them "0", so they will be assigned to the minimum possible id in the struct. Do you agree with this approach ?

enum-class · 2022-12-06T01:34:47Z

This is to alert the user that they could introduce issues in the future. If you have:

It sounds reasonable.
And what should we do for reserved id ? In protobuf, if reserved id be used, it will alert. should we alert something similar ?

aardappel · 2022-12-06T06:28:41Z

we don't currently put this burden on normal users of flatbuffers. I.e., someone could evolve a fbs poorly and never run --conform

I think its a lot easier to run into incompatibility issues when the a .proto is the source of truth than when a .fbs is.

fergushenderson · 2022-12-06T11:46:24Z

While I am firmly in favour of addressing this issue, I realize that the changes to take proto field numbers into account may actually break binary backwards compatibility for existing users of the --proto flag, by generating a .fbs schema file that doesn't conform with the .fbs schema file generated by previous versions of flatbuffers.

So I think we may need to have a --legacy-ignore-proto-field-numbers flag to preserve binary backwards compatibility in those cases?

dbaileychess · 2022-12-06T18:54:22Z

I think we should just ignore that type. We don't have a similar concept in flatbuffers other than unions. But they are not the same thing.
What do you mean by ignore them ? I mean what id should we assign to union (oneof) type ?
As they do not have id I already consider them "0", so they will be assigned to the minimum possible id in the struct. Do you agree with this approach ?

@enum-class

The fields within a one-of are just normal fields with their respective ids. Protobufs just ensures if that if one is set, the other fields are not-set. We don't have that mechanism in flatbuffers. So by ignoring that type, I just mean process the inner types as normal fields and disregard that they are wrapped in a oneof.

we don't currently put this burden on normal users of flatbuffers. I.e., someone could evolve a fbs poorly and never run --conform
I think its a lot easier to run into incompatibility issues when the a .proto is the source of truth than when a .fbs is.

@aardappel

True, i'm fine having --conform be required with --proto, as long as their is a flag to disable this.

While I am firmly in favour of addressing this issue, I realize that the changes to take proto field numbers into account may actually break binary backwards compatibility for existing users of the --proto flag, by generating a .fbs schema file that doesn't conform with the .fbs schema file generated by previous versions of flatbuffers.
So I think we may need to have a --legacy-ignore-proto-field-numbers flag to preserve binary backwards compatibility in those cases?

@fergushenderson

Yeah, we have to make sure that existing users are not broken. I think the legacy flag is appropriate, though I would want to default to off.

aardappel · 2022-12-06T19:08:05Z

True, i'm fine having --conform be required with --proto, as long as their is a flag to disable this.

Rather than requiring it, with a flag to turn the error off, I suggested a clear warning (that you can ignore at your peril).

enum-class · 2022-12-07T04:56:15Z

In conclusion:

Sort all the proto-field-ids by their numerical value, and stack from low-to-high.
As oneof elements in protobuf which is equal to union in flatbuffer, do not have id we assign them the least possible id to them. (Be mindful of union types that have a hidden type field that needs to be placed.)
Check the assigned id in the proto file not to be used twice in a message, or not to be missed.
Check the reserved id in proto not to be used
Map fields id to new id that is consecutive from 0.
Keep flatbuffer fields in the same order that the proto fields are ordered within the file, disregarding the field-id. This keeps the logical grouping of fields consistent.
Need to add --disallow-proto-field-gaps flag (default to no-op). if there are any gaps in the ids In warning mode it will emit a warning and generate fbs and in Error mode show error messages and not generate fbs.
Need to add –keep-proto-id (default to false) for backward compatibility
And about --conform, I don’t know :)) ??! Maybe --conform be required with --proto

cc: @aardappel @dbaileychess @fergushenderson

This one is designed to catch ABI breakage from possible future changes to the FlatBuffer compiler's proto to FlatBuffer schema conversion (e.g. see google/flatbuffers#7645). To do that, this CL checks in a snapshot of the .fbs file generated by the FlatBuffer compiler as of this CL, and then at build/test time we generate a new .fbs file, which will potentially use a different version of the FlatBuffer compiler (if/when the FlatBuffer compiler is updated). Then we use 'flatc --conform' to verify compatibility of those two. PiperOrigin-RevId: 493867469

* Parsing from proto should keep field ID. (fixes #7645) * Fix failed tests * Fix windows warning * Improve attribute generation in proto to fbs * Check if id is used twice. fix Some clang-format problems * Test if fake id can solve the test problem * Validate proto file in proto -> fbs generation. * Fix error messages * Ignore id in union * Add keep proto id for legacy and check gap flag have been added. Reserved id will be checked. * Add needed flags * unit tests * fix fromat problem. fix comments and error messages. * clear * More unit tests * Fix windows build * Fix include problems * Fake commit to invoke rebuild * Fix buzel build * Fix some issues * Fix comments, fix return value and sort for android NDK * Fix return type * Break down big function * Place todo --------- Co-authored-by: Derek Bailey <derekbailey@google.com>

enum-class added a commit to enum-class/flatbuffers that referenced this issue Nov 20, 2022

Parsing from proto should keep field ID. (fixes google#7645)

43a2d81

enum-class mentioned this issue Nov 20, 2022

Parsing from proto should keep field ID. (fixes #7645) #7655

Merged

dbaileychess closed this as completed in f838017 Feb 1, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Parsing from `proto` should keep field ID. #7645

Parsing from `proto` should keep field ID. #7645

dbaileychess commented Nov 16, 2022

enum-class commented Nov 18, 2022

dbaileychess commented Nov 18, 2022

dbaileychess commented Dec 5, 2022

aardappel commented Dec 5, 2022

enum-class commented Dec 6, 2022

enum-class commented Dec 6, 2022

dbaileychess commented Dec 6, 2022

dbaileychess commented Dec 6, 2022

enum-class commented Dec 6, 2022 •

edited

Loading

enum-class commented Dec 6, 2022

aardappel commented Dec 6, 2022

fergushenderson commented Dec 6, 2022

dbaileychess commented Dec 6, 2022

aardappel commented Dec 6, 2022

enum-class commented Dec 7, 2022

Parsing from proto should keep field ID. #7645

Parsing from proto should keep field ID. #7645

Comments

dbaileychess commented Nov 16, 2022

enum-class commented Nov 18, 2022

dbaileychess commented Nov 18, 2022

dbaileychess commented Dec 5, 2022

aardappel commented Dec 5, 2022

enum-class commented Dec 6, 2022

enum-class commented Dec 6, 2022

dbaileychess commented Dec 6, 2022

dbaileychess commented Dec 6, 2022

enum-class commented Dec 6, 2022 • edited Loading

enum-class commented Dec 6, 2022

aardappel commented Dec 6, 2022

fergushenderson commented Dec 6, 2022

dbaileychess commented Dec 6, 2022

aardappel commented Dec 6, 2022

enum-class commented Dec 7, 2022

Parsing from `proto` should keep field ID. #7645

Parsing from `proto` should keep field ID. #7645

enum-class commented Dec 6, 2022 •

edited

Loading