Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parsing from proto should keep field ID. #7645

Closed
dbaileychess opened this issue Nov 16, 2022 · 15 comments
Closed

Parsing from proto should keep field ID. #7645

dbaileychess opened this issue Nov 16, 2022 · 15 comments

Comments

@dbaileychess
Copy link
Collaborator

When parsing from a proto that have explicit field identifiers, flatbuffers need to keep those identifiers stable by explicitly using the id attribute. This is because protos allow new field placement anywhere in their message, even before other fields, as long as they use a new field ID.

Example:

message Foo {
  optional int32 baz = 2;
  optional int32 bar = 1;
}
@enum-class
Copy link
Contributor

would you mind if I do it ?

@dbaileychess
Copy link
Collaborator Author

@enum-class Sure, I always welcome PRs

@dbaileychess
Copy link
Collaborator Author

Ok, I have given this a bit more thought.

We should map proto-field-ids to flatbuffer-ids in the following way:

  • Sort all the proto-field-ids by their numerical value, and stack from low-to-high.
  • Generate a flatbuffer-id for each of those fields, starting at 0 and going up consecutively
  • Be mindful of union types that have a hidden type field that needs to be placed.
  • We should (if possible) emit the flatbuffer fields in the same order that the proto fields are ordered within the file, disregarding the field-id. This keeps the logical grouping of fields consistent.

We will support gaps in proto-field-ids by just ignoring the skipped values.

For conformity-sake, if someone adds a proto-field-id in an existing gap, flatc will generate a valid .fbs schema with the above rules. However, this schema won't be a proper evolved schema from its prior history. The user is expected to check this with the --conform test of flatc on the two generated .fbs files. No explicit support for protos within--conform is needed; users are expected to compare the generated .fbs schema instead.

To aid the user at this possibility of non-conformity, a new flag should be added: --disallow-proto-field-gaps that defaults to true. When this flag is true, if there are any gaps in the proto-field-ids an error will be emitted and no .fbs will be generated. When this flag is false, only an warning is generated and the .fbs will be generated. (I'm open to making this flag an enum with warn, error, no-op or something, if we want more control).

@enum-class Please see above for edits to your PR.

@aardappel for any opinions.

@aardappel
Copy link
Collaborator

This generally seems ok, though I wonder if it is worth it to (by default) require the previous .fbs schema, i.e. it is effectively going to run --conform for you (with a way to disable that of course). I fear if you don't, this idea of having to run --conform separately is going to get forgotten, and people will end up with subtle bugs of old serialized data being read with an incompatible schema.

Alternatively, some clear message saying .proto converted to .fbs, please check with --conform before using this schema or similar?

Or, if you specify --proto, that it works with you specifying --conform in the same command-line, and if you don't, big shouty warning --proto ran without --conform, new .fbs not guaranteed to be compatible with past conversions! or similar.

@enum-class
Copy link
Contributor

  • Sort all the proto-field-ids by their numerical value, and stack from low-to-high.

What about oneof (union) that does not have id ?

@enum-class
Copy link
Contributor

--disallow-proto-field-gaps

As we sort and map the ids, what is the point of this config flag ?

@dbaileychess
Copy link
Collaborator Author

Sort all the proto-field-ids by their numerical value, and stack from low-to-high.
What about oneof (union) that does not have id ?

I think we should just ignore that type. We don't have a similar concept in flatbuffers other than unions. But they are not the same thing.

--disallow-proto-field-gaps
As we sort and map the ids, what is the point of this config flag ?

This is to alert the user that they could introduce issues in the future. If you have:

field_a = 1;
field_b = 1000;

And later do

field_a = 1;
field_b = 1000;

field_c = 10

That would lead to the following flatbuffer schema:

a (id = 0)
b (id = 1)

and

a (id = 0)
c (id = 1)
b (id = 2)

Those two schema are "valid" on their own, but are not conforming to each other. So the flag is alerting the user that this potential could happen in the future.

@dbaileychess
Copy link
Collaborator Author

though I wonder if it is worth it to (by default) require the previous .fbs schema, i.e. it is effectively going to run --conform for you (with a way to disable that of course). I fear if you don't, this idea of having to run --conform separately is going to get forgotten, and people will end up with subtle bugs of old serialized data being read with an incompatible schema.

Well, on the first conversion from proto->fbs, you wouldn't have a previous schema. That could probably be fixed with a flag to explicitly state "--first-proto-conversion" or whatnot.

But, the later about forgetting to include --conform, is that we don't currently put this burden on normal users of flatbuffers. I.e., someone could evolve a fbs poorly and never run --conform .

Alternatively, some clear message saying .proto converted to .fbs, please check with --conform before using this schema or similar?

Yes, but this is kind of the purpose of the --disallow-proto-field-gaps flag. We only have to warn in situations where there is a potential for making incompatible schema changes.

@enum-class
Copy link
Contributor

enum-class commented Dec 6, 2022

I think we should just ignore that type. We don't have a similar concept in flatbuffers other than unions. But they are not the same thing.

What do you mean by ignore them ? I mean what id should we assign to union (oneof) type ?
As they do not have id I already consider them "0", so they will be assigned to the minimum possible id in the struct. Do you agree with this approach ?

@enum-class
Copy link
Contributor

This is to alert the user that they could introduce issues in the future. If you have:

It sounds reasonable.
And what should we do for reserved id ? In protobuf, if reserved id be used, it will alert. should we alert something similar ?

@aardappel
Copy link
Collaborator

we don't currently put this burden on normal users of flatbuffers. I.e., someone could evolve a fbs poorly and never run --conform

I think its a lot easier to run into incompatibility issues when the a .proto is the source of truth than when a .fbs is.

@fergushenderson
Copy link
Contributor

While I am firmly in favour of addressing this issue, I realize that the changes to take proto field numbers into account may actually break binary backwards compatibility for existing users of the --proto flag, by generating a .fbs schema file that doesn't conform with the .fbs schema file generated by previous versions of flatbuffers.

So I think we may need to have a --legacy-ignore-proto-field-numbers flag to preserve binary backwards compatibility in those cases?

@dbaileychess
Copy link
Collaborator Author

I think we should just ignore that type. We don't have a similar concept in flatbuffers other than unions. But they are not the same thing.
What do you mean by ignore them ? I mean what id should we assign to union (oneof) type ?
As they do not have id I already consider them "0", so they will be assigned to the minimum possible id in the struct. Do you agree with this approach ?

@enum-class

The fields within a one-of are just normal fields with their respective ids. Protobufs just ensures if that if one is set, the other fields are not-set. We don't have that mechanism in flatbuffers. So by ignoring that type, I just mean process the inner types as normal fields and disregard that they are wrapped in a oneof.

we don't currently put this burden on normal users of flatbuffers. I.e., someone could evolve a fbs poorly and never run --conform
I think its a lot easier to run into incompatibility issues when the a .proto is the source of truth than when a .fbs is.

@aardappel

True, i'm fine having --conform be required with --proto, as long as their is a flag to disable this.

While I am firmly in favour of addressing this issue, I realize that the changes to take proto field numbers into account may actually break binary backwards compatibility for existing users of the --proto flag, by generating a .fbs schema file that doesn't conform with the .fbs schema file generated by previous versions of flatbuffers.
So I think we may need to have a --legacy-ignore-proto-field-numbers flag to preserve binary backwards compatibility in those cases?

@fergushenderson

Yeah, we have to make sure that existing users are not broken. I think the legacy flag is appropriate, though I would want to default to off.

@aardappel
Copy link
Collaborator

True, i'm fine having --conform be required with --proto, as long as their is a flag to disable this.

Rather than requiring it, with a flag to turn the error off, I suggested a clear warning (that you can ignore at your peril).

@enum-class
Copy link
Contributor

In conclusion:

  1. Sort all the proto-field-ids by their numerical value, and stack from low-to-high.
  2. As oneof elements in protobuf which is equal to union in flatbuffer, do not have id we assign them the least possible id to them. (Be mindful of union types that have a hidden type field that needs to be placed.)
  3. Check the assigned id in the proto file not to be used twice in a message, or not to be missed.
  4. Check the reserved id in proto not to be used
  5. Map fields id to new id that is consecutive from 0.
  6. Keep flatbuffer fields in the same order that the proto fields are ordered within the file, disregarding the field-id. This keeps the logical grouping of fields consistent.
  7. Need to add --disallow-proto-field-gaps flag (default to no-op). if there are any gaps in the ids In warning mode it will emit a warning and generate fbs and in Error mode show error messages and not generate fbs.
  8. Need to add –keep-proto-id (default to false) for backward compatibility
  9. And about --conform, I don’t know :)) ??! Maybe --conform be required with --proto

cc: @aardappel @dbaileychess @fergushenderson

copybara-service bot pushed a commit to tensorflow/tensorflow that referenced this issue Dec 8, 2022
This one is designed to catch ABI breakage from possible future changes
to the FlatBuffer compiler's proto to FlatBuffer schema conversion
(e.g. see google/flatbuffers#7645).
To do that, this CL checks in a snapshot of the .fbs file generated by
the FlatBuffer compiler as of this CL, and then at build/test time we
generate a new .fbs file, which will potentially use a different version
of the FlatBuffer compiler (if/when the FlatBuffer compiler is updated).
Then we use 'flatc --conform' to verify compatibility of those two.

PiperOrigin-RevId: 493867469
dbaileychess added a commit that referenced this issue Mar 15, 2023
* Parsing from proto should keep field ID. (fixes #7645)

* Fix failed tests

* Fix windows warning

* Improve attribute generation in proto to fbs

* Check if id is used twice. fix Some clang-format problems

* Test if fake id can solve the test problem

* Validate proto file in proto -> fbs generation.

* Fix error messages

* Ignore id in union

* Add keep proto id for legacy and check gap flag have been added. Reserved id will be checked.

* Add needed flags

* unit tests

* fix fromat problem. fix comments and error messages.

* clear

* More unit tests

* Fix windows build

* Fix include problems

* Fake commit to invoke rebuild

* Fix buzel build

* Fix some issues

* Fix comments, fix return value and sort for android NDK

* Fix return type

* Break down big function

* Place todo

---------

Co-authored-by: Derek Bailey <derekbailey@google.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants