-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RFD-139. Gogo Must Go. #28386
RFD-139. Gogo Must Go. #28386
Conversation
This describes an initial RFD for removing Gogo from Teleport.
rfd/0139-gogo-must-go.md
Outdated
- Allows us to remove gogoproto extensions without affecting backend storage. | ||
- Allows us to utilize fields in the backend more efficiently and store larger objects. | ||
|
||
The primary disadvantage here is that manually examining the backend is more difficult as |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think "examining" isn't the only concern here - sometimes during incidents we need to perform surgery on the Teleport store in order to be able to bring something back online.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some time ago, @strideynet and I took a look at a hidden protobuf tag called json_name
that when used with jsonpb allows you to do the same as gogo proto JSON but unfortunately it does not support structure embedding which we also abuse a lot
|
||
##### `gogoproto.stdtime` | ||
|
||
We will need to migrate to using `timestamppb` directly instead of relying on gogo's `stdtime` tag. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this migration something we can start doing now ? Would this be a wire-compatible change ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFAIK all of these changes should be wire compatible. This is something that we can do now (I'm putting together a POC for it as we speak, actually).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're not allowed to break protobuf wire compatibility anyway. 😎
rfd/0139-gogo-must-go.md
Outdated
##### `gogoproto.nullable` | ||
|
||
We will need to remove the `gogoproto.nullable` tags, which overall shouldn't be a huge impact. | ||
We should migrate to using `optional` as defined [here](https://protobuf.dev/programming-guides/proto3/#field-labels). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know support for optional
hasn't been in proto3
since initiation - and I've not used it yet with proto3
. If we can't use it, then we can use the google provided wrappers: https://github.com/protocolbuffers/protobuf/blob/main/src/google/protobuf/wrappers.proto
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nullable tags are actually client side only and they are null if they are not set in proto. This happens when the value is the default value for the type so using a wrapper isn't an option as well because they change the wire representation
If a int field has zero value, it won't be marshalled to proto
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh yuck, I didn't even think about the fact that optional
changes the wire representation. Yeah, this won't work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, all the same, CheckAndSetDefaults
will realistically handle this for most structures. I think we can probably get away with just removing nullable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, nullable has no effect so we can handle internally
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think y'all got this backwards - optional
is the default for all proto3 fields. It is required
that got dropped between proto2 and proto3, because as it turns out required
was a terrible idea to begin with.
I don't recommend writing optional
before our proto fields, and strongly not recommend messing with required
(since I'm mentioning it). None of this changes the wire format - that is something that just doesn't happen with Google protobuf.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not as simple as it is on the tin. You are correct that proto3
gets rid of required
and leans towards what could be called optional
fields. However, an optional
keyword was introduced in proto v3.15. With the Go generator, this usually yields a pointer to the expected type w/ something like:
message MyMessage {
optional string foo = 1;
}
type MyMessage struct {
// various pb fields snipped
Foo *string
}
I can't say I'm a huge fan of pointers being used to indicate the presence/non-presence of a field.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd hoped we learned our lesson about custom generator options by now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
optional
is a perfectly legitimate feature (especially on scalar fields where you wouldn't otherwise get any "presence" info), but it's not supported by the javascript/typescript generator we're using, and gogoproto makes terrible code for it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My usual advice would be to not rely on field presence - ideally you shouldn't care whether it's empty, nil, set explicitly, a default value, etc. If you do care about presence, then it's best to design in a way that makes it explicitly clear. A good example is using a FieldMask to inform partial updates - https://cloud.google.com/apis/design/standard_methods#update.
rfd/0139-gogo-must-go.md
Outdated
- Allows us to remove gogoproto extensions without affecting backend storage. | ||
- Allows us to utilize fields in the backend more efficiently and store larger objects. | ||
|
||
The primary disadvantage here is that manually examining the backend is more difficult as |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another option here, as explored in devicetrust
recently, is to write separate Go structs for storage, rather than marshaling/unmarshaling pb messages - https://github.com/gravitational/teleport.e/blob/master/lib/devicetrust/storage/types.go
This is something we can probably undertake gradually as an option if we wanted to break away from using the same types for storage as we do for RPC. There's various arguments for/against this which might be worth investigating.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm obviously biased, but I think this is the way to go - in fact, I think it's almost always the better way to go, as having API representations dictate storage is quite limiting (point in case this RFD section).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this idea. I'll update the RFD to reflect this.
rfd/0139-gogo-must-go.md
Outdated
- Allows us to remove gogoproto extensions without affecting backend storage. | ||
- Allows us to utilize fields in the backend more efficiently and store larger objects. | ||
|
||
The primary disadvantage here is that manually examining the backend is more difficult as |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm obviously biased, but I think this is the way to go - in fact, I think it's almost always the better way to go, as having API representations dictate storage is quite limiting (point in case this RFD section).
rfd/0139-gogo-must-go.md
Outdated
##### `gogoproto.nullable` | ||
|
||
We will need to remove the `gogoproto.nullable` tags, which overall shouldn't be a huge impact. | ||
We should migrate to using `optional` as defined [here](https://protobuf.dev/programming-guides/proto3/#field-labels). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think y'all got this backwards - optional
is the default for all proto3 fields. It is required
that got dropped between proto2 and proto3, because as it turns out required
was a terrible idea to begin with.
I don't recommend writing optional
before our proto fields, and strongly not recommend messing with required
(since I'm mentioning it). None of this changes the wire format - that is something that just doesn't happen with Google protobuf.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there are broadly 2 ways to go about this:
-
We stay with gogo, or remain gogo-compatible. In this path we do the least bit of work in Teleport and either fork gogo ourselves (which we might be forced to if we do nothing for too long) or adopt an active gogo fork. (Not saying I like this, but it's an option to be considered.)
-
We move away from Gogo, likely to plain Google protobuf. In this path we systematically rid ourselves of Gogo extensions until, eventually, we can move out of it. This is more work, but has the potential to land us in a close-to-ideal situation in the future (this is a no-brainer for a new project, for example). We'll need to map problems and solutions clearly so everyone can chime in as we do it.
I lean towards this approach
Because it fixes several problems while moving away from gogo.
In theory, it takes more work but thinking about the future it can save us 2 major refactors |
|
||
##### Write custom marshalers for objects | ||
|
||
In order to keep our user facing representations of the objects, we will need to write | ||
custom JSON and YAML marshalers for our objects. This may be time intensive but not overall | ||
a significant amount of work. | ||
|
||
#### Embedding types (removing `gogoproto.embed`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes like this will cause Go-level incompatible changes in api/.
I think we should discuss whether these can be avoided and, if not (or not worth it), what kind of "migration" path we are thinking of implementing. (Ie, how we'll deal with customer breakages and communication, potential e/ breakages, etc.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hrm. I'm not sure we can avoid breakages without keeping gogo intact, or at least a segment of Teleport that still maintains has it. Even getting rid of nullable
will cause a Go level breakage here. Maybe we could have an api-legacy
that we maintain for 3 versions and then remove?
I don't think we can avoid e breakages without keeping the object interfaces intact, or without doing some deep wizardry on the backend service interfaces, which sounds hideous.
Customer breakages should definitely be well communicated, but I'm not sure of the best mechanism there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Communication and a plan seem good enough in this case. It's worth noting that this could cause pains for a few versions while everything is moved to new formats.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we currently care much about code-level incompatibility in the API module, fwiw.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we caused breakages for customers a few times because of API changes. @zmb3 for feedback on how much we should worry about this.
|
||
##### Write custom marshalers for objects | ||
|
||
In order to keep our user facing representations of the objects, we will need to write | ||
custom JSON and YAML marshalers for our objects. This may be time intensive but not overall | ||
a significant amount of work. | ||
|
||
#### Embedding types (removing `gogoproto.embed`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we currently care much about code-level incompatibility in the API module, fwiw.
It's imperative that these objects are compatible with the current JSON/YAML representations in Teleport. | ||
That is, the objects that are in Teleport today should unmarshal properly into our new internal objects. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we freeze the current resource types and require a version bump (to a "new world" version that uses protojson in the backend and a dedicated service for RPCs) for any future change, to avoid accidental mistakes slipping through, or do you think we can chug along for a while manually adding new fields to the conversion functions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That seems like a good idea. I'm less sold on the protojson
purely for the reasons I described in #28386 (comment).
|
||
##### `gogoproto.stdtime` | ||
|
||
We will need to migrate to using `timestamppb` directly instead of relying on gogo's `stdtime` tag. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're not allowed to break protobuf wire compatibility anyway. 😎
Rather than use gRPC's wire format or serialized JSON format, we should consider writing | ||
a storage object format. This has a number of benefits: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not use protojson for this? It's a canonical encoding of protobuf, and I don't see us abandoning protobuf anytime soon. My guess is that if we require a hand-rolled json struct for this we'll end up writing a json struct that's almost field-by-field compatible with the protobuf counterpart and we'll just deserialize-serialize it to convert it anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see a few reasons for doing this:
- By writing our own storage object, we decouple the API representation of objects from our backend.
- We can ensure backwards compatibility with existing backends with no migration step, though migration isn't terrible.
- We can use these objects subsequently for CLI display and maintain the current k8s-like JSON/YAML structure that we've got.
What do you think?
…o package in example.
Removing myself as a bot assigned reviewer as I won't have anything meaningful to add here. |
Okay, so after some offline discussion, I think a major point of contention is storage objects vs. using I wanted to kick off a discussion about this. Thoughts @strideynet, @codingllama, @espadolini, @tigrato? |
I'm +1 on storage objects, but I also have a suggestion. Something I've done a few times in the past is to keep a 1:1 mapping between API and storage initially, but then refactor to a distinct storage object once there's a reason to deviate. If the main concern is the cost of having separate representations, this is a way around that. I do like the clarity we get from separate representations and don't mind the initial cost, specially for big distributed teams like Teleport, but I've also seen this argument go around enough times. Both alternatives are doable. |
For reference, we had explicit storage definitions for device trust from the start: https://github.com/gravitational/teleport.e/blob/1961fb1e4ec54aab8f6ec014439acf83407ef690/lib/devicetrust/storage/types.go#L14-L25 All conversions are explicit, "regular" Go code. This is a simple example: https://github.com/gravitational/teleport.e/blob/1961fb1e4ec54aab8f6ec014439acf83407ef690/lib/devicetrust/storage/storage.go#L1426-L1443 These "unentangle" API protos from JSON tags (and Gogo), but a strong reason for them is to allow us to read, write and expire certain "parts" of the representation separately from others. |
Can you give an example of what this might look like in practice? |
In Teleport terms we'd The main hurdle here is that we would have to move away from Gogo while keeping the JSON representation 100% compatible, which I imagine is going to be hard or awkward to do for some of the existing types. Selectively applying protojon and custom written types may be the way to go. |
So the general idea is: its up to the implementer to choose |
More importantly, we can change that at the point when we have a reason to decouple the RPC-level representation from the storage-level representation. |
Whilst I personally have a preference towards distinct go structs for storage, it's relatively easy for us to migrate to that in future on a per resource basis (assuming the behaviour of protojson is predictable enough to let us convert it to a Go struct without any mismatches). I imagine that allowing the use of the proto types for storage will be the path of least resistance here and encourage folks to follow the guidelines set out in this RFD. So with that in mind, I'm happy to support reusing the API types. It'd definitely be nice to include within the RFD the strengths/weaknesses of both, and consider them both the "accepted implementations" going forward, as well as including guidance on when to move to pure Go structs and how to do so. Otherwise I think we risk having a bunch of different implementations across all the different packages - it'd be nice to have "legacy", "Proto Messages" and "pure Go struct" as the maximum number of variants 😓 |
An argument against using the same representation for the API level and the storage level storage is that at times it can be a bit of an antipattern to use the storage object in a Create/Upsert RPC whose intent is not "store this data in the backend" but "please create an entity in the cluster with this object as a loose template", as we end up only using a subset of the object and validating away or ignoring the rest. Conversely, we can end up in a situation where the resource in the backend is implicitly a subset of the API level object, which is also confusing: |
Hey all (@espadolini, @strideynet, @codingllama), I'd like to revive this and try to take it through to approval/it's natural conclusion so that we have something to reference going forward. I've made some updates since we last discussed this, do you all mind taking a look and see what you think about its current state? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think my main feedback, after seeing how some of the ideas played out in access lists, is to use the protos for communication between layers and limit the custom-written objects to storage. The "duped" types under api/types add significant boilerplate, contribute to the existence of the catch-all types package and cause some other problems of their own (like not being easily cloneable).
I realize this doesn't answer all the problems, but I'm not ready to adopt the RFD in its entirety yet.
// ResourceHeader will contain the version, kind, and metadata fields. | ||
ResourceHeader | ||
|
||
Spec *userGroupSpec `json:"spec" yaml:"spec"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What value does a spec provide? Can we get rid of that pattern too?
Hey all, just taking another look at this. I'll take a swing at this and update, but I'm thinking the thing to do here is to get rid of the resource header as a recommendation and then we can just use |
I think much of this has been obviated by https://github.com/gravitational/teleport/blob/master/rfd%2F0153-resource-guidelines.md. I think at this point it will need a rewrite any way, so I'm going to close this. |
This describes an initial RFD for removing Gogo from Teleport.
Note: The initial point of this RFD is not to provide a comprehensive solution, but to provide a starting place for ideation and to try to assemble many viewpoints to try to figure out what concerns we need to take in and how best to address them. I heavily suspect this RFD will change quite a bit.
Rendered