RFD-139. Gogo Must Go. #28386

mdwn · 2023-06-27T19:17:17Z

This describes an initial RFD for removing Gogo from Teleport.

Note: The initial point of this RFD is not to provide a comprehensive solution, but to provide a starting place for ideation and to try to assemble many viewpoints to try to figure out what concerns we need to take in and how best to address them. I heavily suspect this RFD will change quite a bit.

Rendered

This describes an initial RFD for removing Gogo from Teleport.

strideynet · 2023-06-27T19:29:56Z

rfd/0139-gogo-must-go.md

+- Allows us to remove gogoproto extensions without affecting backend storage.
+- Allows us to utilize fields in the backend more efficiently and store larger objects.
+
+The primary disadvantage here is that manually examining the backend is more difficult as


I think "examining" isn't the only concern here - sometimes during incidents we need to perform surgery on the Teleport store in order to be able to bring something back online.

Some time ago, @strideynet and I took a look at a hidden protobuf tag called json_name that when used with jsonpb allows you to do the same as gogo proto JSON but unfortunately it does not support structure embedding which we also abuse a lot

strideynet · 2023-06-27T19:31:06Z

rfd/0139-gogo-must-go.md

+
+##### `gogoproto.stdtime`
+
+We will need to migrate to using `timestamppb` directly instead of relying on gogo's `stdtime` tag.


Is this migration something we can start doing now ? Would this be a wire-compatible change ?

AFAIK all of these changes should be wire compatible. This is something that we can do now (I'm putting together a POC for it as we speak, actually).

You're not allowed to break protobuf wire compatibility anyway. 😎

strideynet · 2023-06-27T19:38:28Z

rfd/0139-gogo-must-go.md

+##### `gogoproto.nullable`
+
+We will need to remove the `gogoproto.nullable` tags, which overall shouldn't be a huge impact.
+We should migrate to using `optional` as defined [here](https://protobuf.dev/programming-guides/proto3/#field-labels).


I know support for optional hasn't been in proto3 since initiation - and I've not used it yet with proto3. If we can't use it, then we can use the google provided wrappers: https://github.com/protocolbuffers/protobuf/blob/main/src/google/protobuf/wrappers.proto

Nullable tags are actually client side only and they are null if they are not set in proto. This happens when the value is the default value for the type so using a wrapper isn't an option as well because they change the wire representation

If a int field has zero value, it won't be marshalled to proto

Oh yuck, I didn't even think about the fact that optional changes the wire representation. Yeah, this won't work.

Well, all the same, CheckAndSetDefaults will realistically handle this for most structures. I think we can probably get away with just removing nullable.

Yeah, nullable has no effect so we can handle internally

I think y'all got this backwards - optional is the default for all proto3 fields. It is required that got dropped between proto2 and proto3, because as it turns out required was a terrible idea to begin with.

I don't recommend writing optional before our proto fields, and strongly not recommend messing with required (since I'm mentioning it). None of this changes the wire format - that is something that just doesn't happen with Google protobuf.

It's not as simple as it is on the tin. You are correct that proto3 gets rid of required and leans towards what could be called optional fields. However, an optional keyword was introduced in proto v3.15. With the Go generator, this usually yields a pointer to the expected type w/ something like:

message MyMessage { optional string foo = 1; }

type MyMessage struct { // various pb fields snipped Foo *string }

I can't say I'm a huge fan of pointers being used to indicate the presence/non-presence of a field.

I'd hoped we learned our lesson about custom generator options by now.

optional is a perfectly legitimate feature (especially on scalar fields where you wouldn't otherwise get any "presence" info), but it's not supported by the javascript/typescript generator we're using, and gogoproto makes terrible code for it.

My usual advice would be to not rely on field presence - ideally you shouldn't care whether it's empty, nil, set explicitly, a default value, etc. If you do care about presence, then it's best to design in a way that makes it explicitly clear. A good example is using a FieldMask to inform partial updates - https://cloud.google.com/apis/design/standard_methods#update.

strideynet · 2023-06-27T19:46:13Z

rfd/0139-gogo-must-go.md

+- Allows us to remove gogoproto extensions without affecting backend storage.
+- Allows us to utilize fields in the backend more efficiently and store larger objects.
+
+The primary disadvantage here is that manually examining the backend is more difficult as


Another option here, as explored in devicetrust recently, is to write separate Go structs for storage, rather than marshaling/unmarshaling pb messages - https://github.com/gravitational/teleport.e/blob/master/lib/devicetrust/storage/types.go

This is something we can probably undertake gradually as an option if we wanted to break away from using the same types for storage as we do for RPC. There's various arguments for/against this which might be worth investigating.

I'm obviously biased, but I think this is the way to go - in fact, I think it's almost always the better way to go, as having API representations dictate storage is quite limiting (point in case this RFD section).

I like this idea. I'll update the RFD to reflect this.

rfd/0139-gogo-must-go.md

codingllama · 2023-06-27T22:01:42Z

rfd/0139-gogo-must-go.md

+- Allows us to remove gogoproto extensions without affecting backend storage.
+- Allows us to utilize fields in the backend more efficiently and store larger objects.
+
+The primary disadvantage here is that manually examining the backend is more difficult as


I'm obviously biased, but I think this is the way to go - in fact, I think it's almost always the better way to go, as having API representations dictate storage is quite limiting (point in case this RFD section).

codingllama · 2023-06-27T22:04:17Z

rfd/0139-gogo-must-go.md

+##### `gogoproto.nullable`
+
+We will need to remove the `gogoproto.nullable` tags, which overall shouldn't be a huge impact.
+We should migrate to using `optional` as defined [here](https://protobuf.dev/programming-guides/proto3/#field-labels).


I think y'all got this backwards - optional is the default for all proto3 fields. It is required that got dropped between proto2 and proto3, because as it turns out required was a terrible idea to begin with.

I don't recommend writing optional before our proto fields, and strongly not recommend messing with required (since I'm mentioning it). None of this changes the wire format - that is something that just doesn't happen with Google protobuf.

codingllama

I think there are broadly 2 ways to go about this:

We stay with gogo, or remain gogo-compatible. In this path we do the least bit of work in Teleport and either fork gogo ourselves (which we might be forced to if we do nothing for too long) or adopt an active gogo fork. (Not saying I like this, but it's an option to be considered.)
We move away from Gogo, likely to plain Google protobuf. In this path we systematically rid ourselves of Gogo extensions until, eventually, we can move out of it. This is more work, but has the potential to land us in a close-to-ideal situation in the future (this is a no-brainer for a new project, for example). We'll need to map problems and solutions clearly so everyone can chime in as we do it.

tigrato · 2023-06-28T08:26:09Z

I lean towards this approach

We move away from Gogo, likely to plain Google protobuf. In this path we systematically rid ourselves of Gogo extensions until, eventually, we can move out of it. This is more work, but has the potential to land us in a close-to-ideal situation in the future (this is a no-brainer for a new project, for example). We'll need to map problems and solutions clearly so everyone can chime in as we do it.

Because it fixes several problems while moving away from gogo.

Get rid of Gogo
Allow us to split our AuthService into smaller services without having a huge impact now
Allow us to decouple API and storage objects

In theory, it takes more work but thinking about the future it can save us 2 major refactors

rfd/0139-gogo-must-go.md

codingllama · 2023-06-28T16:57:25Z

rfd/0139-gogo-must-go.md


 ##### Write custom marshalers for objects

 In order to keep our user facing representations of the objects, we will need to write
 custom JSON and YAML marshalers for our objects. This may be time intensive but not overall
 a significant amount of work.

+#### Embedding types (removing `gogoproto.embed`)


Changes like this will cause Go-level incompatible changes in api/.

I think we should discuss whether these can be avoided and, if not (or not worth it), what kind of "migration" path we are thinking of implementing. (Ie, how we'll deal with customer breakages and communication, potential e/ breakages, etc.)

Hrm. I'm not sure we can avoid breakages without keeping gogo intact, or at least a segment of Teleport that still maintains has it. Even getting rid of nullable will cause a Go level breakage here. Maybe we could have an api-legacy that we maintain for 3 versions and then remove?

I don't think we can avoid e breakages without keeping the object interfaces intact, or without doing some deep wizardry on the backend service interfaces, which sounds hideous.

Customer breakages should definitely be well communicated, but I'm not sure of the best mechanism there.

Communication and a plan seem good enough in this case. It's worth noting that this could cause pains for a few versions while everything is moved to new formats.

I don't think we currently care much about code-level incompatibility in the API module, fwiw.

I think we caused breakages for customers a few times because of API changes. @zmb3 for feedback on how much we should worry about this.

mdwn · 2023-06-29T16:56:46Z

Hey all, I've introduced a new object AccessList and attempted to implement it in a "correct" way. Do you all mind taking a look? I'm not expecting a full on review, but perhaps this could be used to inform this RFD.

#28385
#28479

Thoughts?

espadolini · 2023-06-30T09:17:05Z

rfd/0139-gogo-must-go.md


 ##### Write custom marshalers for objects

 In order to keep our user facing representations of the objects, we will need to write
 custom JSON and YAML marshalers for our objects. This may be time intensive but not overall
 a significant amount of work.

+#### Embedding types (removing `gogoproto.embed`)


I don't think we currently care much about code-level incompatibility in the API module, fwiw.

rfd/0139-gogo-must-go.md

espadolini · 2023-06-30T09:24:42Z

rfd/0139-gogo-must-go.md

+It's imperative that these objects are compatible with the current JSON/YAML representations in Teleport.
+That is, the objects that are in Teleport today should unmarshal properly into our new internal objects.


Should we freeze the current resource types and require a version bump (to a "new world" version that uses protojson in the backend and a dedicated service for RPCs) for any future change, to avoid accidental mistakes slipping through, or do you think we can chug along for a while manually adding new fields to the conversion functions?

That seems like a good idea. I'm less sold on the protojson purely for the reasons I described in #28386 (comment).

espadolini · 2023-06-30T09:26:18Z

rfd/0139-gogo-must-go.md

+
+##### `gogoproto.stdtime`
+
+We will need to migrate to using `timestamppb` directly instead of relying on gogo's `stdtime` tag.


You're not allowed to break protobuf wire compatibility anyway. 😎

espadolini · 2023-06-30T09:28:56Z

rfd/0139-gogo-must-go.md

+Rather than use gRPC's wire format or serialized JSON format, we should consider writing
+a storage object format. This has a number of benefits:


Why not use protojson for this? It's a canonical encoding of protobuf, and I don't see us abandoning protobuf anytime soon. My guess is that if we require a hand-rolled json struct for this we'll end up writing a json struct that's almost field-by-field compatible with the protobuf counterpart and we'll just deserialize-serialize it to convert it anyway.

I see a few reasons for doing this:

By writing our own storage object, we decouple the API representation of objects from our backend.

We can ensure backwards compatibility with existing backends with no migration step, though migration isn't terrible.

We can use these objects subsequently for CLI display and maintain the current k8s-like JSON/YAML structure that we've got.

What do you think?

rfd/0139-gogo-must-go.md

…o package in example.

ravicious · 2023-07-03T13:24:56Z

Removing myself as a bot assigned reviewer as I won't have anything meaningful to add here.

mdwn · 2023-07-03T17:48:20Z

Okay, so after some offline discussion, I think a major point of contention is storage objects vs. using protojson directly. I'm in favor of storage objects because I think they give us greater flexibility and allow us to obfuscate away protobuf specific things from our internal logic. However, if we use protojson, then we can get around needing to write custom marshalers/unmarshalers entirely.

I wanted to kick off a discussion about this. Thoughts @strideynet, @codingllama, @espadolini, @tigrato?

codingllama · 2023-07-03T19:01:36Z

Okay, so after some offline discussion, I think a major point of contention is storage objects vs. using protojson directly. I'm in favor of storage objects because I think they give us greater flexibility and allow us to obfuscate away protobuf specific things from our internal logic. However, if we use protojson, then we can get around needing to write custom marshalers/unmarshalers entirely.

I wanted to kick off a discussion about this. Thoughts @strideynet, @codingllama, @espadolini, @tigrato?

I'm +1 on storage objects, but I also have a suggestion. Something I've done a few times in the past is to keep a 1:1 mapping between API and storage initially, but then refactor to a distinct storage object once there's a reason to deviate. If the main concern is the cost of having separate representations, this is a way around that.

I do like the clarity we get from separate representations and don't mind the initial cost, specially for big distributed teams like Teleport, but I've also seen this argument go around enough times. Both alternatives are doable.

codingllama · 2023-07-03T19:09:09Z

For reference, we had explicit storage definitions for device trust from the start: https://github.com/gravitational/teleport.e/blob/1961fb1e4ec54aab8f6ec014439acf83407ef690/lib/devicetrust/storage/types.go#L14-L25

All conversions are explicit, "regular" Go code. This is a simple example: https://github.com/gravitational/teleport.e/blob/1961fb1e4ec54aab8f6ec014439acf83407ef690/lib/devicetrust/storage/storage.go#L1426-L1443

These "unentangle" API protos from JSON tags (and Gogo), but a strong reason for them is to allow us to read, write and expire certain "parts" of the representation separately from others.

mdwn · 2023-07-03T19:09:23Z

I'm +1 on storage objects, but I also have a suggestion. Something I've done a few times in the past is to keep a 1:1 mapping between API and storage initially, but then refactor to a distinct storage object once there's a reason to deviate. If the main concern is the cost of having separate representations, this is a way around that.

Can you give an example of what this might look like in practice?

codingllama · 2023-07-03T19:16:40Z

I'm +1 on storage objects, but I also have a suggestion. Something I've done a few times in the past is to keep a 1:1 mapping between API and storage initially, but then refactor to a distinct storage object once there's a reason to deviate. If the main concern is the cost of having separate representations, this is a way around that.

Can you give an example of what this might look like in practice?

In Teleport terms we'd json.Marshal the proto directly to storage to begin with, like we do for most types now. Eventually you refactor into something like the device trust types I mentioned above - with strong parity to begin with, then greater liberty over time.

The main hurdle here is that we would have to move away from Gogo while keeping the JSON representation 100% compatible, which I imagine is going to be hard or awkward to do for some of the existing types. Selectively applying protojon and custom written types may be the way to go.

mdwn · 2023-07-03T19:29:32Z

I'm +1 on storage objects, but I also have a suggestion. Something I've done a few times in the past is to keep a 1:1 mapping between API and storage initially, but then refactor to a distinct storage object once there's a reason to deviate. If the main concern is the cost of having separate representations, this is a way around that.

Can you give an example of what this might look like in practice?

In Teleport terms we'd json.Marshal the proto directly to storage to begin with, like we do for most types now. Eventually you refactor into something like the device trust types I mentioned above - with strong parity to begin with, then greater liberty over time.

The main hurdle here is that we would have to move away from Gogo while keeping the JSON representation 100% compatible, which I imagine is going to be hard or awkward to do for some of the existing types. Selectively applying protojon and custom written types may be the way to go.

So the general idea is: its up to the implementer to choose protojson or custom marshaling while maintaining backwards compatibility?

espadolini · 2023-07-04T08:36:34Z

So the general idea is: its up to the implementer to choose protojson or custom marshaling while maintaining backwards compatibility?

More importantly, we can change that at the point when we have a reason to decouple the RPC-level representation from the storage-level representation.

strideynet · 2023-07-04T08:59:47Z

Whilst I personally have a preference towards distinct go structs for storage, it's relatively easy for us to migrate to that in future on a per resource basis (assuming the behaviour of protojson is predictable enough to let us convert it to a Go struct without any mismatches). I imagine that allowing the use of the proto types for storage will be the path of least resistance here and encourage folks to follow the guidelines set out in this RFD. So with that in mind, I'm happy to support reusing the API types.

It'd definitely be nice to include within the RFD the strengths/weaknesses of both, and consider them both the "accepted implementations" going forward, as well as including guidance on when to move to pure Go structs and how to do so. Otherwise I think we risk having a bunch of different implementations across all the different packages - it'd be nice to have "legacy", "Proto Messages" and "pure Go struct" as the maximum number of variants 😓

espadolini · 2023-07-04T10:13:33Z

An argument against using the same representation for the API level and the storage level storage is that at times it can be a bit of an antipattern to use the storage object in a Create/Upsert RPC whose intent is not "store this data in the backend" but "please create an entity in the cluster with this object as a loose template", as we end up only using a subset of the object and validating away or ignoring the rest.

Conversely, we can end up in a situation where the resource in the backend is implicitly a subset of the API level object, which is also confusing: UserV2 stores the .spec.local_auth field in a separate backend item, which gets marshaled/unmarshaled separately and transparently recombined if the appropriate options and permissions are set; splitting the backend representation from the API representation would make this obvious.

rfd/0139-gogo-must-go.md

mdwn · 2023-09-29T15:50:00Z

Hey all (@espadolini, @strideynet, @codingllama), I'd like to revive this and try to take it through to approval/it's natural conclusion so that we have something to reference going forward. I've made some updates since we last discussed this, do you all mind taking a look and see what you think about its current state?

codingllama

I think my main feedback, after seeing how some of the ideas played out in access lists, is to use the protos for communication between layers and limit the custom-written objects to storage. The "duped" types under api/types add significant boilerplate, contribute to the existence of the catch-all types package and cause some other problems of their own (like not being easily cloneable).

I realize this doesn't answer all the problems, but I'm not ready to adopt the RFD in its entirety yet.

rosstimothy · 2023-10-27T15:09:07Z

rfd/0139-gogo-must-go.md

+  // ResourceHeader will contain the version, kind, and metadata fields.
+  ResourceHeader
+
+  Spec *userGroupSpec `json:"spec" yaml:"spec"`


What value does a spec provide? Can we get rid of that pattern too?

mdwn · 2023-11-29T15:47:33Z

Hey all, just taking another look at this. I'll take a swing at this and update, but I'm thinking the thing to do here is to get rid of the resource header as a recommendation and then we can just use protojson as per @espadolini's earlier recommendations. That will get rid of the conversion functions as well. Thoughts?

mdwn · 2024-03-09T04:23:27Z

I think much of this has been obviated by https://github.com/gravitational/teleport/blob/master/rfd%2F0153-resource-guidelines.md. I think at this point it will need a rewrite any way, so I'm going to close this.

RFD-139. Gogo Must Go.

5df5874

This describes an initial RFD for removing Gogo from Teleport.

strideynet reviewed Jun 27, 2023

View reviewed changes

codingllama reviewed Jun 27, 2023

View reviewed changes

Add in some clarity, add initial proposed solution.

a852688

codingllama reviewed Jun 28, 2023

View reviewed changes

New example does not maintain UserGroup interface.

8da7c53

espadolini reviewed Jun 30, 2023

View reviewed changes

Remove references to builder, use usergroupv1 package instead of prot…

e5686ed

…o package in example.

mdwn mentioned this pull request Jun 30, 2023

Introduce Access List internal object. #28479

Merged

Implementation plan updates, more clarity added to proposed solution.

3952939

mdwn marked this pull request as ready for review June 30, 2023 17:55

github-actions bot added rfd Request for Discussion size/md labels Jun 30, 2023

github-actions bot requested review from kimlisa and ravicious June 30, 2023 17:55

ravicious removed their request for review July 3, 2023 13:25

Updates to support protojson in addition to internal objects.

c710b04

tigrato reviewed Jul 5, 2023

View reviewed changes

rfd/0139-gogo-must-go.md Show resolved Hide resolved

Mention json_name tag for protojson.

9a305dd

mdwn mentioned this pull request Jul 12, 2023

Clean up access list protos, add in conversion functions tests. #28787

Merged

codingllama reviewed Oct 4, 2023

View reviewed changes

rosstimothy reviewed Oct 27, 2023

View reviewed changes

rosstimothy mentioned this pull request Dec 4, 2023

RFD 0156: Notifications #35281

Merged

mdwn closed this Mar 9, 2024


		##### `gogoproto.stdtime`

		We will need to migrate to using `timestamppb` directly instead of relying on gogo's `stdtime` tag.

		It's imperative that these objects are compatible with the current JSON/YAML representations in Teleport.
		That is, the objects that are in Teleport today should unmarshal properly into our new internal objects.

		Rather than use gRPC's wire format or serialized JSON format, we should consider writing
		a storage object format. This has a number of benefits:

RFD-139. Gogo Must Go. #28386

RFD-139. Gogo Must Go. #28386

Conversation

mdwn commented Jun 27, 2023

Choose a reason for hiding this comment

tigrato Jun 27, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

strideynet Jun 27, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

strideynet Jun 27, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

codingllama left a comment

Choose a reason for hiding this comment

tigrato commented Jun 28, 2023

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mdwn commented Jun 29, 2023

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ravicious commented Jul 3, 2023

mdwn commented Jul 3, 2023

codingllama commented Jul 3, 2023

codingllama commented Jul 3, 2023

mdwn commented Jul 3, 2023

codingllama commented Jul 3, 2023

mdwn commented Jul 3, 2023

espadolini commented Jul 4, 2023 • edited Loading

strideynet commented Jul 4, 2023

espadolini commented Jul 4, 2023

mdwn commented Sep 29, 2023

codingllama left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mdwn commented Nov 29, 2023

mdwn commented Mar 9, 2024

tigrato Jun 27, 2023 •

edited

Loading

strideynet Jun 27, 2023 •

edited

Loading

strideynet Jun 27, 2023 •

edited

Loading

espadolini commented Jul 4, 2023 •

edited

Loading