RFD-139. Gogo Must Go. #28386

283 changes: 283 additions & 0 deletions rfd/0139-gogo-must-go.md
---
authors: Michael Wilson (michael.wilson@goteleport.com)
state: draft
---

# RFD 139 - Gogo Must Go

### Required Approvers

* Engineering: @r0mant, @justinas
* Security: @reed
* Product: @klizhentas

## What

Migrate Teleport away from the unmaintained gogo protobuf implementation.

## Why

When Teleport was initially implemented, `gogo/protobuf` was a reasonable choice for Teleport's
gRPC support, as it was up to date, performant, and well maintained. However, since then, gogo has
been deprecated and there's no clear path to update to one of the more recent protobuf
implementations.

Furthermore, we've come to rely on several features of gogo that are not replicated in other
protobuf implementations. In particular, those features are:

- Custom types, which let us substitute our own Go types for generated fields during code generation.
- JSON tags, which let us explicitly control the JSON field names used when marshaling/unmarshaling
  protobuf messages.

As we continue to lag behind the more modern protobuf releases, we run the risk of falling prey
to vulnerabilities with no easy mitigation strategy, and the burden of migration grows with every
additional message and feature that we add.

## Details

### Case studies

The following are case studies of other projects that have migrated away from `gogo/protobuf`, along
with related tooling that could be useful for our migration. I will not go into thorough detail about
each, but will call out some takeaways that I view as useful.

#### containerd's `gogo/protobuf` migration

[Relevant blog post](https://thoughts.8-p.info/en/containerd-gogo-migration.html)

containerd migrated away from `gogo/protobuf` starting in 2021 and finishing in April 2022.

**Important Takeaways**:

- Removing all `gogoproto` extensions ahead of time made the final migration significantly easier.

#### Thanos's `gogo/protobuf` migration

[Relevant blog post](https://giedrius.blog/2022/02/04/things-learned-from-trying-to-migrate-to-protobuf-v2-api-from-gogoprotobuf-so-far/)

This contains some general advice, but nothing particularly applicable in terms of specifics.

#### CrowdStrike's `csproto` library

https://github.com/CrowdStrike/csproto

CrowdStrike used `gogo/protobuf` to overcome various problems with Google's original protobuf
library and subsequently ran into issues after `gogo/protobuf`'s deprecation. `csproto`
is a library that selects the proper protobuf runtime for each message. As a result, you can
intermingle different protobuf libraries without compatibility issues.
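
To make that takeaway concrete, here is a minimal sketch of the pattern `csproto` enables. The
`csproto.Marshal`/`csproto.Unmarshal` helpers are assumptions based on the library's README; this
is an illustration, not code we run today:

```go
import (
	"fmt"

	"github.com/CrowdStrike/csproto"
)

// encodeDecode round-trips a protobuf message regardless of whether it was
// generated by gogo/protobuf or by google.golang.org/protobuf; csproto inspects
// the message and dispatches to the matching runtime.
func encodeDecode(in, out interface{}) error {
	data, err := csproto.Marshal(in)
	if err != nil {
		return fmt.Errorf("marshal: %w", err)
	}
	return csproto.Unmarshal(data, out)
}
```

This kind of shim could let us migrate message-by-message rather than all at once.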

### Our complications

Teleport has a number of complications tied to our use of `gogo/protobuf` that will make
our migration unique. Each complication below is listed along with a number of potential
mitigation strategies to discuss.

**The primary issue here is our use of `gogo/protobuf` extensions.** We should strive
to remove these extensions while maintaining backwards compatibility.

#### JSON/YAML (de)serialization (removing `gogoproto.jsontag`)

We rely heavily on `gogo/protobuf`'s custom JSON tags for serializing our messages into JSON.
Additionally, we use a library called [`ghodss/yaml`](https://github.com/ghodss/yaml) to create YAML,
which also relies on these JSON tags.

Each resource is serialized into JSON before being pushed into our backend, which means the
protobuf JSON tags have an impact quite far down our stack. There are a number of potential
mitigations here with differing levels of impact. The primary goal of any mitigation strategy
here should be to **migrate away from `gogoproto.jsontag` extensions**.

##### Serialize objects in backend as the gRPC wire format

Serializing the objects in the backend as the gRPC wire format has a number of benefits:

- Seems to remove the need for most unmarshaling functions in `lib/services`. (confirm?)
- The backend has no need for JSON marshaling/unmarshaling, so JSON changes will not
affect the backend.
- Allows us to remove gogoproto extensions without affecting backend storage.
- Allows us to utilize fields in the backend more efficiently and store larger objects.

The disadvantages here:

- Manually examining the backend is more difficult as the stored values will no longer be
human readable.
- For support cases we will occasionally need to perform surgery in the backend, which
requires manual editing of objects. This will be significantly more difficult with the
backend serialized in the wire format.
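
As a rough illustration of this option (not a design commitment), a backend put/get pair could work
directly with the wire format. The `Backend` and `Item` types below are simplified stand-ins for
Teleport's real backend interfaces, and the messages are assumed to have been regenerated with the
standard `protoc-gen-go`:

```go
import (
	"context"

	"google.golang.org/protobuf/proto"
)

// Item and Backend are simplified stand-ins for Teleport's backend interfaces.
type Item struct {
	Key   []byte
	Value []byte
}

type Backend interface {
	Put(ctx context.Context, i Item) error
	Get(ctx context.Context, key []byte) (*Item, error)
}

// putResource stores the resource in the protobuf wire format instead of JSON,
// so gogoproto.jsontag no longer has any bearing on what ends up in the backend.
func putResource(ctx context.Context, bk Backend, key []byte, msg proto.Message) error {
	value, err := proto.Marshal(msg)
	if err != nil {
		return err
	}
	return bk.Put(ctx, Item{Key: key, Value: value})
}

// getResource decodes the wire format back into the supplied message.
func getResource(ctx context.Context, bk Backend, key []byte, msg proto.Message) error {
	item, err := bk.Get(ctx, key)
	if err != nil {
		return err
	}
	return proto.Unmarshal(item.Value, msg)
}
```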

##### Write custom storage objects for the backend

Rather than use gRPC's wire format or serialized JSON format, we should consider writing
a dedicated storage object format. This has a number of benefits:

- The backend continues to be easy to read and modify.
- We decouple our protobuf definitions from our storage representation, which makes changes
  to the frontend less impactful.
- This can be done without making any changes to the proto definitions.
- It gives us more control over the storage representation.
- It makes migration easier, as we can easily maintain multiple versions of an object within
  the Teleport codebase.

The downsides:

- It adds to the boilerplate necessary for an object.

> **Reviewer comment:** Why not use protojson for this? It's a canonical encoding of protobuf, and I don't see us abandoning protobuf anytime soon. My guess is that if we require a hand-rolled JSON struct for this we'll end up writing a JSON struct that's almost field-by-field compatible with the protobuf counterpart and we'll just deserialize-serialize it to convert it anyway.
>
> **Author reply:** I see a few reasons for doing this:
>
> - By writing our own storage object, we decouple the API representation of objects from our backend.
> - We can ensure backwards compatibility with existing backends with no migration step, though migration isn't terrible.
> - We can use these objects subsequently for CLI display and maintain the current k8s-like JSON/YAML structure that we've got.
>
> What do you think?

##### Write custom marshalers for objects

In order to keep our user-facing representations of the objects, we will need to write
custom JSON and YAML marshalers for our objects. This may be tedious, but it is not
a significant amount of work overall.
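
As a sketch of what such a marshaler could look like (the resource and field names here are
hypothetical, not an existing Teleport type), the idea is to pin the user-facing JSON layout in a
dedicated struct and translate to and from it. YAML support can then be layered on top by converting
YAML to JSON, as `ghodss/yaml` does:

```go
import "encoding/json"

// Example is a hypothetical internal object; exampleJSON pins the user-facing
// JSON layout independently of whatever the protobuf-generated struct looks like.
type Example struct {
	Kind    string
	Version string
	Name    string
}

type exampleJSON struct {
	Kind     string `json:"kind"`
	Version  string `json:"version"`
	Metadata struct {
		Name string `json:"name"`
	} `json:"metadata"`
}

// MarshalJSON emits the same field names gogoproto.jsontag gives us today.
func (e Example) MarshalJSON() ([]byte, error) {
	var out exampleJSON
	out.Kind, out.Version = e.Kind, e.Version
	out.Metadata.Name = e.Name
	return json.Marshal(out)
}

// UnmarshalJSON accepts existing documents unchanged.
func (e *Example) UnmarshalJSON(data []byte) error {
	var in exampleJSON
	if err := json.Unmarshal(data, &in); err != nil {
		return err
	}
	e.Kind, e.Version, e.Name = in.Kind, in.Version, in.Metadata.Name
	return nil
}
```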

#### Embedding types (removing `gogoproto.embed`)

We heavily use `gogoproto.embed` for generating objects with a consistent structure or interface.
We particularly use it with our `kind`, `version`, and `metadata` fields. This will be
difficult to mitigate without some heavy lifting. As such, I recommend the custom storage object
format listed above and the addition of `FromV*` functions in order to convert protobuf objects
into this object. For example:

```go
accessRequest := accessrequest.FromV1Proto(request.AccessRequest)
user := user.FromV6Proto(request.User)
```

> **Reviewer comment:** Changes like this will cause Go-level incompatible changes in `api/`.
>
> I think we should discuss whether these can be avoided and, if not (or not worth it), what kind of "migration" path we are thinking of implementing. (I.e., how we'll deal with customer breakages and communication, potential `e/` breakages, etc.)
>
> **Author reply:** Hrm. I'm not sure we can avoid breakages without keeping gogo intact, or at least a segment of Teleport that still has it. Even getting rid of `nullable` will cause a Go-level breakage here. Maybe we could have an `api-legacy` that we maintain for 3 versions and then remove?
>
> I don't think we can avoid `e/` breakages without keeping the object interfaces intact, or without doing some deep wizardry on the backend service interfaces, which sounds hideous.
>
> Customer breakages should definitely be well communicated, but I'm not sure of the best mechanism there.
>
> **Reviewer comment:** Communication and a plan seem good enough in this case. It's worth noting that this could cause pains for a few versions while everything is moved to new formats.
>
> **Reviewer comment:** I don't think we currently care much about code-level incompatibility in the API module, FWIW.
>
> **Reviewer comment:** I think we caused breakages for customers a few times because of API changes. @zmb3 for feedback on how much we should worry about this.

#### Casting types (removing `gogoproto.casttype`)

We additionally rely heavily on `gogoproto.casttype`, which allows us to avoid doing explicit
casts or conversions within Go code. Unfortunately, we will have to take on this conversion work ourselves.
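
For illustration, the work is mostly boundary conversions like the one below. The `SystemRole`
types mirror the kind of named string types we already define in `api/types`, and the helper
itself is hypothetical:

```go
// SystemRole and SystemRoles mirror the kind of named string types we already
// define in api/types; rolesFromProto is a hypothetical helper.
type SystemRole string
type SystemRoles []SystemRole

// rolesFromProto converts the plain []string emitted by standard protoc-gen-go
// into the typed slice that gogoproto.casttype used to give us for free.
func rolesFromProto(raw []string) SystemRoles {
	roles := make(SystemRoles, 0, len(raw))
	for _, r := range raw {
		roles = append(roles, SystemRole(r))
	}
	return roles
}
```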

#### Other gogoproto tags

##### `gogoproto.nullable`

We will need to remove the `gogoproto.nullable` tags, which overall shouldn't have a huge impact.
We will need to rely on our `CheckAndSetDefaults` functions, which generally handle this
today.
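
As a small, hypothetical sketch of what that looks like: once `nullable` is gone, nested messages
come back from `protoc-gen-go` as pointers, so `CheckAndSetDefaults` has to allocate before it
fills in defaults:

```go
// Metadata and ExampleResource are hypothetical stand-ins for a generated
// message and its nested metadata.
type Metadata struct {
	Namespace string
}

type ExampleResource struct {
	// Without gogoproto.nullable=false, nested messages are generated as pointers.
	Metadata *Metadata
}

// CheckAndSetDefaults must allocate the nested message before applying defaults,
// which the embedded-value form never needed to do.
func (r *ExampleResource) CheckAndSetDefaults() error {
	if r.Metadata == nil {
		r.Metadata = &Metadata{}
	}
	if r.Metadata.Namespace == "" {
		r.Metadata.Namespace = "default"
	}
	return nil
}
```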

##### `gogoproto.stdtime`

We will need to migrate to using `timestamppb` directly instead of relying on gogo's `stdtime` tag.
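
For a sense of the mechanical change (field and function names here are hypothetical), gogo's
`stdtime` gives us `time.Time` fields directly, while the standard runtime generates
`*timestamppb.Timestamp` and the conversion happens explicitly at the edges:

```go
import (
	"time"

	"google.golang.org/protobuf/types/known/timestamppb"
)

// toProto converts a Go time into the well-known Timestamp message.
func toProto(expires time.Time) *timestamppb.Timestamp {
	return timestamppb.New(expires)
}

// fromProto converts back, preserving zero-value behavior when the field is unset.
func fromProto(ts *timestamppb.Timestamp) time.Time {
	if ts == nil {
		return time.Time{}
	}
	return ts.AsTime()
}
```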
> **Reviewer comment:** Is this migration something we can start doing now? Would this be a wire-compatible change?
>
> **Author reply:** AFAIK all of these changes should be wire compatible. This is something that we can do now (I'm putting together a POC for it as we speak, actually).
>
> **Reviewer comment:** You're not allowed to break protobuf wire compatibility anyway. 😎


### Proposed solution

#### Create internal objects

Currently, in `api/types` we define a large number of interfaces that are subsequently used to wrap
our protobuf-generated messages in a nicer, more user-friendly way, abstracting away much of the
protobuf bits and pieces.

We should break this relationship and create actual internal objects for our interfaces. From here, we
should use conversion functions to translate between protobuf messages and our internal objects.

It's imperative that these objects are compatible with the current JSON/YAML representations in Teleport.
That is, the objects that are in Teleport today should unmarshal properly into our new internal objects.
> **Reviewer comment:** Should we freeze the current resource types and require a version bump (to a "new world" version that uses protojson in the backend and a dedicated service for RPCs) for any future change, to avoid accidental mistakes slipping through, or do you think we can chug along for a while manually adding new fields to the conversion functions?
>
> **Author reply:** That seems like a good idea. I'm less sold on the protojson purely for the reasons I described in #28386 (comment).


All objects should remain in `api/types` as they do today. (Open question: should this be the case? Should
we relocate objects?)

##### Example

Let's take `UserGroup` as an example, as it's a very simple message. `UserGroup`'s interface as it is
currently defined in `api/types`:

```go
// UserGroup specifies an externally sourced group.
type UserGroup interface {
	ResourceWithLabels

	// GetApplications will return a list of application IDs associated with the user group.
	GetApplications() []string
	// SetApplications will set the list of application IDs associated with the user group.
	SetApplications([]string)
}
```

For this message, I propose we create an internal implementation for the above interface, along
with a builder to completely abstract the internal representation of the user group object:

```go
type UserGroupBuilder struct {
	applications []string
}

func NewUserGroupBuilder() *UserGroupBuilder {
	return &UserGroupBuilder{}
}

func (u *UserGroupBuilder) Applications(applications []string) *UserGroupBuilder {
	u.applications = applications
	return u
}

func (u *UserGroupBuilder) Build() (UserGroup, error) {
	ug := &userGroupImpl{
		Spec: &userGroupSpec{
			Applications: u.applications,
		},
	}

	if err := ug.CheckAndSetDefaults(); err != nil {
		return nil, trace.Wrap(err)
	}

	return ug, nil
}

type userGroupSpec struct {
	Applications []string `json:"applications" yaml:"applications"`
}

type userGroupImpl struct {
	// ResourceHeader will contain the version, kind, and metadata fields.
	ResourceHeader

	Spec *userGroupSpec `json:"spec" yaml:"spec"`
}

func (u *userGroupImpl) CheckAndSetDefaults() error {
	...
}

func (u *userGroupImpl) GetApplications() []string {
	return u.Spec.Applications
}

func (u *userGroupImpl) SetApplications(applications []string) {
	u.Spec.Applications = applications
}
...
```

To go between the protobuf message and the internal implementation, we should have:

```go
func FromV1(msg *proto.UserGroupV1) (types.UserGroup, error) {
	return NewUserGroupBuilder().
		Applications(msg.GetApplications()).
		Build()
}
```
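
The conversion will also need to go the other way for RPCs and storage. A hypothetical counterpart
(the `proto.UserGroupV1` field layout here is assumed, matching the `GetApplications()` accessor
used above) might look like:

```go
func ToV1(ug types.UserGroup) *proto.UserGroupV1 {
	return &proto.UserGroupV1{
		Applications: ug.GetApplications(),
	}
}
```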

##### Strategy

We would need to do this object by object, adding in tests to ensure that marshaling
is not broken between the current messages and the new structs.

I recommend the following path:

1. Create the new implementation object as described above.
2. Write tests that verify that the current messages marshal/unmarshal properly into
   the new object (see the sketch after this list). This will require creating message
   conversion functions as well.
3. Replace existing references to the object with the new builder.
4. The `gogoproto` extensions on the original message are now largely irrelevant, and
can be removed. This will likely require changes to the conversion functions
mentioned above.
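
Below is a minimal sketch of the compatibility test mentioned in step 2. All type and field names
are hypothetical stand-ins; Teleport's real tests would use the actual gogo-generated message and
the new internal object, but the assertions would take this shape:

```go
import (
	"encoding/json"
	"testing"

	"github.com/stretchr/testify/require"
)

// legacyUserGroup stands in for the JSON produced today by the gogo-generated
// message via its jsontag extensions; newUserGroup stands in for the new
// internal object. Both are hypothetical.
type legacyUserGroup struct {
	Kind string `json:"kind"`
	Spec struct {
		Applications []string `json:"applications"`
	} `json:"spec"`
}

type newUserGroup struct {
	Kind string `json:"kind"`
	Spec struct {
		Applications []string `json:"applications"`
	} `json:"spec"`
}

func TestUserGroupJSONCompatibility(t *testing.T) {
	var legacy legacyUserGroup
	legacy.Kind = "user_group"
	legacy.Spec.Applications = []string{"app-1", "app-2"}

	// JSON written by the legacy message must load into the new object.
	data, err := json.Marshal(legacy)
	require.NoError(t, err)

	var current newUserGroup
	require.NoError(t, json.Unmarshal(data, &current))
	require.Equal(t, legacy.Kind, current.Kind)
	require.Equal(t, legacy.Spec.Applications, current.Spec.Applications)

	// JSON written by the new object must parse identically, so objects already
	// stored in the backend keep working.
	roundTripped, err := json.Marshal(current)
	require.NoError(t, err)
	require.JSONEq(t, string(data), string(roundTripped))
}
```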

### UX

There should be no difference in UX between Teleport pre-migration and post-migration. The process
must be transparent to the user.

### Implementation plan

TBD