Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dynamic RESTMapper fails to load new Versions within a pre-existing Group on cache miss #2869

Closed
griffindvs opened this issue Jul 1, 2024 · 2 comments · Fixed by #2875
Closed
Assignees
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@griffindvs
Copy link
Contributor

griffindvs commented Jul 1, 2024

I have a controller that installs other CRDs and controllers (via Operator bundles). After updating sigs.k8s.io/controller-runtime from v0.12.2 to v0.18.4, our controller is now hitting errors trying to create a resource of a given GVK: failed to get restmapping: w.example.com. The CRD in question does exist, and if the controller pod is restarted, the problem disappears. The CRD in question is installed during runtime of the controller, so it does not exist prior to initialization.

This occurs even in the follow situation:

  • CRD for x.example.com with Version v1 exists before installing my controller
  • My controller tries to use w.example.com with Version v2 and fails

It does not occur if:

  • CRD for x.example.com with Version v1 exists before installing my controller
  • CRD for y.example.com with Version v2 exists before installing my controller
  • My controller tries to use w.example.com with Version v2 and succeeds
    • This is notable because the Kind w does not exist with this GroupVersion before initializing the controller
    • In this scenario, differently from the above, this Version does exist in the Group before initializing the controller

I believe this is due to an error in the following chain:

We see the error message from:

func IsGVKNamespaced(gvk schema.GroupVersionKind, restmapper meta.RESTMapper) (bool, error) {
restmapping, err := restmapper.RESTMapping(schema.GroupKind{Group: gvk.Group, Kind: gvk.Kind})
if err != nil {
return false, fmt.Errorf("failed to get restmapping: %w", err)
}

(where we check if a GVK is namespaced, this one is). We can trace this to the restmapper.RESTMapping call:

func (m *mapper) RESTMapping(gk schema.GroupKind, versions ...string) (*meta.RESTMapping, error) {
	res, err := m.getMapper().RESTMapping(gk, versions...)
	if meta.IsNoMatchError(err) {
		if err := m.addKnownGroupAndReload(gk.Group, versions...); err != nil {
			return nil, err
		}
		res, err = m.getMapper().RESTMapping(gk, versions...)
	}

	return res, err
}

https://github.com/kubernetes-sigs/controller-runtime/blob/main/pkg/client/apiutil/restmapper.go#L117-L127

If we assume that we hit meta.IsNoMatchError, addKnownGroupAndReload should be called with the Group but no Versions provided (since the original IsGVKNamespaced call included no Version).

If we continue to trace this to addKnownGroupAndReload:

func (m *mapper) addKnownGroupAndReload(groupName string, versions ...string) error {
  // versions will here be [""] if the forwarded Version value of
  // GroupVersionResource (in calling method) was not specified.
  if len(versions) == 1 && versions[0] == "" {
  versions = nil
  }
  
  // If no specific versions are set by user, we will scan all available ones for the API group.
  // This operation requires 2 requests: /api and /apis, but only once. For all subsequent calls
  // this data will be taken from cache.
  if len(versions) == 0 {
  apiGroup, err := m.findAPIGroupByName(groupName)
  if err != nil {
	  return err
  }
  if apiGroup != nil {
	  for _, version := range apiGroup.Versions {
		  versions = append(versions, version.Version)
	  }
  }
}

https://github.com/kubernetes-sigs/controller-runtime/blob/main/pkg/client/apiutil/restmapper.go#L155-L175

We will enter the branch for if len(versions) == 0, where we get the apiGroup using findAPIGroupByName:

// findAPIGroupByNameLocked returns API group by its name.
func (m *mapper) findAPIGroupByName(groupName string) (*metav1.APIGroup, error) {
  // Looking in the cache first.
  {
  m.mu.RLock()
  group, ok := m.apiGroups[groupName]
  m.mu.RUnlock()
  if ok {
	  return group, nil
  }
}

https://github.com/kubernetes-sigs/controller-runtime/blob/main/pkg/client/apiutil/restmapper.go#L237

This will find the example.com Group in the cache, but the cached version will not include the newly installed Version. It will only include Version v1 from the CRD that was present prior to starting this controller. A new CRD with Version v2 was added later after the initialization.

I believe this was introduced in v0.15.0 when there was a switch to use the lazy RESTMapper: 935faeb

@alvaroaleman
Copy link
Member

/kind bug
Please feel free to submit a fix

@k8s-ci-robot k8s-ci-robot added the kind/bug Categorizes issue or PR as related to a bug. label Jul 3, 2024
@griffindvs
Copy link
Contributor Author

/assign @griffindvs

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug Categorizes issue or PR as related to a bug.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants