
Cannot Get after Create using Client with Custom Resource #343

Closed
michaelkipper opened this issue Feb 28, 2019 · 22 comments
Labels
lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale.

Comments

@michaelkipper

Hi!

I'm tearing my hair out trying to debug this issue.

func testCreation(c client.Client) error {
	config := getTestCuscoServiceConfig()

	err := c.Create(context.Background(), &config)
	if err != nil {
		return errors.Wrap(err, "Could not create CuscoServiceConfig")
	}

	new_csc := v1.CuscoServiceConfig{}
	err = c.Get(context.Background(), client.ObjectKey{Name: "test-config", Namespace: "cusco"}, &new_csc)
	if err != nil {
		return errors.Wrap(err, "Could not get CuscoServiceConfig")
	}

	return nil
}

When run, I get this error:

Could not get CuscoServiceConfig: CuscoServiceConfig.cusco.shopify.io "test-config" not found

Outside the program, though:

$ kubectl get cuscoserviceconfigs test-config -n cusco -o yaml
apiVersion: cusco.shopify.io/v1
kind: CuscoServiceConfig
metadata:
  creationTimestamp: "2019-02-28T15:35:09Z"
  generation: 1
  name: test-config
  namespace: cusco
  resourceVersion: "21614"
  selfLink: /apis/cusco.shopify.io/v1/namespaces/cusco/cuscoserviceconfigs/test-config
  uid: 6e8bb6e3-3b6e-11e9-932e-02423b5580eb
spec:
  init-container:
    enabled: false
    percentage: 75
  istio-version: 1.1.0
  proxy:
    enabled: true
    percentage: 25

Stepping through the code, my theory is that the Create call doesn't store the object in the cache, and the Get call only looks at the cache.

How should I go about debugging this?
Is this because CuscoServiceConfig is a CustomResource?

@DirectXMan12
Contributor

DirectXMan12 commented Feb 28, 2019

In a controller, you always write to the API server and read from the cache (unless you manually construct a Client). You can't expect to read the object you wrote immediately (and, in fact, there's generally a bit of delay). What are you trying to accomplish with the immediate Get? You'll get back the created object as part of the call to create.

EDIT: clarification for posterity
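
For illustration, here is a minimal sketch of the testCreation snippet above without the follow-up Get (the function name is hypothetical; everything else is taken from the issue). Create decodes the API server's response back into the object you pass in, so server-populated fields are available immediately:

func testCreationSketch(c client.Client) error {
	config := getTestCuscoServiceConfig()

	if err := c.Create(context.Background(), &config); err != nil {
		return errors.Wrap(err, "Could not create CuscoServiceConfig")
	}

	// No Get needed: the server's response was written back into config, so
	// UID, ResourceVersion, CreationTimestamp, etc. are already populated here.
	if config.UID == "" {
		return errors.New("expected server-populated metadata on the created object")
	}
	return nil
}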

@michaelkipper
Author

@DirectXMan12: This was a snippet of something that's happening in an integration test that's failing. I would have expected read-after-write to be strongly consistent.

If this is not a guarantee, I can always drop a WaitForResource into the test.

@DirectXMan12
Contributor

how did you get the client for the integration test?

@DirectXMan12
Contributor

DirectXMan12 commented Feb 28, 2019

direct clients (client.New) should have the property you desire, since they always talk to the API server (I figured that was a snippet from a controller -- sorry for the confusion)
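
For example, a minimal sketch of constructing such a direct client (the helper name is illustrative, not part of controller-runtime, and it assumes the types you need are registered in the scheme you pass):

import (
	"k8s.io/client-go/kubernetes/scheme"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/client/config"
)

// newDirectClient is an illustrative helper (not part of controller-runtime).
// client.New always talks straight to the API server, so a Get right after a
// Create will see the object.
func newDirectClient() (client.Client, error) {
	cfg, err := config.GetConfig()
	if err != nil {
		return nil, err
	}
	return client.New(cfg, client.Options{Scheme: scheme.Scheme})
}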

@michaelkipper
Author

The client comes from manager.New(clientConfig, *managerOptions)

A rough snippet:

import (
	"sigs.k8s.io/controller-runtime/pkg/client/config"
	"sigs.k8s.io/controller-runtime/pkg/manager"
)

managerOptions := &manager.Options{
	MetricsBindAddress: *metricsAddr,
	Scheme:             scheme.Scheme,
}
...
clientConfig, err := config.GetConfig()
mgr, err := manager.New(clientConfig, *managerOptions)
...
client := mgr.GetClient()

@DirectXMan12
Contributor

DirectXMan12 commented Mar 4, 2019

ah, yeah, this is probably a bit of a confusing statement, but don't use the manager client in tests. The manager-provided client is designed to do the right thing for controllers by default (which is to read from caches, meaning that it's not strongly consistent), which means it probably does the wrong thing for tests (which almost certainly want strong consistency).

The suggested pattern is to construct a new client using client.New for tests (which will provide you with a direct API client that doesn't have any special behavior).

This is something that we need to document better, but generally, my *_suite_test.go files (the general suite setup for a particular package) look like:

var testenv *envtest.Environment
var cfg *rest.Config
var k8sClient client.Client

var _ = BeforeSuite(func() {
    logf.SetLogger(zap.LoggerTo(GinkgoWriter, true))

    testenv = &envtest.Environment{}

    var err error
    cfg, err = testenv.Start()
    Expect(err).NotTo(HaveOccurred())

    k8sClient, err = client.New(cfg, client.Options{Scheme: myscheme.Scheme})
    Expect(err).NotTo(HaveOccurred())
})

var _ = AfterSuite(func() {
    testenv.Stop()
})

(this is very similar to a lot of the test suite setup files in controller-runtime itself)

@shawn-hurley

shawn-hurley commented Mar 4, 2019

Might be easier to document if we can instead tell folks to use manager.APIReader from this PR: #327

thoughts?

@DirectXMan12
Contributor

yeah, once we get that merged, we can recommend that as well, although constructing a new client may be easier in some circumstances.
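
For reference, usage would look roughly like this once the Manager exposes the API reader (a sketch, assuming a GetAPIReader accessor; the reader bypasses the cache and reads straight from the API server):

// Sketch only: assumes the Manager exposes a non-caching reader via GetAPIReader().
apiReader := mgr.GetAPIReader()

csc := &v1.CuscoServiceConfig{}
err := apiReader.Get(context.Background(),
	client.ObjectKey{Name: "test-config", Namespace: "cusco"}, csc)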

@DirectXMan12
Contributor

(also, in case it wasn't clear, while it's fine (and generally preferable) to write tests expecting read-after-write consistency, you shouldn't do that for controllers. Kubernetes favors a kind-of 2-phase approach -- do your reads, process, do some writes, and return, letting the requeuing deal with the next cycle of reads if they're necessary, but don't try to read an object after you've written it).
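
A rough sketch of that shape, reusing the types from this issue (illustrative only: the reconciler type, the r.client field, and the apierrors alias for k8s.io/apimachinery/pkg/api/errors are assumptions, not the issue's code):

func (r *ReconcileCuscoServiceConfig) Reconcile(req reconcile.Request) (reconcile.Result, error) {
	ctx := context.Background()

	// 1. Read from the cache.
	config := &v1.CuscoServiceConfig{}
	if err := r.client.Get(ctx, req.NamespacedName, config); err != nil {
		if apierrors.IsNotFound(err) {
			return reconcile.Result{}, nil // object deleted; nothing to do
		}
		return reconcile.Result{}, err
	}

	// 2. Process, 3. write, then return. Don't read the object back afterwards;
	// the write enqueues another reconcile, which will observe the new state.
	if err := r.client.Update(ctx, config); err != nil {
		return reconcile.Result{}, err
	}
	return reconcile.Result{}, nil
}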

@DirectXMan12
Contributor

@michaelkipper does that answer your question?

@runzexia

My first reconcile completes the creation of the Job, which triggers a second reconcile.
But when I Get the Job in that second reconcile, it sometimes returns a 404.
@DirectXMan12
The flow is: user creates my CRD -> Get the Job -> if it does not exist -> create the Job

@DirectXMan12
Contributor

if it's actually the create that's triggering the second reconciler, that shouldn't happen.

@cizixs

cizixs commented Apr 29, 2019

In my case, I cannot Get the custom resource, but List returns the correct values.

// this works fine
myCRDList := &foov1beta1.FooList{}
g.Eventually(func() error { return c.List(context.TODO(), &client.ListOptions{}, myCRDList) }).Should(gomega.Succeed())

// this returns 404 not found error
foo := &foov1beta1.Foo{}
depKey := types.NamespacedName{Name: "foo-1"}
g.Eventually(func() error { return c.Get(context.TODO(), depKey, foo) }, timeout).Should(gomega.Succeed())

Is cache-reading relevant here?

@DirectXMan12

@DirectXMan12
Contributor

you shouldn't be reading from the cache in a test -- don't use mgr.GetClient in your test methods. Instead, use client.New to get a direct client.
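
Concretely, adapting the snippet above (a sketch: it assumes cfg is the envtest rest.Config, that the Foo type is registered in the scheme passed to client.New, and that the object lives in the "default" namespace):

// A direct (non-caching) client for test assertions; the scheme and the
// "default" namespace are assumptions here.
directClient, err := client.New(cfg, client.Options{Scheme: scheme.Scheme})
g.Expect(err).NotTo(gomega.HaveOccurred())

foo := &foov1beta1.Foo{}
depKey := types.NamespacedName{Name: "foo-1", Namespace: "default"}
g.Eventually(func() error {
	return directClient.Get(context.TODO(), depKey, foo)
}, timeout).Should(gomega.Succeed())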

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 2, 2019
@DirectXMan12
Contributor

closing this as no response

@edwardstudy

@DirectXMan12 Sorry for reviving this closed issue.

I hit the same situation: we can List resources but cannot Get a specific one, and these reads were not happening right after a Create. Does the local cache use a different path for List() than for Get()?

In my case, I cannot Get the custom resource, but List returns the correct values.

// this works fine
myCRDList := &foov1beta1.FooList{}
g.Eventually(func() error { return c.List(context.TODO(), &client.ListOptions{}, myCRDList) }).Should(gomega.Succeed())

// this returns 404 not found error
foo := &foov1beta1.Foo{}
depKey := types.NamespacedName{Name: "foo-1"}
g.Eventually(func() error { return c.Get(context.TODO(), depKey, foo) }, timeout).Should(gomega.Succeed())

Is cache-reading relevant here?

@DirectXMan12

@DirectXMan12
Contributor

Does the local cache use a different path for List() than for Get()?

No, that shouldn't happen. Is Foo namespaced?

Aside: I'd highly recommend using a live client in the test bodies themselves, otherwise you'll get flaky tests (the standard caching client is fine in the controllers themselves, but for your test assertions the live client is generally much better).

@edwardstudy

edwardstudy commented Nov 7, 2019

@DirectXMan12 Hi, these are native Kubernetes resources. I could not Get a specific StatefulSet, but I can List them. And yes, they are namespaced resources. This is happening in our reconciliation loop.

@DirectXMan12
Contributor

since they're namespaced, you'll need depKey := types.NamespacedName{Name: "foo-1", Namespace: whateverTheNamespaceIs}

@edwardstudy

@DirectXMan12 Sorry for the late reply. To correct my earlier comment: my request did have the right namespace. The point is that I could List the resources, but could not Get one.

As suggested in this issue, I used a direct client to Get the resource as a workaround.

@nobody4t

nobody4t commented Jun 7, 2020

I have the same issue.
I created a controller with generated clientset code, against Kubernetes 1.13.1.

Now I have an operator based on operator-sdk 0.15.1, and from it I try to Get/List the controller's custom resources using this API, but I cannot. Both results come back empty, as shown below:
List:

&{{ } {      0 0001-01-01 00:00:00 +0000 UTC <nil> <nil> map[] map[] [] []  []} { false }}

Get:

&{{ } {   <nil>} []}

But if I use the controller's generated clientset, I get the right result.

What confuses me is that I can get/list all of the custom resources with kubectl, but not with this API. Am I missing something?

matthchr added a commit to Azure/azure-service-operator that referenced this issue Nov 8, 2021
* Reduce test flakiness
 * Use a KubeClient in EnvTest that always hits APIServer to avoid
   cache latency and inconsistency issues that can cause races and
   intermittent test failures. See
   kubernetes-sigs/controller-runtime#1464
   and
   kubernetes-sigs/controller-runtime#343
   for details.
 * Write Status after Spec. This ensures that tests waiting for a
   status update cannot possibly see it so fast that they go on
   to perform a write that conflicts with the Spec write of the
   controller.
 * Improve log messages to be clearer (aids in test debugging).
* Verify that finalizer is set before delete

Prior to performing any deletion actions (either monitoring or
deletion), verify that the finalizer is still set. If it's not set,
immediately allow the k8s resource to be deleted. Don't take any action
on the Azure resource in this instance.