Pods become deregistered, then after re-registering the health status still reports registering #2420

Closed
laurie-kepford opened this issue Dec 20, 2021 · 4 comments

@laurie-kepford

laurie-kepford commented Dec 20, 2021

It seems that containers become deregistered from the target group, possibly because a pod restarts or because a host is terminated and the pod is moved to a new host.

  1. We get a 503 error.
  2. We look at the target group for the pod and it shows no registered targets.
  3. We wait 5 to 10 minutes and the pod shows up again (alternatively, we manually add the pod IP address to the registered targets).
  4. The application is back up.

However, the output of the following command shows the pod as still trying to register, even many hours later:

kubectl get pod podname -o yaml -n namespace | grep -B7 'type: target-health'

status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: null
    message: Target registration is in progress
    reason: Elb.RegistrationInProgress
    status: "True"
    type: target-health.elbv2.k8s.aws/k8s-podname-f9dc5ec48e
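
When many pods need checking, the grep above can be replaced by a small script over the JSON form of the same data. The following is a minimal Python sketch, not part of the controller: the function names are hypothetical, and it assumes the pod was dumped with `kubectl get pod podname -n namespace -o json`. It reads only the `status.conditions` fields shown in the output above.

```python
import json
from typing import Dict, List

# Condition-type prefix seen in the kubectl output above.
TARGET_HEALTH_PREFIX = "target-health.elbv2.k8s.aws/"


def target_health_conditions(pod: Dict) -> List[Dict]:
    """Return the pod conditions whose type starts with the target-health prefix."""
    conditions = (pod.get("status") or {}).get("conditions") or []
    return [
        c for c in conditions
        if str(c.get("type", "")).startswith(TARGET_HEALTH_PREFIX)
    ]


def still_registering(pod: Dict) -> List[str]:
    """Return condition types whose reason is still Elb.RegistrationInProgress."""
    return [
        c["type"]
        for c in target_health_conditions(pod)
        if c.get("reason") == "Elb.RegistrationInProgress"
    ]


# Sample document mirroring the condition reported in this issue.
SAMPLE_POD = json.loads("""
{"status": {"conditions": [
  {"type": "Ready", "status": "True"},
  {"type": "target-health.elbv2.k8s.aws/k8s-podname-f9dc5ec48e",
   "status": "True",
   "reason": "Elb.RegistrationInProgress",
   "message": "Target registration is in progress"}
]}}
""")

print(still_registering(SAMPLE_POD))
```

To use it against a live pod, the sample would be replaced by the parsed output of the `kubectl` command named above.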

The pod eventually registers, but this status stays the same.
This is a production system with 200 applications, and I see this happen every 4 to 6 hours for different apps. Left alone, most of them recover after 5 to 10 minutes, but waiting 5 to 10 minutes is not an acceptable solution.

My environment:
EKS - Kubernetes version 1.19
Rancher - version 2.6.1
Namespace has label:

   labels:
      elbv2.k8s.aws/pod-readiness-gate-inject: enabled

AWS LB controller 2.3
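
With that namespace label in place, the controller is expected to inject a target-health readiness gate into newly created pods. The sketch below is one way to confirm this for a given pod, assuming JSON from `kubectl get pod podname -n namespace -o json`. `gate_report` is a hypothetical helper name; `spec.readinessGates[].conditionType` and `status.conditions` are standard pod fields.

```python
from typing import Dict, Optional

# Prefix used by the readiness-gate condition type shown in this issue.
TARGET_HEALTH_PREFIX = "target-health.elbv2.k8s.aws/"


def gate_report(pod: Dict) -> Dict[str, Optional[str]]:
    """Map each target-health readiness gate to its condition status.

    The value is None when the gate exists in the spec but no matching
    condition has been written to the pod status yet.
    """
    gates = (pod.get("spec") or {}).get("readinessGates") or []
    conditions = (pod.get("status") or {}).get("conditions") or []
    status_by_type = {c.get("type"): c.get("status") for c in conditions}
    return {
        g["conditionType"]: status_by_type.get(g["conditionType"])
        for g in gates
        if str(g.get("conditionType", "")).startswith(TARGET_HEALTH_PREFIX)
    }


# Sample pod with one injected gate and its matching condition.
GATE = "target-health.elbv2.k8s.aws/k8s-podname-f9dc5ec48e"
SAMPLE_POD = {
    "spec": {"readinessGates": [{"conditionType": GATE}]},
    "status": {"conditions": [{"type": GATE, "status": "True"}]},
}

print(gate_report(SAMPLE_POD))
```

An empty result means the pod has no target-health gate at all, which may happen if the pod was created before the namespace was labeled.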

@kishorj
Collaborator

kishorj commented Dec 20, 2021

@laurie-kepford, could you provide the following?

  • How do you expose your application? NLB or ALB?
  • Target type - IP or instance?
  • What is the pod churn rate? Does your application scale up or restart every 4 to 6 hours?
  • Are the application pods terminated gracefully?

Also refer to a similar issue #2366 (comment)

@laurie-kepford
Author

laurie-kepford commented Dec 20, 2021

how you expose your application? NLB or ALB?
ALB
Target type - IP or instance
IP
What is the pod churn rate? Does your application scale up/restart every 4 to 6 hours?
Pods do not normally churn during the day. However, I did notice the following: there are 4 containers in the pod, and two of the 4 seem to be restarting. The pod itself shows as being up for 2 days. Could this be the issue?
are the application pods terminated gracefully?
The pod is not terminating. Containers in the pod are restarting.
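
Because a pod's age does not change when a container inside it restarts, per-container restart counts are worth checking separately. The following is a minimal sketch, assuming pod JSON from `kubectl get pod podname -n namespace -o json`. The helper names are hypothetical, and the container names and counts in the sample are made up for illustration; `status.containerStatuses[].restartCount` is a standard pod status field.

```python
from typing import Dict, List


def container_restarts(pod: Dict) -> Dict[str, int]:
    """Map each container name to its restartCount from the pod status."""
    statuses = (pod.get("status") or {}).get("containerStatuses") or []
    return {s["name"]: int(s.get("restartCount", 0)) for s in statuses}


def restarting(pod: Dict, threshold: int = 1) -> List[str]:
    """Return sorted names of containers that restarted at least `threshold` times."""
    return sorted(
        name
        for name, count in container_restarts(pod).items()
        if count >= threshold
    )


# Illustrative pod with 4 containers, two of which have restarted.
# Names and counts are invented for this example.
SAMPLE_POD = {
    "status": {
        "containerStatuses": [
            {"name": "app", "restartCount": 0},
            {"name": "worker", "restartCount": 7},
            {"name": "proxy", "restartCount": 0},
            {"name": "agent", "restartCount": 3},
        ]
    }
}

print(restarting(SAMPLE_POD))
```

A non-empty result for a pod that otherwise looks long-lived points to the situation described here: the pod stays up while containers inside it restart.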

I am going to deploy the latest version of my application tonight, which should stop the containers from restarting.

@kishorj
Collaborator

kishorj commented Dec 22, 2021

@laurie-kepford, if the issue persists, would you mind opening a support ticket with AWS Support? You could also email your cluster ARN to k8s-alb-controller-triage AT amazon.com.

@laurie-kepford
Author

We had a setting in our app that was causing one of the 4 containers inside the pod to restart. We have fixed that, and this problem is now resolved.
