Stuck in RevisionMissing and "Unknown" state #6265
Comments
We are working on the stuck-in-"Unknown" problem; see #5076. Regarding your current problem,
I thought it was, but then I just tried it again and it worked 😕. It seems to be intermittent.
apiVersion: v1
items:
- apiVersion: extensions/v1beta1
kind: Deployment
metadata:
annotations:
deployment.kubernetes.io/revision: "1"
creationTimestamp: "2019-12-20T17:07:46Z"
generation: 1
labels:
app: knative-service-2hgd6
serving.knative.dev/configuration: knative-service
serving.knative.dev/configurationGeneration: "1"
serving.knative.dev/revision: knative-service-2hgd6
serving.knative.dev/revisionUID: 3e310cb3-234b-11ea-bf77-42010a8000a9
serving.knative.dev/service: knative-service
name: knative-service-2hgd6-deployment
namespace: knative-1-49090
ownerReferences:
- apiVersion: serving.knative.dev/v1alpha1
blockOwnerDeletion: true
controller: true
kind: Revision
name: knative-service-2hgd6
uid: 3e310cb3-234b-11ea-bf77-42010a8000a9
resourceVersion: "3733964"
selfLink: /apis/extensions/v1beta1/namespaces/knative-1-49090/deployments/knative-service-2hgd6-deployment
uid: 3e6110bf-234b-11ea-bf77-42010a8000a9
spec:
progressDeadlineSeconds: 120
replicas: 1
revisionHistoryLimit: 10
selector:
matchLabels:
serving.knative.dev/revisionUID: 3e310cb3-234b-11ea-bf77-42010a8000a9
strategy:
rollingUpdate:
maxSurge: 25%
maxUnavailable: 25%
type: RollingUpdate
template:
metadata:
annotations:
traffic.sidecar.istio.io/includeOutboundIPRanges: '*'
creationTimestamp: null
labels:
app: knative-service-2hgd6
serving.knative.dev/configuration: knative-service
serving.knative.dev/configurationGeneration: "1"
serving.knative.dev/revision: knative-service-2hgd6
serving.knative.dev/revisionUID: 3e310cb3-234b-11ea-bf77-42010a8000a9
serving.knative.dev/service: knative-service
spec:
containers:
- env:
- name: PORT
value: "8080"
- name: K_REVISION
value: knative-service-2hgd6
- name: K_CONFIGURATION
value: knative-service
- name: K_SERVICE
value: knative-service
image: gcr.io/istio-testing/app@sha256:1691b71601c9ad4fe7a003cf295ae58bbc01ef753e393ac36acf1c03f6f53d56
imagePullPolicy: IfNotPresent
lifecycle:
preStop:
httpGet:
path: /wait-for-drain
port: 8022
scheme: HTTP
name: user-container
ports:
- containerPort: 8080
name: user-port
protocol: TCP
resources: {}
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: FallbackToLogsOnError
volumeMounts:
- mountPath: /var/log
name: knative-var-log
- env:
- name: SERVING_NAMESPACE
value: knative-1-49090
- name: SERVING_SERVICE
value: knative-service
- name: SERVING_CONFIGURATION
value: knative-service
- name: SERVING_REVISION
value: knative-service-2hgd6
- name: QUEUE_SERVING_PORT
value: "8012"
- name: CONTAINER_CONCURRENCY
value: "0"
- name: REVISION_TIMEOUT_SECONDS
value: "300"
- name: SERVING_POD
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: metadata.name
- name: SERVING_POD_IP
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: status.podIP
- name: SERVING_LOGGING_CONFIG
value: |-
{
"level": "info",
"development": false,
"outputPaths": ["stdout"],
"errorOutputPaths": ["stderr"],
"encoding": "json",
"encoderConfig": {
"timeKey": "ts",
"levelKey": "level",
"nameKey": "logger",
"callerKey": "caller",
"messageKey": "msg",
"stacktraceKey": "stacktrace",
"lineEnding": "",
"levelEncoder": "",
"timeEncoder": "iso8601",
"durationEncoder": "",
"callerEncoder": ""
}
}
- name: SERVING_LOGGING_LEVEL
- name: SERVING_REQUEST_LOG_TEMPLATE
- name: SERVING_REQUEST_METRICS_BACKEND
value: prometheus
- name: TRACING_CONFIG_BACKEND
value: none
- name: TRACING_CONFIG_ZIPKIN_ENDPOINT
- name: TRACING_CONFIG_STACKDRIVER_PROJECT_ID
- name: TRACING_CONFIG_DEBUG
value: "false"
- name: TRACING_CONFIG_SAMPLE_RATE
value: "0.100000"
- name: USER_PORT
value: "8080"
- name: SYSTEM_NAMESPACE
value: knative-serving
- name: METRICS_DOMAIN
value: knative.dev/internal/serving
- name: USER_CONTAINER_NAME
value: user-container
- name: ENABLE_VAR_LOG_COLLECTION
value: "false"
- name: VAR_LOG_VOLUME_NAME
value: knative-var-log
- name: INTERNAL_VOLUME_PATH
value: /var/knative-internal
- name: SERVING_READINESS_PROBE
value: '{"tcpSocket":{"port":8080,"host":"127.0.0.1"},"successThreshold":1}'
- name: ENABLE_PROFILING
value: "false"
- name: SERVING_ENABLE_PROBE_REQUEST_LOG
value: "false"
image: gcr.io/knative-releases/knative.dev/serving/cmd/queue@sha256:077d82a8f7b3f8c645e95abdf20acf1f1c5ea4d2215aa43ac707920914db5cf8
imagePullPolicy: IfNotPresent
name: queue-proxy
ports:
- containerPort: 8022
name: http-queueadm
protocol: TCP
- containerPort: 9090
name: queue-metrics
protocol: TCP
- containerPort: 9091
name: http-usermetric
protocol: TCP
- containerPort: 8012
name: queue-port
protocol: TCP
readinessProbe:
exec:
command:
- /ko-app/queue
- -probe-period
- "0"
failureThreshold: 3
periodSeconds: 1
successThreshold: 1
timeoutSeconds: 10
resources:
requests:
cpu: 25m
securityContext:
allowPrivilegeEscalation: false
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
dnsPolicy: ClusterFirst
restartPolicy: Always
schedulerName: default-scheduler
securityContext: {}
terminationGracePeriodSeconds: 300
volumes:
- emptyDir: {}
name: knative-var-log
status:
availableReplicas: 1
conditions:
- lastTransitionTime: "2019-12-20T17:07:51Z"
lastUpdateTime: "2019-12-20T17:07:51Z"
message: Deployment has minimum availability.
reason: MinimumReplicasAvailable
status: "True"
type: Available
- lastTransitionTime: "2019-12-20T17:07:46Z"
lastUpdateTime: "2019-12-20T17:07:51Z"
message: ReplicaSet "knative-service-2hgd6-deployment-75757bb8b6" has successfully
progressed.
reason: NewReplicaSetAvailable
status: "True"
type: Progressing
observedGeneration: 1
readyReplicas: 1
replicas: 1
updatedReplicas: 1
kind: List
metadata:
resourceVersion: ""
selfLink: ""
With the above info I grabbed the logs, but I forgot to copy them and the deployment got deleted, so I lost the state 🙁. There were messages about
As far as reproducing, the steps are:
Having the same issue with serving 0.11.1 as well. |
I have automation with minikube that creates a bunch of resources right after the cluster is stood up (istio, knative, etc). I'm seeing similar symptoms with a ksvc. I can reproduce it consistently. I'm also using serving |
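This isn't meant as the workaround mentioned below, just a generic guard when an install script creates ksvcs immediately after standing up the cluster: wait for the Serving control plane to report Available first (a sketch; the namespace and timeout are assumptions):
# wait for the Knative Serving control plane before applying any ksvc
kubectl wait --namespace knative-serving \
  --for=condition=Available deployment --all --timeout=300s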
Update: After leaving the cluster running all day, the issue eventually came back. The workaround from above seems only reliable for preventing this from a consistent repro in a brand new install. Let me know if there's additional debug data needed. |
Update: I just tested with |
Same problem on v0.12.0 - totally a showstopper |
Same problem on v0.11.0, k8s v1.15.4 (binary setup), CentOS 7.7.1908.
/assign @tcnghia
Assigning to Nghia as this affects our ability to smoke test with Istio, but if folks have an active repro of this, feel free to DM me on slack (
Same problem on v0.12.0, kubeadm (v1.17.0), CentOS 7.7.1908.
sample app info below
Revision describe info
It would be helpful to share |
I don't have an @google.com, @pivotal.io, or @redhat.com email address; my gmail is ysjjovo@gmail.com.
cc @vagababov KPA not becoming ready. Anyone should be able to sign up with an invite from slack.knative.dev 🤔 |
Can you post the logs from
kubectl logs -n knative-serving $(kubectl get po -n knative-serving | egrep "autoscaler-[^h]" | cut -f1 -d' ') | less
If it's crashing or not starting, the file should be short.
Those are all the logs; they seem normal.
Yeah there's no crash here. |
All pods are in running status.
/cc @mattmoor |
I just tried another time, and it worked!
But it's still in Unknown status in another environment:
Compared to the normal sample, its service section lacks 'service/stock-service-example-first' and 'service/stock-service-example-first-private'. Is there some problem with the k8s installation?
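For what it's worth, a quick way to check whether those per-revision Services were ever created is to list by the revision label (a sketch; the label comes from the resources shown in this issue, and the 'sks' short name for the internal ServerlessService type is an assumption):
# Services (and the internal ServerlessService) Serving normally creates for a revision
kubectl get svc,sks -l serving.knative.dev/revision=stock-service-example-first
# if nothing comes back, the reconciler never got that far for this revision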
I never said the autoscaler didn't start; I said the KPA didn't become ready. In fact, the resource is never even initialized (from above):
Why didn't the KPA become ready?
I saw some abnormal messages in the k8s binary-installation environment.
Could someone help me out? |
@vagababov ping on this since it's the KPA failing to become ready |
Probably the PA type annotation is missing. |
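If the annotation in question is the autoscaling class annotation (my reading of "PA type annotation", not confirmed above), one way to check whether it is set on the PodAutoscaler is (a sketch):
# print the autoscaling class annotation on the PodAutoscaler, if any
kubectl get podautoscaler stock-service-example-first \
  -o jsonpath='{.metadata.annotations.autoscaling\.knative\.dev/class}'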
Thanks for your reply!
[root@xxx samples]# kubectl get all
NAME READY STATUS RESTARTS AGE
pod/stock-service-example-first-deployment-7bcd589f7b-kj5ss 1/2 Running 0 5m1s
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/kubernetes ClusterIP 172.254.0.1 <none> 443/TCP 9d
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/stock-service-example-first-deployment 0/1 1 0 5m1s
NAME DESIRED CURRENT READY AGE
replicaset.apps/stock-service-example-first-deployment-7bcd589f7b 1 1 0 5m1s
NAME CONFIG NAME K8S SERVICE NAME GENERATION READY REASON
revision.serving.knative.dev/stock-service-example-first stock-service-example 1 Unknown Deploying
NAME URL READY REASON
route.serving.knative.dev/stock-service-example http://stock-service-example.default.example.com Unknown RevisionMissing
NAME LATESTCREATED LATESTREADY READY REASON
configuration.serving.knative.dev/stock-service-example stock-service-example-first Unknown
NAME URL LATESTCREATED LATESTREADY READY REASON
service.serving.knative.dev/stock-service-example http://stock-service-example.default.example.com stock-service-example-first Unknown RevisionMissing
[root@xxx samples]# kubectl get rev stock-service-example-first -oyaml
apiVersion: serving.knative.dev/v1
kind: Revision
metadata:
creationTimestamp: "2020-03-04T08:43:41Z"
generation: 1
labels:
serving.knative.dev/configuration: stock-service-example
serving.knative.dev/configurationGeneration: "1"
serving.knative.dev/service: stock-service-example
name: stock-service-example-first
namespace: default
ownerReferences:
- apiVersion: serving.knative.dev/v1alpha1
blockOwnerDeletion: true
controller: true
kind: Configuration
name: stock-service-example
uid: e5bc55c3-e9e5-4408-89f7-16458c9eb1ff
resourceVersion: "1471608"
selfLink: /apis/serving.knative.dev/v1/namespaces/default/revisions/stock-service-example-first
uid: 9fd497b7-ceba-4d31-86cf-733dbc5bb929
spec:
containerConcurrency: 0
containers:
- env:
- name: RESOURCE
value: stock
image: dev.local/rest-api-go:0.11.0
imagePullPolicy: Never
name: user-container
readinessProbe:
httpGet:
path: /
port: 0
periodSeconds: 3
successThreshold: 1
timeoutSeconds: 1
resources: {}
timeoutSeconds: 300
status:
conditions:
- lastTransitionTime: "2020-03-04T08:43:41Z"
reason: Deploying
severity: Info
status: Unknown
type: Active
- lastTransitionTime: "2020-03-04T08:43:41Z"
reason: Deploying
status: Unknown
type: ContainerHealthy
- lastTransitionTime: "2020-03-04T08:43:41Z"
reason: Deploying
status: Unknown
type: Ready
- lastTransitionTime: "2020-03-04T08:43:41Z"
reason: Deploying
status: Unknown
type: ResourcesAvailable
logUrl: http://localhost:8001/api/v1/namespaces/knative-monitoring/services/kibana-logging/proxy/app/kibana#/discover?_a=(query:(match:(kubernetes.labels.knative-dev%2FrevisionUID:(query:'9fd497b7-ceba-4d31-86cf-733dbc5bb929',type:phrase))))
observedGeneration: 1
[root@xxx samples]# kubectl get podautoscaler stock-service-example-first -oyaml
apiVersion: autoscaling.internal.knative.dev/v1alpha1
kind: PodAutoscaler
metadata:
creationTimestamp: "2020-03-04T08:43:41Z"
generation: 2
labels:
app: stock-service-example-first
serving.knative.dev/configuration: stock-service-example
serving.knative.dev/configurationGeneration: "1"
serving.knative.dev/revision: stock-service-example-first
serving.knative.dev/revisionUID: 9fd497b7-ceba-4d31-86cf-733dbc5bb929
serving.knative.dev/service: stock-service-example
name: stock-service-example-first
namespace: default
ownerReferences:
- apiVersion: serving.knative.dev/v1alpha1
blockOwnerDeletion: true
controller: true
kind: Revision
name: stock-service-example-first
uid: 9fd497b7-ceba-4d31-86cf-733dbc5bb929
resourceVersion: "1471613"
selfLink: /apis/autoscaling.internal.knative.dev/v1alpha1/namespaces/default/podautoscalers/stock-service-example-first
uid: c8fdc33a-fd9e-4b93-a512-900b9f376b67
spec:
protocolType: http1
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: stock-service-example-first-deployment |
As I presumed, you don't have the annotation:
and hence the PA is not initialized.
I tried to add the annotation you mentioned, but everything remains in an abnormal state; it seems it does not work. Use the command
Can you get logs from the autoscaler as well? For some reason it's missing above. Finally |
Your presumption is right! The user container didn't start because I used a wrong image:
apiVersion: serving.knative.dev/v1
kind: Revision
metadata:
creationTimestamp: "2020-03-06T03:02:04Z"
generateName: helloworld-go-
generation: 1
labels:
serving.knative.dev/configuration: helloworld-go
serving.knative.dev/configurationGeneration: "1"
serving.knative.dev/service: helloworld-go
name: helloworld-go-vfsdz
namespace: default
ownerReferences:
- apiVersion: serving.knative.dev/v1alpha1
blockOwnerDeletion: true
controller: true
kind: Configuration
name: helloworld-go
uid: f003dc1f-fe8b-4a62-b7ca-e9ba07e28ece
resourceVersion: "1808216"
selfLink: /apis/serving.knative.dev/v1/namespaces/default/revisions/helloworld-go-vfsdz
uid: 70d2ecd0-cf74-470e-83bc-569f0467eb3c
spec:
containerConcurrency: 0
containers:
- env:
- name: TARGET
value: Go Sample v1
image: dev.local/helloworld-go
imagePullPolicy: Never
name: user-container
readinessProbe:
successThreshold: 1
tcpSocket:
port: 0
resources: {}
timeoutSeconds: 300
status:
conditions:
- lastTransitionTime: "2020-03-06T03:02:04Z"
reason: Deploying
severity: Info
status: Unknown
type: Active
- lastTransitionTime: "2020-03-06T03:02:04Z"
reason: Deploying
status: Unknown
type: ContainerHealthy
- lastTransitionTime: "2020-03-06T03:02:04Z"
reason: Deploying
status: Unknown
type: Ready
- lastTransitionTime: "2020-03-06T03:02:04Z"
reason: Deploying
status: Unknown
type: ResourcesAvailable
logUrl: http://localhost:8001/api/v1/namespaces/knative-monitoring/services/kibana-logging/proxy/app/kibana#/discover?_a=(query:(match:(kubernetes.labels.knative-dev%2FrevisionUID:(query:'70d2ecd0-cf74-470e-83bc-569f0467eb3c',type:phrase))))
observedGeneration: 1
Otherwise the system seems to be in order... |
Thanks for bearing with us! Grabbing the webhook logs would be a good next step. Alternatively, if this is something you can reproduce easily and you could show the script that set up your cluster (or explain it), we could try to reproduce it ourselves.
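For anyone following along, the webhook logs can typically be grabbed with something like this (a sketch; it assumes the webhook runs as the 'webhook' deployment in the knative-serving namespace):
kubectl logs -n knative-serving deployment/webhook --tail=200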
Thanks for your patience!
Steps to reproduce are below.
# Download and unpack Istio
export ISTIO_VERSION=1.4.3
curl -L https://git.io/getLatestIstio | sh -
cd istio-${ISTIO_VERSION}
# install the Istio CRDs
for i in install/kubernetes/helm/istio-init/files/crd*yaml; do kubectl apply -f $i; done
# Create istio-system namespace
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Namespace
metadata:
name: istio-system
labels:
istio-injection: disabled
EOF
# A lighter template, with just pilot/gateway
helm template --namespace=istio-system \
--set prometheus.enabled=false \
--set mixer.enabled=false \
--set mixer.policy.enabled=false \
--set mixer.telemetry.enabled=false \
`# Pilot doesn't need a sidecar.` \
--set pilot.sidecar=false \
--set pilot.resources.requests.memory=128Mi \
`# Disable galley (and things requiring galley).` \
--set galley.enabled=false \
--set global.useMCP=false \
`# Disable security / policy.` \
--set security.enabled=false \
--set global.disablePolicyChecks=true \
`# Disable sidecar injection.` \
--set sidecarInjectorWebhook.enabled=false \
--set global.proxy.autoInject=disabled \
--set global.omitSidecarInjectorConfigMap=true \
--set gateways.istio-ingressgateway.autoscaleMin=1 \
--set gateways.istio-ingressgateway.autoscaleMax=2 \
`# Set pilot trace sampling to 100%` \
--set pilot.traceSampling=100 \
--set global.mtls.auto=false \
install/kubernetes/helm/istio \
> ./istio-lean.yaml
kubectl apply -f istio-lean.yaml
kubectl get pods --namespace istio-system
yaml="https://github.com/ysjjovo/knative-tutorial/blob/master/install/3-knative/source/core/0.12.0/serving.yaml"
kubectl apply --selector knative.dev/crd-install=true -f $yaml
echo 'CRDS install completed!'
kubectl apply -f $yaml
My Knative installation is somewhat unusual because of the network blockade in China.
I then import it into my company's internal environment and tag it as 'pcr-sz.paic.com.cn/knative-releases/serving-cmd-queue:0.12.0'.
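A minimal sketch of that re-tag-and-push flow (assuming docker is available, the internal registry path is the one quoted above, and the source digest is the queue-proxy image from earlier in this issue):
# pull the upstream queue-proxy image via whatever mirror is reachable
docker pull gcr.io/knative-releases/knative.dev/serving/cmd/queue@sha256:077d82a8f7b3f8c645e95abdf20acf1f1c5ea4d2215aa43ac707920914db5cf8
# re-tag for the internal registry and push
docker tag gcr.io/knative-releases/knative.dev/serving/cmd/queue@sha256:077d82a8f7b3f8c645e95abdf20acf1f1c5ea4d2215aa43ac707920914db5cf8 pcr-sz.paic.com.cn/knative-releases/serving-cmd-queue:0.12.0
docker push pcr-sz.paic.com.cn/knative-releases/serving-cmd-queue:0.12.0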
Sorry, that last sentence says it does not work on 1.17.3, but the line above says it works on 1.17.3. Can you clarify which version you're seeing the problem on?
FYI - I tried the following with success:
1. cluster setup
minikube start --kubernetes-version=v1.17.3
2. istio setup
same as above
3. knative setup
kubectl apply -l knative.dev/crd-install=true -f https://github.com/knative/serving/releases/download/v0.12.0/serving.yaml
kubectl apply -f https://github.com/knative/serving/releases/download/v0.12.0/serving.yaml
4. knative service installation
cat <<EOF | kubectl apply -f -
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
name: helloworld-go
namespace: default
spec:
template:
spec:
containers:
- image: gcr.io/knative-samples/helloworld-go
env:
- name: TARGET
value: "Go Sample v1"
EOF
5. wait for service to become ready
$ kubectl get ksvc
NAME URL LATESTCREATED LATESTREADY READY REASON
helloworld-go http://helloworld-go.default.example.com helloworld-go-mrwbr Unknown RevisionMissing
$ kubectl get ksvc
NAME URL LATESTCREATED LATESTREADY READY REASON
helloworld-go http://helloworld-go.default.example.com helloworld-go-mrwbr helloworld-go-mrwbr True |
I'm sorry for confusing you. I mean that installing k8s from binaries does not work; kubeadm works fine. Maybe there are some problems with the k8s installation from binaries.
@ysjjovo I'm going to close this out - if you have any updates with specific steps that reproduce this error feel free to re-open. |
I had this same issue on a bare-metal installation of K8s v1.15. After several hours of debugging, trying different Knative versions, and checking logs, I found that my K8s installation had the
@jsargiot were you using a tool to deploy K8s, or doing it manually?
Manually. Actually the missing admission controller was the |
I was getting a similar error while deploying an inference service; changing the
In what area(s)?
What version of Knative?
v0.11.0
Expected Behavior
Service is created successfully
Actual Behavior
Resources stuck in various bad states:
Basically everything is Unknown, with no real indication of what is going wrong.
Steps to Reproduce the Problem
I am trying to get some knative smoke tests integrated into Istio's tests so we don't break things accidentally. See PR istio/istio#19675. It seems fairly reproducible on a fresh cluster, running those steps.
I am probably doing something wrong, but none of the status messages or logs are leading me in the right direction.
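When everything reports Unknown like this, one way to see where the chain stalls is to walk it resource by resource (a sketch; the resource names are the ones from this issue, and the internal types and controller deployment name are assumptions about the Serving install):
# top-level view of the ksvc and its children
kubectl get ksvc,configuration,route,revision
# the revision's autoscaler and backing deployment
kubectl get podautoscaler,deploy -l serving.knative.dev/service=knative-service
# reconciler errors usually show up in the controller logs
kubectl logs -n knative-serving deployment/controller --tail=200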