
Testing Metricbeat/Filebeat k8s manifests on Openshift #17516

Closed
ChrsMark opened this issue Apr 6, 2020 · 10 comments

ChrsMark commented Apr 6, 2020

Meta issue to keep track of the testing process and results of the Metricbeat/Filebeat k8s manifests on Openshift.
Related to #13054

Testing env:

$ minishift version 
minishift v1.34.2+83ebaab

Metricbeat

Version: docker.elastic.co/beats/metricbeat:7.6.2
All the testing follows the instructions at:
https://www.elastic.co/guide/en/beats/metricbeat/current/running-on-kubernetes.html#_red_hat_openshift_configuration

Deployment manifest

Testing passed: ✅

This manifest launches Metricbeat as a k8s Deployment whose goal is to collect metrics from kube-state-metrics.
With kube-state-metrics deployed, it works out of the box. Events are populated for at least the metricsets below (a minimal config sketch follows the list):

- state_node
- state_deployment
- state_replicaset
- state_pod
- state_container
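
For reference, a minimal sketch of the kube-state-metrics module configuration that the Deployment manifest ships with (the period and the kube-state-metrics address are illustrative and can differ per installation):

metricbeat.modules:
  - module: kubernetes
    metricsets:
      - state_node
      - state_deployment
      - state_replicaset
      - state_pod
      - state_container
    period: 10s
    hosts: ["kube-state-metrics:8080"]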

Daemonset manifest

Testing passed: ❌

  • system module: ✅
  • kubernetes module (kubelet API): ✅
  • kube-proxy: ❌

This manifest launches Metricbeat as a k8s Daemonset on all k8s nodes; its goal is to collect system metrics using the system module, kubelet metrics by accessing the kubelet API, and metrics from kube-proxy.
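
For orientation, a trimmed sketch of the three pieces of module configuration involved (abridged; metricsets, periods and hosts are illustrative):

metricbeat.modules:
  - module: system
    period: 10s
    metricsets: ["cpu", "load", "memory", "network", "process", "process_summary"]
  - module: kubernetes
    period: 10s
    metricsets: ["node", "system", "pod", "container", "volume"]
    # kubelet endpoint; the NODE_NAME form is the post-#17469 fix discussed below
    hosts: ["https://${NODE_NAME}:10250"]
    bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  - module: kubernetes
    period: 10s
    metricsets: ["proxy"]
    hosts: ["localhost:10249"]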

Issues:

access to data volume

oc -n kube-system logs -f metricbeat-zsfwt
2020-04-06T10:58:56.615Z        INFO    instance/beat.go:622    Home path: [/usr/share/metricbeat] Config path: [/usr/share/metricbeat] Data path: [/usr/share/metricbeat/data] Logs path: [/usr/share/metricbeat/logs]
2020-04-06T10:58:56.615Z        INFO    instance/beat.go:372    metricbeat stopped.
2020-04-06T10:58:56.616Z        ERROR   instance/beat.go:933    Exiting: Failed to create Beat meta file: open /usr/share/metricbeat/data/meta.json.new: permission denied

Resolved: ✅ by removing what was added in #17429.

This could also be resolved by adding privileged: true under securityContext (see the sketch below). Would this be something that we want? cc @jsoriano @exekias
Pull request: #17606
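
For illustration, the privileged alternative would be roughly the following container securityContext (a sketch only, not the change that was merged):

securityContext:
  runAsUser: 0
  privileged: true  # lifts the permission restriction at the cost of running a privileged container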

unable to access kubelet API

2020-04-06T11:17:26.250Z        INFO    module/wrapper.go:252   Error fetching data for metricset kubernetes.pod: error doing HTTP request to fetch 'pod' Metricset data: error making http request: Get https://minishift:10250/stats/summary: x509: certificate is valid for localhost, 127.0.0.1, not minishift
2020-04-06T11:17:26.259Z        INFO    module/wrapper.go:252   Error fetching data for metricset kubernetes.container: error doing HTTP request to fetch 'container' Metricset data: error making http request: Get https://minishift:10250/stats/summary: x509: certificate is valid for localhost, 127.0.0.1, not minishift

Resolved: ✅ by using NODE_NAME. It is already upstream with #17469
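
The pattern, roughly (a sketch of what #17469 puts upstream): expose the node name through the downward API and address the kubelet by it:

env:
  - name: NODE_NAME
    valueFrom:
      fieldRef:
        fieldPath: spec.nodeName
# and in the kubernetes module configuration:
hosts: ["https://${NODE_NAME}:10250"]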

unable to retrieve metrics from kube-proxy

2020-04-06T11:35:37.554Z        INFO    module/wrapper.go:252   Error fetching data for metricset kubernetes.proxy: error getting processed metrics: error making http request: Get http://localhost:10249/metrics: dial tcp 127.0.0.1:10249: connect: connection refused

Notes:

  1. The kube-proxy pod runs in the kube-proxy namespace and not in kube-system. Not that this would be the issue, since we are trying to access the host's network.
  2. What is actually running in this proxy:
kubectl -n kube-proxy exec -it kube-proxy-hj5zb /bin/bash           
[root@minishift origin]# curl http://0.0.0.0:10249/metrics
curl: (7) Failed connect to 0.0.0.0:10249; Connection refused
[root@minishift origin]# ps aux
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         1  0.1  0.6 475736 28220 ?        Ssl  10:04   0:10 openshift start network --enable=proxy --listen=https://0.0.0.0:8444 --config=/etc/origin/node/node-config.yaml
root      5784  0.4  0.1  20244  7036 ?        Ss   11:55   0:00 /bin/bash
root      5831  0.0  0.0  55184  1844 ?        R+   11:55   0:00 ps aux

Info:

Containers:
  kube-proxy:
    Container ID:  docker://df082ba84972a35edbf051ee0be0d14bc07f8df2bb7ca827cfe391de0b43407c
    Image:         openshift/origin-control-plane:v3.11.0

Resolved: ❌; we have to remove it from the manifest since it has no impact.

Result:
After "resolving" the issues above, everything works as expected:

  • system metricsets are sending events
  • k8s metricsets are sending events

The above could be a validation check for CI.

Filebeat

Version: docker.elastic.co/beats/filebeat:7.6.2
All the testing follows the instructions at: https://www.elastic.co/guide/en/beats/filebeat/master/running-on-kubernetes.html

Daemonset manifest

Note that for minishift the default docker logging driver is journald. In order to be able to collect logs with Filebeat, one needs to start minishift with:

minishift start --docker-opt log-driver=json-file
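
This matters because the DaemonSet tails the JSON log files written under /var/log/containers; a rough sketch of the kind of input involved (the exact input type and paths depend on the manifest version):

filebeat.inputs:
  - type: container
    paths:
      - /var/log/containers/*.log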

Testing passed: ✅

  1. Logs are being collected
  2. add_kubernetes_metadata enriches the events with k8s metadata.
  3. autodiscover with hints works (see the config sketch after this list)
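
A minimal sketch of hints-based autodiscover in Filebeat (illustrative; the default_config shown assumes the container input):

filebeat.autodiscover:
  providers:
    - type: kubernetes
      node: ${NODE_NAME}
      hints.enabled: true
      hints.default_config:
        type: container
        paths:
          - /var/log/containers/*${data.kubernetes.container.id}.log
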
elasticmachine commented:

Pinging @elastic/integrations-platforms (Team:Platforms)

ChrsMark commented Apr 9, 2020

Heads up on this. After #17606 was merged, the manifests can be successfully deployed on Openshift (minishift v1.34.2+83ebaab) except for the kube-proxy module (see the description above for more info).

Metricbeat and Filebeat docs seem to be updated.

So maybe the testing can be considered successful for now on minishift (my next steps are to test on other installations too), and maybe we can open a separate issue for kube-proxy and add a comment for it in the manifest that it is not supported on Openshift? @exekias wdyt?

exekias commented Apr 15, 2020

Awesome work here! +1 to creating a different issue for kube-proxy. In any case I would keep this one open until you test with an Openshift installation.

I think this is also a good opportunity to add some ideas here on what we can do for Openshift on top of our vanilla kubernetes monitoring.

ChrsMark commented Apr 21, 2020

Testing on GCP installation

Building a cluster with openshift-install.
Official docs: https://docs.openshift.com/container-platform/4.2/installing/installing_gcp/installing-gcp-customizations.html#installing-gcp-customizations

Installer Version:

./openshift-install 4.3.10
built from commit 0a7f198c1d5667152ef5982fd2fc0c22abc4336f
release image quay.io/openshift-release-dev/ocp-release@sha256:edb4364367cff4f751ffdc032bc830a469548f998127b523047a8dd518c472cd

Filebeat

Version: docker.elastic.co/beats/filebeat:7.6.2
All the testing follows the instructions at: https://www.elastic.co/guide/en/beats/filebeat/master/running-on-kubernetes.html

Daemonset manifest

Testing passed: ✅

  • Logs are being collected
  • add_kubernetes_metadata enriches the events with k8s metadata.
  • add_cloud_metadata enriches the events with cloud metadata (see the processors sketch after this list).
  • autodiscover with hints works
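
The metadata enrichment comes from the processors section of the Filebeat configuration, roughly (a sketch; the exact matcher settings depend on the manifest version):

processors:
  - add_kubernetes_metadata:
      host: ${NODE_NAME}
      matchers:
        - logs_path:
            logs_path: "/var/log/containers/"
  - add_cloud_metadata: ~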

Metricbeat

Version: docker.elastic.co/beats/metricbeat:7.6.2
All the testing follows the instructions at: https://www.elastic.co/guide/en/beats/metricbeat/master/running-on-kubernetes.html

Deployment manifest

Testing passed: ✅

This manifest launches Metricbeat as a k8s Deployment whose goal is to collect metrics from kube-state-metrics.
With kube-state-metrics deployed, it works out of the box. Events are populated for at least the metricsets below:

- state_node
- state_deployment
- state_replicaset
- state_pod
- state_container

Daemonset manifest

Testing passed: Partially ✅

  1. system module: ✅
  2. kubernetes module (kubelet API): Partially ✅
  3. kube-proxy: ❌. (it does not expose metrics)

Kubelet API issue:
2020-04-21T10:04:20.666Z INFO module/wrapper.go:252 Error fetching data for metricset kubernetes.system: error doing HTTP request to fetch 'system' Metricset data: error making http request: Get https://ocp-be-c5kjr-w-a-fqn92.c.elastic-observability.internal:10250/stats/summary: x509: certificate signed by unknown authority
Need to add ssl.verification_mode: "none" as a workaround (sketch below).
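
The workaround (a sketch only; see the follow-up discussion about preferring a proper CA) goes under the kubelet-facing kubernetes module:

- module: kubernetes
  metricsets: ["node", "system", "pod", "container", "volume"]
  hosts: ["https://${NODE_NAME}:10250"]
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  ssl.verification_mode: "none"  # disables certificate verification; workaround only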

curl -H "Authorization: Bearer $token" --cacert /var/run/secrets/kubernetes.io/serviceaccount/service-ca.crt -v https://${NODE_NAME}:10250/stats/summary
* About to connect() to ocp-be-c5kjr-w-b-jhqvn.c.elastic-observability.internal port 10250 (#0)
*   Trying 10.0.32.3...
* Connected to ocp-be-c5kjr-w-b-jhqvn.c.elastic-observability.internal (10.0.32.3) port 10250 (#0)
* Initializing NSS with certpath: sql:/etc/pki/nssdb
*   CAfile: /var/run/secrets/kubernetes.io/serviceaccount/service-ca.crt
  CApath: none
* Server certificate:
* 	subject: CN=system:node:ocp-be-c5kjr-w-b-jhqvn.c.elastic-observability.internal,O=system:nodes
* 	start date: Apr 16 09:07:00 2020 GMT
* 	expire date: May 16 08:58:24 2020 GMT
* 	common name: system:node:ocp-be-c5kjr-w-b-jhqvn.c.elastic-observability.internal
* 	issuer: CN=kube-csr-signer_@1587027822
* NSS error -8179 (SEC_ERROR_UNKNOWN_ISSUER)
* Peer's Certificate issuer is not recognized.
* Closing connection 0
curl: (60) Peer's Certificate issuer is not recognized.
More details here: http://curl.haxx.se/docs/sslcerts.html

curl performs SSL certificate verification by default, using a "bundle"
 of Certificate Authority (CA) public keys (CA certs). If the default
 bundle file isn't adequate, you can specify an alternate file
 using the --cacert option.
If this HTTPS server uses a certificate signed by a CA represented in
 the bundle, the certificate verification probably failed due to a
 problem with the certificate (it might be expired, or the name might
 not match the domain name in the URL).
If you'd like to turn off curl's verification of the certificate, use
 the -k (or --insecure) option.
[root@ocp-be-c5kjr-w-b-jhqvn metricbeat]# env | grep NODE_NAME
NODE_NAME=ocp-be-c5kjr-w-b-jhqvn.c.elastic-observability.internal

While on minishift:

[root@minishift metricbeat]# env | grep NODE_NAME
NODE_NAME=localhost

I think this has to do with the way the k8s cluster is set up; for instance, there have been similar certificate issues with kops installations.

ChrsMark commented Apr 21, 2020

Testing seems to be completed. So far the docs for Filebeat and Metricbeat seem to be updated, and someone can deploy Metricbeat and Filebeat on Openshift without significant issues.

In order to consider it done, I propose:

  1. Add a separate issue for the kube-proxy investigation (Kube-proxy support for Openshift #17863)
  2. Add a comment in the manifest that kube-proxy is not supported right now on Openshift. (Improve Openshift documentation #17867)
  3. [question]: I don't know if we can rely on
ssl.certificate_authorities:
  - /var/run/secrets/kubernetes.io/serviceaccount/service-ca.crt

since it seems to work on minishift out of the box but not on the GCP installation. What would be the best option to have it documented?

@elastic/integrations-platforms would like a review on the above proposal.

jsoriano commented:

> since it seems to work on minishift out of the box but not on the GCP installation. What would be the best option to have it documented?

Is the CA available in another place for GCP installations? I think we could add a comment about this in the manifest, as we have the comment for the service-ca.crt.
I would try to avoid recommending the use of ssl.verification_mode: "none".

ChrsMark commented:

@jsoriano the serviceaccount secrets directory looks like this:

[root@ocp-be-c5kjr-w-b-jhqvn metricbeat]# ls /var/run/secrets/kubernetes.io/serviceaccount/
ca.crt  namespace  service-ca.crt  token

Trying to access the API manually with:

curl -H "Authorization: Bearer $token" \
--cacert  /var/run/secrets/kubernetes.io/serviceaccount/service-ca.crt \
-v https://${NODE_NAME}:10250/stats/summary

I get:

* NSS error -8179 (SEC_ERROR_UNKNOWN_ISSUER)
* Peer's Certificate issuer is not recognized.
* Closing connection 0
curl: (60) Peer's Certificate issuer is not recognized.

jsoriano commented:

@ChrsMark is it possible that the CA is available in another place (not as a Kubernetes secret, but at some GCP endpoint)? Did you try to use the other ca.crt?

Though according to this unanswered question this may be a common issue 🤔 https://access.redhat.com/discussions/4841441

ChrsMark commented:

After digging into it with @jsoriano's help, it turns out that the proper CA to use (specific to the Openshift GCP installer) is the one found in the kubelet-serving-ca configmap:

Name:         kubelet-serving-ca
Namespace:    openshift-kube-apiserver
Labels:       <none>
Annotations:  <none>

Data
====
ca-bundle.crt:

Steps to fix the issue:

  1. kubectl get configmap kubelet-serving-ca --namespace=openshift-kube-apiserver --export -o yaml | kubectl apply --namespace=kube-system -f -
  2. Mount the configmap properly:
     volumeMounts:
       - name: kubelet-ca
         mountPath: /usr/share/metricbeat/kubelet-ca
         readOnly: true
     volumes:
       - name: kubelet-ca
         configMap:
           defaultMode: 0600
           name: kubelet-serving-ca
  3. Make use of the ca-bundle.crt:
     ssl.certificate_authorities:
       - /usr/share/metricbeat/kubelet-ca/ca-bundle.crt

@exekias wondering if we should add it in the docs at https://www.elastic.co/guide/en/beats/metricbeat/master/running-on-kubernetes.html as an extra section/note (since we have this info), and make the comment at https://github.com/elastic/beats/blob/master/deploy/kubernetes/metricbeat-kubernetes.yaml#L83 more flexible, mentioning that the CA can also be available in configmaps depending on the specific installation approach.

ChrsMark commented Apr 24, 2020

Since the testing has been completed and all the relevant content has been updated in the docs and manifests, we can close this. This testing should be streamlined in the future, as tracked in #17962.
