
Certificate verification failed when connecting to kube-apiserver endpoint #930

Open
marcusportmann opened this issue Apr 24, 2020 · 11 comments


@marcusportmann
Contributor

marcusportmann commented Apr 24, 2020

Hello,

I have successfully deployed the postgres-operator. When I apply the minimal-postgres-manifest.yaml file to create the minimal cluster, the logs for the pod fill with the following error messages.

2020-04-23 22:06:16,894 ERROR: ObjectCache.run MaxRetryError("HTTPSConnectionPool(host='10.254.0.1', port=443): Max retries exceeded with url: /api/v1/namespaces/default/pods?labelSelector=application%3Dspilo%2Ccluster-name%3Dacid-minimal-cluster (Caused by SSLError(SSLError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:852)'),))",)
2020-04-23 22:06:16,896 WARNING: Retrying (Retry(total=0, connect=None, read=None, redirect=None, status=None)) after connection broken by 'SSLError(SSLError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:852)'),)': /api/v1/namespaces/default/endpoints?labelSelector=application%3Dspilo%2Ccluster-name%3Dacid-minimal-cluster
2020-04-23 22:06:16,900 ERROR: ObjectCache.run MaxRetryError("HTTPSConnectionPool(host='10.254.0.1', port=443): Max retries exceeded with url: /api/v1/namespaces/default/endpoints?labelSelector=application%3Dspilo%2Ccluster-name%3Dacid-minimal-cluster (Caused by SSLError(SSLError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:852)'),))",)

I believe this is because one of the components, possibly Patroni, is unable to connect to the kube-apiserver endpoint because it cannot verify the API server certificate. I have searched through the documentation but cannot find a way to specify the client or CA certificate to use.

Is there some way to do this?

@FxKu
Member

FxKu commented May 12, 2020

@marcusportmann did you already have a look at using custom TLS certificates? That's a new feature of the operator we released last week.

@khmarochos

Excuse me, may I ask one question regarding a similar issue?

I've got a self-signed root CA and an intermediate CA signed by it. I have a bare-metal K8S cluster which uses the intermediate+root certificate bundle as its own CA, so its kube-apiserver's certificate is signed by my intermediate CA. Everything works fine, but something obstructs Postgres Operator from working. :-(

Here's what I see in the pod's logfile:

2022-05-02 18:43:24,866 WARNING: Retrying (Retry(total=0, connect=None, read=None, redirect=0, status=None)) after connection broken by 'SSLError(SSLError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:852)'),)': /api/v1/namespaces/default/endpoints/kubernetes
2022-05-02 18:43:24,879 ERROR: Failed to get "kubernetes" endpoint from https://10.233.0.1:443: MaxRetryError("HTTPSConnectionPool(host='10.233.0.1', port=443): Max retries exceeded with url: /api/v1/namespaces/default/endpoints/kubernetes (Caused by SSLError(SSLError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:852)'),))",)

Of course, I have read the manual section that describes custom TLS certificates, so I created a secret and referenced it in the manifest of my postgresqls.acid.zalan.do object:

spec:
  tls:
    secretName: postgres-tls
    caFile: ca.crt

So, the pod which is created by the operator seems to be properly configured:

spec:
  containers:
  - env:
    - name: SSL_CERTIFICATE_FILE
      value: /tls/tls.crt
    - name: SSL_PRIVATE_KEY_FILE
      value: /tls/tls.key
    - name: SSL_CA_FILE
      value: /tls/ca.crt

Alas, something seems to be wrong, as Python doesn't trust the kube-apiserver's certificate.

What could it be, what do you think?

Thanks in advance for any suggestions!

@khmarochos

One more remark if you don't mind...

When I exec bash in the container where Patroni runs, I can run curl --cacert /tls/ca.crt https://10.233.0.1:443/ without any difficulty establishing a TLS session.

When I run curl without the --cacert parameter, it doesn't work, but when I point it at the CA certificate (an X.509 bundle of the intermediate and root certificates), curl works fine.

What could be wrong from Patroni's point of view?

@khmarochos

Well, now I see where the problem is...

When one sets the spec.tls parameter in a postgresqls.acid.zalan.do object, it actually doesn't help: this parameter configures a custom certificate used for internal communications, not for communicating with kube-apiserver.

As I understand it, Patroni gets the ca.crt from the Secret referenced by the ServiceAccount, because Kubernetes mounts /var/run/secrets/kubernetes.io/serviceaccount/ca.crt and that file is used by default. The problem is that it contains only a "shortened" version of the bundle (the intermediate CA certificate itself, without the root CA certificate). When Patroni tries to establish a TLS session with kube-apiserver, it fails, because the SSL library in use is strict: it verifies the full chain, and it doesn't trust the issuer unless it also trusts the issuer's certificate's issuer.
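One quick way to check the "shortened bundle" theory is to count the PEM blocks in the mounted file. A minimal sketch (the helper below is purely illustrative, not part of the operator or Patroni):

```python
def count_certs(pem_text: str) -> int:
    """Count the certificates in a PEM bundle.

    A serviceaccount ca.crt that holds only the intermediate CA reports 1;
    a full intermediate+root bundle reports 2 or more.
    """
    return pem_text.count("-----BEGIN CERTIFICATE-----")
```

Inside the pod, feeding it the contents of /var/run/secrets/kubernetes.io/serviceaccount/ca.crt would show 1 in the broken case described above, while the full bundle at /tls/ca.crt would show 2 or more.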

To make sure the problem could be fixed by giving Patroni the full CA bundle, I edited /home/postgres/postgres.yml inside the pod: I added kubernetes.cacert with the value /tls/ca.crt (this file contained the full version of my CA bundle). Then I restarted Patroni and now it works perfectly. It's a pity that editing Patroni's configuration manually is not a real solution... :-)
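For reference, the manual workaround described above corresponds to a Patroni configuration fragment like this (a sketch; the path assumes the full bundle is mounted at /tls/ca.crt as in the manifest above):

```yaml
# /home/postgres/postgres.yml — relevant fragment only, added by hand as a test
kubernetes:
  cacert: /tls/ca.crt   # full intermediate+root bundle instead of the serviceaccount ca.crt
```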

How can I ask postgres-operator to give Patroni the full version of the CA bundle? Is it possible to make postgres-operator put custom data into /var/run/secrets/kubernetes.io/serviceaccount/ca.crt (I understand that it's set by Kubernetes, so this might be impossible)? Or is it possible to make postgres-operator add a kubernetes.cacert entry to /home/postgres/postgres.yml, so that it points at another file instead of /var/run/secrets/kubernetes.io/serviceaccount/ca.crt? In other words, how can I provide Patroni with the full version of my CA bundle?

Thank you all.

@khmarochos

I'm not sure anyone pays attention to all these comments in an old issue, so I opened a new one (#1877). I hope that's OK.

@thangamani-arun

@Melnik13 I suggest you use the additionalVolumes feature for your use case. Note that it is limited to the Postgres/sidecar pods only.
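A sketch of that approach, assuming a secret named minio-ca-certificate holds the CA bundle (names and paths are illustrative; see the operator docs for the exact additionalVolumes semantics):

```yaml
spec:
  additionalVolumes:
    - name: minio-ca
      mountPath: /etc/minio-ca
      targetContainers: []   # which containers receive the mount; see the docs
      volumeSource:
        secret:
          secretName: minio-ca-certificate
```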

@vponnathota

Can anyone help with the following logical backup failure when uploading to MinIO S3?

k logs logical-backup-amz-time-27961560--1-5qnmw -n zalando
IPv4
API Endpoint: https://10.96.0.1:443/api/v1
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 14170 0 14170 0 0 1153k 0 --:--:-- --:--:-- --:--:-- 1257k
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 167 100 167 0 0 15181 0 --:--:-- --:--:-- --:--:-- 15181
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 167 100 167 0 0 27833 0 --:--:-- --:--:-- --:--:-- 27833
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 19760 0 19760 0 0 2756k 0 --:--:-- --:--:-- --:--:-- 2756k

+ '[' s3 == az ']'
+ dump
+ /usr/lib/postgresql/14/bin/pg_dumpall
+ compress
+ pigz
+ upload
+ case $LOGICAL_BACKUP_PROVIDER in
++ estimate_size
++ /usr/lib/postgresql/14/bin/psql -tqAc 'select sum(pg_database_size(datname)::numeric) from pg_database;'
+ aws_upload 9344540
+ declare -r EXPECTED_SIZE=9344540
++ date +%s
+ PATH_TO_BACKUP=s3://amztimedb/spilo/amz-time/9c439c23-7c1d-4dd0-8ed9-7027bde17825/logical_backups/1677693605.sql.gz
+ args=()
+ [[ ! -z 9344540 ]]
+ args+=("--expected-size=$EXPECTED_SIZE")
+ [[ ! -z https://minio.minio-abc.svc.cluster.local:443 ]]
+ args+=("--endpoint-url=$LOGICAL_BACKUP_S3_ENDPOINT")
+ [[ ! -z '' ]]
+ [[ ! -z '' ]]
+ aws s3 cp - s3://amztimedb/spilo/amz-time/9c439c23-7c1d-4dd0-8ed9-7027bde17825/logical_backups/1677693605.sql.gz --expected-size=9344540 --endpoint-url=https://minio.minio-abc.svc.cluster.local:443
pg_dump: warning: there are circular foreign-key constraints on this table:
pg_dump: hypertable
pg_dump: You might not be able to restore the dump without using --disable-triggers or temporarily dropping the constraints.
pg_dump: Consider using a full dump instead of a --data-only dump to avoid this problem.
pg_dump: warning: there are circular foreign-key constraints on this table:
pg_dump: chunk
pg_dump: You might not be able to restore the dump without using --disable-triggers or temporarily dropping the constraints.
pg_dump: Consider using a full dump instead of a --data-only dump to avoid this problem.
pg_dump: warning: there are circular foreign-key constraints on this table:
pg_dump: continuous_agg
pg_dump: You might not be able to restore the dump without using --disable-triggers or temporarily dropping the constraints.
pg_dump: Consider using a full dump instead of a --data-only dump to avoid this problem.
/usr/local/lib/python3.6/dist-packages/OpenSSL/_util.py:6: CryptographyDeprecationWarning: Python 3.6 is no longer supported by the Python core team. Therefore, support for it is deprecated in cryptography. The next release of cryptography (40.0) will be the last to support Python 3.6.
from cryptography.hazmat.bindings.openssl.binding import Binding
upload failed: - to s3://amztimedb/spilo/amz-time/9c439c23-7c1d-4dd0-8ed9-7027bde17825/logical_backups/1677693605.sql.gz SSL validation failed for https://minio.minio-abc.svc.cluster.local:443/amztimedb/spilo/amz-time/9c439c23-7c1d-4dd0-8ed9-7027bde17825/logical_backups/1677693605.sql.gz?uploads [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:852)

@vponnathota

vponnathota commented Mar 1, 2023

dockerImage: ghcr.io/zalando/spilo-15:2.1-p9, using Postgres 14.
There is no issue with the MinIO certificate itself; the MinIO setup works fine.

I have tried the solution below but am still getting the same SSL certificate verification errors.

additionalVolumes:
  - name: minio-ca-certificate
    mountPath: /var/log/minio
    targetContainers: []
    volumeSource:
      secret:
        secretName: minio-ca-certificate

env:
  - name: WALG_S3_CA_CERT_FILE
    value: "/var/log/minio/ca.crt"

@D1StrX

D1StrX commented Mar 2, 2023

Haven't tried it myself, but is the CA of the certificate trusted within the container?
Perhaps a Let's Encrypt certificate brings better luck.

@vponnathota

Haven't tried it myself, but is the CA of the certificate trusted within the container? Perhaps let's encrypt certificate brings better luck

Yes, I believe the CA of the certificate is trusted within the container; if not, can you please share the steps to make it trusted? I don't want to go with untrusted certificates or disable SSL. Can you please share a working example of a scheduled logical backup uploading to a MinIO S3 bucket using the CRD manifest? All my configuration uses the CRD-based approach, not a ConfigMap. I saw a couple of examples for AWS S3 using a ConfigMap, but they didn't help here.

@D1StrX

D1StrX commented Apr 16, 2023

Haven't tried it myself, but is the CA of the certificate trusted within the container? Perhaps let's encrypt certificate brings better luck

yes, I believe CA of the certificate is trusted with in the container. if not can you please share steps how to encrypt the certificate.. I don't want to go with untrusted or disable SSL options...can you please help me here any working example for logical backup scheduled and uploading to minio s3 bucket successfully using CRD manifest...all my config i have used CRD based approach didn't used configmap. I saw couple of examples for AWS with s3 using configmap that didn't helped me here.

Running into the same issue. While the operator-tls secret's public.crt is mounted as the CA into a pod (not specific to Zalando), it doesn't trust the server cert.

Which is kind of weird: the cert chain should be Kubernetes CA cert -> operator-tls -> MinIO tenant cert, and the Kubernetes CA is trusted by default in a pod, if I'm correct. But it isn't one certificate chain.

So I will probably look for another/better solution, perhaps something external like Vault.


6 participants