SSLError: FileNotFoundError: [Errno 2] No such file or directory #1236

Closed
bendavis78 opened this issue Aug 20, 2020 · 21 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments

@bendavis78

bendavis78 commented Aug 20, 2020

I'm getting an intermittent error on my production Django website (running under uWSGI) when the kubernetes client library is called:

Traceback (most recent call last):                                                
  File "/srv/hive/lib/python3.6/site-packages/urllib3/util/ssl_.py", line 353, in ssl_wrap_socket
    context.load_verify_locations(ca_certs, ca_cert_dir, ca_cert_data)           
FileNotFoundError: [Errno 2] No such file or directory 

The strange thing is, I can simply restart my server and the problem goes away, but after a few hours I see the same error again.

I added some extra logging to the load_verify_locations function in urllib3/util/ssl_.py to see what file it's trying to open. The ca_certs argument is passing in a temp file, /tmp/tmp2xng8a6e. The other two arguments are None.

So, why a temp file, and why is it missing? Is there a reason why the temp file would get deleted? I filed an issue with urllib3, but they said the temporary file isn't created on the urllib3 side.
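
The failure mode described above can be reproduced with the standard library alone. This is a minimal sketch, assuming only that the CA path handed to load_verify_locations no longer exists on disk; the helper name load_ca_from_path is hypothetical and is not part of urllib3 or the kubernetes client.

```python
import os
import ssl
import tempfile


def load_ca_from_path(ca_path):
    """Build a client SSLContext and load CA certificates from ca_path."""
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    context.load_verify_locations(cafile=ca_path)
    return context


# Create a temp file, then delete it to mimic something cleaning up /tmp.
fd, missing_path = tempfile.mkstemp()
os.close(fd)
os.remove(missing_path)

try:
    load_ca_from_path(missing_path)
except FileNotFoundError as exc:
    # Same exception type as the first frame of the traceback in this issue.
    print("load_verify_locations failed:", exc)
```

The point of the sketch is that the error comes from the file being gone at connection time, regardless of which library created it.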

I'm using kubernetes python client version 11.0.0. My OpenSSL version is 1.0.2g.

Here's the full exception with traceback:

Aug 20 10:36:05 ERROR django.request: Internal Server Error: /code/projects/test-project/
Traceback (most recent call last):
  File "/srv/hive/lib/python3.6/site-packages/urllib3/util/ssl_.py", line 353, in ssl_wrap_socket
    context.load_verify_locations(ca_certs, ca_cert_dir, ca_cert_data)
FileNotFoundError: [Errno 2] No such file or directory

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/srv/hive/lib/python3.6/site-packages/urllib3/connectionpool.py", line 677, in urlopen
    chunked=chunked,
  File "/srv/hive/lib/python3.6/site-packages/urllib3/connectionpool.py", line 381, in _make_request
    self._validate_conn(conn)
  File "/srv/hive/lib/python3.6/site-packages/urllib3/connectionpool.py", line 976, in _validate_conn
    conn.connect()
  File "/srv/hive/lib/python3.6/site-packages/urllib3/connection.py", line 370, in connect
    ssl_context=context,
  File "/srv/hive/lib/python3.6/site-packages/urllib3/util/ssl_.py", line 355, in ssl_wrap_socket
    raise SSLError(e)
urllib3.exceptions.SSLError: [Errno 2] No such file or directory

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/srv/hive/lib/python3.6/site-packages/django/core/handlers/exception.py", line 34, in inner
    response = get_response(request)
  File "/srv/hive/lib/python3.6/site-packages/django/core/handlers/base.py", line 115, in _get_response
    response = self.process_exception_by_middleware(e, request)
  File "/srv/hive/lib/python3.6/site-packages/django/core/handlers/base.py", line 113, in _get_response
    response = wrapped_callback(request, *callback_args, **callback_kwargs)
  File "/srv/hive/lib/python3.6/site-packages/django/contrib/auth/decorators.py", line 21, in _wrapped_view
    return view_func(request, *args, **kwargs)
  File "/srv/hive/src/hive/ide/views.py", line 67, in ide
    utils.init_workspace(project)
  File "/srv/hive/src/hive/ide/utils.py", line 252, in init_workspace
    get_user_storage_resource(project.user).apply()
  File "/srv/hive/src/hive/ide/utils.py", line 79, in apply
    responses.append(item.apply())
  File "/srv/hive/src/hive/ide/utils.py", line 221, in apply
    if self.exists:
  File "/srv/hive/src/hive/ide/utils.py", line 165, in exists
    self._call('read')
  File "/srv/hive/src/hive/ide/utils.py", line 152, in _call
    return func(**kwargs)
  File "/srv/hive/lib/python3.6/site-packages/kubernetes/client/api/core_v1_api.py", line 18854, in read_namespaced_persistent_volume_claim
    (data) = self.read_namespaced_persistent_volume_claim_with_http_info(name, namespace, **kwargs)  # noqa: E501
  File "/srv/hive/lib/python3.6/site-packages/kubernetes/client/api/core_v1_api.py", line 18945, in read_namespaced_persistent_volume_claim_with_http_info
    collection_formats=collection_formats)
  File "/srv/hive/lib/python3.6/site-packages/kubernetes/client/api_client.py", line 345, in call_api
    _preload_content, _request_timeout)
  File "/srv/hive/lib/python3.6/site-packages/kubernetes/client/api_client.py", line 176, in __call_api
    _request_timeout=_request_timeout)
  File "/srv/hive/lib/python3.6/site-packages/kubernetes/client/api_client.py", line 366, in request
    headers=headers)
  File "/srv/hive/lib/python3.6/site-packages/kubernetes/client/rest.py", line 241, in GET
    query_params=query_params)
  File "/srv/hive/lib/python3.6/site-packages/kubernetes/client/rest.py", line 214, in request
    headers=headers)
  File "/srv/hive/lib/python3.6/site-packages/urllib3/request.py", line 76, in request
    method, url, fields=fields, headers=headers, **urlopen_kw
  File "/srv/hive/lib/python3.6/site-packages/urllib3/request.py", line 97, in request_encode_url
    return self.urlopen(method, url, **extra_kw)
  File "/srv/hive/lib/python3.6/site-packages/urllib3/poolmanager.py", line 336, in urlopen
    response = conn.urlopen(method, u.request_uri, **kw)
  File "/srv/hive/lib/python3.6/site-packages/urllib3/connectionpool.py", line 765, in urlopen
    **response_kw
  File "/srv/hive/lib/python3.6/site-packages/urllib3/connectionpool.py", line 765, in urlopen
    **response_kw
  File "/srv/hive/lib/python3.6/site-packages/urllib3/connectionpool.py", line 765, in urlopen
    **response_kw
  File "/srv/hive/lib/python3.6/site-packages/urllib3/connectionpool.py", line 725, in urlopen
    method, url, error=e, _pool=self, _stacktrace=sys.exc_info()[2]
  File "/srv/hive/lib/python3.6/site-packages/urllib3/util/retry.py", line 439, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='ide-ideresourcegroup-207977-dbdb0faf.hcp.centralus.azmk8s.io', port=443): Max retries exceeded with url: /api/v1/namespaces/ide/persistentvolumeclaims/workspace-storage-5p2lm91r (Caused by SSLError(FileNotFoundError(2, 'No such file or directory'),))
@bendavis78 bendavis78 added the kind/bug Categorizes issue or PR as related to a bug. label Aug 20, 2020
@bendavis78
Author

bendavis78 commented Aug 21, 2020

After digging through the code, I can see that the tempfile is created because certificate-authority-data is defined in my kubeconfig. The only thing I can find that would remove this tempfile is the _cleanup_temp_files function defined in config/kube_config.py. However, this function is only called from atexit. My server app is running under uwsgi, and I'm not sure how it manages processes internally. Either way, it seems that if the process is exiting, something is still referencing the temp file.

In the meantime, I'm changing my kubeconfig to use certificate-authority, which points to a file, instead of certificate-authority-data. We'll see if that works, but this could still pose a problem for other things that rely on config data that is stored in a file.

@roycaihw
Member

cc @yliaog

@yliaog
Contributor

yliaog commented Sep 3, 2020

It looks like you have already root-caused the problem. After changing to certificate-authority, do you still see the issue?

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 2, 2020
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jan 1, 2021
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@k8s-ci-robot
Contributor

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@wb14123

wb14123 commented May 8, 2021

I'm still having this issue. Using certificate-authority is just a workaround, not a fix. Why does it still use the tmp file even after the program is restarted?

/reopen

@k8s-ci-robot
Contributor

@wb14123: You can't reopen an issue/PR unless you authored it or you are a collaborator.

In response to this:

I'm still having this issue. Using certificate-authority is just a workaround, not a fix. Why does it still use the tmp file even after the program is restarted?

/reopen

@rathinikunj

I also saw this problem, and this issue helped me debug why. I had code doing CRUD operations on multiple kubernetes clusters; while that code was still running, I went into the /tmp folder and deleted everything, including all the tmp******* files (e.g. tmp2xng8a6e) that had been created. This led to failures in some CRUD API calls.

I do not know why these files are created, though.

@yliaog
Contributor

yliaog commented Jun 2, 2021

/reopen

@k8s-ci-robot
Contributor

@yliaog: Reopened this issue.

In response to this:

/reopen

@k8s-ci-robot k8s-ci-robot reopened this Jun 2, 2021
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-contributor-experience at kubernetes/community.
/close

@k8s-ci-robot
Contributor

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-contributor-experience at kubernetes/community.
/close

@ba-work

ba-work commented Nov 29, 2021

This should be reopened, this issue makes this lib incompatible with spec-compliant kubeconfigs.

More selfishly, it is incompatible with kubeconfigs that are served from Rancher's cluster API at /v3/clusters/<cluster id>?action=generateKubeconfig since those have the CA data embedded.

@ba-work

ba-work commented Nov 29, 2021

/reopen

@yliaog
Contributor

yliaog commented Nov 29, 2021

/reopen

@ba-work do you have time to send a PR to fix it?

@ba-work

ba-work commented Nov 29, 2021

@yliaog I think this is probably just a case of me misusing the lib or not understanding the scope properly. I had a function to set up a connection to a cluster that returned a kubernetes.client.CoreV1Api(), and once I took that logic out of the function and put it in the caller it worked just fine.
If you have any guidance on this I would welcome it. This is what didn't work (though it did work with insecure-skip-tls-verify set):

import logging
import os

import kubernetes.client
import kubernetes.config
import kubernetes.watch


def kubernetes_config():
    '''
        Sets up kubernetes connection
    '''
    # Treat common truthy strings as "running inside the cluster".
    in_cluster = os.getenv('IN_CLUSTER', 'true').lower() in ('true', '1', 't', 'yes', 'y')
    if in_cluster:
        # Use the pod's service account credentials.
        kubernetes.config.load_incluster_config()
    else:
        kube_config = os.getenv('KUBECONFIG', '/etc/kubernetes/kubeconfig.yaml')
        logging.info('The kubeconfig file has been set to: %s', kube_config)
        kubernetes.config.load_kube_config(kube_config)

    return kubernetes.client.CoreV1Api(), kubernetes.watch.Watch()

Then outside of the function I had something like this:

kubernetes_api, kubernetes_watch = kubernetes_config()

And accessing it like this would cause the above issue.

@yutinghua

yutinghua commented Jan 6, 2022

I have read the source code, and we can solve the problem by converting certificate-authority-data to a certificate-authority file like this:

  1. Decode "certificate-authority-data" into a file:
    echo -n "<certificate-authority-data value>" | base64 -d > your_cert_file

  2. Use the certificate-authority field with your_cert_file.
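
The two steps above can also be scripted. The following is a minimal sketch, assuming the kubeconfig has already been parsed into a Python dict with the usual clusters / cluster layout; the function name externalize_ca_data and the "-ca.crt" file naming are hypothetical choices for illustration, and the CA bytes in the usage lines are a placeholder rather than a real certificate.

```python
import base64
import os
import tempfile


def externalize_ca_data(kubeconfig, cert_dir):
    """Write each cluster's embedded CA to cert_dir and reference it by path."""
    for entry in kubeconfig.get("clusters", []):
        cluster = entry["cluster"]
        encoded = cluster.pop("certificate-authority-data", None)
        if encoded is None:
            continue  # this cluster already points at a file, or has no CA
        # Assumes the cluster name is safe to use as a file name.
        cert_path = os.path.join(cert_dir, entry["name"] + "-ca.crt")
        with open(cert_path, "wb") as handle:
            handle.write(base64.b64decode(encoded))
        cluster["certificate-authority"] = cert_path
    return kubeconfig


# Usage with placeholder data (not a real certificate).
example = {
    "clusters": [
        {
            "name": "demo",
            "cluster": {
                "server": "https://example.invalid",
                "certificate-authority-data": base64.b64encode(
                    b"placeholder PEM bytes"
                ).decode(),
            },
        }
    ]
}
cert_dir = tempfile.mkdtemp()  # in practice, pick a persistent directory
updated = externalize_ca_data(example, cert_dir)
print(updated["clusters"][0]["cluster"]["certificate-authority"])
```

For a real deployment the target directory should live outside /tmp, otherwise the decoded file is exposed to the same cleanup that removes the temp file.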

@PaulFurtado

I have opened a new issue for this with more details: #1782

Ultimately, modern Linux servers run things like tmpwatch that clean old files from /tmp, so the client cannot expect a temp file to live forever.
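
One way an application could defend against this is to re-create the CA file whenever it has gone missing. The sketch below is only an illustration of that idea, not a fix that exists in the kubernetes client; the helper name ensure_ca_file is hypothetical, and it assumes the application keeps the decoded CA bytes in memory.

```python
import os
import tempfile


def ensure_ca_file(path, ca_bytes):
    """Recreate path from ca_bytes if it is missing; return True if rewritten."""
    if os.path.isfile(path):
        return False
    directory = os.path.dirname(path) or "."
    # Write to a sibling temp file first, then rename it into place so that
    # readers never observe a partially written certificate.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as handle:
        handle.write(ca_bytes)
    os.replace(tmp_path, path)
    return True


# Usage: the first call creates the file, the second finds it in place,
# and a call after an external deletion restores it.
workdir = tempfile.mkdtemp()
ca_file = os.path.join(workdir, "ca.crt")
first = ensure_ca_file(ca_file, b"placeholder CA bytes")
second = ensure_ca_file(ca_file, b"placeholder CA bytes")
os.remove(ca_file)  # simulate tmpwatch removing the file
third = ensure_ca_file(ca_file, b"placeholder CA bytes")
print(first, second, third)  # prints: True False True
```

Calling such a helper before each request narrows the window in which a deleted file causes a failure, but it does not remove it; pointing the kubeconfig at a persistent file remains the simpler option.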

@lyphilip

I have read the source code, and we can solve the problem by converting certificate-authority-data to a certificate-authority file like this:

  1. Decode "certificate-authority-data" into a file:
    echo -n "<certificate-authority-data value>" | base64 -d > your_cert_file
  2. Use the certificate-authority field with your_cert_file.

hello from the other side
