test_kubeflow_workloads fails after "Waiting for Job test-kubeflow/test-kubeflow to complete (status == active)" for about 40 minutes #48

Closed
orfeas-k opened this issue Nov 14, 2023 · 1 comment

Running driver/test_kubeflow_workloads.py::test_kubeflow_workloads on a self-hosted runner fails after waiting for about 40 minutes:

INFO     utils:utils.py:76 Waiting for Job test-kubeflow/test-kubeflow to complete (status == active)
INFO     utils:utils.py:40 Retrying in 32 seconds (attempts: 81)
INFO     httpx:_client.py:1013 HTTP Request: GET https://10.38.217.2:16443/apis/batch/v1/namespaces/test-kubeflow/jobs/test-kubeflow "HTTP/1.1 200 OK"
INFO     test_kubeflow_workloads:test_kubeflow_workloads.py:123 Fetching Job logs...
WARNING: Retrying (Retry(total=4, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<pip._vendor.urllib3.connection.HTTPSConnection object at 0x7fbe6d1fda30>: Failed to establish a new connection: [Errno 101] Network is unreachable')': /simple/pytest/
WARNING: Retrying (Retry(total=3, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<pip._vendor.urllib3.connection.HTTPSConnection object at 0x7fbe6d1fdd30>: Failed to establish a new connection: [Errno 101] Network is unreachable')': /simple/pytest/
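
The "about 40 minutes" figure is consistent with the retry log above. As a rough check, and assuming (the issue does not confirm this) that the wait between attempts doubles from 1 second and is capped at 32 seconds, the total time spent sleeping over 81 attempts can be estimated as below. The function name and its defaults are hypothetical.

```python
def total_wait_seconds(attempts: int, base: float = 1.0, cap: float = 32.0) -> float:
    """Sum the sleeps of a capped exponential backoff (assumed schedule)."""
    # Wait before retry i is base * 2**i, but never more than `cap`.
    return sum(min(base * 2 ** i, cap) for i in range(attempts))


# 81 attempts, as reported in the "Retrying in 32 seconds (attempts: 81)" line.
minutes = total_wait_seconds(81) / 60
print(f"roughly {minutes:.0f} minutes spent waiting")
```

This counts only the sleeps between attempts, not the time taken by the HTTP requests themselves, so it is an estimate rather than an exact reconstruction of the run.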

We also see the following during teardown, although this shouldn't influence the test's execution:

  File "/home/ubuntu/charmed-kubeflow-uats/driver/utils.py", line 91, in delete_job
    client.delete(Job, name=job_name, namespace=namespace)
AttributeError: module 'test_kubeflow_workloads' has no attribute 'delete'
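
The message indicates that the object bound to `client` inside delete_job was the test_kubeflow_workloads module rather than a Kubernetes client. A minimal sketch that reproduces the same message, using a stand-in module object and a simplified delete_job (both are assumptions for illustration; how the module ended up being passed in the real teardown is not shown in this issue):

```python
import types

# Stand-in for the test module; a real client object would expose .delete().
fake_client = types.ModuleType("test_kubeflow_workloads")


def delete_job(client, job_name, namespace):
    # Same call shape as in the traceback; the Job resource class is
    # replaced by a plain string so the sketch has no dependencies.
    client.delete("Job", name=job_name, namespace=namespace)


try:
    delete_job(fake_client, "test-kubeflow", "test-kubeflow")
except AttributeError as exc:
    message = str(exc)
    print(message)
```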

Environment

  • Self-hosted runner
  • MicroK8s 1.24
  • Juju 2.9
  • CKF 1.7/stable

Logs

=================================== FAILURES ===================================
___________________________ test_kubeflow_workloads ____________________________
Traceback (most recent call last):
  File "/home/ubuntu/charmed-kubeflow-uats/driver/test_kubeflow_workloads.py", line 116, in test_kubeflow_workloads
    wait_for_job(lightkube_client, JOB_NAME, NAMESPACE)
  File "/home/ubuntu/charmed-kubeflow-uats/.tox/kubeflow/lib/python3.10/site-packages/tenacity/__init__.py", line 289, in wrapped_f
    return self(f, *args, **kw)
  File "/home/ubuntu/charmed-kubeflow-uats/.tox/kubeflow/lib/python3.10/site-packages/tenacity/__init__.py", line 379, in __call__
    do = self.iter(retry_state=retry_state)
  File "/home/ubuntu/charmed-kubeflow-uats/.tox/kubeflow/lib/python3.10/site-packages/tenacity/__init__.py", line 314, in iter
    return fut.result()
  File "/usr/lib/python3.10/concurrent/futures/_base.py", line 451, in result
    return self.__get_result()
  File "/usr/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
  File "/home/ubuntu/charmed-kubeflow-uats/.tox/kubeflow/lib/python3.10/site-packages/tenacity/__init__.py", line 382, in __call__
    result = fn(*args, **kwargs)
  File "/home/ubuntu/charmed-kubeflow-uats/driver/utils.py", line 72, in wait_for_job
    raise ValueError(f"Job {namespace}/{job_name} failed!")
ValueError: Job test-kubeflow/test-kubeflow failed!

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ubuntu/charmed-kubeflow-uats/.tox/kubeflow/lib/python3.10/site-packages/_pytest/runner.py", line 341, in from_call
    result: Optional[TResult] = func()
  File "/home/ubuntu/charmed-kubeflow-uats/.tox/kubeflow/lib/python3.10/site-packages/_pytest/runner.py", line 262, in <lambda>
    lambda: ihook(item=item, **kwds), when=when, reraise=reraise
  File "/home/ubuntu/charmed-kubeflow-uats/.tox/kubeflow/lib/python3.10/site-packages/pluggy/_hooks.py", line 493, in __call__
    return self._hookexec(self.name, self._hookimpls, kwargs, firstresult)
  File "/home/ubuntu/charmed-kubeflow-uats/.tox/kubeflow/lib/python3.10/site-packages/pluggy/_manager.py", line 115, in _hookexec
    return self._inner_hookexec(hook_name, methods, kwargs, firstresult)
  File "/home/ubuntu/charmed-kubeflow-uats/.tox/kubeflow/lib/python3.10/site-packages/pluggy/_callers.py", line 152, in _multicall
    return outcome.get_result()
  File "/home/ubuntu/charmed-kubeflow-uats/.tox/kubeflow/lib/python3.10/site-packages/pluggy/_result.py", line 114, in get_result
    raise exc.with_traceback(exc.__traceback__)
  File "/home/ubuntu/charmed-kubeflow-uats/.tox/kubeflow/lib/python3.10/site-packages/pluggy/_callers.py", line 77, in _multicall
    res = hook_impl.function(*args)
  File "/home/ubuntu/charmed-kubeflow-uats/.tox/kubeflow/lib/python3.10/site-packages/_pytest/runner.py", line 169, in pytest_runtest_call
    item.runtest()
  File "/home/ubuntu/charmed-kubeflow-uats/.tox/kubeflow/lib/python3.10/site-packages/_pytest/python.py", line 1792, in runtest
    self.ihook.pytest_pyfunc_call(pyfuncitem=self)
  File "/home/ubuntu/charmed-kubeflow-uats/.tox/kubeflow/lib/python3.10/site-packages/pluggy/_hooks.py", line 493, in __call__
    return self._hookexec(self.name, self._hookimpls, kwargs, firstresult)
  File "/home/ubuntu/charmed-kubeflow-uats/.tox/kubeflow/lib/python3.10/site-packages/pluggy/_manager.py", line 115, in _hookexec
    return self._inner_hookexec(hook_name, methods, kwargs, firstresult)
  File "/home/ubuntu/charmed-kubeflow-uats/.tox/kubeflow/lib/python3.10/site-packages/pluggy/_callers.py", line 152, in _multicall
    return outcome.get_result()
  File "/home/ubuntu/charmed-kubeflow-uats/.tox/kubeflow/lib/python3.10/site-packages/pluggy/_result.py", line 114, in get_result
    raise exc.with_traceback(exc.__traceback__)
  File "/home/ubuntu/charmed-kubeflow-uats/.tox/kubeflow/lib/python3.10/site-packages/pluggy/_callers.py", line 77, in _multicall
    res = hook_impl.function(*args)
  File "/home/ubuntu/charmed-kubeflow-uats/.tox/kubeflow/lib/python3.10/site-packages/_pytest/python.py", line 194, in pytest_pyfunc_call
    result = testfunction(**testargs)
  File "/home/ubuntu/charmed-kubeflow-uats/driver/test_kubeflow_workloads.py", line 118, in test_kubeflow_workloads
    pytest.fail(
  File "/home/ubuntu/charmed-kubeflow-uats/.tox/kubeflow/lib/python3.10/site-packages/_pytest/outcomes.py", line 198, in fail
    raise Failed(msg=reason, pytrace=pytrace)
Failed: Something went wrong while running Job test-kubeflow/test-kubeflow. Please inspect the attached logs for more info...
@orfeas-k

Closing this since it was a misunderstanding on my side. test_kubeflow_workloads fails only because the UAT notebooks run by the Job it spins up also fail. No fix is required in test_kubeflow_workloads itself.
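
To illustrate why a failing Job surfaces as a test failure, here is a minimal sketch of a status check in the spirit of wait_for_job. It assumes the Job status is available as a plain dict carrying the Kubernetes Job status counters `succeeded`, `failed` and `active`; the helper names and the order of the checks are hypothetical and not taken from driver/utils.py.

```python
def classify_job(status: dict) -> str:
    """Map Kubernetes-style Job status counters to a coarse state."""
    if status.get("succeeded"):
        return "succeeded"
    if status.get("failed"):
        return "failed"
    if status.get("active"):
        return "active"
    return "unknown"


def check_job(status: dict, namespace: str, job_name: str) -> bool:
    """Return True when the Job is done, False to keep waiting, raise on failure."""
    state = classify_job(status)
    if state == "failed":
        # Same message shape as the ValueError in the traceback above.
        raise ValueError(f"Job {namespace}/{job_name} failed!")
    return state == "succeeded"


# A Job whose pods are still running keeps the caller waiting.
still_running = check_job({"active": 1}, "test-kubeflow", "test-kubeflow")
```

Under this sketch the test itself behaves correctly: it only reports what the Job, and therefore the notebooks it runs, ended up doing.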
