Job executor intermittently gets 'Connection reset by peer' from the API server #211

Open
konstan opened this issue Oct 13, 2021 · 0 comments


konstan commented Oct 13, 2021

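From time to time, the job executor gets 'Connection reset by peer' from the API server when it tries to retrieve a job. It puts the job back in the queue and resumes correctly after 30 seconds.

Below are the API server and job executor logs at the time of the error.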

API server:

2021-10-13 10:27:01,496 INFO  - 200 (3 ms) PUT /api/subscription-config [ - ] ?
2021-10-13 10:27:06,589 DEBUG - GET /api/infrastructure-service/fb618879-69ff-4781-879b-aabd74ccb1d3 [ - ] ?
2021-10-13 10:27:06,592 INFO  - 200 (3 ms) GET /api/infrastructure-service/fb618879-69ff-4781-879b-aabd74ccb1d3 [ - ] ?
2021-10-13 10:27:06,628 DEBUG - POST /api/infrastructure-service/fb618879-69ff-4781-879b-aabd74ccb1d3/terminate [ - ] ?
2021-10-13 10:27:06,656 INFO  - Added job/caeecbd0-eb45-4084-97a9-585d42d613c6, zookeeper path /job/entries/entry-050-0000043584.
2021-10-13 10:27:06,713 INFO  - 202 (85 ms) POST /api/infrastructure-service/fb618879-69ff-4781-879b-aabd74ccb1d3/terminate [ - ] ?
2021-10-13 10:27:06,832 DEBUG - PUT /api/infrastructure-service-group [ - ] ?
2021-10-13 10:27:06,836 INFO  - 200 (4 ms) PUT /api/infrastructure-service-group [ - ] ?
2021-10-13 10:27:06,949 DEBUG - PUT /api/infrastructure-service [ - ] ?
2021-10-13 10:27:06,956 INFO  - 200 (7 ms) PUT /api/infrastructure-service [ - ] ?
2021-10-13 10:27:07,855 DEBUG - PUT /api/deployment [ - ] ?

Job executor:

2021-10-13 10:27:06,670 - ERROR - job.py:83 - Fatal error when trying to retrieve job/caeecbd0-eb45-4084-97a9-585d42d613c6! Put it back in queue. Will go back to work after 30s.
2021-10-13 10:27:06,671 - ERROR - job.py:86 - ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/urllib3/connectionpool.py", line 670, in urlopen
    httplib_response = self._make_request(
  File "/usr/local/lib/python3.10/site-packages/urllib3/connectionpool.py", line 426, in _make_request
    six.raise_from(e, None)
  File "<string>", line 3, in raise_from
  File "/usr/local/lib/python3.10/site-packages/urllib3/connectionpool.py", line 421, in _make_request
    httplib_response = conn.getresponse()
  File "/usr/local/lib/python3.10/http/client.py", line 1368, in getresponse
    response.begin()
  File "/usr/local/lib/python3.10/http/client.py", line 317, in begin
    version, status, reason = self._read_status()
  File "/usr/local/lib/python3.10/http/client.py", line 278, in _read_status
    line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
  File "/usr/local/lib/python3.10/socket.py", line 705, in readinto
    return self._sock.recv_into(b)
ConnectionResetError: [Errno 104] Connection reset by peer

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/requests/adapters.py", line 439, in send
    resp = conn.urlopen(
  File "/usr/local/lib/python3.10/site-packages/urllib3/connectionpool.py", line 726, in urlopen
    retries = retries.increment(
  File "/usr/local/lib/python3.10/site-packages/urllib3/util/retry.py", line 410, in increment
    raise six.reraise(type(error), error, _stacktrace)
  File "/usr/local/lib/python3.10/site-packages/urllib3/packages/six.py", line 734, in reraise
    raise value.with_traceback(tb)
  File "/usr/local/lib/python3.10/site-packages/urllib3/connectionpool.py", line 670, in urlopen
    httplib_response = self._make_request(
  File "/usr/local/lib/python3.10/site-packages/urllib3/connectionpool.py", line 426, in _make_request
    six.raise_from(e, None)
  File "<string>", line 3, in raise_from
  File "/usr/local/lib/python3.10/site-packages/urllib3/connectionpool.py", line 421, in _make_request
    httplib_response = conn.getresponse()
  File "/usr/local/lib/python3.10/http/client.py", line 1368, in getresponse
    response.begin()
  File "/usr/local/lib/python3.10/http/client.py", line 317, in begin
    version, status, reason = self._read_status()
  File "/usr/local/lib/python3.10/http/client.py", line 278, in _read_status
    line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
  File "/usr/local/lib/python3.10/socket.py", line 705, in readinto
    return self._sock.recv_into(b)
urllib3.exceptions.ProtocolError: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/nuvla/job/job.py", line 63, in _init
    self.cimi_job = self.get_cimi_job(self.id)
  File "/usr/local/lib/python3.10/site-packages/nuvla/job/job.py", line 121, in get_cimi_job
    return self.api.get(job_uri)
  File "/usr/local/lib/python3.10/site-packages/nuvla/api/api.py", line 451, in get
    resp_json = self._cimi_get(resource_id=resource_id, params=kwargs)
  File "/usr/local/lib/python3.10/site-packages/nuvla/api/api.py", line 426, in _cimi_get
    return self._cimi_request('GET', uri, params=params)
  File "/usr/local/lib/python3.10/site-packages/nuvla/api/api.py", line 398, in _cimi_request
    response = self.session.request(method, endpoint,
  File "/usr/local/lib/python3.10/site-packages/nuvla/api/api.py", line 171, in request
    response = self._request(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/nuvla/api/api.py", line 168, in _request
    return super(SessionStore, self).request(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/requests/sessions.py", line 542, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/local/lib/python3.10/site-packages/requests/sessions.py", line 655, in send
    r = adapter.send(request, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/requests/adapters.py", line 498, in send
    raise ConnectionError(err, request=request)
requests.exceptions.ConnectionError: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))
2021-10-13 10:27:36,707 - INFO - executor.py:65 - Got new job/caeecbd0-eb45-4084-97a9-585d42d613c6.
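The traceback shows the reset happening while `http.client` reads the response status line (`_read_status`). The API server log above also has no GET for `job/caeecbd0-eb45-4084-97a9-585d42d613c6` around 10:27:06,670, which suggests the request never reached the server's handler. One common cause of this pattern, assumed here and not confirmed by these logs, is a pooled keep-alive connection that the server or an intermediate proxy has already closed. By default, `requests`' `HTTPAdapter` does not retry (`max_retries=0`, with read retries disabled), so the reset propagates up to `job.py`, which requeues the job and sleeps for 30 s. A minimal sketch of an alternative, assuming it is safe to repeat the job GET (it is idempotent), is to retry a few times with a short exponential backoff before falling back to the existing requeue path. `retry_call` and its parameters are hypothetical and not part of the Nuvla client; `flaky_get` only simulates the reset.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_call(
    fn: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError,),
    attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(), retrying on the given exception types with exponential backoff.

    Hypothetical helper (not part of the Nuvla client). Only use it for
    idempotent calls, such as the GET that retrieves a job.
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                # Out of retries: re-raise so the caller can requeue the job as it does today.
                raise
            # Back off 0.5 s, 1 s, 2 s, ... (with the default base_delay) before retrying.
            sleep(base_delay * (2 ** attempt))
    raise AssertionError("unreachable")


# Simulation: the first call is reset (as in the log), the second one succeeds.
calls = []


def flaky_get():
    calls.append(1)
    if len(calls) == 1:
        raise ConnectionResetError(104, "Connection reset by peer")
    return {"id": "job/caeecbd0-eb45-4084-97a9-585d42d613c6"}


delays = []  # record the backoff delays instead of actually sleeping
result = retry_call(flaky_get, sleep=delays.append)
```

In the executor this might be wired as `retry_call(lambda: self.api.get(job_uri), retry_on=(requests.exceptions.ConnectionError,))` (hypothetical). The `requests` exception has to be listed explicitly because `requests.exceptions.ConnectionError` does not derive from the built-in `ConnectionError`. Mounting an `HTTPAdapter` whose `max_retries` is a `urllib3.util.retry.Retry` configured to retry reads would be a transport-level variant of the same idea. The injectable `sleep` keeps the helper testable without real delays.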