Retry GetExecBatchResults on ApiExceptions caused by GSB Errors #588

Merged
merged 10 commits into from
Aug 11, 2021

Contributor

azawlocki commented Aug 10, 2021

Fixes #587

Implements the following behaviour for handling ApiExceptions
raised by get_exec_batch_results():

call get_exec_batch_results()
if the call raises ApiException:
    if the activity is terminated:
        raise BatchError("Activity terminated by provider")
    if the exception is due to GSB endpoint not found:
        wait for 3 seconds
        repeat
    else:
        re-raise the exception

When repeating, the maximum number of attempts is 3.
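
The behaviour described above can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not the code merged in this PR: `ApiException` and `BatchError` are simplified stand-ins defined locally, `fetch_results` and `activity_terminated` are hypothetical callables, and the GSB check simply looks for the "endpoint address not found" text in the response body. Re-raising the last exception once the attempts are exhausted is also an assumption.

```python
import asyncio
from typing import Awaitable, Callable, List


class ApiException(Exception):
    """Simplified stand-in for ya_activity.exceptions.ApiException (hypothetical)."""

    def __init__(self, status: int, body: str = "") -> None:
        super().__init__(f"({status}) {body}")
        self.status = status
        self.body = body


class BatchError(Exception):
    """Stand-in for the error raised when the provider terminated the activity."""


def is_gsb_endpoint_not_found(err: ApiException) -> bool:
    # Assumption: the GSB failure is recognised by its message text.
    return err.status == 500 and "endpoint address not found" in err.body.lower()


async def get_results_with_retry(
    fetch_results: Callable[[], Awaitable[List[str]]],
    activity_terminated: Callable[[], Awaitable[bool]],
    max_tries: int = 3,
    retry_delay: float = 3.0,
) -> List[str]:
    """Call fetch_results(), retrying only on "endpoint not found" GSB errors."""
    for attempt in range(1, max_tries + 1):
        try:
            return await fetch_results()
        except ApiException as err:
            if await activity_terminated():
                raise BatchError("Activity terminated by provider") from err
            if not is_gsb_endpoint_not_found(err) or attempt == max_tries:
                raise  # not retryable, or no attempts left
            await asyncio.sleep(retry_delay)
    # Only reached when the loop body never ran.
    raise ValueError("max_tries must be at least 1")
```

Passing the two callables in as parameters keeps the sketch independent of the real `Activity` class, so the retry policy can be exercised in a unit test with a zero `retry_delay`.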

@azawlocki azawlocki requested a review from a team August 10, 2021 10:49
yapapi/rest/activity.py
    raise err
num_tries -= 1
if num_tries:
    _log.debug("Retrying ")
Contributor

It looks like the message was not finished.

Contributor Author

azawlocki commented Aug 10, 2021

Indeed! Thanks, fixed in c6356d9.


while num_tries:
    try:
        results = await self._activity._api.get_exec_batch_results(
Contributor

This conflicts with changes in #548, which should be merged first.

Contributor Author

#548 was a PR to master, and the current PR is to b0.6. When we merge the changes from b0.6 into master, we'll have to reconcile those two sets of changes, but there's no need to worry about that now. I don't think we plan to backport #548 to b0.6, do we?

Contributor

filipgolem commented Aug 10, 2021

A similar PR will be a part of the next yajsapi release. IMO it would be great to backport #548.

Contributor

mfranciszkiewicz commented Aug 10, 2021

In place of 500s I get

[2021-08-10T17:53:49.010+0200 DEBUG yapapi.rest] Cannot query activity state
Traceback (most recent call last):
  File "/home/mf/Projects/yapapi/venv/lib/python3.9/site-packages/yapapi/rest/activity.py", line 210, in _get_results
    results = await self._activity._api.get_exec_batch_results(
  File "/home/mf/Projects/yapapi/venv/lib/python3.9/site-packages/ya_activity/api_client.py", line 205, in __call_api
    raise e
  File "/home/mf/Projects/yapapi/venv/lib/python3.9/site-packages/ya_activity/api_client.py", line 193, in __call_api
    response_data = await self.request(
  File "/home/mf/Projects/yapapi/venv/lib/python3.9/site-packages/ya_activity/rest.py", line 192, in GET
    return await self.request(
  File "/home/mf/Projects/yapapi/venv/lib/python3.9/site-packages/ya_activity/rest.py", line 180, in request
    raise ApiException(http_resp=r)
ya_activity.exceptions.ApiException: (500)
Reason: Internal Server Error
HTTP response headers: <CIMultiDictProxy('Content-Length': '213', 'Content-Type': 'application/json', 'Date': 'Tue, 10 Aug 2021 15:53:48 GMT')>
HTTP response body: {"message":"GSB error: Remote service at `/net/0xb921cb630d07152c73640a175555ed970f5413c4/exeunit/7fcffb7659624af8be0be43eb1c567b5/GetExecBatchResults` error: GSB failure: Bad request: endpoint address not found"}


During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/mf/Projects/yapapi/venv/lib/python3.9/site-packages/yapapi/rest/activity.py", line 187, in _activity_terminated
    state_list = await self._activity.state().state  # type: ignore
AttributeError: 'coroutine' object has no attribute 'state'
[2021-08-10T17:53:49.011+0200 DEBUG yapapi.rest] Destroying activity 7fcffb7659624af8be0be43eb1c567b5 on error:
Traceback (most recent call last):
  File "/home/mf/Projects/yapapi/venv/lib/python3.9/site-packages/yapapi/executor/__init__.py", line 357, in start_worker
    await self._engine.process_batches(agreement.id, act, batch_generator)
  File "/home/mf/Projects/yapapi/venv/lib/python3.9/site-packages/yapapi/engine.py", line 562, in process_batches
    item = await command_generator.athrow(*sys.exc_info())
  File "/home/mf/Projects/yapapi/tests/drone/drone.py", line 48, in worker
    yield ctx.commit(timeout=timedelta(minutes=10))
  File "/home/mf/Projects/yapapi/venv/lib/python3.9/site-packages/yapapi/engine.py", line 556, in process_batches
    results = await get_batch_results()
  File "/home/mf/Projects/yapapi/venv/lib/python3.9/site-packages/yapapi/engine.py", line 537, in get_batch_results
    async for evt_ctx in remote:
  File "/home/mf/Projects/yapapi/venv/lib/python3.9/site-packages/yapapi/rest/activity.py", line 240, in __aiter__
    results = results[last_idx:]
TypeError: 'NoneType' object is not subscriptable
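
The AttributeError in the second traceback comes from operator precedence: attribute access binds more tightly than `await`, so `await self._activity.state().state` looks up `.state` on the coroutine object before anything is awaited. The snippet below reproduces this with hypothetical stand-ins (`FakeActivity` and `FakeState` are illustrative only and are not yapapi classes) and shows the parenthesised form that awaits first.

```python
import asyncio


class FakeState:
    """Hypothetical stand-in for the object returned by Activity.state()."""

    state = ["Terminated", None]


class FakeActivity:
    """Hypothetical stand-in for an activity with an async state() method."""

    async def state(self) -> FakeState:
        return FakeState()


async def broken(activity: FakeActivity):
    # Parsed as `await (activity.state().state)`: the attribute lookup happens
    # on the coroutine object itself, which raises AttributeError.
    return await activity.state().state  # type: ignore


async def fixed(activity: FakeActivity):
    # Await the coroutine first, then read the attribute of its result.
    return (await activity.state()).state


async def main() -> None:
    activity = FakeActivity()
    try:
        await broken(activity)
    except AttributeError as err:
        print(f"broken: {err}")
    print(f"fixed: {await fixed(activity)}")


asyncio.run(main())
```

Python may also emit a "coroutine ... was never awaited" RuntimeWarning for the broken variant, since the coroutine created by `state()` is discarded without being awaited.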

@azawlocki azawlocki merged commit 4bd79db into b0.6 Aug 11, 2021
@azawlocki azawlocki deleted the az/repeat-on-gsb-error branch August 11, 2021 16:38
azawlocki added a commit that referenced this pull request Aug 24, 2021
* Delete unused temp files and close unpublished files in GftpStorageProvider (#543)

* Delete unused temp files and close unpublished URLs in GftpStorageProvider
* Add unit tests for GftpStorageProvider
* Ensure newly created temp files are closed before being used
  If we keep a file open we won't be able to delete it on Windows
* Address code review suggestions
Co-authored-by: Kuba Mazurek <zakaprov@gmail.com>

* Add team mention to goth nightly failure messages (#575)

* Read env var to determine if GftpProvider should call `gftp close` (#573)

* Read env var to determine if GftpProvider should call `gftp close`

* Add info on YAPAPI_USE_GFTP_CLOSE env var to README.md

* Fix upper bound for expiration time in no-offers warning (#582)

* Check gftp version to determine if `gftp close URL` should be used (#583)

* Check gftp version to determine if `gftp close URL` should be used

* Add `# type: ignore` to `import semantic_version`

* Add unit test for various env var/gftp version combinations

* Retry GetExecBatchResults on ApiExceptions caused by GSB Errors (#588)

* Implement re-tries for ApiExceptions caused by GSB Errors

* Endpoint -> endpoint

* Apply fixes after code review

* Fixes after code review: part II

* debug -> warning

* Improve logs when activity is prematurely terminated on the provider

* Formatting

* Raise BatchError when an activity is terminated by the provider

* Add unit tests for PollingBatch behavior when GSB errors occur

Co-authored-by: filipgolem <44880692+filipgolem@users.noreply.github.com>
Co-authored-by: Filip <filip@golem.network>

* Bump version to 0.6.3-alpha.0 (#589)

* Add semantic-version to dependencies, bump version to 0.6.3-alpha.1 (#591)

* Bump version to 0.6.3 (#592)

* Fix stable branch name regex in nightly workflow (#593)

Co-authored-by: Kuba Mazurek <jakub.mazurek@golem.network>
Co-authored-by: filipgolem <44880692+filipgolem@users.noreply.github.com>
Co-authored-by: Filip <filip@golem.network>
@azawlocki azawlocki linked an issue Aug 24, 2021 that may be closed by this pull request
3 participants