Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a test which provokes abort-during-read during 'run_in_transaction'. #3663

Merged
merged 2 commits into from
Aug 10, 2017

Conversation

tseaver
Copy link
Contributor

@tseaver tseaver commented Jul 24, 2017

Uses #3615 as a base. ca441f7 is the only change from that PR.

@tseaver tseaver added api: spanner Issues related to the Spanner API. testing labels Jul 24, 2017
@googlebot googlebot added the cla: yes This human has signed the Contributor License Agreement. label Jul 24, 2017
@tseaver tseaver added the do not merge Indicates a pull request not ready for merge, due to either quality or timing. label Jul 24, 2017
@tseaver
Copy link
Contributor Author

tseaver commented Jul 24, 2017

This test can deadlock: I got interrupted while debugging and blew my stack. :(

@tseaver
Copy link
Contributor Author

tseaver commented Jul 25, 2017

@dhermes, @bjwatson, @lukesneeringer OK, I have the test passing, but to do so I have to straddle the fact that the GAPIC / gRPC bits sometimes raise a _Rendezvous w/ status code ABORTED, and sometimes a GaxError with such a _Rendezvous as their cause. I think the APIs should raise one or the other, not a mix-and-match.

Specifics:

  • ABORTED returned from the server during a Read API call is raised as a _Rendezvous.
    ABORTEDreturned from the server during aCommitAPI call is raised as aGaxError`.

My hack to get the read-abort bit passing:

@@ -312,7 +334,12 @@ def _delay_until_retry(exc, deadline):
     :type deadline: float
     :param deadline: maximum timestamp to continue retrying the transaction.
     """
-    if exc_to_code(exc.cause) != StatusCode.ABORTED:
+    if isinstance(exc, GrpcRendezvous):
+        cause = exc
+    else:
+        cause = exc.cause
+
+    if exc_to_code(cause) != StatusCode.ABORTED:
         raise
 
     now = time.time()
@@ -320,7 +347,7 @@ def _delay_until_retry(exc, deadline):
     if now >= deadline:
         raise
 
-    delay = _get_retry_delay(exc)
+    delay = _get_retry_delay(cause)
     if delay is not None:
 
         if now + delay > deadline:
@@ -330,7 +357,7 @@ def _delay_until_retry(exc, deadline):
 # pylint: enable=misplaced-bare-raise
 
 
-def _get_retry_delay(exc):
+def _get_retry_delay(cause):
     """Helper for :func:`_delay_until_retry`.
 
     :type exc: :class:`google.gax.errors.GaxError`
@@ -339,7 +366,7 @@ def _get_retry_delay(exc):
     :rtype: float
     :returns: seconds to wait before retrying the transaction.
     """
-    metadata = dict(exc.cause.trailing_metadata())
+    metadata = dict(cause.trailing_metadata())
     retry_info_pb = metadata.get('google.rpc.retryinfo-bin')
     if retry_info_pb is not None:
         retry_info = RetryInfo()

@tseaver
Copy link
Contributor Author

tseaver commented Jul 25, 2017

Turns out my issue is the same as #3562.

@tseaver tseaver force-pushed the spanner-systest-txn_read_abort branch from ca441f7 to f916edc Compare July 28, 2017 19:40
@tseaver tseaver added the status: blocked Resolving the issue is dependent on other work. label Jul 28, 2017
@tseaver
Copy link
Contributor Author

tseaver commented Jul 28, 2017

I've rebased, fixed the deadlock / hang, and added a separate commit (f916edc) with my workaround for #3562 so that I can verify the tests pass on Circle. I propose waiting for a real fix and then removing that commit.

@tseaver
Copy link
Contributor Author

tseaver commented Jul 28, 2017

The test failure is for coverage of branches added in the to-be-backed-out commit (f916edc).

@theacodes
Copy link
Contributor

@tseaver once #3738 is in, you can use google.api.core.exceptions.from_grpc_exception to map the grpc.RpcError to a Google API exception. You shouldn't need to catch _Rendezvous directly, instead, catch grpc.RpcError or grpc.Call (if you need the metadata directly).

@bjwatson bjwatson added release blocking Required feature/issue must be fixed prior to next release. priority: p1 Important issue which blocks shipping the next release. Will be fixed prior to next release. and removed status: blocked Resolving the issue is dependent on other work. labels Aug 7, 2017
@bjwatson
Copy link

bjwatson commented Aug 8, 2017

@tseaver I removed the blocked label since it looks like @jonparrott unblocked this.

@tseaver
Copy link
Contributor Author

tseaver commented Aug 8, 2017

@jonparrott Note that for the ABORT error, I'm not propagating the error, but using it to trigger a retry (I do need access to the trailing metadata to pick out the retry interval).

@bjwatson Am I still supposed to be catching GaxError as well as grpc.Call here?

@theacodes
Copy link
Contributor

@tseaver yes, in this case catch GaxError and grpc.Call.

@tseaver
Copy link
Contributor Author

tseaver commented Aug 8, 2017

@jonparrott Why are we not fixing the streaming iterator stuff to return only one kind of error?

@theacodes
Copy link
Contributor

@tseaver we are, but it's O(weeks) away, and changing this PR to catch grpc.Call instead of GrpcRendezvous is a fine compromise for now- in the near term, it'll unblock spanner. Once google.api.core.grpc exists and is used by gapic, it will not raise this error which will mean a small change will need to be made here (simplifying this to just catch a google.api.core.exception subclass).

@tseaver
Copy link
Contributor Author

tseaver commented Aug 9, 2017

@jonparrott Note that one cannot catch grpc.Call (it doesn't derive from BaseException).

Given that the change is intended to be temporary, I will merge with a # pragma: NO COVER on the branch for if isinstance(exc, GrpcRendezvous), and add an issue to remove that once the underling normalization fix has rolled out.

@tseaver tseaver force-pushed the spanner-systest-txn_read_abort branch from f916edc to cf48b5e Compare August 9, 2017 22:16
@tseaver tseaver merged commit 1fcc1a4 into master Aug 10, 2017
@tseaver tseaver deleted the spanner-systest-txn_read_abort branch August 10, 2017 01:19
landrito pushed a commit to landrito/google-cloud-python that referenced this pull request Aug 21, 2017
landrito pushed a commit to landrito/google-cloud-python that referenced this pull request Aug 22, 2017
landrito pushed a commit to landrito/google-cloud-python that referenced this pull request Aug 22, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: spanner Issues related to the Spanner API. cla: yes This human has signed the Contributor License Agreement. do not merge Indicates a pull request not ready for merge, due to either quality or timing. priority: p1 Important issue which blocks shipping the next release. Will be fixed prior to next release. release blocking Required feature/issue must be fixed prior to next release. testing
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants