Bigtable: 'test_create_instance_with_two_clusters' flakes modifying profile. #7900

tseaver · 2019-05-08T17:54:40Z

Similar to #5928, but the failure occurs while re-modifying the instance's app profile.

___________ TestInstanceAdminAPI.test_create_instance_w_two_clusters ___________
target = functools.partial(<bound method PollingFuture._done_or_raise of <google.api_core.operation.Operation object at 0x7f7c280cee80>>)
predicate = <function if_exception_type.<locals>.if_exception_type_predicate at 0x7f7c299b70d0>
sleep_generator = <generator object exponential_sleep_generator at 0x7f7c297d3a98>
deadline = 10, on_error = None
    def retry_target(target, predicate, sleep_generator, deadline, on_error=None):
        """Call a function and retry if it fails.
        This is the lowest-level retry helper. Generally, you'll use the
        higher-level retry helper :class:`Retry`.
        Args:
            target(Callable): The function to call and retry. This must be a
                nullary function - apply arguments with `functools.partial`.
            predicate (Callable[Exception]): A callable used to determine if an
                exception raised by the target should be considered retryable.
                It should return True to retry or False otherwise.
            sleep_generator (Iterable[float]): An infinite iterator that determines
                how long to sleep between retries.
            deadline (float): How long to keep retrying the target.
            on_error (Callable): A function to call while processing a retryable
                exception.  Any error raised by this function will *not* be
                caught.
        Returns:
            Any: the return value of the target function.
        Raises:
            google.api_core.RetryError: If the deadline is exceeded while retrying.
            ValueError: If the sleep generator stops yielding values.
            Exception: If the target raises a method that isn't retryable.
        """
        if deadline is not None:
            deadline_datetime = datetime_helpers.utcnow() + datetime.timedelta(
                seconds=deadline
            )
        else:
            deadline_datetime = None
        last_exc = None
        for sleep in sleep_generator:
            try:
>               return target()
../api_core/google/api_core/retry.py:179:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <google.api_core.operation.Operation object at 0x7f7c280cee80>
    def _done_or_raise(self):
        """Check if the future is done and raise if it's not."""
        if not self.done():
>           raise _OperationNotComplete()
E           google.api_core.future.polling._OperationNotComplete
../api_core/google/api_core/future/polling.py:81: _OperationNotComplete
The above exception was the direct cause of the following exception:
self = <google.api_core.operation.Operation object at 0x7f7c280cee80>
timeout = 10
    def _blocking_poll(self, timeout=None):
        """Poll and wait for the Future to be resolved.
        Args:
            timeout (int):
                How long (in seconds) to wait for the operation to complete.
                If None, wait indefinitely.
        """
        if self._result_set:
            return
        retry_ = self._retry.with_deadline(timeout)
        try:
>           retry_(self._done_or_raise)()
../api_core/google/api_core/future/polling.py:101:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
args = (), kwargs = {}
target = functools.partial(<bound method PollingFuture._done_or_raise of <google.api_core.operation.Operation object at 0x7f7c280cee80>>)
sleep_generator = <generator object exponential_sleep_generator at 0x7f7c297d3a98>
    @general_helpers.wraps(func)
    def retry_wrapped_func(*args, **kwargs):
        """A wrapper that calls target function with retry."""
        target = functools.partial(func, *args, **kwargs)
        sleep_generator = exponential_sleep_generator(
            self._initial, self._maximum, multiplier=self._multiplier
        )
        return retry_target(
            target,
            self._predicate,
            sleep_generator,
            self._deadline,
>           on_error=on_error,
        )
../api_core/google/api_core/retry.py:270:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
target = functools.partial(<bound method PollingFuture._done_or_raise of <google.api_core.operation.Operation object at 0x7f7c280cee80>>)
predicate = <function if_exception_type.<locals>.if_exception_type_predicate at 0x7f7c299b70d0>
sleep_generator = <generator object exponential_sleep_generator at 0x7f7c297d3a98>
deadline = 10, on_error = None
    def retry_target(target, predicate, sleep_generator, deadline, on_error=None):
        """Call a function and retry if it fails.
        This is the lowest-level retry helper. Generally, you'll use the
        higher-level retry helper :class:`Retry`.
        Args:
            target(Callable): The function to call and retry. This must be a
                nullary function - apply arguments with `functools.partial`.
            predicate (Callable[Exception]): A callable used to determine if an
                exception raised by the target should be considered retryable.
                It should return True to retry or False otherwise.
            sleep_generator (Iterable[float]): An infinite iterator that determines
                how long to sleep between retries.
            deadline (float): How long to keep retrying the target.
            on_error (Callable): A function to call while processing a retryable
                exception.  Any error raised by this function will *not* be
                caught.
        Returns:
            Any: the return value of the target function.
        Raises:
            google.api_core.RetryError: If the deadline is exceeded while retrying.
            ValueError: If the sleep generator stops yielding values.
            Exception: If the target raises a method that isn't retryable.
        """
        if deadline is not None:
            deadline_datetime = datetime_helpers.utcnow() + datetime.timedelta(
                seconds=deadline
            )
        else:
            deadline_datetime = None
        last_exc = None
        for sleep in sleep_generator:
            try:
                return target()
            # pylint: disable=broad-except
            # This function explicitly must deal with broad exceptions.
            except Exception as exc:
                if not predicate(exc):
                    raise
                last_exc = exc
                if on_error is not None:
                    on_error(exc)
            now = datetime_helpers.utcnow()
            if deadline_datetime is not None and deadline_datetime < now:
                six.raise_from(
                    exceptions.RetryError(
                        "Deadline of {:.1f}s exceeded while calling {}".format(
                            deadline, target
                        ),
                        last_exc,
                    ),
>                   last_exc,
                )
../api_core/google/api_core/retry.py:199:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
value = None, from_value = _OperationNotComplete()
>   ???
E   google.api_core.exceptions.RetryError: Deadline of 10.0s exceeded while calling functools.partial(<bound method PollingFuture._done_or_raise of <google.api_core.operation.Operation object at 0x7f7c280cee80>>), last exception:
<string>:3: RetryError
During handling of the above exception, another exception occurred:
self = <tests.system.TestInstanceAdminAPI testMethod=test_create_instance_w_two_clusters>
    def test_create_instance_w_two_clusters(self):
        from google.cloud.bigtable import enums
        from google.cloud.bigtable.table import ClusterState
        _PRODUCTION = enums.Instance.Type.PRODUCTION
        ALT_INSTANCE_ID = "dif" + unique_resource_id("-")
        instance = Config.CLIENT.instance(
            ALT_INSTANCE_ID, instance_type=_PRODUCTION, labels=LABELS
        )
        ALT_CLUSTER_ID_1 = ALT_INSTANCE_ID + "-c1"
        ALT_CLUSTER_ID_2 = ALT_INSTANCE_ID + "-c2"
        LOCATION_ID_2 = "us-central1-f"
        STORAGE_TYPE = enums.StorageType.HDD
        cluster_1 = instance.cluster(
            ALT_CLUSTER_ID_1,
            location_id=LOCATION_ID,
            serve_nodes=SERVE_NODES,
            default_storage_type=STORAGE_TYPE,
        )
        cluster_2 = instance.cluster(
            ALT_CLUSTER_ID_2,
            location_id=LOCATION_ID_2,
            serve_nodes=SERVE_NODES,
            default_storage_type=STORAGE_TYPE,
        )
        operation = instance.create(clusters=[cluster_1, cluster_2])
        # Make sure this instance gets deleted after the test case.
        self.instances_to_delete.append(instance)
        # We want to make sure the operation completes.
        operation.result(timeout=10)
        # Create a new instance instance and make sure it is the same.
        instance_alt = Config.CLIENT.instance(ALT_INSTANCE_ID)
        instance_alt.reload()
        self.assertEqual(instance, instance_alt)
        self.assertEqual(instance.display_name, instance_alt.display_name)
        self.assertEqual(instance.type_, instance_alt.type_)
        clusters, failed_locations = instance_alt.list_clusters()
        self.assertEqual(failed_locations, [])
        clusters.sort(key=lambda x: x.name)
        alt_cluster_1, alt_cluster_2 = clusters
        self.assertEqual(cluster_1.location_id, alt_cluster_1.location_id)
        self.assertEqual(alt_cluster_1.state, enums.Cluster.State.READY)
        self.assertEqual(cluster_1.serve_nodes, alt_cluster_1.serve_nodes)
        self.assertEqual(
            cluster_1.default_storage_type, alt_cluster_1.default_storage_type
        )
        self.assertEqual(cluster_2.location_id, alt_cluster_2.location_id)
        self.assertEqual(alt_cluster_2.state, enums.Cluster.State.READY)
        self.assertEqual(cluster_2.serve_nodes, alt_cluster_2.serve_nodes)
        self.assertEqual(
            cluster_2.default_storage_type, alt_cluster_2.default_storage_type
        )
        # Test list clusters in project via 'client.list_clusters'
        clusters, failed_locations = Config.CLIENT.list_clusters()
        self.assertFalse(failed_locations)
        found = set([cluster.name for cluster in clusters])
        self.assertTrue(
            {alt_cluster_1.name, alt_cluster_2.name, Config.CLUSTER.name}.issubset(
                found
            )
        )
        temp_table_id = "test-get-cluster-states"
        temp_table = instance.table(temp_table_id)
        temp_table.create()
        result = temp_table.get_cluster_states()
        ReplicationState = enums.Table.ReplicationState
        expected_results = [
            ClusterState(ReplicationState.STATE_NOT_KNOWN),
            ClusterState(ReplicationState.INITIALIZING),
            ClusterState(ReplicationState.PLANNED_MAINTENANCE),
            ClusterState(ReplicationState.UNPLANNED_MAINTENANCE),
            ClusterState(ReplicationState.READY),
        ]
        cluster_id_list = result.keys()
        self.assertEqual(len(cluster_id_list), 2)
        self.assertIn(ALT_CLUSTER_ID_1, cluster_id_list)
        self.assertIn(ALT_CLUSTER_ID_2, cluster_id_list)
        for clusterstate in result.values():
            self.assertIn(clusterstate, expected_results)
        # Test create app profile with multi_cluster_routing policy
        app_profiles_to_delete = []
        description = "routing policy-multy"
        app_profile_id_1 = "app_profile_id_1"
        routing = enums.RoutingPolicyType.ANY
        self._test_create_app_profile_helper(
            app_profile_id_1,
            instance,
            routing_policy_type=routing,
            description=description,
            ignore_warnings=True,
        )
        app_profiles_to_delete.append(app_profile_id_1)
        # Test list app profiles
        self._test_list_app_profiles_helper(instance, [app_profile_id_1])
        # Test modify app profile app_profile_id_1
        # routing policy to single cluster policy,
        # cluster -> ALT_CLUSTER_ID_1,
        # allow_transactional_writes -> disallowed
        # modify description
        description = "to routing policy-single"
        routing = enums.RoutingPolicyType.SINGLE
        self._test_modify_app_profile_helper(
            app_profile_id_1,
            instance,
            routing_policy_type=routing,
            description=description,
            cluster_id=ALT_CLUSTER_ID_1,
            allow_transactional_writes=False,
        )
        # Test modify app profile app_profile_id_1
        # cluster -> ALT_CLUSTER_ID_2,
        # allow_transactional_writes -> allowed
        self._test_modify_app_profile_helper(
            app_profile_id_1,
            instance,
            routing_policy_type=routing,
            description=description,
            cluster_id=ALT_CLUSTER_ID_2,
            allow_transactional_writes=True,
            ignore_warnings=True,
        )
        # Test create app profile with single cluster routing policy
        description = "routing policy-single"
        app_profile_id_2 = "app_profile_id_2"
        routing = enums.RoutingPolicyType.SINGLE
        self._test_create_app_profile_helper(
            app_profile_id_2,
            instance,
            routing_policy_type=routing,
            description=description,
            cluster_id=ALT_CLUSTER_ID_2,
            allow_transactional_writes=False,
        )
        app_profiles_to_delete.append(app_profile_id_2)
        # Test list app profiles
        self._test_list_app_profiles_helper(
            instance, [app_profile_id_1, app_profile_id_2]
        )
        # Test modify app profile app_profile_id_2 to
        # allow transactional writes
        # Note: no need to set ``ignore_warnings`` to True
        # since we are not restrictings anything with this modification.
        self._test_modify_app_profile_helper(
            app_profile_id_2,
            instance,
            routing_policy_type=routing,
            description=description,
            cluster_id=ALT_CLUSTER_ID_2,
>           allow_transactional_writes=True,
        )
tests/system.py:409:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
tests/system.py:613: in _test_modify_app_profile_helper
    operation.result(timeout=10)
../api_core/google/api_core/future/polling.py:122: in result
    self._blocking_poll(timeout=timeout)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <google.api_core.operation.Operation object at 0x7f7c280cee80>
timeout = 10
    def _blocking_poll(self, timeout=None):
        """Poll and wait for the Future to be resolved.
        Args:
            timeout (int):
                How long (in seconds) to wait for the operation to complete.
                If None, wait indefinitely.
        """
        if self._result_set:
            return
        retry_ = self._retry.with_deadline(timeout)
        try:
            retry_(self._done_or_raise)()
        except exceptions.RetryError:
            raise concurrent.futures.TimeoutError(
>               "Operation did not complete within the designated " "timeout."
            )
E           concurrent.futures._base.TimeoutError: Operation did not complete within the designated timeout.
../api_core/google/api_core/future/polling.py:104: TimeoutError
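
For readers skimming the traceback: the failure is a deadline expiring while the operation is still being polled, not an error returned by the operation itself. Below is a minimal, standard-library-only sketch of that polling pattern. `FakeOperation` and `wait_for` are hypothetical names made up for illustration, and the loop is a simplification of what `retry_target` does, not the `api_core` implementation.

```python
import time
from concurrent.futures import TimeoutError as FutureTimeoutError


class FakeOperation:
    """Hypothetical stand-in for a long-running operation (not the api_core class)."""

    def __init__(self, completes_after):
        # The operation reports done once this many seconds have elapsed.
        self._ready_at = time.monotonic() + completes_after

    def done(self):
        return time.monotonic() >= self._ready_at


def wait_for(operation, timeout, initial=0.01, maximum=0.1, multiplier=2.0):
    """Poll operation.done() with exponential backoff until `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    delay = initial
    while True:
        if operation.done():
            return True
        if time.monotonic() >= deadline:
            # Same message as in the traceback above.
            raise FutureTimeoutError(
                "Operation did not complete within the designated timeout."
            )
        # Never sleep past the deadline.
        time.sleep(min(delay, max(deadline - time.monotonic(), 0)))
        delay = min(delay * multiplier, maximum)
```

With this sketch, an operation that needs 0.2 seconds fails under `wait_for(op, timeout=0.05)` and succeeds under a longer timeout, which is presumably the same reason a fixed `timeout=10` flakes when the backend is occasionally slower than that.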

tseaver added the testing, api: bigtable, type: process, and flaky labels on May 8, 2019

tseaver mentioned this issue May 8, 2019

Spanner: add client_info support to client. #7878

Merged

This was referenced May 15, 2019

Omnibus: pin new core 1.0.0 #7993

Merged

Release bigtable 0.33.0 #8002

Merged

tseaver mentioned this issue Jun 18, 2019

Bigtable: Increase timeout for app profile update operation. #8417

Merged

tseaver closed this as completed in #8417 Jun 19, 2019

tseaver added a commit that referenced this issue Jun 19, 2019

Increase timeout for app profile update operation. (#8417)

ed2366c

Poke-and-hope. Closes #7900.

JustinBeckwith assigned tseaver Feb 1, 2021
