Bigtable: 'test_create_instance_with_two_clusters' flakes modifying profile. #7900

Closed
tseaver opened this issue May 8, 2019 · 0 comments · Fixed by #8417
Labels: api: bigtable, flaky, testing, type: process

tseaver commented May 8, 2019

Similar to #5928, but the failure occurs while re-modifying the instance's app profile.

From this Kokoro failure:

___________ TestInstanceAdminAPI.test_create_instance_w_two_clusters ___________
target = functools.partial(<bound method PollingFuture._done_or_raise of <google.api_core.operation.Operation object at 0x7f7c280cee80>>)
predicate = <function if_exception_type.<locals>.if_exception_type_predicate at 0x7f7c299b70d0>
sleep_generator = <generator object exponential_sleep_generator at 0x7f7c297d3a98>
deadline = 10, on_error = None
    def retry_target(target, predicate, sleep_generator, deadline, on_error=None):
        """Call a function and retry if it fails.
        This is the lowest-level retry helper. Generally, you'll use the
        higher-level retry helper :class:`Retry`.
        Args:
            target(Callable): The function to call and retry. This must be a
                nullary function - apply arguments with `functools.partial`.
            predicate (Callable[Exception]): A callable used to determine if an
                exception raised by the target should be considered retryable.
                It should return True to retry or False otherwise.
            sleep_generator (Iterable[float]): An infinite iterator that determines
                how long to sleep between retries.
            deadline (float): How long to keep retrying the target.
            on_error (Callable): A function to call while processing a retryable
                exception.  Any error raised by this function will *not* be
                caught.
        Returns:
            Any: the return value of the target function.
        Raises:
            google.api_core.RetryError: If the deadline is exceeded while retrying.
            ValueError: If the sleep generator stops yielding values.
            Exception: If the target raises a method that isn't retryable.
        """
        if deadline is not None:
            deadline_datetime = datetime_helpers.utcnow() + datetime.timedelta(
                seconds=deadline
            )
        else:
            deadline_datetime = None
        last_exc = None
        for sleep in sleep_generator:
            try:
>               return target()
../api_core/google/api_core/retry.py:179:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <google.api_core.operation.Operation object at 0x7f7c280cee80>
    def _done_or_raise(self):
        """Check if the future is done and raise if it's not."""
        if not self.done():
>           raise _OperationNotComplete()
E           google.api_core.future.polling._OperationNotComplete
../api_core/google/api_core/future/polling.py:81: _OperationNotComplete
The above exception was the direct cause of the following exception:
self = <google.api_core.operation.Operation object at 0x7f7c280cee80>
timeout = 10
    def _blocking_poll(self, timeout=None):
        """Poll and wait for the Future to be resolved.
        Args:
            timeout (int):
                How long (in seconds) to wait for the operation to complete.
                If None, wait indefinitely.
        """
        if self._result_set:
            return
        retry_ = self._retry.with_deadline(timeout)
        try:
>           retry_(self._done_or_raise)()
../api_core/google/api_core/future/polling.py:101:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
args = (), kwargs = {}
target = functools.partial(<bound method PollingFuture._done_or_raise of <google.api_core.operation.Operation object at 0x7f7c280cee80>>)
sleep_generator = <generator object exponential_sleep_generator at 0x7f7c297d3a98>
    @general_helpers.wraps(func)
    def retry_wrapped_func(*args, **kwargs):
        """A wrapper that calls target function with retry."""
        target = functools.partial(func, *args, **kwargs)
        sleep_generator = exponential_sleep_generator(
            self._initial, self._maximum, multiplier=self._multiplier
        )
        return retry_target(
            target,
            self._predicate,
            sleep_generator,
            self._deadline,
>           on_error=on_error,
        )
../api_core/google/api_core/retry.py:270:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
target = functools.partial(<bound method PollingFuture._done_or_raise of <google.api_core.operation.Operation object at 0x7f7c280cee80>>)
predicate = <function if_exception_type.<locals>.if_exception_type_predicate at 0x7f7c299b70d0>
sleep_generator = <generator object exponential_sleep_generator at 0x7f7c297d3a98>
deadline = 10, on_error = None
    def retry_target(target, predicate, sleep_generator, deadline, on_error=None):
        """Call a function and retry if it fails.
        This is the lowest-level retry helper. Generally, you'll use the
        higher-level retry helper :class:`Retry`.
        Args:
            target(Callable): The function to call and retry. This must be a
                nullary function - apply arguments with `functools.partial`.
            predicate (Callable[Exception]): A callable used to determine if an
                exception raised by the target should be considered retryable.
                It should return True to retry or False otherwise.
            sleep_generator (Iterable[float]): An infinite iterator that determines
                how long to sleep between retries.
            deadline (float): How long to keep retrying the target.
            on_error (Callable): A function to call while processing a retryable
                exception.  Any error raised by this function will *not* be
                caught.
        Returns:
            Any: the return value of the target function.
        Raises:
            google.api_core.RetryError: If the deadline is exceeded while retrying.
            ValueError: If the sleep generator stops yielding values.
            Exception: If the target raises a method that isn't retryable.
        """
        if deadline is not None:
            deadline_datetime = datetime_helpers.utcnow() + datetime.timedelta(
                seconds=deadline
            )
        else:
            deadline_datetime = None
        last_exc = None
        for sleep in sleep_generator:
            try:
                return target()
            # pylint: disable=broad-except
            # This function explicitly must deal with broad exceptions.
            except Exception as exc:
                if not predicate(exc):
                    raise
                last_exc = exc
                if on_error is not None:
                    on_error(exc)
            now = datetime_helpers.utcnow()
            if deadline_datetime is not None and deadline_datetime < now:
                six.raise_from(
                    exceptions.RetryError(
                        "Deadline of {:.1f}s exceeded while calling {}".format(
                            deadline, target
                        ),
                        last_exc,
                    ),
>                   last_exc,
                )
../api_core/google/api_core/retry.py:199:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
value = None, from_value = _OperationNotComplete()
>   ???
E   google.api_core.exceptions.RetryError: Deadline of 10.0s exceeded while calling functools.partial(<bound method PollingFuture._done_or_raise of <google.api_core.operation.Operation object at 0x7f7c280cee80>>), last exception:
<string>:3: RetryError
During handling of the above exception, another exception occurred:
self = <tests.system.TestInstanceAdminAPI testMethod=test_create_instance_w_two_clusters>
    def test_create_instance_w_two_clusters(self):
        from google.cloud.bigtable import enums
        from google.cloud.bigtable.table import ClusterState
        _PRODUCTION = enums.Instance.Type.PRODUCTION
        ALT_INSTANCE_ID = "dif" + unique_resource_id("-")
        instance = Config.CLIENT.instance(
            ALT_INSTANCE_ID, instance_type=_PRODUCTION, labels=LABELS
        )
        ALT_CLUSTER_ID_1 = ALT_INSTANCE_ID + "-c1"
        ALT_CLUSTER_ID_2 = ALT_INSTANCE_ID + "-c2"
        LOCATION_ID_2 = "us-central1-f"
        STORAGE_TYPE = enums.StorageType.HDD
        cluster_1 = instance.cluster(
            ALT_CLUSTER_ID_1,
            location_id=LOCATION_ID,
            serve_nodes=SERVE_NODES,
            default_storage_type=STORAGE_TYPE,
        )
        cluster_2 = instance.cluster(
            ALT_CLUSTER_ID_2,
            location_id=LOCATION_ID_2,
            serve_nodes=SERVE_NODES,
            default_storage_type=STORAGE_TYPE,
        )
        operation = instance.create(clusters=[cluster_1, cluster_2])
        # Make sure this instance gets deleted after the test case.
        self.instances_to_delete.append(instance)
        # We want to make sure the operation completes.
        operation.result(timeout=10)
        # Create a new instance instance and make sure it is the same.
        instance_alt = Config.CLIENT.instance(ALT_INSTANCE_ID)
        instance_alt.reload()
        self.assertEqual(instance, instance_alt)
        self.assertEqual(instance.display_name, instance_alt.display_name)
        self.assertEqual(instance.type_, instance_alt.type_)
        clusters, failed_locations = instance_alt.list_clusters()
        self.assertEqual(failed_locations, [])
        clusters.sort(key=lambda x: x.name)
        alt_cluster_1, alt_cluster_2 = clusters
        self.assertEqual(cluster_1.location_id, alt_cluster_1.location_id)
        self.assertEqual(alt_cluster_1.state, enums.Cluster.State.READY)
        self.assertEqual(cluster_1.serve_nodes, alt_cluster_1.serve_nodes)
        self.assertEqual(
            cluster_1.default_storage_type, alt_cluster_1.default_storage_type
        )
        self.assertEqual(cluster_2.location_id, alt_cluster_2.location_id)
        self.assertEqual(alt_cluster_2.state, enums.Cluster.State.READY)
        self.assertEqual(cluster_2.serve_nodes, alt_cluster_2.serve_nodes)
        self.assertEqual(
            cluster_2.default_storage_type, alt_cluster_2.default_storage_type
        )
        # Test list clusters in project via 'client.list_clusters'
        clusters, failed_locations = Config.CLIENT.list_clusters()
        self.assertFalse(failed_locations)
        found = set([cluster.name for cluster in clusters])
        self.assertTrue(
            {alt_cluster_1.name, alt_cluster_2.name, Config.CLUSTER.name}.issubset(
                found
            )
        )
        temp_table_id = "test-get-cluster-states"
        temp_table = instance.table(temp_table_id)
        temp_table.create()
        result = temp_table.get_cluster_states()
        ReplicationState = enums.Table.ReplicationState
        expected_results = [
            ClusterState(ReplicationState.STATE_NOT_KNOWN),
            ClusterState(ReplicationState.INITIALIZING),
            ClusterState(ReplicationState.PLANNED_MAINTENANCE),
            ClusterState(ReplicationState.UNPLANNED_MAINTENANCE),
            ClusterState(ReplicationState.READY),
        ]
        cluster_id_list = result.keys()
        self.assertEqual(len(cluster_id_list), 2)
        self.assertIn(ALT_CLUSTER_ID_1, cluster_id_list)
        self.assertIn(ALT_CLUSTER_ID_2, cluster_id_list)
        for clusterstate in result.values():
            self.assertIn(clusterstate, expected_results)
        # Test create app profile with multi_cluster_routing policy
        app_profiles_to_delete = []
        description = "routing policy-multy"
        app_profile_id_1 = "app_profile_id_1"
        routing = enums.RoutingPolicyType.ANY
        self._test_create_app_profile_helper(
            app_profile_id_1,
            instance,
            routing_policy_type=routing,
            description=description,
            ignore_warnings=True,
        )
        app_profiles_to_delete.append(app_profile_id_1)
        # Test list app profiles
        self._test_list_app_profiles_helper(instance, [app_profile_id_1])
        # Test modify app profile app_profile_id_1
        # routing policy to single cluster policy,
        # cluster -> ALT_CLUSTER_ID_1,
        # allow_transactional_writes -> disallowed
        # modify description
        description = "to routing policy-single"
        routing = enums.RoutingPolicyType.SINGLE
        self._test_modify_app_profile_helper(
            app_profile_id_1,
            instance,
            routing_policy_type=routing,
            description=description,
            cluster_id=ALT_CLUSTER_ID_1,
            allow_transactional_writes=False,
        )
        # Test modify app profile app_profile_id_1
        # cluster -> ALT_CLUSTER_ID_2,
        # allow_transactional_writes -> allowed
        self._test_modify_app_profile_helper(
            app_profile_id_1,
            instance,
            routing_policy_type=routing,
            description=description,
            cluster_id=ALT_CLUSTER_ID_2,
            allow_transactional_writes=True,
            ignore_warnings=True,
        )
        # Test create app profile with single cluster routing policy
        description = "routing policy-single"
        app_profile_id_2 = "app_profile_id_2"
        routing = enums.RoutingPolicyType.SINGLE
        self._test_create_app_profile_helper(
            app_profile_id_2,
            instance,
            routing_policy_type=routing,
            description=description,
            cluster_id=ALT_CLUSTER_ID_2,
            allow_transactional_writes=False,
        )
        app_profiles_to_delete.append(app_profile_id_2)
        # Test list app profiles
        self._test_list_app_profiles_helper(
            instance, [app_profile_id_1, app_profile_id_2]
        )
        # Test modify app profile app_profile_id_2 to
        # allow transactional writes
        # Note: no need to set ``ignore_warnings`` to True
        # since we are not restrictings anything with this modification.
        self._test_modify_app_profile_helper(
            app_profile_id_2,
            instance,
            routing_policy_type=routing,
            description=description,
            cluster_id=ALT_CLUSTER_ID_2,
>           allow_transactional_writes=True,
        )
tests/system.py:409:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
tests/system.py:613: in _test_modify_app_profile_helper
    operation.result(timeout=10)
../api_core/google/api_core/future/polling.py:122: in result
    self._blocking_poll(timeout=timeout)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <google.api_core.operation.Operation object at 0x7f7c280cee80>
timeout = 10
    def _blocking_poll(self, timeout=None):
        """Poll and wait for the Future to be resolved.
        Args:
            timeout (int):
                How long (in seconds) to wait for the operation to complete.
                If None, wait indefinitely.
        """
        if self._result_set:
            return
        retry_ = self._retry.with_deadline(timeout)
        try:
            retry_(self._done_or_raise)()
        except exceptions.RetryError:
            raise concurrent.futures.TimeoutError(
>               "Operation did not complete within the designated " "timeout."
            )
E           concurrent.futures._base.TimeoutError: Operation did not complete within the designated timeout.
../api_core/google/api_core/future/polling.py:104: TimeoutError
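
The traceback above comes down to a polling loop with exponential backoff that gives up once a 10-second deadline has passed. A minimal sketch of that failure mode follows, using only the standard library. It is not the google-api-core implementation: `FakeOperation`, `FakeClock`, `wait_for`, and the backoff values (`initial`, `multiplier`, `maximum`) are hypothetical names and numbers chosen for illustration.

```python
import concurrent.futures


class FakeOperation:
    """Hypothetical stand-in for a long-running operation (not a library class)."""

    def __init__(self, polls_until_done):
        self._remaining = polls_until_done

    def done(self):
        # Reports completion only on the Nth poll.
        self._remaining -= 1
        return self._remaining <= 0


class FakeClock:
    """Hypothetical simulated clock so the sketch runs instantly."""

    def __init__(self):
        self.now = 0.0

    def __call__(self):
        return self.now

    def sleep(self, seconds):
        self.now += seconds


def wait_for(operation, timeout, clock, sleep,
             initial=1.0, multiplier=2.0, maximum=60.0):
    """Poll operation.done() with exponential backoff until the deadline passes.

    The backoff parameters are assumptions, not the library's defaults.
    """
    deadline = clock() + timeout
    delay = initial
    while True:
        if operation.done():
            return True
        # The deadline is checked only after a failed poll, as in the
        # loop shown in the traceback.
        if clock() > deadline:
            raise concurrent.futures.TimeoutError(
                "Operation did not complete within the designated timeout."
            )
        sleep(delay)
        delay = min(delay * multiplier, maximum)


if __name__ == "__main__":
    clock = FakeClock()
    try:
        wait_for(FakeOperation(6), timeout=10, clock=clock, sleep=clock.sleep)
    except concurrent.futures.TimeoutError as exc:
        print("short timeout:", exc)

    clock = FakeClock()
    print("long timeout:", wait_for(FakeOperation(6), timeout=60,
                                    clock=clock, sleep=clock.sleep))
```

In this sketch the same slow operation times out with `timeout=10` but completes with `timeout=60`. Passing a larger value to `operation.result(timeout=...)` in the test helper would be one possible mitigation for the flake; that is an assumption here, not necessarily what the linked fix does.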
tseaver added the testing, api: bigtable, type: process, and flaky labels May 8, 2019
This was referenced May 15, 2019
tseaver added a commit that referenced this issue Jun 19, 2019