Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Memoization with Locking #15396

Closed
john-bodley opened this issue Jun 25, 2021 · 1 comment
Closed

Memoization with Locking #15396

john-bodley opened this issue Jun 25, 2021 · 1 comment
Assignees
Labels
inactive Inactive for >= 30 days

Comments

@john-bodley
Copy link
Member

john-bodley commented Jun 25, 2021

Is your feature request related to a problem? Please describe.

As mentioned in the Apache Superset Contributor Meetup on 2021-06-25 at Airbnb we implemented a custom memoization function (memoize_with_user_lock) which on a per-user basis provides global locking (based on a cache key) to reduce the Presto query load—as it relates to the latest_partition call. This ensures that numerous instances of the same query are not simultaneously sent to the underlying engine (which may or may not dedupe) if the function call is not cached. This is often the case with calls to latest_partition which has a 60 second timeout per here.

Though the cache is now at the per user level to prevent deadlocks (necessary for our implementation given our Presto cluster queue is faceted by user and thus there's no guarantee that the queries will be executed globally in a FIFO manner) we noticed a significant decrease in the number of queries when users invoked dashboard filters.

Describe the solution you'd like

I'm not saying this solution will work globally, but I felt there was merit in sharing the code. Note this is implemented for Redis which provides locking.

from functools import wraps
from typing import Any, Callable, Optional

from flask import g
from redis.lock import Lock

from superset import cache_manager


def memoize_with_user_lock(
    timeout: Optional[int] = None,
) -> Callable[..., Any]:
    """
    Decorator for memoization which leverages per user global locking to prevent
    simultaneous execution.

    :param timeout: If set, will cache for that amount of time (in seconds)
    :returns: A memoize decorator
    """

    def decorator(func: Callable[..., Any]) -> Callable[..., Any]:
        @wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Callable[..., Any]:
            """
            Cache the result of a function, via memoization, taking its arguments and
            the user into account in the cache key.

            Rather than re-implementing the somewhat non-trivial Flask-Caching
            memoization logic we simply re-wrap the memoized function and serialize the
            calls if uncached to prevent simultaneous execution.

            :see: https://github.com/sh4nks/flask-caching
            """

            @cache_manager.data_cache.memoize(
                make_name=lambda fname: str((fname, g.user.username)),
                timeout=timeout,
            )
            def memoized_func(*args: Any, **kwargs: Any) -> Callable[..., Any]:
                return func(*args, **kwargs)

            # The cache key associated with the non-memoized function, which is also
            # used for locking, is additionally keyed by user to prevent deadlocks.
            cache_key = memoized_func.make_cache_key(
                memoized_func.uncached, *args, **kwargs
            )

            # First check whether the Flask-Caching memoized function is cached. Note
            # the cached value is never `None` by construction.
            rv = cache_manager.data_cache.get(cache_key)

            if rv is None:

                # Per user sequential execution of the Flask-Caching memoized function.
                with Lock(
                    cache_manager.data_cache.cache._write_client,  # pylint: disable=protected-access
                    cache_key,
                ):
                    # Calling the Flask-Caching memoized function caches the result.
                    rv = memoized_func(*args, **kwargs)

            return rv

        return wrapper

    return decorator
@stale
Copy link

stale bot commented May 1, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. For admin, please label this issue .pinned to prevent stale bot from closing the issue.

@stale stale bot added the inactive Inactive for >= 30 days label May 1, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
inactive Inactive for >= 30 days
Projects
None yet
Development

No branches or pull requests

2 participants