Is your feature request related to a problem? Please describe.
As mentioned in the Apache Superset Contributor Meetup on 2021-06-25, at Airbnb we implemented a custom memoization function (`memoize_with_user_lock`) which provides per-user global locking (based on a cache key) to reduce the Presto query load, specifically as it relates to the `latest_partition` call. This ensures that numerous instances of the same query are not sent simultaneously to the underlying engine (which may or may not dedupe them) when the function call is not cached. This is often the case with calls to `latest_partition`, which has a 60 second timeout per here.
Though the cache is now keyed at the per-user level to prevent deadlocks (necessary for our implementation given that our Presto cluster queue is faceted by user, and thus there's no guarantee that queries will be executed globally in FIFO order), we noticed a significant decrease in the number of queries when users invoked dashboard filters.
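The per-user keying mentioned above can be sketched in isolation. This is a simplified, hypothetical illustration (in Superset the username actually comes from `flask.g.user` and the key is built by Flask-Caching's `make_cache_key`); the point is that folding the username into the key partitions both the cache and the lock by user, so one user's slow query cannot block another's:

```python
# Hypothetical helper illustrating per-user cache-key construction; the
# function name and key format are illustrative, not Superset's actual API.
def make_user_cache_key(fname: str, username: str, *args) -> str:
    # Including the username partitions the cache (and any lock keyed on
    # this string) by user, preventing cross-user deadlocks.
    return str(((fname, username), args))

key_a = make_user_cache_key("latest_partition", "alice", "my_table")
key_b = make_user_cache_key("latest_partition", "bob", "my_table")
```

Because `key_a` and `key_b` differ, Alice's lock never serializes Bob's calls, even for the same underlying query.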
Describe the solution you'd like
I'm not claiming this solution will work for everyone, but I felt there was merit in sharing the code. Note that this implementation relies on Redis, which provides the locking primitive.
```python
from functools import wraps
from typing import Any, Callable, Optional

from flask import g
from redis.lock import Lock

from superset import cache_manager


def memoize_with_user_lock(
    timeout: Optional[int] = None,
) -> Callable[..., Any]:
    """
    Decorator for memoization which leverages per user global locking to
    prevent simultaneous execution.

    :param timeout: If set, will cache for that amount of time (in seconds)
    :returns: A memoize decorator
    """

    def decorator(func: Callable[..., Any]) -> Callable[..., Any]:
        @wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            """
            Cache the result of a function, via memoization, taking its
            arguments and the user into account in the cache key.

            Rather than re-implementing the somewhat non-trivial Flask-Caching
            memoization logic we simply re-wrap the memoized function and
            serialize the calls if uncached to prevent simultaneous execution.

            :see: https://github.com/sh4nks/flask-caching
            """

            @cache_manager.data_cache.memoize(
                make_name=lambda fname: str((fname, g.user.username)),
                timeout=timeout,
            )
            def memoized_func(*args: Any, **kwargs: Any) -> Any:
                return func(*args, **kwargs)

            # The cache key associated with the non-memoized function, which is
            # also used for locking, is additionally keyed by user to prevent
            # deadlocks.
            cache_key = memoized_func.make_cache_key(
                memoized_func.uncached, *args, **kwargs
            )

            # First check whether the Flask-Caching memoized function is
            # cached. Note the cached value is never `None` by construction.
            rv = cache_manager.data_cache.get(cache_key)

            if rv is None:
                # Per user sequential execution of the Flask-Caching memoized
                # function.
                with Lock(
                    cache_manager.data_cache.cache._write_client,  # pylint: disable=protected-access
                    cache_key,
                ):
                    # Calling the Flask-Caching memoized function caches the
                    # result.
                    rv = memoized_func(*args, **kwargs)

            return rv

        return wrapper

    return decorator
```
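The core pattern above (check the cache, acquire a lock on a miss, and let the memoized call populate the cache under the lock) can be demonstrated without Redis, Flask, or Superset. The following is a minimal, process-local sketch using `threading.Lock` and a plain dict in place of the Redis lock and Flask-Caching; all names here are illustrative, not part of the proposal:

```python
import threading
from functools import wraps

_cache = {}            # stands in for the Flask-Caching backend
_locks = {}            # one lock per cache key, stands in for redis.lock.Lock
_locks_guard = threading.Lock()
calls = {"count": 0}   # counts real executions, to show serialization works


def memoize_with_lock(func):
    @wraps(func)
    def wrapper(*args):
        key = (func.__name__, args)
        rv = _cache.get(key)
        if rv is None:
            # Lazily create a per-key lock (Redis provides this globally).
            with _locks_guard:
                lock = _locks.setdefault(key, threading.Lock())
            with lock:
                # Re-check under the lock: a concurrent caller may have
                # populated the cache while we were waiting. Flask-Caching's
                # memoize performs this re-check internally in the original.
                rv = _cache.get(key)
                if rv is None:
                    rv = func(*args)
                    _cache[key] = rv
        return rv

    return wrapper


@memoize_with_lock
def expensive(x):
    calls["count"] += 1  # would be a slow Presto query in practice
    return x * 2
```

Repeated calls to `expensive(3)` execute the underlying function only once; concurrent callers that miss the cache queue on the lock, and all but the first find the value cached when they re-check.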