Is your feature request related to a problem? Please describe.
As mentioned in the Apache Superset Contributor Meetup on 2021-06-25, at Airbnb we implemented a custom memoization function (`memoize_with_user_lock`) which provides per-user global locking (based on a cache key) to reduce the Presto query load, specifically as it relates to the `latest_partition` call. This ensures that numerous instances of the same query are not sent simultaneously to the underlying engine (which may or may not dedupe them) when the function call is not cached. This is often the case with calls to `latest_partition`, which has a 60 second timeout per here.
Though the cache is now keyed at the per-user level to prevent deadlocks (necessary for our implementation given that our Presto cluster queue is faceted by user, and thus there's no guarantee that queries will be executed globally in FIFO order), we noticed a significant decrease in the number of queries when users invoked dashboard filters.
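The per-user keying mentioned above can be sketched in isolation. This is a simplified, hypothetical illustration (in Superset the username actually comes from `flask.g.user` and the key is built by Flask-Caching's `make_cache_key`); the point is that folding the username into the key partitions both the cache and the lock by user, so one user's slow query cannot block another's:

```python
# Hypothetical helper illustrating per-user cache-key construction; the
# function name and key format are illustrative, not Superset's actual API.
def make_user_cache_key(fname: str, username: str, *args) -> str:
    # Including the username partitions the cache (and any lock keyed on
    # this string) by user, preventing cross-user deadlocks.
    return str(((fname, username), args))

key_a = make_user_cache_key("latest_partition", "alice", "my_table")
key_b = make_user_cache_key("latest_partition", "bob", "my_table")
```

Because `key_a` and `key_b` differ, Alice's lock never serializes Bob's calls, even for the same underlying query.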
Describe the solution you'd like
I'm not claiming this solution will work for everyone, but I felt there was merit in sharing the code. Note that this implementation relies on Redis, which provides the locking primitive.
```python
from functools import wraps
from typing import Any, Callable, Optional

from flask import g
from redis.lock import Lock

from superset import cache_manager


def memoize_with_user_lock(
    timeout: Optional[int] = None,
) -> Callable[..., Any]:
    """
    Decorator for memoization which leverages per user global locking to
    prevent simultaneous execution.

    :param timeout: If set, will cache for that amount of time (in seconds)
    :returns: A memoize decorator
    """

    def decorator(func: Callable[..., Any]) -> Callable[..., Any]:
        @wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            """
            Cache the result of a function, via memoization, taking its
            arguments and the user into account in the cache key.

            Rather than re-implementing the somewhat non-trivial Flask-Caching
            memoization logic we simply re-wrap the memoized function and
            serialize the calls if uncached to prevent simultaneous execution.

            :see: https://github.com/sh4nks/flask-caching
            """

            @cache_manager.data_cache.memoize(
                make_name=lambda fname: str((fname, g.user.username)),
                timeout=timeout,
            )
            def memoized_func(*args: Any, **kwargs: Any) -> Any:
                return func(*args, **kwargs)

            # The cache key associated with the non-memoized function, which is
            # also used for locking, is additionally keyed by user to prevent
            # deadlocks.
            cache_key = memoized_func.make_cache_key(
                memoized_func.uncached, *args, **kwargs
            )

            # First check whether the Flask-Caching memoized function is
            # cached. Note the cached value is never `None` by construction.
            rv = cache_manager.data_cache.get(cache_key)

            if rv is None:
                # Per user sequential execution of the Flask-Caching memoized
                # function.
                with Lock(
                    cache_manager.data_cache.cache._write_client,  # pylint: disable=protected-access
                    cache_key,
                ):
                    # Calling the Flask-Caching memoized function caches the
                    # result.
                    rv = memoized_func(*args, **kwargs)

            return rv

        return wrapper

    return decorator
```
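The core pattern above (check the cache, acquire a lock on a miss, and let the memoized call populate the cache under the lock) can be demonstrated without Redis, Flask, or Superset. The following is a minimal, process-local sketch using `threading.Lock` and a plain dict in place of the Redis lock and Flask-Caching; all names here are illustrative, not part of the proposal:

```python
import threading
from functools import wraps

_cache = {}            # stands in for the Flask-Caching backend
_locks = {}            # one lock per cache key, stands in for redis.lock.Lock
_locks_guard = threading.Lock()
calls = {"count": 0}   # counts real executions, to show serialization works


def memoize_with_lock(func):
    @wraps(func)
    def wrapper(*args):
        key = (func.__name__, args)
        rv = _cache.get(key)
        if rv is None:
            # Lazily create a per-key lock (Redis provides this globally).
            with _locks_guard:
                lock = _locks.setdefault(key, threading.Lock())
            with lock:
                # Re-check under the lock: a concurrent caller may have
                # populated the cache while we were waiting. Flask-Caching's
                # memoize performs this re-check internally in the original.
                rv = _cache.get(key)
                if rv is None:
                    rv = func(*args)
                    _cache[key] = rv
        return rv

    return wrapper


@memoize_with_lock
def expensive(x):
    calls["count"] += 1  # would be a slow Presto query in practice
    return x * 2
```

Repeated calls to `expensive(3)` execute the underlying function only once; concurrent callers that miss the cache queue on the lock, and all but the first find the value cached when they re-check.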