fix: enable consistent etag across workers and force no-cache for dashboards #11137
0fd9f77 to 32047e7
@@ -51,6 +51,7 @@ def etag_cache(
    check_perms: Callable[..., Any],
    get_last_modified: Optional[Callable[..., Any]] = None,
    skip: Optional[Callable[..., Any]] = None,
    must_revalidate: Optional[bool] = False,
We could potentially always assume `must_revalidate` if `get_last_modified` is provided, because validation only makes sense when there is a way to programmatically get a resource's "real" last-modified time.
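A minimal sketch of the suggested default, assuming a tri-state `must_revalidate` where `None` means "not specified". `resolve_must_revalidate` is a hypothetical helper for illustration, not part of Superset's `etag_cache`:

```python
from typing import Any, Callable, Optional


def resolve_must_revalidate(
    get_last_modified: Optional[Callable[..., Any]],
    must_revalidate: Optional[bool] = None,
) -> bool:
    """Hypothetical helper: decide whether cached responses must be revalidated.

    An explicit True/False from the caller always wins. Otherwise, assume
    revalidation whenever a callback can report the resource's real
    last-modified time, since only then can the server tell whether a
    cached copy is still fresh.
    """
    if must_revalidate is not None:
        return must_revalidate
    return get_last_modified is not None
```

With this default, a view such as the dashboard endpoint would get revalidation simply by passing `get_last_modified`, while callers without one keep the current behavior.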
superset/views/core.py (outdated)
@@ -1599,12 +1606,13 @@ def publish(  # pylint: disable=no-self-use

    @has_access
    @etag_cache(
        0,
        CACHE_DEFAULT_TIMEOUT,
Setting `max_age` to `0` will set the cache expiration to a far-future date (1 year from now), which is unlikely to be what we actually need unless we are serving static resources.
For the dashboard's ETag, we would prefer the cache to expire in the far future, since it will be validated on every user request (against the dashboard's last-modified time). `CACHE_DEFAULT_TIMEOUT` may be used by Superset for other purposes, for example the backend query results cache age, which is probably only 1 day or 36 hours.
OK, changed it back to `0`.
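The trade-off discussed in this thread can be sketched as follows, assuming the decorator maps `max_age=0` to a one-year `Expires` date as the reviewer describes. `FAR_FUTURE` and `compute_expires` are illustrative names, not necessarily what Superset uses:

```python
from datetime import datetime, timedelta, timezone

# Assumed "far future" horizon: one year, in seconds.
FAR_FUTURE = 365 * 24 * 60 * 60


def compute_expires(last_modified: datetime, max_age: int) -> datetime:
    """Return the Expires timestamp for a response.

    max_age == 0 is treated as "no fixed lifetime": the entry expires in the
    far future, and freshness is enforced by revalidation (ETag /
    Last-Modified) on every request instead.
    """
    expiration = max_age if max_age != 0 else FAR_FUTURE
    return last_modified + timedelta(seconds=expiration)


changed_on = datetime(2020, 10, 1, tzinfo=timezone.utc)
print(compute_expires(changed_on, 0))          # one year after changed_on
print(compute_expires(changed_on, 24 * 3600))  # one day after changed_on
```

With a short `max_age` such as a one-day `CACHE_DEFAULT_TIMEOUT`, the dashboard response would expire a day after its last-modified time even though revalidation already guarantees freshness.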
Codecov Report
| Coverage Diff | master | #11137 | +/- |
| --- | --- | --- | --- |
| Coverage | 65.86% | 61.63% | -4.23% |
| Files | 826 | 826 | |
| Lines | 38976 | 39017 | +41 |
| Branches | 3669 | 3667 | -2 |
| Hits | 25671 | 24050 | -1621 |
| Misses | 13195 | 14786 | +1591 |
| Partials | 110 | 181 | +71 |
superset/views/core.py (outdated)
@@ -171,6 +171,13 @@ class Superset(BaseSupersetView):  # pylint: disable=too-many-public-methods

    logger = logging.getLogger(__name__)

    def __repr__(self):
Do you think it's safe for this to be guessable? Or should we encode it with the shared secret instead?
I don't really know, but I'm throwing it out there.
I can't think of a scenario where this would cause security issues, but I guess encoding it with some shared secret does make it easier to invalidate the cache.
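If the `__repr__` were encoded with a shared secret as suggested, one option is an HMAC of the version string keyed by the app's secret. A minimal standard-library sketch; `keyed_repr` and the `Superset@` prefix are hypothetical, and rotating the secret would invalidate every cached entry at once:

```python
import hashlib
import hmac


def keyed_repr(version: str, secret_key: str) -> str:
    """Hypothetical deterministic, non-guessable repr for a view instance.

    Identical across workers that share the same version and secret, but
    not predictable without knowing the secret.
    """
    digest = hmac.new(
        secret_key.encode("utf-8"), version.encode("utf-8"), hashlib.sha256
    ).hexdigest()
    # Truncate for readability; 16 hex chars = 64 bits of the MAC.
    return f"Superset@{digest[:16]}"
```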
LGTM
For dashboard requests:
For `explore_json` requests:
cc @betodealmeida and @bkyryliuk since you are interested in this topic.
…hboards (apache#11137)
* fix: enable consistent etag across workers
* Use CACHE_DEFAULT_TIMEOUT instead of 0
* Change timeout to 0 and set expires header even for no-cache
* Reduce number of if branches to appease Pylint
* Fix mypy error
SUMMARY
This PR fixes a bug in #7032 and #10963 where different ETags are generated for requests to different running Superset instances even when the underlying data are the same. Currently, new ETags are generated every time Flask restarts, or whenever multiple WSGI workers run at the same time (via Gunicorn or other tools).
Also added `cache-control: no-cache` for dashboards so that the browser always asks the server to validate data freshness, instead of sometimes serving from its disk cache.
Problem
When running Superset behind Gunicorn with multiple workers, each worker instance generates different ETags in `etag_cache`. This makes the ETags much less useful: when visiting the same page, users often hit a different worker on each request and constantly invalidate the ETags issued by other workers.
This is because `flask_caching` memoization does not work well with class instance methods. In `_memoize_make_cache_key`, positional arguments are used to generate cache keys, but for a class instance method the first argument is always `self`, which by default is formatted as a unique string containing its memory location.
Solution
Give `superset.views.core.Superset` a deterministic `__repr__` string so that the cache keys generated by `flask_caching` are stable. I'm using the app version string, since it makes sense to invalidate the cache every time Superset is upgraded.
So instead of generating the following cache key update string here:
it will generate something like this:
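To illustrate the difference between the two key formats, here is a minimal plain-Python sketch. `make_cache_key` is a simplified, hypothetical stand-in that, as described above for `_memoize_make_cache_key`, formats the positional arguments (including `self`) into the key; it is not `flask_caching`'s real implementation, and the `VersionedView` repr format is illustrative only:

```python
from typing import Any


def make_cache_key(func_name: str, *args: Any) -> str:
    # Formatting the args tuple calls repr() on each element, including self.
    return f"{func_name}{args}"


class DefaultView:
    """No custom __repr__: repr() includes the instance's memory address."""

    def dashboard(self, dashboard_id: int) -> str:
        return make_cache_key("dashboard", self, dashboard_id)


class VersionedView:
    """Deterministic __repr__ derived from the app version (hypothetical format)."""

    def __init__(self, version_string: str, version_sha: str) -> None:
        self.version_string = version_string
        self.version_sha = version_sha

    def __repr__(self) -> str:
        return f"VersionedView@v{self.version_string}{self.version_sha}"

    def dashboard(self, dashboard_id: int) -> str:
        return make_cache_key("dashboard", self, dashboard_id)


if __name__ == "__main__":
    # Two live instances stand in for two workers (or a restart).
    a, b = DefaultView(), DefaultView()
    print(a.dashboard(1) == b.dashboard(1))  # False: memory addresses differ

    c, d = VersionedView("1.0.0", "abc123"), VersionedView("1.0.0", "abc123")
    print(c.dashboard(1) == d.dashboard(1))  # True: same version, same key
```

The flip side is that a deployment whose version string does not change between code changes keeps producing the same keys, which is the stale-cache risk listed under the caveats.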
BEFORE/AFTER SCREENSHOTS
Before
Second visit to a dashboard page uses disk cache if not a refresh:
A refresh request (almost) always returns 200.
After
Second and subsequent visits to a dashboard page (regardless of whether it is a refresh or not) correctly make a round-trip request to the server, and the server correctly returns HTTP 304 (Not Modified).
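The request flow behind the "After" behavior can be sketched with plain functions. This is a simplified illustration under stated assumptions: `build_headers` and `respond` are hypothetical, only exact (strong) ETag matches are checked whereas real `If-None-Match` handling uses weak comparison, and dashboards are assumed to be served with `Cache-Control: no-cache` so the browser revalidates on every visit:

```python
from typing import Dict, Optional, Tuple


def build_headers(etag: str, max_age: int, must_revalidate: bool) -> Dict[str, str]:
    """Cache headers for a response (hypothetical helper)."""
    if must_revalidate:
        # no-cache: the browser may store the response, but must ask the
        # server whether it is still fresh before reusing it.
        cache_control = "no-cache"
    else:
        cache_control = f"public, max-age={max_age}"
    return {"ETag": f'"{etag}"', "Cache-Control": cache_control}


def respond(
    current_etag: str, if_none_match: Optional[str], body: str
) -> Tuple[int, Dict[str, str], str]:
    """Return (status, headers, body); 304 with an empty body if the client's copy matches."""
    headers = build_headers(current_etag, max_age=0, must_revalidate=True)
    if if_none_match is not None:
        client_tags = {tag.strip() for tag in if_none_match.split(",")}
        if headers["ETag"] in client_tags or "*" in client_tags:
            return 304, headers, ""
    return 200, headers, body


if __name__ == "__main__":
    # First visit: no validator yet, so the full dashboard is sent.
    status, headers, _ = respond("abc", None, "<dashboard>")
    print(status, headers["Cache-Control"])  # 200 no-cache
    # Second visit: the browser echoes the ETag and gets 304 Not Modified.
    print(respond("abc", headers["ETag"], "<dashboard>")[0])  # 304
```

The 304 path is only reachable when every worker computes the same ETag for the same data; if another worker had produced a different tag, the comparison would fail and the full 200 response would be sent again.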
TEST PLAN
ADDITIONAL INFORMATION
Some caveats:
* `__repr__` is used in other places, but I doubt any of them requires memory locations, or that the version string alone would not be enough there.
* …`VERSION_STRING` and `VERSION_SHA`, i.e., not tied to Git SHAs or official Superset versions (e.g. when building the exported `incubator-superset` outside of Git, it will use the default `0.999.0dev` from `package.json`), users might be served with stale cache if: