Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: enable consistent etag across workers and force no-cache for dashboards #11137

Merged
merged 5 commits into from
Oct 6, 2020

Conversation

ktmud
Copy link
Member

@ktmud ktmud commented Oct 2, 2020

SUMMARY

This PR fixes a bug in #7032 and #10963 where different ETags are generated for requests to different running Superset instances even when the underlying data are the same. Currently new ETags will be generated every time Flask is restarted or when there are multiple WSGI workers running at the same time (either via gunicorn or some other tools).

Also added cache-control: no-cache for dashboards so that the browser always asks the server to validate data freshness, instead of serving disk cache sometimes.

Problem

When running Superset behind Gunicorn with multiple workers, each worker instance generates different ETags in etag_cache, making the ETags much less useful because when visiting the same page, users would often hit a different worker on each request, then constantly invalidate previous Etags from other workers.

This is because flask_caching memoization does not work well with class instance methods. In _memoize_make_cache_key, positional arguments are used to generate cache keys, but for a class instance method, the first argument will always be self, which by default is formatted to a unique string with memory locations. E.g.

<superset.views.core.Superset object at 0x7ffb67416a50>

Solution

Give superset.views.core.Superset a determinate __repr__ string so that the cache keys generated by flask_caching can be more stable. I'm using app version string since it makes sense to invalidate the cache every time Superset is upgraded.

So instead of generating the following cache key update string here:

superset.views.core.Superset.dashboard('<superset.views.core.Superset object at 0x7ffb67416a50>', '15634')OrderedDict()

it will generates something like this:

superset.views.core.Superset.dashboard('Superset.views.core.Superset@v0.999.0dev', '15634')OrderedDict()

BEFORE/AFTER SCREENSHOTS

Before

Second visit to a dashboard page uses disk cache if not a refresh:

A refresh request (almost) always returns 200.

After

Second and subsequent visits to a dashboard page (regardless whether is a refresh or not) correctly make a roundtrip request to the server and the server correctly returns HTTP 304 (not modified).

TEST PLAN

  1. Start Superset with gunicorn and multiple workers
     gunicorn \
       --bind 0.0.0.0:8089 \
       --workers 5 \
       "superset.app:create_app()"
    
  2. Visit the dashboard page and refresh the page multiple times. DO NOT disable browser cache.
  3. Previously you will almost always see HTTP 200 responses, with this fix, you should see HTTP 304 since the second request and most subsequent page loads are going to be much faster.

ADDITIONAL INFORMATION

  • Has associated issue:
  • Changes UI
  • Requires DB Migration.
  • Confirm DB Migration upgrade and downgrade tested.
  • Introduces new feature or API
  • Removes existing feature or API

Some caveats:

  1. Not sure __repr__ is used in other places, but I doubt they will require memory locations and version string is not enough.
  2. For custom Superset builds with fixed VERSION_STRING and VERSION_SHA, i.e., not tied to Git SHAs or official Superset versions (e.g. when build the exported incubator-superset outside of Git, it will use the default 0.999.0dev from package.json), users might be served with stale cache if:
    1. It has been less than CACHE_DEFAULT_TIMEOUT (default to 1 day) since user's last visit to a dashboard page
    2. A new Superset version is deployed and some code related to the dashboard view has changed
    3. The dashboard and related slices haven't been updated.

@ktmud ktmud force-pushed the consistent-etag branch 2 times, most recently from 0fd9f77 to 32047e7 Compare October 2, 2020 00:56
@pull-request-size pull-request-size bot added size/M and removed size/S labels Oct 2, 2020
@@ -51,6 +51,7 @@ def etag_cache(
check_perms: Callable[..., Any],
get_last_modified: Optional[Callable[..., Any]] = None,
skip: Optional[Callable[..., Any]] = None,
must_revalidate: Optional[bool] = False,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could potentially always assume must_revalidate if get_last_modified is provided. Because validation only makes sense when there is a way to programmatically get a resource's the "real" last modified time.

@@ -1599,12 +1606,13 @@ def publish( # pylint: disable=no-self-use

@has_access
@etag_cache(
0,
CACHE_DEFAULT_TIMEOUT,
Copy link
Member Author

@ktmud ktmud Oct 2, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Setting max_age to 0 will set cache expiration to a far future date (1 year from now), which is unlikely to be what we actually need unless we are serving static resources.

Copy link

@graceguo-supercat graceguo-supercat Oct 2, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for Dashboard's etag, it will prefer to be force expired in far future, since cache will be validated per user request (with dashboard's last modified_on). CACHE_DEFAULT_TIMEOUT may used by the Superset system for other purpose, for example, backend query results cache age, it probably only 1 day or 36 hours.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, changed it back to 0.

@codecov-commenter
Copy link

Codecov Report

Merging #11137 into master will decrease coverage by 4.22%.
The diff coverage is 14.28%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #11137      +/-   ##
==========================================
- Coverage   65.86%   61.63%   -4.23%     
==========================================
  Files         826      826              
  Lines       38976    39017      +41     
  Branches     3669     3667       -2     
==========================================
- Hits        25671    24050    -1621     
- Misses      13195    14786    +1591     
- Partials      110      181      +71     
Flag Coverage Δ
#cypress ?
#javascript 62.21% <ø> (+<0.01%) ⬆️
#python 61.29% <14.28%> (-0.02%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
superset/utils/decorators.py 49.25% <0.00%> (-1.52%) ⬇️
superset/views/core.py 74.01% <50.00%> (-0.29%) ⬇️
superset-frontend/src/SqlLab/App.jsx 0.00% <0.00%> (-100.00%) ⬇️
superset-frontend/src/explore/App.jsx 0.00% <0.00%> (-100.00%) ⬇️
superset-frontend/src/dashboard/App.jsx 0.00% <0.00%> (-100.00%) ⬇️
superset-frontend/src/explore/index.jsx 0.00% <0.00%> (-100.00%) ⬇️
superset-frontend/src/dashboard/index.jsx 0.00% <0.00%> (-100.00%) ⬇️
superset-frontend/src/setup/setupColors.js 0.00% <0.00%> (-100.00%) ⬇️
superset-frontend/src/chart/ChartContainer.jsx 0.00% <0.00%> (-100.00%) ⬇️
superset-frontend/src/setup/setupFormatters.js 0.00% <0.00%> (-100.00%) ⬇️
... and 183 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7a72082...66b9556. Read the comment docs.

@@ -171,6 +171,13 @@ class Superset(BaseSupersetView): # pylint: disable=too-many-public-methods

logger = logging.getLogger(__name__)

def __repr__(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you think it's safe for this to be guessable? Or should we encode it with the shared secret instead?

I don't really know, but throwing it out there

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't think of a scenario where this will cause security issues. But I guess it does make it easier to invalidate cache if we encode it with some shared secret.

Copy link

@graceguo-supercat graceguo-supercat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@pull-request-size pull-request-size bot added size/M and removed size/S labels Oct 2, 2020
@graceguo-supercat
Copy link

graceguo-supercat commented Oct 2, 2020

For dashboard requests:
We set header Cache-Control: no-cache
If you enable dashboard etag header feature, dashboard metadata will be stored in browser cache. But for every new dashboard request, browser still sends out request to server-side to validate if cache is still valid, to make sure no other users changed dashboard since last visit. If server-side response 304, browser will use cached data. otherwise, server-side will response 200 with newer version of dashboard metadata. so this feature only save server-side processing time but still need a roundtrip.

For explore_json requests:
We set header Cache-Control: public
If you use GET request for explore_json, query results will be cached in browser, and browser will not send requests to server-side until the cache expired (set by Expire header). This is current behavior (per #7032), this PR and #10963 will not change it.

cc @betodealmeida and @bkyryliuk since you are interested in this topic.

@graceguo-supercat graceguo-supercat merged commit 38dccf9 into apache:master Oct 6, 2020
graceguo-supercat pushed a commit to graceguo-supercat/superset that referenced this pull request Oct 8, 2020
graceguo-supercat pushed a commit that referenced this pull request Oct 8, 2020
@ktmud ktmud mentioned this pull request Oct 12, 2020
6 tasks
auxten pushed a commit to auxten/incubator-superset that referenced this pull request Nov 20, 2020
…hboards (apache#11137)

* fix: enable consistent etag across workers

* Use CACHE_DEFAULT_TIMEOUT instead of 0

* Change timeout to 0 and set expires header even for no-cache

* Reduce number of if branches to appease Pylint

* Fix mypy error
auxten pushed a commit to auxten/incubator-superset that referenced this pull request Nov 20, 2020
@mistercrunch mistercrunch added 🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels 🚢 1.0.0 labels Mar 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels size/M 🚢 1.0.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants