feat: implement cache invalidation api #10761

Merged: 6 commits, Sep 15, 2020

Member

@bkyryliuk bkyryliuk commented Sep 2, 2020

SUMMARY

Implements a way to invalidate cached data for the provided datasources.

TEST PLAN

  • unit tests
  • local testing

ADDITIONAL INFORMATION

  • Has associated issue:
  • Changes UI
  • Requires DB Migration.
  • Confirm DB Migration upgrade and downgrade tested.
  • Introduces new feature or API
  • Removes existing feature or API

@bkyryliuk bkyryliuk marked this pull request as draft September 2, 2020 05:05
superset/cache/api.py (outdated, resolved)
    return self.response_400(message="Request is incorrect")
except ValidationError as error:
    return self.response_400(
        message=_("Request is incorrect: %(error)s", error=error.messages)
Member

nit: Just use: return self.response_400(message=error.messages)

@bkyryliuk bkyryliuk force-pushed the bogdan/cache_endpoints branch 2 times, most recently from 210cffd to 2028348 Compare September 3, 2020 04:49
codecov-commenter commented Sep 3, 2020

Codecov Report

Merging #10761 into master will decrease coverage by 0.32%.
The diff coverage is 96.77%.

@@            Coverage Diff             @@
##           master   #10761      +/-   ##
==========================================
- Coverage   61.32%   60.99%   -0.33%     
==========================================
  Files         803      806       +3     
  Lines       37927    38002      +75     
  Branches     3561     3561              
==========================================
- Hits        23258    23180      -78     
- Misses      14483    14636     +153     
  Partials      186      186              
| Flag | Coverage Δ |
|---|---|
| #javascript | 61.59% <ø> (+<0.01%) ⬆️ |
| #python | 60.65% <96.77%> (-0.52%) ⬇️ |

Flags with carried forward coverage won't be shown.

| Impacted Files | Coverage Δ |
|---|---|
| superset/cachekeys/api.py | 95.91% <95.91%> (ø) |
| superset/app.py | 80.30% <100.00%> (+0.15%) ⬆️ |
| superset/cachekeys/schemas.py | 100.00% <100.00%> (ø) |
| superset/charts/schemas.py | 100.00% <100.00%> (ø) |
| superset/db_engines/hive.py | 0.00% <0.00%> (-85.72%) ⬇️ |
| superset/db_engine_specs/hive.py | 53.90% <0.00%> (-30.08%) ⬇️ |
| superset/db_engine_specs/presto.py | 70.85% <0.00%> (-10.77%) ⬇️ |
| superset/connectors/base/views.py | 71.42% <0.00%> (-3.58%) ⬇️ |
| superset/examples/world_bank.py | 97.10% <0.00%> (-2.90%) ⬇️ |
| superset/examples/birth_names.py | 97.36% <0.00%> (-2.64%) ⬇️ |
| ... and 15 more | |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8a9ae81...115de31. Read the comment docs.

@bkyryliuk bkyryliuk force-pushed the bogdan/cache_endpoints branch 9 times, most recently from 942ede8 to f119c85 Compare September 4, 2020 16:33
@@ -76,6 +76,14 @@ def GET_FEATURE_FLAGS_FUNC(ff):
REDIS_PORT = os.environ.get("REDIS_PORT", "6379")
REDIS_CELERY_DB = os.environ.get("REDIS_CELERY_DB", 2)
REDIS_RESULTS_DB = os.environ.get("REDIS_RESULTS_DB", 3)
REDIS_CACHE_DB = os.environ.get("REDIS_CACHE_DB", 4)

CACHE_CONFIG = {
Member Author

needed for cache to work on CI
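
For readers unfamiliar with the setup, here is a rough sketch of what a Redis-backed cache config built from these environment variables might look like. The contents of the PR's actual CACHE_CONFIG are not shown above, so the values below are illustrative assumptions only: REDIS_HOST, the timeout, the key prefix, and the helper name build_cache_config are all hypothetical. The dictionary keys are standard Flask-Caching options.

```python
import os


def build_cache_config(env=os.environ):
    """Build a Flask-Caching style Redis config from environment variables.

    Hypothetical helper: the real CACHE_CONFIG in the PR may differ.
    """
    redis_host = env.get("REDIS_HOST", "localhost")  # assumed variable name
    # Environment values arrive as strings, so normalise them to int.
    redis_port = int(env.get("REDIS_PORT", "6379"))
    redis_cache_db = int(env.get("REDIS_CACHE_DB", 4))
    return {
        "CACHE_TYPE": "redis",
        "CACHE_DEFAULT_TIMEOUT": 60 * 60 * 24,  # illustrative: one day
        "CACHE_KEY_PREFIX": "superset_cache",  # illustrative prefix
        "CACHE_REDIS_URL": f"redis://{redis_host}:{redis_port}/{redis_cache_db}",
    }
```

Passing a plain dict instead of os.environ makes the helper easy to exercise in tests without touching the real environment.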

@bkyryliuk bkyryliuk marked this pull request as ready for review September 4, 2020 17:24
@bkyryliuk
Member Author

@willbarrett , @dpgaspar - ready for review

@bkyryliuk bkyryliuk changed the title from "feat: [WIP] implement cache invalidation api" to "feat: implement cache invalidation api" Sep 4, 2020
@@ -0,0 +1,98 @@
# Licensed to the Apache Software Foundation (ASF) under one
Member

I think the route /api/v1/cachekey/invalidate is not ideal. I'm struggling to find something good though

Member Author

agree, I had the same struggles, definitely open to suggestions.
/api/v1/cache/invalidate could be slightly better

Member

How about DELETE /api/v1/cachekey/<cachekey>?

Member Author

it would be too much of a struggle to fetch cache keys through the API and then issue deletes for those.
I think we could expose GET / DELETE later once those use cases arise

@statsd_metrics
def invalidate(self) -> Response:
"""
Takes a list of datasources, finds the associated cache records and
Member

Side note (outside the scope of this PR), we'll be renaming datasource to dataset. I think it's mostly done in the UI layer, but has yet to be done in the code.

Member Author

good catch, I can do it in a separate PR if that works for you; I would rename it in the API & CacheKey model for consistency.

@bkyryliuk bkyryliuk force-pushed the bogdan/cache_endpoints branch 2 times, most recently from 4bfc2d3 to 49e0eea Compare September 10, 2020 15:07
Member

@dpgaspar dpgaspar left a comment

Left a couple of comments


class CacheRestApi(BaseSupersetModelRestApi):
    datamodel = SQLAInterface(CacheKey)
    resource_name = "cache"
Member

nit: since this was renamed to cache, evaluate if placing everything on cachekeys still makes sense

Member Author

renamed to cachekey; I am happy naming it either cachekey or cache.
I was thinking about moving it into a class separate from the cache key, but most of the instrumentation lives in BaseSupersetModelRestApi, which requires a model.

    .all()
)
if cache_keys:
    cache_manager.cache.delete_many(*[c.cache_key for c in cache_keys])
Member

A couple of questions:

  • if a cache delete_many fails, we may end up with keys deleted from the cache that still exist on CacheKey. Will the next run of delete_many fail if it can't find a key?
  • the same applies if deleting the CacheKey records fails; it would also be safer to try/except and rollback

Member Author

delete_many doesn't fail, it only returns true / false depending on whether the keys were deleted, e.g. if a key was not found it will return false. Updated the CacheKey logic to delete in a single statement & roll back if it fails -> that would be more robust.

deleting the cache & keeping the records is not too bad, as there is still a need to garbage collect those and the cache is not a persistent store.
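
The ordering discussed in this thread can be sketched roughly as follows: drop the cached values first, then remove the bookkeeping rows in a single statement, rolling back if that statement fails. This is a self-contained illustration, not the PR's code. It uses the standard-library sqlite3 module instead of SQLAlchemy, and FakeCache, the cache_keys table layout, and the invalidate function are hypothetical stand-ins. FakeCache.delete_many only mirrors the behaviour described in the comment above (no exception, False when a key is missing).

```python
import sqlite3


class FakeCache:
    """In-memory stand-in for a cache backend (hypothetical)."""

    def __init__(self, data):
        self.data = dict(data)

    def delete_many(self, *keys):
        # Never raises; reports False if at least one key was missing.
        all_found = True
        for key in keys:
            if key in self.data:
                del self.data[key]
            else:
                all_found = False
        return all_found


def invalidate(conn, cache, datasource_uids):
    """Delete cached values, then the matching rows in one statement."""
    uids = list(datasource_uids)
    if not uids:
        return 0
    placeholders = ",".join("?" for _ in uids)
    rows = conn.execute(
        f"SELECT cache_key FROM cache_keys WHERE datasource_uid IN ({placeholders})",
        uids,
    ).fetchall()
    cache_keys = [row[0] for row in rows]
    if not cache_keys:
        return 0
    # A missing key is not an error: the cache is not a persistent store.
    cache.delete_many(*cache_keys)
    try:
        conn.execute(
            f"DELETE FROM cache_keys WHERE datasource_uid IN ({placeholders})",
            uids,
        )
        conn.commit()
    except sqlite3.Error:
        # Keep the rows so a later run can retry the invalidation.
        conn.rollback()
        raise
    return len(cache_keys)
```

If the delete statement fails, the rows survive while the cached values are already gone, which matches the trade-off described above: stale bookkeeping records are harmless and can be garbage collected later.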

superset/cachekeys/schemas.py (outdated, resolved)
@@ -69,6 +71,39 @@ def get_resp(
return resp.data.decode("utf-8")


def post_assert_metric(
Member

Is this the same code, just in a different place?

Member Author

yes same code, moved it into this function for reuse in the pytests

tests/cachekeys/api_tests.py (resolved)
@bkyryliuk bkyryliuk force-pushed the bogdan/cache_endpoints branch 2 times, most recently from 115de31 to fc4f9d2 Compare September 11, 2020 16:08
@bkyryliuk bkyryliuk force-pushed the bogdan/cache_endpoints branch 2 times, most recently from 682007a to ecb9672 Compare September 11, 2020 16:37
Member

@dpgaspar dpgaspar left a comment

Do you want all the ModelRestApi route methods to be available? If not, include include_route_methods = {"invalidate",}. Also note this will create a new specific permission, "can invalidate on Cache key".
Check https://localhost:8080/swagger/v1

@bkyryliuk
Member Author

> Do you want all the ModelRestApi route methods to be available? If not include include_route_methods = {"invalidate",} also note this will create a new specific permission can invalidate on Cache key.
> Check https://localhost:8080/swagger/v1

addressed, thanks

@bkyryliuk
Member Author

image

@bkyryliuk bkyryliuk merged commit 9c420d6 into apache:master Sep 15, 2020
auxten pushed a commit to auxten/incubator-superset that referenced this pull request Nov 20, 2020
* Add cache endpoints

* Implement cache endpoint

* Tests and address feedback

* Set cache config

* Address feedback

* Expose only invalidate endpoint

Co-authored-by: bogdan kyryliuk <bogdankyryliuk@dropbox.com>
@benceorlai

Roadmap: apache-superset/superset-roadmap#74


@expose("/invalidate", methods=["POST"])
@event_logger.log_this
@protect()
Member

@ktmud ktmud Jan 23, 2021

It seems this API is not actually protected; it's probably missing a @permission_name("edit"). Anyone can trigger cache key invalidation with a cURL command:

curl -X POST --data '{"datasource_uids": ["3__table"]}' --header "Content-Type: application/json" "http://localhost:8088/api/v1/cachekey/invalidate"

Is this by design?

cc @bkyryliuk
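
For scripting the same call, a rough Python equivalent of the cURL command above might look like this. It only builds the request and leaves sending to the caller. The URL, route, and payload shape are taken from the command above; the helper name build_invalidate_request is hypothetical, and no authentication header is added because the comment's point is that none appears to be required.

```python
import json
import urllib.request


def build_invalidate_request(base_url, datasource_uids):
    """Build (but do not send) a POST to the cache invalidation endpoint."""
    payload = json.dumps({"datasource_uids": list(datasource_uids)}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/v1/cachekey/invalidate",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_invalidate_request("http://localhost:8088", ["3__table"])
# To actually send it against a running instance:
#     urllib.request.urlopen(request)
```

Sending this without credentials is one way to check whether the endpoint still accepts anonymous callers on a given deployment.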

@mistercrunch mistercrunch added the 🏷️ bot and 🚢 0.38.0 labels Mar 12, 2024
Labels: 🏷️ bot, size/L, 🚢 0.38.0
8 participants