docs(caching): Restructure and improve caching docs #22687

Merged
merged 3 commits into from
Jan 13, 2023
60 changes: 41 additions & 19 deletions docs/docs/installation/cache.mdx
@@ -7,19 +7,46 @@ version: 1

## Caching

Superset uses [Flask-Caching](https://flask-caching.readthedocs.io/) for caching purposes. Configuring caching is as easy as providing a custom cache config in your
`superset_config.py` that complies with [the Flask-Caching specifications](https://flask-caching.readthedocs.io/en/latest/#configuring-flask-caching).
Flask-Caching supports various caching backends, including Redis, Memcached, SimpleCache (in-memory), or the
local filesystem. Custom cache backends are also supported. See [here](https://flask-caching.readthedocs.io/en/latest/#custom-cache-backends) for specifics.
The following cache configurations can be customized:
- Metadata cache (optional): `CACHE_CONFIG`
- Charting data queried from datasets (optional): `DATA_CACHE_CONFIG`
- SQL Lab query results (optional): `RESULTS_BACKEND`. See [Async Queries via Celery](/docs/installation/async-queries-celery) for details
Superset uses [Flask-Caching](https://flask-caching.readthedocs.io/) for caching purposes. Flask-Caching supports various caching backends, including Redis (recommended), Memcached, SimpleCache (in-memory), or the local filesystem. [Custom cache backends](https://flask-caching.readthedocs.io/en/latest/#custom-cache-backends) are also supported.

Caching can be configured by providing dictionaries in
`superset_config.py` that comply with [the Flask-Caching config specifications](https://flask-caching.readthedocs.io/en/latest/#configuring-flask-caching).

The following cache configurations can be customized in this way:
- Dashboard filter state (required): `FILTER_STATE_CACHE_CONFIG`
- Explore chart form data (required): `EXPLORE_FORM_DATA_CACHE_CONFIG`
- Metadata cache (optional): `CACHE_CONFIG`
- Charting data queried from datasets (optional): `DATA_CACHE_CONFIG`

For example, to configure the filter state cache using Redis:

```python
FILTER_STATE_CACHE_CONFIG = {
    'CACHE_TYPE': 'RedisCache',
    'CACHE_DEFAULT_TIMEOUT': 86400,
    'CACHE_KEY_PREFIX': 'superset_filter_cache',
    'CACHE_REDIS_URL': 'redis://localhost:6379/0'
}
```

For chart data, Superset goes up a “timeout search path”, from the chart's configuration
to the dataset’s, the database’s, then ultimately falls back to the global default
defined in `DATA_CACHE_CONFIG`.
Contributor Author

I really have no idea what this means and I wouldn't guess most first-time users would either. I maintained it because it was in the existing docs, but I think it's either in need of further clarification by someone who understands it or should be removed.

Member

@reidab what this is trying to convey is the fact that you can override the cache timeout in the chart, dataset and database (checked in that order), and if none of those is defined, it will use the default cache timeout defined in the cache config.

Contributor Author

@villebro Thanks! That helps to clarify it. I hadn't understood that this was referring to just the timeout, but it makes sense now. I rewrote it a bit to be more explicit.

Member

@reidab awesome, thanks for helping clarify this!
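
The lookup order described in this thread (chart, then dataset, then database, then the cache config default) can be sketched as follows. This is only an illustration under stated assumptions: `resolve_cache_timeout` is a hypothetical helper, not Superset's actual function, and an unset timeout is modeled here as `None`.

```python
from typing import Optional


def resolve_cache_timeout(
    chart_timeout: Optional[int],
    dataset_timeout: Optional[int],
    database_timeout: Optional[int],
    default_timeout: int,
) -> int:
    """Return the first timeout that is set, checking chart, dataset, database in order.

    Hypothetical helper for illustration; `None` stands for "not configured".
    """
    for candidate in (chart_timeout, dataset_timeout, database_timeout):
        if candidate is not None:
            return candidate
    # Nothing was overridden, so fall back to the global default
    # (e.g. CACHE_DEFAULT_TIMEOUT in DATA_CACHE_CONFIG).
    return default_timeout
```

For instance, with no chart-level override and a dataset-level timeout of 600 seconds, the dataset value would be used regardless of the database or global settings.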


### Dependencies

To use a dedicated cache store, additional Python libraries must be installed:

- Redis: we recommend the [redis](https://pypi.python.org/pypi/redis) Python package
- Memcached: we recommend the [pylibmc](https://pypi.org/project/pylibmc/) client library, as
`python-memcached` does not handle storing binary data correctly.

These libraries can be installed using pip.

### Fallback Metastore Cache

Note that some form of Filter State and Explore caching is required. If either of these caches is undefined, Superset falls back to using a built-in cache that stores data in the metadata database. While it is recommended to use a dedicated cache, the built-in cache can also be used to cache other data.

Please note, that Dashboard and Explore caching is required. If these caches are undefined, Superset falls back to using a built-in cache that stores data
in the metadata database. While it is recommended to use a dedicated cache, the built-in cache can also be used to cache other data.
For example, to use the built-in cache to store chart data, use the following config:

```python
@@ -30,17 +57,12 @@ DATA_CACHE_CONFIG = {
}
```

- Redis (recommended): we recommend the [redis](https://pypi.python.org/pypi/redis) Python package
- Memcached: we recommend using [pylibmc](https://pypi.org/project/pylibmc/) client library as
`python-memcached` does not handle storing binary data correctly.

Both of these libraries can be installed using pip.
### SQL Lab Query Results

For chart data, Superset goes up a “timeout search path”, from a slice's (chart's) configuration
to the dataset’s, the database’s, then ultimately falls back to the global default
defined in `DATA_CACHE_CONFIG`.
Caching for SQL Lab query results is used when async queries are enabled and is configured using `RESULTS_BACKEND`.

## Celery beat
Note that this setting does not take a Flask-Caching config dictionary; it requires a cachelib object instead.
See [Async Queries via Celery](/docs/installation/async-queries-celery) for details.
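
As a rough sketch of what a cachelib object looks like in `superset_config.py`, assuming a Redis server on `localhost:6379` and the `redis` package installed. The host, port, and key prefix below are placeholder values, not recommendations from this PR:

```python
# superset_config.py (illustrative fragment)
from cachelib.redis import RedisCache

# RESULTS_BACKEND takes a cachelib cache instance rather than a config dict.
# Connection details here are assumptions for a local setup.
RESULTS_BACKEND = RedisCache(
    host="localhost",
    port=6379,
    key_prefix="superset_results",
)
```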

### Caching Thumbnails
