Reduce dashboard bootstrap payload #9284
Conversation
I love this and fully support the idea, but will leave people with more Python expertise to approve.
First pass comments. This is definitely a move in a good direction. I anticipate there will be regressions due to this, but they should be easy to fix, and in the grand scheme of things definitely worth doing.
superset/connectors/base/models.py (outdated)
METRIC_FORM_DATA_PARAMS = [
    "metric",
    "timeseries_limit_metric",
    "size",
    "x",
    "y",
    "metric_2",
]
There are a few missing here that can be found in METRIC_KEYS in viz.py, namely percent_metrics and secondary_metrics.
note that percent_metrics is referenced in the function below because it's an array of metrics and not an individual one. It looks like I missed secondary_metric though. I'll refactor so that those are included in a constant as well.
You're right, my bad!
superset/connectors/base/models.py (outdated)
COLUMN_FORM_DATA_PARAMS = [
    "columns",
    "all_columns",
    "all_columns_x",
    "groupby",
    "order_by_cols",
]
Two that I believe are missing here: series and entity.
added, good catch!
superset/connectors/base/models.py (outdated)
@staticmethod
def get_metric_name_from_metric(
    metric: Union[str, Dict[str, Any]]
) -> Optional[str]:
    if isinstance(metric, str):
        return metric

    if isinstance(metric, dict) and metric.get("metric_name"):
        return metric.get("metric_name")

    return None
I might be wrong here, but isn't this function already implemented as get_metric_name in superset.utils.core?
I don't actually think so; that function gets the "label" field off a metric instead of the "metric_name" one. And I don't see the label param anywhere in the BaseMetric class, only the metric_name one.
I remember causing a regression a while back with similar reasoning as I did here. I didn't bother doing a deep dive into understanding how these differ from each other then, but it seems I need to, as I don't want to make the same mistake a third time. Is there a known reason why these metrics are different from the ones that appear in viz.py or sqla/models.py?
I looked further, and it seems like the metric field is a string if it's a metric_name, but an object with a label param if it's an adhoc_metric. Although I'm not sure if these labels would actually show up in the datasource, I'll add logic to handle them.
updated!
The weird thing is what you're describing sounds more or less exactly like the other metrics that get_metric_name deals with. We don't need to deal with that now as it's outside the scope of this PR, but we definitely need to figure this out in the long run.
I think it's probably a confusion between "metrics" in form data and the "metric" model in Superset's backend. I think the "metrics" in form data are either metric.metric_name, where metric is the metric model, or an AdHocMetric in the JS frontend, which might not have any association with the metric model (I'm uncertain here).
We could probably improve this by standardizing on a single typedef/naming scheme and then migrating all the chart form_data, but this isn't an easy thing to do (and will only get harder once charts are more pluggable and we can't guarantee what type of data might be in form_data).
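For concreteness, the two shapes being conflated look roughly like this (field names are a sketch based on the discussion above, not a definitive typedef):

# a saved metric: just the metric_name string
metric = "count_distinct_users"

# an ad-hoc metric: a dict built by the frontend controls
adhoc_metric = {
    "expressionType": "SIMPLE",
    "aggregate": "COUNT_DISTINCT",
    "column": {"column_name": "user_id"},
    "label": "COUNT_DISTINCT(user_id)",
}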
I've been digesting this today, and I still think something isn't right here. If you look at viz.py, it's full of references to metrics in form_data. Copy/paste from viz.py: utils.get_metric_names(self.form_data.get("metrics") or []). Are these really regular ad-hoc metrics that have been generated by Superset controls? I put a breakpoint on viz.py with a viz with an ad-hoc metric, and that had the metric name in label, not metric_name. Is there a chance that we have different metric signature conventions in different parts of the codebase?
superset/views/core.py (outdated)
@@ -1788,25 +1788,40 @@ def dashboard(self, dashboard_id):
     dash = qry.one_or_none()
     if not dash:
         abort(404)
-    datasources = set()
+    datasources = {}
Could we use a defaultdict(lambda: defaultdict(list))? This means the logic in lines 1795-1801 could be simplified.
I don't think this actually helps, because we need to keep multiple fields in the dict of dicts: the datasource and a list of slices.
I think this works, doesn't it?

datasources = defaultdict(lambda: {"slices": []})
for slc in dash.slices:
    datasource = slc.datasource
    if datasource:
        datasources[datasource.uid]["datasource"] = datasource
        datasources[datasource.uid]["slices"].append(slc)

Or you can use datasource itself as the key:

datasources = defaultdict(list)
for slc in dash.slices:
    if slc.datasource:
        datasources[slc.datasource].append(slc)

You would need to update subsequent loops accordingly.
I don't believe the datasource can be the key as it's not hashable. I think your first solution will work though
You can probably create a hash function in the datasource class?

def __hash__(self):
    return hash(self.uid)
I've added a __hash__ and an __eq__ function, per https://docs.python.org/3/glossary.html#term-hashable
addressed comments
superset/connectors/base/models.py (outdated)

# pull out all required metrics from the form_data
for param in METRICS_FORM_DATA_PARAMS:
    if form_data.get(param):
This line is probably not needed. for metric in form_data.get(param, []) will just iterate over an empty list, which is pretty safe.
I think we need this just in case form_data[param] equals '' or 0 or something else not iterable.
although I think we can do for metric in (form_data.get(param) or []) and that should do the trick. Will modify and remove the if statement.
superset/views/core.py (outdated)
# Filter out unneeded fields from the datasource payload
for key, value in datasources.items():
    datasource_data = value["datasource"].data_for_slices(value["slices"])
    datasources[key] = datasource_data
Why not just

datasources[key] = value["datasource"].data_for_slices(value["slices"])

The line length is close.
Ah, yup, that's definitely better. I had a refactor but didn't clean this up.
superset/connectors/base/models.py (outdated)
for param in METRIC_FORM_DATA_PARAMS:
    metric = form_data.get(param)
    metric_names.add(utils.get_metric_name(metric))

# pull out all required columns from the form_data
if form_data.get("adhoc_filters"):
    for filter_ in form_data.get("adhoc_filters", []):
        if filter_["clause"] == "WHERE" and filter_.get("subject"):
            column_names.add(filter_.get("subject"))

for param in COLUMNS_FORM_DATA_PARAMS:
    if form_data.get(param):
        for column in form_data.get(param, []):
            column_names.add(column)

for param in COLUMN_FORM_DATA_PARAMS:
    if form_data.get(param):
        column_names.add(form_data.get(param))
Maybe you can loop through form_data keys and check whether each is in any of the consts instead:

metric_names = []
column_names = []
for slc in slices:
    # pull out all required columns from the form_data
    for param, value in slc.form_data.items():
        if param == "adhoc_filters":
            column_names += [
                filter_.get("subject")
                for filter_ in value
                if filter_["clause"] == "WHERE" and filter_.get("subject")
            ]
        elif param in METRICS_FORM_DATA_PARAMS:
            metric_names += map(utils.get_metric_name, value)
            # extra logic to handle adhoc metrics with column references
            column_names += [
                metric["column"].get("column_name")
                for metric in value
                if isinstance(metric, dict) and isinstance(metric.get("column"), dict)
            ]
        elif param in METRIC_FORM_DATA_PARAMS:
            metric_names.append(utils.get_metric_name(value))
        elif param in COLUMNS_FORM_DATA_PARAMS:
            column_names += value
        elif param in COLUMN_FORM_DATA_PARAMS:
            column_names.append(value)
metric_names = set(metric_names)
column_names = set(column_names)

Btw, the consts should probably be set()s.
form_data can have a large number of parameters in it; I think it's a bit more efficient (if a couple extra lines of code) to only iterate through the ones we care about. However, I went through the loop again and cleaned up some unnecessary checks.
Awesome work, @etr2460! Added some style nits.
@ktmud addressed comments
Codecov Report

@@           Coverage Diff           @@
##           master    #9284   +/-  ##
=======================================
  Coverage   58.90%   58.90%
=======================================
  Files         373      373
  Lines       12026    12026
  Branches     2953     2953
=======================================
  Hits         7084     7084
  Misses       4763     4763
  Partials      179      179
=======================================

Continue to review the full report at Codecov.
@john-bodley @villebro this is ready for another review
superset/connectors/base/models.py (outdated)
for param in METRIC_FORM_DATA_PARAMS:
    metric = form_data.get(param)
    metric_names.add(utils.get_metric_name(metric))
Should there be similar logic for ad-hoc metrics here as well?
Possibly, I'll add it. Note that this has been pretty difficult to test, and even if a column is missing for a chart, I think the only implication is that we might lose custom formatting or a custom name for a column. It shouldn't break the entire chart.
@etr2460 for an internal project, to deal with scalar (metric) and list (metrics) I created a utility method:

def get_iterable(x: Any) -> List:
    """
    Get an iterable (list) representation of the object.

    :param x: The object
    :returns: An iterable representation
    """
    return x if isinstance(x, list) else [x]

Thus you could combine the METRICS_FORM_DATA_PARAMS and METRIC_FORM_DATA_PARAMS lists (DRY) and do something of the form:

for param in ["metrics", "metric", ...]:
    for metric in get_iterable(form_data.get(param)):
        ...
done!
superset/connectors/base/models.py (outdated)
metric_names.add(utils.get_metric_name(metric))

# extra logic to handle adhoc metrics with column references
if isinstance(metric, dict) and metric.get("column"):
Should we use the is_adhoc_metric method here to improve readability and adhere to the DRY principle?
oo good catch
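Sketched out, the check would read something like this (assuming is_adhoc_metric is importable from superset.utils.core, as the comment above suggests):

from superset.utils.core import is_adhoc_metric

if is_adhoc_metric(metric) and metric.get("column"):
    # an ad-hoc metric may reference a physical column that must be kept
    column_names.add(metric["column"].get("column_name"))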
superset/connectors/base/models.py (outdated)
verbose_map = {"__timestamp": "Time"}
verbose_map.update(
    {
        o["metric_name"]: o["verbose_name"] or o["metric_name"]
Could we use metric or m rather than o?
Yup, this was copied from above, but I agree a real name is better.
superset/connectors/base/models.py (outdated)
)
verbose_map.update(
    {
        o["column_name"]: o["verbose_name"] or o["column_name"]
Could we use column or col rather than o?
Yup, this was copied from above, but I agree a real name is better.
superset/views/core.py (outdated)
 if config["ENABLE_ACCESS_REQUEST"]:
-    for datasource in datasources:
+    for datasource in datasources.keys():
Nit: remove the .keys(), as it is implicit when iterating over a dictionary.
Someday I'll learn Python.
superset/views/core.py (outdated)

@@ -1807,6 +1809,11 @@ def dashboard(self, dashboard_id):
     "superset/request_access/?" f"dashboard_id={dash.id}&"
 )

+# Filter out unneeded fields from the datasource payload
+datasources_payload = {}
Could we make this a dictionary comprehension?

datasources_payload = {
    datasource.uid: datasource.data_for_slices(slices)
    for datasource, slices in datasources.items()
}
done
@john-bodley comments addressed
@etr2460 it seems like you've tested this locally a bit, and I know that @villebro has some questions/concerns, but it would be good to potentially understand what type of regressions we may expect if, say, the current logic isn't watertight, given the various nuances of how entities are defined at the visualization and form-data level.
So I took a look to see what code on the frontend depends on various properties on the datasource. You can see all the places charts depend on the datasource here: https://github.com/apache-superset/superset-ui-plugins/search?p=1&q=datasource&unscoped_q=datasource From what I can see, the datasource is used primarily for two reasons: formatting values (e.g. number and percent formats) and mapping metric/column names to their verbose names.

All of the instances I've tested haven't broken any charts entirely, so I think the main source of regressions would be formatting regressions (e.g. rendering 0.03 instead of 3% or "metric_name_not_pretty" instead of "Pretty Metric Name"). I think that we could merge this now and deploy internally, being ready to revert fast if needed or patch with any minor issues that show up. Alternatively, I could put this behind a feature flag and only enable it in our deployment to start if we thought that was safer. Up to @john-bodley and @villebro (and anyone else who has input).
Like I said, this feature has obvious performance upside, but I expect some small regressions to come along with it. While they should be easy to fix, they could be annoying for anyone who prefers stability over performance. I really like the idea of putting this behind a feature flag for now, as it gives us the best of both worlds.
@villebro feature flag added!
LGTM! I'll be activating this flag on my devbox too, I can pitch in if I surface any weird bugs.
@etr2460 could you address my comments before merging?
LGTM
SUMMARY
Currently, when loading a dashboard we send down the complete data payload for all datasources used within the dashboard. This is somewhat of a waste, because in the dashboard view the entire datasource isn't needed, only the pieces required to render the slices within the dashboard (as opposed to the explore view, where we need the entire datasource). When datasources have hundreds or thousands of columns or metrics, this can get out of hand quite quickly; we have a datasource that ends up being about 6MB of uncompressed JSON.
This PR adds a new function to the datasource model that filters the full data down to only the pieces required for a set of slices on a dashboard. This consists of: the metrics referenced by those slices' form_data, the columns referenced (including columns inside ad-hoc metrics and WHERE filters), and a verbose_map covering only those metrics and columns.
When determining what columns and metrics were required to render charts properly, I referenced the form_data definition from CONTRIBUTING.md to ensure that we included all the required columns and metrics for all the various charts.
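As a rough sketch of the shape of that filtering (illustrative only; the helper names here are hypothetical, not the PR's actual code):

from typing import Any, Dict, Set

def data_for_slices(self, slices) -> Dict[str, Any]:
    # start from the full datasource payload
    data = self.data
    metric_names: Set[str] = set()
    column_names: Set[str] = set()
    for slc in slices:
        # collect only the metrics/columns each slice's form_data references
        metric_names |= metrics_from_form_data(slc.form_data)   # hypothetical helper
        column_names |= columns_from_form_data(slc.form_data)   # hypothetical helper
    # drop every metric/column the dashboard's slices never touch
    data["metrics"] = [
        m for m in data["metrics"] if m["metric_name"] in metric_names
    ]
    data["columns"] = [
        c for c in data["columns"] if c["column_name"] in column_names
    ]
    return data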
TEST PLAN
New unit tests. Also load the World Bank dashboard before and after the change. See it render the same way, but require significantly less data.
Before / After: (screenshots comparing the bootstrap payload sizes; images not included)
REVIEWERS
to: @john-bodley @villebro @mistercrunch @kristw