Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce dashboard bootstrap payload #9284

Merged
merged 1 commit into from
Mar 13, 2020

Conversation

etr2460
Copy link
Member

@etr2460 etr2460 commented Mar 11, 2020

CATEGORY

Choose one

  • Bug Fix
  • Enhancement (new features, refinement)
  • Refactor
  • Add tests
  • Build / Development Environment
  • Documentation

SUMMARY

Currently, when loading a dashboard we send down the complete data payload for all datasources used within the dashboard. This is somewhat of a waste, because in the dashboard view the entire datasource isn't needed, only the pieces required to render the slices within the dashboard (as opposed to the explore view, where we need the entire datasource). When datasources have hundreds or thousands of columns or metrics, this can get out of hand quite quickly; we have a datasource that ends up being about 6MB of uncompressed JSON.

This PR adds a new function to the datasource model that filters the full data to only pieces required for a set of slices on a dashboard. This consists of:

  • Removing unneeded columns
  • Removing unneeded metrics
  • Removing unneeded columns and metrics from the verbose_map
  • Removing the datasource description

When determining what columns and metrics were required to render charts properly, I referenced the form_data definition from CONTRIBUTING.md to ensure that we included all the required columns and metrics for all the various charts.

TEST PLAN

New unit tests. Also load the world bank dashboard before and after the change. See it render the same way, but require significantly less data

Before:
Screen Shot 2020-03-11 at 11 44 56 AM
Screen Shot 2020-03-11 at 11 45 10 AM

After:
Screen Shot 2020-03-11 at 11 43 43 AM
Screen Shot 2020-03-11 at 11 44 10 AM

ADDITIONAL INFORMATION

  • Has associated issue:
  • Changes UI
  • Requires DB Migration.
  • Confirm DB Migration upgrade and downgrade tested.
  • Introduces new feature or API
  • Removes existing feature or API

REVIEWERS

to: @john-bodley @villebro @mistercrunch @kristw

@kristw
Copy link
Contributor

kristw commented Mar 11, 2020

I love this and fully support the idea, but will leave ppl with more python expertise to approve.

Copy link
Member

@villebro villebro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First pass comments. This is definitely a move in a good direction. I anticipate there will be regressions due to this, but they should be easy to fix, and in the grand scheme of things definitely worth doing.

Comment on lines 30 to 40
METRIC_FORM_DATA_PARAMS = [
"metric",
"timeseries_limit_metric",
"size",
"x",
"y",
"metric_2",
]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a few missing here that can be found in METRIC_KEYS in viz.py, namely percent_metrics and secondary_metrics.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@etr2460 this could be a good resource for the relevant params.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note that percent_metrics is referenced in the function below because it's an array of metrics and not an individual one. It looks like I missed secondary_metric though. I'll refactor so that those are included in a constant as well.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, my bad!

Comment on lines 39 to 50
COLUMN_FORM_DATA_PARAMS = [
"columns",
"all_columns",
"all_columns_x",
"groupby",
"order_by_cols",
]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two that I believe are missing here: series and entity.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added, good catch!

Comment on lines 233 to 248
@staticmethod
def get_metric_name_from_metric(
metric: Union[str, Dict[str, Any]]
) -> Optional[str]:
if isinstance(metric, str):
return metric

if isinstance(metric, dict) and metric.get("metric_name"):
return metric.get("metric_name")

return None
Copy link
Member

@villebro villebro Mar 11, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might be wrong here, but isn't this function already implemented in from superset.utils.core import get_metric_name?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't actually think so, that function gets the "label" field off a metric instead of the "metric_name" one. And I don't see the label param anywhere in the BaseMetric class, only the metric_name one

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I remember causing a regression a while back with similar reasoning as I did here. I didn't bother doing a deep dive into understanding how these differ from each other then, but it seems I need to, as I don't want to make the same mistake a third time. Is there a known reason why these metrics are different from the ones that appear in viz.py or sqla/models.py?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked further, and it seems like the metric field is a string if it's a metric_name, but an object with a label param if it's an adhoc_metric. Although I'm not sure if these labels would actually show up in the datasource, I'll add logic to handle them

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The weird thing is what you're describing sounds more or less exactly like the other metrics that get_metric_name deals with. We don't need to deal with that now as it's outside the scope of this PR, but we definitely need to figure this out in the long run.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's probably a confusion between "metrics" in form data and the "metric" model in Superset's backend. I think the "metrics" in form data are either metric.metric_name where metric is the metric model or an AdHocMetric in the JS frontend, which might not have any association with the metric model (i'm uncertain here).

We could probably improve this by standardizing on a single typedef/naming scheme and then migrating all the chart form_data, but this isn't an easy thing to do (and will only get harder once charts are more pluggable and we can't guarantee what type of data might be in form_data)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been digesting this today, and I still think something isn't right here. If you look at viz.py, it's full of references to metrics in form_data. Copy/paste from viz.py: utils.get_metric_names(self.form_data.get("metrics") or []). Are these really regular ad-hoc metrics that have been generated by Superset controls? I put a break point on viz.py with a viz with an ad-hoc metric, and that had the metric name in label, not metric_name. Is there a chance that we have different metric signature conventions in different parts of the codebase?

Comment on lines 30 to 40
METRIC_FORM_DATA_PARAMS = [
"metric",
"timeseries_limit_metric",
"size",
"x",
"y",
"metric_2",
]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@etr2460 this could be a good resource for the relevant params.

@@ -1788,25 +1788,40 @@ def dashboard(self, dashboard_id):
dash = qry.one_or_none()
if not dash:
abort(404)
datasources = set()
datasources = {}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we use a defaultdict(lambda: defaultdict(list)) ? This means the logic in lines 1795 - 1801 could be simplified.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i don't think this actually helps anything because we need to keep multiple fields in the dict of dicts, the datasource and a list of slices

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this works, does it?

        datasources = defaultdict(lambda: { "slices": [] })
        for slc in dash.slices:
            datasource = slc.datasource
            if slc:
                datasources[datasource.uid]["datasource"] = datasource
                datasources[datasource.uid]["slices"].append(slc)

Or you can use datasource itself as the key

        datasources = defaultdict(list)
        for slc in dash.slices:
            if slc.datasource:
                datasources[slc.datasource].append(scl)

would need to update future loops accordingly.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't believe the datasource can be the key as it's not hashable. I think your first solution will work though

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can probably create a hash function in the datasource class?

def ___hash__(self):
    return self.uid

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i've added a __hash__ and an __eq__ function as according to https://docs.python.org/3/glossary.html#term-hashable

@etr2460 etr2460 force-pushed the erik-ritter--less-bootstrap-data branch from a1694fd to 8903783 Compare March 11, 2020 21:44
Copy link
Member Author

@etr2460 etr2460 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

addressed comments

Comment on lines 30 to 40
METRIC_FORM_DATA_PARAMS = [
"metric",
"timeseries_limit_metric",
"size",
"x",
"y",
"metric_2",
]
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note that percent_metrics is referenced in the function below because it's an array of metrics and not an individual one. It looks like I missed secondary_metric though. I'll refactor so that those are included in a constant as well.

Comment on lines 233 to 248
@staticmethod
def get_metric_name_from_metric(
metric: Union[str, Dict[str, Any]]
) -> Optional[str]:
if isinstance(metric, str):
return metric

if isinstance(metric, dict) and metric.get("metric_name"):
return metric.get("metric_name")

return None
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't actually think so, that function gets the "label" field off a metric instead of the "metric_name" one. And I don't see the label param anywhere in the BaseMetric class, only the metric_name one

Comment on lines 39 to 50
COLUMN_FORM_DATA_PARAMS = [
"columns",
"all_columns",
"all_columns_x",
"groupby",
"order_by_cols",
]
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added, good catch!

@@ -1788,25 +1788,40 @@ def dashboard(self, dashboard_id):
dash = qry.one_or_none()
if not dash:
abort(404)
datasources = set()
datasources = {}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i don't think this actually helps anything because we need to keep multiple fields in the dict of dicts, the datasource and a list of slices

@etr2460 etr2460 force-pushed the erik-ritter--less-bootstrap-data branch from 8903783 to aef331a Compare March 11, 2020 22:12

# pull out all required metrics from the form_data
for param in METRICS_FORM_DATA_PARAMS:
if form_data.get(param):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line is probably not needed. for metric in form_data.get(param, []) will just loop through an empty loop, which is pretty safe.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need this just in case form_data[param] equals '' or 0 or something else not iterable

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

although I think we can do for metric in (form_data.get(param) or []) and that should do the trick. Will modify and remove the if statement

# Filter out unneeded fields from the datasource payload
for key, value in datasources.items():
datasource_data = value["datasource"].data_for_slices(value["slices"])
datasources[key] = datasource_data
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just

datasources[key] = value["datasource"].data_for_slices(value["slices"])

The line length is close.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, yup. that's definitely better. i had a refactor but didn't clean this up

@@ -1788,25 +1788,40 @@ def dashboard(self, dashboard_id):
dash = qry.one_or_none()
if not dash:
abort(404)
datasources = set()
datasources = {}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this works, does it?

        datasources = defaultdict(lambda: { "slices": [] })
        for slc in dash.slices:
            datasource = slc.datasource
            if slc:
                datasources[datasource.uid]["datasource"] = datasource
                datasources[datasource.uid]["slices"].append(slc)

Or you can use datasource itself as the key

        datasources = defaultdict(list)
        for slc in dash.slices:
            if slc.datasource:
                datasources[slc.datasource].append(scl)

would need to update future loops accordingly.

Comment on lines 263 to 279
for param in METRIC_FORM_DATA_PARAMS:
metric = form_data.get(param)
metric_names.add(utils.get_metric_name(metric))

# pull out all required columns from the form_data
if form_data.get("adhoc_filters"):
for filter_ in form_data.get("adhoc_filters", []):
if filter_["clause"] == "WHERE" and filter_.get("subject"):
column_names.add(filter_.get("subject"))

for param in COLUMNS_FORM_DATA_PARAMS:
if form_data.get(param):
for column in form_data.get(param, []):
column_names.add(column)

for param in COLUMN_FORM_DATA_PARAMS:
if form_data.get(param):
column_names.add(form_data.get(param))
Copy link
Member

@ktmud ktmud Mar 11, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe you can loop throw form_data keys and check whether it's in any of the consts instead.

        metric_names = []
        column_names = []

        for slc in slices:
            # pull out all required columns from the form_data
            for param, value in slc.form_data.items():
                if param == 'adhoc_filters':
                    column_names += [
                        filter_.get("subject") for filter_ in value
                        if filter_["clause"] == "WHERE" and filter_.get("subject")
                    ]
                elif param in METRICS_FORM_DATA_PARAMS:
                    metric_names += map(utils.get_metric_name, value)
                    # extra logic to handle adhoc metrics with column references
                    column_names += [
                        metric["column"].get("column_name")
                        for metric in value
                        if isinstance(metric, dict) and isinstance(metric["column"])
                    ]
                elif param in METRIC_FORM_DATA_PARAMS:
                    metric_names.append(utils.get_metric_name(value))
                elif param in COLUMNS_FORM_DATA_PARAMS:
                    column_names += value
                elif param in COLUMN_FORM_DATA_PARAMS:
                    column_names.append(value)

        metric_names = set(metric_names)
        column_names = set(column_names)

Btw the consts should probably be set().

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

form_data can have a large number of parameters in it, I think it's a bit more efficient (if a couple extra lines of code) to only iterate through the ones we care about.

However, I went through the loop again and cleaned up some unnecessary checks

@ktmud
Copy link
Member

ktmud commented Mar 11, 2020

Awesome work, @etr2460 ! Added some style nits.

@etr2460 etr2460 force-pushed the erik-ritter--less-bootstrap-data branch from aef331a to 4652853 Compare March 12, 2020 00:32
Copy link
Member Author

@etr2460 etr2460 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ktmud adressed comments

# Filter out unneeded fields from the datasource payload
for key, value in datasources.items():
datasource_data = value["datasource"].data_for_slices(value["slices"])
datasources[key] = datasource_data
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, yup. that's definitely better. i had a refactor but didn't clean this up

@@ -1788,25 +1788,40 @@ def dashboard(self, dashboard_id):
dash = qry.one_or_none()
if not dash:
abort(404)
datasources = set()
datasources = {}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't believe the datasource can be the key as it's not hashable. I think your first solution will work though


# pull out all required metrics from the form_data
for param in METRICS_FORM_DATA_PARAMS:
if form_data.get(param):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need this just in case form_data[param] equals '' or 0 or something else not iterable


# pull out all required metrics from the form_data
for param in METRICS_FORM_DATA_PARAMS:
if form_data.get(param):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

although I think we can do for metric in (form_data.get(param) or []) and that should do the trick. Will modify and remove the if statement

Comment on lines 263 to 279
for param in METRIC_FORM_DATA_PARAMS:
metric = form_data.get(param)
metric_names.add(utils.get_metric_name(metric))

# pull out all required columns from the form_data
if form_data.get("adhoc_filters"):
for filter_ in form_data.get("adhoc_filters", []):
if filter_["clause"] == "WHERE" and filter_.get("subject"):
column_names.add(filter_.get("subject"))

for param in COLUMNS_FORM_DATA_PARAMS:
if form_data.get(param):
for column in form_data.get(param, []):
column_names.add(column)

for param in COLUMN_FORM_DATA_PARAMS:
if form_data.get(param):
column_names.add(form_data.get(param))
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

form_data can have a large number of parameters in it, I think it's a bit more efficient (if a couple extra lines of code) to only iterate through the ones we care about.

However, I went through the loop again and cleaned up some unnecessary checks

@codecov-io
Copy link

codecov-io commented Mar 12, 2020

Codecov Report

Merging #9284 into master will not change coverage by %.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##           master    #9284   +/-   ##
=======================================
  Coverage   58.90%   58.90%           
=======================================
  Files         373      373           
  Lines       12026    12026           
  Branches     2953     2953           
=======================================
  Hits         7084     7084           
  Misses       4763     4763           
  Partials      179      179           

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 724b8a3...d3f734e. Read the comment docs.

@etr2460 etr2460 force-pushed the erik-ritter--less-bootstrap-data branch 2 times, most recently from 870a5b3 to 09a799a Compare March 12, 2020 17:58
@etr2460
Copy link
Member Author

etr2460 commented Mar 12, 2020

@john-bodley @villebro this is ready for another review


for param in METRIC_FORM_DATA_PARAMS:
metric = form_data.get(param)
metric_names.add(utils.get_metric_name(metric))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should there be similar logic for ad-hoc metrics here as well?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

possibly. i'll add it

note that this has been pretty difficult to test, and even if a column is missing for a chart, I think the only implication is that we might lose custom formatting or a custom name for a column. It shouldn't break the entire chart

Copy link
Member

@john-bodley john-bodley Mar 12, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@etr2460 for an internal project to deal with scalar (metric) and list (metrics) I created a utility method,

def get_iterable(x: Any) -> List:
   """
   Get an iterable (list) representation of the object.
   
   :param x: The object
   :returns: An iterable representation
   """
 
  return x if isinstance(x, list) else [x]

thus you could combine the METRICS_FORM_DATA_PARAMS and METRIC_FORM_DATA_PARAMS lists (DRY) and do something of the form,

for param in ["metrics", "metric", ...]:
    for metric in get_iterable(form_data.get(param)):
        ...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done!

metric_names.add(utils.get_metric_name(metric))

# extra logic to handle adhoc metrics with column references
if isinstance(metric, dict) and metric.get("column"):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we use the is_adhoc_metric method here to improve readability and adhering to the DRY principle?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oo good catch

verbose_map = {"__timestamp": "Time"}
verbose_map.update(
{
o["metric_name"]: o["verbose_name"] or o["metric_name"]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we use metric or m rather than o?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yup, this was copied from above, but agree a real name is better

)
verbose_map.update(
{
o["column_name"]: o["verbose_name"] or o["column_name"]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we use column or col rather than o?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yup, this was copied from above, but agree a real name is better


if config["ENABLE_ACCESS_REQUEST"]:
for datasource in datasources:
for datasource in datasources.keys():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit. Remove the .keys() as this is implicit when iterating over a dictionary.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

someday i'll learn python

@@ -1807,6 +1809,11 @@ def dashboard(self, dashboard_id):
"superset/request_access/?" f"dashboard_id={dash.id}&"
)

# Filter out unneeded fields from the datasource payload
datasources_payload = {}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we make this a dictionary comprehension,

datasources_payload = {
    datasource.uid: datasource.data_for_slices(slices) for datasource, slices in datasources.items()
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@etr2460 etr2460 force-pushed the erik-ritter--less-bootstrap-data branch from 09a799a to d3f734e Compare March 12, 2020 20:45
Copy link
Member Author

@etr2460 etr2460 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@john-bodley comments addressed

metric_names.add(utils.get_metric_name(metric))

# extra logic to handle adhoc metrics with column references
if isinstance(metric, dict) and metric.get("column"):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oo good catch


for param in METRIC_FORM_DATA_PARAMS:
metric = form_data.get(param)
metric_names.add(utils.get_metric_name(metric))
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

possibly. i'll add it

note that this has been pretty difficult to test, and even if a column is missing for a chart, I think the only implication is that we might lose custom formatting or a custom name for a column. It shouldn't break the entire chart

verbose_map = {"__timestamp": "Time"}
verbose_map.update(
{
o["metric_name"]: o["verbose_name"] or o["metric_name"]
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yup, this was copied from above, but agree a real name is better

)
verbose_map.update(
{
o["column_name"]: o["verbose_name"] or o["column_name"]
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yup, this was copied from above, but agree a real name is better


if config["ENABLE_ACCESS_REQUEST"]:
for datasource in datasources:
for datasource in datasources.keys():
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

someday i'll learn python

@@ -1807,6 +1809,11 @@ def dashboard(self, dashboard_id):
"superset/request_access/?" f"dashboard_id={dash.id}&"
)

# Filter out unneeded fields from the datasource payload
datasources_payload = {}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@john-bodley
Copy link
Member

@etr2460 it seems like you're tested this locally a bit and I know that @villebro has some questions/concerns but it would be good to potentially understand what type of regressions we may expect if say the current logic wasn't watertight given the various nuances of how entities are defined at the visualization and form-data level.

@etr2460
Copy link
Member Author

etr2460 commented Mar 12, 2020

So I took a look to see what code on the frontend depends on various properties on the datasource. You can see all the places charts depend on the datasource here: https://github.com/apache-superset/superset-ui-plugins/search?p=1&q=datasource&unscoped_q=datasource

From what I can see, the datasource is used primarily for two reasons:

  1. To add special number formatting to a metric as in: https://github.com/apache-superset/superset-ui-plugins/blob/17a5f64312326e708a81bab4a45578fd181b2e1e/packages/superset-ui-legacy-preset-chart-big-number/src/BigNumber/transformProps.js#L88-L92
  2. To format the name of a metric differently as in: https://github.com/apache-superset/superset-ui-plugins/blob/be604e24fd9c7ac715e448673c56bed40c5344ff/packages/superset-ui-legacy-plugin-chart-partition/src/transformProps.js#L43

All of the instances i've tested haven't broken any charts entirely, so I think the main source of regressions would be formatting regressions (e.g. rendering 0.03 instead of 3% or "metric_name_not_pretty" instead of "Pretty Metric Name"). I think that we could merge this now and deploy internally, being ready to revert fast if needed or patch with any minor issues that show up. Alternatively I could put this behind a feature flag and only enable it in our deployment to start if we thought that was safer. Up to @john-bodley and @villebro (and anyone else who has input)

@villebro
Copy link
Member

Like I said, this feature has obvious performance upside, but I expect some small regressions to come along with it. While they should be easy to fix, they could be annoying for anyone who prefers stability over performance. I really like the idea of putting this behind a feature flag for now, as it gives us the best of both worlds.

@etr2460 etr2460 force-pushed the erik-ritter--less-bootstrap-data branch from d3f734e to f8e3cb7 Compare March 13, 2020 01:02
@etr2460
Copy link
Member Author

etr2460 commented Mar 13, 2020

@villebro feature flag added!

Copy link
Member

@villebro villebro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! I'll be activating this flag on my devbox too, I can pitch in if I surface any weird bugs.

@john-bodley
Copy link
Member

@etr2460 could you address my comments before merging?

@etr2460 etr2460 force-pushed the erik-ritter--less-bootstrap-data branch from f8e3cb7 to db6ffff Compare March 13, 2020 17:57
@etr2460 etr2460 force-pushed the erik-ritter--less-bootstrap-data branch from db6ffff to 11fded8 Compare March 13, 2020 18:41
Copy link
Member

@john-bodley john-bodley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@etr2460 etr2460 merged commit f80fadf into apache:master Mar 13, 2020
etr2460 pushed a commit to airbnb/superset-fork that referenced this pull request Mar 13, 2020
@mistercrunch mistercrunch added 🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels 🚢 0.36.0 labels Feb 28, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels size/L 🚢 0.36.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants