Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Regression] ECharts rows count indicator pill is lying #26402

Closed
2 of 3 tasks
rumbin opened this issue Jan 3, 2024 · 7 comments
Closed
2 of 3 tasks

[Regression] ECharts rows count indicator pill is lying #26402

rumbin opened this issue Jan 3, 2024 · 7 comments

Comments

@rumbin
Copy link
Contributor

rumbin commented Jan 3, 2024

The rows count indicator pill is lying for (timeseries?) ECharts (Line Chart, Bar Chart, Area Chart,...) when grouping by dimensions.
In my understanding this is caused by the result set being pivoted across the chosen dimensions and counting the rows only after the pivot operation.

This bug is causing users to misjudge the influence of the applied row limit, as it conceals the fact that the row limit has been reached!

I labeled this issue as a regression, as I cannot remember having encountered this issue for, e.g. the old Time series line/bar charts. But I haven't checked, if this issue was introduced by ECharts or if it used to work there as well.

How to reproduce the bug

  1. Use a sufficiently large dataset, e.g., this MWE for Postgres/Snowflake:
with


---- Postgres:
-- spine as (
--     select
--         d.ts
--     from generate_series(
--         '2024-01-01'::timestamp 
--         , '2024-01-31'::timestamp
--         , '1 day'::interval
--     ) d(ts)
-- )

---- Snowflake:
spine as (
    select
        dateadd(day, seq4(), '2024-01-01') as ts
    from table(generator(rowcount => 31))
)


, vals as (
    select
        *
    from (values
        ('a', 1)
        , ('b', 2)
        , ('c', 3)
        , ('d', 4)
    ) v(series,val)
)

select
  *
from
  spine
  , vals
  1. Save the query as a virtual dataset
  2. Create a time series EChart, e.g. a Line Chart
  3. Use ts as x-axis, series as the dimension and avg(val) as the metric
  4. set ROW LIMIT to 100
  5. run the query

Expected results

The row indicator pill must display 100 rows on a red background, in order to indicate that we ran into the ROW LIMIT.

We can observe the expected behavior when switching to a Pivot Table chart:

image

Notice how the RESULTS table shows the data in un-pivoted representation.

Actual results

  • We observe gaps in some of the series and discover n/a values in the RESULTS table.
  • Notice that the rows count indicator claims that the result set consists of merely 31 rows, i.e. far away from the applied ROW LIMIT
  • Also notice how the RESULTS table shows the data in pivoted representation.

image

Environment

(please complete the following information):

  • browser type and version: [should not be relevant here]
  • superset version: 3.0.3rc2
  • python version: [should not be relevant here]
  • node.js version: [should not be relevant here]
  • any feature flags active:
FEATURE_FLAGS = {

    "THUMBNAILS": True,
    "THUMBNAILS_SQLA_LISTENERS": True, # does this flag still exist?
    "SQLLAB_BACKEND_PERSISTENCE": True,
    "ENABLE_TEMPLATE_PROCESSING": True,
    "DASHBOARD_CROSS_FILTERS": True,
    "ALLOW_ADHOC_SUBQUERY": True,
    "ALERT_REPORTS": True,
    "GENERIC_CHART_AXES": True, # default now?
     "CACHE_IMPERSONATION": True,
    "DRILL_TO_DETAIL": True,
    "DRILL_BY": True,
    "ENABLE_JAVASCRIPT_CONTROLS": True,

}

Checklist

Make sure to follow these steps before submitting your issue - thank you!

  • I have checked the superset logs for python stacktraces and included it here as text if there are any.
  • I have reproduced the issue with at least the latest released version of superset.
  • I have checked the issue tracker for the same issue and I haven't found one similar.

Additional context

Somewhat related issues:

@rumbin
Copy link
Contributor Author

rumbin commented Jan 3, 2024

Just checked: Superset 2.0.1 was also affected by this bug.

@rumbin
Copy link
Contributor Author

rumbin commented Jan 17, 2024

Is there really nobody feeling like this is a big issue regarding trustworthiness of Superset-generated charts?

@rusackas , @villebro, I hate to tag you here, but I have a feeling this is under your radar.

@yousoph
Copy link
Member

yousoph commented Feb 5, 2024

In your example under Actual Results, are you saying that 31 should also be highlighted red? If so, I agree

@rumbin
Copy link
Contributor Author

rumbin commented Feb 6, 2024

@yousoph, I don't know what's the best way to deal with it.

  • Either keep the pivoted rows count (31 in my example) and colour the pill red, since the query ran into the row limit,
  • Or display the original number of rows returned by the query (100) and colour the pill red.

Both can be somewhat confusing to the user, as the pivoting happens without the user explicitly asking for it. Tho the user may be confused that the limit is set to 100 and the indicator warms that we ran into the limit with only 31 rows.
Or the user may be confused that only 31 rows are displayed in the result set when the pill indicates 100 rows returned.

Anyway, I think technically the 2nd option is cleaner and less (often) confusing.

@rusackas
Copy link
Member

rusackas commented Feb 6, 2024

@rumbin would you want to join our next Superset Town Hall and discuss this? I think there are a lot of intricacies to it... it might be a bit much to hash out in typed form.

I like the as-is row count limit (query results) being displayed and triggering the red limit warning. This is partly because the limit you're encountering is based on the chart executing it's logic. If you change charts, you're still subject to these limitations. The row limit is indeed being hit, so it's not fundamentally wrong.

Also, in certain circumstances, if you set the limit to be 100 pivoted rows, that could be some gigantic number of database rows, which would cause all sorts of problems. But... it's a good thing to know, and a reasonable thing to have its own limit control for. I think having a limit for things lie this (number of pivoted rows, number of bars on a bar chart, number of clusters on a map) is useful, and should probably have a means to trigger a separate limit warning alongside the row limit one.

So... I think there's a lot of design discussion to go into here. Which might turn into a non-trivial engineering project. I'm not sure who has the bandwidth to take that on in their roadmap if we align on the design, but its sounds like a SIP :)

@yousoph
Copy link
Member

yousoph commented Apr 1, 2024

#27700

@rusackas
Copy link
Member

Closing this as resolved by the linked PR. Holler if it needs any further attention.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants