fix: report with timeout chart #16674

graceguo-supercat · 2021-09-11T22:51:26Z

SUMMARY

A user may schedule an email report for a dashboard at a future time. When the webdriver takes a screenshot of the dashboard, one or more of its charts may not be well cached and may take a long time to query. The current webdriver wait timeout is 60 seconds. If one of the charts has not rendered by the timeout, Superset currently sends an email with the following error message and no screenshot of the dashboard.

Error: Report Schedule execution failed when generating a screenshot.

For most dashboard report users, a report with a couple of timed-out charts (showing a spinner) is still acceptable, and much better than an error message with no report. Could we still send the report with the timed-out charts? @dpgaspar @betodealmeida @eschutho

TESTING INSTRUCTIONS

CI and manual test

codecov · 2021-09-11T23:08:35Z

Codecov Report

Merging #16674 (45c94ee) into master (4dc859f) will decrease coverage by 0.00%.
The diff coverage is 0.00%.

@@            Coverage Diff             @@
##           master   #16674      +/-   ##
==========================================
- Coverage   76.90%   76.90%   -0.01%     
==========================================
  Files        1005     1005              
  Lines       54007    54008       +1     
  Branches     7337     7337              
==========================================
  Hits        41536    41536              
- Misses      12231    12232       +1     
  Partials      240      240

Flag	Coverage Δ
hive	`81.19% <0.00%> (-0.01%)`	⬇️
mysql	`81.65% <0.00%> (-0.01%)`	⬇️
postgres	`81.68% <0.00%> (-0.01%)`	⬇️
presto	`81.47% <0.00%> (-0.05%)`	⬇️
python	`82.18% <0.00%> (-0.01%)`	⬇️
sqlite	`81.33% <0.00%> (-0.01%)`	⬇️

Impacted Files	Coverage Δ
superset/utils/webdriver.py	`80.89% <0.00%> (-0.92%)`	⬇️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

eschutho · 2021-09-13T20:27:10Z

@graceguo-supercat we're actually looking into getting more up-to-date data on the charts by always refreshing the data and not using cached results. Given this, I wonder if we could set a cache timeout on reports of something like 30 minutes and, instead of erroring after 60 seconds, either lengthen the timeout or retry the dashboard after an error. It seems like we should deliver all of the charts to the users, and if we have to keep trying until the charts are all loaded, that should be ok, since it's an async operation anyway. WDYT?

graceguo-supercat · 2021-09-13T22:38:52Z

I like the idea of invalidating the cache for reports.
It should be possible to set the report cache and the dashboard cache separately.
I am not sure a retry can solve this problem. Let's say I have a slow query that always takes ~100 seconds to run. At report time, the webdriver triggers the query and gets a timeout exception after 60 seconds. Is the dashboard report then held for a retry? And after the webdriver triggers the second try, it cannot pick up the query results that the first try produces at 100 seconds, right? So when is the best time to trigger the next try?
Currently we have a 60-second timeout. Can we have an extra config option to allow a longer wait? For a report, in my opinion, the queries should be verified and optimized so that no chart/query runs longer than a limit. 60 seconds seems a little tight; maybe 120 seconds is good enough (without retry)?

john-bodley · 2021-09-13T22:59:21Z

@eschutho regardless of the timeout strategy are you supportive of at least capturing the screenshot per the PR?

betodealmeida · 2021-09-13T23:15:04Z

I think this PR is an improvement, regardless of the strategy we choose.

dpgaspar · 2021-09-14T10:40:42Z

superset/utils/webdriver.py

@@ -141,6 +141,7 @@ def get_screenshot(
            img = element.screenshot_as_png
        except TimeoutException:
            logger.warning("Selenium timed out requesting url %s", url, exc_info=True)
+            img = element.screenshot_as_png


Is it always possible to call element.screenshot_as_png without raising after a TimeoutException occurs?

graceguo-supercat · 2021-09-15

I guess it is the webdriver that raises TimeoutException, right? I am not sure how to avoid raising it...
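The question above can be illustrated with a minimal, standard-library-only sketch. It is not the Superset implementation: `FakeElement`, `wait_for_charts`, and `capture` are hypothetical names, and the two exception classes are stand-ins that mirror the Selenium hierarchy (where `TimeoutException` subclasses `WebDriverException`). The sketch assumes two ways the fallback screenshot could itself fail: the element was never located, or the element raises when captured.

```python
import logging
from typing import Optional

logger = logging.getLogger(__name__)


class WebDriverException(Exception):
    """Stand-in for selenium.common.exceptions.WebDriverException."""


class TimeoutException(WebDriverException):
    """Stand-in for selenium.common.exceptions.TimeoutException."""


class FakeElement:
    """Hypothetical stand-in for a Selenium WebElement."""

    def __init__(self, png: bytes, broken: bool = False) -> None:
        self._png = png
        self._broken = broken

    @property
    def screenshot_as_png(self) -> bytes:
        if self._broken:
            raise WebDriverException("element can no longer be captured")
        return self._png


def wait_for_charts(charts_loaded: bool) -> None:
    """Pretend to wait for every chart; time out if any is still loading."""
    if not charts_loaded:
        raise TimeoutException("charts did not render in time")


def capture(element: Optional[FakeElement], charts_loaded: bool) -> Optional[bytes]:
    """Return PNG bytes, falling back to a partial capture on timeout."""
    try:
        if element is None:
            # Assumption: the timeout fired before the element was located.
            raise TimeoutException("dashboard element was never located")
        wait_for_charts(charts_loaded)
        return element.screenshot_as_png
    except TimeoutException:
        logger.warning("Timed out waiting for charts; capturing anyway")
        if element is None:
            return None  # nothing to capture
        try:
            # Charts that are still loading show up as spinners in this image.
            return element.screenshot_as_png
        except WebDriverException:
            logger.exception("Fallback screenshot failed as well")
            return None
```

In this sketch a timeout with a usable element still yields an image, while the two failure cases return `None` so the caller can decide whether to send the error email instead.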

dpgaspar · 2021-09-14T10:48:26Z

> I like the idea of invalidating the cache for reports.
>
> It should be possible to set the report cache and the dashboard cache separately.

+1

> I am not sure a retry can solve this problem. Let's say I have a slow query that always takes ~100 seconds to run. At report time, the webdriver triggers the query and gets a timeout exception after 60 seconds. Is the dashboard report then held for a retry? And after the webdriver triggers the second try, it cannot pick up the query results that the first try produces at 100 seconds, right? So when is the best time to trigger the next try?
>
> Currently we have a 60-second timeout. Can we have an extra config option to allow a longer wait? For a report, in my opinion, the queries should be verified and optimized so that no chart/query runs longer than a limit. 60 seconds seems a little tight; maybe 120 seconds is good enough (without retry)?

You mean besides SCREENSHOT_LOCATE_WAIT and SCREENSHOT_LOAD_WAIT?

eschutho · 2021-09-14T18:00:03Z

> @eschutho regardless of the timeout strategy are you supportive of at least capturing the screenshot per the PR?

Yes, definitely. This is a good win for now.

graceguo-supercat · 2021-09-14T18:22:02Z

> You mean besides SCREENSHOT_LOCATE_WAIT and SCREENSHOT_LOAD_WAIT?

That's what I need! I didn't realize these wait times were already configurable. Thanks!
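As a rough illustration of the two settings named above, a `superset_config.py` override might look like the sketch below. The values are illustrative only, and the comments describe how the settings are understood from this thread (a wait for the element to appear, then a wait for charts to finish loading); check them against the `config.py` of your Superset version.

```python
# superset_config.py (sketch; values are illustrative, not recommendations)
from datetime import timedelta

# Seconds to wait for the dashboard or chart element to appear in the page.
SCREENSHOT_LOCATE_WAIT = int(timedelta(seconds=10).total_seconds())

# Seconds to wait for charts to finish loading before the wait times out.
# Raised here to the 120 seconds suggested earlier in the conversation.
SCREENSHOT_LOAD_WAIT = int(timedelta(minutes=2).total_seconds())
```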

eschutho · 2021-09-15T00:18:55Z

> I am not sure a retry can solve this problem. Let's say I have a slow query that always takes ~100 seconds to run. At report time, the webdriver triggers the query and gets a timeout exception after 60 seconds. Is the dashboard report then held for a retry? And after the webdriver triggers the second try, it cannot pick up the query results that the first try produces at 100 seconds, right? So when is the best time to trigger the next try?

Is 100 seconds with the data cached or uncached? If we set a configurable report cache buffer, for example, and your longest query takes 30 seconds cached and 100 seconds uncached, and your total waits were 1 minute, you could set the buffer for half an hour or so and the first time it loads it will fail and retry and be mostly cached. You should keep getting lower times until it is all cached and you come in under the 60-second mark.

Basically I think cache-busting the reports is good but within configurable limits, depending on the needs of the organization.
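The retry scenario discussed here (100 seconds uncached, 30 seconds cached, 60-second timeout) can be worked through with a toy model. This is only a sketch under stated assumptions, not Superset behavior: it assumes the timed-out query keeps running in the backend and warms the cache when it finishes, and that a retry started before then runs uncached again. `QueryProfile`, `load_time`, and `first_successful_attempt` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class QueryProfile:
    uncached_seconds: float
    cached_seconds: float


def load_time(profile: QueryProfile, start: float, cache_ready_at: Optional[float]) -> float:
    """Seconds one attempt needs, given when (if ever) the cache became warm."""
    if cache_ready_at is not None and start >= cache_ready_at:
        return profile.cached_seconds
    return profile.uncached_seconds


def first_successful_attempt(
    profile: QueryProfile, timeout: float, retry_delay: float, max_attempts: int
) -> Optional[Tuple[int, float]]:
    """Return (attempt number, start time) of the first attempt within the timeout."""
    cache_ready_at: Optional[float] = None
    start = 0.0
    for attempt in range(1, max_attempts + 1):
        needed = load_time(profile, start, cache_ready_at)
        if needed <= timeout:
            return attempt, start
        # Assumption: the first timed-out query keeps running and warms the cache.
        if cache_ready_at is None:
            cache_ready_at = start + needed
        # The next attempt begins after the timeout plus any extra delay.
        start += timeout + retry_delay
    return None


slow = QueryProfile(uncached_seconds=100, cached_seconds=30)
# Retrying immediately wastes one attempt: the cache is not warm at t=60.
print(first_successful_attempt(slow, timeout=60, retry_delay=0, max_attempts=3))
# Waiting 40 more seconds lets the second attempt hit the warm cache at t=100.
print(first_successful_attempt(slow, timeout=60, retry_delay=40, max_attempts=3))
```

Under these assumptions the answer to "when is the best time to trigger the next try" depends on when the first query finishes, which is why the retry timing and the cache buffer have to be tuned together.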

graceguo-supercat · 2021-09-15T00:42:27Z

> you could set the buffer for half an hour or so and the first time it loads it will fail and retry and be mostly cached.

When does the retry happen? Right after the first failure, or after the cache expires? If I schedule the report for 7:00 and the first try fails, do I get an email at 7:00? I think we should probably discuss cache/retry in its own PR or thread.

This PR is to make sure the report is sent out even when a chart times out.

eschutho · 2021-09-15T17:00:07Z

> you could set the buffer for half an hour or so and the first time it loads it will fail and retry and be mostly cached.
>
> When does the retry happen? Right after the first failure, or after the cache expires? If I schedule the report for 7:00 and the first try fails, do I get an email at 7:00? I think we should probably discuss cache/retry in its own PR or thread.
>
> This PR is to make sure the report is sent out even when a chart times out.

Yes, that discussion isn't related to your PR. We can continue in a different thread! 👍

fix: report with timeout chart

45c94ee

pull-request-size bot added the size/XS label Sep 11, 2021

graceguo-supercat requested review from betodealmeida and dpgaspar September 13, 2021 18:44

dpgaspar reviewed Sep 14, 2021

john-bodley approved these changes Sep 15, 2021

graceguo-supercat merged commit 00ca21e into apache:master Sep 15, 2021

opus-42 pushed a commit to opus-42/incubator-superset that referenced this pull request Nov 14, 2021

fix: report with timeout chart (apache#16674)

832b5c5

QAlexBall pushed a commit to QAlexBall/superset that referenced this pull request Dec 28, 2021

fix: report with timeout chart (apache#16674)

ab764bf

mistercrunch added the 🏷️ bot and 🚢 1.4.0 labels Mar 13, 2024
