Add separate limit setting for SqlLab #4941

Merged (3 commits) on Nov 7, 2018

Conversation

jeffreythewang
Contributor

Summary

  • Add functionality to wrap SQL Lab queries with a limit

Description

A common problem: people enter a query in SqlLab to preview their data before hitting Visualize to create a table, and the query takes forever to run. This is painful for them and potentially for the database.

Currently, one workaround is to manually enter a limit in the query editor, visualize (which creates the datasource), and later remove the limit from the base query by editing the created datasource. With this in mind, an additional or alternative solution might be to allow users to edit the sub-query in the visualize modal before creating their datasource.

So here's a way to have a limit only in the context of the SQL editor. The limit is applied only as a wrapper around SELECT queries and is not saved in any persistent database.
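To make the wrapping idea concrete, here is a minimal Python sketch. The helper name `wrap_with_limit` and its handling of semicolons are assumptions made for illustration and are not the code added in this PR; the sketch only shows how a limit can be layered on top of a SELECT statement without touching the statement itself.

```python
def wrap_with_limit(sql: str, limit: int) -> str:
    """Wrap a SELECT statement in an outer query that caps the row count.

    Hypothetical helper for illustration; not the function added in this PR.
    """
    if limit <= 0:
        raise ValueError("limit must be a positive integer")
    # Drop surrounding whitespace and trailing semicolons so the statement can be nested.
    inner = sql.strip().rstrip(";").strip()
    # Only SELECT statements are wrapped; anything else is returned unchanged.
    if not inner.lower().startswith("select"):
        return sql
    return f"SELECT * FROM ({inner}) AS subquery LIMIT {limit}"


print(wrap_with_limit("SELECT * FROM birth_names LIMIT 1000;", 100))
```

Because the original text is only nested and never edited, a limit the user typed stays in place and the outer limit caps the result. This sketch does not handle statements that begin with `WITH` or with a comment; a real implementation would need a proper SQL parser for those cases.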

Examples

[Four screenshots, dated 2018-04-05 and 2018-04-09, not reproduced here]

Related PR

#4834 - I like this idea, but not sure if every database supports prefetching. If going with this route, I'd have a configurable page size.

Related Issues

#4588

@mistercrunch
Member

I don't have time to really review now but wanted to make sure you've taken this https://github.com/apache/incubator-superset/blob/master/superset/assets/src/SqlLab/components/SqlEditor.jsx#L195 into consideration

@jeffreythewang
Contributor Author

jeffreythewang commented May 7, 2018

Yeah I have. I think that should still show up as it indicates whether the limit was hit. It should retain the current behavior of not showing up if the number of rows returned is less than the limit applied. (or are you referring to some other behavior?)

@codecov-io

codecov-io commented May 7, 2018

Codecov Report

Merging #4941 into master will decrease coverage by 0.01%.
The diff coverage is 69.23%.

@@            Coverage Diff            @@
##           master   #4941      +/-   ##
=========================================
- Coverage   77.01%     77%   -0.02%     
=========================================
  Files          64      64              
  Lines        9508    9516       +8     
=========================================
+ Hits         7323    7328       +5     
- Misses       2185    2188       +3
Impacted Files            Coverage Δ
superset/views/base.py    67.44% <ø> (ø) ⬆️
superset/config.py        92.36% <100%> (ø) ⬆️
superset/views/core.py    73.74% <57.14%> (-0.04%) ⬇️
superset/sql_lab.py       71.34% <80%> (-0.26%) ⬇️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 69e8df4...9b886af.

@mistercrunch
Member

Sorry about the delay. Let's get this through. Mind rebasing?

@jeffreythewang
Contributor Author

jeffreythewang commented Jun 5, 2018

@mistercrunch There are a few things to consider given the current changes in master.

Since master now fetches the limit from the query if one exists, should the query's written limit override the one specified in the LimitControl field? Or should the LimitControl limit just be added as a wrap on top of it? In the current implementation (of this un-rebased branch), the limit is applied as a wrap, and I think that makes more sense: it separates the underlying query from the limiting component, and it is clear that the limiting only happens in the SQL Lab context. But in the to-be-rebased version, it seems I have to choose between the limit in the SQL editor (fetched with regex) and the LimitControl field's limit.

For example, if I have a query like:

SELECT * FROM birth_names LIMIT 1000

and the LimitControl field has a value of 100, the current version of the code (in my un-rebased branch) would generate:

SELECT * FROM (SELECT * FROM birth_names LIMIT 1000) as subquery LIMIT 100

whereas in the to-be-rebased version, I have to apply either 1000 or 100, unless I change some functionality in db_engine_specs, or have database-specific configurations for wrapping vs forcing a limit.

This can lead to even more confusion if we have a query with multiple limits, for example:

SELECT * FROM (SELECT * FROM birth_names WHERE gender LIKE 'boy' LIMIT 10)
UNION ALL
SELECT * FROM birth_names WHERE gender LIKE 'girl' LIMIT 5

Does the user know that only the 5 will be fetched with regex? This should return 15 rows but instead it only returns 5 because of how the limit is applied. And then what if I apply a LimitControl limit on top of that?

(As an aside, for the query:

SELECT * FROM (SELECT * FROM birth_names WHERE gender LIKE 'boy' LIMIT 10)
UNION ALL
SELECT * FROM (SELECT * FROM birth_names WHERE gender LIKE 'girl' LIMIT 5)

Neither of these limits is fetched with the regex.)
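The behavior described above can be reproduced with a small sketch. The pattern below is an assumption (a `LIMIT` clause is recognized only when it ends the statement) and is not necessarily the regex used in master; it is meant only to show why one of the two queries yields 5 and the other yields nothing.

```python
import re
from typing import Optional

# Assumed pattern: a LIMIT clause counts only when it is the last thing in the statement.
TRAILING_LIMIT = re.compile(r"\blimit\s+(\d+)\s*;?\s*$", re.IGNORECASE)


def trailing_limit(sql: str) -> Optional[int]:
    """Return the statement's trailing LIMIT value, or None if there is none."""
    match = TRAILING_LIMIT.search(sql)
    return int(match.group(1)) if match else None


mixed = (
    "SELECT * FROM (SELECT * FROM birth_names WHERE gender LIKE 'boy' LIMIT 10)\n"
    "UNION ALL\n"
    "SELECT * FROM birth_names WHERE gender LIKE 'girl' LIMIT 5"
)
nested = (
    "SELECT * FROM (SELECT * FROM birth_names WHERE gender LIKE 'boy' LIMIT 10)\n"
    "UNION ALL\n"
    "SELECT * FROM (SELECT * FROM birth_names WHERE gender LIKE 'girl' LIMIT 5)"
)

print(trailing_limit(mixed))   # 5: only the final LIMIT is seen
print(trailing_limit(nested))  # None: both LIMITs sit inside parentheses
```

Under this assumption the extracted value describes only the last branch of the first query, which is why treating it as the limit of the whole statement gives surprising results.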

I think whichever route we do end up going with, the LimitControl field's limit should be applied as a wrap.

Let me know if I'm misunderstanding how implementation of limiting (in current master) works, or if there is a better way to resolve this conflict that makes more sense.

@timifasubaa
Contributor

timifasubaa commented Oct 15, 2018

@jeffreythewang a few recommendations for us to get this through.

  1. Let's set the UI limit_query to a default of 10K and make this configurable on the backend (and the value gets propagated to the frontend).
  2. Let's send the SQL_MAX_ROW from backend to the frontend and do some validation to prevent the user from exceeding that value in the UI.
  3. Let's change the logic for overriding the limit to compare the value passed from the frontend and the existing limit specified in the query (instead of SQL_MAX_ROW). (https://github.com/apache/incubator-superset/blob/dd9eeda03e078ad053ddc2eb7170b7e281047a49/superset/sql_lab.py#L155)

With the three changes here, it should work. I'm happy to help out with any of the parts.

Contributor Author

@jeffreythewang jeffreythewang left a comment

If we're going with having the UI box be the central point of all limit information for the user, then #5774 can be abandoned, and code related to that can be removed.

 query = Query(
     database_id=int(database_id),
-    limit=mydb.db_engine_spec.get_limit_from_sql(sql),
+    limit=min(lim for lim in limits if lim is not None),
Contributor Author

I built this with the assumption that we are going to do limiting via query replacement within the context of SQL Lab, so only the smaller limit is chosen.
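As a rough illustration of "only the smaller limit is chosen", the sketch below picks the smallest limit that is actually set. The function name `effective_limit` is hypothetical, and how `limits` is assembled in the PR is not shown in this thread. One detail worth noting: `min()` over an empty sequence raises `ValueError`, so the sketch guards the case where every candidate is `None`.

```python
from typing import Iterable, Optional


def effective_limit(limits: Iterable[Optional[int]]) -> Optional[int]:
    """Return the smallest limit that is set, or None if none are set."""
    defined = [lim for lim in limits if lim is not None]
    # min() raises ValueError on an empty sequence, so handle that case explicitly.
    return min(defined) if defined else None


# Limit written in the query (1000) versus the limit from the UI control (100).
print(effective_limit([1000, 100]))   # 100
# No limit in the query text; only the UI control applies.
print(effective_limit([None, 100]))   # 100
```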

@@ -366,6 +366,8 @@ export const initialState = {
   workspaceQueries: [],
   queriesLastUpdate: 0,
   activeSouthPaneTab: 'Results',
+  defaultQueryLimit: 100,
+  maxRow: 10000,
Contributor

Can we make these defaults get passed from the backend's config.py to the frontend?

Contributor Author

These defaults are solely for unit tests.

style={{ cursor: 'pointer' }}
onClick={this.handleToggle.bind(this)}
>
LIMIT {this.props.value || this.props.maxRow || '∞'}
Contributor

Why is infinity still a possibility?

@@ -282,6 +282,9 @@
# in the results backend. This also becomes the limit when exporting CSVs
SQL_MAX_ROW = 100000

# Default row limit for SQL Lab queries
DEFAULT_SQLLAB_LIMIT = 10000

# Limit to be returned to the frontend.
DISPLAY_MAX_ROW = 1000
Contributor

Can we change this to 10K or make it equal to DEFAULT_SQLLAB_LIMIT?

@timifasubaa
Copy link
Contributor

LGTM, thanks for working on this Jeff. I've asked @graceguo-supercat to take a second look in case I missed anything.

const value = 100;
wrapper = shallow(factory({ ...defaultProps, value }));
wrapper.find(Label).first().simulate('click');
setTimeout(() => {
@graceguo-supercat graceguo-supercat Oct 23, 2018

setTimeout may cause flakiness when running in Travis CI; do you really need it? Also, I see that LimitControl uses Overlay and Popover, but you test against the Tooltip component?

Contributor Author

@jeffreythewang jeffreythewang Oct 23, 2018

I replaced Tooltip assertions with assertions on validationErrors instead.

@jeffreythewang
Contributor Author

Note that this is not ready to be merged until we find a way to load in the default values without clearing the existing state of the redux store.

@graceguo-supercat

graceguo-supercat commented Oct 24, 2018

Hi @jeffreythewang, Superset uses redux-localstorage to store SQL Lab user queries. Currently we sync the complete state of the sql_lab Redux store with localStorage. Check the persistState method; it allows you to sync only parts of the state.

It makes sense that sql_lab configuration should be set from the server side (by passing it through bootstrap data), and that only user data (mostly queries) is stored in localStorage.
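A minimal sketch of that split, seen from the backend, might look like the following. The payload shape (`common` then `conf`), the allowlist `FRONTEND_CONF_KEYS`, and the function name are all assumptions for illustration; only the config key names and their values come from this thread.

```python
import json

# Stand-in for values defined in superset/config.py (numbers taken from this thread).
CONFIG = {
    "SQL_MAX_ROW": 100000,
    "DEFAULT_SQLLAB_LIMIT": 10000,
    "SECRET_KEY": "not-for-the-browser",
}

# Hypothetical allowlist: only these keys are exposed to the frontend.
FRONTEND_CONF_KEYS = ("DEFAULT_SQLLAB_LIMIT", "SQL_MAX_ROW")


def build_bootstrap_data(config: dict) -> str:
    """Serialize the frontend-safe subset of the config as JSON bootstrap data."""
    conf = {key: config[key] for key in FRONTEND_CONF_KEYS if key in config}
    return json.dumps({"common": {"conf": conf}}, sort_keys=True)


print(build_bootstrap_data(CONFIG))
```

With configuration delivered on every page load like this, the values never need to live in localStorage, so changing them on the server takes effect without clearing a user's saved queries.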

@graceguo-supercat

graceguo-supercat commented Oct 25, 2018

Hi @jeffreythewang, I discussed this feature with @kristw. He has a nice suggestion: can we allow a separate query limit for each tab?

  • If there is no setting for a tab, use the default value from the server-side config.
  • If the user changes the limit for an existing tab, we sync the limit into localStorage so that it is loaded the next time.

@jeffreythewang
Contributor Author

@graceguo-supercat Yup, that is how it currently works in my implementation. Each tab has its own queryLimit, saved into localStorage.

}

setValueAndClose(val) {
this.setState({ textValue: val }, this.submitAndClose.bind(this));
Contributor

Do the binding in the constructor so that a new function isn't created every time this gets called.


isValidLimit(limit) {
const value = parseInt(limit, 10);
return !(isNaN(value) || value <= 0 || (this.props.maxRow && value > this.props.maxRow));
Contributor

Use Number.isNaN instead of isNaN

bsStyle="primary"
className="float-left ok"
disabled={!isValid}
onClick={this.submitAndClose.bind(this)}
Contributor

bind in constructor once.

<Overlay
rootClose
show={this.state.showOverlay}
onHide={this.handleHide.bind(this)}
Contributor

bind in constructor.

<div>
<Label
style={{ cursor: 'pointer' }}
onClick={this.handleToggle.bind(this)}
Contributor

bind in constructor

const errorMsg = 'Row limit must be positive integer' +
(this.props.maxRow ? ` and below ${this.props.maxRow}` : '');
return (
<Popover id="sqllab-limit-results">
Contributor

Is the id necessary?

Contributor Author

This is primarily for getting rid of the warning. Do we want to remove it?

const textValue = this.state.textValue;
const isValid = this.isValidLimit(textValue);
const errorMsg = 'Row limit must be positive integer' +
(this.props.maxRow ? ` and below ${this.props.maxRow}` : '');
Contributor

and below => not greater than

condition is value <= maxRow ?
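For comparison, the same rule (a positive integer not greater than the maximum) can be written as a backend-side check. This is a sketch under assumptions: the function name `is_valid_limit` is hypothetical, and it is deliberately stricter than the JavaScript above, since `parseInt("12abc", 10)` yields 12 while this version rejects any input that is not purely digits.

```python
import re
from typing import Optional


def is_valid_limit(raw: str, max_row: Optional[int] = None) -> bool:
    """Return True if raw is a positive integer not greater than max_row."""
    text = raw.strip()
    # Accept plain ASCII digits only; this rejects signs, decimals, and trailing text.
    if not re.fullmatch(r"[0-9]+", text):
        return False
    value = int(text)
    if value <= 0:
        return False
    # The upper bound is inclusive, matching the "not greater than" wording.
    return max_row is None or value <= max_row


print(is_valid_limit("10000", max_row=10000))  # True: the bound itself is allowed
print(is_valid_limit("10001", max_row=10000))  # False
```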

@jeffreythewang
Contributor Author

jeffreythewang commented Oct 29, 2018

I've updated this with a working implementation for updating localStorage.

How I tested:

  1. Check out master and run SQL Lab to ensure the existing state is properly loaded.
  2. Check out this branch and ensure that defaultQueryLimit and maxRow are loaded.
  3. Update the backend config values DEFAULT_SQLLAB_LIMIT and SQL_MAX_ROW and ensure that the new values are reflected on the frontend.

@graceguo-supercat I initially tried using persistState on creation of the store, but this only seemed to work for updating existing keys already loaded into the store (though this may have been affected by the structure of the object).

Use separate param for wrap sql

Get query limit from config

unit tests for limit control rendering in sql editor

py unit test

pg tests

Add max rows limit

Remove concept of infinity, always require defined limits

consistency

Assert on validation errors instead of tooltip

fix unit tests

attempt persist state

pr comments and linting
# Limit to be returned to the frontend.
DISPLAY_MAX_ROW = 1000
# Default row limit for SQL Lab queries
DEFAULT_SQLLAB_LIMIT = 10000
@graceguo-supercat graceguo-supercat Nov 7, 2018

@mistercrunch SQL Lab used to render only 1000 rows for each query while fetching a much larger number of rows. After this feature, SQL Lab will fetch and render the same number of rows.
Do you have any suggestions for this default query limit?

@graceguo-supercat graceguo-supercat left a comment

LGTM!

bipinsoniguavus pushed a commit to ThalesGroup/incubator-superset that referenced this pull request Dec 26, 2018
* Add separate limit setting for SqlLab

Use separate param for wrap sql

Get query limit from config

unit tests for limit control rendering in sql editor

py unit test

pg tests

Add max rows limit

Remove concept of infinity, always require defined limits

consistency

Assert on validation errors instead of tooltip

fix unit tests

attempt persist state

pr comments and linting

* load configs in via common param

* default to 1k
@mistercrunch mistercrunch added the 🏷️ bot and 🚢 0.34.0 labels Feb 27, 2024