Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bugfix] _add_filters_from_pre_query doesn't handle dim specs #3974

Merged
merged 2 commits into from
Dec 11, 2017

Conversation

Mogball
Copy link
Contributor

@Mogball Mogball commented Dec 1, 2017

Ok this should address the underlying issue behind #3943. _add_filter_from_pre_query_data wasn't at all handling dimension specs. Also added some better error checking on modifying dimension specs.

edit: #3889 similar issue

@ghulands
Copy link

ghulands commented Dec 1, 2017

Thanks @Mogball for taking a look at this. I just applied the patch and tested it out. I now get "No data was returned" instead of the exception. If I click on View Query it only shows the first phase of the query and not the second. When I run that query directly against druid, I get results for the topN.

@Mogball
Copy link
Contributor Author

Mogball commented Dec 1, 2017

@ghulands Thanks for letting me know. I removed a condition thinking Druid handles single "dimension" specs but I think I was right before to check if a single dimension is a spec. Checkout the change and let me know if it's working.

@Mogball Mogball force-pushed the jeff/pre_query_fix branch 2 times, most recently from 90661b0 to 3693e38 Compare December 1, 2017 18:43
@ghulands
Copy link

ghulands commented Dec 1, 2017

Still the same behavior observed with updated patch.

@Mogball
Copy link
Contributor Author

Mogball commented Dec 1, 2017

Could you post logs from running these queries? (And it's still just doing 1 phase right?)

@ghulands
Copy link

ghulands commented Dec 1, 2017

Yes, it's still only running 1 phase. I edited the log to reduce the map size of the dimension.

2017-12-01 19:07:43,675:DEBUG:root:�[36m[stats_logger] (incr) explore_json�[0m
2017-12-01 19:07:43,764:DEBUG:root:�[36m[stats_logger] (incr) loaded_from_source�[0m
2017-12-01 19:07:43,784:INFO:root:Running groupby query for dimensions [[{u'outputName': u'prettyName', u'extractionFn': {u'outputName': u'dimensionOutputName', u'lookup': {u'map': {u'Code1': u'PrettyName1', u'Code2': u'PrettyName2'}, u'type': u'map', u'isOneToOne': False}, u'retainMissingValue': False, u'type': u'lookup', u'replaceMissingValueWith': u'missing_value', u'dimension': u'dimensionName'}, u'type': u'extraction', u'dimension': u'codeName', u'outputType': u'STRING'}]]
2017-12-01 19:07:43,784:INFO:root:Running two-phase query for timeseries

@Mogball
Copy link
Contributor Author

Mogball commented Dec 1, 2017

@ghulands

Okay I think I figured it out. Extraction functions that change the name of dimension values (in your case, mapping from a code to a name) cause issues with the two-phase query because it builds additional filters from output values (which are different from the original values). _add_filters_from_pre_query will ignore extraction functions. Give it another go.

@ghulands
Copy link

ghulands commented Dec 1, 2017

Hi @Mogball, This now works! Thanks so much for the quick iterating on it.

@Mogball
Copy link
Contributor Author

Mogball commented Dec 1, 2017

@fabianmenges RFC on this solution

For two-phase queries, would it be better to remove extraction functions from pre-queries (and just use the dimension name) and add them back for the final query?

@fabianmenges
Copy link
Contributor

fabianmenges commented Dec 1, 2017

I think it makes sense to strip the pre-queries (phase 1 queries) from the extraction functions since they will only change how data is displayed and pre-queries won't be displayed.

def pre_update(self, col):
# If a dimension spec JSON is given, ensure that it is
# valid JSON and that `outputName` is specified
if col.dimension_spec_json:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I like the validation here in general, I feel like its a bit arbitrary that we validate that
dimension_spec_json is valid but we don't do any validation for SQL columns or SQL/Druid metrics.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed as well. Should we starting adding validation (or 'warnings') for other things as well?

@Mogball Mogball force-pushed the jeff/pre_query_fix branch 4 times, most recently from b675caa to cf30a20 Compare December 2, 2017 01:36
pre_qry_dims.append(dim['dimension'])
else:
pre_qry_dims.append(dim)
pre_qry['dimensions'] = utils.unique_list(pre_qry_dims)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason your not just doing this list(set(pre_qry_dims))

@@ -746,3 +746,9 @@ def get_filter_key(f):
form_data['filters'] += [filtr]
# Remove extra filters from the form data since no longer needed
del form_data['extra_filters']

def unique_list(target_list):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't think this helper is useful

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the easy way to do this is list(set(list)) though you can usually just use a set from the get go and stick to it

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

true didn't realise it was that simple 🥇

@mistercrunch mistercrunch merged commit 7d37442 into apache:master Dec 11, 2017
@Mogball Mogball deleted the jeff/pre_query_fix branch December 12, 2017 06:50
@Mogball Mogball restored the jeff/pre_query_fix branch December 12, 2017 06:50
@chirpy2291
Copy link

Hi, i am facing the same issue.On running single custom dimension i get the following error:
KeyError: u"None of [[u'outputName', u'extractionFn', u'type', u'outputType', u'dimension']] are in the [index]"

Can you specify which version of druid to use? and what is the fix? @Mogball does your fix solve this?

@Mogball
Copy link
Contributor Author

Mogball commented Feb 2, 2018

What version of superset are you using? I think this is in version 0.21.0

@chirpy2291
Copy link

chirpy2291 commented Feb 4, 2018 via email

@chirpy2291
Copy link

Does 0.21 solve both the mentioned problems?

I am using 0.20.
Extensions are non filterable.And single extesion dimension is non
groupable.Any reason why topN query doesnt work on superset for single
extension dimension?
Same works if i hit the same query on druid.

@Mogball
Copy link
Contributor Author

Mogball commented Feb 4, 2018

I'm pretty sure those issues we're ironed out by then. Give 0.21.0 a try

@chirpy2291
Copy link

Hi @Mogball that was great help.I am now able to group the custom dimensions as well as view them on charts.

Do we have any update/workaround for filtering the same custom extension.I am able to only filter them through Result filters only.

@Mogball
Copy link
Contributor Author

Mogball commented Feb 6, 2018

Not sure about that. Make an issue if there isn't one already.

michellethomas pushed a commit to michellethomas/panoramix that referenced this pull request May 24, 2018
…#3974)

* Fixed _add_filters_from_pre_query for dimension specs

* add_filters_from_pre_query ignores extraction functions
wenchma pushed a commit to wenchma/incubator-superset that referenced this pull request Nov 16, 2018
…#3974)

* Fixed _add_filters_from_pre_query for dimension specs

* add_filters_from_pre_query ignores extraction functions
@mistercrunch mistercrunch added 🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels 🚢 0.21.2 labels Feb 27, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels 🚢 0.21.2
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants