Add time grain blacklist and addons to config.py #5380
villebro commented Jul 12, 2018
- Refactor time grain logic to avoid the use of conflicting names and durations
- Harmonize discrepancies in time grains, e.g. P3M (Druid) vs P0.25Y (all others)
- Add TIME_GRAIN_BLACKLIST to config.py to be able to remove unwanted time grains, e.g. 5/10/15 mins
- Add TIME_GRAIN_ADDONS and TIME_GRAIN_ADDON_FUNCTIONS to config.py to add custom time grains not natively supported by Superset
- Add tests for blacklist and validity of defined time grains
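As a rough illustration of the three settings listed above, a config.py override might look something like the sketch below. The setting names come from this PR; the specific durations, labels, and SQL expression are made-up examples, and the per-engine nesting of TIME_GRAIN_ADDON_FUNCTIONS is inferred from the grain_addon_functions.get(cls.engine, {}) lookup in the diff rather than confirmed by it.

```python
# Hypothetical config.py overrides; all values are illustrative only.

# ISO 8601 durations to hide from the time grain picker (5/10/15 minutes).
TIME_GRAIN_BLACKLIST = ['PT5M', 'PT10M', 'PT15M']

# Custom grains: duration -> human-readable label (assumed shape).
TIME_GRAIN_ADDONS = {
    'PT2S': '2 second',
}

# Per-engine SQL templates for the custom grains. {col} stands for the
# column expression. The MySQL expression below is an example only.
TIME_GRAIN_ADDON_FUNCTIONS = {
    'mysql': {
        'PT2S': 'DATE_ADD(DATE({col}), INTERVAL (HOUR({col})*60*60 + '
                'MINUTE({col})*60 + FLOOR(SECOND({col})/2)*2) SECOND)',
    },
}
```

In this sketch every duration that has an addon function also has an addon label, so the grain would show up with a readable name.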
Codecov Report
@@            Coverage Diff             @@
##           master    #5380      +/-   ##
==========================================
+ Coverage   63.28%   63.31%   +0.03%
==========================================
  Files         349      349
  Lines       22121    22141      +20
  Branches     2457     2457
==========================================
+ Hits        13999    14019      +20
  Misses       8108     8108
  Partials       14       14
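Also added functionality to be able to add new custom time grains in config.py. Question: in config.py, which would be preferable: using a blacklist to remove predefined options (e.g. VIZ_TYPE_BLACKLIST), or effectively defining a whitelist (e.g. LANGUAGES)?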
Comments would be much appreciated, @betodealmeida @mistercrunch. This PR would make it easier to manage and add new time grains in production environments. If this PR is not aligned with the future direction of Superset, I will be happy to make the necessary changes or close the PR.
This looks great, @villebro! It greatly simplifies the definition of new durations and reduces a lot of the duplication.
I have just a few comments on the code, mostly nits, and there's one place where the function is wrong (probably due to an accidental deletion).
Question: in config.py, which would be preferable: using a blacklist to remove predefined options (e.g. VIZ_TYPE_BLACKLIST), or effectively defining a whitelist (e.g. LANGUAGES)?
I think doing a blacklist works better here.
    'P1Y': 'year',
    '1969-12-28T00:00:00Z/P1W': 'week_start_sunday',
    '1969-12-29T00:00:00Z/P1W': 'week_start_monday',
    'P1W/1970-01-03T00:00:00Z': 'week_ending_saturday',
Can you also add 'P1W/1970-01-04T00:00:00Z': 'week_ending_sunday', here, for completeness?
Good idea, done.
superset/db_engine_specs.py
Outdated
def _create_time_grains_tuple(time_grains, time_grain_functions, blacklist):
    ret_list = list()
Nit: use [] to instantiate an empty list. It's only negligibly faster, but it's the preferred style for Superset.
Thanks, will try to remember this in the future.
superset/db_engine_specs.py
Outdated
    blacklist = blacklist if blacklist else []
    for duration, func in time_grain_functions.items():
        if duration not in blacklist:
            name = time_grains.get(duration, None)
Nit: {}.get already returns None when the key is not present, so this can be written simply as:
name = time_grains.get(duration)
Good point.
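Putting the two nits together, the helper under review could be sketched roughly as follows. Only the signature and the four loop lines appear in the diff; the append and return lines are assumptions added to make the sketch runnable, and the function is renamed here to make clear it is not the PR's code.

```python
from collections import namedtuple

# Grain shape as shown elsewhere in the diff.
Grain = namedtuple('Grain', 'name label function duration')


def create_time_grains_tuple(time_grains, time_grain_functions, blacklist):
    """Build a tuple of Grain objects, skipping blacklisted durations."""
    ret_list = []
    blacklist = blacklist if blacklist else []
    for duration, func in time_grain_functions.items():
        if duration not in blacklist:
            name = time_grains.get(duration)  # None if the duration is unnamed
            # Assumption: the label simply reuses the name in this sketch.
            ret_list.append(Grain(name, name, func, duration))
    return tuple(ret_list)


grains = create_time_grains_tuple(
    time_grains={'PT1M': 'minute', 'PT1H': 'hour', 'P1D': 'day'},
    time_grain_functions={
        'PT1M': "DATE_TRUNC('minute', {col})",
        'PT1H': "DATE_TRUNC('hour', {col})",
        'P1D': "DATE_TRUNC('day', {col})",
    },
    blacklist=['PT1H'],
)
print([g.duration for g in grains])  # ['PT1M', 'P1D']
```

Passing None as the blacklist behaves the same as passing an empty list, which is what the guard on the second line of the function body is for.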
superset/db_engine_specs.py
Outdated
@@ -65,12 +94,22 @@ class BaseEngineSpec(object):
    """Abstract class for database engine specific configurations"""

    engine = 'base'  # str as defined in sqlalchemy.engine.engine
    time_grains = tuple()
    time_grain_functions = dict()
Nit: Use {} when instantiating an empty dictionary.
Fixed.
        'P1M': "DATE_TRUNC('month', {col}) AT TIME ZONE 'UTC'",
        'P0.25Y': "DATE_TRUNC('quarter', {col}) AT TIME ZONE 'UTC'",
        'P1Y': "DATE_TRUNC('year', {col}) AT TIME ZONE 'UTC'",
    }
Nice! This is much cleaner now. Love it!
👍
@@ -52,6 +52,35 @@

Grain = namedtuple('Grain', 'name label function duration')
I think we can get rid of the name attribute in Grain. I checked the code and it seems like it's not being used anywhere, and grains can be uniquely identified by the duration now that they're ISO durations.
Good point, I misremembered that core.grains_dict() created a dict that mapped duration -> grain and name -> grain, but the latter was in fact label -> grain. I've now removed name.
        grain_functions = cls.time_grain_functions.copy()
        grain_addon_functions = config.get('TIME_GRAIN_ADDON_FUNCTIONS', {})
        grain_functions.update(grain_addon_functions.get(cls.engine, {}))
        return _create_time_grains_tuple(grains, grain_functions, blacklist)
I think it might be better to call this get_time_grains, in order to be more explicit and also because it has changed from an attribute to a method.
Changed.
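To show how the renamed method combines built-in grains with the config addons, here is a self-contained sketch. The last four lines of the method mirror the diff; everything else (the stand-in config dict, the builtin_time_grains mapping, the exampledb engine, and the TWO_DAY_TRUNC function) is hypothetical scaffolding, and the tuple construction is inlined instead of calling the PR's helper.

```python
from collections import namedtuple

Grain = namedtuple('Grain', 'name label function duration')

# Stand-in for Superset's config object; the keys follow the PR.
config = {
    'TIME_GRAIN_BLACKLIST': ['PT1S'],
    'TIME_GRAIN_ADDONS': {'P2D': '2 day'},
    'TIME_GRAIN_ADDON_FUNCTIONS': {
        'exampledb': {'P2D': 'TWO_DAY_TRUNC({col})'},  # made-up SQL function
    },
}

# Assumed mapping of built-in durations to names.
builtin_time_grains = {None: 'Time Column', 'PT1S': 'second', 'P1D': 'day'}


class ExampleEngineSpec(object):
    engine = 'exampledb'  # hypothetical engine name
    time_grain_functions = {
        None: '{col}',
        'PT1S': "DATE_TRUNC('second', {col})",
        'P1D': "DATE_TRUNC('day', {col})",
    }

    @classmethod
    def get_time_grains(cls):
        blacklist = config.get('TIME_GRAIN_BLACKLIST', [])
        # Merge built-in names with addon names.
        grains = builtin_time_grains.copy()
        grains.update(config.get('TIME_GRAIN_ADDONS', {}))
        # Merge the engine's functions with addon functions for this engine.
        grain_functions = cls.time_grain_functions.copy()
        grain_addon_functions = config.get('TIME_GRAIN_ADDON_FUNCTIONS', {})
        grain_functions.update(grain_addon_functions.get(cls.engine, {}))
        return tuple(
            Grain(grains.get(duration), grains.get(duration), func, duration)
            for duration, func in grain_functions.items()
            if duration not in blacklist
        )


durations = [g.duration for g in ExampleEngineSpec.get_time_grains()]
print(durations)  # [None, 'P1D', 'P2D']
```

Copying the class-level dict before updating it keeps the addons from leaking into other engine specs that share the same base mapping.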
superset/db_engine_specs.py
Outdated
    )
    time_grain_functions = {
        None: '{col}',
        'PT1S': "{col}) AT TIME ZONE 'UTC'",
'PT1S': "DATE_TRUNC('second', {col}) AT TIME ZONE 'UTC'",
Whoops, good catch!
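An accidental deletion like this one is the kind of thing a small validity check can catch. The sketch below is not the test added in this PR (its contents are not shown here). It is a hypothetical check that a template contains the {col} placeholder and has balanced parentheses. It does not account for parentheses inside SQL string literals.

```python
def check_grain_template(template):
    """Return a list of problems found in a time grain SQL template."""
    problems = []
    if '{col}' not in template:
        problems.append('missing {col} placeholder')
    depth = 0
    for ch in template:
        if ch == '(':
            depth += 1
        elif ch == ')':
            depth -= 1
            if depth < 0:
                break  # a closing parenthesis with no matching opener
    if depth != 0:
        problems.append('unbalanced parentheses')
    return problems


broken = "{col}) AT TIME ZONE 'UTC'"
fixed = "DATE_TRUNC('second', {col}) AT TIME ZONE 'UTC'"

print(check_grain_template(broken))  # ['unbalanced parentheses']
print(check_grain_template(fixed))   # []
print(fixed.format(col='ts'))        # DATE_TRUNC('second', ts) AT TIME ZONE 'UTC'
```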
        None: '{col}',
        'PT1S': "DATE_TRUNC('SECOND', {col})",
        'PT1M': "DATE_TRUNC('MINUTE', {col})",
        'PT5M': "DATEADD(MINUTE, FLOOR(DATE_PART(MINUTE, {col}) / 5) * 5, \
Nice! Were you able to test these against Snowflake?
Yes; not sure if it's the most efficient way of doing it, but managed to get good performance on a semi-large dataset, so should be good to go.
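The PT5M expression is cut off in the diff snippet above, so the sketch below does not try to reproduce the SQL. It only illustrates, in Python, the arithmetic that the visible FLOOR(DATE_PART(MINUTE, {col}) / 5) * 5 part suggests: rounding the minute down to a multiple of five. The function name and signature are made up for the example.

```python
from datetime import datetime


def floor_to_minutes(ts, bucket=5):
    """Truncate ts to the start of its `bucket`-minute interval."""
    floored_minute = (ts.minute // bucket) * bucket
    return ts.replace(minute=floored_minute, second=0, microsecond=0)


print(floor_to_minutes(datetime(2018, 7, 12, 10, 38, 27)))  # 2018-07-12 10:35:00
```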
All proposed changes implemented.
@betodealmeida I tried to remove I'm sure I will be able to sort out how the frontend logic works, but it will take time, so I propose merging this and removing
@villebro, no worries, I'll take a stab at removing
Awesome! Thank you so much for doing this, @villebro!
Great, thanks @betodealmeida!
@betodealmeida FYI rebased to resolve conflict, any chance of getting this merged out of the way?
@mistercrunch, can you merge this? I still don't have merging privilege.
* Add interim grains
* Refactor and add blacklist
* Change PT30M to PT0.5H
* Linting
* Linting
* Add time grain addons to config.py and refactor engine spec logic
* Remove redundant import and clean up config.py
* Fix bad rebase
* Implement changes proposed by @betodealmeida
* Revert removal of name from Grain
* Linting