-
Notifications
You must be signed in to change notification settings - Fork 13.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature: Making data gaps visible in line charts by inserting zeros #3512
Conversation
If you think it makes more sense to enhance resampling on the server side (e.g. support the various period types, derive it from the time grain) I'm happy to improve that experience instead. |
I like the fact that "insert zeros" as a trigger is much more intuitive than "Resampling". I'm thinking server side on this one. Just because pandas is so simple for this. See how time grain is fairly loosely defined at the moment? https://github.com/apache/incubator-superset/blob/master/superset/db_engine_specs.py#L192 One easy approach would be add a corresponding pandas "Resampling" came before the "Time-Grain" concept and is now confusing/obsolete. I feel like we could trigger the pandas resampling based on time grain configuration with a simple checkbox. In that context, there may be some very few specific use cases where people may want the extra flexibility of resampling with specific fill patterns but I wouldn't be against deprecating the feature. |
An extra point for pandas vs having our own JS function for this is that pandas (from memory) deals nicely with timezones, such that when we want to start thinking about comprehensive time zone management pandas may do some of that magic too. |
K, I'll take care of this and bring it down to the server side, should be able to get something going by end of next week. We probably need to have a function to translate the time grains to pandas frequencies (like the one I wrote in JS to ms) because we also need to cover Druid and the granularities for Druid are handled completely independent. Even better would probably be to unify the concept of Apart from that I don't think that my function would have problems with time zones since it just looks at raw milliseconds since epoch... |
@fabianmenges This would be super helpful with respect to the time grain problems mentioned in #487 |
Description
For many different kinds of metrics you might want to immediately see that you did not record any values in a given time period (e.g. not registering any user impressions for the first minute of every hour). With the current line chart visualization this would be very difficult to spot, you would have to spot a "missing" marker: 1 out of 60 every hour.
This merge request adds a "insert zeros" control that takes the current selected
Time Grain
into account to insert 0s into the time series data if the time series did not contain any data points for a given period specified by theTime Grain
.The main difference to resampling is, that is does this entirely on the client side and only very few data points are added to the time series.
Example 1:
We are visualizing the
birth_names
example datasource. As you can see looking at the markers on the first screenshot we have one data point per year.As you can see, we selected the time grain "month". Checking the box "Insert Zeros" will now
scan the time series: if two data points are more than one month apart from one another it will insert a 0 data point into the time series accordingly.
Example 2:
Same dataset
birth_names
. I deleted all rows with a timestamps between 1967 and 1968. Without markers you would not be able to notice them missing, with markers its still pretty difficult:When you insert 0s it is easy to see:
Features:
I added unit tests for the granularity/time-grain/period parsing. Happy to add a unit test for the adding 0 logic, I couldn't find a good place to add the spec file.
This is related to: #487 without putting 0s everywhere
Happy to get some feedback