Provide generalized aggregation #77

pluehne · 2018-01-10T13:24:19Z

This implements a generalized method for aggregating time-series data. Data can be aggregated over week or month intervals with a variety of aggregation methods to choose from.

This will be useful for providing chart views at different levels (such as two-year periods vs. just showing the last month). Additionally, the generalized form of aggregation can be used to smooth out graphs where the sampling frequency changed with an update to Hubble Enterprise.

The aggregation is done by splitting the time data into subsequent, gapless periods of time (weeks starting with Mondays or months), for each of which the aggregated values are then computed and returned.

Aggregation methods define how to aggregate the values within individual time periods. The following aggregation methods are supported:

sum
mean
min
max
first (the chronologically first available value for that period)
last
median

Periods with incomplete data at the beginning or the end of the time series are excluded from the aggregation.

Finally, the pull request usage chart is changed to make use of the new aggregation facilities to reduce the granularity from daily to monthly data for now. This might be changed when we implement detail views.

I also added several unit tests to check the aggregation methods (for off-by-one errors in particular) as well as a short piece of documentation on the new configuration options.

larsxschneider · 2018-01-10T13:44:12Z

docs/_layouts/default.html

@@ -27,7 +27,7 @@
 		<script src="{{ site.baseurl }}/assets/js/vendor/moment-with-locales.min.js"></script>
 		<script src="{{ site.baseurl }}/assets/js/vendor/Chart-2.7.1.min.js"></script>
 		<script src="{{ site.baseurl }}/assets/js/vendor/spin-2.3.2.min.js"></script>
-		<script src="{{ site.baseurl }}/assets/js/charts.js?version=1ff0187"></script>
+		<script src="{{ site.baseurl }}/assets/js/charts.js?version=e7e9c5a"></script>


Is there a way to generate a random hash or a hash over a file with Jekyll (via GitHub pages)?
/cc @parkr 😄

site.github.build_revision would work on Pages. https://help.github.com/articles/repository-metadata-on-github-pages/

@parkr: Thanks for the pointer! This would generate new ?version= hashes for each commit, right? Do you know of a way to, for example, get a hash based on the actual file content? In this way, we wouldn’t force client or server cache flushes with every single commit. (I’m just asking, your solution would solve our problem pretty well already!)

No hashes on a per-file basis are generated, so that’s the best solution that will achieve cache misses based on what’s in Pages.

Okay, then this is the way to go. Thanks, @parkr!

pluehne · 2018-01-15T17:31:37Z

In case someone is interested, I’ll come back to this soon. I wanted to have #81 in place before finishing this.

filmaj

This looks really good! Nice job on the tests - this PR bumps up the JS code coverage from 11% to 26%.

filmaj · 2018-01-15T18:33:33Z

docs/spec/charts.js

+    createList,
+    createTable,
+    createSpinner,
+    d3,


Globals can be defined in a variety of ways in eslint. Other than inlining the list into particular .js files (like we do here), you can also specify them via eslint config files, similarly to how we do in the browser-context specific .eslintrc.json file (both env and globals config properties, in their own different ways, define what globals are expected to be available). I think we should move any globals defined by external libraries that our scripts depend on, like d3, jquery and moment.js, for example, to the configuration files, rather than as per-js-file running lists. In this way, if our webapp frontend ends up adding a new .js file, we won't have to duplicate the inline comment allowing for d3 globals.

Thanks, that’s a good pointer. I wasn’t happy with inlining these globals either, and I second that it would be best to configure this in the .eslintrc.json.

I’ll return to this pull request in a couple of days. I’ll have to rebase these changes on top of the other new stuff, and then I’ll also adjust the global directives.

@filmaj: I tried to change the globals declaration for d3 to true without success. I’d say let’s tidy up the global declaration in a separate pull request 🙂.

Noted, I can take that on.

That would be nice, thanks 😄!

@pluehne I added another commit to my fork's branch of this PR that should fix that, feel free to pull in to this PR: https://github.com/filmaj/hubble/tree/patrick/improved-aggregation

@filmaj: Thanks a lot! What I didn’t understand was that there are multiple .eslintrc.json files with different purposes—one for linting the assets themselves and one for linting the unit tests.

I squashed your changes into my commit above (to not have one commit do it improperly and another one fixing it right after), and I hope that this is fine with you 🙂.

@pluehne yep perfectly fine 👍

filmaj · 2018-01-23T16:00:03Z

Huh, this code coverage comment uses the compact "header" layout. 🤔

larsxschneider · 2018-01-24T12:30:04Z

docs/assets/js/charts.js

+    data.sort((row1, row2) => row1['date'] - row2['date']);
+
+    const dateStart = data[0]['date'];
+    // Ranges are exclusive, so add one more day to include the last date


s/last date/last day/?

Refering to the “last date” was intentional, because this actually is about dates. However, “day” isn’t wrong either … not sure whether that’s worth rephrasing though.

larsxschneider · 2018-01-24T12:33:24Z

docs/assets/js/charts.js

+        // Note that this assumes complete data in the period
+        // Should data points be missing, aggregation methods such as the sum will lead to results that can't be
+        // compared to periods with complete data
+        // Hence, the maintainers of the data need to ensure that the input is well-formed


micro-nit: It might be more readable if you add a . at the end of the sentence.

I didn’t add periods, because we don’t have them in comments in other places. Should we do this everywhere then? Or put periods just between sentences in multisentence comments?

Okay, then I’d say that we put periods between sentences systematically except for the end of comments (as is practiced by many print magazines, for example).

I changed this as I suggested above.

larsxschneider · 2018-01-24T12:36:34Z

docs/assets/js/charts.js

+            function(keyID, key)
+            {
+                if (key == 'date')
+                    return;


micro-nit: that might be overly cautious as we use the value already in line 155 above

This check is to exclude the dates themselves from being aggregated.

The data points all have the structure:

{ "date": "yyyy-mm-dd", "key1": "value1", "key2": "value2", ... }

We only want to aggregate the values of key1 and key2 but not date.

This implements a generalized method for aggregating time-series data. Data can be aggregated over week or month intervals with a variety of aggregation methods to choose from. This will be useful for providing chart views at different levels (such as two-year periods vs. just showing the last month). Additionally, the generalized form of aggregation can be used to smooth out graphs where the sampling frequency changed with an update to Hubble Enterprise. The aggregation is done by splitting the time data into subsequent, gapless periods of time (weeks starting with Mondays or months), for each of which the aggregated values are then computed and returned. Aggregation methods define how to aggregate the values within individual time periods. The following aggregation methods are supported: - sum - mean - min - max - first (the chronologically first available value for that period) - last - median Periods with incomplete data at the beginning or the end of the time series are excluded from the aggregation. Finally, the pull request usage chart is changed to make use of the new aggregation facilities to reduce the granularity from daily to monthly data for now. This might be changed when we implement detail views. I also added several unit tests to check the aggregation methods (for off-by-one errors in particular) as well as a short piece of documentation on the new configuration options.

codecov-io · 2018-01-24T20:32:06Z

Codecov Report

Merging #77 into master will increase coverage by 10.52%.
The diff coverage is 70.37%.

pluehne added the enhancement label Jan 10, 2018

pluehne assigned larsxschneider Jan 10, 2018

pluehne self-assigned this Jan 10, 2018

pluehne requested a review from larsxschneider January 10, 2018 13:24

larsxschneider reviewed Jan 10, 2018

View reviewed changes

pluehne force-pushed the patrick/improved-aggregation branch 2 times, most recently from 194f329 to 7dfc075 Compare January 10, 2018 14:03

filmaj approved these changes Jan 15, 2018

View reviewed changes

pluehne added the work in progress label Jan 18, 2018

pluehne force-pushed the patrick/improved-aggregation branch 3 times, most recently from b5944e2 to 618bf5f Compare January 22, 2018 16:34

pluehne force-pushed the patrick/improved-aggregation branch from 618bf5f to 01b488f Compare January 23, 2018 13:47

pluehne force-pushed the patrick/improved-aggregation branch 3 times, most recently from f50edb7 to 379562a Compare January 24, 2018 11:29

pluehne removed the work in progress label Jan 24, 2018

larsxschneider reviewed Jan 24, 2018

View reviewed changes

pluehne force-pushed the patrick/improved-aggregation branch 2 times, most recently from ce65f17 to 7f0abc0 Compare January 24, 2018 12:54

pluehne force-pushed the patrick/improved-aggregation branch from 7f0abc0 to 01e714c Compare January 24, 2018 20:29

larsxschneider merged commit 74fce70 into master Jan 25, 2018

larsxschneider deleted the patrick/improved-aggregation branch January 25, 2018 17:51

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Provide generalized aggregation #77

Provide generalized aggregation #77

pluehne commented Jan 10, 2018 •

edited

Loading

larsxschneider Jan 10, 2018

parkr Jan 10, 2018

pluehne Jan 10, 2018

parkr Jan 10, 2018

pluehne Jan 10, 2018

pluehne commented Jan 15, 2018

filmaj left a comment

filmaj Jan 15, 2018

pluehne Jan 17, 2018

pluehne Jan 24, 2018

filmaj Jan 24, 2018

pluehne Jan 24, 2018

filmaj Jan 24, 2018

pluehne Jan 24, 2018

filmaj Jan 24, 2018

filmaj commented Jan 23, 2018

larsxschneider Jan 24, 2018

pluehne Jan 24, 2018

larsxschneider Jan 24, 2018

pluehne Jan 24, 2018

pluehne Jan 24, 2018

pluehne Jan 24, 2018

larsxschneider Jan 24, 2018

pluehne Jan 24, 2018

codecov-io commented Jan 24, 2018

Provide generalized aggregation #77

Provide generalized aggregation #77

Conversation

pluehne commented Jan 10, 2018 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

pluehne commented Jan 15, 2018

filmaj left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

filmaj commented Jan 23, 2018

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

codecov-io commented Jan 24, 2018

Codecov Report

pluehne commented Jan 10, 2018 •

edited

Loading