Rework iteration to avoid overflow #68

marshallpierce · 2017-10-24T15:07:22Z

Incorporating a gaggle of smaller fixes I came upon while working on this:

- Iteration now stops when the index holding the max value is reached, rather than going by total count. This avoids stopping too early when total count saturates (fixes #47, "Iteration stops too early when total_count has saturated", and a few parts of #38, "Address remaining logic that can overflow"). The count since the last iteration also restarts from 0 at each iteration point, making it less prone to overflow.
- Linear iteration was stopping one iteration too soon in some cases due to a subtle off-by-one.
- Quantile iteration now skips straight to quantile 1.0 if it reaches the index with the max value, just like the Java impl. See long-winded comments, and even more long-winded tests.
- Rather than having pickers provide accessors for metadata about what was just picked, `pick()` now returns a metadata struct with optional fields. This makes the timeline clear, and pickers don't need mostly useless fields to hang on to data until it can be requested one function call later.
- Ported iteration tests from the old Rust implementation.
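To illustrate the first point, here is a minimal sketch of why ending iteration at the max nonzero index is more robust than comparing against a saturated total count. The `Counts` struct, its `u8` counters, and both method names are hypothetical stand-ins chosen to make saturation easy to trigger; they are not the crate's real types.

```rust
/// Hypothetical stand-in for a histogram's counts array (not the crate's real type).
/// `u8` counters are used only so that saturation is easy to trigger.
struct Counts {
    counts: Vec<u8>,
}

impl Counts {
    /// Total count as a narrow counter type would report it: saturating at `u8::MAX`.
    fn saturating_total(&self) -> u8 {
        self.counts.iter().fold(0u8, |acc, &c| acc.saturating_add(c))
    }

    /// Index of the last bucket holding a nonzero count, if any.
    fn max_nonzero_index(&self) -> Option<usize> {
        self.counts.iter().rposition(|&c| c != 0)
    }

    /// Termination by total count: stop once the running count reaches the total.
    /// When the total has saturated, this stops before the last buckets are seen.
    fn visited_by_total_count(&self) -> Vec<usize> {
        let total = self.saturating_total();
        let mut running = 0u8;
        let mut visited = Vec::new();
        for (i, &c) in self.counts.iter().enumerate() {
            if running >= total {
                break;
            }
            running = running.saturating_add(c);
            if c != 0 {
                visited.push(i);
            }
        }
        visited
    }

    /// Termination by index: walk up to and including the max nonzero index.
    /// No running total is compared, so saturation cannot end iteration early.
    fn visited_by_max_index(&self) -> Vec<usize> {
        let max = match self.max_nonzero_index() {
            Some(m) => m,
            None => return Vec::new(),
        };
        (0..=max).filter(|&i| self.counts[i] != 0).collect()
    }
}
```

With counts of `[200, 100, 50]`, the running total saturates at 255 after the second bucket, so the total-count strategy never reaches the third bucket, while the index-based strategy visits all three.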

…porky enough to stand on its own and data_access is a port of a Java test file anyway.

- `LinearIterator` no longer aborts one step too early in the final bucket.
- `PickyIterator` now returns metadata when it picks. This allows `IterationValue` to provide better data about the current iteration progress without introducing a separate stage in the `PickyIterator` lifecycle to query about what was just picked.
- count since last iteration is now reset every iteration, making it less prone to overflow.
- end-of-histogram is detected by comparing with max nonzero index, not total count, which avoids overflow.

- Supply count at current index to `pick()` since we already have that available
- Quantile iterator won't get stuck asymptotically chasing quantile 1.0_f64
- More tests

marshallpierce · 2017-10-24T15:08:48Z

src/iterators/mod.rs

-        self.value
+    /// The value iterated to. Some iterators provide a specific value inside the bucket, while
+    /// others just use the highest value in the bucket.
+    pub fn value_iterated_to(&self) -> u64 {


Maybe this rename-for-clarity is not worth the compatibility concern?

I think it's fine.

marshallpierce · 2017-10-24T15:10:45Z

src/iterators/mod.rs


                    // make sure we don't add this index again
                    self.fresh = false;
                }
            }

            // figure out if picker thinks we should yield this value
-            if self.picker.pick(self.current_index, self.total_count_to_index) {
-                let val = self.current();


I felt it was easier to reason at-a-glance about when the iterator's fields were used with this function inlined, since it matters that you read things like count_since_last_iteration before resetting a few lines below.
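The ordering concern can be sketched as follows: the running count has to be read into the yielded value before it is reset. The `Step` struct and `steps` function are hypothetical simplifications, not the code in `src/iterators/mod.rs`.

```rust
/// Hypothetical, simplified iteration value; the field names are assumptions.
#[derive(Debug, PartialEq)]
struct Step {
    index: usize,
    count_since_last_iteration: u64,
}

/// Yields a `Step` at every index for which `pick` returns true.
fn steps(counts: &[u64], pick: impl Fn(usize) -> bool) -> Vec<Step> {
    let mut out = Vec::new();
    let mut count_since_last_iteration = 0u64;
    for (index, &count) in counts.iter().enumerate() {
        count_since_last_iteration = count_since_last_iteration.saturating_add(count);
        if pick(index) {
            // Read the running count into the yielded value first...
            out.push(Step {
                index,
                count_since_last_iteration,
            });
            // ...and only then reset it, so each step starts counting from 0.
            count_since_last_iteration = 0;
        }
    }
    out
}
```

Swapping the two statements inside the `if` would make every step report a count of 0, which is the kind of mistake that is easier to spot when the logic is inlined.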

marshallpierce · 2017-10-24T15:11:10Z

src/iterators/quantile.rs


 /// An iterator that will yield at quantile steps through the histogram's value range.
 pub struct Iter<'a, T: 'a + Counter> {
    hist: &'a Histogram<T>,
    ticks_per_half_distance: u32,
    quantile_to_iterate_to: f64,
-    quantile_just_picked: f64


this is what returning Option<PickMetadata> lets us avoid
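A rough sketch of the shape this takes, under stated assumptions: the argument order follows the `pick(current_index, total_count_to_index, count_at_index)` call visible in the diff, but the argument types, the `PickMetadata` field names, the `Picker` trait name, and both picker structs are hypothetical.

```rust
/// Hypothetical metadata returned by a picker; the optional fields are assumptions.
#[derive(Debug, Clone, PartialEq)]
struct PickMetadata {
    quantile_iterated_to: Option<f64>,
    value_iterated_to: Option<u64>,
}

/// Hypothetical picker trait: `Some(..)` means "yield a value here".
trait Picker {
    fn pick(
        &mut self,
        index: usize,
        total_count_to_index: u64,
        count_at_index: u64,
    ) -> Option<PickMetadata>;
}

/// Picks every index with a nonzero count and has no extra metadata to report.
struct Recorded;

impl Picker for Recorded {
    fn pick(&mut self, _index: usize, _total: u64, count_at_index: u64) -> Option<PickMetadata> {
        if count_at_index == 0 {
            return None;
        }
        Some(PickMetadata {
            quantile_iterated_to: None,
            value_iterated_to: None,
        })
    }
}

/// Picks at fixed quantile steps. Assumes `total_count` is nonzero.
/// Note there is no `quantile_just_picked` field: the quantile is handed
/// back in the return value at the moment it is picked.
struct QuantileSteps {
    total_count: u64,
    next_quantile: f64,
    step: f64,
}

impl Picker for QuantileSteps {
    fn pick(&mut self, _index: usize, total_count_to_index: u64, _count: u64) -> Option<PickMetadata> {
        let quantile_at_index = total_count_to_index as f64 / self.total_count as f64;
        if quantile_at_index < self.next_quantile {
            return None;
        }
        let picked = self.next_quantile;
        self.next_quantile += self.step;
        Some(PickMetadata {
            quantile_iterated_to: Some(picked),
            value_iterated_to: None,
        })
    }
}
```

Because the metadata travels with the return value, the caller cannot ask for it at the wrong time, and pickers that have nothing to report simply leave the optional fields as `None`.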

marshallpierce · 2017-10-24T15:12:28Z

src/iterators/quantile.rs

+            return None;
+        }
+
+        // Because there are effectively two quantiles in play (the quantile of the value for the


comments per LoC might be getting a little out of hand in this struct, but I like to beat historically confusing logic into submission with overwhelming documentation

I have no objection to this comment. Seems well-placed and useful.

marshallpierce · 2017-10-24T15:13:31Z

tests/data_access.rs

@@ -542,365 +539,3 @@ fn total_count_exceeds_bucket_type() {

    assert_eq!(400, h.count());
 }
-
-#[test]


quantile tests moved to their own file

jonhoo · 2017-10-24T15:17:19Z

src/iterators/mod.rs

-
-                // TODO count starting at 0 each time we emit a value to be less prone to overflow
-                self.prev_total_count = self.total_count_to_index;
+            if let Some(metadata) = self.picker.pick(self.current_index, self.total_count_to_index, self.count_at_index) {


I think rustfmt might complain about this line.

Good point. I haven't been rustfmt-ing but I'm happy to adopt it. I don't even really care if its formatting is suboptimal for some aesthetic; consistency is good. I'll format the iterator files.

marshallpierce · 2017-10-24T15:18:21Z

@algermissen here's how the quantile iteration example looks now:

jot 10000 0 10000000000 | cg run --example cli -- serialize -c | cg run --example cli -- iter-quantiles -t 5

       Value          QuantileValue      QuantileIteration TotalCount 1/(1-Quantile)

           0 0.00010000000000000000 0.00000000000000000000          1           1.00
   999292927 0.10000000000000000555 0.10000000000000000555       1000           1.11
  1999634431 0.20000000000000001110 0.20000000000000001110       2000           1.25
  3001024511 0.30009999999999997788 0.30000000000000004441       3001           1.43
...
  9999220735 0.99990000000000001101 0.99985351562500002220       9999        6826.67
  9999220735 0.99990000000000001101 0.99987792968750000000       9999        8192.00
  9999220735 0.99990000000000001101 0.99989013671875004441       9999        9102.22
 10007609343 1.00000000000000000000 0.99990234375000008882      10000       10240.00
 10007609343 1.00000000000000000000 1.00000000000000000000      10000              ∞
#[Mean       = 5000000916.51, StdDeviation   = 2887040392.98]
#[Max        =  10007609343, Total count    =        10000]
#[Buckets    =           54, SubBuckets     =        56320]

The 1/(1-quantile) column is now using the quantile iterated to, not the quantile of the value of the current bucket, so there's only one row with ∞. Also, quantile iteration now skips all intermediate steps to 1.0 once it reaches the last bucket. (Small proviso on that: if total count saturates, it doesn't do the skipping thing. Could be fixed with more fiddling, but this PR was getting complex enough, and probably nobody will ever complain about that...)
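For readers following the numbers, here is a small sketch of how the quantile steps and the last column could be computed. The step formula is an assumption modeled on the Java implementation's "ticks per half distance" idea and chosen because it is consistent with the rows above (steps of 0.1 near the start with `-t 5`, and steps of 1/40960 near 0.9999). All function names are hypothetical, and none of this is copied from the crate.

```rust
/// Step to the next quantile tick (assumed formula, see the text above).
/// Each time the remaining distance to 1.0 halves, the tick size halves too.
fn quantile_increment(quantile: f64, ticks_per_half_distance: u32) -> f64 {
    let inverse = 1.0 / (1.0 - quantile);
    // Number of halvings of the remaining distance so far, plus one.
    // `saturating_add` guards the quantile == 1.0 case, where `inverse` is infinite.
    let halvings = (inverse.log2().floor() as i32).saturating_add(1);
    let total_ticks = f64::from(ticks_per_half_distance) * 2.0_f64.powi(halvings);
    1.0 / total_ticks
}

/// Next quantile to iterate to. Once the index holding the max value has been
/// reached, skip all intermediate steps and go straight to 1.0.
fn next_quantile(current: f64, ticks_per_half_distance: u32, at_max_index: bool) -> f64 {
    if at_max_index {
        1.0
    } else {
        (current + quantile_increment(current, ticks_per_half_distance)).min(1.0)
    }
}

/// Value for the `1/(1-Quantile)` column, computed from the quantile iterated to.
/// At quantile 1.0 this is positive infinity, which is displayed as ∞.
fn inverse_of_remainder(quantile: f64) -> f64 {
    1.0 / (1.0 - quantile)
}
```

Since only the final step iterates to exactly 1.0, only that row yields an infinite value in the last column.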

jonhoo · 2017-10-24T15:21:53Z

This is great! I also think the long comment is useful (not to mention the many extra tests).

See my one comment about rustfmt, but apart from that, lgtm.

This release has a couple of backwards-incompatible changes:

- the old `len()` is now `distinct_values()`
- the new `len()` is the old `count()` (which is deprecated)
- `IterationValue::value` became `value_iterated_to`

Some other API changes:

- iterator values gained `quantile_iterated_to()`
- `Histogram` gained `is_empty()`

Behind the scenes:

- #67 and #68 landed a number of fixes to iterators such that the produced values are more correct and sensible.
- errors were moved into their own module.

algermissen · 2017-10-26T11:43:56Z

@marshallpierce Thanks a lot, the iteration logic definitely looks fine now. The double line at the end is gone for my cases.

One nit: the CLI output adds a column, which breaks the 'hgrm' format defined by the original hdrhistogram, so the CLI output cannot be loaded by the original plot HTML pages. No big deal, just wanted to give that feedback.

marshallpierce · 2017-10-26T12:47:37Z

@algermissen yeah, I'm a little conflicted on that.

Anti-extra-column:

- It's lame to break other tools by tweaking formats.

Pro-extra-column:

- The extended format is a little more comprehensible since it shows why it's possible to have a quantile change without the value changing.
- We shouldn't be parsing a fragile text format for machine processing; we should use tools like https://hdrhistogram.github.io/HdrHistogramJSDemo/plotFiles.html that operate on the serialized histogram.

Overall I'm weakly in favor of keeping the output the way it is (human oriented) but I could be persuaded.

algermissen · 2017-10-26T13:02:14Z

@marshallpierce I agree with you - I am only building a proof of concept throw-away tool, so I am looking for the least amount of work. Otherwise I'd never use the text format.

As I am fine with copy/pasting your loop and removing the col by hand, there's no need for me to persuade you to drop the col :-)

Keep it as it is - it's better - my 2ct

Marshall Pierce and others added 6 commits October 18, 2017 17:39

Failing test to show that iter_recorded stops too soon

5e02609

Pull quantile stuff out of data_access because the quantile stuff is …

95436cb

…porky enough to stand on its own and data_access is a port of a Java test file anyway.

Port iteration tests from the old rust impl

85eadf7

Improve quantile iteration logic, and document its quirkiness.

beec46f

- Supply count at current index to `pick()` since we already have that available
- Quantile iterator won't get stuck asymptotically chasing quantile 1.0_f64
- More tests

Use quantile_iterated_to() in iter-quantiles example.

e250d1b

marshallpierce added 2 commits October 24, 2017 10:39

Apply rustfmt to the iterator code and tests.

dd2fd6c

Merge branch 'master' into iter-recorded-total-count

b9e12c9

marshallpierce merged commit 2758171 into master Oct 24, 2017

marshallpierce deleted the iter-recorded-total-count branch November 1, 2017 01:44
