Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Complete overhaul of the difference reporter #124

Merged
merged 1 commit into from
Mar 12, 2019
Merged

Complete overhaul of the difference reporter #124

merged 1 commit into from
Mar 12, 2019

Conversation

dsnet
Copy link
Collaborator

@dsnet dsnet commented Feb 27, 2019

The previous implementation of the reporter simply listed all differences,
each qualified by the full path to the difference.
This method of reporting is exact, but difficult for humans to parse.
It is one of the more common sources of complaints by users and a significant
reason why cmp is not preferred over competing libraries.

This change reimplements the reporter to format the output as a
structured literal in pseudo-Go syntax. The output resembles literals that
the user would likely have in their test code. Differences between the
x and y values are denoted by a '-' or '+' prefix at the start of the line.

An overview of the new implementation is as follows:

  • report.go: The defaultReporter type implements the Reporter interface.

  • report_value: Through the PushStep/PopStep API, the defaultReporter is able
    to contruct an in-memory valueNode tree representing the comparison of
    x and y as cmp.Equal walks the sub-values.

  • report_compare.go: After report_value.go constructs an AST-representation
    of the compared values, report_compare.go formats the valueNode tree as a
    textNode tree, which is the textual output in a tree form.
    Some relevant design decisions include:

    • The format logic goes through effort to avoid printing ignored nodes.
    • Some number of surrounding equal (but not ignored) struct fields,
      slice elements, or map entries are printed for context.
    • cmp.Equal may declare two sub-reflect.Values to be equal, but are
      different values when printed. In order to present a unified view on
      this "equal" node, the logic formats both values and arbitrarily choses
      the one with the shorter string.
    • Transformed nodes are formatted with the pseudo-Go syntax of:
      Inverse(TransformerName, OutputType{...})
      where Inverse is some magical pseudo-function that inverts the
      transformation referred to by TransformerName. The OutputType literal
      is the output of the transformation.
  • report_reflect.go: This contains logic to pretty-print reflect.Values and
    is relied upon by report_compare.go to format the leaves of the tree.
    Note that the leaves of the tree can be any arbitrary Go type and value
    (including cyclic data structures).

  • report_text.go: This contains logic for purely lexicographical formatting
    and is depended upon by the other report_*.go files.

Advantages:

  • The output is more familiar as it uses pseudo-Go syntax for literals
  • It provides context about surrounding struct fields, slice elements, or
    map entries that were equal
  • Inserted and removed elements in a slice are easier to visualize
  • Related diffs lie on the same indentation
  • For diffs in a deeply nested value, the output is easier to visualize
    than having a list of all the full paths to the diff.

Disadvantages:

  • The implementation is drastically more complex.
  • In most cases, the output is longer (though more sparse)

@dsnet dsnet requested a review from cybrcodr February 27, 2019 02:42
@dsnet
Copy link
Collaborator Author

dsnet commented Feb 27, 2019

\cc @neild
\cc @rogpeppe (since you mentioned in the past that you found the reporter output hard to grok)

Take a look at compare_test.go for how the new report diff looks like.

Copy link

@cybrcodr cybrcodr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really cool diff output. Just a first passed on this. I've not reviewed the report*.go files yet.

I'm curious why use the pseudo-function name Inverse? Why not just use the transform name as the pseudo-func name?

cmp/internal/value/zero_test.go Outdated Show resolved Hide resolved
{[16]uint8}[15]:
-: 0x61
+: 0x8f`,
[16]uint8{

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a thought. The nice thing with the previous version is that it has the index, this new one doesn't have it.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea. That's one downside of the new output. In theory, we could put a comment on the right that keeps track of the index.

***{***int}:
-: 0
+: 1`,
&&&int(

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At first glance, I'd expect that this should be ***int instead since the comparison is on the dereferenced value. And the example that uses intPtr() should be &int instead of *int.

But then I realized on types like map[*X]Y, the diff shows the key with pointer value instead of the dereferenced value and also it is not syntactically correct to have map[&X]Y. But perhaps this is an exception, similar to []*T, map[S]*T, but the actual value for *T here would have &T{ ... } anyways unless it is a map key.

So, I think I have a minor preference for ***int on this one.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In Go syntax, * means dereference, while & means to take the address of.

In the previous reporter output, the path was an expression from the root variable to the specific value. In that situation, you want to dereference pointers (hence the * operator).

In the new reporter output, we're constructing a pseudo Go-literal. There is no existing value to de-reference. Instead we are taking the address of the inner-most literal.

@dsnet
Copy link
Collaborator Author

dsnet commented Feb 27, 2019

I'm curious why use the pseudo-function name Inverse? Why not just use the transform name as the pseudo-func name?

If the transform name itself is used directly, it's subtly lying. I need someway to indicate that its the opposite of the transformation. I opted for a magic Inverse function rather than appending an "Inverse" prefix to the transform name since the transform name itself may possible have an "Inverse" prefix.

@dsnet dsnet force-pushed the report-diff branch 2 times, most recently from 6e3e363 to 11e6ded Compare February 27, 2019 10:22
Copy link
Contributor

@rogpeppe rogpeppe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a great improvement in general, thanks.
I haven't had time to review most of the core logic - I mostly just looked at the output.

There's one thing I'd like to bring up. We use cmp a lot in tests, and thus I end up looking at a lot of its output. Almost every time I look at cmp output, I have re-read the key to remind myself of which is + and which is -.

In my head, what you're expecting is old (it's already in the source code) and what you've got is new (it just occurred in the test), so intuitively I expect to see cmp print a description of what things aren't in old (subtracted) and are in new (added).

I understand why cmp chooses the sense that it does (we always write if got != want {) but it still seems wrong to me every time I see it, because diffs are always phrased with - meaning "old" and + meaning "new".

Is there any chance of changing the sense?

cmp/compare_test.go Show resolved Hide resolved
if testflags.Deterministic {
p = 0xdeadf00f // Only used for stable testing purposes
}
return fmt.Sprintf("⟪0x%x⟫", p)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW the Go font (which I use) doesn't seem to contain these bracket characters. How about using the more commonly available "«" and "»" (00AB and 00BB) ?

λ(λ(λ({uint8}))):
-: 0x00
+: 0x01`,
uint8(Inverse(λ, uint16(Inverse(λ, uint32(Inverse(λ, uint64(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be good to document what's going with Inverse here. Also, what if you have a type named "Inverse"? ISTM that could be a bit confusing.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not-sure-if-I'm-serious:

uint8(λ⁻¹(uint16(λ⁻¹(uint32(λ⁻¹(uint64(

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any type named Inverse will at least be qualified by the package name (e.g., mypkg.Inverse).

The λ⁻¹ is interesting. I'll see how it renders on a number of different terminals.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice idea, but FWIW that superscript minus doesn't render correctly on the browser on my phone or in my terminal/editor (using the Go font).

-: map[*testprotos.Stringer]*testprotos.Stringer{s"hello": s"world"}
+: map[*testprotos.Stringer]*testprotos.Stringer(nil)`,
map[*testprotos.Stringer]*testprotos.Stringer(
- {⟪0xdeadf00f⟫: s"world"},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assuming this s" notation implies "this string is the output of a String method", it could probably use some documentation as that's not necessarily clear.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

cmp/compare_test.go Show resolved Hide resolved
// IsZero reports whether v is the zero value.
// This does not rely on Interface and so can be used on unexported fields.
func IsZero(v reflect.Value) bool {
switch v.Kind() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this cope with reflect.Invalid too (e.g. when IsZero is called in reflect.ValueOf(nil))?

in interface{}
want bool
}{
{0, true},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about a test case or two for structs?

cmp/report_reflect.go Show resolved Hide resolved
cmp/report_reflect.go Show resolved Hide resolved
cmp/report_value.go Show resolved Hide resolved
@dsnet
Copy link
Collaborator Author

dsnet commented Feb 27, 2019

@rogpeppe

I understand why cmp chooses the sense that it does (we always write if got != want {) but it still seems wrong to me every time I see it, because diffs are always phrased with - meaning "old" and + meaning "new".

Is there any chance of changing the sense?

This was debated very heavily inside Google. The current stylistic recommendation is to manually provide the (-want +got) legend in the test printout.

That said, whether to use - or + or got or want is kind of beyond the scope of this PR. Want to file an issue for it?

The previous implementation of the reporter simply listed all differences,
each qualified by the full path to the difference.
This method of reporting is exact, but difficult for humans to parse.
It is one of the more common sources of complaints by users and a significant
reason why cmp is not preferred over competing libraries.

This change reimplements the reporter to format the output as a
structured literal in pseudo-Go syntax. The output resembles literals that
the user would likely have in their test code. Differences between the
x and y values are denoted by a '-' or '+' prefix at the start of the line.

An overview of the new implementation is as follows:

* report.go: The defaultReporter type implements the Reporter interface.

* report_value: Through the PushStep/PopStep API, the defaultReporter is able
to contruct an in-memory valueNode tree representing the comparison of
x and y as cmp.Equal walks the sub-values.

* report_compare.go: After report_value.go constructs an AST-representation
of the compared values, report_compare.go formats the valueNode tree as a
textNode tree, which is the textual output in a tree form.
Some relevant design decisions include:
	* The format logic goes through effort to avoid printing ignored nodes.
	* Some number of surrounding equal (but not ignored) struct fields,
	slice elements, or map entries are printed for context.
	* cmp.Equal may declare two sub-reflect.Values to be equal, but are
	different values when printed. In order to present a unified view on
	this "equal" node, the logic formats both values and arbitrarily choses
	the one with the shorter string.
	* Transformed nodes are formatted with the pseudo-Go syntax of:
		Inverse(TransformerName, OutputType{...})
	where Inverse is some magical pseudo-function that inverts the
	transformation referred to by TransformerName. The OutputType literal
	is the output of the transformation.

* report_reflect.go: This contains logic to pretty-print reflect.Values and
is relied upon by report_compare.go to format the leaves of the tree.
Note that the leaves of the tree can be any arbitrary Go type and value
(including cyclic data structures).

* report_text.go: This contains logic for purely lexicographical formatting
and is depended upon by the other report_*.go files.

Advantages:
* The output is more familiar as it uses pseudo-Go syntax for literals
* It provides context about surrounding struct fields, slice elements, or
map entries that were equal
* Inserted and removed elements in a slice are easier to visualize
* Related diffs lie on the same indentation
* For diffs in a deeply nested value, the output is easier to visualize
than having a list of all the full paths to the diff.

Disadvantages:
* The implementation is drastically more complex.
* In most cases, the output is longer (though more sparse)
@dsnet
Copy link
Collaborator Author

dsnet commented Mar 12, 2019

Merging this. The exact way inverse and pointers and stuff are represented can be addressed in a future PR.

@dsnet dsnet merged commit 2940eda into master Mar 12, 2019
@dsnet dsnet deleted the report-diff branch March 12, 2019 00:36
type indentMode int

func (n indentMode) appendIndent(b []byte, d diffMode) []byte {
if flags.Deterministic || randBool {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why?

I was using the output of cmp.Diff in a Go runnable example, and it was failing intermittently. I'd like to understand why do this.

(Sorry, not sure whether it's OK to comment in an old pull request... I got here looking for rationale.)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The output of cmp.Diff is not defined. It can and will change over time. For example, this CL completely changed the output; the new output is better, but the change breaks any tests which depended on the old one.

Making the output trivially nondeterministic ensures that anyone depending on stable output from cmp.Diff will discover the problem immediately rather than at some future point in time when the output changes in a more significant way.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dneil is correct. I mailed out #145 to document this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants