Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a way to elide diffs #91

Merged
merged 1 commit into from
May 13, 2021
Merged

Add a way to elide diffs #91

merged 1 commit into from
May 13, 2021

Conversation

mcmire
Copy link
Collaborator

@mcmire mcmire commented Jun 19, 2020

When looking at a large diff for which many of the lines do not change,
it can be difficult to locate the lines which do. Text-oriented
diffs such as those you get from a conventional version control system
solve this problem by removing those unchanged lines from the diff
entirely. For instance, here is a section of the README with a line
removed. Notice that only the part of the file we care about, which is
around the line deleted, is displayed in the diff:

diff --git a/README.md b/README.md
index 56b046c..b38f4ca 100644
--- a/README.md
+++ b/README.md
@@ -169,7 +169,6 @@ SuperDiff.configure do |config|
   config.add_extra_differ_class(YourDiffer)
   config.add_extra_operation_tree_builder_class(YourOperationTreeBuilder)
   config.add_extra_operation_tree_class(YourOperationTree)
-  config.add_extra_diff_formatter_class(YourDiffFormatter)
 end

This commit implements a similar feature for data-oriented diffs. It
adds two new configuration options to allow you to control the elision
logic:

  • diff_elision_enabled — The elision logic is disabled by default so
    as not to surprise people, so setting this to true will turn it on.
  • diff_elision_maximum — This number controls what happens to
    unchanged lines (i.e. lines that are neither "insert" lines nor
    "delete" lines) that are in between changed lines. If a section of
    unchanged lines is beyond this number, the gem will elide (a fancy
    word for remove) the data structures within that section as much as
    possible until the limit is reached or it cannot go further. Elided
    lines are replaced with a # ... marker.

Here are a few examples:

Elision enabled

If you add this to your test helper:

SuperDiff.configure do |config|
  config.diff_elision_enabled = true
end

And you have this test:

expected = [
  "Afghanistan",
  "Aland Islands",
  "Albania",
  "American Samoa",
  "Andorra",
  "Angola",
  "Anguilla",
  "Antarctica",
  "Antigua And Barbuda",
  "Argentina",
  "Aruba",
  "Australia"
]
actual = [
  "Afghanistan",
  "Aland Islands",
  "Algeria",
  "American Samoa",
  "Andorra",
  "Angola",
  "Anguilla",
  "Antarctica",
  "Antigua And Barbuda",
  "Armenia",
  "Aruba",
  "Australia"
]
expect(actual).to eq(expected)

Then you will get a diff that looks like:

  [
    # ...
-   "Albania",
+   "Algeria",
    # ...
-   "Argentina",
+   "Armenia",
    "Aruba",
    "Australia"
  ]

Elision enabled and maximum specified

Configuration:

SuperDiff.configure do |config|
  config.diff_elision_enabled = true
  config.diff_elision_maximum = 5
end

Test:

expected = [
  "Afghanistan",
  "Aland Islands",
  "Albania",
  "American Samoa",
  "Andorra",
  "Angola",
  "Anguilla",
  "Antarctica",
  "Antigua And Barbuda",
  "Argentina",
  "Aruba",
  "Australia"
]
actual = [
  "Afghanistan",
  "Aland Islands",
  "Algeria",
  "American Samoa",
  "Andorra",
  "Angola",
  "Anguilla",
  "Antarctica",
  "Antigua And Barbuda",
  "Armenia",
  "Aruba",
  "Australia"
]
expect(actual).to eq(expected)

Resulting diff:

  [
    "Afghanistan",
    "Aland Islands",
-   "Albania",
+   "Algeria",
    "American Samoa",
    "Andorra",
    # ...
    "Antarctica",
    "Antigua And Barbuda",
-   "Argentina",
+   "Armenia",
    "Aruba",
    "Australia"
  ]

Elision enabled and maximum specified, but indentation limits complete elision

Configuration:

SuperDiff.configure do |config|
  config.diff_elision_enabled = true
end

Test:

expected = {
  foo: {
    bar: [
      "one",
      "two",
      "three"
    ],
    baz: "qux",
    fizz: "buzz",
    zing: "bing"
  }
}
actual = [
  foo: {
    bar: [
      "one",
      "two",
      "tree"
    ],
    baz: "qux",
    fizz: "buzz",
    zing: "bing"
  }
]
expect(actual).to eq(expected)

Resulting diff:

  {
    foo: {
      bar: [
        # ...
-       "three"
+       "tree"
      ],
      # ...
    }
  }

Notice how we cannot fully elide all of the unchanged lines in this case
because otherwise the diff would look like this and it wouldn't make
sense:

  # ...
-       "three"
+       "tree"
  # ...

Fixes #75.


IssueHunt Summary

Referenced issues

This pull request has been submitted to:


@mcmire mcmire force-pushed the add-diff-elisions branch 3 times, most recently from e0df503 to f8c35ae Compare June 27, 2020 02:56
@mcmire mcmire changed the title WIP: Elide diffs Add a way to elide diffs Jun 27, 2020
@mcmire mcmire changed the title Add a way to elide diffs WIP: Add a way to elide diffs Jun 27, 2020
@mcmire mcmire marked this pull request as draft June 27, 2020 02:57
@mcmire mcmire marked this pull request as ready for review July 4, 2020 22:10
@mcmire
Copy link
Collaborator Author

mcmire commented Jul 4, 2020

Hey @ndbroadbent! So I've come up with a solution that I think could work. I feel like this is the kind of thing that will need some tweaking, so for now, this functionality is disabled by default, but I've explained above how you can turn it on. There are also a couple of knobs you can play around with. Give this branch a shot in your app and let me know what you think!

@mcmire mcmire changed the title WIP: Add a way to elide diffs Add a way to elide diffs Jul 4, 2020
@ndbroadbent
Copy link

ndbroadbent commented Jul 7, 2020

This is awesome, thanks a lot for working on this!!

I've just tried out the branch in my own project, and I think it's working really well so far! I also tried out the diff_elision_padding option but it looks like this turns off the eliding so it shows the full diff, even if I set: config.diff_elision_padding = 3.

This is a huge improvement though:

SuperDiff.configure do |config|
  config.diff_elision_enabled = true
  config.diff_elision_maximum = 5
end

Produces:

      Diff:

      ┌ (Key) ──────────────────────────┐
      │ ‹-› in expected, not in actual  │
      │ ‹+› in actual, not in expected  │
      │ ‹ › in both expected and actual │
      └─────────────────────────────────┘

        {
          # ...
          "properties" => {
            # ...
      +     "ssn" => {
      +       "type" => "string",
      +       "title" => "Ssn"
      +     },
            # ...
          },
          "required" => [
            # ...
      +     "ssn",
            "stringOptions",
            "textFieldOptions",
            "urlField"
          ],
          "additionalProperties" => true
        }

Here is a full example from one of my tests using:

SuperDiff.configure do |config|
  config.diff_elision_enabled = true
  config.diff_elision_maximum = 0
  config.diff_elision_padding = 2
end

Output: https://gist.github.com/ndbroadbent/73bee150957c21cf762110454941e0c8

This is for a really complex JSON object / Ruby hash (for a JSON schema test), with lots of nested data.

The other thing is the "Expected <...> to eq <...>" error message from RSpec, which is really huge when comparing these two hashes. (As you can see in the gist I posted.) Is this outside the control of super_diff, or do you think there might be a way to elide this as well? (No worries if not, just I thought I should ask about that as well!)

Thanks again, this is really awesome! This is a real test that I'm actually working on right now, so this is already super helpful!

@mcmire
Copy link
Collaborator Author

mcmire commented Aug 27, 2020

Sorry I haven't replied to this yet! I took a break to focus on some other projects. Glad to hear this is working out for you (aside from the maximum thing). I hear you around eliding the "Expected" line — I just tried it out for some work stuff and I'm running into that issue as well. I'm not sure if that should be a configurable thing or not, I'm thinking yes to be consistent, but I'll do something about that.

I've also noticed that lines that are pure additions or deletions don't get elided — which makes sense because this PR doesn't handle them — but I'm wondering whether they should be handled somehow in all of this too. Because even if you don't see unchanged lines in the output, you can still end up with a giant diff if you have an addition of an object or two that has a complex data structure. So... I'll have to think about that too. Maybe after fixing the maximum issue I'll merge this in and then iterate on it in future releases.

@ndbroadbent
Copy link

Hi, no worries at all, and no hurry! Yes, I think it could be useful if the pure additions or deletions were also elided, because that can sometimes happen with some of my tests. Thanks again for all your work!

@mcmire mcmire force-pushed the add-diff-elisions branch 8 times, most recently from c94cbbf to bed972a Compare May 9, 2021 21:24
When looking at a large diff for which many of the lines do not change,
it can be difficult to locate the lines which do. Text-oriented
diffs such as those you get from a conventional version control system
solve this problem by removing those unchanged lines from the diff
entirely. For instance, here is a section of the README with a line
removed. Notice that only the part of the file we care about, which is
around the line deleted, is displayed in the diff:

```
diff --git a/README.md b/README.md
index 56b046c..b38f4ca 100644
--- a/README.md
+++ b/README.md
@@ -169,7 +169,6 @@ SuperDiff.configure do |config|
   config.add_extra_differ_class(YourDiffer)
   config.add_extra_operation_tree_builder_class(YourOperationTreeBuilder)
   config.add_extra_operation_tree_class(YourOperationTree)
-  config.add_extra_diff_formatter_class(YourDiffFormatter)
 end
```

This commit implements a similar feature for data-oriented diffs. It
adds two new configuration options to allow you to control the elision
logic:

* `diff_elision_enabled` — The elision logic is disabled by default so
  as not to surprise people, so setting this to `true` will turn it on.
* `diff_elision_maximum` — This number controls what happens to
   unchanged lines (i.e. lines that are neither "insert" lines nor
   "delete" lines) that are in between changed lines. If a section of
   unchanged lines is beyond this number, the gem will elide (a fancy
   word for remove) the data structures within that section as much as
   possible until the limit is reached or it cannot go further. Elided
   lines are replaced with a `# ...` marker.

Here are a few examples:

\### Elision enabled

If you add this to your test helper:

``` ruby
SuperDiff.configure do |config|
  config.diff_elision_enabled = true
end
```

And you have this test:

``` ruby
expected = [
  "Afghanistan",
  "Aland Islands",
  "Albania",
  "American Samoa",
  "Andorra",
  "Angola",
  "Anguilla",
  "Antarctica",
  "Antigua And Barbuda",
  "Argentina",
  "Aruba",
  "Australia"
]
actual = [
  "Afghanistan",
  "Aland Islands",
  "Algeria",
  "American Samoa",
  "Andorra",
  "Angola",
  "Anguilla",
  "Antarctica",
  "Antigua And Barbuda",
  "Armenia",
  "Aruba",
  "Australia"
]
expect(actual).to eq(expected)
```

Then you will get a diff that looks like:

```
  [
    # ...
-   "Albania",
+   "Algeria",
    # ...
-   "Argentina",
+   "Armenia",
    "Aruba",
    "Australia"
  ]
```

\### Elision enabled and maximum specified

Configuration:

``` ruby
SuperDiff.configure do |config|
  config.diff_elision_enabled = true
  config.diff_elision_maximum = 5
end
```

Test:

```
expected = [
  "Afghanistan",
  "Aland Islands",
  "Albania",
  "American Samoa",
  "Andorra",
  "Angola",
  "Anguilla",
  "Antarctica",
  "Antigua And Barbuda",
  "Argentina",
  "Aruba",
  "Australia"
]
actual = [
  "Afghanistan",
  "Aland Islands",
  "Algeria",
  "American Samoa",
  "Andorra",
  "Angola",
  "Anguilla",
  "Antarctica",
  "Antigua And Barbuda",
  "Armenia",
  "Aruba",
  "Australia"
]
expect(actual).to eq(expected)
```

Resulting diff:

```
  [
    "Afghanistan",
    "Aland Islands",
-   "Albania",
+   "Algeria",
    "American Samoa",
    "Andorra",
    # ...
    "Antarctica",
    "Antigua And Barbuda",
-   "Argentina",
+   "Armenia",
    "Aruba",
    "Australia"
  ]
```

\### Elision enabled and maximum specified, but indentation limits complete elision

Configuration:

``` ruby
SuperDiff.configure do |config|
  config.diff_elision_enabled = true
end
```

Test:

``` ruby
expected = {
  foo: {
    bar: [
      "one",
      "two",
      "three"
    ],
    baz: "qux",
    fizz: "buzz",
    zing: "bing"
  }
}
actual = [
  foo: {
    bar: [
      "one",
      "two",
      "tree"
    ],
    baz: "qux",
    fizz: "buzz",
    zing: "bing"
  }
]
expect(actual).to eq(expected)
```

Resulting diff:

```
  {
    foo: {
      bar: [
        # ...
-       "three"
+       "tree"
      ],
      # ...
    }
  }
```

Notice how we cannot fully elide all of the unchanged lines in this case
because otherwise the diff would look like this and it wouldn't make
sense:

```
  # ...
-       "three"
+       "tree"
  # ...
```
@mcmire
Copy link
Collaborator Author

mcmire commented May 9, 2021

Hey @ndbroadbent! I know it's almost a year later from the original commit 😅 but I've finally tied a bow on this PR. Would you mind double-checking that this branch still works for you? If it does I will merge it!

@ndbroadbent
Copy link

ndbroadbent commented May 13, 2021

Hi @mcmire, that's awesome, thanks for working on this! I've just updated my Gemfile.lock to use the latest commit, and this appears to be working great! It's really useful for some of my "snapshot" tests that compare some really large hashes (e.g. JSON schemas):

Screen Shot 2021-05-13 at 9 30 50 PM

I haven't tried to test lots of edge cases but it looks good to me, and I can let you know if I ever run into any issues in the future. Thanks!

@mcmire mcmire merged commit d18bd14 into master May 13, 2021
@mcmire mcmire deleted the add-diff-elisions branch May 13, 2021 20:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Truncate sections of very long hashes where the contents match, and only show the diff
2 participants