Truncate sections of very long hashes where the contents match, and only show the diff #75

ndbroadbent · 2020-04-10T06:47:18Z

Hello! I've just started using super_diff and it's really awesome!

I have some tests that compare some really huge hashes, and I noticed that super_diff will print the entire hash even if only a few lines have changed. So my test failure output has 3000+ lines, and I have to scroll really far to find the +/- diffs.

Would it be possible to add an option to remove any large sections that match, and just show the diffs with a few lines of context at the top and bottom? It would be great if it could have a sane threshold to do this automatically. E.g. If the total diff output is more than 100 lines long, then only show the diff lines with 10 lines of context on each side.

Would you be interested in adding something like this?

IssueHunt Summary

mcmire has been rewarded.

Backers (Total: $200.00)

docspring ($200.00)

Submitted pull Requests

#91 Add a way to elide diffs

Tips

Checkout the Issuehunt explorer to discover more funded issues.
Need some help from other developers? Add your repositories on IssueHunt to raise funds.

mcmire · 2020-04-11T03:12:38Z

Hey — glad you find this gem useful! Eliding the output is an interesting idea. I agree that sometimes I have trouble looking at large swaths of data as well. One thought that comes to mind is that eliding a diff of two data structures is different from eliding a diff of strings, like the one you might see when looking at a diff for a commit. For instance, say you have a data structure like this:

  [
    {
      "user_id": "18949452",
      "user": {
        "id": 18949452,
        "id_str": "18949452",
        "name": "Financial Times",
        "screen_name": "FT",
        "location": "London",
        "description": "Big stories and breaking news headlines as they are published on https://t.co/5BsmLs9y1Z. For a curated feed of our journalism, follow @financialtimes",
        "url": "http://t.co/dnhLQpd9BY",
        "entities": {
          "url": {
            "urls": [
              {
+               "url": "http://t.co/dnhLQpd9BY",
+               "expanded_url": "http://www.ft.com/",
+               "display_url": "ft.com",
+               "indices": [
+                 0,
+                 22
+               ]
              }
            ]
          },
          "description": {
            "urls": [
              {
                "url": "https://t.co/5BsmLs9y1Z",
                "expanded_url": "http://FT.com",
                "display_url": "FT.com",
                "indices": [
                  65,
                  88
                ]
              }
            ]
          }
        },
        "protected": false,
        "followers_count": 3972876,
        "fast_followers_count": 28,
        "normal_followers_count": 3972848,
        "friends_count": 793,
        "listed_count": 37009,
        "created_at": "Tue Jan 13 19:28:24 +0000 2009",
        "favourites_count": 38,
        "utc_offset": null,
        "time_zone": null,
        "geo_enabled": false,
        "verified": true,
        "statuses_count": 273860,
        "media_count": 51044,
        "lang": null,
        "contributors_enabled": false,
        "is_translator": false,
        "is_translation_enabled": false,
        "profile_background_color": "FFF1E0",
        "profile_background_image_url": "http://abs.twimg.com/images/themes/theme1/bg.png",
        "profile_background_image_url_https": "https://abs.twimg.com/images/themes/theme1/bg.png",
        "profile_background_tile": false,
        "profile_image_url": "http://pbs.twimg.com/profile_images/931156393108885504/EqEMtLhM_normal.jpg",
        "profile_image_url_https": "https://pbs.twimg.com/profile_images/931156393108885504/EqEMtLhM_normal.jpg",
        "profile_banner_url": "https://pbs.twimg.com/profile_banners/18949452/1581526592",
        "profile_image_extensions": {
          "mediaStats": {
            "r": {
              "missing": null
            },
            "ttl": -1
          }
        },
        "profile_banner_extensions": {
          "mediaStats": {
            "r": {
              "missing": null
            },
            "ttl": -1
          }
        },
        "profile_link_color": "9E2F50",
        "profile_sidebar_border_color": "FFFFFF",
        "profile_sidebar_fill_color": "E9DECF",
        "profile_text_color": "000000",
        "profile_use_background_image": true,
        "has_extended_profile": false,
        "default_profile": false,
        "default_profile_image": false,
        "pinned_tweet_ids": [
          1228367685601415200
        ],
        "pinned_tweet_ids_str": [
-         "1228367685601415168"
        ],
        "has_custom_timelines": true,
        "can_dm": null,
        "can_media_tag": true,
        "following": false,
        "follow_request_sent": false,
        "notifications": false,
        "muting": false,
        "blocking": false,
        "blocked_by": false,
        "want_retweets": false,
        "advertiser_account_type": "none",
        "advertiser_account_service_levels": [],
        "profile_interstitial_type": "",
        "business_profile_state": "none",
        "translator_type": "none",
        "followed_by": false,
        "ext": {
          "highlightedLabel": {
            "r": {
              "ok": {}
            },
            "ttl": -1
          }
        },
        "require_some_consent": false
      },
      "token": "117"
    }
  ]

If you were to elide this by showing the 10 lines on each side for each "hunk", then it'd look like:

  [
@@ ... 4 more lines ... @@
        "id_str": "18949452",
        "name": "Financial Times",
        "screen_name": "FT",
        "location": "London",
        "description": "Big stories and breaking news headlines as they are published on https://t.co/5BsmLs9y1Z. For a curated feed of our journalism, follow @financialtimes",
        "url": "http://t.co/dnhLQpd9BY",
        "entities": {
          "url": {
            "urls": [
              {
+               "url": "http://t.co/dnhLQpd9BY",
+               "expanded_url": "http://www.ft.com/",
+               "display_url": "ft.com",
+               "indices": [
+                 0,
+                 22
+               ]
              }
            ]
          },
          "description": {
            "urls": [
              {
                "url": "https://t.co/5BsmLs9y1Z",
                "expanded_url": "http://FT.com",
                "display_url": "FT.com",
@@ ... 51 more lines ... @@
        "profile_sidebar_fill_color": "E9DECF",
        "profile_text_color": "000000",
        "profile_use_background_image": true,
        "has_extended_profile": false,
        "default_profile": false,
        "default_profile_image": false,
        "pinned_tweet_ids": [
          1228367685601415200
        ],
        "pinned_tweet_ids_str": [
-         "1228367685601415168"
        ],
        "has_custom_timelines": true,
        "can_dm": null,
        "can_media_tag": true,
        "following": false,
        "follow_request_sent": false,
        "notifications": false,
        "muting": false,
        "blocking": false,
        "blocked_by": false,
@@ ... 19 more lines ... @@
  ]

That would help a little. But what if we intelligently collapsed some of the inner data structures and/or omitted some of the keys where it made sense?

  [
    {
      "user_id": "18949452",
      "user": {
@@ ... omitted for brevity ... @@
        "entities": {
          "url": {
            "urls": [
              {
+               "url": "http://t.co/dnhLQpd9BY",
+               "expanded_url": "http://www.ft.com/",
+               "display_url": "ft.com",
+               "indices": [
+                 0,
+                 22
+               ]
              }
            ]
          },
          "description": {
@@ ... collapsed for brevity ... @@
          }
        },
@@ ... omitted for brevity ... @@
        "pinned_tweet_ids_str": [
-         "1228367685601415168"
        ],
@@ ... omitted for brevity ... @@
      },
      "token": "117"
    }
  ]

The reason I mention this approach is because it may in fact be easier than the first approach. Why? The way that the gem works is that it builds a sort of AST of the diff and then uses that AST to generate the diff. That is, if you have a diff like:

  {
-   foo: "bar",
+   baz: "qux",
    blurgh: {
+     one: "two",
      three: "four"
    }
  }

the way that this is represented internally is something like:

OperationSequence::Hash.new([
  UnaryOperation.new(
    name: :delete,
    collection: { foo: "bar", blurgh: { three: "four" } },
    key: :foo,
    value: "bar"  
  ),
  UnaryOperation.new(
    name: :insert,
    collection: { baz: "qux", blurgh: { one: "two", three: "four" } },
    key: :baz,
    value: "qux"  
  ),
  BinaryOperation.new(
    name: :change,
    left_collection: { foo: "bar", blurgh: { three: "four" } },
    right_collection: { baz: "qux", blurgh: { one: "two", three: "four" } },
    left_key: :blurgh,
    right_key: :blurgh,
    left_value: { three: "four" },
    right_value: { one: "two", three: "four" },
    child_operations: OperationSequence::Hash.new([
      UnaryOperation.new(
        name: :insert,
        collection: { one: "two", three: "four" },
        key: :one,
        value: "two"
      ),
      UnaryOperation.new(
        name: :noop,
        collection: { one: "two", three: "four" },
        key: :three,
        value: "four"
      )
    ])
  )
])

OperationSequence objects can be turned into strings by way of a DiffFormatter, so effectively the diff string is built from the inside out. Therefore, if you wanted to change how the diff is generated, you'd do it on the DiffFormatter level — as long as it was still centered around a certain data structure. That is, it'd be difficult to go back through the diff as whole and remove lines around additions or deletions, because those additions and deletions are stored inside nodes in the AST. Perhaps there is a way to do it, but I'm not immediately sure how (you'd somehow need to flatten the AST, or something).

Anyway, that's a long way of saying "yes". It just depends on what makes sense here. What do you think?

ndbroadbent · 2020-04-11T06:26:19Z

Hello, thanks for your reply! I think that collapsing some of the inner data structures sounds like a great idea!

I usually don't like to open feature requests without offering to help with a PR, but this seems like this might be quite complicated. Would you like me to try to get something working using the "collapse data structures" approach?

mcmire · 2020-04-11T17:51:58Z

@ndbroadbent Sure, if you want to take a stab at it, go ahead! I agree that this might take a while for someone fresh to the codebase to implement, but I made a diagram of the most important classes in this gem and how they fit together, so hopefully it at least gives you some context behind the explanation I gave above: https://docs.google.com/drawings/d/1nKi4YKXgzzIIM-eY0P4uwjkglmuwlf8nTRFne8QZhBg/edit?usp=sharing. Feel free to ask questions if you get stuck, though, and I can help you out!

ndbroadbent · 2020-05-02T11:31:53Z

Hello again! I'm coming back to this issue because I'm working on another test case where this would have been super helpful. I'm working on some code that generates and processes large JSON schemas, so it's been incredibly helpful to view the diff between two large schemas. (Before super_diff, I was using some ad-hoc debugging code that I would have to update all the time.)

I forked the repo and starting poking around, but I think it's still a little bit beyond my ability, and I also don't have a lot of time at the moment. Is there any chance that my company could sponsor the development of this feature? If this might be an option, then please send me an email at nathan docspring.com! Thanks!

issuehunt-oss · 2020-05-06T05:27:20Z

@docspring has funded $200.00 to this issue.

Submit pull request via IssueHunt to receive this reward.
Want to contribute? Chip in to this issue via IssueHunt.
Checkout the IssueHunt Issue Explorer to see more funded issues.
Need help from developers? Add your repository on IssueHunt to raise funds.

mcmire · 2021-05-13T20:00:55Z

Fixed with the above PR ^

issuehunt-oss · 2021-05-13T20:05:39Z

@mcmire has rewarded $180.00 to @mcmire. See it on IssueHunt

💰 Total deposit: $200.00
🎉 Repository reward(0%): $0.00
🔧 Service fee(10%): $20.00

issuehunt-oss bot added the 💵 Funded on Issuehunt label May 6, 2020

mcmire self-assigned this May 10, 2020

mcmire mentioned this issue Jun 9, 2020

Release version compatible with Ruby 2.7 #89

Closed

mcmire mentioned this issue Jul 4, 2020

Add a way to elide diffs #91

Merged

mcmire closed this as completed in #91 May 13, 2021

issuehunt-oss bot added the 🎁 Rewarded on Issuehunt This issue has been rewarded on Issuehunt label May 13, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Truncate sections of very long hashes where the contents match, and only show the diff #75

Truncate sections of very long hashes where the contents match, and only show the diff #75

ndbroadbent commented Apr 10, 2020 •

edited by issuehunt-oss bot

Loading

mcmire has been rewarded.

Backers (Total: $200.00)

Submitted pull Requests

Tips

mcmire commented Apr 11, 2020 •

edited

Loading

ndbroadbent commented Apr 11, 2020

mcmire commented Apr 11, 2020 •

edited

Loading

ndbroadbent commented May 2, 2020

issuehunt-oss bot commented May 6, 2020

mcmire commented May 13, 2021

issuehunt-oss bot commented May 13, 2021

Truncate sections of very long hashes where the contents match, and only show the diff #75

Truncate sections of very long hashes where the contents match, and only show the diff #75

Comments

ndbroadbent commented Apr 10, 2020 • edited by issuehunt-oss bot Loading

mcmire has been rewarded.

Backers (Total: $200.00)

Submitted pull Requests

Tips

mcmire commented Apr 11, 2020 • edited Loading

ndbroadbent commented Apr 11, 2020

mcmire commented Apr 11, 2020 • edited Loading

ndbroadbent commented May 2, 2020

issuehunt-oss bot commented May 6, 2020

mcmire commented May 13, 2021

issuehunt-oss bot commented May 13, 2021

ndbroadbent commented Apr 10, 2020 •

edited by issuehunt-oss bot

Loading

mcmire commented Apr 11, 2020 •

edited

Loading

mcmire commented Apr 11, 2020 •

edited

Loading