Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Truncate sections of very long hashes where the contents match, and only show the diff #75

Closed
ndbroadbent opened this issue Apr 10, 2020 · 7 comments · Fixed by #91
Closed
Assignees
Labels
🎁 Rewarded on Issuehunt This issue has been rewarded on Issuehunt

Comments

@ndbroadbent
Copy link

ndbroadbent commented Apr 10, 2020

Issuehunt badges

Hello! I've just started using super_diff and it's really awesome!

I have some tests that compare some really huge hashes, and I noticed that super_diff will print the entire hash even if only a few lines have changed. So my test failure output has 3000+ lines, and I have to scroll really far to find the +/- diffs.

Would it be possible to add an option to remove any large sections that match, and just show the diffs with a few lines of context at the top and bottom? It would be great if it could have a sane threshold to do this automatically. E.g. If the total diff output is more than 100 lines long, then only show the diff lines with 10 lines of context on each side.

Would you be interested in adding something like this?


IssueHunt Summary

mcmire mcmire has been rewarded.

Backers (Total: $200.00)

Submitted pull Requests


Tips

@mcmire
Copy link
Collaborator

mcmire commented Apr 11, 2020

Hey — glad you find this gem useful! Eliding the output is an interesting idea. I agree that sometimes I have trouble looking at large swaths of data as well. One thought that comes to mind is that eliding a diff of two data structures is different from eliding a diff of strings, like the one you might see when looking at a diff for a commit. For instance, say you have a data structure like this:

  [
    {
      "user_id": "18949452",
      "user": {
        "id": 18949452,
        "id_str": "18949452",
        "name": "Financial Times",
        "screen_name": "FT",
        "location": "London",
        "description": "Big stories and breaking news headlines as they are published on https://t.co/5BsmLs9y1Z. For a curated feed of our journalism, follow @financialtimes",
        "url": "http://t.co/dnhLQpd9BY",
        "entities": {
          "url": {
            "urls": [
              {
+               "url": "http://t.co/dnhLQpd9BY",
+               "expanded_url": "http://www.ft.com/",
+               "display_url": "ft.com",
+               "indices": [
+                 0,
+                 22
+               ]
              }
            ]
          },
          "description": {
            "urls": [
              {
                "url": "https://t.co/5BsmLs9y1Z",
                "expanded_url": "http://FT.com",
                "display_url": "FT.com",
                "indices": [
                  65,
                  88
                ]
              }
            ]
          }
        },
        "protected": false,
        "followers_count": 3972876,
        "fast_followers_count": 28,
        "normal_followers_count": 3972848,
        "friends_count": 793,
        "listed_count": 37009,
        "created_at": "Tue Jan 13 19:28:24 +0000 2009",
        "favourites_count": 38,
        "utc_offset": null,
        "time_zone": null,
        "geo_enabled": false,
        "verified": true,
        "statuses_count": 273860,
        "media_count": 51044,
        "lang": null,
        "contributors_enabled": false,
        "is_translator": false,
        "is_translation_enabled": false,
        "profile_background_color": "FFF1E0",
        "profile_background_image_url": "http://abs.twimg.com/images/themes/theme1/bg.png",
        "profile_background_image_url_https": "https://abs.twimg.com/images/themes/theme1/bg.png",
        "profile_background_tile": false,
        "profile_image_url": "http://pbs.twimg.com/profile_images/931156393108885504/EqEMtLhM_normal.jpg",
        "profile_image_url_https": "https://pbs.twimg.com/profile_images/931156393108885504/EqEMtLhM_normal.jpg",
        "profile_banner_url": "https://pbs.twimg.com/profile_banners/18949452/1581526592",
        "profile_image_extensions": {
          "mediaStats": {
            "r": {
              "missing": null
            },
            "ttl": -1
          }
        },
        "profile_banner_extensions": {
          "mediaStats": {
            "r": {
              "missing": null
            },
            "ttl": -1
          }
        },
        "profile_link_color": "9E2F50",
        "profile_sidebar_border_color": "FFFFFF",
        "profile_sidebar_fill_color": "E9DECF",
        "profile_text_color": "000000",
        "profile_use_background_image": true,
        "has_extended_profile": false,
        "default_profile": false,
        "default_profile_image": false,
        "pinned_tweet_ids": [
          1228367685601415200
        ],
        "pinned_tweet_ids_str": [
-         "1228367685601415168"
        ],
        "has_custom_timelines": true,
        "can_dm": null,
        "can_media_tag": true,
        "following": false,
        "follow_request_sent": false,
        "notifications": false,
        "muting": false,
        "blocking": false,
        "blocked_by": false,
        "want_retweets": false,
        "advertiser_account_type": "none",
        "advertiser_account_service_levels": [],
        "profile_interstitial_type": "",
        "business_profile_state": "none",
        "translator_type": "none",
        "followed_by": false,
        "ext": {
          "highlightedLabel": {
            "r": {
              "ok": {}
            },
            "ttl": -1
          }
        },
        "require_some_consent": false
      },
      "token": "117"
    }
  ]

If you were to elide this by showing the 10 lines on each side for each "hunk", then it'd look like:

  [
@@ ... 4 more lines ... @@
        "id_str": "18949452",
        "name": "Financial Times",
        "screen_name": "FT",
        "location": "London",
        "description": "Big stories and breaking news headlines as they are published on https://t.co/5BsmLs9y1Z. For a curated feed of our journalism, follow @financialtimes",
        "url": "http://t.co/dnhLQpd9BY",
        "entities": {
          "url": {
            "urls": [
              {
+               "url": "http://t.co/dnhLQpd9BY",
+               "expanded_url": "http://www.ft.com/",
+               "display_url": "ft.com",
+               "indices": [
+                 0,
+                 22
+               ]
              }
            ]
          },
          "description": {
            "urls": [
              {
                "url": "https://t.co/5BsmLs9y1Z",
                "expanded_url": "http://FT.com",
                "display_url": "FT.com",
@@ ... 51 more lines ... @@
        "profile_sidebar_fill_color": "E9DECF",
        "profile_text_color": "000000",
        "profile_use_background_image": true,
        "has_extended_profile": false,
        "default_profile": false,
        "default_profile_image": false,
        "pinned_tweet_ids": [
          1228367685601415200
        ],
        "pinned_tweet_ids_str": [
-         "1228367685601415168"
        ],
        "has_custom_timelines": true,
        "can_dm": null,
        "can_media_tag": true,
        "following": false,
        "follow_request_sent": false,
        "notifications": false,
        "muting": false,
        "blocking": false,
        "blocked_by": false,
@@ ... 19 more lines ... @@
  ]

That would help a little. But what if we intelligently collapsed some of the inner data structures and/or omitted some of the keys where it made sense?

  [
    {
      "user_id": "18949452",
      "user": {
@@ ... omitted for brevity ... @@
        "entities": {
          "url": {
            "urls": [
              {
+               "url": "http://t.co/dnhLQpd9BY",
+               "expanded_url": "http://www.ft.com/",
+               "display_url": "ft.com",
+               "indices": [
+                 0,
+                 22
+               ]
              }
            ]
          },
          "description": {
@@ ... collapsed for brevity ... @@
          }
        },
@@ ... omitted for brevity ... @@
        "pinned_tweet_ids_str": [
-         "1228367685601415168"
        ],
@@ ... omitted for brevity ... @@
      },
      "token": "117"
    }
  ]

The reason I mention this approach is because it may in fact be easier than the first approach. Why? The way that the gem works is that it builds a sort of AST of the diff and then uses that AST to generate the diff. That is, if you have a diff like:

  {
-   foo: "bar",
+   baz: "qux",
    blurgh: {
+     one: "two",
      three: "four"
    }
  }

the way that this is represented internally is something like:

OperationSequence::Hash.new([
  UnaryOperation.new(
    name: :delete,
    collection: { foo: "bar", blurgh: { three: "four" } },
    key: :foo,
    value: "bar"  
  ),
  UnaryOperation.new(
    name: :insert,
    collection: { baz: "qux", blurgh: { one: "two", three: "four" } },
    key: :baz,
    value: "qux"  
  ),
  BinaryOperation.new(
    name: :change,
    left_collection: { foo: "bar", blurgh: { three: "four" } },
    right_collection: { baz: "qux", blurgh: { one: "two", three: "four" } },
    left_key: :blurgh,
    right_key: :blurgh,
    left_value: { three: "four" },
    right_value: { one: "two", three: "four" },
    child_operations: OperationSequence::Hash.new([
      UnaryOperation.new(
        name: :insert,
        collection: { one: "two", three: "four" },
        key: :one,
        value: "two"
      ),
      UnaryOperation.new(
        name: :noop,
        collection: { one: "two", three: "four" },
        key: :three,
        value: "four"
      )
    ])
  )
])

OperationSequence objects can be turned into strings by way of a DiffFormatter, so effectively the diff string is built from the inside out. Therefore, if you wanted to change how the diff is generated, you'd do it on the DiffFormatter level — as long as it was still centered around a certain data structure. That is, it'd be difficult to go back through the diff as whole and remove lines around additions or deletions, because those additions and deletions are stored inside nodes in the AST. Perhaps there is a way to do it, but I'm not immediately sure how (you'd somehow need to flatten the AST, or something).

Anyway, that's a long way of saying "yes". It just depends on what makes sense here. What do you think?

@ndbroadbent
Copy link
Author

Hello, thanks for your reply! I think that collapsing some of the inner data structures sounds like a great idea!

I usually don't like to open feature requests without offering to help with a PR, but this seems like this might be quite complicated. Would you like me to try to get something working using the "collapse data structures" approach?

@mcmire
Copy link
Collaborator

mcmire commented Apr 11, 2020

@ndbroadbent Sure, if you want to take a stab at it, go ahead! I agree that this might take a while for someone fresh to the codebase to implement, but I made a diagram of the most important classes in this gem and how they fit together, so hopefully it at least gives you some context behind the explanation I gave above: https://docs.google.com/drawings/d/1nKi4YKXgzzIIM-eY0P4uwjkglmuwlf8nTRFne8QZhBg/edit?usp=sharing. Feel free to ask questions if you get stuck, though, and I can help you out!

@ndbroadbent
Copy link
Author

Hello again! I'm coming back to this issue because I'm working on another test case where this would have been super helpful. I'm working on some code that generates and processes large JSON schemas, so it's been incredibly helpful to view the diff between two large schemas. (Before super_diff, I was using some ad-hoc debugging code that I would have to update all the time.)

I forked the repo and starting poking around, but I think it's still a little bit beyond my ability, and I also don't have a lot of time at the moment. Is there any chance that my company could sponsor the development of this feature? If this might be an option, then please send me an email at nathan docspring.com! Thanks!

@issuehunt-oss
Copy link

issuehunt-oss bot commented May 6, 2020

@docspring has funded $200.00 to this issue.


@mcmire
Copy link
Collaborator

mcmire commented May 13, 2021

Fixed with the above PR ^

@issuehunt-oss
Copy link

issuehunt-oss bot commented May 13, 2021

@mcmire has rewarded $180.00 to @mcmire. See it on IssueHunt

  • 💰 Total deposit: $200.00
  • 🎉 Repository reward(0%): $0.00
  • 🔧 Service fee(10%): $20.00

@issuehunt-oss issuehunt-oss bot added the 🎁 Rewarded on Issuehunt This issue has been rewarded on Issuehunt label May 13, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🎁 Rewarded on Issuehunt This issue has been rewarded on Issuehunt
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants