-
Notifications
You must be signed in to change notification settings - Fork 53
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Truncate sections of very long hashes where the contents match, and only show the diff #75
Comments
Hey — glad you find this gem useful! Eliding the output is an interesting idea. I agree that sometimes I have trouble looking at large swaths of data as well. One thought that comes to mind is that eliding a diff of two data structures is different from eliding a diff of strings, like the one you might see when looking at a diff for a commit. For instance, say you have a data structure like this: [
{
"user_id": "18949452",
"user": {
"id": 18949452,
"id_str": "18949452",
"name": "Financial Times",
"screen_name": "FT",
"location": "London",
"description": "Big stories and breaking news headlines as they are published on https://t.co/5BsmLs9y1Z. For a curated feed of our journalism, follow @financialtimes",
"url": "http://t.co/dnhLQpd9BY",
"entities": {
"url": {
"urls": [
{
+ "url": "http://t.co/dnhLQpd9BY",
+ "expanded_url": "http://www.ft.com/",
+ "display_url": "ft.com",
+ "indices": [
+ 0,
+ 22
+ ]
}
]
},
"description": {
"urls": [
{
"url": "https://t.co/5BsmLs9y1Z",
"expanded_url": "http://FT.com",
"display_url": "FT.com",
"indices": [
65,
88
]
}
]
}
},
"protected": false,
"followers_count": 3972876,
"fast_followers_count": 28,
"normal_followers_count": 3972848,
"friends_count": 793,
"listed_count": 37009,
"created_at": "Tue Jan 13 19:28:24 +0000 2009",
"favourites_count": 38,
"utc_offset": null,
"time_zone": null,
"geo_enabled": false,
"verified": true,
"statuses_count": 273860,
"media_count": 51044,
"lang": null,
"contributors_enabled": false,
"is_translator": false,
"is_translation_enabled": false,
"profile_background_color": "FFF1E0",
"profile_background_image_url": "http://abs.twimg.com/images/themes/theme1/bg.png",
"profile_background_image_url_https": "https://abs.twimg.com/images/themes/theme1/bg.png",
"profile_background_tile": false,
"profile_image_url": "http://pbs.twimg.com/profile_images/931156393108885504/EqEMtLhM_normal.jpg",
"profile_image_url_https": "https://pbs.twimg.com/profile_images/931156393108885504/EqEMtLhM_normal.jpg",
"profile_banner_url": "https://pbs.twimg.com/profile_banners/18949452/1581526592",
"profile_image_extensions": {
"mediaStats": {
"r": {
"missing": null
},
"ttl": -1
}
},
"profile_banner_extensions": {
"mediaStats": {
"r": {
"missing": null
},
"ttl": -1
}
},
"profile_link_color": "9E2F50",
"profile_sidebar_border_color": "FFFFFF",
"profile_sidebar_fill_color": "E9DECF",
"profile_text_color": "000000",
"profile_use_background_image": true,
"has_extended_profile": false,
"default_profile": false,
"default_profile_image": false,
"pinned_tweet_ids": [
1228367685601415200
],
"pinned_tweet_ids_str": [
- "1228367685601415168"
],
"has_custom_timelines": true,
"can_dm": null,
"can_media_tag": true,
"following": false,
"follow_request_sent": false,
"notifications": false,
"muting": false,
"blocking": false,
"blocked_by": false,
"want_retweets": false,
"advertiser_account_type": "none",
"advertiser_account_service_levels": [],
"profile_interstitial_type": "",
"business_profile_state": "none",
"translator_type": "none",
"followed_by": false,
"ext": {
"highlightedLabel": {
"r": {
"ok": {}
},
"ttl": -1
}
},
"require_some_consent": false
},
"token": "117"
}
]
If you were to elide this by showing the 10 lines on each side for each "hunk", then it'd look like: [
@@ ... 4 more lines ... @@
"id_str": "18949452",
"name": "Financial Times",
"screen_name": "FT",
"location": "London",
"description": "Big stories and breaking news headlines as they are published on https://t.co/5BsmLs9y1Z. For a curated feed of our journalism, follow @financialtimes",
"url": "http://t.co/dnhLQpd9BY",
"entities": {
"url": {
"urls": [
{
+ "url": "http://t.co/dnhLQpd9BY",
+ "expanded_url": "http://www.ft.com/",
+ "display_url": "ft.com",
+ "indices": [
+ 0,
+ 22
+ ]
}
]
},
"description": {
"urls": [
{
"url": "https://t.co/5BsmLs9y1Z",
"expanded_url": "http://FT.com",
"display_url": "FT.com",
@@ ... 51 more lines ... @@
"profile_sidebar_fill_color": "E9DECF",
"profile_text_color": "000000",
"profile_use_background_image": true,
"has_extended_profile": false,
"default_profile": false,
"default_profile_image": false,
"pinned_tweet_ids": [
1228367685601415200
],
"pinned_tweet_ids_str": [
- "1228367685601415168"
],
"has_custom_timelines": true,
"can_dm": null,
"can_media_tag": true,
"following": false,
"follow_request_sent": false,
"notifications": false,
"muting": false,
"blocking": false,
"blocked_by": false,
@@ ... 19 more lines ... @@
] That would help a little. But what if we intelligently collapsed some of the inner data structures and/or omitted some of the keys where it made sense? [
{
"user_id": "18949452",
"user": {
@@ ... omitted for brevity ... @@
"entities": {
"url": {
"urls": [
{
+ "url": "http://t.co/dnhLQpd9BY",
+ "expanded_url": "http://www.ft.com/",
+ "display_url": "ft.com",
+ "indices": [
+ 0,
+ 22
+ ]
}
]
},
"description": {
@@ ... collapsed for brevity ... @@
}
},
@@ ... omitted for brevity ... @@
"pinned_tweet_ids_str": [
- "1228367685601415168"
],
@@ ... omitted for brevity ... @@
},
"token": "117"
}
] The reason I mention this approach is because it may in fact be easier than the first approach. Why? The way that the gem works is that it builds a sort of AST of the diff and then uses that AST to generate the diff. That is, if you have a diff like: {
- foo: "bar",
+ baz: "qux",
blurgh: {
+ one: "two",
three: "four"
}
} the way that this is represented internally is something like: OperationSequence::Hash.new([
UnaryOperation.new(
name: :delete,
collection: { foo: "bar", blurgh: { three: "four" } },
key: :foo,
value: "bar"
),
UnaryOperation.new(
name: :insert,
collection: { baz: "qux", blurgh: { one: "two", three: "four" } },
key: :baz,
value: "qux"
),
BinaryOperation.new(
name: :change,
left_collection: { foo: "bar", blurgh: { three: "four" } },
right_collection: { baz: "qux", blurgh: { one: "two", three: "four" } },
left_key: :blurgh,
right_key: :blurgh,
left_value: { three: "four" },
right_value: { one: "two", three: "four" },
child_operations: OperationSequence::Hash.new([
UnaryOperation.new(
name: :insert,
collection: { one: "two", three: "four" },
key: :one,
value: "two"
),
UnaryOperation.new(
name: :noop,
collection: { one: "two", three: "four" },
key: :three,
value: "four"
)
])
)
]) OperationSequence objects can be turned into strings by way of a DiffFormatter, so effectively the diff string is built from the inside out. Therefore, if you wanted to change how the diff is generated, you'd do it on the DiffFormatter level — as long as it was still centered around a certain data structure. That is, it'd be difficult to go back through the diff as whole and remove lines around additions or deletions, because those additions and deletions are stored inside nodes in the AST. Perhaps there is a way to do it, but I'm not immediately sure how (you'd somehow need to flatten the AST, or something). Anyway, that's a long way of saying "yes". It just depends on what makes sense here. What do you think? |
Hello, thanks for your reply! I think that collapsing some of the inner data structures sounds like a great idea! I usually don't like to open feature requests without offering to help with a PR, but this seems like this might be quite complicated. Would you like me to try to get something working using the "collapse data structures" approach? |
@ndbroadbent Sure, if you want to take a stab at it, go ahead! I agree that this might take a while for someone fresh to the codebase to implement, but I made a diagram of the most important classes in this gem and how they fit together, so hopefully it at least gives you some context behind the explanation I gave above: https://docs.google.com/drawings/d/1nKi4YKXgzzIIM-eY0P4uwjkglmuwlf8nTRFne8QZhBg/edit?usp=sharing. Feel free to ask questions if you get stuck, though, and I can help you out! |
Hello again! I'm coming back to this issue because I'm working on another test case where this would have been super helpful. I'm working on some code that generates and processes large JSON schemas, so it's been incredibly helpful to view the diff between two large schemas. (Before super_diff, I was using some ad-hoc debugging code that I would have to update all the time.) I forked the repo and starting poking around, but I think it's still a little bit beyond my ability, and I also don't have a lot of time at the moment. Is there any chance that my company could sponsor the development of this feature? If this might be an option, then please send me an email at nathan docspring.com! Thanks! |
@docspring has funded $200.00 to this issue.
|
Fixed with the above PR ^ |
@mcmire has rewarded $180.00 to @mcmire. See it on IssueHunt
|
Hello! I've just started using super_diff and it's really awesome!
I have some tests that compare some really huge hashes, and I noticed that super_diff will print the entire hash even if only a few lines have changed. So my test failure output has 3000+ lines, and I have to scroll really far to find the +/- diffs.
Would it be possible to add an option to remove any large sections that match, and just show the diffs with a few lines of context at the top and bottom? It would be great if it could have a sane threshold to do this automatically. E.g. If the total diff output is more than 100 lines long, then only show the diff lines with 10 lines of context on each side.
Would you be interested in adding something like this?
IssueHunt Summary
mcmire has been rewarded.
Backers (Total: $200.00)
Submitted pull Requests
Tips
The text was updated successfully, but these errors were encountered: