Show function context for diff hunks. #402

sesse · 2022-10-07T20:28:24Z

This fixes a difftastic shortcoming relative to “git diff” (or GNU diff's -p, --show-c-function); it shows the context for a given hunk (which function it is in, assuming it actually starts in one). Based on practical testing, it seems to work well with at least Rust, C, C++, Kotlin, Python and Swift, but probably also for more languages.

Unlike git diff and GNU diff, it does not get confused by the function definition being split over multiple lines, since it uses the tree-sitter grammar to find out what is a function definition, which should be much more robust than the textual heuristics diff uses.

There are a number of shortcomings:

Context is currently only available in side-by-side mode, not inline.
The source code is re-parsed using tree-sitter at display time; we could probably reuse the tree from before it was converted into Difftastic's syntax tree, at the expense of holding onto more RAM during the diffing.
If the context is not the same on both sides, difftastic won't try to show both, but instead show the language name as before. The typical failure case for this is if the function changed name or function arguments between the two versions.
Certain language grammars, such as Julia and elisp, don't have a clear node that delineates the function body from the name and arguments, which will cause the entire function definition to be treated as the context (which thus usually means it won't be shown at all; see the previous point).
The joining of tokens across newlines can cause somewhat unidiomatic whitespace to be added, such as fn foo( arg) instead of fn foo(arg) if “arg” is on a separate line.

Despite all of this, it seems to work very well in practice, from practical experience during real development over a couple of days.

Fixes #304.

sesse · 2022-10-07T20:45:05Z

Example output:

This fixes a difftastic shortcoming relative to “git diff” (or GNU diff's -p, --show-c-function); it shows the context for a given hunk (which function it is in, assuming it actually starts in one). Based on practical testing, it seems to work well with at least Rust, C, C++, Kotlin, Python and Swift, but probably also for more languages. Unlike git diff and GNU diff, it does not get confused by the function definition being split over multiple lines, since it uses the tree-sitter grammar to find out what is a function definition, which should be much more robust than the textual heuristics diff uses. There are a number of shortcomings: - Context is currently only available in side-by-side mode, not inline. - If the context is not the same on both sides, difftastic won't try to show both, but instead show the language name as before. The typical failure case for this is if the function changed name or function arguments between the two versions. - Certain language grammars, such as Julia and elisp, don't have a clear node that delineates the function body from the name and arguments, which will cause the entire function definition to be treated as the context (which thus usually means it won't be shown at all; see the previous point). - The joining of tokens across newlines can cause somewhat unidiomatic whitespace to be added, such as fn foo( arg) instead of fn foo(arg) if “arg” is on a separate line. Despite all of this, it seems to work very well in practice, from practical experience during real development over a couple of days. Fixes #304.

sesse · 2022-10-16T11:56:18Z

I modified this a bit to be faster, avoiding the re-parse. (It took more CPU time than I had expected, from a profiling run, so it's better to keep the tree and send it through to the display function.)

sesse · 2022-11-23T11:57:39Z

Over six weeks with no response; I'm giving up these PRs.

Wilfred · 2022-11-23T22:55:29Z

Hi @sesse! I really appreciate your PRs. Just so you know, I have a 7 week old baby so difftastic is a much lower priority right now.

#409 seems totally reasonable, #402 is definitely something I want (preferably without an extra parse), and #420 is an incredible achievement. I tinkered with bidirectional dijkstra myself at one point but getting the reverse edges set up and the termination condition correct is hard. I'm really excited to look at your visualisations more closely too.

I want to balance this with #395, another PR with a ton of perf benefits that I haven't had much time to look at yet. That's lots of smaller optimisations, and I'm trying to merge as many as possible whilst still making the graph algorithm easy to hack on for e.g. A* or bidirectional changes.

If you'd like to reopen the PRs you're very welcome to, but if not I understand too. I just wanted to thank you for contributions regardless :)

sesse · 2022-11-24T08:29:07Z

Hi Wilfred,

First of all, congratulations on your baby. I understand that this moves a lot of things around in one's life and requires a ton of attention and energy, and that it makes it really hard to tend to mundane issues such as volunteer OSS work.

No OSS maintainer has an obligation towards their users. I've been on the other end of this bargain a number of times (the open-source one, not the baby one), and I know how much work it is to deal with the constant stream of more-or-less good-intentioned demands and patches that come in. If one sends in a patch and has it refused, or a bug that the maintainer won't prioritize, the reasonable thing to do is just to part ways and leave the maintainer alone. But this cuts both ways; there's no obligation to fight for one's patch. Both sides are largely devoid of glory when it comes to day-to-day work; the only thing we really get is a feeling of recognition and perhaps slightly better software for everyone. So when you send something and it just gets the silence game while you see there's other activity in the repo, the fun disappears :-) All I'd want is something like “hi, I'm really busy and I don't want to merge this right away, please wait a couple of months”.

As for the patches, #402 already has no extra parse (the first version had, but I fixed that fairly early on), and I believe a well-tuned A* heuristic would trounce the approach in #420 (looking at the visualizations, the search explores a bunch of nodes one would hope it could not; I believe the basic idea of changing from pointers to node IDs make a lot of sense, though, and I think QuarticCat has a similar idea somewhere in his patch chain). The PRs are closed, not deleted—if you want to take a look at some future point, fix whatever beginner-Rust mistakes I've made and cherry-pick in the parts you want, you are of course welcome to do that. But I've cut my losses, in a sense, so I'm just running my own private fork with the parts that I want. Which isn't a declaration of war, it's just how things have to be sometimes when real life interferes.

All the best :-)

QuarticCat · 2022-12-01T04:15:53Z

I want to balance this with #395, another PR with a ton of perf benefits that I haven't had much time to look at yet.

This PR doesn't seem to conflict with #395 and #401 and LGTM.

QuarticCat · 2022-12-01T04:31:50Z

Maybe it's time to find a collaborator :)

Move corpus

sesse closed this Nov 23, 2022

hugo-vrijswijk pushed a commit to hugo-vrijswijk/difftastic that referenced this pull request Jul 23, 2024

Merge pull request Wilfred#402 from eed3si9n/wip/corpus

599d12b

Move corpus

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Show function context for diff hunks. #402

Show function context for diff hunks. #402

sesse commented Oct 7, 2022

sesse commented Oct 7, 2022

sesse commented Oct 16, 2022

sesse commented Nov 23, 2022

Wilfred commented Nov 23, 2022 •

edited

Loading

sesse commented Nov 24, 2022

QuarticCat commented Dec 1, 2022

QuarticCat commented Dec 1, 2022

Show function context for diff hunks. #402

Show function context for diff hunks. #402

Conversation

sesse commented Oct 7, 2022

sesse commented Oct 7, 2022

sesse commented Oct 16, 2022

sesse commented Nov 23, 2022

Wilfred commented Nov 23, 2022 • edited Loading

sesse commented Nov 24, 2022

QuarticCat commented Dec 1, 2022

QuarticCat commented Dec 1, 2022

Wilfred commented Nov 23, 2022 •

edited

Loading