Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Show function context for diff hunks. #402

Closed
wants to merge 1 commit into from
Closed

Show function context for diff hunks. #402

wants to merge 1 commit into from

Conversation

sesse
Copy link
Contributor

@sesse sesse commented Oct 7, 2022

This fixes a difftastic shortcoming relative to “git diff” (or GNU diff's -p, --show-c-function); it shows the context for a given hunk (which function it is in, assuming it actually starts in one). Based on practical testing, it seems to work well with at least Rust, C, C++, Kotlin, Python and Swift, but probably also for more languages.

Unlike git diff and GNU diff, it does not get confused by the function definition being split over multiple lines, since it uses the tree-sitter grammar to find out what is a function definition, which should be much more robust than the textual heuristics diff uses.

There are a number of shortcomings:

  • Context is currently only available in side-by-side mode, not inline.
  • The source code is re-parsed using tree-sitter at display time; we could probably reuse the tree from before it was converted into Difftastic's syntax tree, at the expense of holding onto more RAM during the diffing.
  • If the context is not the same on both sides, difftastic won't try to show both, but instead show the language name as before. The typical failure case for this is if the function changed name or function arguments between the two versions.
  • Certain language grammars, such as Julia and elisp, don't have a clear node that delineates the function body from the name and arguments, which will cause the entire function definition to be treated as the context (which thus usually means it won't be shown at all; see the previous point).
  • The joining of tokens across newlines can cause somewhat unidiomatic whitespace to be added, such as fn foo( arg) instead of fn foo(arg) if “arg” is on a separate line.

Despite all of this, it seems to work very well in practice, from practical experience during real development over a couple of days.

Fixes #304.

@sesse
Copy link
Contributor Author

sesse commented Oct 7, 2022

Example output:
image

This fixes a difftastic shortcoming relative to “git diff”
(or GNU diff's -p, --show-c-function); it shows the context
for a given hunk (which function it is in, assuming it actually
starts in one). Based on practical testing, it seems to work
well with at least Rust, C, C++, Kotlin, Python and Swift,
but probably also for more languages.

Unlike git diff and GNU diff, it does not get confused by the
function definition being split over multiple lines, since it
uses the tree-sitter grammar to find out what is a function
definition, which should be much more robust than the textual
heuristics diff uses.

There are a number of shortcomings:

 - Context is currently only available in side-by-side mode,
   not inline.
 - If the context is not the same on both sides, difftastic won't
   try to show both, but instead show the language name as before.
   The typical failure case for this is if the function changed
   name or function arguments between the two versions.
 - Certain language grammars, such as Julia and elisp, don't
   have a clear node that delineates the function body from the
   name and arguments, which will cause the entire function
   definition to be treated as the context (which thus usually
   means it won't be shown at all; see the previous point).
 - The joining of tokens across newlines can cause somewhat
   unidiomatic whitespace to be added, such as fn foo( arg)
   instead of fn foo(arg) if “arg” is on a separate line.

Despite all of this, it seems to work very well in practice,
from practical experience during real development over
a couple of days.

Fixes #304.
@sesse
Copy link
Contributor Author

sesse commented Oct 16, 2022

I modified this a bit to be faster, avoiding the re-parse. (It took more CPU time than I had expected, from a profiling run, so it's better to keep the tree and send it through to the display function.)

@sesse
Copy link
Contributor Author

sesse commented Nov 23, 2022

Over six weeks with no response; I'm giving up these PRs.

@sesse sesse closed this Nov 23, 2022
@Wilfred
Copy link
Owner

Wilfred commented Nov 23, 2022

Hi @sesse! I really appreciate your PRs. Just so you know, I have a 7 week old baby so difftastic is a much lower priority right now.

#409 seems totally reasonable, #402 is definitely something I want (preferably without an extra parse), and #420 is an incredible achievement. I tinkered with bidirectional dijkstra myself at one point but getting the reverse edges set up and the termination condition correct is hard. I'm really excited to look at your visualisations more closely too.

I want to balance this with #395, another PR with a ton of perf benefits that I haven't had much time to look at yet. That's lots of smaller optimisations, and I'm trying to merge as many as possible whilst still making the graph algorithm easy to hack on for e.g. A* or bidirectional changes.

If you'd like to reopen the PRs you're very welcome to, but if not I understand too. I just wanted to thank you for contributions regardless :)

@sesse
Copy link
Contributor Author

sesse commented Nov 24, 2022

Hi Wilfred,

First of all, congratulations on your baby. I understand that this moves a lot of things around in one's life and requires a ton of attention and energy, and that it makes it really hard to tend to mundane issues such as volunteer OSS work.

No OSS maintainer has an obligation towards their users. I've been on the other end of this bargain a number of times (the open-source one, not the baby one), and I know how much work it is to deal with the constant stream of more-or-less good-intentioned demands and patches that come in. If one sends in a patch and has it refused, or a bug that the maintainer won't prioritize, the reasonable thing to do is just to part ways and leave the maintainer alone. But this cuts both ways; there's no obligation to fight for one's patch. Both sides are largely devoid of glory when it comes to day-to-day work; the only thing we really get is a feeling of recognition and perhaps slightly better software for everyone. So when you send something and it just gets the silence game while you see there's other activity in the repo, the fun disappears :-) All I'd want is something like “hi, I'm really busy and I don't want to merge this right away, please wait a couple of months”.

As for the patches, #402 already has no extra parse (the first version had, but I fixed that fairly early on), and I believe a well-tuned A* heuristic would trounce the approach in #420 (looking at the visualizations, the search explores a bunch of nodes one would hope it could not; I believe the basic idea of changing from pointers to node IDs make a lot of sense, though, and I think QuarticCat has a similar idea somewhere in his patch chain). The PRs are closed, not deleted—if you want to take a look at some future point, fix whatever beginner-Rust mistakes I've made and cherry-pick in the parts you want, you are of course welcome to do that. But I've cut my losses, in a sense, so I'm just running my own private fork with the parts that I want. Which isn't a declaration of war, it's just how things have to be sometimes when real life interferes.

All the best :-)

@QuarticCat
Copy link
Contributor

I want to balance this with #395, another PR with a ton of perf benefits that I haven't had much time to look at yet.

This PR doesn't seem to conflict with #395 and #401 and LGTM.

@QuarticCat
Copy link
Contributor

Maybe it's time to find a collaborator :)

hugo-vrijswijk pushed a commit to hugo-vrijswijk/difftastic that referenced this pull request Jul 23, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Show function name in hunk header
3 participants