Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Add the stdlib_diff tool to compare gno and go standard libraries #2869

Open
wants to merge 31 commits into
base: master
Choose a base branch
from

Conversation

Villaquiranm
Copy link
Contributor

@Villaquiranm Villaquiranm commented Sep 29, 2024

continuing the work started on #1425
thanks FloRichardAloeCorp for the great job on this issue 👍
closes #1310

Contributors' checklist...
  • Added new tests, or not needed, or not feasible
  • Provided an example (e.g. screenshot) to aid review or the PR is self-explanatory
  • Updated the official documentation or not needed
  • No breaking changes were made, or a BREAKING CHANGE: xxx message was included in the description
  • Added references to related issues and PRs
  • Provided any useful hints for running manual tests
  • Added new benchmarks to generated graphs, if any. More info here.

@Villaquiranm Villaquiranm requested a review from a team as a code owner September 29, 2024 10:54
@Villaquiranm Villaquiranm requested review from ajnavarro and zivkovicmilos and removed request for a team September 29, 2024 10:54
@Villaquiranm Villaquiranm changed the title Feat/stdlib diff feat: Add the stdlib_diff tool to compare gno and go standard libraries #1425 Sep 29, 2024
@Villaquiranm Villaquiranm changed the title feat: Add the stdlib_diff tool to compare gno and go standard libraries #1425 feat: Add the stdlib_diff tool to compare gno and go standard libraries Sep 29, 2024
Copy link

codecov bot commented Sep 29, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 63.06%. Comparing base (367408a) to head (a752351).

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2869      +/-   ##
==========================================
- Coverage   63.33%   63.06%   -0.27%     
==========================================
  Files         548      548              
  Lines       78601    81110    +2509     
==========================================
+ Hits        49780    51155    +1375     
- Misses      25466    26529    +1063     
- Partials     3355     3426      +71     
Flag Coverage Δ
contribs/gnodev 61.11% <ø> (ø)
contribs/gnofaucet 14.82% <ø> (ø)
gno.land 67.37% <ø> (ø)
gnovm 67.88% <ø> (ø)
misc/genstd 79.72% <ø> (ø)
tm2 62.45% <ø> (+0.15%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@Villaquiranm
Copy link
Contributor Author

@thehowl I took a look of this, the current status:
1- Not done yet
2- I think it is already done, I can select the code and the + & - are not selected
3- I was thinking on implementing this using semaphores ? like that I could have the entire code on the same file as it is right now. If you think is necessary to use channels I could do that too.

4 and 5 are fixed on eaf6a87
Only for 4 I think there are some folder we should avoid ? For exemple cmd & vendor as they have a lot of files and I think they are not related to standard lib ?

Thank you for your contribution!

Sorry for the wait, this is an initial to improve. Nice work, but you can improve.

Some functionality missing on the generated report:

1- Formatting is still very funky. Use CSS to render tabs correctly; they should render as 4 spaces.

image

2- Ideally, for the added/deleted lines, make the + and - symbols non selectable. This makes it so that if I select the lines in either side of the page I can copy and paste the code without copying the symbols.

3- I believe the report's performance could be improved. Consider using the library I mention in the comments (diffmatchpatch) and parallelizing execution. You can parallelize execution by having a goroutine trying to first of all find all of the directories in the source and destination, and then sending off to a channel individual package names.

4- The tool does not currently perform diffing recursively. Ie. creating the report on my stdlibs it does not generate a diff for package regexp/syntax. Consider searching using something like filepath.WalkDir.

5- On the report, please make the lines which are == greyed out (putting css opacity: 0.75 probably works)

@thehowl
Copy link
Member

thehowl commented Oct 2, 2024

@Villaquiranm

  1. Formatting: you're right, + / - are not selectable. Can you add line numbers as well, and align each column? (line number, + / - / no sign, code line), making sure that only the code is selectable?
  2. For the diff'ing algo and performance, I actually found this library when I was working on a side project some time ago. I believe it's the best for this use case. diffmatchpatch, which I linked in the original PR, is not ideal as it is for "character-by-character" diff (ie. what Google uses for google docs); not for the kind of diffs we use in programming.
    There are also likely some other "easy" performance wins. As you mentioned, we should skip directories which entirely don't exist in their Gno counterpart (in the index, we can just show cmd in red and say missing in gno or missing in go for libs like std).
    Parallelism may not be necessary; let's see what single-thread performance we can get, and if it's still slow after my suggestions use pprof to find out what's taking the most time. I think we can get <5s performance on most machines pretty easily even without it.
  3. Can you make sure the index lists sub-directories as well?

@Kouteki Kouteki added the review/triage-pending PRs opened by external contributors that are waiting for the 1st review label Oct 3, 2024
@jefft0 jefft0 removed the review/triage-pending PRs opened by external contributors that are waiting for the 1st review label Oct 4, 2024
@jefft0
Copy link
Contributor

jefft0 commented Oct 4, 2024

Removed the "review team" label because this is already reviewed by thehowl.

@Villaquiranm
Copy link
Contributor Author

@Villaquiranm

  1. Formatting: you're right, + / - are not selectable. Can you add line numbers as well, and align each column? (line number, + / - / no sign, code line), making sure that only the code is selectable?
  2. For the diff'ing algo and performance, I actually found this library when I was working on a side project some time ago. I believe it's the best for this use case. diffmatchpatch, which I linked in the original PR, is not ideal as it is for "character-by-character" diff (ie. what Google uses for google docs); not for the kind of diffs we use in programming.
    There are also likely some other "easy" performance wins. As you mentioned, we should skip directories which entirely don't exist in their Gno counterpart (in the index, we can just show cmd in red and say missing in gno or missing in go for libs like std).
    Parallelism may not be necessary; let's see what single-thread performance we can get, and if it's still slow after my suggestions use pprof to find out what's taking the most time. I think we can get <5s performance on most machines pretty easily even without it.
  3. Can you make sure the index lists sub-directories as well?

Hey thanks a lot again for the review 👍 @thehowl .

For 1 I added the line numbers and they are not selectable

2- Thanks a lot for this library tip, I was actually taking a lot of time trying to figure this line diff algorithm but It was pretty straightforward with the one you recommended. The only downside is that the library just return the part of the file that have some changes, so to print the entire file I needed to add some specific code to accomplish it.

3- subdirectories are listed on the index, if a library have subdirectories it will appear like a toggle list.
For performance I think we have something decent / usable the entire diff of the standard library takes around 2 seconds on my machine.

I think there is still some improvements to do to the overall code, Just wanted to make sure this is the behaviour we'll like before doing some cleaning .

Thanks

@thehowl
Copy link
Member

thehowl commented Oct 5, 2024

I suggest you not to rebase your PRs, but instead to merge in master; it makes my work when reviewing easier. And possibly yours as well, in managing conflicts. (There are none here I think, but this is good general guidance)

@thehowl
Copy link
Member

thehowl commented Oct 5, 2024

How are you running the thing? I'm running with go run . -src /usr/lib/go/src -dst ~/oc/gno/gnovm/stdlibs/ -out $(mktemp -d | tee /dev/stderr) but it turns out an empty report.

@Villaquiranm
Copy link
Contributor Author

I suggest you not to rebase your PRs, but instead to merge in master; it makes my work when reviewing easier. And possibly yours as well, in managing conflicts. (There are none here I think, but this is good general guidance)

Sorry about that I'll keep it on mind

How are you running the thing? I'm running with go run . -src /usr/lib/go/src -dst ~/oc/gno/gnovm/stdlibs/ -out $(mktemp -d | tee /dev/stderr) but it turns out an empty report.

I run it like this and for me it's working fine:

 go run . -src /opt/homebrew/Cellar/go/1.23.0/libexec/src -dst ../../gnovm/stdlibs -out ~/gnoreport/
2024/10/05 23:55:08 Building report...
2024/10/05 23:55:09 Report generation done!

@Villaquiranm
Copy link
Contributor Author

@thehowl
For some reason if I put
-dst ../../gnovm/stdlibs/ instead of -dst ../../gnovm/stdlibs is not working I'll check :(

@thehowl
Copy link
Member

thehowl commented Oct 5, 2024

Okay, figured it out.

My src directory is a symlink. It works when I use its realpath (it's a unix command). Can you make sure symlinks are handled? (It's pretty common on linux).

The performance is satisfactory now. Thank you.

I have a bunch of comments to make on some of the looks. Mostly things about UX / confusing more than design wise, just to make sure this thing is decently usable. Tell me if you want 'em your way, or I can wait if you already have ideas.

  • Can we make the toggles of differing files open by default?
  • Can we make files that are missing in one or the other closed by default, but with a visually impactful note? (like red/bold ...)
  • The split view doesn't align. Try to see it on github for what I mean (screenshot). Lines which are the same should be on the same level, ...

image

  • Can we make sure to add also files and dirs that don't exist in go's stdlib? like std. This way, it's not just a compatibility document, but a general way for users to see what changes between the two standard libraries.
  • Maybe it's best if you work your way from a Unified diff in the gotextdiff lib? I haven't examined your code yet, just spitballing based also on what I remember from using it.

@Villaquiranm
Copy link
Contributor Author

Villaquiranm commented Oct 9, 2024

@thehowl Thanks again for this new round of review :)
fixes on 3550f0b

My src directory is a symlink. It works when I use its realpath (it's a unix command). Can you make sure symlinks are handled? (It's pretty common on linux).

  • handled symlinks
  • Can we make the toggles of differing files open by default?
  • now the different files are open by default, but equal or one side missing files are not
  • Can we make files that are missing in one or the other closed by default, but with a visually impactful note? (like red/bold ...)
  • Missing files are printed in colors Red if missing in destination and green if missing in source
  • The split view doesn't align. Try to see it on github for what I mean (screenshot). Lines which are the same should be on the same level, ...
    I tried to align the best I could with the lib, There are still some lines that could be on the same line but in overall is better than before
  • Can we make sure to add also files and dirs that don't exist in go's stdlib? like std. This way, it's not just a compatibility document, but a general way for users to see what changes between the two standard libraries.
  • I'm still missing this I'll work on it now
  • Maybe it's best if you work your way from a Unified diff in the gotextdiff lib? I haven't examined your code yet, just spitballing based also on what I remember from using it.

Yep I'm using unified diff but seems like it only keeps the lines that have some differences, so entire parts of the file could be omited

edits := myers.ComputeEdits(span.URIFromPath(f.Src), f.srcContent, f.dstContent)
	unified := gotextdiff.ToUnified(f.Src, f.Dst, f.srcContent, edits)

Some sort of what github does(you can only see from line 8-12:
image

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: No status
Status: In Review
Development

Successfully merging this pull request may close these issues.

Tool for comparing source code of standard libaries between Go and Gno
5 participants