-
-
Notifications
You must be signed in to change notification settings - Fork 315
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Autogenerate gix-packetline-blocking/src #1340
Conversation
By copying the files with added headers, instead of having a symlink. Right now this is entirely as a shell script. It may be beneficial to rewrite at least some parts, especially those that are unnatural to implement portably as a shell script even given Bash, in Rust.
This was always intended to work, but didn't because script subprocesses always used the original command name/path, but this was after changing directory. The script already relies on the repository being organized a particular way, so relying on itself (or another script suitable to do its work) being at the known location is no worse. This also makes it unnecessary to guard against the rare {}-in-path case. This change has the further benefit that if the script is ever successfully invoked via a Windows-style path with "\" separators (which can happen on Windows), its subprocesses can still execute. This also removes an altogether ineffective attempt to support the rare case of a working tree whole top-level directory name ends in a newline character, and instead changes the comment to note that this does work. A few other comments throughout are reworded for clarity as well. It may make sense to change the script more deeply than this (even before replacing part of it with a Rust tool). By sacrificing a small amount of portability, a loop could be used, as find recognizes -print0 in GNU/Linux (via GNU findutils), most BSD systems including macOS, and busybox (thus also Alpine Linux). Unfortunately a recursive glob, which would be even nicer, is not supported by versions of bash that ship with macOS.
This modifies the script so that: - When the target does not exist on disk at all, we proceed, even if the deletion is an unstaged change. - When there is an uncommitted merge, as happens when a conflict is being resolved (and occasionally otherwise), recognize this case (rather than misinterpreting the short status output), proceeding only if there are no changes at all at the target. Even staged changes cause the copy to be cancelled in that case. - Some cumbersomely expressed code is written more clearly, a couple of specific error messages are added for what should be rare failures of git commands, checking for grep failure is removed (grep should not actually fail if used correctly), and the big comment in first_line_ends_crlf is made more accurate.
This doesn't happen on Windows filesystems, but can happen elsewhere. It can happen on occasion by accident.
`find -print0` is not POSIX, but supported systems include: - GNU/Linux systems, using GNU findutils - Other Linux-based systems using BusyBox find (e.g. Alpine Linux) - BSD systems, including macOS (all versions, since 10) Since this is a Bash script, and even the old versions of Bash on macOS systems support `read -d ''` to read null-byte-delimited input, assuming `find -print0` is supported everywhere needed, this is highly likely to work everywhere needed. It allows filenames to be safely read even if they contain weird characters including newlines, as can happen accidentally (hopefully not intentionally). It would be nicer to loop through the result of a recursive glob (globstar, **), but Bash only supports that starting in version 4, which no version of macOS ships (or is likely to ship). The benefit of this approach, compared to the `find -exec` way used before, is that the code is simpler, no longer needs to call itself as a subprocess through find, and no longer needs a help to clarify a mode that is not meant to be used from the outside. (It may also be slightly faster.) This makes some behavioral changes, in areas where the design had been driven by the implementation rather than the other way around: - We stop on the first failure. - Because of that, there is no need to restate what files could not be generated (it is at most one, and the failure should show it). - As noted above, this script can no longer be invoked to process an individual file (which as not a design goal), and it therefore no longer accepts any command-line arguments (they are ignored). This also fixes a misspelling the big first_line_ends_crlf comment.
This also refactors slightly in other ways, clarifies a comment, and improves an error message.
For the real purpose of this new check, which is to verify that `just copy-packetline` has been run when needed, to ensure that gix-packetline-blocking/src is not out of date, it should only be necessary to run this check on one platform. So this should probably be changed to `runs-on: ubuntu-latest` instead of a matrix strategy soon. I'm running it on three platforms initially to check the script (though some functionality of the script is not exercised). I've also tested the script locally on Ubuntu 22.04.4 LTS, and on Windows 10 (in Git Bash).
Somewhat unintuitively, the "-" is considered to be a file operand and not an option, even though it treated to signify standard input rather than a file named "-". On some systems including macOS, options to cat cannot follow paths. On those systems, writing "-" before "--" causes "--" to be treated as a filename, causing the script to break on macOS. This changes the order, which lets the cat command work on all systems, though (as before) if the subsequent argument were a literal "-" then that would not have the desired effect. In this case that is okay, since we actually know that the other argument, the value of the source_file variable, is not exactly "-". (We even know that it does not begin "-". But keeping the "--" before it makes clearer to human readers that this argument is a path and never an option.)
This speeds things up, as no cargo or just commands have to be run, and no Rust toolchain has to be installed. If part of all of the script is replaced by a tool implemented in Rust, then of course this should (and probably must) be revisited.
Since --porcelain is guaranteed stable across future git versions.
Only the secondary reason, which is the less important one, was given before. Now both are given and the more important reason is emphasized. (If the entire subtree is deleted on disk, staged or not, then we do proceed, since neither disadvantage applies. I think this is clear from context, since this is checked first, and like the other checks, it is commented.)
These are copies, except for their headers, and generated by the etc/copy-packetline.sh script. This should make the new CI checks, which use the script, pass.
So when gix-packetline-blocking/src was not updated, the diff from running the script is shown in full. Right now this is to investigate how the script does not seem to work on macOS, but it may be useful to keep it.
While a command like `find foo` should add a `/` to make paths like `foo/bar`, some find implementations, including in macOS, add their own second slash even when there already was one, so `find foo/` outputs paths like `foo//bar`. (The current POSIX standard seems to forbid this, but it seems early standards did not, and in any case it is the behavior of find even on recent macOS systems.) That broke the copy-packetline.sh script on macOS: - Unsightly extra slashes were added to the headers placed in the copied/generated files inside gix-packetline-blocking/src. - Due to this happening on macOS, the script produced different output on macOS and other systems, preventing a precise portable CI check that the worktree remains clean even after running it. This commit addresses that by not adding a `/` to the path anymore, and making the required slight change to how the source prefix is removed (to be replaced by the target prefix) to accommodate this. That prefix removal was the original reason I had added a `/` originally. It served two purposes: to make the nature of the replacement slightly clearer to human readers, and to safeguard against a prefix that was not a whole path component. The first goal remains valuable but the benefit was slight, so it's okay to lose out on that. The second is no longer needed, because it was related to a way the script was more complex and thus more prone to to error at that time; the associated check was already removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a million for this improvement!
Please note that I just took a quick look and didn't read any of the PR description yet, which I will get to in a bit.
This reruns `just copy-packetline`, turning the comments in files in gix-packetline-blocking/src from doc comments to regular ones.
Building on d25e5d3, another option for names could be |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot for this wonderful work, leading to a very impactful fix! Finally, we can use cargo
on Windows again.
I only have made a couple of smaller adjustments. One change that I am not perfectly certain about is the removal of the continue-on-error: true
line, which I thought was a left-over rather than intentional. If that's wrong, let's fix it before the merge.
Although I did use CI checks during development (rebasing onto a branch of my fork in which CI runs on all branches), I've decided not to deliberately push each commit separately in this PR as I often do in GitPython. I'd be pleased to do something that like on request. The reasons I've refrained are that I don't know if it would be welcome here, with the more involved CI checks, and that I wonder if I may need to rebase this anyway, as I think this is a case where messages in the conventional commits style aren't called for, but I'm not sure.
I'd be happy to keep the current style of pushing once, as otherwise I am pretty sure CI would be blocked for a long time which could be an impediment.
If the cumbersome state of the script ought to be ameliorated, then I suspect that replacing it with Rust may be the overall best approach. I am also curious if there is some way to do the whole thing in Rust yet let it work even if
cargo
frowns on the state of the workspace.
I understand the reasoning for the script, find it very elaborate and well made, and chose to consider it read-only. It's not that it's unmaintainable, it's just that it has one purpose that it fulfils so we can move on to greener pastures :).
I am done with the refactor and encourage you to bring in last minute changes, also to the names - the ones proposed are shorter, and they are also not names I plan to merge once CI is complete, but probably only in an hour or so. |
Thanks, I'll push a change to rename the identifiers shortly. |
In the copy-packetline.sh script: - Rename all identifiers with "source" to "input", and with "destination" or "target" to "output". This includes variables and functions. - Adjust messages accordingly. This is not a simple search and replace, since sometimes "target" becomes "output location" (to avoid giving the impression that it must always being as a dir). - Adjust comments accordingly, but use the word "destination" in a few comments that had said "target", where it's clear it applies to "output" and seems to be clearer wording than "output". - A couple very small unrelated comment rephrasings for clarity. In the copy-packetline job definition in ci.yml: - Put `fail-fast: false`, which has no effect on a matrix that only generates one job, but will aid in producing results for all three platforms if the other two have to be reenabled to test a change to the script. This is what I had been using `continue-on-error` for, but `fail-fast: false` is the better way to do it (and also avoids confusion with the other meanings of `continue-on-error`). - Tweak indentation so that removing the leading `# ` on the macos-latest and windows-latest entries would place them in the list (so that if they ever have to be reenabled to test changes to the script, then this will be slightly easier).
Thank you, much better! I also just now realise that the new CI job takes a mere 3 seconds, so running more of these isn't ever going to be an issue. |
Really I should never have used that, because it was applied to each job in the matrix for the purpose of ensuring that results from all platforms appeared during testing, rather than to specific jobs considered experimental. I should have used
I think it is less confusing with just one, yes. If the script will rarely be edited, then I think it's better to have just the ubuntu-latest job for the check. |
In c957ab8 (GitoxideLabs#1340), I intended to mark the contents of `gix-packetline-blocking/src` as auto-generated for github-linguist, but I did not do so correctly, because giving the directory as a path does not specify the `linguist-generated` attribute as being unset for the contents. This fixes that. The bug, and fix, is revealed by comparing the output of `github-linguist --breakdown` before and after this: --- a 2024-11-18 15:45:57.376285107 -0500 +++ b 2024-11-18 15:46:44.808328857 -0500 @@ -1,6 +1,6 @@ -94.14% Rust -4.58% Shell -1.19% HTML +94.08% Rust +4.63% Shell +1.20% HTML 0.09% Makefile Makefile: @@ -682,23 +682,6 @@ gix-pack/tests/pack/multi_index/mod.rs gix-pack/tests/pack/multi_index/verify.rs gix-pack/tests/pack/multi_index/write.rs -gix-packetline-blocking/src/decode.rs -gix-packetline-blocking/src/encode/async_io.rs -gix-packetline-blocking/src/encode/blocking_io.rs -gix-packetline-blocking/src/encode/mod.rs -gix-packetline-blocking/src/lib.rs -gix-packetline-blocking/src/line/async_io.rs -gix-packetline-blocking/src/line/blocking_io.rs -gix-packetline-blocking/src/line/mod.rs -gix-packetline-blocking/src/read/async_io.rs -gix-packetline-blocking/src/read/blocking_io.rs -gix-packetline-blocking/src/read/mod.rs -gix-packetline-blocking/src/read/sidebands/async_io.rs -gix-packetline-blocking/src/read/sidebands/blocking_io.rs -gix-packetline-blocking/src/read/sidebands/mod.rs -gix-packetline-blocking/src/write/async_io.rs -gix-packetline-blocking/src/write/blocking_io.rs -gix-packetline-blocking/src/write/mod.rs gix-packetline/src/decode.rs gix-packetline/src/encode/async_io.rs gix-packetline/src/encode/blocking_io.rs (The test was done on Ubuntu 18.04 LTS ESM with `github-linguist` provided by the `ruby-github-linguist` package, version 5.3.3-1.) In addition to fixing that, this also simplifies `.gitattributes` files throughout the repository. This simplification includes removing redundant `clrf=input` (discussed in GitoxideLabs#1676) and a spurious unrecognized `-eof` (see GitoxideLabs#1689), changing lone `**` to `*` where equivalent, and some other changes. This also makes the style of `.gitattributes` files more consistent, using `foo`/`-foo` rather than `foo=true`/`foo=false` everywhere even where both work, and using whitespace more consistently.
Fixes #1310
This replaces the symbolic link at
gix-packetline-blocking/src
with a copied source tree fromgix-packetline/src
, with headers prepended to state that they are copies and, for each file, what file it is a copy of (other than that header).The headers are in the format suggested in #1310 (comment), except that a blank line (i.e., two newlines rather than one) appears after each one. In files where there are already
//!
comments at the top, I am unsure if a blank line is correct, or if it should instead be omitted or a second line of just//!
added to signify a paragraph break. Currently there is only one such file,gix-packetline/src/lib.rs
.This also adds a shell script,
etc/copy-packetline.sh
, which was used to perform the copy, and can be used at any time to update the copy when changes are made ingix-packetline/src
. It can be invoked directly, or by runningjust copy-packetline
. Both approaches work even ifgix-packeltine-blocking/src
has is absent or has been checked out as a regular file due tocore.symlinks
beingfalse
or symbolic links being otherwise unavailable.However, the latter approach, even though at this time no Rust tool is used, will still print an error message that may mislead the person running it into thinking it has failed. It shows the same message as in #1310 (comment). But this message actually always happens whenever a
just
command is run when the project structure is unacceptable tocargo
. The cause is:https://github.com/Byron/gitoxide/blob/68d1a29e8e497fa130e5733d1ecb6b7694a489ca/justfile#L193
It seems to me that it should be possible to keep that message from being printed for recipes that don't rely on it. But I think that ought be considered outside the scope of this PR.
The script is used on CI to check that it has no effect. For now I have added three jobs with a matrix strategy, but this is unnecessary to check that the script has been run if needed; for that we just need one. Instead, running it on three platforms checks that the script works on all of them (I don't have a macOS system). Because it does not test the full range of the script's functionality--as noted below, the script implements specific rules for when it is willing to make the copy--I also tested the script, I believe thoroughly, locally on Windows and Ubuntu.
I've kept all three jobs so if the script has to be changed during further work on this PR then they will be there for testing. I think all but the
ubuntu-latest
job can ultimately be removed.The script tries to refuse only when its effects could lead to data loss or confusion. The rules it follows, for the target, are implemented and commented in the
check_target_dir
shell function. Basically:gix-packetline-blocking/src
on disk, then always proceed. This is safe, and it provides a simple way to get to a state where the script will regenerate the files: deleting the entire subtree indicates a willingness for it to be replaced.The script does not stage its own changes. This helps avoid data loss, makes it easy to keep changes from the script identifiable by staging and then running it, and makes the script suitable for use in a pre-commit hook if desired.
The script decides between
\n
and\r\n
for the line endings of each file's prepended header by checking how the first line of the file being copied ends. I believe this is a better approach that analyzing the file as a whole, since if a file is to be inconsistent, it ought not to be made more inconsistent by having more separate sections with different line endings. Because various platforms, git configurations, and approaches to editing files affect line endings, I think there is no reasonable way to decide this besides to look at the files themselves. They are checked in a way that overcomes the strong tendency of tools in Cygwin-like environments (including MSYS2 and "Git Bash") to claim that\r\n
is merely\n
.The script performs a single pass, adding the headers at the time the files are written, rather than copying the files and then modifying them. This has some advantages, which might justify retaining this approach over others that have been discussed, as detailed below.
It may be that this shell script should be partly or wholly replaced with a tool coded in Rust. I believe that was the idea in #1310 (comment): that a shell script would check if it is safe and reasonable to copy the file and, if so, do so and then invoke a Rust tool to prepend the headers. The reason not to write the whole thing in Rust was that it can fail initially if there is no symlink or directory tree. I embarked on writing the script with the idea that I would write the whole thing as a shell script to get clear on exactly what it should do, then pare down the script, replacing the header-prepending code with a Rust tool.
As I wrote the script, I found that it was unexpectedly tricky, relative to the apparent simplicity of the task, to implement it as a sufficiently portable shell script that was robust to rare but plausible oddities. That made me think as much of it should be turned into Rust as possible. But I persisted in implementing the whole thing as a shell script initially, because I also noticed some advantages of it being implemented that way. I now suspect those advantages are sufficient to justify it remaining as a shell script in its entirety:
cargo
to be able to run and see files where they're supposed to be everywhere in the workspace; it may fail due to conceptually unrelated breakages in any crate.just
, because no Rust toolchain has to be installed.gix-packetline/src
, one might wish to perform the updates automatically using a pre-commit hook. Having them done by a script that doesn't depend on other state of the project benefits this, mainly in that one could leave the hook enabled all the time even when making other unrelated changes that don't requirecargo
to accept the state of the workspace.I am inclined to find those factors persuasive when taken together, but I am not certain. In particular, this should be weighed against one of the benefits of having as much of this logic in Rust as possible: more gitoxide developers are likely to be able to work efficiently with the code, since gitoxide is a Rust project. (On the other hand, gitoxide does rely substantially on shell scripts as fixtures.) In addition, while this is the sort of thing one would think a shell script would be great for, it actually incurs some complexities due to limitations related to portability that I believe would be absent in Rust code. Ways the script is weird or cumbersome in service of portability are detailed below.
Although I did use CI checks during development (rebasing onto a branch of my fork in which CI runs on all branches), I've decided not to deliberately push each commit separately in this PR as I often do in GitPython. I'd be pleased to do something that like on request. The reasons I've refrained are that I don't know if it would be welcome here, with the more involved CI checks, and that I wonder if I may need to rebase this anyway, as I think this is a case where messages in the conventional commits style aren't called for, but I'm not sure.
Cygwin-like environments on Windows, including MSYS2 and the MSYS2-ish "Git Bash" environment, are weird, and some of the weirdness, as mentioned above and as commented in the script, constrains the approaches that can be taken to distinguish
\r\n
from mere\n
, especially since the script should avoid using anything that may not be available on other systems that have nothing to do with Windows. This is covered in full in the big comment infirst_line_ends_crlf
, and thus not detailed here.Bash offers a number of nice features beyond what are standardized for
sh
by POSIX. But many of them were introduced in Bash 4, which macOS does not and probably never will ship. This needs to work on macOS systems where no newer Bash has been installed; furthermore, most scripts in the project specify/bin/bash
as the interpreter, and it could be confusing to change that to/usr/bin/env bash
just for this. (I have tried to adhere to the style of other scripts.) Before major version 4, Bash lacks recursive globbing (globstar
), the ability to have the rightmost command in a pipeline affect the calling environment (lastpipe
), and the ability to check the exit status of a command run in process substitution (process substitution would otherwise be a workaround for not havinglastpipe
).I assume Bash 3 or later, and the existence of command-line tools that are required by POSIX (in addition, I try to avoid using any that are, in practice, ever absent; for example,
file
is often not installed in Docker images). This is with one notable exception: I assume thatfind
, which I'm using because I don't have recursive**
globs, supports the-print0
action; if it does not, then the script will fail (but signal the failure properly, due toset -o pipefail
, which is supported even in Bash 3).The contents of this repository are generally trusted, and I doubt anyone will deliberately create a file with a newline character in its name (which Windows can't tolerate, but other systems can) without good reason. But:
gpg
orssh
) in their presence.Of course, this can be made fully portable by using the
-exec
action. But this has a couple of problems:+
form).Fortunately
-print0
seems in practice to be very portable. GNU findutils, BusyBoxfind
(as in minimal environments or Alpine Linux), and thefind
command in BSD systems, including macOS, all support it, and have supported it for a long time.It may be that there is a much simpler and more elegant way to do this than I think, even in a shell script. Ordinarily, when writing an explicit loop in a shell script, one should consider alternatives (and I have considered some). However, before proceeding with an idea to simplify the script, the general problem that most Unix-style tools that conceptually operate on text do not properly distinguish between
\n
and\r\n
in Cygwin-like environments should be considered. If the cumbersome state of the script ought to be ameliorated, then I suspect that replacing it with Rust may be the overall best approach. I am also curious if there is some way to do the whole thing in Rust yet let it work even ifcargo
frowns on the state of the workspace.