-
-
Notifications
You must be signed in to change notification settings - Fork 318
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Modernize release CI workflow and include universal binaries and checksums #1486
Conversation
This is inspired by the ripgrep workflow, though it currently does it in a different way, since right now we are using actions rather than the `gh` command. Actual releases should of course become public. This can be done manually; or another job, which depends on others, could be added so it happens automatically if and after all builds succeed; or this change to creating the releases as drafts (which at this moment is mainly useful to facilitate testing without causing confusion) could be reverted if no longer needed.
The variables that controlled the behavior of Rust programs and the output of `cargo` are only used in the `build-release` job, not in the `create-release` jobs, and one of them was already set in the more suitable narrower scope. This moves them to that scope, without duplication. They still apply to all steps in the build jobs. This also: - Adjusts comment wording to clarify how `CARGO` is set and used. - Quotes `1` as an environment variable value, since such values are always strings (it was being converted), but having it as a numeric YAML literal made it seem like that was not the case. - Adds spacing for clarity and for stylistic consistency across the two job definitions.
This is strongly inspired by an analogous check in the ripgrep workflow, but done differently (from how that currently does it) by parsing `Cargo.toml` using `yq`. The `yq` command supports reading TOML (just not writing it). This command is present on the `ubuntu-latest` runner. This also moves the "Show the version" step into the new step, and has that step also show the version extracted from `Cargo.toml`, so as to make errors easier to understand and also to help verify that this logic is really working.
This way, real mismatches won't appear wrongly to be due to the "v" that is now accounted for.
This permits names that start with `TEST-` or end with `-DO-NOT-USE`, printing a different "OK" message for them. The check comes after the comparison to the version in `Cargo.toml`, which is still always looked up, for three reasons: - Although it would be very weird and probably a bad idea to put a `-DO-NOT-USE` version in `Cargo.toml`, if it were ever done accidentally or on purpose, the message indicating a match to `Cargo.toml` should still be written. - Having code paths that are only exercised for actual releases and rarely or never in testing is likely to lead to bugs. - For looking it up and reporting it initially: this information is potentially valuable even when deliberately not used. This commit also makes two other changes in that same script step: - A custom message is now printed if the version is rejected only because it didn't have the "v" prefix (which this project's version tags and GitHub release names are using). - Stylistic adjustment, mostly to match the quoting style used throughout the workflow.
This is mainly to avoid the effect of specifically recommending that the release version/name (usually taken from a tag) be changed to match the version in Cargo.toml, which would not always be good advice because it may be that the version in Cargo.toml is wrong, or that both are wrong, or in testing that a specially named tag (or `VERSION` environment variable) should be used.
This is to keep the steps that produce that directory tree all together.
We are already using a `$CARGO` environment variable, which is mainly for convenient expansion by the shell, but `cargo` (and commands like `cross` with the same interface) pass it on to build scripts, which may use it (and if or how they do may in principle vary by feature or target). But the sitation with `$TARGET` is analogous--it would also make commands more readable, and it is also passed down to scripts by `cargo`. So this adds `$TARGET`. Doing this serves another purpose, which is to make it easier to reason about the semantics of the commands the shell is running. Using `${{ }}` interpolation should not be a security risk here, since all values are trusted. But injecting characters such as `'` could still happen by accident. Often it may not be justified, outside of reusable workflows or those running on events with elevated security risks, to route them through environment variables to ensure their contents are not interpreted specially by the shell. However, with the addition of `$TARGET`, it seems that most of that has already been done, such that clarity is overall improved rather than worsened by going the rest of the way. So this does that too, adding other environment variables in the narrowest scope that is broad enough to avoid duplication. Now all `${{ }}` interpolations are outside of script code. Note that these changes only apply to the release workflow and may not necessarily be justified in other workflows.
- Slightly reword and reformat the block comment. - Run two commands instead of using `&&`, since with `shell: bash` (inherited as the default as specified at the workflow level), the actual shell command invoked for script steps includes `-e`, so the script would still immediately fail if the first command fails.
This makes substantial changes to the release workflow, most of them straightforwardly adapted from corresponding material in the ripgrep release workflow: - The biggest change is to use `gh` (the GitHub CLI) instead of both the ncipollo/release-action and actions/upload-release-asset actions. - Use outputs instead of artifacts for the information that needs to go from the `create-release` job into the `build-release` jobs. This eliminates the need for actions/upload-artifact and actions/download-artifact. Furthermore, since `gh` doesn't require a URL to add files to an existing release, there is only one output, the version. - Split up the "Build archive" step so it doesn't need awkward conditional logic inside a single script step. Now the platform agnostic part of creating the directory and putting documentation in it is one step, followed by steps with `if:` keys for Windows and Unix. For this, the main differences from how it is currently written in the ripgrep workflow are the step titles, the uses of shell expansion rather than `${{ }}` interpolation for the environment variables, and the omission of checksum files since we are not currently generating those. This notably does not add either of the following to the workflow: - This does not set `permissions:` for the workflow. It was not set before, so the configuration, including in the upstream repo, seems not to require it. (Note that this does not imply that the configuration in the ripgrep repo doesn't require it.) - This does try to do anything explicit to take the place of specifying `omitBody: true` for ncipollo/release-action. I'm not sure what should be done for this, but the current behavior seems to produce the same result, and passing `--notes ''` to `gh` might go too far. The current ripgrep workflow has no explicit argument corresponding to this.
This expands the stub job for making Universal 2 binaries for macOS into a full definition, downloading and extracting the two architecture-specific archives from the GitHub release. This may require further refinement. It also refactors the way environment variables are set for the preexisting jobs: - Order variables in `env:` so they may be easier to understand. - Define `VERSION` in `env:` rather than in its own step. I think this is a bit more readable, but also, it allows the new job to be stylistically consistent with the preexisting jobs.
- Add missing `GITHUB_TOKEN` for download. This is neeeded even for the download step because the release is a draft. (Even if that is changed, it may be made a draft manually during creation under some conditions, which would not usually signal a wish that the Universal 2 binary archive specifically be omitted.) - Break up into more steps.
This adds a `REPOSITORY` environment variable in at the job level for build-macos-universal2-release and references it in the two steps that apparently need it. Although these are the same two steps that use the token, that is not added to steps that don't require it, since it is sensitive.
As commented, this should catch if the features lists get out of sync. (It may catch other problems too, but without this issue, the check is likely not justified. Printing the list of artifacts at the time publishing is about to occur may still be justified in that case, though.)
This uses the code from the ripgrep workflow to do so, with small modifications to fit the style used here, and, except for the code that is specific to Windows, occurring twice: once for most of the Unix jobs, and once for the macOS Universal 2 archive. This also makes these closely related changes: + Refactor the parts of the Universal 2 job that are similar to the other jobs so they are expressed more similarly. + Check the new checksums for the `gh release download` downloaded archives that the Universal 2 job takes its architecture-specific binaries from (to combine into an universal binary). The risk that the files would be corrupted when downloaded in this way is *extremely* low, but the presence of a checksum published for the Universal 2 archive might be interpreted to mean that downloaded archives used for the constituent binary images were verified. (As done here, this verification is not really for security, since the checksums used to do it are obtained from the same source in the same way -- which fortunately is pretty secure. It may safeguard against a very small risk of corruption. It also fails earlier if the files are not downloaded at all, in case the cause is not one that caused `gh` to exit with a failure status.)
With `shasum` for Unix checksums, they look like: 407860b1605577700750b92f464068fdaa65ff5ecb7fabcd5a9ba8dac7156149 gitoxide-max-pure-v0.38.0-alpha.2-DO-NOT-USE-x86_64-unknown-linux-musl.tar.gz With `certutil` for Windows checksums, they looked like: SHA256 hash of gitoxide-max-pure-v0.38.0-alpha.2-DO-NOT-USE-x86_64-pc-windows-msvc.zip: 870a157307d8674f981278afa2161973d65a4c6956fc2810cdc901886a41da12 CertUtil: -hashfile command completed successfully. Unlike `shasum`, the `certutil` command does not verify checksums, it only generates them. As far as I know, there are no common tools that require the format to be as `certutil` outputs it. In contrast, tools commonly expect the format `shasum` outputs. Furthermore, the Git Bash environment from Git for Windows includes `shasum`, which means it is present: - On GitHub Actions runners for Windows (as for other platforms). - On the computers of most Windows users in gitoxide's user base. Even if someone does not have `shasum` or another tool that will automatically verify checksums from this format, they would at worst need to verify it manually, which I believe is typically already the case when examining output from `certutil` in the above format. Furthermore, we have not published checksums before, so for gitoxide no one is relying on checksums being published that way.
All files whose checksums are computed here are binary files, and even if `shasum` were to behave the same, it omits the `*` prefix in front of the filename that allows humans and tools to know for sure that it was binary mode unless `--binary` is passed.
To avoid setting `GITHUB_TOKEN` where not needed, this defines it in `env:` for each individual step that needs it, as elsewhere. This is more awkward than in the other jobs, since half the steps use it here, but this may be okay.
Since they are all exactly the same as the job IDs.
This is to account for recent additions that are not present in the history of the ripgrep workflow, so if there are problems with them then people don't spend too much time trying to figure out what in the ripgrep workflow they would have come from.
Thank you so, so much for tackling this! Here is my thoughts and comments related to the various points. 1. Validate version against Cargo.toml unless specially namedNeat, appreciated! 2. Create release as a draft and publish after assets are all attachedThat is a major improvement, thank you! 3. Use gh for all release operations, eliminating deprecated actionsThat is great to hear, 4. Make archives with macOS universal binariesThis is fantastic and a great achievement, thank you!
I had touched points with these over at GitButler and didn't particularly like them for the added level of indirection. But then again, I also don't like YAML based things so I am biased. However, if you think the complexity is worth it and feel like it should be done, I'd absolutely go with it if it manages to avoid code duplication. Personally, I am fine with the trade-offs as is, as I assume builds work most of the time now. 5. Generate and include SHA256 checksumsThank you, I am absolutely happy with the choices made. Something I'd find helpful is if the filename would reveal information on either how it was created or how users can validate the hash. I suppose the suffix already does it, but I wonder if naming it If you agree, this could be a follow-up PR. 6. Document and refactor to avoid confusionI just read through the new file start to end (without using the diff-view), and can confirm that it makes a clean and much simpler impression, with all steps clearly marked. Great work! There was the 7. Make the workflow more readily testable?
I agree, let's go with that. 💜Lovely, thanks again 🙏 |
No problem!
This is thanks to @NobodyXu for suggesting it and explaining how to use use
There are ways to bring in static type checking and even a different syntax. I don't know how much that would help or if it would be worthwhile. I'll try and open a discussion question about it soon.
At this point, I think I would only want to do it if it does not increase complexity. Even then, my main reluctance is that putting code that really is tightly coupled to this workflow in another file could make it harder to understand what is going on, even if the combined length and complexity of both files were to decrease.
One problem is that I don't think I actually know how people should validate the hash, other than that they should use whatever program their system has for validating SHA256 checksums provided in a file formatted in this traditional way. I think most Unix-like systems have such a command, but the best command to do so may differ across systems. The available command for this is not always The workflow uses the One might think that Most Linux-based systems that intentionally support command-line use will therefore have
The above is mostly a problem for naming the file in a way that makes clearer how it should be used. It is not necessarily a problem for giving instructions, which could simply say something like, "On most systems, you can use However, I am not sure it makes sense for a file with those instructions to be among the release files. It seems like this belongs in the top-level readme, or a separate top-level file giving or reiterating installation instructions. The disadvantage of this is that if the way the hashes are provided changes over time, then there could be skew between the version of the documentation that a user is reading and the version that would apply to the download the user is doing. Some projects repeat the hashes of downloadable archives in the release notes and give instructions. If that is done, then this makes sense there. Or the release notes could mention the file of hashes and say how to check them, or link to information about how to check them. But I don't know if either of these things is worthwhile. There is another consideration that may be important. Providing a single file of hashes may be confusing to users who have not often checked hashes, because running Whatever is done for this, the
Thanks--this is tricky to be sure of, and your appraisal increases my confidence that the stylistic changes have helped, or at least not hurt.
This was actually there before, with the same I didn't do anything further with it in this PR, because it is something I hope to remove altogether soon, when working on #1477 and adding more Linux targets and features in the spirit of #1242. I don't think the packages listed there would be sufficient to do everything we want to do with musl. I'm also not sure any such command could be sufficient, since I tested
In view of that, it seems to me that it may be possible to make and publish musl builds for all features that we are building other targets for, without changing any code of If this problem can be solved soon--even if not by the ultimately solution which is probably rustls--then I think further changes to the step with the |
For the signature part, I'd recommend to use minisign as cargo-binstall can use it to verify identity of the publisher. The schema I recommend is to generate a new key-pair during publishing, write the private key into For the private key, it would be used to sign the binary artifacts, and then be destroyed. In cargo-binstall release PR, we achieve this by setting up another key-pair, use it to encrypt the temporary private key, and upload it as artifacts. Once all package building/signature is done, we delete the private key. |
Thanks for the suggestion, @NobodyXu , I like the idea of huddling around a standard that is already supported by a tool tasked with installing software, it's a great match. There is certainly complexities involved due to the key handling, but I'd hope it's exclusively during artifact production in this workflow. Otherwise, So all in all, maybe aligning around how influential projects do it might be more suitable as first step.
The IDE can certainly provide more support, but I don't know if this is anything that can be improved by changing the file. I suppose it's more about which plugins are installed. Probably I am missing something, apologies.
It's exciting to hear that this could go away and the MUSL builds could work with the |
It depends on how you generate the minisign key-pair. If you generate it once and stored the private key within repo secrets and write the public key to If you generate one per release like cargo-binstall does, then you'd have to change It would have to generate a key-pair, write the public key into Cargo.toml before publishing, and the private key would be encrypted and uploaded to action artifact for signing (removed after building is completed). We did this because crates.io is guaranteed to be immutable, whereas github release can be modified at anytime. By generating a unique public key for each release and putting it into |
Thanks for elaborating. I think for one, I think minisign is an upgrade to just providing hashes as it can potentially validate authorship, hashes can only validate the file one was supposed to download was actually downloaded. Having a per-release key that is thrown away seems to invalidate the idea of proving authorship, as it's an ever-changing public key that anyone could have created. Maybe this is also where I am wrong or misunderstand things. |
The idea is to minimize the risk of malicious takeover. By having a key for each release, stored on crates.io, we guarantee that even if attacker takeover the account/repository, they can't modify the existing release artifacts without getting noticed. Sure they can create new release, but they can't modify existing releases and put something into it. |
Ah, right, and |
Yes we already check that in cargo-binstall
We currently only support crypto asymmetric keys in the Storing the key in github release makes no sense at all, since it can be overwritten by malicious takeover. We also have plans to support more algorithms |
Yes, for VS Code there is the GitHub Actions extension. But I'm suggesting something else: generating them from a DSL written in a statically typed language. This may be a hard sell because, as far as I can tell, currently no one has made anything where Rust would be that language. I've opened #1512 where I have described this in more detail.
Due to #1493, I am not sure this can be solved the way I had hoped. Since the step with Regarding signing releases as discussed above, is there anything I can or should do to help out with that? |
I'd leave it as is, but go back to that once The BoM already interests me a lot though, and if that interests you as well maybe that could be investigated for the purpose of adding it to GitHub releases. |
Closes #1478
Closes #1484
Major changes:
Cargo.toml
unless specially named.gh
for all release operations, eliminating deprecated actions (#1484).First… what it looks like
To avoid confusion with real releases of gitoxide that people should actually use, I plan to delete that release eventually, but given its distinctive version "number," I'm not too worried and I'd be pleased to leave it up for at least as long as this PR is open and, if useful, possibly a bit longer.
If one wishes to do so, one can download all the files from that test release using
gh
:mkdir tmp cd tmp gh release -R EliahKagan/gitoxide download v0.38.0-alpha.1-DO-NOT-USE
Each of the above checkboxed points ("tasks") is elaborated below, including the last one which I've written as unchecked because I'm not sure the changes here really achieve it. [Edit: It is now checked too, having been done later, in #1499.]
1. Validate version against
Cargo.toml
unless specially namedThis is inspired by a check performed in the ripgrep workflow, though the behavior here is different to permit
TEST-*
and*-DO-NOT-USE
and messages that report the nature of the likely mistake. Also, while I think this may not have been the case when ripgrep added this check, now theubuntu-latest
runner has theyq
command, which accepts TOML input and therefore allows extracting the version fromCargo.toml
to be done robustly.2. Create release as a draft and publish after assets are all attached
This creates a draft release, which is only visible to users with push permissions on the repository. If any of steps (of any job) fail, it remains a draft release and the problem can be investigated. If they all succeed, then the last step of the last job marks it non-draft, publishing it.
This is directly adapted from the ripgrep workflow. The difference is that marking the release non-draft seems to be done manually there, rather than happening automatically. (For further discussion of the options related to this, see "Make the workflow more readily testable?" below.)
3. Use
gh
for all release operations, eliminating deprecated actionsThis fixes #1484 by using
gh
instead of several actions related to operating on releases, including theupload-release-asset
action, but also others that are not deprecated. This is for the reasons presented there, and as mentioned there, this also uses a (single) job output rather than GHA artifacts for passing metadata (now it is just a version) to subsequent jobs, thereby eliminating the use of actions related to GHA assets as well.Although there are minor differences, and also more uses of
gh
because it is also used in the changes presented below, this is perhaps the area where the greatest benefit has coming from incorporating changes that appeared in the ripgrep workflow.4. Make archives with macOS universal binaries
As recommended in #1478 (comment), this uses
lipo
to create archives with macOS Universal 2 binaries. There is one such archive for each feature.It makes them by downloading the
aarch64
andx86_64
archives for the same feature from the draft release and making a repack in which theein
andgix
binaries are each combinations of both architectures' binaries.Unfortunately, I did not find a way to make the dependencies between jobs as granular as I had hoped in #1478 (comment). I wanted to make them depend only on non-universal macOS and only on the ones for the same feature. It doesn't look like there is currently a supported way to do this with GitHub Actions using
needs
, and as far as I can tell it would not be made easier by using artifacts rather than the release itself to hold the assets.It should be feasible to do it, or at least get part of the way there, using other techniques. In particular, the macOS jobs could be split off into a separate job definition, and substantial code duplication--which if present in this workflow would almost certainly lead to bugs, I think--could be avoided with a reusable workflow or composite action.
I decided not to try to do anything like that at this time. Instead, I just made the jobs that generate builds with universal binaries depend on all the jobs that generate the other builds. Unfortunately this means there is a substantial delay between when the architecture-specific builds are attached to the release and when the universal builds made from them are attached. There are two ways this could matter, though currently they are not very significant:
fail-fast: false
is set in thebuild-release
job definition so that jobs from that matrix are no longer cancelled automatically when others fail, then having the universal builds depend on the aarch64 and x86_64 builds could cause the universal builds not to be built when otherwise they would be able to.5. Generate and include SHA256 checksums
The ripgrep workflow does this and I figured it would be a good idea. But this differs in a few ways:
hashes.sha256
files that lists all of them. I wasn't sure what filename to use; the namehashes.sha256
is inspired by that file in PowerShell release assets. If anything goes wrong in any step of any job up to that point, then the individual checksum files are retained in the unpublished draft.shasum
to include the leading*
before each filename in its output, attesting that the checksums were computed while treating the files that way.shasum
is used on all platforms, even Windows, becausecertutil
produces output in a different format that, to the best of my knowledge, no common tools will automatically use. (Theshasum
command actually exists on the Windows GHA runners because Git for Windows ships it; it's just in a subdirectory that is not always in thePATH
.) An example of a Rust project that uses this traditional Unix format for all archives including.zip
archives for Windows targets isuv
(though I did not use anything from its release workflow).Although I think this approach is reasonable or I wouldn't be doing it, I also chose it because of its proximity to other approaches: this should not be too hard to change into any of the following if preferred:
.sha256
files and still includehashes.sha256
..sha256
files and don't includehashes.sha256
..sha256
files even while the release is a draft.gpg
to sign thehashes.sha256
file (would need another secret if automatic).6. Document and refactor to avoid confusion
Overall the comments are actually sparer. But I placed one at the top of each job definition identifying what it does. I think that creating the release as a draft is also more intuitive, which does not mean we necessarily should keep it that way, but does mean that less confusion is likely when reading the workflow as long as it is the case.
The refactoring largely consists of changes to the way non-literal values get into script steps:
if:
keys guarding the steps.$VAR
) rather than by interpolating into the script with a template (${{ env.VAR }}
).env:
. (When non-sensitive and frequently used, put them in the job-levelenv:
.)In other scenarios--not this one--the latter two points would be necessary for security, because untrusted data can contain arbitrary characters including quotation marks. In this case, that is not a worry (except by accident), because none of the values are ever untrusted or otherwise especially unpredictable. So this is largely a stylistic matter. But:
env
, such as the lifetime of variables inserted by appending to$GITHUB_ENV
. I think this helps avoid that too.github.GITHUB_TOKEN
, but it does have write permissions.Relating to the above, I had previously removed debug printing that didn't work because it attempted to print environment variables that were not set. I gave the workflow another look to see if maybe some of this information would be useful after all, and the ripgrep workflow helped too in what it prints. I brought some of that back, in subsequent steps so that the variables are set before being printed.
7. Make the workflow more readily testable?
Having this make releases initially as drafts seems beneficial in and of itself, in that the release can get into its final state before being made public, and in that the failure to produce or attach assets may sometimes mean the release should not be published without fixing something.
But the bigger reason I did it this way was that it enabled me to test this in my (public) fork, without creating a large number of releases that could potentially confuse people especially if they appeared in their GitHub "what's new" type feeds. (Though I did also name my versions with the string
DO-NOT-USE
, so most likely no one will be misled.) Only pretty near the end did I add the commit that enabled the last step of the newpublish-release
job.But there are other possible approaches and some considerations the current approach may not adequately satisfy:
cargo-smart-release
have bearing on this?Customization is pretty easy to add when a workflow is triggered by the
workflow_dispatch
event (here's an example). It is not always quite so easy in what is the more important case here of being triggered in push. But, for example, this could look at repository variables to determine some of its behavior. It could also have different behavior based on factors like whether it is running in a fork, though I would want that only to be a matter of different defaults so that testing changes to it in a fork could still be done robustly.Of course, it can also be changed over time as preferences about how it should work become clear or change. It will also be changing in order to simplify things related to stripping debug symbols (#1477) and to publish releases for more targets or for some existing targets with more features (#1242). This PR satisfies the second point of #1242 (and #1479 satisfied the first), but otherwise this PR does not attempt to cover any of that.