Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Repro nightly builds #7651

Merged

Conversation

s373nZ
Copy link
Contributor

@s373nZ s373nZ commented Sep 7, 2024

Repro nightly builds

Description

This pull request attempts to address #7117 to implement Github Actions CI/CD to perform nightly reproducible builds. The goal is to catch local file changes that might emerge through the build process and result in a -modded release version. Currently supporting focal, jammy and noble Ubuntu releases.

Have been testing by adding the following to the on key of .github/workflows/repro.yml:

  push:
    branches:
      - 7117-repro-nightly-builds

Add adding some temporary commit to dirty the tree in the contrib/reprobuild/Dockerfile.[release]

Related Issues

Checklist

Ensure the following tasks are completed before submitting the PR:

  • Changelog has been added in relevant commits.
  • Tests have been added or updated to cover the changes.
  • Documentation has been updated as needed.
  • Any relevant comments or TODOs have been addressed or removed.

Additional Notes

  • Added an entry to the CHANGELOG -- it was the first one observed since the last release, so also added the front matter.
  • I started this work a few days ago, without realizing there was a "expiration" date on the Sphinx bounty entry for Aug. 2, 2024. Just before submitting this PR, and rebasing against master, I've noticed @ShahanaFarooqui has made a few commits regarding the repro build process. Don't mean to be stepping on toes! I don't know if this work overlaps, and I'm hoping it still qualifies for the bounty, but if not, please consider it as an alternative submission to whatever planning around repro builds made since Aug. 2.

@s373nZ
Copy link
Contributor Author

s373nZ commented Sep 9, 2024

The build failures appear to be in different place each time, and should not be related to my changes, so I assume they are inherited in CI build failures bubbling up from new changes to master over the weekend. I'll do my best to monitor the CI and rebase opportunely. Otherwise, please advise.

@s373nZ s373nZ force-pushed the 7117-repro-nightly-builds branch 3 times, most recently from 44448f0 to e0b85c5 Compare September 13, 2024 14:09
@s373nZ
Copy link
Contributor Author

s373nZ commented Sep 13, 2024

Some reformatting of the commands and minor syntactical optimizations.

@ShahanaFarooqui ShahanaFarooqui added this to the v24.11 milestone Sep 13, 2024
@s373nZ
Copy link
Contributor Author

s373nZ commented Sep 17, 2024

Removed modifications to the CHANGELOG.md and opted for a Changelog-None hint in the commit message. Users probably don't care about all the CI processes...

.github/workflows/repro.yml Outdated Show resolved Hide resolved
.github/workflows/repro.yml Show resolved Hide resolved
@s373nZ
Copy link
Contributor Author

s373nZ commented Sep 26, 2024

@ShahanaFarooqui Thanks for having a look! I'll address at your change requests as soon as I can, but it might take some time. As your question on testing is important, I'm leaving some preliminary feedback here so it's not wrapped up with the review comments.

I've been testing failed builds using this branch, which adds an extra commit to write a line in the Cargo.toml file just prior to the tools/repro-build.sh script being executed by the Dockerfile command. Here is an example failed run.

One interesting and potentially tricky aspect of this PR is this comment in the Dockerfiles -- that first the repo is cloned to the /repo directory by the CI/CD, and then it is cloned again from from that local clone to the "working directory" of the Dockerfile (/build). I remember that this made it a little interesting to figure out a testing plan. Also, I wasn't quite sure how to force a build resulting in a dirty tree. I'm not sure if that has an impact on your own test case, but it's something to be aware of.

I'll try to reproduce your test case as well. Is it safe to say that if I git revert 21fcd45 on my testing branch, you would expect the build to produce a dirty tree?

@ShahanaFarooqui
Copy link
Collaborator

I'll try to reproduce your test case as well. Is it safe to say that if I git revert 21fcd45 on my testing branch, you would expect the build to produce a dirty tree?

Yes; due to commit 21fcd45, CLN v24.08.1 builds are released with -modded.

@s373nZ
Copy link
Contributor Author

s373nZ commented Oct 1, 2024

@ShahanaFarooqui I tried to reproduce the -modded build locally using my 7117-repro-nightly-builds-test branch which contains 21fcd45 reverted using the Reproducible builds documentation, but it didn't produce a -modded build for me. I am happy to keep trying, but before I sink too much more time into it, are you able to provide any hints on how the $V variable that is found in the Makefile here is used, and when / how it should be set during the Repro build process to trigger the comment change in cln-rpc/src/model.rs?

Also, do any other test-cases which produce -modded build come to mind that we can try in tandem?

@ShahanaFarooqui
Copy link
Collaborator

@ShahanaFarooqui I tried to reproduce the -modded build locally using my 7117-repro-nightly-builds-test branch which contains 21fcd45 reverted using the Reproducible builds documentation, but it didn't produce a -modded build for me.

I pulled your branch s373nZ:7117-repro-nightly-builds and it is generating the -modded build locally with below steps:

poetry shell
poetry install
./configure
make

I am happy to keep trying, but before I sink too much more time into it, are you able to provide any hints on how the $V variable that is found in the Makefile here is used, and when / how it should be set during the Repro build process to trigger the comment change in cln-rpc/src/model.rs?

$V variable is to set verbose level (default is 0). I never used it in CLN but you can try it with make V=1. We do not need to set it in the process as default non-verbose run is good enough.

Also, do any other test-cases which produce -modded build come to mind that we can try in tandem?

All commits without tags will only generate -modded versions. Therefore, your current commit is generating a modded version too ( v24.08-65-g4009340-modded). You can see how VERSION is calculated here.

@s373nZ
Copy link
Contributor Author

s373nZ commented Oct 2, 2024

Apologies in advance for the looooooong comment.

I pulled your branch s373nZ:7117-repro-nightly-builds and it is generating the -modded build locally with below steps...

I followed the steps you describe and replicated the results you report. The version number as computed in the Makefile is indeed, -modded. However, I'm starting to feel confusion regarding the requirements of this task as detailed in in #7117:

  1. The developer tags the release
  2. Then uploads the tag
  3. Builds the reprobuild, but during the build something got changed
  4. The version string now is something like v24.02-modded due to
    the dirty tree.

My reading of the issue description, particularly #3, is that the Repro build process is performed according to the documentation here. So, my local testing process so far has been:

sudo docker run --rm -v $(pwd):/build ubuntu:noble \
	bash -c "apt-get update && apt-get install -y debootstrap && debootstrap noble /build/noble"
sudo tar -C noble -c . | sudo docker import - noble
sudo docker build -t cl-repro-noble - < contrib/reprobuild/Dockerfile.noble

mkdir -p release
sudo docker run --rm -v $(pwd):/repo -e FORCE_MTIME=$(date +%F) -ti cl-repro-noble

And then I check the name of the artifact tarball in release/ to see if it has been -modded. The extra FORCE_MTIME variable is set because it is not possible to derive the date from CHANGELOG.md unless more formal release steps have been taken.

After testing both methods:

  • Via your method of testing, via a local build, the VERSION is indeed computed to be -modded.
  • Via my method of testing, using Docker under the Repro documentation, the VERSION is not -modded.

I looked into the reason for the difference and it seems to be related to two particularities under the Docker setup:

  1. As mentioned in Repro nightly builds #7651 (comment), the Docker Repro process checks out a fresh copy of the local repository, ignoring any local modifications.
  2. The VERSION for the Docker Repro process is determined here in repro-build.sh, BEFORE the compilation happens.

In the work submitted so far, I've taken special care not to modify any of the Repro process scripts and tools for fear of unintended consequences.

Questions

The below (somewhat rhetorical) questions follow from this research:

  1. Which way are the Repro builds being performed in practice? And which method of testing is deterministic of success against the requirements in gci: Consider adding a nightly CI workflow for reprobuilds #7117?
    • If some methodology performing local builds is used, should the Repro documentation be updated, and should a local build be performed in the Github action instead of the Repro build?
    • If the Docker method in the docs is used, under what circumstances would a dirty tree from a local build become an issue with a release?
  2. Is it possible that computing the VERSION in repro-build.sh prior to performing the build is considered a bug?
    • Maybe a refactor to change the Dockerfiles to perform a fresh repo checkout in this commit had the unintended consequence of missing local changes when computing the initial VERSION?
    • I would take a shot at changing the repro-build.sh script to compute the VERSION after the build is performed, but I see a conflict because VERSION is passed in to the make command.
  3. According to your last comment, All commits without tags will only generate -modded versions. If this is the case, wouldn't ALL nightly builds produce -modded as they aren't tagged, and thus negate the value of having nightly builds? When building under master, the version is not -modded, but that branch also doesn't contain post-build local changes.

Possible Solutions

  • Add another step to the action which 1) performs a "regular" local build prior to the Repro build, 2) computes the VERSION and passes it in as an override variable FORCE_VERSION to 3) make the Repro build use this value for the test release.
    • Concerned this is redundant and bypasses the Repro build process too much.
  • Remove the Docker Repro build functionality and replace it with a local build instead.
    • Concerned this is not reflecting the Repro build steps at all as per the documentation.

Thanks again for your help and patience on this. Let me know if I'm missing something or you have feedback on the requirements or next steps. Happy to catch you on Discord sometime to discuss as well.

sidenote: I noticed the bash command to compute the VERSION differs slightly between the Makefile here and the repro-build.sh script here. Should these be uniform?

@ShahanaFarooqui
Copy link
Collaborator

@s373nZ It seems there might be some misunderstanding regarding both the requirements and the approach. Let's discuss it on Discord first, and then we can update the final decision here. Are you on CLN's Discord server?

@ShahanaFarooqui
Copy link
Collaborator

All commits without tags will only generate -modded versions. Therefore, your current commit is generating a modded version too ( v24.08-65-g4009340-modded). You can see how VERSION is calculated here.

Apologies for the confusion between a dirty tree (-modded) and the number of commits with the commit hash suffix. Each new commit increases the commit count and updates the commit hash, but it does not trigger the --dirty flag unless there are uncommitted changes in the working directory.

@s373nZ
Copy link
Contributor Author

s373nZ commented Oct 3, 2024

Now that we agree on the difference between the commit hash and the -modded appearing in the VERSION, Shahana explained more about the release process. I need to explore that more to determine how builds with local changes are actually ending up -modded. Will look further into tools/build-release.sh and the Release Checklist for clues on how to modify the PR to emulate these failures.

To test I will make two additional test branches with the following changes

@ShahanaFarooqui
Copy link
Collaborator

ShahanaFarooqui commented Oct 3, 2024

  • Change pyln-client version from "^24.2" to "^24.8" in the clnrest plugin and it should make a dirty version of poetry.lock. This should result in a build which is -modded.

This will work only if we run poetry install first. But Dockerfile.<ubuntu-version> uses pip install instead :).

@s373nZ
Copy link
Contributor Author

s373nZ commented Oct 4, 2024

Made a few more changes:

  • Added a phony Makefile target named version which just outputs the $(VERSION) variable that is computed by the Makefile bootstrap. Makes it available to re-use this functionality in other processes.
  • Adjusted the Repro Github action to extract the VERSION from the saved image in a similar fashion to how we are capturing the Git logs.
  • Adjusted the Assert check to read the extracted version for -modded and fail on that case as well as a -modded release file.
  • Rebased against master

@s373nZ
Copy link
Contributor Author

s373nZ commented Oct 4, 2024

Testing

@ShahanaFarooqui following from our conversations above and in Discord, here is a summary and results of the test cases:

  1. Testing branch: https://github.com/s373nZ/lightning/tree/7117-repro-nightly-builds-test
  • This branch only contains a modification to run the action on a push to the branch. Its intent is to test the PR functionality without needing to wait until the run is scheduled.
  • Interestingly, after I rebased against master, this branch's run succeeded for noble and jammy, yet failed for focal as there appears to be local modifications after the repro build. I tested this locally and reproduced the local modifications under focal, so I'm actually wondering if there is a pre-existing -modded build happening on focal.
  • Due to the above, focal builds fail across all the tests.
  1. Test case 1: https://github.com/s373nZ/lightning/tree/7117-repro-nightly-builds-test-1
  • This branch has reverted commit 21fcd45 here
  • RUN FAILS as expected due to differences in cln-rpc/src/model.rs
  • TEST SUCCESS (AFAICT)
  1. Test case 2: https://github.com/s373nZ/lightning/tree/7117-repro-nightly-builds-test-2
  • This branch is a direct checkout of origin/release-24.08.1 with this PR's commits cherry-picked on top.
  • RUN FAILS as expected due to differences in cln-rpc/src/model.rs
  • TEST SUCCESS (AFAICT)
  1. Test case 3: https://github.com/s373nZ/lightning/tree/7117-repro-nightly-builds-test-3
  1. Test case 4: Used the following instructions locally, but substituted jammy for noble, as there was an error compiling lowdown with jammy because it required GLIBC_2.38. May be related to recent lowdown releases in last few days?

Instructions

Failure case 1:
-------------------------------------------------------------------------------------------------------------
1: Checkout master branch
2: Build builder images: `. contrib/cl-repro.sh`
3: Run your local image: docker run -it -v $(pwd):/repo cl-repro-jammy /bin/bash
4: Compile CLN inside the builder image:
    4.1: cd /repo
    4.2: ./configure
    4.3: make
5: Dirty tree due to missing poetry shell & install:
    modified:   contrib/pyln-grpc-proto/pyln/grpc/node_pb2.py
    modified:   contrib/pyln-grpc-proto/pyln/grpc/node_pb2_grpc.py
    modified:   contrib/pyln-grpc-proto/pyln/grpc/primitives_pb2.py
-------------------------------------------------------------------------------------------------------------
  • I ran this locally with noble and then tested it against a copy of the assert logic, and it produced a -modded VERSION with the expected dirty tree.
  • TEST SUCCESS (AFAICT - I don't see how we can run this properly through a branch and the Github action, and the jammy compilation for lowdown is also concerning)

In summary, I am pretty happy with the testing so far. The big question is whether the new changes have discovered that focal repro builds are always producing dirty, -modded versions. Let me know what you think. Happy to chat on Discord or set up another meeting to go over things more efficiently. And no rush -- take your time :)

- Locked grpcio-tools version to fix dirty tree issue. Ref: ElementsProject#7376 (comment)

- Updated python version to 3.10 for future proofing

- Added manual dispatch for github action
@ShahanaFarooqui
Copy link
Collaborator

  1. Testing branch: https://github.com/s373nZ/lightning/tree/7117-repro-nightly-builds-test
  • This branch only contains a modification to run the action on a push to the branch. Its intent is to test the PR functionality without needing to wait until the run is scheduled.
  • Interestingly, after I rebased against master, this branch's run succeeded for noble and jammy, yet failed for focal as there appears to be local modifications after the repro build. I tested this locally and reproduced the local modifications under focal, so I'm actually wondering if there is a pre-existing -modded build happening on focal.
  • Due to the above, focal builds fail across all the tests.

I pushed the last commit to fix the dirty tree issue on Focal by locking the grpcio-tools version. The problem was caused by grpcio-tools version mismatch. Core lightning is using v1.62.2 as specified in poetry.lock, but Dockerfile.<ubuntu-distro> was installing v1.66.2 by default (via pip install). This led to modifications in contrib/pyln-grpc-proto/pyln/grpc/node_pb2.py, contrib/pyln-grpc-proto/pyln/grpc/node_pb2_grpc.py, and contrib/pyln-grpc-proto/pyln/grpc/primitives_pb2.py.

Solution reference: #7376 (comment)

I also updated the Python version to 3.10.0 on Focal to ensure consistency across all three distributions, making it more future-proof.

  1. Test case 4: Used the following instructions locally, but substituted jammy for noble, as there was an error compiling lowdown with jammy because it required GLIBC_2.38. May be related to recent lowdown releases in last few days?

As discussed on Discord, lowdown releases cannot impact CLN because lowdown is locked to commit edca6ce for us..

In summary, I am pretty happy with the testing so far. The big question is whether the new changes have discovered that focal repro builds are always producing dirty, -modded versions.

Yeah, testing looks really great; we have already covered some good real-world scenarios. However, my last push did not trigger the action, and I was unable to trigger it manually too. Could you please confirm once again if the Focal build passes on the master?

@ShahanaFarooqui
Copy link
Collaborator

ACK c7f89c3.

Copy link
Contributor

@rustyrussell rustyrussell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack c7f89c3

@ShahanaFarooqui ShahanaFarooqui merged commit 859f795 into ElementsProject:master Nov 1, 2024
38 checks passed
@s373nZ s373nZ deleted the 7117-repro-nightly-builds branch November 1, 2024 13:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

gci: Consider adding a nightly CI workflow for reprobuilds
3 participants