Significant build regressions on swift:6.0-noble compared to 5.10-noble #76555
Comments
Some notes, in no particular order (will update if I think of more):

To be clear: it's very sad to see the [Linux] build situation constantly getting worse despite us asking for faster and less RAM-hungry builds for ages. Anyway, let's get to the benchmarks I did. I likely ran 500+ CI jobs in the past 48 hours ...

Tests CI

(Side note: I didn't know having access to more RAM could hurt?! I don't think it's a machine-type problem, since the deployment builds below show the expected behavior of some performance improvements when having access to more RAM.) Analyzing the results (excluding the bigger ...)

Deployment CI

The only notable change when I moved our app to the Swift 6 compiler is that we have 3 executable targets which Swift was throwing errors about, e.g. using ... You may ask: "Didn't you say you needed an 8x larger machine to run the tests? How come this time they ran even on the same machine as you had before when using ...?" So: worth mentioning, when the build is stuck, I consistently see a sequence of logs like this:

[6944/6948] Wrapping AST for MyLib for debugging
[6946/6950] Compiling MyExec Entrypoint.swift
[6947/6950] Emitting module MyExec
[6948/6951] Wrapping AST for MyExec for debugging
[6949/6951] Write Objects.LinkFileList
This almost starts to sound like a recurrence of the infamous linker RAM usage problem due to huge command lines with repeated libraries. @al45tair Is there any chance we're still failing to deduplicate linker command lines fully?
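For anyone who wants to sanity-check this on their own project, here is a rough sketch of counting repeated libraries on the final link line. The grep pattern and the assumption that the link invocation shows up in verbose SwiftPM output (mentioning Objects.LinkFileList) are mine, not something confirmed in this thread, so adjust as needed.

# Capture verbose build output (ideally from a clean build so the link step actually runs).
swift build --build-tests -v 2>&1 | tee build-verbose.log

# Pull out the link invocation(s) and count how often each token repeats;
# heavily repeated -l flags or library paths would point at missing deduplication.
grep -E '(swiftc|clang).*Objects\.LinkFileList' build-verbose.log \
  | tr ' ' '\n' \
  | sort | uniq -c | sort -rn | head -n 20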
Yes, I pinged you on that last month, but never got a response. A fix was just merged into ... @jmschonfeld or @shahmishal, can that fix be prioritized to get into the next patch release?
@MahdiBM, thanks for all the build info. Do you do any CI builds of the 6.0 snapshot toolchains before the final release? That would help find and stop build regressions like this when they happen, rather than being surprised on the final release. If you can, I'd like to know how an earlier 6.0 July 19 snapshot toolchain build for jammy does on these same CI runs of yours. That might help figure out the regression, particularly if you compare it to the next July 21 build of the 6.0 toolchain. |
@finagolfin this is an "executable" work project, not a public library, so we don't test on multiple Swift versions.
We just use Docker images in CI. To be clear, the image names above are the exact Docker image names that we use (added this explanation to the comment). I haven't tried or figured out manually using nightly images, although I imagine I could just use swiftly to set up the machine with a specific nightly toolchain, and there should be few problems. It will make the different benchmarks diverge a bit, though, in terms of environment / setup and all.
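In case it helps, a minimal sketch of what that swiftly-based setup could look like, assuming swiftly is already installed on the runner and that it accepts snapshot selectors in this form (worth double-checking against the swiftly documentation):

# Install and activate a specific 6.0 development snapshot (selector format is an assumption).
swiftly install 6.0-snapshot-2024-07-19
swiftly use 6.0-snapshot-2024-07-19
swift --version

# Then repeat the same CI build with a later snapshot on the same machine to compare timings.
swiftly install 6.0-snapshot-2024-07-21
swiftly use 6.0-snapshot-2024-07-21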
Another visible issue is the ... Not sure where that comes from. Any ideas?
I would guess that even on 5.10, the ...
My guess is that the difference comes from the updated glibc in ...
I don't know. In my Android CI, the latest Swift release builds the same code about 70-80% faster than the development snapshot toolchains. But you'd be looking for regressions in relative build time, so those branch differences shouldn't matter.
I'd simply build with the snapshots of the next release, e.g. 6.1 once that's branched, and look for regressions in build time with the almost-daily snapshots.
The CI tags snapshots and provides official builds a couple times a week: I'd set it up to run whenever one of those drops.
I don't use Docker, so I don't know anything about its identifiers, but I presume the 6.0 snapshot tag dates I listed should identify the right images.
Not yet. The fix was added to trunk a couple weeks ago, so you could try the Sep. 4 or latest 6.1 snapshot build with it and compare to the Aug. 29 build without it. You may also want to talk to @ktoso and the SSWG about what kind of toolchain benchmarking exists to catch these issues on linux and what needs to be done to either start or augment it.
If you can log into the build machine and watch memory usage with ...
@tbkka I can ssh into the containers since RunsOn provides such a feature, but as you already noticed, I'm not well-versed in compiler/build-system workings. I can use top, but I'm not sure how to derive conclusions about where the problem is. It does appear @gwynne was on point, though, about linker issues.
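For what it's worth, a crude way to capture where the memory goes without staring at top, assuming standard procps tools are available in the container, is to sample the biggest processes while the build runs:

# Log the five largest processes (by resident memory) every 10 seconds;
# a runaway swift-frontend or linker process will stand out in the samples.
while sleep 10; do
  date
  ps -eo rss,comm --sort=-rss | head -n 6
  echo "---"
done | tee memory-samples.log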
@finagolfin I imagine the SSWG is already aware of these long-standing problems, and I expect they have already communicated their concerns over the past few years; they just probably haven't managed to get them up the Swift team's priority list. I've seen some discussions in public places. Even if there were no regressions, Swift's build system looks to me, purely from a user's perspective with no knowledge of the inner workings, pretty behind (dare I say, bad), and one of the biggest pain points of the language. We've just gotten used to our M-series devices building things fast enough before we get way too bored. That said, I'm open to helping benchmark things. I think one problem is that we need a real, big, messy project, like most corporate projects are, so we can test things in real environments.
@MahdiBM, I have submitted that linker fix for the next 6.0.1 patch release, but the branch managers would like some confirmation from you that this is fixed in trunk first. Specifically, you should compare the Aug. 29 trunk 6.1 snapshot build from last month to the Sep. 4 or subsequent trunk builds.
Moving the thread about benchmarking the linker fix here, since it adds nothing to the review of that pull. I was off the internet for the last seven hours, so only seeing your messages now.
Maybe you can give some error info on that.
Ideally, you'd build against the same commit of your codebase as the test runs you measured above.
Hmm, this is building inside the linux image? A quick fix might be to remove that ... However, this seems entirely unrelated to the toolchain used: I'd first try to build the same working commit of your codebase that you used for the test runs you measured above.
That's not possible for multiple reasons, such as the fact that normally I use Swift Docker images, but here I need to install specific toolchains, which means I need to use the Ubuntu image.
Yes. That's how GitHub Actions works. (Ubuntu jammy)
Not sure how that'd be helpful. The current commit is close enough, though. Those tests above were made in a different environment (Swift Docker images, release images only), so while I trust that you know better than me about this stuff, I don't understand how you're going to be able to properly compare the numbers considering I had to make some adjustments.
I figure you know it builds to completion at least, without hitting all these build errors.
There are Swift Docker images for all these snapshot toolchains too, why not use those? Basically, you can't compare snapshot build timings if you keep hitting compilation errors, so I'm saying you should try to reproduce the known-good environment where you measured the runs above, but only change one ingredient, i.e. swapping the 6.0 Docker image for the snapshot Docker images. If these build errors happen because other factors, like your Swift project, are changing too, that should fix it.

If that still doesn't build, I suggest you use the 6.0 snapshot toolchain tags given, as they will be most similar to the 6.0 release, and show any build error output for those. If you can't get anything but the final release builds to compile your codebase, you're stuck simply observing the build with some process monitor or file timestamps. If linking seems to be the problem, you could get the linker command from the verbose ...

I took a look at some linux CI build times of swift-docc between the 6.0.0 and 6.0 branches, i.e. with and without the linker fix, and didn't see a big difference. I don't know if that's because they have a lot of RAM, unlike your baseline config that showed the most regression.
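If it does come down to observing the link step by hand, here is a sketch of isolating and timing it. It assumes GNU time is installed in the image (apt-get install -y time otherwise) and that the final link invocation can be grepped out of the verbose log; the extracted command may need minor cleanup before it re-runs.

# Get the exact link command from a verbose build, then re-run just that command
# under GNU time to see its wall-clock time and peak resident memory.
swift build --build-tests -v 2>&1 | tee build-verbose.log
grep -E '(swiftc|clang).*Objects\.LinkFileList' build-verbose.log | tail -n 1 > link-command.sh
/usr/bin/time -v bash link-command.sh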
@finagolfin But how do I figure out the hash of the exact image that relates to a specific main snapshot?

I tried a bunch to fix the NIO errors, with no luck: https://github.com/MahdiBM/swift-nio/tree/mmbm-no-cniodarwin-on-linux

This is not normal, and not a fault of the project. This is not the first time I'm building the app in a linux environment. It also complained about CNIOWASI, as well as CNIOLinux.
Hmm, looking it up now, I guess you can't. As I said, I don't use Docker, so I was unaware of that. My suggestion is that you get the 6.0 Docker image and make sure it builds some known-stable commit of your codebase. Then, use that same docker image to download the 6.0 snapshots given above, like the one I linked yesterday, and after unpacking them in the Docker image, use them to build your code instead. That way, you have a known-good Docker environment and source commit, with the only difference being the Swift 6.0 toolchain build date. The Docker files almost never change, so only swapping out the toolchain used inside the 6.0 image should minimize the differences.
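A rough sketch of that swap, with my own assumptions spelled out: it uses the swift:6.0-jammy image (to match the ubuntu22.04 snapshot tarballs), builds the download URL from the same download.swift.org layout as the Dockerfile later in this thread, and simply unpacks the snapshot over the preinstalled toolchain, so treat it as a starting point rather than a verified recipe.

# Overlay a 6.0 snapshot toolchain on top of the known-good 6.0 Docker image,
# then build the same commit of the codebase with it.
TAG=swift-6.0-DEVELOPMENT-SNAPSHOT-2024-07-19-a
URL="https://download.swift.org/swift-6.0-branch/ubuntu2204/$TAG/$TAG-ubuntu22.04.tar.gz"
docker run -v "$PWD":/code swift:6.0-jammy bash -c "
  apt-get -q update && apt-get -q install -y curl &&
  curl -fsSL '$URL' | tar -xz --directory / --strip-components=1 &&
  swift --version &&
  cd /code && time swift build --build-tests"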
@finagolfin Good idea, didn't think of that, but it still didn't work. For reference:

CI file (test build):
on:
  pull_request: { types: [opened, reopened, synchronize] }

concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true

jobs:
  unit-tests:
    strategy:
      fail-fast: false
      matrix:
        snapshot:
          - swift-6.0-DEVELOPMENT-SNAPSHOT-2024-07-19-a
          - swift-6.0-DEVELOPMENT-SNAPSHOT-2024-07-21-a
        machine:
          - name: "medium" # 16gb 8cpu c7i-flex
            arch: amd64
          - name: "large" # 32gb 16cpu c7i-flex
            arch: amd64
          - name: "huge-stable-arm" # 128gb 64cpu bare metal c7g
            arch: arm64
    runs-on:
      labels:
        - runs-on
        - runner=${{ matrix.machine.name }}
        - run-id=${{ github.run_id }}
    timeout-minutes: 60
    steps:
      - name: Check out ${{ github.event.repository.name }}
        uses: actions/checkout@v4
      - name: Build Docker Image
        run: |
          docker build \
            --network=host \
            --memory=128g \
            -f SwiftDockerfile \
            -t custom-swift:1 . \
            --build-arg DOWNLOAD_DIR="${{ matrix.snapshot }}" \
            --build-arg TARGETARCH="${{ matrix.machine.arch }}"
      - name: Prepare
        run: |
          docker run --name swift-container custom-swift:1 bash -c 'apt-get update -y && apt-get install -y libjemalloc-dev && git config --global --add url."https://${{ secrets.GH_PAT }}@github.com/".insteadOf "https://github.com/" && git clone https://github.com/${{ github.repository }} && cd ${{ github.event.repository.name }} && git checkout ${{ github.head_ref }} && swift package resolve --force-resolved-versions --skip-update'
          docker commit swift-container prepared-container:1
      - name: Build ${{ matrix.snapshot }}
        run: |
          docker run prepared-container:1 bash -c 'cd ${{ github.event.repository.name }} && swift build --build-tests'
Modified Dockerfile:

FROM ubuntu:22.04 AS base
LABEL maintainer="Swift Infrastructure <swift-infrastructure@forums.swift.org>"
LABEL description="Docker Container for the Swift programming language"
RUN export DEBIAN_FRONTEND=noninteractive DEBCONF_NONINTERACTIVE_SEEN=true && apt-get -q update && \
apt-get -q install -y \
binutils \
git \
gnupg2 \
libc6-dev \
libcurl4-openssl-dev \
libedit2 \
libgcc-11-dev \
libpython3-dev \
libsqlite3-0 \
libstdc++-11-dev \
libxml2-dev \
libz3-dev \
pkg-config \
tzdata \
zip \
zlib1g-dev \
&& rm -r /var/lib/apt/lists/*
# Everything up to here should cache nicely between Swift versions, assuming dev dependencies change little
# gpg --keyid-format LONG -k FAF6989E1BC16FEA
# pub rsa4096/FAF6989E1BC16FEA 2019-11-07 [SC] [expires: 2021-11-06]
# 8A7495662C3CD4AE18D95637FAF6989E1BC16FEA
# uid [ unknown] Swift Automatic Signing Key #3 <swift-infrastructure@swift.org>
ARG SWIFT_SIGNING_KEY=8A7495662C3CD4AE18D95637FAF6989E1BC16FEA
ARG SWIFT_PLATFORM=ubuntu
ARG OS_MAJOR_VER=22
ARG OS_MIN_VER=04
ARG SWIFT_WEBROOT=https://download.swift.org/development
ARG DOWNLOAD_DIR
# This is a small trick to enable if/else for arm64 and amd64.
# Because of https://bugs.swift.org/browse/SR-14872 we need adjust tar options.
FROM base AS base-amd64
ARG OS_ARCH_SUFFIX=
FROM base AS base-arm64
ARG OS_ARCH_SUFFIX=-aarch64
FROM base-$TARGETARCH AS final
ARG OS_VER=$SWIFT_PLATFORM$OS_MAJOR_VER.$OS_MIN_VER$OS_ARCH_SUFFIX
ARG PLATFORM_WEBROOT="$SWIFT_WEBROOT/$SWIFT_PLATFORM$OS_MAJOR_VER$OS_MIN_VER$OS_ARCH_SUFFIX"
RUN echo "${PLATFORM_WEBROOT}/latest-build.yml"
ARG download="$DOWNLOAD_DIR-$SWIFT_PLATFORM$OS_MAJOR_VER.$OS_MIN_VER$OS_ARCH_SUFFIX.tar.gz"
RUN echo "DOWNLOAD IS THIS: ${download} ; ${DOWNLOAD_DIR}"
RUN set -e; \
# - Grab curl here so we cache better up above
export DEBIAN_FRONTEND=noninteractive \
&& apt-get -q update && apt-get -q install -y curl && rm -rf /var/lib/apt/lists/* \
# - Latest Toolchain info
&& echo $DOWNLOAD_DIR > .swift_tag \
# - Download the GPG keys, Swift toolchain, and toolchain signature, and verify.
&& export GNUPGHOME="$(mktemp -d)" \
&& curl -fsSL ${PLATFORM_WEBROOT}/${DOWNLOAD_DIR}/${download} -o latest_toolchain.tar.gz \
${PLATFORM_WEBROOT}/${DOWNLOAD_DIR}/${download}.sig -o latest_toolchain.tar.gz.sig \
&& curl -fSsL https://swift.org/keys/all-keys.asc | gpg --import - \
&& gpg --batch --verify latest_toolchain.tar.gz.sig latest_toolchain.tar.gz \
# - Unpack the toolchain, set libs permissions, and clean up.
&& tar -xzf latest_toolchain.tar.gz --directory / --strip-components=1 \
&& chmod -R o+r /usr/lib/swift \
&& rm -rf "$GNUPGHOME" latest_toolchain.tar.gz.sig latest_toolchain.tar.gz \
&& apt-get purge --auto-remove -y curl
# Print Installed Swift Version
RUN swift --version
RUN echo "[ -n \"\${TERM:-}\" -a -r /etc/motd ] && cat /etc/motd" >> /etc/bash.bashrc; \
( \
printf "################################################################\n"; \
printf "# %-60s #\n" ""; \
printf "# %-60s #\n" "Swift Nightly Docker Image"; \
printf "# %-60s #\n" "Tag: $(cat .swift_tag)"; \
printf "# %-60s #\n" ""; \
printf "################################################################\n" \
) > /etc/motd
To be clear, by "didn't work" I mean that I'm getting exactly the same errors.
Tried the 6.0 snapshots; they complain about usage of ...
Does simply building your code with the Swift 6.0 release still work? If so, I'd try to instrument the build to figure out the bottlenecks, as I suggested before. In particular, if you're building a large executable at the end, that might be taking the most time. As I said yesterday, you could try dumping all the commands that ...
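As a low-effort variant of that, one could just timestamp every line of build output and look for the big gaps between consecutive steps; this sketch assumes nothing beyond a POSIX shell inside the image.

# Prefix each line of build output with a wall-clock timestamp;
# large jumps between consecutive lines show which compile/link steps dominate.
swift build --build-tests 2>&1 | while IFS= read -r line; do
  printf '%s %s\n' "$(date +%T)" "$line"
done | tee timed-build.log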
Of course there is no problem on the released Swift 6 😅. This whole issue is about CI slowness. We run CI on push, PR, etc... and they've all passed.
@finagolfin

snapshot:
  - swift-6.0-DEVELOPMENT-SNAPSHOT-2024-07-19-a
  - swift-6.0-DEVELOPMENT-SNAPSHOT-2024-07-21-a
machine:
  - name: "medium" # 16gb 8cpu c7i-flex
    arch: amd64
  - name: "large" # 32gb 16cpu c7i-flex
    arch: amd64
  - name: "huge-stable-arm" # 128gb 64cpu bare metal c7g
    arch: arm64

The results are only marginally different. Ignore that it says "unit-tests"; it's only a ...
It's not the same low-RAM scenarios though, right? You initially gave data for 8 and 16 GB RAM servers that showed regressions, whereas this is from 16, 32, and 128 GB runners, with the smallest showing the biggest regression albeit less than with the final 6.0 release. Why don't you wait for the next 6.0 snapshot toolchain with the linker patch? That should be a good comparison to the 6.0 release.
@finagolfin Sorry, you're right, I just wrote my conclusions here before looking at the smallest machine. For the record, 07-19 with the "medium" machine took 9m 46s, and 07-21 took 10m 29s.
@finagolfin This is not exactly the same scenario as before, so these numbers are not comparable to the older numbers. With a smaller machine (half the "medium" machine), 07-19 takes 17m 3s and 07-21 takes 18m 5s (not the whole job run, only the build step). In the comparisons above, 6.0 jammy was 22% slower than 5.10 noble on the same machine (look at Tests CI, build-step time with no cache, which uses the same command but has to resolve the packages itself: 9m 14s vs 7m 34s). So like a third of the regression? I guess there can also be other factors here (like how snapshots behave vs release builds), so let me know what you think.
The medium machine is the same as the "Same" machines in the "Tests CI" section.
It seems to indicate the Foundation re-core in July definitely contributed to the slowdown, but wasn't all of it. Are you able to also compare the trunk snapshot toolchains from a couple weeks ago? That might be a better benchmark.
I was talking about your original test and deployment runs from a couple days ago; the smallest from each were 8 and 16 GB.
Just let me know what the snapshot names are.
The deployment CI uses a smaller base machine because it only builds a specific product, not everything including the tests.
See above: "you could try the Sep. 4 or latest 6.1 snapshot build with it and compare to the Aug. 29 build without it."
@finagolfin Ah, I did try those already; the error situation was less solvable.
It was the CNIO errors. I couldn't get them fixed, although I made some decent effort.
First victim: Heroku is a pretty popular platform for server-side Swift (SSS) apps, considering it once had a free tier. This problem will have a decent negative impact.
Ever since the last comment, I've seen 2 more users encountering this problem on Heroku. So a total of 3, and that's only those who cared to ask about their weird build failures before reverting back. The interesting bit is that one person was consistently having problems on Swift 6 jammy, but their project would consistently build fine on Swift 6 noble, in Heroku buildpack builders. So: I was unable to find the specs of Heroku builders (also asked around a bit, no luck) to see if anything has changed between their Ubuntu 22 and 24 stacks.
Alright, @MahdiBM, got something for you to try: a new Swift 6.0 Oct. 8 snapshot tag was just released with the linker deduplication change for Foundation. Compare it with the last 6.0 Sep. 17 snapshot tag and see how much it improves your build. Comparing it to the 6.0.1 release may not help, as that has other stuff turned off for release builds.
@finagolfin There are some improvements in the more-constrained environments.

Reminder: ...

Also, the name is wrong; this is just a ... The actual build-only steps took 9m 45s vs 10m 28s on the smaller machine in one of the 2 tries, which indicates a 7% improvement.
Description
Significant (originally "Absolutely massive") build regressions on linux, worsened by swift-testing. EDIT: Please also read my next comment, which includes more info.
Environment

... swift:5.10-noble to swift:6.0-noble.
... the .build cache. No behavior difference noticed at all.
// swift-tools-version:6.0 in Package.swift.

What Happened?

... c7x-family machine with 64 CPU x 128 GB RAM (8x larger than before) runs the tests as well as they were being run before, on swift 5.10.
... --disable-swift-testing.
... significant to absolutely massive ... --disable-swift-testing flag into the mix.

Reproduction
Not sure.
Expected behavior
No regressions. Preferably even faster than previous Swift versions.
Environment
Mentioned above.
Additional information
No response