Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pip fails to build wheels for cffi (gcc not found) in ubuntu.1804.armarch.open using image alpine-3.15-helix-arm32v7 #1423

Open
3 tasks
carlossanlop opened this issue Nov 15, 2023 · 30 comments · Fixed by dotnet/runtime#95373
Assignees
Labels
Known Build Error Ops - Service Maintenance Used to track issues related to maintaining the services .NET Eng Supports Proposed-for-DncDevOps Issues or epics which may represent operational tasks, for consideration

Comments

@carlossanlop
Copy link
Member

carlossanlop commented Nov 15, 2023

Build

https://dev.azure.com/dnceng-public/public/_build/results?buildId=469944

Build leg reported

https://dev.azure.com/dnceng-public/public/_build/results?buildId=469944&view=logs&j=fd49b2ea-3ee3-5298-d793-bdb9fd631e7e&t=26d44400-a1b8-5ed1-d719-0cc35b38251b

Pull Request

dotnet/runtime#93906

Known issue core information

Fill out the known issue JSON section by following the step by step documentation on how to create a known issue

 {
    "ErrorMessage" : "Building wheel for cffi (pyproject.toml) did not run successfully.",
    "BuildRetry": false,
    "ErrorPattern": "",
    "ExcludeConsoleLog": false
 }

@dotnet/dnceng

Release Note Category

  • Feature changes/additions
  • Bug fixes
  • Internal Infrastructure Improvements

Release Note Description

Additional information about the issue reported

Happens in (ubuntu.1804.armarch.open) using docker image mcr.microsoft.com/dotnet-buildtools/prereqs:alpine-3.15-helix-arm32v7-20230807201723-7ea784e

The repo had some recent PRs modifying the alpine images:

My PR is not making any changes that could be causing this.

This error is only visible when there's an actual test error happening in these machines, otherwise, the CI leg passes.

Log file with the error: https://helixre8s23ayyeko0k025g8.blob.core.windows.net/dotnet-runtime-refs-pull-93906-merge-9e1b570d2b3441d698/System.IO.Compression.Brotli.Tests/1/console.b000a501.log?helixlogtype=result

Collecting cryptography>=39.0.1
  Downloading cryptography-41.0.5.tar.gz (630 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 630.5/630.5 kB 10.1 MB/s eta 0:00:00
  Installing build dependencies: started
  Installing build dependencies: finished with status 'error'
  error: subprocess-exited-with-error
  
  × pip subprocess to install build dependencies did not run successfully.
  │ exit code: 1
  ╰─> [124 lines of output]
      Collecting setuptools>=61.0.0
        Downloading setuptools-68.2.2-py3-none-any.whl (807 kB)
           ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 807.9/807.9 kB 13.9 MB/s eta 0:00:00
      Collecting wheel
        Downloading wheel-0.41.3-py3-none-any.whl (65 kB)
           ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 65.8/65.8 kB 3.7 MB/s eta 0:00:00
      Collecting cffi>=1.12
        Downloading cffi-1.16.0.tar.gz (512 kB)
           ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 512.9/512.9 kB 12.1 MB/s eta 0:00:00
        Installing build dependencies: started
        Installing build dependencies: finished with status 'done'
        Getting requirements to build wheel: started
        Getting requirements to build wheel: finished with status 'done'
        Installing backend dependencies: started
        Installing backend dependencies: finished with status 'done'
        Preparing metadata (pyproject.toml): started
        Preparing metadata (pyproject.toml): finished with status 'done'
      Collecting setuptools-rust>=0.11.4
        Downloading setuptools_rust-1.8.1-py3-none-any.whl (26 kB)
      Collecting pycparser
        Using cached pycparser-2.21-py2.py3-none-any.whl (118 kB)
      Collecting tomli>=1.2.1
        Downloading tomli-2.0.1-py3-none-any.whl (12 kB)
      Collecting semantic-version<3,>=2.8.2
        Using cached semantic_version-2.10.0-py2.py3-none-any.whl (15 kB)
      Building wheels for collected packages: cffi
        Building wheel for cffi (pyproject.toml): started
        Building wheel for cffi (pyproject.toml): finished with status 'error'
        error: subprocess-exited-with-error
      
        × Building wheel for cffi (pyproject.toml) did not run successfully.
        │ exit code: 1
        ╰─> [82 lines of output]
            Package libffi was not found in the pkg-config search path.
            Perhaps you should add the directory containing `libffi.pc'
            to the PKG_CONFIG_PATH environment variable
            Package 'libffi', required by 'virtual:world', not found
            Package libffi was not found in the pkg-config search path.
            Perhaps you should add the directory containing `libffi.pc'
            to the PKG_CONFIG_PATH environment variable
            Package 'libffi', required by 'virtual:world', not found
            Package libffi was not found in the pkg-config search path.
            Perhaps you should add the directory containing `libffi.pc'
            to the PKG_CONFIG_PATH environment variable
            Package 'libffi', required by 'virtual:world', not found
            Package libffi was not found in the pkg-config search path.
            Perhaps you should add the directory containing `libffi.pc'
            to the PKG_CONFIG_PATH environment variable
            Package 'libffi', required by 'virtual:world', not found
            Package libffi was not found in the pkg-config search path.
            Perhaps you should add the directory containing `libffi.pc'
            to the PKG_CONFIG_PATH environment variable
            Package 'libffi', required by 'virtual:world', not found
      
                No working compiler found, or bogus compiler options passed to
                the compiler from Python's standard "distutils" module.  See
                the error messages above.  Likely, the problem is not related
                to CFFI but generic to the setup.py of any Python package that
                tries to compile C code.  (Hints: on OS/X 10.8, for errors about
                -mno-fused-madd see http://stackoverflow.com/questions/22313407/
                Otherwise, see https://wiki.python.org/moin/CompLangPython or
                the IRC channel #python on irc.libera.chat.)
      
                Trying to continue anyway.  If you are trying to install CFFI from
                a build done in a different context, you can ignore this warning.
      
            /tmp/pip-build-env-bcny16u4/overlay/lib/python3.9/site-packages/setuptools/config/setupcfg.py:293: _DeprecatedConfig: Deprecated config in `setup.cfg`
            !!
      
                    ********************************************************************************
                    The license_file parameter is deprecated, use license_files instead.
      
                    This deprecation is overdue, please update your project and remove deprecated
                    calls to avoid build errors in the future.
      
                    See https://setuptools.pypa.io/en/latest/userguide/declarative_config.html for details.
                    ********************************************************************************
      
            !!
              parsed = self.parsers.get(option_name, lambda x: x)(value)
            running bdist_wheel
            running build
            running build_py
            creating build
            creating build/lib.linux-aarch64-cpython-39
            creating build/lib.linux-aarch64-cpython-39/cffi
            copying src/cffi/ffiplatform.py -> build/lib.linux-aarch64-cpython-39/cffi
            copying src/cffi/vengine_cpy.py -> build/lib.linux-aarch64-cpython-39/cffi
            copying src/cffi/recompiler.py -> build/lib.linux-aarch64-cpython-39/cffi
            copying src/cffi/__init__.py -> build/lib.linux-aarch64-cpython-39/cffi
            copying src/cffi/commontypes.py -> build/lib.linux-aarch64-cpython-39/cffi
            copying src/cffi/cparser.py -> build/lib.linux-aarch64-cpython-39/cffi
            copying src/cffi/_shimmed_dist_utils.py -> build/lib.linux-aarch64-cpython-39/cffi
            copying src/cffi/_imp_emulation.py -> build/lib.linux-aarch64-cpython-39/cffi
            copying src/cffi/lock.py -> build/lib.linux-aarch64-cpython-39/cffi
            copying src/cffi/api.py -> build/lib.linux-aarch64-cpython-39/cffi
            copying src/cffi/backend_ctypes.py -> build/lib.linux-aarch64-cpython-39/cffi
            copying src/cffi/error.py -> build/lib.linux-aarch64-cpython-39/cffi
            copying src/cffi/model.py -> build/lib.linux-aarch64-cpython-39/cffi
            copying src/cffi/pkgconfig.py -> build/lib.linux-aarch64-cpython-39/cffi
            copying src/cffi/cffi_opcode.py -> build/lib.linux-aarch64-cpython-39/cffi
            copying src/cffi/verifier.py -> build/lib.linux-aarch64-cpython-39/cffi
            copying src/cffi/setuptools_ext.py -> build/lib.linux-aarch64-cpython-39/cffi
            copying src/cffi/vengine_gen.py -> build/lib.linux-aarch64-cpython-39/cffi
            copying src/cffi/_cffi_include.h -> build/lib.linux-aarch64-cpython-39/cffi
            copying src/cffi/parse_c_type.h -> build/lib.linux-aarch64-cpython-39/cffi
            copying src/cffi/_embedding.h -> build/lib.linux-aarch64-cpython-39/cffi
            copying src/cffi/_cffi_errors.h -> build/lib.linux-aarch64-cpython-39/cffi
            running build_ext
            building '_cffi_backend' extension
            creating build/temp.linux-aarch64-cpython-39
            creating build/temp.linux-aarch64-cpython-39/src
            creating build/temp.linux-aarch64-cpython-39/src/c
            gcc -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -fomit-frame-pointer -g -fno-semantic-interposition -fomit-frame-pointer -g -fno-semantic-interposition -fomit-frame-pointer -g -fno-semantic-interposition -DTHREAD_STACK_SIZE=0x100000 -fPIC -DFFI_BUILDING=1 -I/usr/include/ffi -I/usr/include/libffi -I/usr/include/python3.9 -c src/c/_cffi_backend.c -o build/temp.linux-aarch64-cpython-39/src/c/_cffi_backend.o
            error: command 'gcc' failed: No such file or directory
            [end of output]
      
        note: This error originates from a subprocess, and is likely not a problem with pip.
        ERROR: Failed building wheel for cffi
      Failed to build cffi
      ERROR: Could not build wheels for cffi, which is required to install pyproject.toml-based projects
      
      [notice] A new release of pip available: 22.2.2 -> 23.3.1
      [notice] To update, run: pip install --upgrade pip
      [end of output]
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error

× pip subprocess to install build dependencies did not run successfully.
│ exit code: 1
╰─> See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.

Known issue validation

Build: 🔎 https://dev.azure.com/dnceng-public/public/_build/results?buildId=469944
Error message validated: Building wheel for cffi (pyproject.toml) did not run successfully.
Result validation: ✅ Known issue matched with the provided build.
Validation performed at: 11/15/2023 12:17:17 AM UTC

Report

Summary

24-Hour Hit Count 7-Day Hit Count 1-Month Count
0 0 0
@carlossanlop carlossanlop changed the title pip fails to build wheels and cffi in ubuntu.1804.armarch.open using image alpine-3.15-helix-arm32v7 pip fails to build wheels and cffi (gcc not found) in ubuntu.1804.armarch.open using image alpine-3.15-helix-arm32v7 Nov 15, 2023
@carlossanlop carlossanlop changed the title pip fails to build wheels and cffi (gcc not found) in ubuntu.1804.armarch.open using image alpine-3.15-helix-arm32v7 pip fails to build wheels for cffi (gcc not found) in ubuntu.1804.armarch.open using image alpine-3.15-helix-arm32v7 Nov 15, 2023
@dougbu
Copy link
Member

dougbu commented Nov 15, 2023

on one hand, I'm not sure why we set $HELIX_PYTHONPATH and $PYTHONPATH or otherwise mess w/ Python inside Docker containers. @dotnet/dnceng does anyone remember why we do this in every "contained" work item❔ (MattGal's !23438 made a small change in this area but history starts w/ alexperovich's !3044.) my best guess is we decided to validate queues using Python even inside containers.

on the other hand, the recent runtime changes were back to a previously-used Alpine Docker image. seems like something changed w/ the various Python packages while they were using a different (newer) container.

bottom line, we don't have much experience w/ the floating pip version we download and install in these containers.

"fix" choices may include

  1. remove the runtime_python_requirements.txt installation from Helix work items that execute in a container. https://github.com/dotnet/dotnet-buildtools-prereqs-docker/blob/main/src/alpine/3.15/helix/arm32v7/Dockerfile installs helix-scrips/ and sets up an old-style virtualenv but it's not clear customers need anything more than some Python version in the container. our testing should also work fine w/o those packages installed if we stop checking the helix-scripts/ version.
  2. explicitly control the pip version installed in our *-helix-* containers. the ^^ Dockerfile just grabs the latest version at line 29
  3. change resources/helix-scripts/runtime_python_requirements.txt to limit the cffi version to one that has a usable .whl available for this platform. might be doable but could require some experimentation

/cc @epananth and @davfost due to Docker implications /fyi @lbussell as well
/cc @missymessa since doing something here seems urgent and you're on SSA
/cc @ilyas1974 and @garath in case you instead want to handle this issue in Operations

@missymessa missymessa added the Needs Triage A new issue that needs to be associated with an epic. label Nov 15, 2023
@garath
Copy link
Member

garath commented Nov 15, 2023

since doing something here seems urgent

Affects customers, so I believe the proecss is that SSA should triage and establish impact

@missymessa missymessa added Ops - First Responder and removed Needs Triage A new issue that needs to be associated with an epic. First Responder labels Nov 16, 2023
@dougbu
Copy link
Member

dougbu commented Nov 17, 2023

set aside my general comments about why we need helix-scripts/ or to meet its Python requirements in Docker containers. removing that everywhere could break some customers. we could add an issue to the Docker epic around finding whether our customers need Python in containers and (maybe) removing it more broadly. a total removal wasn't intended to be my (1) choice.


rethinking the options above, we seem to have four choices. three involve updates to https://github.com/dotnet/dotnet-buildtools-prereqs-docker/blob/main/src/alpine/3.15/helix/arm32v7/Dockerfile and the runtime team picking up a new Docker tag; the last changes https://dev.azure.com/dnceng/internal/_git/dotnet-helix-machines?path=/resources/helix-scripts/runtime_python_requirements.txt and requires a helix-machines rollout (currently planned for the 29th). updating the Dockerfile likely involves reverting @hoyosjs' recent change which may have broken the runtime team (so that the next tag doesn't cause similar issues).

  1. remove the # Install Helix Dependencies portion of this particular Dockerfile. the intersection of pipelines testing in this container w/ those using Python in their tests should be very small, if it exists at all. we don't use this container in DncEng testing and have no other requirements on our side for helix-scripts/ in any container
    • I don't know if we already have a "non Helix" Mariner 3.1.5 image that provides what runtime needs. finding such an image would avoid everything except a change to the container the runtime team uses
  2. move the # build MsQuic as we don't have Alpine packages (yet) portion of the Dockerfile above the # Install Helix Dependencies portion. this at least has a chance of working if the former portion doesn't delete as much, leaving gcc and similar OS packages around for the cffi build
  3. reduce the pip version to 20.3.4 (changing the python3 -m pip install --upgrade pip==22.2.2 line). this should automatically limit the cffi version (see below) because lower pip versions narrow what package versions are available
  4. explicitly request cffi==1.15.1 in our runtime_python_requirements.txt file. Build #20231115.10 created a Mariner image, found this version of cffi, and got a compatible wheel i.e., it didn't need to compile anything. one caveat: that was a Mariner 2 image, not 3.15 but this shouldn't be relevant here

(1) sounds fastest but is also somewhat "dirty" in that it makes a -helix- container non-Helix and could be considered a breaking change. (1) gets cleaner if there's already a satisfactory MCR image or the edits create a new image w/o -helix- in its name.

@missymessa
Copy link
Member

@carlossanlop can you tell us what the overall impact of this error is for your PR?

@carlossanlop
Copy link
Member Author

I am unsure of the impact to my team, I was hoping dnceng could help initially determine if this was something to worry about or not. But maybe there's someone from @dotnet/runtime-infrastructure that might know if this could affect runtime.

I only happened to notice it in an expected test failure in an unrelated PR when reading the console log, but it probably happens in all runs using the same queue, it just does not get reported as an error at all.

@CarnaViire
Copy link
Member

rethinking the options above, we seem to have four choices. three involve updates to https://github.com/dotnet/dotnet-buildtools-prereqs-docker/blob/main/src/alpine/3.15/helix/arm32v7/Dockerfile and the runtime team picking up a new Docker tag

We unfortunately cannot update the tag at the moment, as we are blocked by an external issue (microsoft/msquic#3958) which heavily impacts our CI (dotnet/runtime#91757) -- this was the reason we pinned the tag in the first place (dotnet/runtime#94609)

cc @wfurt @ManickaP

I wonder if this is a problem only 3.15 has? Because 3.15 is EOL now. We were looking into updating to 3.16 (but it is now blocked by the issue above as well)

@wfurt
Copy link
Member

wfurt commented Nov 22, 2023

Building and installing dependencies when the container start is going to be expensive and unreliable IMHO. I was under impression that the whole prereq repo exists to have everything ready to run the tests.

If the images are meant for testing, we should perhaps add step to verify that Helix is able to run.

And I think we can temporarily disable Quic tests on Alpine arm32 @CarnaViire. Not great but we should not block everything else IMHO. (or we can investigate and fix msquic ours selves or move back to released version instead of tracking current main)

@CarnaViire
Copy link
Member

CarnaViire commented Nov 22, 2023

@wfurt yeah, I was thinking about fixing the Arm32 alpine image on a specific dotnet/msquic commit instead of main, to be able to "unlock" the image for others -- let's continue this discussion in dotnet/runtime#91757

@dougbu
Copy link
Member

dougbu commented Nov 22, 2023

Building and installing dependencies when the container start is going to be expensive and unreliable IMHO.

what's happening here is a quirk of how Helix runs commands in a Docker container. we execute sudo $HELIX_PYTHONPATH -m pip install --disable-pip-version-check -r $HELIX_SCRIPT_ROOT/runtime_python_requirements.txt prior to running whatever command(s) the Helix job specified. similar commands that run on startup of a Helix VM don't seem to be hitting similar issues.

as I hinted above, I'm not quite sure of the overall reasoning except that we run pylint on our helix-scripts/ code even in Docker containers.

let's continue this discussion in dotnet/runtime#91757

please let us know whether we need to help anything once you have a Docker tag containing the "right" msquic. I suspect that'll mostly enable a change to the Dockerfile allowing the problematic command to succeed (see my suggestions a few comments up)

@dougbu
Copy link
Member

dougbu commented Nov 28, 2023

@carlossanlop, @CarnaViire, @wfurt I see runtime builds continue to hit this issue but I'm not sure why it remains in this repo. could we please move it to dotnet/runtime since we aren't helping❔

@CarnaViire
Copy link
Member

@dougbu we didn't update runtime yet to consume the rolling docker image (that's why it still shows up), but I will do that shortly.

But I didn't do any changes in the docker image beside fixing the msquic version. Would just a rolling image be enough, or are any additional changes needed?

@dougbu
Copy link
Member

dougbu commented Nov 28, 2023

are any additional changes needed?

I don't know what'll happen next. it's possible the msquic problem will block observation of the cffi issue.

if not, solution would be additional Dockerfile changes I described above in a fair amount of detail.

@CarnaViire
Copy link
Member

it's possible the msquic problem will block observation of the cffi issue

But there will not (should not :D) be a problem with msquic anymore... it is mitigated within the dockerfile in dotnet/dotnet-buildtools-prereqs-docker#933.
@wfurt can you please help with the additional changes?

@wfurt
Copy link
Member

wfurt commented Dec 1, 2023

this is more complicated and beyond Alpine IMHO. @lbussell disabled builds for Debian 11 on ARM32 as it was failing in similar way. I spent last few days fiddling with ARM32 builds and here is what I have found.

There are no binary wheels for ARM32. Forcing new pip will not bring the expected cryptography or other packages. There are binaries available from 3rd party providers but that would mean we would depend on untrusted sources. And I personally feel that is not desirable.

So building the dependencies from sources is only or option IMHO. That can happen either during image creation (like the Debian 11 in pre-req repo or when Helix queue us set up) or during test execution (like this particular case)

I would strongly suggest to focus on the first one as doing it on every run is going to be expensive and unreliable IMHO. I know current behavior was put in with the idea that the Helix can fix itself and make it working but the practical outcome does not seems to be the case. It would also require us to put build tools to test images - making them bigger and we would pay the cost over and over again.

Now, the problem with Debian is that the new Cryptography requires newer Rust. And here the story repeats.
We can fetch binaries from 3rd parties or use rustup.rs script. Both options having problem with trust and authenticity IMHO. We could build rust from sources like we did clang & gcc for Centos 6 but that is going to be high maintenance cost.

Me recommendation would be to relax the package version requirement and perhaps disable the runtime attempts to install what it wants. I feel it would be better to fail clearly and prompt action instead of consuming resources and drag unnecessary baggage. And all this is difficult to test e.g. we see vases like this when something start suddenly failing without obvious reason.

I know there was argument about security for the update. But as far as I can tell the cryptography 3.2.1 from Debian does not have any known flow and #929 was working fine AFAIK. While we may use newer packages on mainstream images having viable option on Alpine & Debian is also important. The security patches some from OS vendor - just like just anything else in the OS image.

So I think there is action for Helix/infra team @dougbu. The MsQuic regression is out of picture now as @CarnaViire mentioned. But we still need to figure out package and dependencies for ARM32 - unless we can convince PMs and leadership to drop support for it.

Long-term we should probably also improve test coverage for Helix. .NET supported OS matrix should be guiding document and we need some solution for odd-balls - like Centos 6 and Ubuntu 16 was until recently.

I'm happy to share more details and/or share my dev environment - Machine from Helix dev pool and my personal Raspberry.

@dougbu
Copy link
Member

dougbu commented Dec 2, 2023

@wfurt I'm on vacation but am wondering about options here. We can't generally lower the requirements on our side but we may be able to special-case the problematic Docker containers. What specific change are you requesting❔

@dougbu
Copy link
Member

dougbu commented Dec 11, 2023

Backing up a bit… Is the issue here about cffi, cryptography, or both Python packages❔ I'm a bit confused by your most recent message @wfurt.

@dougbu
Copy link
Member

dougbu commented Dec 11, 2023

I suspect the solution will be to special case an upper bound for the cryptography and / or cffi package using a platform_machine dependency specifier in our Python requirements. However I'm not positive how Python identifies the problematic ARM32 machines.

@wfurt (or anyone w/ access to an ARM32 Alpine 3.15 or Debian 11 box) could you please run the following short Python script on your test machines and let me know the results (I'm hoping the OSes are consistent)❔ I suspect armv6l is a potential value but that's just a suspicion.

import platform
print (platform.machine())

I could likely create PRs somewhere to test but it should be faster if someone already has a box or VM available.

@CarnaViire
Copy link
Member

@dougbu JFYI: @wfurt is OOF today and tomorrow (and I don't have anything like this)

@wfurt
Copy link
Member

wfurt commented Dec 13, 2023

I get armv7l on my Raspberry PI. For the Jetson machines I get aarch64. The CPU is arm64 but we run 32bit containers on top of it e.g. it cannot run and link 64bit code e.g. the 'machine` may be misleading.

Support for armv6l is community driven AFAIK. We have one pipeline AFAIK but that should not show up during "normal" runs.

@dougbu
Copy link
Member

dougbu commented Dec 13, 2023

I think it'll be pretty easy to special case armv61 and armv71, but…

For the Jetson machines I get aarch64.

that's not good b/c aarch64 wheels do exist and we should use them for safety wherever possible.

however, sys.platform, os.name, platform.release(), and platform.version() all have corresponding dependency specifiers. could you check which of those should clearly indicate the Docker images that armv61 and armv71 won't help with please❔

@wfurt
Copy link
Member

wfurt commented Dec 13, 2023

I'm not sure what the comment means @dougbu.
Can you use architecture together with the machine?

This is for 32bit docker image

>>> print (platform.machine())
aarch64
>>> print(platform.architecture())
('32bit', '')

base machine shows 64 as expected

>>> print (platform.machine())
aarch64
>>> print(platform.architecture())
('64bit', '')

@dougbu
Copy link
Member

dougbu commented Dec 13, 2023

@wfurt I'm looking for a way to specify a cffi and / or cryptography requirement specifically for the problematic platforms. something like cryptography>=39.0.1,<=41.0.0;platform_machine=="armv61 " or platform_machine=="armv71" or .... the https://packaging.python.org/en/latest/specifications/dependency-specifiers/#environment-markers table shows all of the environment variables available and the corresponding Python calls e.g. import sys; print(sys.platform) for the sys_platform environment variable.

@dougbu dougbu added Ops - Service Maintenance Used to track issues related to maintaining the services .NET Eng Supports and removed Ops - First Responder labels Dec 14, 2023
@dougbu
Copy link
Member

dougbu commented Dec 14, 2023

… new Cryptography requires newer Rust. And here the story repeats.

After checking old available wheels, I suspect this is the crux of the problem. There are no ARM32 wheels listed there, likely meaning the Docker container has always been building cryptography and perhaps other Python packages from source. cryptography's changelog indicates they've bumped their Rust requirements fairly frequently.

With that in mind, we likely need to downgrade cryptography (and more) to the point where it's buildable again. What is the minimum Rust version used in these containers❔ Better, what is minimum of the the maximum Rust versions available for installation in the problematic containers❔


There are alternatives but none of them are quick:

  • We could remove helix-scripts/ from our Docker testing and the commands executed when starting the containers in Helix. The problem here is we don't know for sure that every test scenario (e.g., dump file uploads) work w/o helix-scripts/ in the containers and it's difficult to test exhaustively.
  • We could explicitly remove support of ARM32 testing. The main thing here would be getting buy-in from our PM team.

Note that resolving the open questions about identifying the problematic systems in a way we can use in our requirements file and choosing the cryptography version to use there will not resolve anything this year. Changes will require a helix-machines roll out and we're not staffed to do that until next year.

@dougbu
Copy link
Member

dougbu commented Dec 14, 2023

/cc @ilyas1974 and @markwilkie ^^ for awareness of the state here 😦

@wfurt
Copy link
Member

wfurt commented Dec 14, 2023

I check the requirement semantics @dougbu and it is going to be difficult IMHO. The versions report kernel version not the actual user mode setup. In case of containers the kernel version comes from the base OS (like Ubuntu 18/20) and I did not figure out how to get any reasonable info about the actual container.

As far as the cryptography: It seems like prior to Cryptography 3.5 rust was optional and the container builds were configured as such. New documentation suggest that this is no longer the case e.g. building cryptography will alway need rust as well. Or we would need to depend on binaries provided and maintained by the distribution owners.

We ditched Alpine 3.15, 3.16 provides

Rust 1.60
py3-cryptography-3.4.8

And Debian 11

Rust 1.48
python3-cryptography  3.3.2-1

@dougbu
Copy link
Member

dougbu commented Dec 15, 2023

I did not figure out how to get any reasonable info about the actual container.

Unfortunate none of the Python "environment variable" options provide real information about the container.

I see https://github.com/dotnet/dotnet-buildtools-prereqs-docker/blob/7d0f9b0c308f54dc80bf275b69dc41d1b1856109/src/debian/11/helix/arm32v7/Dockerfile#L4C5-L4C26 sets $_PYTHON_HOST_PLATFORM. Does that change any of the supported settings❔

It seems like prior to Cryptography 3.5 rust was optional

Right. We might however be able to muddle through w/ Rust 1.48 in the relevant containers (while installing a compatible cryptography) and other workarounds. That would not leave Rust in the containers (unless it's already there) if we took the same approach as the Alpine 3.1.6 Dockerfile currently does for e.g. cargo.


Must admit however that we're talking about layering hacks on hacks. I prefer the alternatives I mentioned, even if they take longer to implement or get buy-in for.

@dougbu
Copy link
Member

dougbu commented Jan 6, 2024

didn't forget this issue during the holiday break. current status is I'm waiting for DDFun to set up a Raspberry Pi system for me to test things out on. that at least will allow me to experiment w/ an ARM32 system and get things working again, hopefully with near-current cffi and cryptography packages.

on the other hand, the "real" fixes here are for just Alpine 3.17 and Debian 11 on ARM32, correct❔ want to make sure I'm not missing another platform combo you're having problems with what looks like this issue.

@wfurt
Copy link
Member

wfurt commented Jan 8, 2024

it is rolling battle - but I think Alpine & Debian 11 are the tail at this moment @dougbu

@dougbu
Copy link
Member

dougbu commented Feb 28, 2024

/fyi this remains very much on our radar, especially because we provide Raspbian queues likely affected in a similar fashion. any summary of how (or if) people get around these issues will help w/ VMs as well as Docker containers.

our resource constraints and urgent requirements elsewhere mean I'm not busy investigating at the moment

@ilyas1974 ilyas1974 added Ops - P1 Operations task, priority 1 (highest priority) and removed Ops - P1 Operations task, priority 1 (highest priority) labels Jul 24, 2024
@th0ma7
Copy link

th0ma7 commented Aug 28, 2024

I just found out about this issue which is surprisingly similar to the one I'm hitting in a completely different scenario: I'm maintaining python packages for the SynoCommunity using Synology Linux DSM for multiple archs (armv5-7-8, i686, ppc, x86_64).

Somehow, can't tell when exactly, cryptography stopped building correctly due to ?newer? cffi. This comment from @dougbu #1423 (comment) caught my attention. I am experimenting a ugly hack SynoCommunity/spksrc@0bb1f00 that make use of that part of my PR SynoCommunity/spksrc#6200 when generating my crossenv :

  1. severely downgrade pip|setuptools|wheel
  2. install cffi + cryptography (+ older msgpack as failing on arm later-on when installing maturin and friends)
  3. re-upgrade pip and friends
  4. pursue with the installation of my other perequisites into my virtualenv

Totally not elegant but looks like working?! Further, this problem seems specific to python 3.10 as building ok with 3.11. Just saying that I'll keep on eye on this thread, and let you know if I find a reasonable fix or workaround. Cheers!

@ilyas1974 ilyas1974 added the Proposed-for-DncDevOps Issues or epics which may represent operational tasks, for consideration label Oct 17, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Known Build Error Ops - Service Maintenance Used to track issues related to maintaining the services .NET Eng Supports Proposed-for-DncDevOps Issues or epics which may represent operational tasks, for consideration
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants