CPU autodetect failing due to failing pip install reframe-hpc==4.3.3 #3023

Closed
casparvl opened this issue Oct 20, 2023 · 11 comments · Fixed by #3025

@casparvl

I'm (again) having some issues with CPU autodetect. Full output:

--- /home/casparvl/rfm.hba5v3pz/rfm-detect-job.sh ---
#!/bin/bash
#SBATCH --job-name="rfm-detect-job"
#SBATCH --ntasks=1
#SBATCH --output=rfm-detect-job.out
#SBATCH --error=rfm-detect-job.err
#SBATCH --partition=aarch64-generic-node
#SBATCH --export=NONE

_onerror()
{
    exitcode=$?
    echo "-reframe: command \`$BASH_COMMAND' failed (exit code: $exitcode)"
    exit $exitcode
}

trap _onerror ERR

python3 -m venv venv.reframe
source venv.reframe/bin/activate
pip install reframe-hpc==4.3.3
reframe --detect-host-topology=topo.json
deactivate

--- /home/casparvl/rfm.hba5v3pz/rfm-detect-job.sh ---
job finished
--- /home/casparvl/rfm.hba5v3pz/rfm-detect-job.out ---
Collecting reframe-hpc==4.3.3
  Using cached https://files.pythonhosted.org/packages/bc/cc/99e6cbb183c49edc21c3bb9afa91316797884ff8b6f0fb521fec54ef1869/ReFrame_HPC-4.3.3-py3-none-any.whl
Collecting lxml (from reframe-hpc==4.3.3)
  Using cached https://files.pythonhosted.org/packages/30/39/7305428d1c4f28282a4f5bdbef24e0f905d351f34cf351ceb131f5cddf78/lxml-4.9.3.tar.gz
    Complete output from command python setup.py egg_info:
    Building lxml version 4.9.3.
    Building without Cython.
    Error: Please make sure the libxml2 and libxslt development packages are installed.

    ----------------------------------------
-reframe: command `pip install reframe-hpc==4.3.3' failed (exit code: 1)

--- /home/casparvl/rfm.hba5v3pz/rfm-detect-job.out ---
--- /home/casparvl/rfm.hba5v3pz/rfm-detect-job.err ---
Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-build-6t929r68/lxml/
You are using pip version 9.0.3, however version 23.3 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.

--- /home/casparvl/rfm.hba5v3pz/rfm-detect-job.err ---
WARNING: failed to retrieve remote processor info: [Errno 2] No such file or directory: 'topo.json'
Traceback (most recent call last):
  File "/cvmfs/pilot.eessi-hpc.org/versions/2023.06/software/linux/aarch64/neoverse_n1/software/ReFrame/4.3.3/lib/python3.11/site-packages/reframe/frontend/autodetect.py", line 173, in _remote_detect
    topo_info = json.loads(_contents('topo.json'))
                           ^^^^^^^^^^^^^^^^^^^^^^
  File "/cvmfs/pilot.eessi-hpc.org/versions/2023.06/software/linux/aarch64/neoverse_n1/software/ReFrame/4.3.3/lib/python3.11/site-packages/reframe/frontend/autodetect.py", line 30, in _contents
    with open(filename) as fp:
         ^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: 'topo.json'

> device auto-detection is not supported

I'm seeing this only on some nodes (ARM) in our virtual cluster, probably because the libxml2 and libxslt development packages are not in that image. However, as someone else pointed out to me: "you would not need libxml2 in the image if pip was up to date as lxml wheel is available for aarch64 in PyPI"

Interactively trying

python3 -m venv /tmp/reframe-venv
source /tmp/reframe-venv/bin/activate
python3 -m pip install reframe-hpc==4.3.3

indeed failed with the same error, while

python3 -m venv /tmp/reframe-venv
source /tmp/reframe-venv/bin/activate
python3 -m pip install --upgrade pip
python3 -m pip install reframe-hpc==4.3.3

completes just fine.
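
A likely explanation for the difference, offered as an assumption rather than a confirmed diagnosis: prebuilt aarch64 wheels are typically published with manylinux2014 tags, and pip only recognises those tags from release 19.3 onwards, so pip 9.0.3 falls back to building lxml from source. A minimal sketch of a version check along those lines follows; the helper names `parse_pip_version` and `pip_needs_upgrade` and the `MIN_PIP` threshold are illustrative and not part of ReFrame.

```python
import re

# Assumption: pip 19.3 is the first release that understands manylinux2014
# wheel tags, which is how aarch64 wheels are typically published.
MIN_PIP = (19, 3)


def parse_pip_version(version_output):
    """Extract (major, minor) from the output of `pip --version`."""
    match = re.match(r"pip (\d+)\.(\d+)", version_output.strip())
    if match is None:
        raise ValueError(f"unrecognised pip version output: {version_output!r}")
    return int(match.group(1)), int(match.group(2))


def pip_needs_upgrade(version_output, minimum=MIN_PIP):
    """Return True if the reported pip is older than `minimum`."""
    return parse_pip_version(version_output) < minimum


# Hypothetical usage with a typical `pip3 --version` line from an old system.
print(pip_needs_upgrade("pip 9.0.3 from /usr/lib/python3.6/site-packages (python 3.6)"))
```

A check like this could decide whether to run `pip install --upgrade pip` before installing, although upgrading unconditionally inside a fresh virtual environment is simpler.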

Now, I'm not sure what the right approach is here. One option would be to inject a pip install --upgrade pip into the CPU detection script. On the other hand, I can imagine you might be reluctant to do that: it might cause other issues (though I would expect fewer). Another option is to somehow give users more customizability over what the CPU autodetection script looks like. I've addressed that topic before, although note that the suggested option of some form of prerun_cmds there wouldn't have helped in this case.

Any suggestions? Sure, you could argue "simply install those system packages", but I simply don't always have that kind of power or possibility everywhere.

@vkarak
Contributor

vkarak commented Oct 20, 2023

Which is the default system Python version? Maybe upgrading pip in the generated script is not a bad idea.

@casparvl
Author

casparvl commented Oct 20, 2023

[casparvl@login1 ~]$ python3 --version
Python 3.6.8
[casparvl@login1 ~]$ pip3 --version
pip 9.0.3 from /usr/lib/python3.6/site-packages (python 3.6)

Just as a thought, since I think it might be hard to come up with something that works everywhere, all the time: you could make an optional configuration item that overwrites what is done to bootstrap the ReFrame installation in the CPU autodetection.

cpu_autodetect_reframe = [
    'python3 -m venv venv.reframe',
    'source venv.reframe/bin/activate',
    'pip install --upgrade pip',
    'pip install reframe-hpc==4.3.3'
]

The definition of that config item would be that users should list whatever commands are needed to make ReFrame available on the target node of the remote CPU autodetection. On some systems, that could even be as simple as loading a module (e.g. for us, a bootstrap is not needed: we have a ReFrame module specifically installed for the architecture of the target batch node). On others, it could be installing a virtualenv, with or without upgrading pip.

Note that you would still have a sensible default (your current, potentially with the addition of upgrading pip), so in that sense it doesn't break anything for current users.
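
The proposal above can be sketched as follows. This is only an illustration of the idea, not ReFrame's implementation: `cpu_autodetect_reframe` is the hypothetical option name from this comment, and `default_bootstrap`, `bootstrap_commands` and `render_detect_script` are made-up helper names. The default shown assumes the current commands plus a pip upgrade.

```python
def default_bootstrap(version):
    """Fallback commands: a venv-based install, with pip upgraded first."""
    return [
        "python3 -m venv venv.reframe",
        "source venv.reframe/bin/activate",
        "pip install --upgrade pip",
        f"pip install reframe-hpc=={version}",
    ]


def bootstrap_commands(site_config, version):
    """Use the user's commands if given, otherwise the default bootstrap."""
    # Hypothetical config key, as proposed in this comment.
    custom = site_config.get("cpu_autodetect_reframe")
    if custom:
        return list(custom)
    return default_bootstrap(version)


def render_detect_script(site_config, version):
    """Assemble the body of the remote detection script as text."""
    lines = ["#!/bin/bash"]
    lines += bootstrap_commands(site_config, version)
    lines.append("reframe --detect-host-topology=topo.json")
    return "\n".join(lines) + "\n"


# A site that provides ReFrame as a module (the module name is illustrative).
print(render_detect_script(
    {"cpu_autodetect_reframe": ["module load ReFrame/4.3.3"]}, "4.3.3"
))
```

With an empty configuration the function falls back to the venv-based commands, so existing users would see no change in behaviour.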

@vkarak
Contributor

vkarak commented Oct 20, 2023

Maybe it makes sense to do both: fix this to work out of the box and allow users to modify the detection. There is also #2292, which asks for this. Allowing modifications of the reframe self-installation script makes sense.

I will try to reproduce this on a Python 3.6 system, as I believe it's a Python 3.6-specific problem.

@vkarak vkarak added triage and removed help wanted labels Oct 20, 2023
@vkarak
Contributor

vkarak commented Oct 20, 2023

Actually, we do upgrade pip in ./bootstrap.sh which is known to work on all Python versions from 3.6 to 3.11. So we can do the same thing here.

CMD $python -m pip install --no-cache-dir -q --upgrade pip --target=external/

@vkarak vkarak self-assigned this Oct 20, 2023
@vkarak vkarak moved this from Todo to In Progress in ReFrame Backlog Oct 20, 2023
@vkarak
Contributor

vkarak commented Oct 20, 2023

I couldn't reproduce it on a CentOS 7 container with Python 3.6.8 and pip 9.0.3. But we could add the pip upgrade as an enhancement.

@vkarak vkarak added enhancement and removed triage labels Oct 20, 2023
@vkarak
Contributor

vkarak commented Oct 20, 2023

Eventually, I reproduced it on an actual CentOS system :-)

@casparvl
Author

After your message I realized that for us it only happens on the aarch64 nodes in our (virtual) cluster. I think it might be related to the fact that the aarch64-based wheels for lxml and friends were added later, and thus maybe require a newer pip to be found, but I'm not sure. It could also be that our aarch64 image is just slightly different.

@vkarak vkarak moved this from In Progress to Merge To Develop in ReFrame Backlog Oct 23, 2023
@boegel
Contributor

boegel commented Nov 8, 2023

I agree with @casparvl here that there should be a way to instruct ReFrame how to set up the environment for running the CPU autodetect.
If that's not specified, then ReFrame could still go ahead and upgrade pip + install ReFrame via pip so it can perform the CPU autodetection, but that's a quite brittle approach imho, and should only be used as a fallback.

With the current approach, we're sort of stuck to get started with the EESSI test suite with the current version of ReFrame, since the CPU autodetect is broken (cfr. http://www.eessi.io/docs/test-suite/installation-configuration/#cpu-auto-detection).

@vkarak
Contributor

vkarak commented Nov 8, 2023

@boegel I think that's #2979 and maybe #2292. This particular one is fixed on master and will be released in 4.4.1 asap (today or tomorrow).

Actually, the "pip path" for remote auto-detection can also be improved like we did for the ./bootstrap.sh in #3041. As we create a virtual env to pip install reframe into, we can create the venv without pip and install a fresh pip exclusively in the venv by fetching it with get-pip.py.

@vkarak
Contributor

vkarak commented Nov 8, 2023

This way, we won't rely on any system-specific pip installation. All we need from the system is to be able to create a virtual environment without pip: python3 -m venv --without-pip venv.rfm.
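
The approach described above can be sketched as a generated command list. This is a rough illustration under stated assumptions, not the code that was merged: `venv_bootstrap_commands` is a made-up helper, `curl` is assumed to be available on the target node, and the get-pip.py URL is the generic PyPA one (older interpreters such as Python 3.6 may need one of the version-specific scripts that PyPA publishes).

```python
# Generic PyPA bootstrap script; assumed suitable for the node's Python.
GET_PIP_URL = "https://bootstrap.pypa.io/get-pip.py"


def venv_bootstrap_commands(version, venv_dir="venv.rfm",
                            get_pip_url=GET_PIP_URL):
    """Shell commands that install ReFrame without using the system pip."""
    return [
        # Create the environment without copying any pip into it.
        f"python3 -m venv --without-pip {venv_dir}",
        f"source {venv_dir}/bin/activate",
        # Fetch a fresh pip and install it only inside the venv.
        f"curl -sSL {get_pip_url} | python3",
        f"pip install reframe-hpc=={version}",
    ]


for command in venv_bootstrap_commands("4.4.1"):
    print(command)
```

Because the venv is activated before get-pip.py runs, `python3` resolves to the interpreter inside the venv, so the system-wide pip 9.0.3 is never touched.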

@vkarak
Contributor

vkarak commented Nov 8, 2023

> we're sort of stuck to get started with the EESSI test suite with the current version of ReFrame, since the CPU autodetect is broken

If this is due to this issue, it will be solved in 4.4.1, which we will release today or tomorrow.

@vkarak vkarak closed this as completed Nov 9, 2023
@github-project-automation github-project-automation bot moved this from Merge To Develop to Done in ReFrame Backlog Nov 9, 2023