add patch for recent GCCcore versions to fix compatibility with CUDA 11 #13290

bartoldeman
Contributor

This adds a patch included in GCC 11.1.0 setting -misa=sm_35 by default
for compatibility with CUDA 11 which no longer supports the previous
default sm_30.

Fixes #12518
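
On a GCCcore without this patch, the same ISA could presumably be requested per compilation by forwarding -misa to the offload compiler, using the -foffload=nvptx-none=<option> form that also appears later in this thread. A minimal sketch follows; the helper name nvptx_offload_args is hypothetical, and the resulting command line has not been tested against any of the GCCcore versions touched by this PR.

```shell
# Hypothetical helper: print gcc arguments that request an nvptx ISA
# explicitly instead of relying on the compiler's built-in default.
# Usage: nvptx_offload_args [isa]   (defaults to sm_35)
nvptx_offload_args() {
    isa=${1:-sm_35}
    printf '%s %s\n' "-fopenacc" "-foffload=nvptx-none=-misa=${isa}"
}
```

It would be used as: gcc $(nvptx_offload_args sm_35) pi.c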

@bartoldeman bartoldeman marked this pull request as draft June 29, 2021 18:08
@bartoldeman
Contributor Author

setting to draft to check if upgrading embedded nvptx-tools does the trick already

@bartoldeman
Contributor Author

After checking with an example, compiled with
gcc -save-temps -fopenacc -foffload=nvptx-none=-save-temps pi.c
where pi.c is:

#include <stdio.h>

#define N 2000000000
#define vl 1024

int main(void) {
  double pi = 0.0;
  long long i;

  /* Offload the loop via OpenACC and sum the partial results into pi. */
  #pragma acc parallel vector_length(vl)
  #pragma acc loop reduction(+:pi)
  for (i = 0; i < N; i++) {
    double t = (double)((i + 0.5) / N);  /* midpoint of the i-th interval */
    pi += 4.0 / (1.0 + t * t);
  }

  printf("pi=%11.10f\n", pi / N);

  return 0;
}

I found that the generated PTX assembly file (pi.s) contains .target sm_30, so newer nvptx-tools would still pass sm_30 to ptxas and the issue would remain. So I'll keep this change as is, as upgrading nvptx-tools is neither sufficient nor necessary to fix the reported issue.
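
The check described above can be scripted. The sketch below prints the architecture named by the first .target directive in a PTX assembly file. The function name ptx_target is hypothetical, and the sketch assumes the directive appears as the first word on its own line.

```shell
# Print the architecture from the first .target directive in a PTX file.
# Assumes a line of the form ".target sm_30" (leading whitespace allowed);
# anything after a comma (extra target options) is dropped.
ptx_target() {
    awk '$1 == ".target" { sub(/,.*/, "", $2); print $2; exit }' "$1"
}
```

For the file described above, ptx_target pi.s should print sm_30.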

@bartoldeman bartoldeman marked this pull request as ready for review June 29, 2021 19:11
@bartoldeman bartoldeman added this to the next release (4.4.1) milestone Jun 29, 2021
@bartoldeman
Contributor Author

@boegelbot please test @ generoso

@boegelbot
Collaborator

@bartoldeman: Request for testing this PR well received on generoso

PR test command 'EB_PR=13290 EB_ARGS= /apps/slurm/default/bin/sbatch --job-name test_PR_13290 --ntasks=4 ~/boegelbot/eb_from_pr_upload_generoso.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 17653

Test results coming soon (I hope)...

- notification for comment with ID 871306870 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

@verdurin
Member

@bartoldeman my attempted fix for the problem based on your other comment (using a newer version of nvptx-tools) failed in a build on a node with CUDA enabled. I will try this one instead.

@bartoldeman
Contributor Author

@verdurin thanks for confirming. This is expected, "as upgrading nvptx-tools is neither sufficient nor necessary to fix the reported issue" (see above).

@boegel boegel changed the title from 'Backport "[nvptx] Set -misa=sm_35 by default"' to 'add patch for recent GCCcore versions to fix compatibility with CUDA 11' Jun 30, 2021
@boegel
Member

boegel commented Jun 30, 2021

@boegelbot please test @ generoso
CORE_CNT=16

@easybuilders easybuilders deleted a comment from boegelbot Jun 30, 2021
@boegel
Member

boegel commented Jun 30, 2021

@bartoldeman Building 3 GCCcore easyconfigs using only 4 cores is going to take forever, so I've cancelled the original test and requested a new one using 16 cores.

@boegelbot
Collaborator

@boegel: Request for testing this PR well received on generoso

PR test command 'EB_PR=13290 EB_ARGS= /apps/slurm/default/bin/sbatch --job-name test_PR_13290 --ntasks="16" ~/boegelbot/eb_from_pr_upload_generoso.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 17655

Test results coming soon (I hope)...

- notification for comment with ID 871344761 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

@verdurin
Member

verdurin commented Jun 30, 2021

Test build of GCCcore-9.3.0 on a CUDA-enabled node worked for me (the node is not set up for test report upload).

@boegel
Member

boegel commented Jun 30, 2021

Test report by @boegel
SUCCESS
Build succeeded for 3 out of 3 (3 easyconfigs in total)
node3302.joltik.os - Linux centos linux 7.9.2009, x86_64, Intel(R) Xeon(R) Gold 6242 CPU @ 2.80GHz (cascadelake), Python 3.6.8
See https://gist.github.com/0830592c41013fdf51332ae9438850a9 for a full test report.

@bartoldeman
Contributor Author

bartoldeman commented Jun 30, 2021

Worked for me with GCC 10.2.
To reproduce the problem at runtime instead of at build time, you just need to load a CUDA 11 module and then run
gcc -save-temps -fopenacc -foffload=nvptx-none pi.c
on the above example.

# with this patch applied:
$ gcc -fopenacc -foffload=nvptx-none pi.c
<no output -- correct>

# with unmodified GCCcore:
$ gcc -fopenacc -foffload=nvptx-none pi.c
ptxas fatal   : Value 'sm_30' is not defined for option 'gpu-name'
nvptx-as: ptxas returned 255 exit status
mkoffload: fatal error: x86_64-pc-linux-gnu-accel-nvptx-none-gcc returned 1 exit status
compilation terminated.
lto-wrapper: fatal error: /cvmfs/soft.computecanada.ca/easybuild/software/2020/Core/gcccore/9.3.0/libexec/gcc/x86_64-pc-linux-gnu/9.3.0//accel/nvptx-none/mkoffload returned 1 exit status
compilation terminated.
/cvmfs/soft.computecanada.ca/gentoo/2020/usr/x86_64-pc-linux-gnu/binutils-bin/2.33.1/ld.gold: fatal error: lto-wrapper failed
collect2: error: ld returned 1 exit status
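
To recognize this failure automatically, for example in a saved build log, one could search for the ptxas message shown above. A rough sketch follows; the function name has_sm30_error is hypothetical, and the pattern is copied verbatim from this log rather than taken from any documented ptxas message format.

```shell
# Succeed (exit status 0) if the given log file contains the ptxas error
# about the unsupported sm_30 target, as quoted in the output above.
has_sm30_error() {
    grep -q "Value 'sm_30' is not defined for option 'gpu-name'" "$1"
}
```

For instance, after gcc -fopenacc -foffload=nvptx-none pi.c 2> build.log, calling has_sm30_error build.log would succeed only when the log contains that message.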

@boegelbot
Collaborator

Test report by @boegelbot
SUCCESS
Build succeeded for 3 out of 3 (3 easyconfigs in total)
generoso-x-2 - Linux centos linux 8.2.2004, x86_64, Intel(R) Xeon(R) CPU E5-2667 v3 @ 3.20GHz (haswell), Python 3.6.8
See https://gist.github.com/cbbbe1af8ba3977af6e333ce448aec8d for a full test report.

Member

@boegel boegel left a comment

lgtm

@boegel boegel merged commit 1df9aa0 into easybuilders:develop Jun 30, 2021
@bartoldeman
Contributor Author

Test report by @bartoldeman
SUCCESS
Build succeeded for 3 out of 3 (3 easyconfigs in total)
build-node.computecanada.ca - Linux centos linux 7.9.2009, x86_64, Intel Xeon Processor (Skylake, IBRS), Python 3.7.7
See https://gist.github.com/4a10ad15f00374ccfd5891190a7a0869 for a full test report.

Successfully merging this pull request may close these issues.

GCC-10.2 build fails due to CUDA 11.0