Upgrades KernelAbstractions with pinned GPUCompiler #2900

Closed
wants to merge 3 commits into from


tomchor
Collaborator

tomchor commented Feb 5, 2023

Same as #2899 but pinning GPUCompiler to version 0.16.4.

This pinning made tests pass in #2865.

tomchor closed this Feb 5, 2023
tomchor deleted the tc/ka_upgrade branch February 5, 2023 16:41
@vchuravy
Collaborator

vchuravy commented Feb 5, 2023

@tomchor Which version of GPUCompiler are you trying to pin to? It's not listed in Project.toml and so the pin is not effective.

@tomchor
Collaborator Author

tomchor commented Feb 5, 2023

> @tomchor Which version of GPUCompiler are you trying to pin to? It's not listed in Project.toml and so the pin is not effective.

@vchuravy Maybe "pinning" wasn't the right word to use. I'm doing ]add GPUCompiler@0.16.4 and then ]rm GPUCompiler, which ensures that Manifest.toml retains version 0.16.4 even though GPUCompiler doesn't appear in Project.toml.

That's probably not the best-practices way to do things, but afaik it works. It was the only way I could get tests passing in #2865. The issues I was seeing there were very similar to those that appeared in #2782 and, to some degree, in #2899, so I suspect keeping GPUCompiler.jl at 0.16.4 will help make those tests pass (or at least narrow down what the issues are).

Tests didn't pass here because of an error related to the always_inline=true flag, which (based on your comment here) hasn't made it into the main channel.
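For contrast with the Manifest.toml workaround described above, here is a minimal sketch of how a version constraint is usually made effective in Julia so that it also applies to users of the package: GPUCompiler is listed as a direct dependency in Project.toml and given an equality bound under [compat]. This is an assumption about how the pin could be expressed, not what this PR does, and the excerpt is hypothetical, with Oceananigans' other entries elided.

```toml
# Hypothetical Project.toml excerpt (other [deps] and [compat] entries elided).
# GPUCompiler must also be listed under [deps]; running `]add GPUCompiler`
# writes that entry, including the package UUID, automatically.
[compat]
# "=0.16.4" is an equality specifier: the resolver may only select exactly 0.16.4.
GPUCompiler = "=0.16.4"
```

After changing the bound, `]resolve` re-resolves the environment under the new constraint. The trade-off is that an exact pin can hold back or block other packages that require a newer GPUCompiler, so it is usually a temporary measure while the breaking change is tracked down.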

glwagner changed the title from "Upgrades KernelAbstractions with pined GPUCompiler" to "Upgrades KernelAbstractions with pinned GPUCompiler" Feb 5, 2023
@glwagner
Member

glwagner commented Feb 5, 2023

> best-practices way to do things, but afaik it works

it only works within the Oceananigans environment, not for users of Oceananigans

@vchuravy
Collaborator

vchuravy commented Feb 5, 2023

How sure are you about the precise version? Is GPUCompiler@0.16.5 the breaking release for you?

@tomchor
Collaborator Author

tomchor commented Feb 5, 2023

> > best-practices way to do things, but afaik it works
>
> it only works within the Oceananigans environment, not for users of Oceananigans

True, although I wasn't able to reproduce the failing tests on any of the several GPUs I tried (all either Tesla V100s or Quadro GP100s). Every single time I ran the GPU tests locally, they passed. Is it possible that the tests are running on a GPU that simply isn't supported anymore by one of the packages?

@tomchor
Collaborator Author

tomchor commented Feb 5, 2023

> How sure are you about the precise version? Is GPUCompiler@0.16.5 the breaking release for you?

I haven't tried to find the precise version, but GPUCompiler@0.16.7 already creates the errors, so the breaking release is either 0.16.7 or 0.16.6.

@vchuravy
Collaborator

vchuravy commented Feb 5, 2023

The most suspicious PR is JuliaGPU/GPUCompiler.jl#359, by yours truly.

@glwagner
Member

glwagner commented Feb 5, 2023

> True, although I wasn't able to reproduce the failing tests on any of the several GPUs I tried (all either Tesla V100s or Quadro GP100s). Every single time I ran the GPU tests locally, they passed. Is it possible that the tests are running on a GPU that simply isn't supported anymore by one of the packages?

Here's the GPU + driver info:

glwagner@sverdrup:~$ nvidia-smi -q

==============NVSMI LOG==============

Timestamp                                 : Sun Feb  5 17:58:59 2023
Driver Version                            : 465.27
CUDA Version                              : 11.3

Attached GPUs                             : 1
GPU 00000000:82:00.0
    Product Name                          : NVIDIA Quadro P6000
    Product Brand                         : Quadro
