Fixes writing of 3d-reduced Fields to NetCDF #2865
Can you use a descriptive link to the PR / issue in NCDatasets (rather than just "here")?
The key issue is Alexander-Barth/NCDatasets.jl#197
We're getting errors when running the tests on buildkite that I'm not getting when running on a GPU locally. For example this:
Field boundary conditions [GPU, RectilinearGrid]: Test Failed at /net/ocean/home/data44/data5/glwagner/.buildkite-agent/builds/sverdrup-7/clima/oceananigans/test/test_computed_field.jl:475
Expression: #= /net/ocean/home/data44/data5/glwagner/.buildkite-agent/builds/sverdrup-7/clima/oceananigans/test/test_computed_field.jl:475 =# CUDA.@allowscalar all(ST.data[1:Nx, 1:Ny, 0] .== ST.data[1:Nx, 1:Ny, 1])
The above line works for me on GPU when doing
@glwagner @simone-silvestri any ideas as to why? The fact that I can't reproduce these errors locally is making it hard for me to solve them.
test/test_computed_field.jl
Outdated
@. u.data = 1 + rand()
@. v.data = 2 + rand()
@. w.data = 3 + rand()
CUDA.@allowscalar begin
Is there any way to do this without @allowscalar? We should be reducing where allowscalar appears in tests, not adding new tests with this.
test/test_computed_field.jl
Outdated
@@ -301,7 +303,7 @@ function computations_with_averaged_field_derivative(model)

     set!(model, T = (x, y, z) -> 3 * z)

-    return all(interior(shear)[2:3, 2:3, 2:3] .== interior(T)[2:3, 2:3, 2:3])
+    return CUDA.@allowscalar all(interior(shear)[2:3, 2:3, 2:3] .== interior(T)[2:3, 2:3, 2:3])
Same as above. We should minimize usage of @allowscalar in tests. This is one of the largest sources of technical debt in our tests and has incurred a lot of pain in the past.
test/test_computed_field.jl
Outdated
@@ -320,7 +322,7 @@ function computations_with_computed_fields(model)
     tke = Field(tke_op)
     compute!(tke)

-    return all(interior(tke)[2:3, 2:3, 2:3] .== 9/2)
+    return CUDA.@allowscalar all(interior(tke)[2:3, 2:3, 2:3] .== 9/2)
Same as above
There are a lot of new instances of @allowscalar. When we find that we have to use ...
I added these because it was the only way to make tests pass locally. However, I can't fully reproduce test results locally anyway, as I mentioned in my previous comment, so these may well be unnecessary (since these lines might be passing on buildkite).
test/test_computed_field.jl
Outdated
@@ -320,7 +322,7 @@ function computations_with_computed_fields(model)
     tke = Field(tke_op)
     compute!(tke)

-    return all(interior(tke)[2:3, 2:3, 2:3] .== 9/2)
+    return CUDA.@allowscalar all(interior(tke)[2:3, 2:3, 2:3] .== 9/2)
-    return CUDA.@allowscalar all(interior(tke)[2:3, 2:3, 2:3] .== 9/2)
+    return all(interior(tke, 2:3, 2:3, 2:3) .== 9/2)
I suggested a syntax change that could help
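To illustrate the suggested syntax change, here is a minimal sketch that compares the two forms on the CPU. The grid size, the field name `c`, and the use of `CenterField` are assumptions made for illustration and are not taken from the test file. The sketch also assumes that `interior(c, I...)` returns a view rather than a copy, which is what would let the broadcasted comparison run without indexing the array element by element.

```julia
using Oceananigans

# Hypothetical small grid and field, chosen only for illustration
grid = RectilinearGrid(CPU(), size=(4, 4, 4), extent=(1, 1, 1))
c = CenterField(grid)
set!(c, 9/2)

# Old form: index the array returned by interior(c) with ranges.
# This materializes a new array from the view.
copied = interior(c)[2:3, 2:3, 2:3]

# Suggested form: pass the ranges to interior directly.
# Assumed here to return a view into the field's data.
viewed = interior(c, 2:3, 2:3, 2:3)

# Both forms hold the same values; the comparison is a broadcast over the view
same_values = copied == viewed
all_match = all(viewed .== 9/2)
```

On a GPU architecture the same pattern would apply with `GPU()` in place of `CPU()`; whether it removes every need for `CUDA.@allowscalar` depends on the test in question.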
You're right, it does help. I'll try replacing these one by one and see if that helps with the error on buildkite. Although it would be useful to figure out why I'm not getting the same errors locally.
Awesome!
Apparently the new syntax does help avoid @allowscalar.
Any ideas on what might be the cause of the differences between buildkite and my local server? If someone could also run one of the failing tests on a GPU locally and see if they get the same errors that buildkite is throwing, that would be helpful.
Finally got the tests passing! It was something having to do with GPUCompiler.jl. This is ready to merge/review.
@navidcy @simone-silvestri @glwagner with #2899 being merged, this bugfix PR is pretty trivial. Can I get a review whenever any of you have the time?
set!(model, c=1)

Δt = 1/64 # Nice floating-point number
simulation = Simulation(model, Δt=Δt, stop_time=50Δt)
Can we do fewer steps? Our tests currently stretch the limits of our resources so we need to be as parsimonious as possible when adding new tests. Note also that compilation cost is the main thing. If this test can be combined with another test, that'd be ideal. For example, many different NetCDF tests could use the same simulation --- there's no need to run independent simulations?
Also suggest using stop_iteration
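As a rough sketch of what that suggestion could look like, the snippet below stops the run after a fixed number of iterations instead of at a stop time. The grid, the `NonhydrostaticModel` setup, and the choice of 5 iterations are assumptions for illustration only; the actual test's model and a suitable step count are not shown in this thread.

```julia
using Oceananigans

# Hypothetical minimal model; the real test's model is not shown in this thread
grid = RectilinearGrid(CPU(), size=(4, 4, 4), extent=(1, 1, 1))
model = NonhydrostaticModel(; grid, tracers=:c)
set!(model, c=1)

Δt = 1/64 # Nice floating-point number

# stop_iteration bounds the run by step count rather than by model time;
# 5 is an arbitrary illustrative value
simulation = Simulation(model, Δt=Δt, stop_iteration=5)
run!(simulation)

final_iteration = model.clock.iteration
```

Bounding the run by iteration count makes the number of time steps explicit, which is the quantity that matters for test cost.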
Sorry @glwagner, I ended up merging before you commented. I copied the test template from others in the same file, so there are more tests that we could apply this change to. Would you like me to open another PR for this?
Right, I'm pointing out that we don't want to copy/paste test code now without care, since a lot of our test code is poorly written / wasteful and our CI is straining under the pressure... :-/
Any PRs that reduce test cost will be greatly appreciated! I can't tell if all the changes will make a big difference, you are better placed to analyze that.
After running these a few times for this PR, I'd say that this one won't make much of a difference. But I have identified several tests that each instantiate their own model and that could be merged together. That, I think, will have a more significant impact. I'll open a PR about it once some of my other PRs are merged!
sounds good
This fixes the error we were getting when writing Fields reduced over 3 dimensions to disk with NetCDFOutputWriter, following the upstream fix provided in a PR at NCDatasets: Alexander-Barth/NCDatasets.jl#197. This PR also adds a test to catch this in the future.
For now this only works on the master branch of NCDatasets, so tests should fail, but once a new version of NCDatasets is released I'll update the packages.
Closes #2857