-
Notifications
You must be signed in to change notification settings - Fork 3
/
ToDo
73 lines (54 loc) · 2.13 KB
/
ToDo
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
zS on equally spaced points,
Transformation to nodal points.
gives the llvm code of the kernel
# @device_code_llvm dump_module=true KGradKernel!(F,U,p,DS,dXdxI,J,M,MRho,Glob,Phys,ndrange=ndrange)
# exit(0)
gives a profile text vor Nsight
CUDA.@profile external=true RosenbrockSchur!(U,dtau,Fcn!,FcnPrepare!,Jac!,CG,Metric,Phys,Cache,JCache,Exchange,Global,Param,DiscType)
put a kernel in a block and output for the profile
NVTX.@range "KHyperViscKernel!" let
KHyperViscKernel!(CacheF,MRho,U,DS,DW,dXdxI,J,M,Glob,ndrange=ndrange)
KernelAbstractions.synchronize(backend)
end
DKRZ
module load nvhpc
4 A100
Simulation time 10 days
BaroWave Old
110 km Resolution Timestep 150s 163s
55 km Resolution Timestep 75s 759s
Test parallel Advection with horizontal limiter
HeldSuarezMoist 110 km, 150 s, 10 days
GPU Time
1 308
2 166
4 102
LUMI
[ Info: Iteration: 209.0 took 0.158415134, 3.6284722222222223% complete
0.014235 seconds (8.78 k allocations: 177.258 KiB)
0.019856 seconds (15.16 k allocations: 415.656 KiB)
0.000303 seconds (294 allocations: 11.555 KiB)
0.024612 seconds (14.60 k allocations: 260.508 KiB)
0.005107 seconds (3.41 k allocations: 82.898 KiB)
0.019861 seconds (15.12 k allocations: 414.953 KiB)
0.022027 seconds (13.08 k allocations: 233.789 KiB)
0.007595 seconds (4.87 k allocations: 108.492 KiB) Prepare
0.019825 seconds (15.11 k allocations: 414.672 KiB) Fcn
0.024533 seconds (14.57 k allocations: 259.945 KiB) Solve
TestKernels
CUDA
julia> include("TestAMD/testKernels.jl")
0.110832 seconds (8.10 k allocations: 668.750 KiB)
0.061212 seconds (8.10 k allocations: 648.438 KiB)
0.091087 seconds (7.10 k allocations: 503.125 KiB)
0.073409 seconds (9.30 k allocations: 601.562 KiB)
AMD
WARNING: using KernelAbstractions.GPU in module Main conflicts with an existing identifier.
0.159801 seconds (106.89 k allocations: 2.409 MiB)
0.151442 seconds (101.26 k allocations: 2.296 MiB)
0.183926 seconds (120.29 k allocations: 2.497 MiB)
0.164235 seconds (109.90 k allocations: 2.370 MiB)
HOMME
vi theta-l/share/prim_advance_mod.F90
vi share/derivative_mod_base.F90
HorLimit: Change Exchange