Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

convert(Matrix, x') is very slow #412

Open
gdkrmr opened this issue Jul 9, 2024 · 1 comment
Open

convert(Matrix, x') is very slow #412

gdkrmr opened this issue Jul 9, 2024 · 1 comment
Labels
bug Something isn't working

Comments

@gdkrmr
Copy link
Contributor

gdkrmr commented Jul 9, 2024

convert(Matrix, x') is so slow that I haven't gotten it to finish. convert(Matrix, x) works just fine.

julia> using YAXArrays, Zarr

julia> path = "~/data/DataCube/v3.0.2/esdc-8d-0.25deg-1x720x1440-3.0.2.zarr"
"~/data/DataCube/v3.0.2/esdc-8d-0.25deg-1x720x1440-3.0.2.zarr"

julia> ds = open_dataset(path)
YAXArray Dataset
Shared Axes: 
 lon Sampled{Float64} -179.875:0.25:179.875 ForwardOrdered Regular Points,
 lat Sampled{Float64} -89.875:0.25:89.875 ForwardOrdered Regular Points,
↗ Ti  Sampled{DateTime} [1979-01-05T00:00:00, , 2021-12-31T00:00:00] ForwardOrdered Irregular Points
Variables: 
....


julia> mat = ds.air_temperature_2m[Ti = 1000]
╭──────────────────────────────╮
│ 1440×720 YAXArray{Float32,2} │
├──────────────────────────────┴──────────────────────────────────────── dims ┐
   lon Sampled{Float64} -179.875:0.25:179.875 ForwardOrdered Regular Points,
   lat Sampled{Float64} -89.875:0.25:89.875 ForwardOrdered Regular Points
├──────────────────────────────────────────────────────
....

julia> @time convert(Matrix{Float32}, mat)
  1.809297 seconds (5.15 M allocations: 362.756 MiB, 5.92% gc time, 98.13% compilation time)
...

julia> @time convert(Matrix{Float32}, mat')

# has to be interrupted
@lazarusA lazarusA added the bug Something isn't working label Sep 21, 2024
@felixcremer
Copy link
Member

This is a DiskArrays LinearAlgebra interaction problem. mat' calls adjoint and this accesses the data with separate getindex calls for every value.
You can see that by setting up an AccessCountDiskArray see below.
If you don't need to have the adjoint you can use permutedims instead for switching the axes order. This should have special cases which would not access all data one by one.

julia> using DiskArrays
julia> mat = TestTypes.AccessCountDiskArray(rand(4,5));

julia> convert(Matrix, mat)
4×5 Matrix{Float64}:
 0.9342     0.00562529  0.554012  0.0577668  0.591995
 0.0644567  0.760773    0.364234  0.777978   0.103616
 0.268764   0.138414    0.296148  0.501075   0.177412
 0.993255   0.137569    0.111455  0.492843   0.216849

julia> mat.getindex_log
1-element Vector{Any}:
 (1:4, 1:5)

julia> mat = TestTypes.AccessCountDiskArray(rand(4,5));

julia> convert(Matrix, mat')
5×4 Matrix{Float64}:
 0.0455661  0.137063    0.0339512  0.146162
 0.567136   0.00509385  0.614943   0.817899
 0.569512   0.655161    0.185285   0.379507
 0.33343    0.59349     0.930645   0.999305
 0.906612   0.619647    0.171684   0.75579

julia> mat.getindex_log
20-element Vector{Any}:
 (1:1, 1:1)
 (2:2, 1:1)
 (3:3, 1:1)
 (4:4, 1:1)
 (1:1, 2:2)
 (2:2, 2:2)
 (3:3, 2:2)
 
 (2:2, 4:4)
 (3:3, 4:4)
 (4:4, 4:4)
 (1:1, 5:5)
 (2:2, 5:5)
 (3:3, 5:5)
 (4:4, 5:5)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants