
using OneHotArrays #2025

Merged
merged 6 commits into from
Aug 10, 2022

Conversation

@mcabbott (Member) commented Jul 23, 2022

This removes onehot.jl, since the package is now registered (JuliaRegistries/General#64647).

Tests could be removed too.

Maybe the docs still need it, though?

Closes #1544

@mcabbott (Member, Author) commented Jul 24, 2022

The GPU test failure seems to be this:

julia> using CUDA, OneHotArrays, NNlibCUDA

julia> CUDA.allowscalar(false)

julia> x = [1, 3, 2];

julia> y = onehotbatch(x, 0:3)
4×3 OneHotMatrix(::Vector{UInt32}) with eltype Bool:
 ⋅  ⋅  ⋅
 1  ⋅  ⋅
 ⋅  ⋅  1
 ⋅  1  ⋅

julia> y2 = onehotbatch(x |> cu, 0:3)
ERROR: Scalar indexing is disallowed.
Invocation of getindex resulted in scalar indexing of a GPU array.
This is typically caused by calling an iterating implementation of a method.
Such implementations *do not* execute on the GPU, but very slowly on the CPU,
and therefore are only permitted from the REPL for prototyping purposes.
If you did intend to index this array, annotate the caller with @allowscalar.
Stacktrace:
 [1] error(s::String)
   @ Base ./error.jl:33
 [2] assertscalar(op::String)
   @ GPUArraysCore ~/.julia/packages/GPUArraysCore/rSIl2/src/GPUArraysCore.jl:78
 [3] getindex
   @ ~/.julia/packages/GPUArrays/gok9K/src/host/indexing.jl:9 [inlined]
 [4] iterate
   @ ./abstractarray.jl:1144 [inlined]
 [5] iterate
   @ ./abstractarray.jl:1142 [inlined]
 [6] _onehotbatch(data::CuArray{Int64, 1, CUDA.Mem.DeviceBuffer}, labels::NTuple{4, Int64})
   @ OneHotArrays ~/.julia/packages/OneHotArrays/Moo4n/src/onehot.jl:87
 [7] onehotbatch(::CuArray{Int64, 1, CUDA.Mem.DeviceBuffer}, ::UnitRange{Int64})
   @ OneHotArrays ~/.julia/packages/OneHotArrays/Moo4n/src/onehot.jl:84
 [8] top-level scope

This is because #1959 doesn't exist in OneHotArrays.
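The stacktrace shows the problem is `iterate` being called on the device array inside `_onehotbatch`. A minimal sketch of a GPU-friendly alternative (hypothetical, not the actual #1959 patch; it assumes OneHotArrays' `OneHotArray(indices, nlabels)` constructor) would build the index vector by broadcasting per label, so all work stays on the device:

```julia
using OneHotArrays

# Hypothetical sketch, not the real fix: broadcast one comparison per label
# and reduce, instead of iterating the (possibly GPU-resident) data.
function _onehotbatch_sketch(data::AbstractArray, labels)
    # indices[j] == i when data[j] == labels[i], else 0
    indices = sum(UInt32(i) .* (data .== l) for (i, l) in enumerate(labels))
    any(iszero, indices) && error("some data items are not in labels")
    return OneHotArray(indices, length(labels))
end

_onehotbatch_sketch([1, 3, 2], 0:3) == onehotbatch([1, 3, 2], 0:3)  # true on CPU
```

This does `length(labels)` passes over the data, which is fine for small label sets but not how a tuned implementation would do it.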

@mcabbott (Member, Author) commented Jul 24, 2022

The downstream failure for Transformers seems to be related:

Precompiling project...
  ✗ Transformers
  0 dependencies successfully precompiled in 13 seconds (102 already precompiled)
1 dependency errored. To see a full report either run `import Pkg; Pkg.precompile()` or load the package
     Testing Running tests...
WARNING: both PrimitiveOneHot and Flux export "OneHotArray"; uses of it in module Basic must be qualified
ERROR: LoadError: UndefVarError: OneHotArray not defined

Maybe it shouldn't export OneHotArray? Cc @chengchingwen

@chengchingwen (Member) commented Jul 25, 2022

Personally, I would prefer not having OneHotArray exported from Flux. And usually you won't need it either, because the most-used API for one-hot encoding is onehot and onehotbatch.

By the way, exporting onehot would also conflict with Enzyme.onehot.
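For reference, the usual encode/decode round-trip with that API (as exported by OneHotArrays) looks like:

```julia
using OneHotArrays

v = onehot(:b, (:a, :b, :c))     # 3-element OneHotVector, hot at index 2
m = onehotbatch([1, 3, 2], 0:3)  # 4×3 OneHotMatrix over the labels 0:3
onecold(m, 0:3)                  # decodes back to [1, 3, 2]
```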

@chengchingwen (Member) commented:

Just out of curiosity, do we really need a direct dependency on OneHotArrays.jl? It seems that none of the code in Flux explicitly needs OneHotArray. It could be a completely separate package, and people could just call `using OneHotArrays` directly, though that would be a breaking change.

@mcabbott (Member, Author) commented:

OK. I think nothing was exported before, so for now this PR shouldn't call `@reexport`; not sure why I put that in initially.

And longer term, indeed, there's no strong reason for Flux to depend on this. Maybe Flux@0.14 can simply drop it?

@chengchingwen (Member) commented:

> And longer term, indeed, there's no strong reason for Flux to depend on this. Maybe Flux@0.14 can simply drop it?

Then do we need a deprecation warning for accessing those functions from Flux in the next patch release?

@mcabbott (Member, Author) commented:

CI tells me that the Embedding layer has special methods for OneHotArrays. So it can't trivially be removed.

Possibly those methods could be changed to dispatch on `AbstractMatrix{<:Bool}` and call generic `*` etc., but not in this PR.
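A sketch of what that generic dispatch might look like (hypothetical, not Flux's actual method; it assumes the `weight` field of Flux's Embedding layer):

```julia
using Flux

# Hypothetical sketch: dispatch on any boolean matrix rather than the
# concrete OneHotMatrix type, relying on generic matrix multiplication.
# For a one-hot column this matmul reduces to selecting a column of `weight`.
(m::Flux.Embedding)(x::AbstractMatrix{Bool}) = m.weight * x
```

The trade-off is that a generic boolean matmul may be slower than the gather-based methods specialized on one-hot types.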

@chengchingwen (Member) commented:

Also these:

function (m::RNNCell{F,A,V,<:AbstractMatrix{T}})(h, x::Union{AbstractVecOrMat{T},OneHotArray}) where {F,A,V,T}

I think OneHotArray should overload * and NNlib.gather so we can just call those functions.
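Roughly, those overloads exploit the fact that multiplying by a one-hot matrix is a column gather (OneHotArrays.jl does define `*` methods along these lines; the helper below is illustrative only):

```julia
using OneHotArrays, NNlib

# Illustrative only: A * B for a one-hot B just picks columns of A,
# so no dense matmul ever needs to be materialized.
onehot_mul(A::AbstractMatrix, B::OneHotMatrix) = NNlib.gather(A, onecold(B))

A = [1.0 2.0 3.0 4.0;
     5.0 6.0 7.0 8.0]
B = onehotbatch([1, 3, 2], 0:3)
onehot_mul(A, B) == A * B  # true: both select columns 2, 4, 3
```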

@Saransh-cpp (Member) commented Jul 25, 2022

OneHotArrays should also be added as a docs dependency, or the API reference won't show up in Flux's docs (a missing docstring is only a warning, so the doc test won't error out). We could also remove the page completely, given that we are not re-exporting the package, but there should be some reference to the package in Flux's docs. Maybe add it to ecosystem.md if we decide to remove the page?

(I can take this up in another PR or the changes can be made to this PR itself!)

@mcabbott (Member, Author) commented:

Sorting out docs in another PR sounds great.

Not so clear whether it wants to be included like NNlib / MLUtils or pushed out to ecosystem.md; maybe that depends on whether Flux@0.14 is going to load it at all.

The goal of this PR is then only to remove the code, so that we don't have two current versions -- e.g. #1959 happened after the package was created, which is confusing.

@darsnack (Member) left a review comment:
I think with a rebase this should be good to go. The longer we wait, the more things will depend on the Flux-internal version (e.g. #2031).

The downstream errors appear unrelated (FastAI and Metalhead for sure). AtomicGraphNets.jl has not run CI for 2 months, but the error appears to be related to the SciML stack.

@darsnack (Member) commented:

I just ran the AtomicGraphNets.jl tests locally against the current release; they throw the same errors, so we can safely ignore them. @rkurchin you may want to look into those.

@mcabbott mcabbott merged commit 1914f38 into FluxML:master Aug 10, 2022
@mcabbott mcabbott deleted the rm_onehot branch August 10, 2022 16:20
Saransh-cpp pushed a commit to Saransh-cpp/Flux.jl that referenced this pull request Aug 11, 2022
* using OneHotArrays

* rm tests

* skip a test

* don't export, add depwarns

* back to using
Successfully merging this pull request may close this issue: OneHotArrays.jl?

4 participants