Simplify Embedding #2084
Conversation
(m::Embedding)(x::AbstractVector{Bool}) = m.weight * x  # usually OneHotVector
(m::Embedding)(x::AbstractMatrix{Bool}) = m.weight * x  # usually OneHotMatrix
These could instead call `Flux.onecold`. The result will differ on e.g. `[true, true, false]`; not sure we care too much either way?
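For concreteness, a minimal sketch of that difference, assuming Flux's `Embedding(in => out)` constructor (sizes are made up):

```julia
using Flux

m = Flux.Embedding(3 => 4)         # illustrative sizes

x = [true, true, false]            # multi-hot: not a valid one-hot vector

m.weight * x                       # this PR's method: sums columns 1 and 2 of the weight
m.weight[:, Flux.onecold(x)]       # an onecold-based method: column 1 only, since
                                   # onecold picks the first "hot" index here
```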
For performance in the one-hot case? If it's `onecold`-compatible, then folks should use `OneHotArray` for performance. At least with `*`, we do the mathematically expected operation.
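As a sketch of that fast path (sizes again illustrative), `onehotbatch` builds the `OneHotMatrix` directly, and `*` with it is a specialized column lookup rather than a dense matmul:

```julia
using Flux

m = Flux.Embedding(5 => 8)

xs = Flux.onehotbatch([3, 1, 5], 1:5)  # OneHotMatrix: one one-hot column per label
m(xs)                                   # 8×3 Matrix, via the specialized * for one-hot arrays
```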
For `OneHotArray` these should be identical, right? Result and performance.

For a one-hot `BitArray`, the results will agree. I would guess that `onecold` is faster, but haven't checked.

For a generic `BitArray`, I'm not sure which is mathematically expected, really. I think you're saying that `*` is.
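A quick check of the agreement claim for a genuinely one-hot `BitVector` (made-up sizes):

```julia
using Flux

m = Flux.Embedding(3 => 4)

x = BitVector([false, true, false])     # one-hot, stored as a BitArray
m(x) == m.weight * x                    # true: the Bool method is just *
m(x) == m.weight[:, Flux.onecold(x)]    # also true, but only because x is one-hot
```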
Yes, what you wrote is what I meant re: performance. I was adding that in the one-hot bit-array case, we can direct people to `OneHotArray` if their concern is performance.

Yeah, whenever I've come across this type of operation in papers, I see it written as `*`. There's an implicit assumption that `x` is one-hot, so maybe `onecold` could be better here if it were made to error for `[true, true, false]`, etc. But I think silently choosing the first "hot" index is wrong.
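The stricter variant suggested here doesn't exist in Flux; a hypothetical sketch of what "error on non-one-hot" could look like:

```julia
# Hypothetical helper, not part of Flux: onecold-like, but refuses
# anything that isn't exactly one-hot.
function strict_onecold(x::AbstractVector{Bool})
    count(x) == 1 || throw(ArgumentError("input is not one-hot"))
    return findfirst(x)
end

strict_onecold([false, true, false])  # 2
strict_onecold([true, true, false])   # throws ArgumentError
```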
Yes. Mixing two embedding vectors seems less wrong. But probably nobody ever hits this & it's just a way to decouple from `OneHotArray` types. I don't think we should document that boolean indexing is an option.
So I think we are happy with the current implementation in the PR?
Yes I think so.
I see we had a very similar discussion in #1656 (comment), BTW; I forgot... but same conclusion.
…t without 5 named variables, and show that the point of onehot is variables which aren't 1:n already. Also show result of higher-rank input.
Looks ready to me. Is there more you wanted to do here?
The "more" is #2088, really. Will merge when green. |
`Embedding` has some special code for `OneHotMatrix` which (1) will break with the latest changes, and (2) doesn't allow higher-rank arrays the way that "index" input does. So this PR simplifies & adds `reshape`.
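A sketch of what the added `reshape` enables for "index" input (sizes illustrative):

```julia
using Flux

m = Flux.Embedding(10 => 4)

x = rand(1:10, 3, 5)   # rank-2 array of vocabulary indices
size(m(x))             # (4, 3, 5): embedding dim first, input shape preserved
```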
I did this after forgetting that #1656 exists, so there's some overlap. This PR does not attempt to fix `outputsize`; some other changes there have already happened elsewhere.