Embedding Layers #121

In NLP, tokens are converted from tensors of usize to tensors of f32, where each usize is an index into a "library" of token embedding vectors. I'm pretty sure the inner workings of this convert each usize into a one-hot vector (for instance, 2 can be converted to [0, 0, 1, 0, 0, ...]), which is then multiplied by the embedding weight matrix. Since only one value in the vector is 1, only one of the vectors in the weight matrix will be "selected".
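As a concrete illustration of that description, here is a minimal sketch in plain Rust (no dfdx types; the weights are made-up numbers) showing that multiplying a one-hot vector by the weight matrix picks out exactly one row:

fn main() {
    // A 5x3 embedding table: 5 tokens, each with a 3-dimensional embedding.
    let weights: [[f32; 3]; 5] = [
        [0.0, 0.1, 0.2],
        [1.0, 1.1, 1.2],
        [2.0, 2.1, 2.2],
        [3.0, 3.1, 3.2],
        [4.0, 4.1, 4.2],
    ];
    // Index 2 as a one-hot vector.
    let one_hot: [f32; 5] = [0.0, 0.0, 1.0, 0.0, 0.0];

    // one_hot (1x5) x weights (5x3) = embedded (1x3)
    let mut embedded = [0.0f32; 3];
    for m in 0..5 {
        for n in 0..3 {
            embedded[n] += one_hot[m] * weights[m][n];
        }
    }
    assert_eq!(embedded, weights[2]); // only row 2 survives the multiply
}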
Comments
I was thinking this could be a Tensor2D<M, N>, where indices can be between 0 and M, and then the …
How would the gather operation work? I was thinking turning indexes into one-hots and doing matmuls is too inefficient, since all but one of the embeds would be multiplied by 0, so I was going to just sequentially grab the vectors. Not sure how to move them to the output tensor, though, and keep the tape intact.
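For illustration, a minimal plain-Rust sketch (no dfdx types, ignoring the tape entirely) of the "sequentially grab the vectors" idea: one row copy per index, rather than M multiplications per lookup of which M-1 are by zero:

// Gather Z rows out of an M x N matrix by direct row copies.
fn gather_rows<const M: usize, const N: usize, const Z: usize>(
    weights: &[[f32; N]; M],
    indices: &[usize; Z],
) -> [[f32; N]; Z] {
    let mut out = [[0.0f32; N]; Z];
    for (row, &i) in out.iter_mut().zip(indices.iter()) {
        *row = weights[i]; // plain copy; no zero-multiplies
    }
    out
}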
It'd be similar to the following?

let embeddings: Tensor2D<10, 3> = Tensor2D::zeros();
let batch_embeddings: Tensor2D<2, 3> = gather(embeddings, &[4, 6], tape);
This should now be possible via the following:

struct Embeddings<const N: usize, const D: usize> {
    data: Tensor2D<N, D>,
}

impl<const N: usize, const D: usize, const Z: usize> Module<[usize; Z]> for Embeddings<N, D> {
    type Output = Tensor2D<Z, D, OwnedTape>;
    fn forward(&self, input: [usize; Z]) -> Self::Output {
        self.data.trace().select(&input)
    }
}

Will need to figure out where the tape comes from (e.g. the above impl creates a tape inside, but not sure that's what it should be). May want to pass the tape as input (maybe a tuple of indices & tape).
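For reference, a hypothetical call site for that impl (reusing the 10x3 table and the [4, 6] indices from the gather example above) might look like:

// The module owns the 10x3 embedding table; forward() takes a fixed-size
// array of indices and returns one embedding row per index, carrying the
// tape that trace() creates inside forward.
let embed = Embeddings::<10, 3> { data: Tensor2D::zeros() };
let batch: Tensor2D<2, 3, OwnedTape> = embed.forward([4, 6]);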
There should also be a …
Posting partial progress here. One thing I need to figure out is the padding index, which has 0 embeddings and always has a gradient of 0. It's easy to initialize it to 0, but I'm unsure how to mask out specific indices. Perhaps some masking operation?

#[derive(Clone, Default)]
pub struct Embedding<
    const NUM_EMBEDDINGS: usize,
    const EMBED_DIM: usize,
    const PADDING_IDX: usize = 0,
> {
    pub weight: Tensor2D<NUM_EMBEDDINGS, EMBED_DIM>,
}

impl<const N: usize, const D: usize, const P: usize> ResetParams for Embedding<N, D, P> {
    fn reset_params<R: rand::Rng>(&mut self, rng: &mut R) {
        self.weight.randomize(rng, &StandardNormal);
        Cpu::fill(&mut self.weight.mut_data()[P], &mut |x| *x = 0.0);
    }
}

impl<const N: usize, const D: usize, const P: usize> CanUpdateWithGradients for Embedding<N, D, P> {
    fn update<G: GradientProvider>(&mut self, grads: &mut G, unused: &mut UnusedTensors) {
        self.weight.update(grads, unused);
    }
}

impl<const N: usize, const D: usize> SaveToNpz for Embedding<N, D> {
    fn write<W: Write + Seek>(&self, pre: &str, w: &mut ZipWriter<W>) -> ZipResult<()> {
        todo!();
    }
}

impl<const N: usize, const D: usize> LoadFromNpz for Embedding<N, D> {
    fn read<R: Read + Seek>(&mut self, pre: &str, r: &mut ZipArchive<R>) -> Result<(), NpzError> {
        todo!();
    }
}

impl<const N: usize, const D: usize, const P: usize, T: Tape, I> Module<(&I, T)>
    for Embedding<N, D, P>
where
    Tensor2D<N, D, T>: Select<I, Axis<0>>,
{
    type Output = <Tensor2D<N, D, T> as Select<I, Axis<0>>>::Output;
    fn forward(&self, (inds, tape): (&I, T)) -> Self::Output {
        self.weight.duplicate().put_tape(tape).select(inds)
    }
}
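One possible answer to the masking question, sketched here only from the calls already used in the snippet above: instead of masking the gradient during backprop, re-zero the padding row after every update, which (ignoring optimizer state such as momentum) behaves like that row always receiving a zero gradient.

// Hypothetical variant of the CanUpdateWithGradients impl above:
// apply the gradients as usual, then overwrite row P with zeros so the
// padding embedding stays fixed at 0 no matter what gradient it received.
impl<const N: usize, const D: usize, const P: usize> CanUpdateWithGradients for Embedding<N, D, P> {
    fn update<G: GradientProvider>(&mut self, grads: &mut G, unused: &mut UnusedTensors) {
        self.weight.update(grads, unused);
        Cpu::fill(&mut self.weight.mut_data()[P], &mut |x| *x = 0.0);
    }
}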
@coreylowman Can't masks just be another token? Not sure if that's what's done normally.
The mask tokens are mapped to …
@coreylowman, so I was trying to build upon the last code snippet you posted, but I think the code on master has changed since then. This is my attempt so far (to simplify, not considering who owns the tape at all right now):

impl<const M: usize, const N: usize, const Z: usize> Module<&[usize; Z]> for Embedding<M, N>
where
    Tensor2D<M, N>: SelectTo<Tensor2D<Z, N>, Axis<0>>,
{
    type Output = Tensor2D<Z, N>;
    fn forward(&self, inds: &[usize; Z]) -> Tensor2D<Z, N> {
        // Rust errors with: expected reference `&<Tensor2D<M, N> as SelectTo<Tensor2D<Z, N>, Axis<0>>>::Indices`, found reference `&[usize; Z]`
        self.weight.select(inds)
    }
}

And so I tried to solve the above issue, but I get another error:

// Error here reads: the const parameter `Z` is not constrained by the impl trait, self type, or predicates.
// Note: expressions using a const parameter must map each value to a distinct output value;
// proving the result of expressions other than the parameter are unique is not supported.
impl<const M: usize, const N: usize, const Z: usize>
    Module<&<Tensor2D<M, N> as SelectTo<Tensor2D<Z, N>, Axis<0>>>::Indices> for Embedding<M, N>
where
    Tensor2D<M, N>: SelectTo<Tensor2D<Z, N>, Axis<0>>,
{
    type Output = Tensor2D<Z, N>;
    fn forward(
        &self,
        inds: &<Tensor2D<M, N> as SelectTo<Tensor2D<Z, N>, Axis<0>>>::Indices,
    ) -> Tensor2D<Z, N> {
        self.weight.select(inds)
    }
}

Wdyt?
@davidatsurge You should be able to set the indices in the trait bound with SelectTo's associated type:

where
    Tensor2D<M, N>: SelectTo<Tensor2D<Z, N>, Axis<0>, Indices = [usize; Z]>,

Rust is probably complaining because there technically could be an implementation of SelectTo that doesn't use those indices. If you still get the "Z not constrained by impl trait..." error, the Select trait still exists, which expects Indices & Axes: https://github.com/coreylowman/dfdx/blob/main/src/tensor_ops/select.rs#L10
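Putting that suggestion together with the first attempt above, the working version would presumably look like this (a sketch; only the where-clause changes):

impl<const M: usize, const N: usize, const Z: usize> Module<&[usize; Z]> for Embedding<M, N>
where
    // Pinning the associated Indices type to [usize; Z] tells the compiler
    // exactly which index type this impl handles, which also constrains Z.
    Tensor2D<M, N>: SelectTo<Tensor2D<Z, N>, Axis<0>, Indices = [usize; Z]>,
{
    type Output = Tensor2D<Z, N>;
    fn forward(&self, inds: &[usize; Z]) -> Tensor2D<Z, N> {
        self.weight.select(inds)
    }
}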
That fixed it. Thanks! |
This was recently merged in #406. Closing for now; can re-open with a padding-specific embedding at a later time when someone asks for it.