Embedding Layers #121

Closed
Tracked by #278
jafioti opened this issue Jul 26, 2022 · 12 comments
Comments

@jafioti
Contributor

jafioti commented Jul 26, 2022

In NLP, tokens are converted from tensors of usize to tensors of f32, where each usize is an index into a "library" of token embedding vectors. I'm pretty sure this works internally by converting each usize into a one-hot vector (for instance, 2 becomes [0, 0, 1, 0, 0, ...]), which is then multiplied by the embedding weight matrix. Since only one value in the vector is 1, only one of the rows of the weight matrix is "selected".
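For illustration, a minimal sketch in plain Rust (no tensor library assumed; the weights and token are made up) showing that the one-hot matmul picks out exactly one row, i.e. it is equivalent to direct indexing:

// Embedding lookup via one-hot matmul vs. direct row indexing.
fn main() {
    // A tiny "library" of 4 token embeddings, each of dimension 3.
    let weights: [[f32; 3]; 4] = [
        [0.1, 0.2, 0.3],
        [1.0, 1.1, 1.2],
        [2.0, 2.1, 2.2],
        [3.0, 3.1, 3.2],
    ];
    let token: usize = 2;

    // One-hot approach: build [0, 0, 1, 0] and matmul it with the weights.
    let mut one_hot = [0.0f32; 4];
    one_hot[token] = 1.0;
    let mut via_matmul = [0.0f32; 3];
    for (row, &h) in weights.iter().zip(one_hot.iter()) {
        for (out, &w) in via_matmul.iter_mut().zip(row.iter()) {
            *out += h * w; // every row except `token` contributes 0
        }
    }

    // Direct approach: just grab the row. Same result, no wasted multiplies.
    let via_index = weights[token];
    assert_eq!(via_matmul, via_index);
}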

@coreylowman
Owner

I was thinking this could be a Tensor2D<M, N>, where indices can be between 0 and M, and the usize is just used to index into that tensor to grab the m'th row (resulting in a 1D tensor). We could add a gather operation for this, I think.

@jafioti
Contributor Author

jafioti commented Jul 26, 2022

How would the gather operation work? I was thinking that turning indices into one-hots and doing matmuls is too inefficient, since all but one of the embeddings would be multiplied by 0, so I was going to just grab the vectors sequentially. I'm not sure how to move them to the output tensor while keeping the tape intact, though.

@coreylowman
Owner

It'd be similar to the gather_last_dim() function, but I guess it would gather from the first dimension instead?

let embeddings: Tensor2D<10, 3> = Tensor2D::zeros();
let batch_embeddings: Tensor2D<2, 3> = gather(embeddings, &[4, 6], tape);

?

@coreylowman
Owner

This should now be possible via the .select() method of the Select1 trait. It should be something close to:

struct Embeddings<const N: usize, const D: usize> {
    data: Tensor2D<N, D>,
}

impl<const N: usize, const D: usize, const Z: usize> Module<[usize; Z]> for Embeddings<N, D> {
    type Output = Tensor2D<Z, D, OwnedTape>;
    fn forward(&self, input: [usize; Z]) -> Self::Output {
        self.data.trace().select(&input)
    }
}

We'll need to figure out where the tape comes from (e.g. the above impl creates a tape inside via trace(), but I'm not sure that's what it should be). We may want to pass the tape as input (maybe a tuple of indices & tape), as sketched below.
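For instance, a minimal sketch of the tape-as-input variant (hypothetical; the exact bounds depend on the tape API, and this mirrors the (&I, T) pattern in the partial-progress code further down):

impl<const N: usize, const D: usize, const Z: usize, T: Tape> Module<([usize; Z], T)>
    for Embeddings<N, D>
{
    type Output = Tensor2D<Z, D, T>;
    fn forward(&self, (input, tape): ([usize; Z], T)) -> Self::Output {
        // The caller supplies the tape, so the module no longer calls trace() itself.
        self.data.duplicate().put_tape(tape).select(&input)
    }
}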

@coreylowman coreylowman mentioned this issue Aug 21, 2022
@coreylowman
Owner

There should also be a PaddedEmbedding, which would be equivalent to using padding_idx in pytorch.

@coreylowman
Owner

Posting partial progress here. One thing I still need to figure out is the padding index, which has an all-zero embedding and should always have a gradient of 0. It's easy to initialize it to 0, but I'm unsure how to mask out specific indices. Perhaps some masking operation?

#[derive(Clone, Default)]
pub struct Embedding<
    const NUM_EMBEDDINGS: usize,
    const EMBED_DIM: usize,
    const PADDING_IDX: usize = 0,
> {
    pub weight: Tensor2D<NUM_EMBEDDINGS, EMBED_DIM>,
}

impl<const N: usize, const D: usize, const P: usize> ResetParams for Embedding<N, D, P> {
    fn reset_params<R: rand::Rng>(&mut self, rng: &mut R) {
        self.weight.randomize(rng, &StandardNormal);
        Cpu::fill(&mut self.weight.mut_data()[P], &mut |x| *x = 0.0);
    }
}

impl<const N: usize, const D: usize, const P: usize> CanUpdateWithGradients for Embedding<N, D, P> {
    fn update<G: GradientProvider>(&mut self, grads: &mut G, unused: &mut UnusedTensors) {
        self.weight.update(grads, unused);
    }
}

impl<const N: usize, const D: usize, const P: usize> SaveToNpz for Embedding<N, D, P> {
    fn write<W: Write + Seek>(&self, pre: &str, w: &mut ZipWriter<W>) -> ZipResult<()> {
        todo!();
    }
}

impl<const N: usize, const D: usize, const P: usize> LoadFromNpz for Embedding<N, D, P> {
    fn read<R: Read + Seek>(&mut self, pre: &str, r: &mut ZipArchive<R>) -> Result<(), NpzError> {
        todo!();
    }
}

impl<const N: usize, const D: usize, const P: usize, T: Tape, I> Module<(&I, T)>
    for Embedding<N, D, P>
where
    Tensor2D<N, D, T>: Select<I, Axis<0>>,
{
    type Output = <Tensor2D<N, D, T> as Select<I, Axis<0>>>::Output;
    fn forward(&self, (inds, tape): (&I, T)) -> Self::Output {
        self.weight.duplicate().put_tape(tape).select(inds)
    }
}

@jafioti
Contributor Author

jafioti commented Oct 5, 2022

@coreylowman Can't masks just be another token? I'm not sure if that's what's normally done.

@coreylowman
Owner

The mask tokens are mapped to PADDING_IDX, which in the forward pass maps to an embedding that is all 0s. However, we also want to stop/mask all gradients for this index, which is what the masking would be for. You can see this in pytorch's embedding implementation.
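To make the idea concrete, a minimal sketch of that gradient masking over a raw gradient buffer (illustrative only, not the dfdx API; mask_padding_grad is a hypothetical helper):

// Zero out the gradient row for the padding index, mirroring what
// pytorch's padding_idx does during the backward pass.
fn mask_padding_grad<const N: usize, const D: usize>(
    grad: &mut [[f32; D]; N],
    padding_idx: usize,
) {
    for g in grad[padding_idx].iter_mut() {
        *g = 0.0; // the padding embedding never receives updates
    }
}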

@coreylowman coreylowman mentioned this issue Oct 30, 2022
@davidatsurge

davidatsurge commented Nov 2, 2022

@coreylowman, so I was trying to build upon the last code snippet you posted, but I think the code on master has changed since then. This is my attempt so far (to simplify, I'm not considering who owns the tape at all right now):

impl<const M: usize, const N: usize, const Z: usize> Module<&[usize; Z]> for Embedding<M, N>
where
    Tensor2D<M, N>: SelectTo<Tensor2D<Z, N>, Axis<0>>,
{
    type Output = Tensor2D<Z, N>;

    fn forward(&self, inds: &[usize; Z]) -> Tensor2D<Z, N> {
        // Rust errors with: expected reference
        // `&<Tensor2D<M, N> as SelectTo<Tensor2D<Z, N>, Axis<0>>>::Indices`,
        // found reference `&[usize; Z]`
        self.weight.select(inds)
    }
}

And so I tried to solve the above issue, but I get another error:

// Error here reads: the const parameter `Z` is not constrained by the
// impl trait, self type, or predicates. Note: expressions using a const
// parameter must map each value to a distinct output value; proving the
// result of expressions other than the parameter are unique is not
// supported.
impl<const M: usize, const N: usize, const Z: usize>
    Module<&<Tensor2D<M, N> as SelectTo<Tensor2D<Z, N>, Axis<0>>>::Indices> for Embedding<M, N>
where
    Tensor2D<M, N>: SelectTo<Tensor2D<Z, N>, Axis<0>>,
{
    type Output = Tensor2D<Z, N>;

    fn forward(
        &self,
        inds: &<Tensor2D<M, N> as SelectTo<Tensor2D<Z, N>, Axis<0>>>::Indices,
    ) -> Tensor2D<Z, N> {
        self.weight.select(inds)
    }
}

Wdyt?

@coreylowman
Owner

@davidatsurge You should be able to set the indices in the trait bound with SelectTo's associated type:

where
    Tensor2D<M, N>: SelectTo<Tensor2D<Z, N>, Axis<0>, Indices = [usize; Z]>,

Rust is probably complaining because there could technically be an implementation of SelectTo that doesn't use those indices.

If you still get the Z is not constrained by the impl trait... error, the Select trait still exists, which expects both Indices & Axes: https://github.com/coreylowman/dfdx/blob/main/src/tensor_ops/select.rs#L10
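Putting the suggested bound into the first attempt above, a sketch of the full impl (assuming the API at the time, and still ignoring tape ownership):

impl<const M: usize, const N: usize, const Z: usize> Module<&[usize; Z]> for Embedding<M, N>
where
    // Pinning Indices to [usize; Z] ties Z to the input type, which
    // resolves both errors above.
    Tensor2D<M, N>: SelectTo<Tensor2D<Z, N>, Axis<0>, Indices = [usize; Z]>,
{
    type Output = Tensor2D<Z, N>;

    fn forward(&self, inds: &[usize; Z]) -> Tensor2D<Z, N> {
        self.weight.select(inds)
    }
}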

@davidatsurge

That fixed it. Thanks!

@coreylowman
Owner

This was recently merged in #406. Closing for now; we can re-open with a padding-specific embedding later if someone asks for it.
