Embedding Layers #121

Closed
Tracked by #278
jafioti opened this issue Jul 26, 2022 · 12 comments
Comments

@jafioti
Contributor

jafioti commented Jul 26, 2022

In NLP, tokens are converted from tensors of usize to tensors of f32, where each usize is an index into a "library" of token embedding vectors. I'm pretty sure this works internally by converting each usize into a one-hot vector (for instance, 2 becomes [0, 0, 1, 0, 0, ...]), which is then multiplied by the embedding weight matrix. Since only one value in the vector is 1, only one of the rows of the weight matrix is "selected".
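For illustration, a minimal sketch in plain Rust (no tensor library assumed; the weights and token are made up) showing that the one-hot matmul picks out exactly one row, i.e. it is equivalent to direct indexing:

// Embedding lookup via one-hot matmul vs. direct row indexing.
fn main() {
    // A tiny "library" of 4 token embeddings, each of dimension 3.
    let weights: [[f32; 3]; 4] = [
        [0.1, 0.2, 0.3],
        [1.0, 1.1, 1.2],
        [2.0, 2.1, 2.2],
        [3.0, 3.1, 3.2],
    ];
    let token: usize = 2;

    // One-hot approach: build [0, 0, 1, 0] and matmul it with the weights.
    let mut one_hot = [0.0f32; 4];
    one_hot[token] = 1.0;
    let mut via_matmul = [0.0f32; 3];
    for (row, &h) in weights.iter().zip(one_hot.iter()) {
        for (out, &w) in via_matmul.iter_mut().zip(row.iter()) {
            *out += h * w; // every row except `token` contributes 0
        }
    }

    // Direct approach: just grab the row. Same result, no wasted multiplies.
    let via_index = weights[token];
    assert_eq!(via_matmul, via_index);
}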

@coreylowman
Owner

I was thinking this could be a Tensor2D<M, N>, where indices can be between 0 and M, and the usize is just used to index into that tensor to grab the m'th row (resulting in a 1D tensor). We could add a gather operation for this, I think.

@jafioti
Contributor Author

jafioti commented Jul 26, 2022

How would the gather operation work? I was thinking that turning indices into one-hots and doing matmuls is too inefficient, since all but one of the embeddings would be multiplied by 0, so I was going to just grab the vectors sequentially. I'm not sure how to move them to the output tensor while keeping the tape intact, though.

@coreylowman
Owner

It'd be similar to the gather_last_dim() function, but I guess it would gather from the first dimension instead?

let embeddings: Tensor2D<10, 3> = Tensor2D::zeros();
let batch_embeddings: Tensor2D<2, 3> = gather(embeddings, &[4, 6], tape);

?

@coreylowman
Owner

This should now be possible via the .select() method of the Select1 trait. It should be something close to:

struct Embeddings<const N: usize, const D: usize> {
    data: Tensor2D<N, D>,
}

impl<const N: usize, const D: usize, const Z: usize> Module<[usize; Z]> for Embeddings<N, D> {
    type Output = Tensor2D<Z, D, OwnedTape>;
    fn forward(&self, input: [usize; Z]) -> Self::Output {
        self.data.trace().select(&input)
    }
}

We'll need to figure out where the tape comes from (e.g. the above impl creates a tape inside via trace(), but I'm not sure that's what it should be). We may want to pass the tape as input (maybe a tuple of indices & tape), as sketched below.
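For instance, a minimal sketch of the tape-as-input variant (hypothetical; the exact bounds depend on the tape API, and this mirrors the (&I, T) pattern in the partial-progress code further down):

impl<const N: usize, const D: usize, const Z: usize, T: Tape> Module<([usize; Z], T)>
    for Embeddings<N, D>
{
    type Output = Tensor2D<Z, D, T>;
    fn forward(&self, (input, tape): ([usize; Z], T)) -> Self::Output {
        // The caller supplies the tape, so the module no longer calls trace() itself.
        self.data.duplicate().put_tape(tape).select(&input)
    }
}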

@coreylowman coreylowman mentioned this issue Aug 21, 2022
@coreylowman
Owner

There should also be a PaddedEmbedding, which would be equivalent to using padding_idx in pytorch.

@coreylowman
Owner

Posting partial progress here. One thing I still need to figure out is the padding index, which has an all-zero embedding and should always have a gradient of 0. It's easy to initialize it to 0, but I'm unsure how to mask out specific indices. Perhaps some masking operation?

#[derive(Clone, Default)]
pub struct Embedding<
    const NUM_EMBEDDINGS: usize,
    const EMBED_DIM: usize,
    const PADDING_IDX: usize = 0,
> {
    pub weight: Tensor2D<NUM_EMBEDDINGS, EMBED_DIM>,
}

impl<const N: usize, const D: usize, const P: usize> ResetParams for Embedding<N, D, P> {
    fn reset_params<R: rand::Rng>(&mut self, rng: &mut R) {
        self.weight.randomize(rng, &StandardNormal);
        Cpu::fill(&mut self.weight.mut_data()[P], &mut |x| *x = 0.0);
    }
}

impl<const N: usize, const D: usize, const P: usize> CanUpdateWithGradients for Embedding<N, D, P> {
    fn update<G: GradientProvider>(&mut self, grads: &mut G, unused: &mut UnusedTensors) {
        self.weight.update(grads, unused);
    }
}

impl<const N: usize, const D: usize, const P: usize> SaveToNpz for Embedding<N, D, P> {
    fn write<W: Write + Seek>(&self, pre: &str, w: &mut ZipWriter<W>) -> ZipResult<()> {
        todo!();
    }
}

impl<const N: usize, const D: usize, const P: usize> LoadFromNpz for Embedding<N, D, P> {
    fn read<R: Read + Seek>(&mut self, pre: &str, r: &mut ZipArchive<R>) -> Result<(), NpzError> {
        todo!();
    }
}

impl<const N: usize, const D: usize, const P: usize, T: Tape, I> Module<(&I, T)>
    for Embedding<N, D, P>
where
    Tensor2D<N, D, T>: Select<I, Axis<0>>,
{
    type Output = <Tensor2D<N, D, T> as Select<I, Axis<0>>>::Output;
    fn forward(&self, (inds, tape): (&I, T)) -> Self::Output {
        self.weight.duplicate().put_tape(tape).select(inds)
    }
}

@jafioti
Contributor Author

jafioti commented Oct 5, 2022

@coreylowman Can't masks just be another token? I'm not sure if that's what's normally done.

@coreylowman
Owner

The mask tokens are mapped to PADDING_IDX, which in the forward pass maps to an embedding that is all 0s. However, we also want to stop/mask all gradients for this index, which is what the masking would be for. You can see this in pytorch's embedding implementation.
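To make the idea concrete, a minimal sketch of that gradient masking over a raw gradient buffer (illustrative only, not the dfdx API; mask_padding_grad is a hypothetical helper):

// Zero out the gradient row for the padding index, mirroring what
// pytorch's padding_idx does during the backward pass.
fn mask_padding_grad<const N: usize, const D: usize>(
    grad: &mut [[f32; D]; N],
    padding_idx: usize,
) {
    for g in grad[padding_idx].iter_mut() {
        *g = 0.0; // the padding embedding never receives updates
    }
}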

@coreylowman coreylowman mentioned this issue Oct 30, 2022
@davidatsurge

davidatsurge commented Nov 2, 2022

@coreylowman, so I was trying to build upon the last code snippet you posted, but I think the code on master has changed since then. This is my attempt so far (to simplify, I'm not considering who owns the tape at all right now):

impl<const M: usize, const N: usize, const Z: usize> Module<&[usize; Z]> for Embedding<M, N>
where
    Tensor2D<M, N>: SelectTo<Tensor2D<Z, N>, Axis<0>>,
{
    type Output = Tensor2D<Z, N>;

    fn forward(&self, inds: &[usize; Z]) -> Tensor2D<Z, N> {
        // Rust errors with: expected reference
        // `&<Tensor2D<M, N> as SelectTo<Tensor2D<Z, N>, Axis<0>>>::Indices`,
        // found reference `&[usize; Z]`
        self.weight.select(inds)
    }
}

And so I tried to solve the above issue, but I get another error:

// Error here reads: the const parameter `Z` is not constrained by the
// impl trait, self type, or predicates. Note: expressions using a const
// parameter must map each value to a distinct output value; proving the
// result of expressions other than the parameter are unique is not
// supported.
impl<const M: usize, const N: usize, const Z: usize>
    Module<&<Tensor2D<M, N> as SelectTo<Tensor2D<Z, N>, Axis<0>>>::Indices> for Embedding<M, N>
where
    Tensor2D<M, N>: SelectTo<Tensor2D<Z, N>, Axis<0>>,
{
    type Output = Tensor2D<Z, N>;

    fn forward(
        &self,
        inds: &<Tensor2D<M, N> as SelectTo<Tensor2D<Z, N>, Axis<0>>>::Indices,
    ) -> Tensor2D<Z, N> {
        self.weight.select(inds)
    }
}

Wdyt?

@coreylowman
Owner

@davidatsurge You should be able to set the indices in the trait bound with SelectTo's associated type:

where
    Tensor2D<M, N>: SelectTo<Tensor2D<Z, N>, Axis<0>, Indices = [usize; Z]>,

Rust is probably complaining because there could technically be an implementation of SelectTo that doesn't use those indices.

If you still get the Z is not constrained by the impl trait... error, the Select trait still exists, which expects both Indices & Axes: https://github.com/coreylowman/dfdx/blob/main/src/tensor_ops/select.rs#L10
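Putting the suggested bound into the first attempt above, a sketch of the full impl (assuming the API at the time, and still ignoring tape ownership):

impl<const M: usize, const N: usize, const Z: usize> Module<&[usize; Z]> for Embedding<M, N>
where
    // Pinning Indices to [usize; Z] ties Z to the input type, which
    // resolves both errors above.
    Tensor2D<M, N>: SelectTo<Tensor2D<Z, N>, Axis<0>, Indices = [usize; Z]>,
{
    type Output = Tensor2D<Z, N>;

    fn forward(&self, inds: &[usize; Z]) -> Tensor2D<Z, N> {
        self.weight.select(inds)
    }
}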

@davidatsurge

That fixed it. Thanks!

@coreylowman
Owner

This was recently merged in #406. Closing for now; we can re-open with a padding-specific embedding later if someone asks for it.
