
Multiple inputs to a network #253

Closed
Dimev opened this issue Oct 17, 2022 · 22 comments

Comments

@Dimev
Contributor

Dimev commented Oct 17, 2022

Multiple inputs are a bit tricky at the moment, but multiple outputs are easily done with SplitInto.

Maybe an AddInto could do something like that? It would add all the values of its inputs into one tensor.

@Dimev
Contributor Author

Dimev commented Oct 17, 2022

In code, it would look like this:

type Model = (
    AddInto<(
        Linear<2, 8>,
        Linear<4, 8>
    )>,
    ReLU,
    Linear<8, 1>
);

// ...

model.forward((in_a.trace(), in_b.trace()));

@coreylowman
Owner

There is GeneralizedResidual, which takes two submodules and adds their results together. Though this only works for two submodules at the moment.
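
For reference, a rough usage sketch (the layer sizes here are arbitrary, and this assumes GeneralizedResidual<F, R> runs both submodules on the same input and adds their outputs):

// hedged sketch: both branches must produce the same output shape,
// since their results get added together
type Block = (
    GeneralizedResidual<Linear<8, 8>, Linear<8, 8>>,
    ReLU,
);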

I like the idea of having something that accepts the tuple!

@Dimev
Contributor Author

Dimev commented Oct 17, 2022

If I can figure out how to split the gradient tape I can have a go at a PR for this

@coreylowman
Owner

Awesome! I would say to start out just implementing a 2-tuple and 3-tuple by hand and then see if you can turn it into a macro for n-tuples (unfortunately no variadic tuples).

You'll want to only accept 1 tensor with a tape in the input, and then do some split_tape(), put_tape() to make sure every add is captured on the tape.
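
Roughly, as a hedged sketch of that call site (shapes borrowed from the Model example at the top; zeros() is just placeholder data):

// only in_a carries the gradient tape; in_b stays a plain, untaped tensor
let in_a: Tensor1D<2> = Tensor1D::zeros();
let in_b: Tensor1D<4> = Tensor1D::zeros();
let out = model.forward((in_a.trace(), in_b));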

@Dimev
Contributor Author

Dimev commented Oct 17, 2022

Looking at GeneralizedResidual, I should

  • split the input tape
  • duplicate x and give it the split tape
  • feed that into one layer
  • split its tape and put the new split tape there

so in code it would expand to this

fn forward(&self, x: T) -> Self::Output {

    let (x, tape) = x.split_tape();

    // base case
    let (base, tape) = self.0.forward(x.duplicate().put_tape(tape)).split_tape();

    // recursive case
    let rec = self.1.forward(x.duplicate().put_tape(tape));

    let (base, tape) = add(rec, &base).split_tape();

    // repeat the above until no more tuple elements are left
    // at the end
    base.put_tape(tape)
}

I haven't put it through the compiler yet, but it seems like this is what the macro should generate.

@Dimev
Contributor Author

Dimev commented Oct 18, 2022

Ok, thinking about it again:

fn forward(&self, x: T) -> Self::Output {

    let (x, tape) = x.split_tape();

    // head and tail (pseudocode: the macro would split the tuple here)
    let (head, tail..) = self.0;

    // wrap the tail in an AddInto again
    let tail = AddInto<Tail>(tail);

    // base case
    let (base, tape) = head.forward(x.duplicate().put_tape(tape)).split_tape();

    // recursive case
    let rec = tail.forward(x.duplicate().put_tape(tape));
    
    // and add together
    add(rec, &base)
}

This only needs a special impl of forward for the 1-tuple.
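
A hedged sketch of that base case, assuming AddInto is a tuple struct wrapping the module tuple and eliding the real trait bounds:

impl<Input, A: Module<Input>> Module<Input> for AddInto<(A,)> {
    type Output = A::Output;

    fn forward(&self, x: Input) -> Self::Output {
        // a single submodule has nothing to add with, so just delegate
        self.0 .0.forward(x)
    }
}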

@Dimev
Contributor Author

Dimev commented Oct 18, 2022

Also started a draft PR: AddInto #256

@Dimev
Contributor Author

Dimev commented Oct 19, 2022

Ok, just a few more trait bound complaints from rustc left
(then gotta add some tests and docs)

@Dimev
Contributor Author

Dimev commented Oct 22, 2022

How should I think about tapes?
I'm still having trouble understanding when I should split and put them

Right now my macro generates roughly this (pseudocode):

let (head, tails ...) = x;
let (head, tape) = head.split_tape();

let (first, rest ..) = input_networks;

let (result, others ...) = (first.forward(head.put_tape), rest.forward(tails))

let head = add(head, others ..)

head

but the add complains a lot about expecting no tape, and I'm not sure how I should properly pass it around

@coreylowman
Owner

coreylowman commented Oct 22, 2022

This just got quite a bit easier now that I've merged #268. Basically you shouldn't have to worry about split/put anymore, and can instead use x.with_new_tape(). That PR also makes it so add will merge the tapes from the args together.

So for example, a 3-tuple I think would be:

let accum = self.0.forward(x.with_new_tape());
let accum = add(accum, self.1.forward(x.with_new_tape()));
add(accum, self.2.forward(x))

@Dimev
Contributor Author

Dimev commented Oct 22, 2022

Thanks, will update my branch to use that!

@Dimev
Contributor Author

Dimev commented Oct 24, 2022

Yep, much easier!

I think I've added everything needed, so I'll make the PR proper now

@Dimev
Contributor Author

Dimev commented Oct 31, 2022

Looks like doing type Model = (AddInto<...>, ...) breaks, because tuples don't implement Tensor.

@Dimev
Contributor Author

Dimev commented Oct 31, 2022

So either implement Tensor for tuples consisting only of tensors, or drop the requirement?

Maybe keep this as a tracking issue for AddInto, and for the discussion of the trace stuff too.

@Dimev
Contributor Author

Dimev commented Oct 31, 2022

Also, it looks like only the last element in SplitInto passes along the tape, so maybe that causes some breakage when passing it into an AddInto?
(and makes training a bit harder because you only get the tape on one of the outputs, although I guess that's intentional?)
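
To illustrate the second point, a hedged sketch of what that means for training (the 2-output split, the head names, and which output keeps the tape are all illustrative here):

let (out_a, out_b) = split.forward(x.trace());

// only a loss built from the tape-carrying output (assumed to be out_b)
// can drive backward()
let loss = mse_loss(out_b, Tensor1D::new([0.0]));
let gradients = loss.backward();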

@coreylowman
Owner

Yeah, tape only on one output was intentional; willing to revisit though. Do we have an example architecture to use as a reference that might use AddInto and SplitInto together, so we can think about how to set them up?

@Dimev
Contributor Author

Dimev commented Oct 31, 2022

I guess SplitInto/AddInto can be used as a more powerful GeneralizedResidual, in case you want three or more different processing steps before adding them back together.

My own architecture I wanted to use AddInto for was this:

type Model = (
    AddInto<(
        Linear<5, 16>, // input
        Linear<16, 16>, // state
    )>,
    ReLU,
    SplitInto<(
        Linear<16, 1>, // output
        Linear<16, 16>, // internal state
    )>
);

Effectively a recurrent network, if the state output is fed back into the network.
This doesn't work because tuples don't implement Tensor, and because the output doesn't have a tape.

@coreylowman
Owner

Oh, this is because the sequential tuple modules require the input to be a tensor? I think we can just straight up remove those bounds (according to local testing). Nothing in the impls requires a Tensor, so I'm pretty sure this is safe! That makes your example work too.
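
For context, a hedged sketch of the relaxed sequential 2-tuple impl with that bound dropped (the exact trait shapes here are assumptions, not the library's code):

impl<Input, A: Module<Input>, B: Module<A::Output>> Module<Input> for (A, B) {
    type Output = B::Output;

    fn forward(&self, x: Input) -> Self::Output {
        // no `Input: Tensor` bound needed: just feed each module's
        // output into the next one
        self.1.forward(self.0.forward(x))
    }
}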

@Dimev
Contributor Author

Dimev commented Oct 31, 2022

Ah nice

@Dimev
Contributor Author

Dimev commented Nov 7, 2022

#297 allows doing full networks, but it looks like this still breaks:
the code below panics at the loss.backward() call (see the comment marking it) due to unwrapping a None.

type Model = (
    AddInto<(
        // phoneme a
        Linear<1, HIDDEN_SIZE>,
        
        // phoneme b         
        Linear<1, HIDDEN_SIZE>, 
        
        // noise
        Linear<1, HIDDEN_SIZE>, 
        
        // state
        Linear<STATE_SIZE, HIDDEN_SIZE>
    )>, 
    ReLU,
    SplitInto<(
        // state
        Linear<HIDDEN_SIZE, STATE_SIZE>,
        
        // next
        Linear<HIDDEN_SIZE, 1>,        

        // sample
        Linear<HIDDEN_SIZE, 1>
    )>
);

fn main() {
    // make rng
    let mut rng = StdRng::seed_from_u64(0);

    // make model
    let mut model = Model::default();
    
    // data TODO
    let x: Tensor1D<2> = Tensor1D::randn(&mut rng);
    let y: Tensor1D<8> = Tensor1D::randn(&mut rng);

    // gradient descent
    let mut sgd = Sgd::new(SgdConfig {
        lr: 0.01,
        momentum: Some(Momentum::Nesterov(0.9)),
        weight_decay: None,
    });

    // other idea:
    // generate a voice line
    // split on phonemes
    // train on one phoneme and that way you can still do batching

    // train
    for _ in 0..10 {
        // internal state
        let mut state = Tensor1D::<STATE_SIZE>::zeros().traced();

        for _ in 0..5 {

            // input
            let phoneme_a = Tensor1D::new([0.0]).traced();
            let phoneme_b = Tensor1D::new([0.0]).traced();
            let noise = Tensor1D::new([0.0]).traced();
            
            // forward
            let (new_state, next, sample) = model.forward((phoneme_a, phoneme_b, noise, state));

            // loss
            let loss = mse_loss(sample, Tensor1D::new([0.0]));

            // gradients, breaks here
            // thread 'main' panicked at 'called `Option::unwrap()` on a `None` value', /home/username/.cargo/git/checkouts/dfdx-318e6e5ad83eea79/3fc7be4/src/gradients.rs:273:14 
            let gradients = loss.backward();

            // update
            sgd.update(&mut model, gradients).expect("nn machine broke");
            
            // keep state
            state = new_state.traced();
        }
    }

    //println!("{:?}", model);
}

@Dimev
Contributor Author

Dimev commented Nov 11, 2022

Ok, the above only seems to happen if I use the SplitInto.

@coreylowman
Owner

Closing as this was addressed in the recent update. Re-open if you're still having the issue.
