
Multiple inputs to a network #253

Closed
Dimev opened this issue Oct 17, 2022 · 22 comments

Comments

@Dimev
Contributor

Dimev commented Oct 17, 2022

Multiple inputs are a bit tricky at the moment, but multiple outputs are easily done with SplitInto.

Maybe an AddInto could do something like that? It would add all the values of its inputs into one tensor.

@Dimev
Contributor Author

Dimev commented Oct 17, 2022

In code, it would look like this:

type Model = (
    AddInto<(
        Linear<2, 8>,
        Linear<4, 8>
    )>,
    ReLU,
    Linear<8, 1>
);

// ...

model.forward((in_a.trace(), in_b.trace()));

@coreylowman
Owner

There is GeneralizedResidual, which takes two submodules and adds their results together. Though this only works for two submodules at the moment.
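
For reference, a rough usage sketch (the layer sizes here are arbitrary, and this assumes GeneralizedResidual<F, R> runs both submodules on the same input and adds their outputs):

// hedged sketch: both branches must produce the same output shape,
// since their results get added together
type Block = (
    GeneralizedResidual<Linear<8, 8>, Linear<8, 8>>,
    ReLU,
);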

I like the idea of having something that accepts the tuple!

@Dimev
Contributor Author

Dimev commented Oct 17, 2022

If I can figure out how to split the gradient tape I can have a go at a PR for this

@coreylowman
Owner

Awesome! I would say to start out just implementing a 2-tuple and 3-tuple by hand and then see if you can turn it into a macro for n-tuples (unfortunately no variadic tuples).

You'll want to only accept 1 tensor with a tape in the input, and then do some split_tape(), put_tape() to make sure every add is captured on the tape.
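
Roughly, as a hedged sketch of that call site (shapes borrowed from the Model example at the top; zeros() is just placeholder data):

// only in_a carries the gradient tape; in_b stays a plain, untaped tensor
let in_a: Tensor1D<2> = Tensor1D::zeros();
let in_b: Tensor1D<4> = Tensor1D::zeros();
let out = model.forward((in_a.trace(), in_b));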

@Dimev
Contributor Author

Dimev commented Oct 17, 2022

Looking at GeneralizedResidual, I should

  • split the input tape
  • duplicate x and give it the split tape
  • feed that into one layer
  • split its tape and put the new split tape there

so in code it would expand to this

fn forward(&self, x: T) -> Self::Output {

    let (x, tape) = x.split_tape();

    // base case
    let (base, tape) = self.0.forward(x.duplicate().put_tape(tape)).split_tape();

    // recursive case
    let rec = self.1.forward(x.duplicate().put_tape(tape));

    let (base, tape) = add(rec, &base).split_tape();

    // repeat the above until no more tuple elements are left
    // at the end
    base.put_tape(tape)
}

I haven't put it through the compiler yet, but it seems like this is what the macro should generate.

@Dimev
Contributor Author

Dimev commented Oct 18, 2022

Ok, thinking about it again:

fn forward(&self, x: T) -> Self::Output {

    let (x, tape) = x.split_tape();

    // head and tail (pseudocode: the macro would split the tuple here)
    let (head, tail..) = self.0;

    // wrap the tail in an AddInto again
    let tail = AddInto<Tail>(tail);

    // base case
    let (base, tape) = head.forward(x.duplicate().put_tape(tape)).split_tape();

    // recursive case
    let rec = tail.forward(x.duplicate().put_tape(tape));
    
    // and add together
    add(rec, &base)
}

This only needs a special impl of forward for the 1-tuple.
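
A hedged sketch of that base case, assuming AddInto is a tuple struct wrapping the module tuple and eliding the real trait bounds:

impl<Input, A: Module<Input>> Module<Input> for AddInto<(A,)> {
    type Output = A::Output;

    fn forward(&self, x: Input) -> Self::Output {
        // a single submodule has nothing to add with, so just delegate
        self.0 .0.forward(x)
    }
}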

@Dimev
Contributor Author

Dimev commented Oct 18, 2022

Also started a draft PR: AddInto #256

@Dimev
Contributor Author

Dimev commented Oct 19, 2022

Ok, just a few more trait bound complaints from rustc left
(then gotta add some tests and docs)

@Dimev
Contributor Author

Dimev commented Oct 22, 2022

How should I think about tapes?
I'm still having trouble understanding when I should split and put them

Right now my macro generates roughly this (pseudocode):

let (head, tails ...) = x;
let (head, tape) = head.split_tape();

let (first, rest ..) = input_networks;

let (result, others ...) = (first.forward(head.put_tape), rest.forward(tails))

let head = add(head, others ..)

head

but the add complains a lot about expecting no tape, and I'm not sure how I should properly pass it around

@coreylowman
Owner

coreylowman commented Oct 22, 2022

This just got quite a bit easier now that I've merged #268. Basically you shouldn't have to worry about split/put anymore, and can instead use x.with_new_tape(). That PR also makes it so add will merge the tapes from the args together.

So for example, a 3-tuple I think would be:

let accum = self.0.forward(x.with_new_tape());
let accum = add(accum, self.1.forward(x.with_new_tape()));
add(accum, self.2.forward(x))

@Dimev
Contributor Author

Dimev commented Oct 22, 2022

Thanks, will update my branch to use that!

@Dimev
Contributor Author

Dimev commented Oct 24, 2022

Yep, much easier!

I think I've added everything needed, so I'll make the PR proper now

@Dimev
Contributor Author

Dimev commented Oct 31, 2022

Looks like doing type Model = (AddInto<...>, ...) breaks, because tuples don't implement Tensor.

@Dimev
Contributor Author

Dimev commented Oct 31, 2022

So either implement Tensor for tuples consisting only of tensors, or drop the requirement?

Maybe keep this as a tracking issue for AddInto, and for the discussion of the trace stuff too.

@Dimev
Contributor Author

Dimev commented Oct 31, 2022

Also, it looks like only the last element in SplitInto passes along the tape, so maybe that causes some breakage when passing it into an AddInto?
(and makes training a bit harder because you only get the tape on one of the outputs, although I guess that's intentional?)
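
To illustrate the second point, a hedged sketch of what that means for training (the 2-output split, the head names, and which output keeps the tape are all illustrative here):

let (out_a, out_b) = split.forward(x.trace());

// only a loss built from the tape-carrying output (assumed to be out_b)
// can drive backward()
let loss = mse_loss(out_b, Tensor1D::new([0.0]));
let gradients = loss.backward();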

@coreylowman
Owner

Yeah, tape only on one output was intentional; willing to revisit though. Do we have an example architecture to use as a reference that might use AddInto and SplitInto together, so we can think about how to set them up?

@Dimev
Contributor Author

Dimev commented Oct 31, 2022

I guess SplitInto/AddInto can be used as a more powerful GeneralizedResidual, in case you want three or more different processing steps before adding them back together.

My own architecture I wanted to use AddInto for was this:

type Model = (
    AddInto<(
        Linear<5, 16>, // input
        Linear<16, 16>, // state
    )>,
    ReLU,
    SplitInto<(
        Linear<16, 1>, // output
        Linear<16, 16>, // internal state
    )>
);

Effectively a recurrent network, if the state output is fed back into the network.
This doesn't work because tuples don't implement Tensor, and because the output doesn't have a tape.

@coreylowman
Owner

Oh, this is because the sequential tuple modules require the input to be a tensor? I think we can just straight up remove those bounds (according to local testing). Nothing in the impls requires a Tensor, so I'm pretty sure this is safe! That makes your example work too.
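
For context, a hedged sketch of the relaxed sequential 2-tuple impl with that bound dropped (the exact trait shapes here are assumptions, not the library's code):

impl<Input, A: Module<Input>, B: Module<A::Output>> Module<Input> for (A, B) {
    type Output = B::Output;

    fn forward(&self, x: Input) -> Self::Output {
        // no `Input: Tensor` bound needed: just feed each module's
        // output into the next one
        self.1.forward(self.0.forward(x))
    }
}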

@Dimev
Contributor Author

Dimev commented Oct 31, 2022

Ah nice

@Dimev
Contributor Author

Dimev commented Nov 7, 2022

#297 allows doing full networks, but it looks like this still breaks:
the code below panics at the loss.backward() call (see the comment marking it) due to unwrapping a None.

type Model = (
    AddInto<(
        // phoneme a
        Linear<1, HIDDEN_SIZE>,
        
        // phoneme b         
        Linear<1, HIDDEN_SIZE>, 
        
        // noise
        Linear<1, HIDDEN_SIZE>, 
        
        // state
        Linear<STATE_SIZE, HIDDEN_SIZE>
    )>, 
    ReLU,
    SplitInto<(
        // state
        Linear<HIDDEN_SIZE, STATE_SIZE>,
        
        // next
        Linear<HIDDEN_SIZE, 1>,        

        // sample
        Linear<HIDDEN_SIZE, 1>
    )>
);

fn main() {
    // make rng
    let mut rng = StdRng::seed_from_u64(0);

    // make model
    let mut model = Model::default();
    
    // data TODO
    let x: Tensor1D<2> = Tensor1D::randn(&mut rng);
    let y: Tensor1D<8> = Tensor1D::randn(&mut rng);

    // gradient descent
    let mut sgd = Sgd::new(SgdConfig {
        lr: 0.01,
        momentum: Some(Momentum::Nesterov(0.9)),
        weight_decay: None,
    });

    // other idea:
    // generate a voice line
    // split on phonemes
    // train on one phoneme and that way you can still do batching

    // train
    for _ in 0..10 {
        // internal state
        let mut state = Tensor1D::<STATE_SIZE>::zeros().traced();

        for _ in 0..5 {

            // input
            let phoneme_a = Tensor1D::new([0.0]).traced();
            let phoneme_b = Tensor1D::new([0.0]).traced();
            let noise = Tensor1D::new([0.0]).traced();
            
            // forward
            let (new_state, next, sample) = model.forward((phoneme_a, phoneme_b, noise, state));

            // loss
            let loss = mse_loss(sample, Tensor1D::new([0.0]));

            // gradients, breaks here
            // thread 'main' panicked at 'called `Option::unwrap()` on a `None` value', /home/username/.cargo/git/checkouts/dfdx-318e6e5ad83eea79/3fc7be4/src/gradients.rs:273:14 
            let gradients = loss.backward();

            // update
            sgd.update(&mut model, gradients).expect("nn machine broke");
            
            // keep state
            state = new_state.traced();
        }
    }

    //println!("{:?}", model);
}

@Dimev
Contributor Author

Dimev commented Nov 11, 2022

Ok, the above only seems to happen if I use the SplitInto.

@coreylowman
Owner

Closing as this was addressed in the recent update. Re-open if you're still having the issue.
