Multiple inputs to a network #253
Multiple inputs are a bit tricky atm, but multiple outputs are easily done with SplitInto. Maybe AddInto can do something like that? It would add all the values of its inputs into one tensor.
In code, it would look like this:

type Model = (
    AddInto<(
        Linear<2, 8>,
        Linear<4, 8>,
    )>,
    ReLU,
    Linear<8, 1>,
);
// ...
model.forward((in_a.trace(), in_b.trace())); |
There is GeneralizedResidual, which takes two submodules and adds their results together. Though this only works for two submodules at the moment. I like the idea of having something that accepts a tuple! |
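For reference, the shape of that pattern is roughly the following. This is only a sketch, not the actual dfdx source: the field access is written tuple-style and the tape bookkeeping is left out (the split_tape()/put_tape() snippets later in this thread cover that part).

// Rough sketch of the GeneralizedResidual idea: run both submodules on the
// same input and add their outputs. Tape handling is intentionally omitted.
fn forward(&self, x: T) -> Self::Output {
    let a = self.0.forward(x.duplicate());
    let b = self.1.forward(x);
    add(b, &a)
}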
If I can figure out how to split the gradient tape, I can have a go at a PR for this |
Awesome! I would say start out by implementing a 2-tuple and 3-tuple by hand, and then see if you can turn it into a macro for n-tuples (unfortunately there are no variadic tuples). You'll want to only accept 1 tensor with a tape in the input, and then do some split_tape()/put_tape() to make sure every add is captured on the tape. |
Looking at GeneralizedResidual, I should be able to do something similar, so in code it would expand to this:

fn forward(&self, x: T) -> Self::Output {
    let (x, tape) = x.split_tape();
    // base case
    let (base, tape) = self.0.forward(x.duplicate().put_tape(tape)).split_tape();
    // recursive case
    let rec = self.1.forward(x.duplicate().put_tape(tape));
    let (base, tape) = add(rec, &base).split_tape();
    // repeat the above until no more tuple is left
    // at the end
    base.put_tape(tape)
}

Haven't put it in the compiler yet, but seems like the macro should generate this |
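For a 3-tuple, the same expansion would presumably just repeat the middle step once more before reattaching the tape, along these lines (a sketch extrapolated from the snippet above, not actual generated output):

// Sketch: the 2-tuple body above, extended once more for a third submodule.
fn forward(&self, x: T) -> Self::Output {
    let (x, tape) = x.split_tape();
    // first submodule
    let (base, tape) = self.0.forward(x.duplicate().put_tape(tape)).split_tape();
    // second submodule, added onto the first
    let rec = self.1.forward(x.duplicate().put_tape(tape));
    let (base, tape) = add(rec, &base).split_tape();
    // third submodule, added onto the running sum
    let rec = self.2.forward(x.duplicate().put_tape(tape));
    let (base, tape) = add(rec, &base).split_tape();
    // reattach the tape to the final sum
    base.put_tape(tape)
}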
Ok, thinking again:

fn forward(&self, x: T) -> Self::Output {
    let (x, tape) = x.split_tape();
    // head and tail
    let (head, tail..) = self.0;
    // put the tail into an AddInto again
    let tail = AddInto<Tail>(tail);
    // base case
    let (base, tape) = head.forward(x.duplicate().put_tape(tape)).split_tape();
    // recursive case
    let rec = tail.forward(x.duplicate().put_tape(tape));
    // and add together
    add(rec, &base)
}

This only needs a special impl for forward on a 1-tuple |
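A minimal sketch of that 1-tuple base case, assuming the same forward signature as above, so the recursion just bottoms out at the single remaining submodule:

// Sketch of the 1-tuple base case: nothing left to add, just forward through
// the single remaining submodule with whatever tape the input carries.
fn forward(&self, x: T) -> Self::Output {
    self.0.forward(x)
}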
Also started a draft PR: AddInto #256 |
Ok, just a few more trait bound complaints from rustc left |
How should I think about tapes? Right now my macro does this:

let (head, tails ...) = x;
let (head, tape) = head.split_tape();
let (first, rest ..) = input_networks;
let (result, others ...) = (first.forward(head.put_tape(tape)), rest.forward(tails));
let head = add(head, others ..);
head

but the add complains a lot about expecting no tape, and I'm not sure how I should properly pass it around |
This just got quite a bit easier now that I've merged #268. Basically now you shouldn't have to worry about split/put, and instead can use with_new_tape(). So for example a 3-tuple I think would be:

let accum = self.0.forward(x.with_new_tape());
let accum = add(accum, self.1.forward(x.with_new_tape()));
add(accum, self.2.forward(x)) |
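So a hand-written 2-tuple would presumably be the same pattern minus one step, e.g. (a sketch in the style of the snippet above):

// Sketch of the 2-tuple version: a fresh tape for the first branch, and the
// original (taped) input consumed by the last branch.
fn forward(&self, x: T) -> Self::Output {
    let accum = self.0.forward(x.with_new_tape());
    add(accum, self.1.forward(x))
}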
Thanks, will update my branch to use that! |
Yep, much easier! I think I've added everything needed, so I'll make the PR proper now |
Looks like doing |
so either implement Tensor for tuples containing only tensors, or drop the requirement? Maybe keep this as a tracking issue for AddInto, for discussion of the trace stuff too.
Also looks like only the last element in SplitInto passes along the tape, so maybe that causes some breakage when passing it into an AddInto? |
Yeah tape only on one output was intentional, willing to revisit though. Do we have an example architecture to use as a reference that might use AddInto and SplitInto together so we can think about how to set them up? |
I guess SplitInto/AddInto can be used as a more powerful GeneralizedResidual, in case you want 3 or more different processing steps before adding them back together. The architecture I wanted to use AddInto for is this:

type Model = (
    AddInto<(
        Linear<5, 16>,  // input
        Linear<16, 16>, // state
    )>,
    ReLU,
    SplitInto<(
        Linear<16, 1>,  // output
        Linear<16, 16>, // internal state
    )>
);

Effectively a recurrent network if the state is fed back into the network. |
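A rough sketch of how that feedback would be driven (a hypothetical loop; tape/trace handling is glossed over here, and the longer repro further down in this thread traces each input explicitly):

// Hypothetical driving loop for the model above: the second SplitInto output
// is fed back in as the state for the next step.
let mut state: Tensor1D<16> = Tensor1D::zeros();
for _ in 0..5 {
    let input: Tensor1D<5> = Tensor1D::randn(&mut rng);
    // AddInto consumes the (input, state) tuple; SplitInto yields (output, new state)
    let (_output, new_state) = model.forward((input, state));
    state = new_state;
}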
Oh, this is because the sequential tuple modules require the input to be a tensor? I think we can just straight up remove those bounds actually (according to local testing). Nothing in the impls requires a Tensor, so I'm pretty sure this is safe! That makes your example work too |
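To illustrate what "nothing in the impls requires a Tensor" means: a sequential 2-tuple impl conceptually only needs each stage to accept the previous stage's output, roughly like this (an illustrative sketch, not the actual dfdx source):

// Illustrative sketch of a sequential 2-tuple Module impl with no tensor
// bound on the input: all that's required is that each module accepts the
// previous module's output.
impl<Input, A: Module<Input>, B: Module<A::Output>> Module<Input> for (A, B) {
    type Output = B::Output;
    fn forward(&self, x: Input) -> Self::Output {
        self.1.forward(self.0.forward(x))
    }
}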
Ah nice |
#297 allows doing full networks, but it looks like this still breaks:

type Model = (
    AddInto<(
        // phoneme a
        Linear<1, HIDDEN_SIZE>,
        // phoneme b
        Linear<1, HIDDEN_SIZE>,
        // noise
        Linear<1, HIDDEN_SIZE>,
        // state
        Linear<STATE_SIZE, HIDDEN_SIZE>
    )>,
    ReLU,
    SplitInto<(
        // state
        Linear<HIDDEN_SIZE, STATE_SIZE>,
        // next
        Linear<HIDDEN_SIZE, 1>,
        // sample
        Linear<HIDDEN_SIZE, 1>
    )>
);

fn main() {
    // make rng
    let mut rng = StdRng::seed_from_u64(0);
    // make model
    let mut model = Model::default();
    // data TODO
    let x: Tensor1D<2> = Tensor1D::randn(&mut rng);
    let y: Tensor1D<8> = Tensor1D::randn(&mut rng);
    // gradient descent
    let mut sgd = Sgd::new(SgdConfig {
        lr: 0.01,
        momentum: Some(Momentum::Nesterov(0.9)),
        weight_decay: None,
    });
    // other idea:
    // generate a voice line
    // split on phonemes
    // train on one phoneme and that way you can still do batching

    // train
    for _ in 0..10 {
        // internal state
        let mut state = Tensor1D::<STATE_SIZE>::zeros().traced();
        for _ in 0..5 {
            // input
            let phoneme_a = Tensor1D::new([0.0]).traced();
            let phoneme_b = Tensor1D::new([0.0]).traced();
            let noise = Tensor1D::new([0.0]).traced();
            // forward
            let (new_state, next, sample) = model.forward((phoneme_a, phoneme_b, noise, state));
            // loss
            let loss = mse_loss(sample, Tensor1D::new([0.0]));
            // gradients, breaks here:
            // thread 'main' panicked at 'called `Option::unwrap()` on a `None` value', /home/username/.cargo/git/checkouts/dfdx-318e6e5ad83eea79/3fc7be4/src/gradients.rs:273:14
            let gradients = loss.backward();
            // update
            sgd.update(&mut model, gradients).expect("nn machine broke");
            // keep state
            state = new_state.traced();
        }
    }
    //println!("{:?}", model);
} |
Ok, ^^ only seems to happen if I use the SplitInto |
Closing as this was addressed in the recent update. Re-open if you're still having the issue. |