Authors: Andrei A. Rusu, Neil C. Rabinowitz, Guillaume Desjardins, Hubert Soyer, James Kirkpatrick, Koray Kavukcuoglu, Razvan Pascanu, Raia Hadsell
Year: 2016
Algorithm: Progressive Networks
-
Problems to solve
- The usual transfer learning practice (pretrain, then fine-tune) is unsuitable for transfer across multiple tasks: if we wish to leverage knowledge acquired over a sequence of experiences, which model should we use to initialize subsequent models? → We need transfer without catastrophic forgetting, and fine-tuning additionally presumes foreknowledge of task similarity.
-
Method: Progressive Networks
-
The goals:
- Solve K independent tasks at the end of training
- Accelerate learning via transfer when possible
- Avoid catastrophic forgetting
-
A progressive network retains a pool of pretrained models throughout training and learns lateral connections from them to extract useful features for the new task.
-
The progressive network starts with a single column, whose parameters are trained to convergence. When switching to a new task, these parameters are frozen and a second column of parameters is randomly initialized; each layer of the second column receives input from both its own previous layer and the previous layer of the first column. See the figure below:
-
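A minimal PyTorch sketch of this two-column case (my own illustration, not the paper's code; class names and layer sizes are assumptions, and the paper's convolutional columns and adapter layers are simplified to a plain MLP):

```python
import torch
import torch.nn as nn

class Column(nn.Module):
    """One column: a plain two-layer MLP (stand-in for the paper's convnets)."""
    def __init__(self, in_dim, hidden_dim, out_dim):
        super().__init__()
        self.w1 = nn.Linear(in_dim, hidden_dim)
        self.w2 = nn.Linear(hidden_dim, out_dim)

    def hidden(self, x):
        # First-layer activation, exposed so later columns can tap it laterally
        return torch.relu(self.w1(x))

    def forward(self, x):
        return self.w2(self.hidden(x))


class ProgressiveColumn2(nn.Module):
    """Second column with a lateral connection from a frozen first column."""
    def __init__(self, in_dim, hidden_dim, out_dim, column1):
        super().__init__()
        self.column1 = column1
        for p in self.column1.parameters():
            p.requires_grad = False  # freeze column 1: no catastrophic forgetting

        self.w1 = nn.Linear(in_dim, hidden_dim)               # column 2's own weights
        self.w2 = nn.Linear(hidden_dim, out_dim)
        self.u2 = nn.Linear(hidden_dim, out_dim, bias=False)  # lateral connection U

    def forward(self, x):
        with torch.no_grad():
            h_prev = self.column1.hidden(x)  # feature from the frozen column
        h = torch.relu(self.w1(x))
        # The output layer sums input from its own column and the lateral input
        return self.w2(h) + self.u2(h_prev)

# Usage: train col1 on task 1 first, then train only col2's parameters on task 2.
col1 = Column(in_dim=8, hidden_dim=16, out_dim=4)
col2 = ProgressiveColumn2(in_dim=8, hidden_dim=16, out_dim=4, column1=col1)
y = col2(torch.randn(32, 8))  # batch of 32 task-2 inputs
```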
This generalizes to K tasks as follows:
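In the paper's notation (its Eq. 1), layer $i$ of column $k$ computes

$$h_i^{(k)} = f\left( W_i^{(k)} h_{i-1}^{(k)} + \sum_{j<k} U_i^{(k:j)} h_{i-1}^{(j)} \right),$$

where $W_i^{(k)}$ are column $k$'s own weights, $U_i^{(k:j)}$ are the lateral connections from layer $i-1$ of column $j$, and $f$ is an element-wise nonlinearity. Only column $k$'s parameters are trained on task $k$; all earlier columns stay frozen.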
-
Downside of this approach: the number of parameters grows with the number of tasks, since each new column carries its own weights plus lateral connections to every previous column.
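A back-of-envelope sketch of that growth (my own arithmetic, assuming K columns of L layers with uniform width n, ignoring biases and the paper's dimensionality-reducing adapters): each new column adds its own weight matrices plus lateral matrices to every earlier column, so the total grows quadratically in the number of columns.

```python
def progressive_param_count(K, L, n):
    """Rough parameter count: K columns, L layers each, uniform width n.

    Every weight matrix (own or lateral) is treated as n x n; biases and
    adapters are ignored. Illustrative scaling only, not the paper's count.
    """
    own = K * L * n * n  # each column's own L weight matrices
    # Column k receives laterals from all k-1 earlier columns, at every
    # layer above the input: (L - 1) matrices per pair of columns.
    lateral = sum((k - 1) * (L - 1) * n * n for k in range(1, K + 1))
    return own + lateral

for K in (1, 2, 4, 8):
    print(K, progressive_param_count(K, L=4, n=256))
```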
-