A lightweight implementation of Information-Ordered Bottlenecks (IOBs) in PyTorch. For theory details, see the paper.
The IOB implementation is extremely simple and designed to natively wrap existing `torch.nn` layers. The core of our implementation is the `IOBLayer` object, which mimics a bottleneck of variable width by selectively masking portions of its input. The `IOBLayer` can be used with PyTorch's functional API. Consider the following two-layer linear autoencoder:
```python
from torch import nn
from iobs.layers import IOBLayer

class AutoEncoder(nn.Module):
    def __init__(self, data_dim):
        super().__init__()
        self.encoder = nn.Linear(data_dim, 8)
        self.decoder = nn.Linear(8, data_dim)
        self.bottleneck = IOBLayer(8)

    def forward(self, batch_features, n_open):
        code = self.encoder(batch_features)
        code = self.bottleneck.forward_neck(code, n_open)
        reconstructed = self.decoder(code)
        return reconstructed
```
Here, the `forward_neck` function only allows information to pass through the first `n_open` nodes of the preceding hidden layer by masking the remaining inputs. However, it outputs a fixed-dimensional tensor, preserving functional consistency with the downstream layers. For example, running the above `forward` function with `n_open=5` is equivalent to an autoencoder with a bottleneck width of 5 nodes.
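The masking semantics can be sketched in plain NumPy. This is an illustrative stand-in for `forward_neck` (not the library's actual implementation): zero out all but the first `n_open` features while keeping the output dimension fixed.

```python
import numpy as np

def forward_neck_sketch(x, n_open):
    """Illustrative stand-in for IOBLayer.forward_neck: zero out all
    but the first `n_open` features; the output shape stays fixed."""
    mask = np.zeros(x.shape[-1])
    mask[:n_open] = 1.0
    return x * mask

x = np.arange(8.0).reshape(1, 8)   # one sample, max width 8
out = forward_neck_sketch(x, 5)    # features 5..7 are masked to zero
```

Because the output dimension never changes, the decoder sees the same tensor shape regardless of `n_open`.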
The `IOBLayer` has four differentiable methods for passing information forward:

- `IOBLayer.forward(input)` passes information through all of the nodes and is equivalent to multiplication by an identity matrix.
- `IOBLayer.forward_neck(input, n_open)` only passes information through the first `n_open` nodes.
- `IOBLayer.forward_mask(input, mask)` applies a custom mask to the latents. For a batch size of 1, this is functionally equivalent to `input * mask`.
- `IOBLayer.forward_all(input)` passes information through all possible `n_open` bottlenecks and aggregates the configurations into a new batch dimension. For example, if `input.shape == (64, 8)`, where 8 is the max width of the IOBLayer, then the output of `IOBLayer.forward_all(input)` has shape `(64, 9, 8)`, where each element of the second axis corresponds to a different `n_open` (including `n_open=0`). This can later be flattened into a larger batch (e.g. `(64*9, 8)`) and passed as a regular tensor to the downstream architecture.
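The `forward_all` behavior can likewise be sketched in NumPy (again an illustrative re-implementation, not the library's code): stack every possible bottleneck width `n_open = 0..d` along a new axis, where row `k` of a lower-triangular mask keeps the first `k` features.

```python
import numpy as np

def forward_all_sketch(x):
    """Illustrative stand-in for IOBLayer.forward_all: evaluate every
    bottleneck width n_open = 0..d and stack them on a new batch axis."""
    d = x.shape[-1]
    # Row k keeps the first k features (row 0 keeps none, row d keeps all).
    masks = np.tril(np.ones((d + 1, d)), k=-1)
    return x[:, None, :] * masks[None, :, :]

x = np.random.rand(64, 8)
out = forward_all_sketch(x)        # shape (64, 9, 8)
flat = out.reshape(-1, 8)          # (64*9, 8), ready for downstream layers
```

Flattening the first two axes turns the stacked configurations into an ordinary batch, so the downstream decoder needs no changes.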
pytorch-iobs is designed to be a lightweight wrapper on top of PyTorch. The only dependencies are recent versions of `numpy`, `torch`, and `tqdm`. To install, simply clone the repository and use `pip install`:
```bash
git clone git@github.com:maho3/pytorch-iobs.git
cd pytorch-iobs
pip install -e .
```
See the practical examples in `notebooks/` for demonstrations of using IOBLayers during training and inference.