Releases: fkodom/yet-another-retnet

0.5.1

15 Nov 14:50
34cc30d

What's Changed

  • Bug fix: decay mask for bf16, bf32 by @fkodom in #25
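
For context, the decay mask is built from powers of a per-head decay gamma, and those powers can lose precision or underflow when the mask is constructed directly in a reduced-precision dtype. Below is a hypothetical single-head sketch of the usual mitigation (build the mask in float32, cast at the end), not necessarily the exact change made in #25:

    import torch

    def decay_mask(seq_len: int, gamma: float, dtype: torch.dtype) -> torch.Tensor:
        # D[n, m] = gamma ** (n - m) for n >= m, else 0 (causal decay mask).
        # Compute in float32 so small powers of gamma keep some resolution,
        # then cast to the working dtype (e.g. torch.bfloat16) at the end.
        idx = torch.arange(seq_len, dtype=torch.float32)
        distance = idx.unsqueeze(1) - idx.unsqueeze(0)  # n - m
        powers = gamma ** distance.clamp(min=0.0)
        mask = torch.where(distance >= 0, powers, torch.zeros_like(powers))
        return mask.to(dtype)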

Full Changelog: 0.5.0...0.5.1

0.5.0

11 Nov 15:09
588bf7b

Significant efficiency improvements to the chunkwise formulation, thanks to @leor-c 🎉

What's Changed

  • A more efficient computation of the state in the chunkwise formulation by @leor-c in #22
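
For background, the chunkwise formulation carries a recurrent cross-chunk state per retention head. Below is a single-head sketch of the standard state recurrence from the RetNet paper, S_i = gamma^B * S_{i-1} + K_i^T (V_i * gamma^(B-1-j)); it illustrates what the state is, not the specific optimization merged in #22:

    import torch
    from torch import Tensor

    def chunkwise_state_update(prev_state: Tensor, keys: Tensor, values: Tensor, gamma: float) -> Tensor:
        # prev_state: (d_k, d_v) recurrent state carried over from the previous chunk.
        # keys: (B, d_k), values: (B, d_v) for the current chunk of length B.
        # gamma: per-head decay in (0, 1).
        chunk_size = keys.shape[0]
        # Earlier positions in the chunk have decayed more by the end of the chunk.
        within_decay = gamma ** torch.arange(
            chunk_size - 1, -1, -1, dtype=keys.dtype, device=keys.device
        )
        return (gamma ** chunk_size) * prev_state + keys.T @ (values * within_decay[:, None])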

Full Changelog: 0.4.2...0.5.0

0.4.2

09 Nov 14:14
7d9c1a7

What's Changed

  • Slightly more efficient / cleaner implementation of the chunkwise relative pos. enc. by @leor-c in #21

New Contributors

  • @leor-c made their first contribution in #21

Full Changelog: 0.4.1...0.4.2

0.4.1

09 Nov 13:46
3cf9797

What's Changed

Full Changelog: 0.4.0...0.4.1

0.4.0

28 Sep 15:40
c0c4327

What's Changed

  • Fixed an issue where the dimensions of the group norm were incorrect by @draguve in #11

New Contributors

  • @draguve made their first contribution in #11

Full Changelog: 0.3.1...0.4.0

0.3.1

15 Aug 13:35
505dff7

What's Changed

New Contributors

Full Changelog: 0.3.0...0.3.1

0.3.0

11 Aug 17:16
ee3979c

More streamlined support for training

  • Example training script
  • RetNet.forward is no longer just a wrapper around RetNet.forward_parallel. It now accepts inputs and labels Tensors and returns a loss value (a training-step sketch follows this list):
    from einops import rearrange
    from torch import Tensor, nn

    class RetNet(nn.Module):
        ...
        def forward(self, inputs: Tensor, labels: Tensor) -> Tensor:
            # Run the parallel (training-time) formulation, then compute
            # token-level cross-entropy against the flattened labels.
            pred = self.forward_parallel(inputs)
            criterion = nn.CrossEntropyLoss()
            return criterion(rearrange(pred, "b n c -> (b n) c"), labels.flatten())
  • Include an example TorchData datapipe -- the top 100 Project Gutenberg books
  • Example of streaming text generation with a trained RetNet
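
A minimal training-step sketch using the new forward signature. The import path follows the package layout, but the constructor arguments (num_tokens, d_model, nhead, num_layers) and hyperparameter values are illustrative assumptions — check the actual RetNet signature:

    import torch
    from yet_another_retnet.retnet import RetNet

    # Hypothetical hyperparameters -- check the real constructor signature.
    retnet = RetNet(num_tokens=10_000, d_model=256, nhead=4, num_layers=2)
    optimizer = torch.optim.AdamW(retnet.parameters(), lr=3e-4)

    inputs = torch.randint(0, 10_000, (8, 128))  # (batch, seq_len) token ids
    labels = torch.randint(0, 10_000, (8, 128))  # typically inputs shifted by one

    loss = retnet(inputs, labels)  # forward now returns the cross-entropy loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()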

0.2.0

10 Aug 16:57
3da9e6c

What's Changed

Full Changelog: 0.1.3...0.2.0

0.1.3

10 Aug 14:22

Set default layer_norm_eps=1e-6, as updated in the official implementation:
microsoft/torchscale@2c29de0
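
For reference, layer_norm_eps is the epsilon added to the variance inside LayerNorm for numerical stability. A standalone PyTorch illustration of the value (not the library's internal code):

    import torch.nn as nn

    # eps is added to the variance before the square root; 1e-6 matches the
    # updated default in microsoft/torchscale.
    norm = nn.LayerNorm(512, eps=1e-6)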

0.1.2

08 Aug 02:45

Remove extra complex conjugation from the relative position embedding.
Reference: microsoft/torchscale#49
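
For context, the relative position embedding rotates queries by e^(+i*n*theta) and keys by e^(-i*m*theta) so that their product depends only on n - m; the conjugation should therefore appear exactly once. A hypothetical single-head sketch of that idea (illustrative, not the library's actual code):

    import torch
    from torch import Tensor

    def rotate_queries_and_keys(q: Tensor, k: Tensor, theta: Tensor) -> tuple[Tensor, Tensor]:
        # q, k: (seq_len, dim // 2) complex-valued projections.
        # theta: (dim // 2,) rotation frequencies.
        positions = torch.arange(q.shape[0], device=q.device)
        angles = positions[:, None] * theta[None, :]
        rotation = torch.polar(torch.ones_like(angles), angles)  # e^(i * n * theta)
        # Conjugate the keys once here; the downstream q @ k.T product uses a plain
        # transpose, so the scores pick up the relative factor e^(i * (n - m) * theta).
        return q * rotation, k * rotation.conj()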