diff --git a/users/zeyer/experiments/exp2023_04_25_rf/README.md b/users/zeyer/experiments/exp2023_04_25_rf/README.md
index 4b9cd712d..c3ef54838 100644
--- a/users/zeyer/experiments/exp2023_04_25_rf/README.md
+++ b/users/zeyer/experiments/exp2023_04_25_rf/README.md
@@ -83,3 +83,4 @@ TODO model changes:
 - Second decoder LSTM
 - ZoneoutLSTM use_zoneout_output=True
 - (cnnblstmf2)
+- QK Norm (as in QK Norm paper with L2 norm, or as in Scaling ViT paper with LayerNorm)
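The QK Norm TODO item names two variants of normalizing queries and keys before the attention dot product. Below is a minimal pure-Python sketch for a single query/key pair. It assumes unbatched vectors and a single head. All function names are hypothetical and do not come from the RETURNN frontend (`rf`) code used in this experiment directory. In the L2 variant, `q` and `k` are L2-normalized so their dot product is a cosine similarity, and a learnable scalar `g` takes the place of the usual `1/sqrt(d)` factor. Here `g` is simply passed in. In the LayerNorm variant, LayerNorm is applied to `q` and `k`, and the `1/sqrt(d)` scaling is kept. The learnable LayerNorm gain and bias are omitted for brevity.

```python
import math
from typing import List

Vector = List[float]


def l2_normalize(x: Vector, eps: float = 1e-6) -> Vector:
    """Scale x to unit L2 norm (eps guards against division by zero)."""
    norm = math.sqrt(sum(v * v for v in x))
    return [v / max(norm, eps) for v in x]


def layer_norm(x: Vector, eps: float = 1e-6) -> Vector:
    """LayerNorm over the feature axis; learnable gain/bias omitted (assumption)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def logit_standard(q: Vector, k: Vector) -> float:
    """Plain scaled dot-product attention logit (no QK norm)."""
    return dot(q, k) / math.sqrt(len(q))


def logit_qk_l2(q: Vector, k: Vector, g: float) -> float:
    """L2 QK Norm: cosine similarity of q and k, scaled by a learnable g.
    The result is bounded: |logit| <= g."""
    return g * dot(l2_normalize(q), l2_normalize(k))


def logit_qk_layernorm(q: Vector, k: Vector) -> float:
    """LayerNorm QK Norm: normalize q and k, keep the 1/sqrt(d) scaling."""
    return dot(layer_norm(q), layer_norm(k)) / math.sqrt(len(q))


if __name__ == "__main__":
    q, k = [3.0, 4.0], [300.0, 400.0]  # k has a very large norm
    print("standard: ", logit_standard(q, k))
    print("qk_l2:    ", logit_qk_l2(q, k, g=2.0))
    print("qk_ln:    ", logit_qk_layernorm(q, k))
```

The point of both variants is visible with a large-norm key. The standard logit grows with the norms of `q` and `k`, while the L2 variant stays bounded by `g`. The LayerNorm variant is insensitive to rescaling `k`, up to the `eps` term.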