AdEMAMix Optimizer Implementation

Introduction

This repository provides an implementation of the AdEMAMix optimizer, which is designed to improve upon the traditional Adam optimizer by utilizing a mixture of two Exponential Moving Averages (EMAs). AdEMAMix was proposed in the paper The AdEMAMix Optimizer: Better, Faster, Older, and it leverages both recent and older gradients to enhance model convergence, particularly in complex architectures like Convolutional Neural Networks (CNNs) and Transformer models.

About AdEMAMix

AdEMAMix modifies the standard Adam optimizer by incorporating two EMA momentum terms:

  • Fast EMA (m1): Tracks recent gradients, similar to Adam's momentum term.
  • Slow EMA (m2): Tracks older gradients, allowing the optimizer to benefit from long-term gradient accumulation.

This combination allows AdEMAMix to be responsive to local changes in the loss landscape while leveraging historical gradients to improve generalization and stability.
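The following is a minimal sketch of this update rule as a PyTorch optimizer, shown only to illustrate how the fast and slow EMAs are combined. The class name AdEMAMixSketch and the default hyperparameter values (alpha, the third beta) are assumptions for illustration, not this repository's implementation; refer to the paper and the notebook for the reference version.

```python
# Illustrative sketch of the AdEMAMix update rule (not this repo's reference code).
import torch
from torch.optim import Optimizer


class AdEMAMixSketch(Optimizer):
    def __init__(self, params, lr=1e-3, betas=(0.9, 0.999, 0.9999),
                 alpha=5.0, eps=1e-8, weight_decay=0.0):
        # betas = (beta1 for the fast EMA, beta2 for the squared-gradient EMA,
        #          beta3 for the slow EMA); alpha weights the slow EMA in the update.
        defaults = dict(lr=lr, betas=betas, alpha=alpha, eps=eps,
                        weight_decay=weight_decay)
        super().__init__(params, defaults)

    @torch.no_grad()
    def step(self, closure=None):
        loss = closure() if closure is not None else None
        for group in self.param_groups:
            beta1, beta2, beta3 = group["betas"]
            for p in group["params"]:
                if p.grad is None:
                    continue
                grad = p.grad
                state = self.state[p]
                if len(state) == 0:
                    state["step"] = 0
                    state["m1"] = torch.zeros_like(p)  # fast EMA of gradients
                    state["m2"] = torch.zeros_like(p)  # slow EMA of gradients
                    state["nu"] = torch.zeros_like(p)  # EMA of squared gradients
                state["step"] += 1
                t = state["step"]

                # Fast EMA (Adam-style first moment) and second moment
                state["m1"].mul_(beta1).add_(grad, alpha=1 - beta1)
                state["nu"].mul_(beta2).addcmul_(grad, grad, value=1 - beta2)
                # Slow EMA that accumulates older gradients
                state["m2"].mul_(beta3).add_(grad, alpha=1 - beta3)

                # Bias-correct the Adam-style terms; the slow EMA is left uncorrected
                m1_hat = state["m1"] / (1 - beta1 ** t)
                nu_hat = state["nu"] / (1 - beta2 ** t)

                # Mix fast and slow momentum, then take an Adam-style step
                update = (m1_hat + group["alpha"] * state["m2"]) / (nu_hat.sqrt() + group["eps"])
                if group["weight_decay"] != 0:
                    p.mul_(1 - group["lr"] * group["weight_decay"])
                p.add_(update, alpha=-group["lr"])
        return loss
```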

An example implementation can be found in the Jupyter notebook test_ademamix.ipynb.
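
For a quick picture of how the optimizer slots into an ordinary training loop, here is a hypothetical usage snippet. It assumes the AdEMAMixSketch class from the sketch above; the optimizer is used as a drop-in replacement for Adam.

```python
# Hypothetical usage of the AdEMAMixSketch class defined above.
import torch
import torch.nn as nn

model = nn.Linear(10, 1)
optimizer = AdEMAMixSketch(model.parameters(), lr=1e-3)
criterion = nn.MSELoss()

x, y = torch.randn(32, 10), torch.randn(32, 1)
for _ in range(100):
    optimizer.zero_grad()
    loss = criterion(model(x), y)
    loss.backward()
    optimizer.step()
```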
