Re-initializing moments of momentum-based optimizer after gradient update using flax and optax APIs #3111

dantp-ai · 2023-05-23T18:12:25Z


Hello friends,

I want to reinitialize the moments mu and nu of an Adam optimizer from time to time. The train state is stored using the flax.training.train_state.TrainState API. The relevant parts of the code look like this:

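import jax.numpy as jnp
from flax import traverse_util
from jax import tree_util

# Inspired by the tutorial
# [Surgery with Optimizers](https://flax.readthedocs.io/en/latest/guides/model_surgery.html#surgery-with-optimizers),
# which explains how to update the optimizer state.
def init_opt_params(params):
    """Return a pytree shaped like `params` with every leaf set to zero."""
    # Flatten the nested dict into {"a/b/c": leaf}.
    flat_params = traverse_util.flatten_dict(params, sep="/")

    # Replace every leaf with zeros of the same shape and dtype.
    flat_params = tree_util.tree_map(jnp.zeros_like, flat_params)

    # Restore the nested structure (this returns a plain dict).
    return traverse_util.unflatten_dict(flat_params, sep="/")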


def create_train_state(self, module, input_shape, rng, learning_rate):
    params = module.init(rng, jnp.ones(input_shape), jnp.float32)["params"]
    tx = optax.adam(learning_rate=learning_rate)
    opt_state = tx.init(params)
    return TrainState(
        apply_fn=module.apply,
        params=params,
        tx=tx,
        opt_state=opt_state,
        step=jnp.array(0),
        metrics=Metrics.empty(),
    )

...

# Then training goes like this:

for batch in dataloader:

    # performs an Adam gradient-descent update on the model parameters
    state = self.train_step(state, batch)

    if reinit_weight:
        # initialize moment estimates
        params_mu = init_opt_params(state.opt_state[0].mu)
        params_nu = init_opt_params(state.opt_state[0].nu)

        new_opt_state = (state.opt_state[0]._replace(
            mu=params_mu,
            nu=params_nu,
            )
        ) + state.opt_state[1:]

        state = state.replace(opt_state=new_opt_state)

I get the following error:

ValueError: The number of updates and states has to be the same in chain! Make sure you have called init first!
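The count mismatch behind this message can be reproduced without JAX. The sketch below is only an illustration: AdamLikeState and EmptyLikeState are hypothetical stand-ins for the NamedTuple states that optax keeps inside a chain, not the real optax classes. It shows that adding a tuple to a NamedTuple concatenates the NamedTuple's fields, so the result has more entries than the chain has transformations.

```python
from typing import NamedTuple


class AdamLikeState(NamedTuple):
    """Hypothetical stand-in for the Adam state (count, mu, nu)."""
    count: int
    mu: dict
    nu: dict


class EmptyLikeState(NamedTuple):
    """Hypothetical stand-in for a stateless transformation's state."""


# A chained optimizer state is a tuple with one entry per transformation.
opt_state = (AdamLikeState(0, {"w": 1.0}, {"w": 2.0}), EmptyLikeState())

new_first = opt_state[0]._replace(mu={"w": 0.0}, nu={"w": 0.0})

# Parentheses alone do not create a tuple, so the NamedTuple's three
# fields are concatenated with the remaining states.
wrong = (new_first) + opt_state[1:]

# The trailing comma makes a one-element tuple, so the length is preserved.
right = (new_first,) + opt_state[1:]

print(len(wrong))  # 4
print(len(right))  # 2
```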

Basically, I want to update the params of the module using the optimizer and then, based on that update, reset some entries of the optimizer state's two moments (mu and nu) to zero.

SOLVED: See below

EDIT:

- new_opt_state is not a tuple of states -> add a trailing comma to the first operand of the addition, i.e. (state.opt_state[0]._replace(...),)
- the newly initialized moments need to be read-only -> flax.core.freeze(params_mu) (and likewise for params_nu)
