Legacy checkpoint read, fix checkpoint re-training issue #434

maljoras · 2022-10-06T14:29:45Z

Related issues

Legacy checkpoints did not load correctly with the new mapping scales when an InferenceRPUConfig was used
Re-training after loading a checkpoint did not update the learned out_scaling factors

Description

Checkpoint loading modified the parameters during tile reconstruction. This left the module._parameters dictionary out of sync with the module's attributes, so model.parameters() picked up the wrong references.

As a result, the model did not train after checkpoint loading when out_scaling_alpha was set to be trained.

Now the parameters are re-registered correctly for each load.
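
The mismatch described above can be reproduced in plain PyTorch. The following is a minimal sketch and not the aihwkit code: `ScaledLinear` is a hypothetical stand-in for an analog layer, and writing directly into `__dict__` is only an assumed way of simulating a load that replaces a parameter without registering it. `register_parameter` is the standard `torch.nn.Module` method; whether the PR uses exactly this call is not stated here.

```python
import torch
from torch import nn


class ScaledLinear(nn.Module):
    """Hypothetical stand-in for an analog layer with a trainable output scale."""

    def __init__(self):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(1, 2))
        self.out_scaling_alpha = nn.Parameter(torch.ones(1))

    def forward(self, x):
        return self.out_scaling_alpha * (x @ self.weight.t())


def train_step(model):
    """Run one SGD step over model.parameters() and return the live alpha value."""
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    optimizer.zero_grad()
    loss = model(torch.ones(1, 2)).sum()
    loss.backward()
    optimizer.step()
    return model.out_scaling_alpha.item()


model = ScaledLinear()

# Simulate a load that swaps in a new tensor without going through registration.
# Writing to __dict__ bypasses nn.Module.__setattr__, so _parameters keeps the old object.
loaded_alpha = nn.Parameter(torch.full((1,), 2.0))
model.__dict__["out_scaling_alpha"] = loaded_alpha

# The forward pass uses the loaded tensor, but model.parameters() yields the old one.
stale = model._parameters["out_scaling_alpha"] is not model.out_scaling_alpha
alpha_after_broken_step = train_step(model)  # optimizer never sees loaded_alpha

# Remedy in this sketch: drop the shadowing attribute and re-register the parameter.
del model.__dict__["out_scaling_alpha"]
model.register_parameter("out_scaling_alpha", loaded_alpha)
alpha_after_fixed_step = train_step(model)  # loaded_alpha is now updated by SGD
```

In this sketch the first training step leaves the loaded scale unchanged because the optimizer holds the stale reference, while the step after re-registration moves it.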

Details

Added tests to check out_scaling_alpha loading
Small fixes to the runners and experiments
Bumped version to 0.6.2

Fabio-83

Thank you @maljoras

Signed-off-by: Henry Ye <yehenry11@gmail.com>

legacy checkpointing. training after checkpoint fix

68308aa

maljoras requested a review from Fabio-83 October 6, 2022 14:31

maljoras added the bug Something isn't working label Oct 6, 2022

changelog

401b0cc

Fabio-83 approved these changes Oct 6, 2022

maljoras merged commit 647efaa into IBM:master Oct 6, 2022

maljoras deleted the checkpointing branch October 6, 2022 15:42

HCY-11 pushed a commit to HCY-11/aihwkit that referenced this pull request Dec 6, 2022

Legacy checkpoint read, fix checkpoint re-training issue (IBM#434)

3668e4b

Signed-off-by: Henry Ye <yehenry11@gmail.com>

HCY-11 pushed a commit to HCY-11/aihwkit that referenced this pull request Dec 7, 2022

Legacy checkpoint read, fix checkpoint re-training issue (IBM#434)

f9d7c22

Signed-off-by: Henry Ye <yehenry11@gmail.com>

HCY-11 pushed a commit to HCY-11/aihwkit that referenced this pull request Dec 7, 2022

Legacy checkpoint read, fix checkpoint re-training issue (IBM#434)

eb9d38b

Signed-off-by: Henry Ye <yehenry11@gmail.com>

HCY-11 pushed a commit to HCY-11/aihwkit that referenced this pull request Dec 7, 2022

Legacy checkpoint read, fix checkpoint re-training issue (IBM#434)

c1ccc01

Signed-off-by: Henry Ye <yehenry11@gmail.com>
