Integrate new cache system for training #472

michaelbenayoun · 2024-02-09T11:04:20Z

What does this PR do?

Integrate the native cache system into training.

To do in following PRs:

Update the documentation
Remove the former cache utility code that is no longer needed, including the functions and their tests
Unify optimum/neuron/utils/cache_utils.py and optimum/neuron/utils/hub_neuronx_cache.py.

HuggingFaceDocBuilderDev · 2024-02-09T11:10:32Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

dacorvo

One minor comment: if you could move the check for parallel compile into the parallel-compile-related code, that would be awesome.
Otherwise LGTM.

dacorvo

LGTM, thanks for the pull request: a brand new training cache!

Integrate new cache system for training

e35e453

michaelbenayoun added 4 commits February 9, 2024 18:36

Cache system works

c19d511

Cleanup

69f144b

Cleanup

6687070

Cleanup

aafeecc

michaelbenayoun requested a review from dacorvo February 9, 2024 17:57

michaelbenayoun marked this pull request as ready for review February 9, 2024 17:57

Cleanup

39c9a02

5cp reviewed Feb 9, 2024

optimum/neuron/utils/neuron_cc_wrapper (outdated, resolved)

5cp reviewed Feb 9, 2024

optimum/neuron/utils/hub_neuronx_cache.py (outdated, resolved)

michaelbenayoun added 8 commits February 15, 2024 10:51

Fix minor issues

230458f

Adapt commands

fcc7180

Remove unused command

738ef38

Fix

3522de8

Fix

17c6f38

neuron parallel compile

11b2797

Fix

2e0917c

Cleanup

aefbebf

dacorvo reviewed Feb 15, 2024

optimum/neuron/utils/hub_neuronx_cache.py (resolved)

optimum/neuron/utils/hub_neuronx_cache.py (outdated, resolved)

optimum/neuron/utils/hub_neuronx_cache.py (outdated, resolved)

michaelbenayoun and others added 6 commits February 15, 2024 17:28

Move the check for neuron_parallel_compile

ad864e0

Improve the way we patch

457290e

Use more nested naming convention for registries

3df691e

Apply changes

326cdfd

Remove irrelevant test

43ff13e

ci: reduce export and pipelines test frequency

e94630c

This runs export and pipelines tests in dedicated pipelines with stricter path filters to avoid running them on every change.

dacorvo approved these changes Feb 16, 2024

dacorvo merged commit d319856 into main Feb 16, 2024
10 of 11 checks passed

dacorvo deleted the cache_training branch February 16, 2024 13:59
