
Change back to Thread for SF conversion #35236

Merged: ydshieh merged 3 commits into main from debug-thread on Dec 12, 2024

Conversation

@ydshieh (Collaborator) commented Dec 12, 2024:

What does this PR do?

Fix #35228

On other platforms like Mac / Windows, using Process (see #34966) is much more costly than Thread (on Linux, the cost is also higher, but not by much).

In a CI environment, many tests may eventually call auto_conversion (within from_pretrained), and the accumulated cost of using Process is too high, causing PEFT CI running times on Mac / Windows to increase a lot:

Windows (~1h:22min) and MacOS (~52min) compared to Ubuntu (~22min)
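
For intuition, here is a minimal, hypothetical micro-benchmark (not part of this PR) of the per-worker overhead. On macOS and Windows, multiprocessing defaults to the "spawn" start method, so each Process pays full interpreter startup and module re-import costs that a Thread avoids:

```python
# Hypothetical micro-benchmark: average start + join cost of a Thread vs. a
# Process running a no-op task. Numbers vary by platform; the gap is largest
# where "spawn" is the default start method (macOS, Windows).
import time
from multiprocessing import Process
from threading import Thread

def task():
    pass  # no-op: we only measure worker startup/teardown overhead

def average_cost(worker_cls, n=20):
    start = time.perf_counter()
    for _ in range(n):
        worker = worker_cls(target=task)
        worker.start()
        worker.join()
    return (time.perf_counter() - start) / n

if __name__ == "__main__":  # guard required for spawn-based multiprocessing
    print(f"Thread:  {average_cost(Thread):.4f} s per worker")
    print(f"Process: {average_cost(Process):.4f} s per worker")
```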

This PR changes it back to using Thread and achieves the original goal of #34966 in another way.
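
A minimal sketch of the idea (illustrative names only; this is not the exact transformers code):

```python
import threading

def spawn_auto_conversion(convert_fn, *args, **kwargs):
    # Assumption for illustration: run the safetensors auto-conversion in a
    # background Thread so from_pretrained does not pay Process spawn costs.
    # Log records emitted from this worker carry its own thread ident, which
    # the test-side filtering shown in the review comment below relies on.
    thread = threading.Thread(
        target=convert_fn,
        args=args,
        kwargs=kwargs,
        name="Thread-autoconversion",  # hypothetical name
    )
    thread.start()
    return thread
```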

Comment on lines +2333 to +2336

```
if thread_id != self._thread_id:
    continue
```
ydshieh (Collaborator, Author) commented:


This is to achieve what #34966 was trying to fix:

if the log record is not from the same thread as the test thread itself, ignore it
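
For illustration, a minimal sketch of that filtering idea using the standard logging module (hypothetical class; the real change lives in the transformers test utilities). Each LogRecord stores the emitting thread's ident in record.thread, so a capture created on the test thread can drop records from any other thread:

```python
import logging
import threading

class ThreadScopedCapture(logging.Handler):
    """Collects only log records emitted from the thread that created it."""

    def __init__(self):
        super().__init__()
        self._thread_id = threading.get_ident()  # ident of the test thread
        self.records = []

    def emit(self, record):
        if record.thread != self._thread_id:
            return  # same effect as the PR's `continue`: ignore other threads
        self.records.append(record)
```

Attached to a logger during a test, such a handler would ignore warnings emitted by a background auto-conversion thread while still capturing the test's own logs.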

@LysandreJik (Member) left a comment:


Ok, this sounds good to me! Thanks for the quick fix @ydshieh!

@Wauplin (Contributor) left a comment:


Looks good in principle!

I was about to say that it's not best practice to make such a big change just for testing purposes, right before realizing that Process had been added for testing purposes in the first place 😄 So I guess all good if CI passes :)


@ydshieh ydshieh merged commit a691ccb into main Dec 12, 2024
25 checks passed
@ydshieh ydshieh deleted the debug-thread branch December 12, 2024 15:05
loadams added a commit to deepspeedai/DeepSpeed that referenced this pull request Dec 16, 2024
…ues in tests (#6822)

Changes from huggingface/transformers#34966
caused the `nv-torch-latest-v100` tests to fail with the following
error:

```
  File "/tmp/azureml/cr/j/e4bfd57a509846d6bbc4914639ad248d/exe/wd/actions-runner/_work/DeepSpeed/DeepSpeed/unit-test-venv/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3941, in from_pretrained
    raise EnvironmentError(
OSError: Can't load the model for 'hf-internal-testing/tiny-random-VisionEncoderDecoderModel-vit-gpt2'. If you were trying to load it from 'https://huggingface.co/models', make sure you don't have a local directory with the same name. Otherwise, make sure 'hf-internal-testing/tiny-random-VisionEncoderDecoderModel-vit-gpt2' is the correct path to a directory containing a file named pytorch_model.bin, tf_model.h5, model.ckpt or flax_model.msgpack.
```

Sample failure here:
https://github.com/microsoft/DeepSpeed/actions/runs/12169422174/job/33942348835?pr=6794#step:8:3506

This was resolved on the Transformers side here:
huggingface/transformers#35236
siqi654321 pushed a commit to siqi654321/DeepSpeed that referenced this pull request Feb 7, 2025
…ues in tests (deepspeedai#6822)

Signed-off-by: siqi <siqi@tecorigin.com>
traincheck-team pushed a commit to traincheck-team/DeepSpeed that referenced this pull request Feb 9, 2025
…ues in tests (deepspeedai#6822)

Development

Successfully merging this pull request may close these issues.

Breaking change due to multiprocessing.Process when loading pytorch_model.bin-based model