
Enable Sentence Transformer Inference with Intel Gaudi2 GPU Supported ( 'hpu' ) #2557

Merged · 13 commits into UKPLab:master · Apr 8, 2024

Conversation

ZhengHongming888
Contributor

This PR is one of a series enabling Intel Gaudi2 GPU ('hpu') support for Sentence Transformers inference/training.

This is the first PR and includes the items below:

  1. Add the 'hpu' device name in get_device_name() under sentence_transformers/util.py (see the first sketch after this list).
  2. Add a padding strategy argument to tokenize() for the Gaudi2 device in sentence_transformers/SentenceTransformer.py and sentence_transformers/models/Transformer.py, keeping the original padding strategy for cuda/cpu as the first priority (see the second sketch below).
  3. Enable graph mode for the Gaudi2 device, for better performance, in sentence_transformers/SentenceTransformer.py.
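For illustration, here is a minimal sketch of the kind of device check item 1 describes (the exact code in util.py may differ; habana_frameworks provides the HPU availability check):

```python
import importlib.util

import torch


def get_device_name() -> str:
    """Pick the best available device: 'cuda', then 'hpu', then 'cpu'."""
    if torch.cuda.is_available():
        return "cuda"
    # Only probe for an HPU if the Habana framework is installed.
    if importlib.util.find_spec("habana_frameworks") is not None:
        import habana_frameworks.torch.hpu as hthpu

        if hthpu.is_available():
            return "hpu"
    return "cpu"
```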

No inference examples need modification: the default device is chosen seamlessly, i.e. 'cuda' on a CUDA system, 'hpu' on a Gaudi2 system, and 'cpu' when neither is available.
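And a minimal sketch of the padding switch from item 2, using the standard Hugging Face tokenizer API (the function shape and model name here are illustrative, not the library's exact signature):

```python
from transformers import AutoTokenizer

# Illustrative model choice; any Hugging Face tokenizer works the same way.
tokenizer = AutoTokenizer.from_pretrained("sentence-transformers/all-MiniLM-L6-v2")


def tokenize(texts: list[str], device: str, max_length: int = 128):
    # HPU graph mode favors static tensor shapes, so pad every batch to a
    # fixed length there; keep dynamic longest-in-batch padding on cuda/cpu.
    padding = "max_length" if device == "hpu" else True
    return tokenizer(
        texts,
        padding=padding,
        truncation=True,
        max_length=max_length,
        return_tensors="pt",
    )
```

Padding to a fixed length on 'hpu' keeps batch shapes constant, which is what graph mode needs to avoid recompiling for every new input length.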

Any questions/comments are welcome!

Thanks.

@tomaarsen
Collaborator

Hello!

Thanks for the PR. I've taken a few minutes to fix some of the things we talked about during our meeting. Feel free to look at the individual commits to get a feeling for the changes.
I haven't yet had time to test this on an HPU device. Perhaps you can give it a try yourself after my updates?

  • Tom Aarsen

@ZhengHongming888
Contributor Author

Thanks, Tom, for the modifications, which are all very good, especially the "padding" argument! :-)

Also, I have run all test cases under sentence-transformers/tests against your commits on my machine with an 'hpu' device, and they all passed.

Besides your commits, I also made a small change: I moved the initialization of HPU graph mode for the 'hpu' device from __init__() into encode(), so it is initialized only once. Enabling HPU for training differs from inference and will be handled later on the training side. I also made a small change to tests/test_compute_embeddings.py due to the new padding argument.
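A minimal sketch of what that one-time graph-mode initialization inside encode() might look like (assuming Habana's wrap_in_hpu_graph utility; the is_hpu_graph_enabled flag is illustrative, not library API):

```python
# Lazy, one-time HPU graph wrapping inside encode().
if device == "hpu" and not getattr(self, "is_hpu_graph_enabled", False):
    import habana_frameworks.torch as ht

    # Wrap the model in an HPU graph so subsequent forward passes replay
    # the captured graph instead of re-tracing each call.
    ht.hpu.wrap_in_hpu_graph(self, disable_tensor_cache=True)
    self.is_hpu_graph_enabled = True
```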

So right now all test cases pass on my side. Please help check.

Thanks.
Hongming

@tomaarsen
Collaborator

I think this is looking good now, thanks for these changes!

  • Tom Aarsen

@tomaarsen tomaarsen merged commit 1cee15c into UKPLab:master Apr 8, 2024
9 checks passed
tomaarsen added a commit that referenced this pull request May 13, 2024
… ( 'hpu' ) - Follow up for #2557 (#2630)

* revision for padding argument and truncate dim test

* add new padding for hpu graph mode

* ruff format

* Return dict encoding rather than BatchEncoding for CLIPModel

* Remove unused import

* remove padding argument

* modify the graph enable position

* ruff format

* add check for optimum install

* ruff format

* Simplify tokenization

---------

Co-authored-by: Tom Aarsen <37621491+tomaarsen@users.noreply.github.com>
Co-authored-by: Tom Aarsen <Cubiegamedev@gmail.com>