
TF Sharded #17713

Merged · 34 commits · Jun 21, 2022
Conversation

@ArthurZucker (Collaborator)
What does this PR do?

Introduces sharding of TF models, following the PyTorch implementation.

A simple working example:

from transformers import TFOPTModel

save_directory = "opt-350m"
model = TFOPTModel.from_pretrained("facebook/opt-350m")
model.save_pretrained(save_directory, max_shard_size="1GB")
tf_model = TFOPTModel.from_pretrained(save_directory)
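The core idea behind `max_shard_size` can be sketched in plain Python: weights are appended to the current shard until adding the next one would overflow it, at which point a new shard is started. This is a hedged sketch of the grouping logic only, not the library's implementation; the helper name `shard_weights` and the `(name, size)` input format are illustrative.

```python
def shard_weights(weight_sizes, max_shard_size):
    """Group (name, size_in_bytes) pairs into shards of at most max_shard_size.

    Illustrative sketch of the sharding idea, not the transformers code.
    A single weight larger than max_shard_size still gets its own shard.
    """
    shards = [[]]
    current_size = 0
    for name, size in weight_sizes:
        # Start a new shard if adding this weight would overflow the current one.
        if current_size + size > max_shard_size and shards[-1]:
            shards.append([])
            current_size = 0
        shards[-1].append(name)
        current_size += size
    return shards
```

Each resulting group would then be written to its own checkpoint file, with an index JSON recording which file holds each weight.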

@ArthurZucker ArthurZucker self-assigned this Jun 15, 2022
@ArthurZucker ArthurZucker linked an issue Jun 15, 2022 that may be closed by this pull request
@ArthurZucker ArthurZucker added TensorFlow Anything TensorFlow Core: Modeling Internals of the library; Models. labels Jun 15, 2022
@HuggingFaceDocBuilderDev commented Jun 15, 2022

The documentation is not available anymore as the PR was closed or merged.

@ArthurZucker (Collaborator, Author)

Okay, so the tfopt_for_causal_lm/tfopt_model prefix in tfopt_for_causal_lm/model/decoder/embed_positions/weight:0 from the index JSON comes from the actual name of the layer (on the TF side). This also explains the hack we sometimes need when a layer is shared: for OPT we get 'decoder.embed_tokens/model.decoder.embed_tokens/weight:0', which thus becomes model.decoder.embed_tokens/weight:0. The most interesting part is that 'decoder.embed_tokens' comes from https://github.com/ArthurZucker/transformers/blob/e950ff48a91840e30966abaf86bdb02dc16fcdab/src/transformers/models/opt/modeling_tf_opt.py#L499-L511 (the load-weight-prefix hack using load_weight_prefix). I am sure something can be done about that, so I will detail it and dig a bit further.
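The prefix behaviour described above can be illustrated with a small string manipulation: the first path component of a TF variable name is the top-level layer name, and stripping it yields the key recorded in the index JSON. A hedged sketch; `strip_model_prefix` is an illustrative helper, not a transformers function.

```python
def strip_model_prefix(variable_name):
    """Drop the leading top-level layer name (e.g. 'tfopt_for_causal_lm/')
    from a TF variable name such as
    'tfopt_for_causal_lm/model/decoder/embed_positions/weight:0'.

    Illustrative helper, not part of transformers.
    """
    _, _, remainder = variable_name.partition("/")
    return remainder
```

Applied to the shared-layer example above, 'decoder.embed_tokens/model.decoder.embed_tokens/weight:0' becomes 'model.decoder.embed_tokens/weight:0'.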

@ArthurZucker ArthurZucker marked this pull request as ready for review June 17, 2022 12:06
@ArthurZucker ArthurZucker changed the title load and save tensorflow sharded checkpoints TF Sharded Jun 20, 2022
@gante (Member) left a comment:

In general, LGTM 👍

There are two main sets of comments that I think would be nice to address before merging:

  • Some documentation is copy/pasted from the PT side, so it needs some tweaks for TF. Also, some typos were copied over :D
  • There is duplicated functionality that I think we could move to a shared module

@@ -1075,3 +1080,114 @@ def send_example_telemetry(example_name, *example_args, framework="pytorch"):
except Exception:
# We don't want to error in case of connection errors of any kind.
pass


def convert_file_size_to_int(size: Union[int, str]):
@gante (Member):
This function is the same as in here.

Perhaps move the function to some file with shared functionality, like this one, and import from there? (cc @sgugger )

@ArthurZucker (Collaborator, Author) commented Jun 20, 2022:
Yes! It will be removed from modeling_utils

raise ValueError("`size` is not in a valid format. Use an integer followed by the unit, e.g., '5GB'.")
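For context, the function under discussion parses a size like "5GB" into a byte count. A minimal sketch of that behaviour (assuming decimal units for GB/MB/KB and binary units for GiB/MiB/KiB, as in the PyTorch counterpart; the real implementation may differ in edge cases):

```python
from typing import Union


def convert_file_size_to_int(size: Union[int, str]) -> int:
    """Convert a size given as a string with a unit (e.g. "5GB") to bytes.

    Sketch of the discussed helper; the actual transformers code may
    support more units or handle errors differently.
    """
    if isinstance(size, int):
        return size
    upper = size.upper()
    # Binary units first, since "GIB" also ends with "B".
    if upper.endswith("GIB"):
        return int(size[:-3]) * 2**30
    if upper.endswith("MIB"):
        return int(size[:-3]) * 2**20
    if upper.endswith("KIB"):
        return int(size[:-3]) * 2**10
    if upper.endswith("GB"):
        return int(size[:-2]) * 10**9
    if upper.endswith("MB"):
        return int(size[:-2]) * 10**6
    if upper.endswith("KB"):
        return int(size[:-2]) * 10**3
    raise ValueError("`size` is not in a valid format. Use an integer followed by the unit, e.g., '5GB'.")
```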


def get_checkpoint_shard_files(
@gante (Member):
The same comment as above -- there is a very similar function here

The docstring also needs a minor update: PreTrainedModel has a corresponding TF version, TFPreTrainedModel

@ArthurZucker (Collaborator, Author):
Same as above, should remove the functions from modeling_utils in a next PR
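For reference, the index JSON of a sharded checkpoint maps each weight name to the shard file that contains it, so resolving the list of shard files is essentially reading that map and deduplicating its values. A hedged sketch with an illustrative helper name and an in-memory index string, not the transformers helper itself:

```python
import json


def shard_files_from_index(index_json: str):
    """Return the sorted, deduplicated list of shard filenames referenced
    by a sharded-checkpoint index.

    Illustrative sketch; get_checkpoint_shard_files in transformers also
    downloads/caches the shards, which is omitted here.
    """
    index = json.loads(index_json)
    return sorted(set(index["weight_map"].values()))
```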

@patrickvonplaten (Contributor) commented Jun 20, 2022

Looks very nice to me!

I only did a very high-level review. Deferring to @gante and @sgugger here :-)

ArthurZucker and others added 5 commits June 20, 2022 22:25
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
@gante (Member) left a comment:

If there are plans to move shared functionality in a future PR, I'm happy to approve 👍

@sgugger (Collaborator) left a comment:

Thanks a lot for working on this!

unexpected_keys = set()
# Read the H5 file
try:
with h5py.File(resolved_archive_file, "r") as f:
@sgugger (Collaborator):

Since it's used a lot below, can we give this `f` a better (more descriptive) name?

)
if is_sharded:
for file in resolved_archive_file:
assert os.path.isfile(file), f"Error retrieving files {file}"
@sgugger (Collaborator):
No new asserts in the codebase ;-) Please use a test and raise the appropriate error.

@ArthurZucker (Collaborator, Author):
The error is pretty much already handled with the OSError, as the call to load_tf_... is already inside a try/except clause.

@sgugger (Collaborator):
In this case you can remove the assert :-p

ignore_mismatched_sizes=ignore_mismatched_sizes,
)
else:
assert os.path.isfile(resolved_archive_file), f"Error retrieving file {resolved_archive_file}"
@sgugger (Collaborator):
Same here :-)

@ArthurZucker ArthurZucker merged commit 7cced02 into huggingface:main Jun 21, 2022
younesbelkada pushed a commit to younesbelkada/transformers that referenced this pull request Jun 25, 2022
* initial commit

* update modeeling tf utils

* quality

* clean and update args

* update

* remove potential bug

* code quality

* update

* update max shard

* update tests for sharding from pretrained

* fix remaining test

* make style

* h5py if tf available

* update and fix test

* fix test

* style

* modified push to hub to support shard for TF

* quick fix

* update code

* merge branch main and style

* Apply suggestions from code review

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* update based on reviews

* update doc

* update and style

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update based on reviews

* fix typo

* style

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
younesbelkada pushed a commit to younesbelkada/transformers that referenced this pull request Jun 29, 2022
Labels
  • Core: Modeling (Internals of the library; Models.)
  • TensorFlow (Anything TensorFlow)
Projects
None yet
Development

Successfully merging this pull request may close these issues:
  • Shard checkpoint for tf and flax
5 participants