Remove deprecated `shard_size` arg from `.push_to_hub()` #5469

polinaeterna · 2023-01-26T15:40:56Z

The docstrings say that it was supposed to be deprecated as of version 2.4.0. Can we remove it?
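For readers unfamiliar with the pattern, here is a minimal sketch of what removing a deprecated keyword argument typically looks like. It is not the actual `datasets` implementation: the function names `push_to_hub_before` and `push_to_hub_after`, the sentinel default, the warning text, and the replacement parameter name `max_shard_size` are all assumptions made for illustration.

```python
import warnings
from typing import Optional, Union

# Hypothetical sentinel: lets the function tell "not passed" apart from any real value.
_DEPRECATED = "deprecated"


def push_to_hub_before(
    repo_id: str,
    max_shard_size: Optional[Union[int, str]] = None,
    shard_size: Union[int, str] = _DEPRECATED,
) -> dict:
    """Sketch of a signature that still accepts the deprecated argument."""
    if shard_size != _DEPRECATED:
        # Warn, then forward the old argument to its assumed replacement.
        warnings.warn(
            "'shard_size' is deprecated; use 'max_shard_size' instead.",
            FutureWarning,
        )
        max_shard_size = shard_size
    # Stand-in for the real upload logic: just report what would be used.
    return {"repo_id": repo_id, "max_shard_size": max_shard_size}


def push_to_hub_after(
    repo_id: str,
    max_shard_size: Optional[Union[int, str]] = None,
) -> dict:
    """Sketch of the signature once the deprecated argument is removed."""
    return {"repo_id": repo_id, "max_shard_size": max_shard_size}
```

After the removal, a caller that still passes `shard_size=` gets a `TypeError` for an unexpected keyword argument instead of a warning, which is why such a change usually waits until the announced deprecation window has passed.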

HuggingFaceDocBuilderDev · 2023-01-26T15:45:11Z

The documentation is not available anymore as the PR was closed or merged.

lhoestq

Yes, good catch! Thanks.

github-actions · 2023-01-26T17:37:51Z

PyArrow==6.0.0

Benchmark: benchmark_array_xd.json

| metric | read_batch_formatted_as_numpy after write_array2d | read_batch_formatted_as_numpy after write_flattened_sequence | read_batch_formatted_as_numpy after write_nested_sequence | read_batch_unformated after write_array2d | read_batch_unformated after write_flattened_sequence | read_batch_unformated after write_nested_sequence | read_col_formatted_as_numpy after write_array2d | read_col_formatted_as_numpy after write_flattened_sequence | read_col_formatted_as_numpy after write_nested_sequence | read_col_unformated after write_array2d | read_col_unformated after write_flattened_sequence | read_col_unformated after write_nested_sequence | read_formatted_as_numpy after write_array2d | read_formatted_as_numpy after write_flattened_sequence | read_formatted_as_numpy after write_nested_sequence | read_unformated after write_array2d | read_unformated after write_flattened_sequence | read_unformated after write_nested_sequence | write_array2d | write_flattened_sequence | write_nested_sequence |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| new / old (diff) | 0.008272 / 0.011353 (-0.003081) | 0.004494 / 0.011008 (-0.006515) | 0.100764 / 0.038508 (0.062256) | 0.028741 / 0.023109 (0.005632) | 0.309020 / 0.275898 (0.033122) | 0.354184 / 0.323480 (0.030704) | 0.007455 / 0.007986 (-0.000531) | 0.003377 / 0.004328 (-0.000951) | 0.078472 / 0.004250 (0.074222) | 0.034719 / 0.037052 (-0.002333) | 0.312787 / 0.258489 (0.054298) | 0.342878 / 0.293841 (0.049037) | 0.033326 / 0.128546 (-0.095221) | 0.011519 / 0.075646 (-0.064127) | 0.323556 / 0.419271 (-0.095716) | 0.039929 / 0.043533 (-0.003604) | 0.304627 / 0.255139 (0.049488) | 0.322876 / 0.283200 (0.039677) | 0.086410 / 0.141683 (-0.055273) | 1.502607 / 1.452155 (0.050453) | 1.577953 / 1.492716 (0.085237) |

Benchmark: benchmark_getitem_100B.json

| metric | get_batch_of_1024_random_rows | get_batch_of_1024_rows | get_first_row | get_last_row |
| --- | --- | --- | --- | --- |
| new / old (diff) | 0.192861 / 0.018006 (0.174855) | 0.406008 / 0.000490 (0.405519) | 0.001075 / 0.000200 (0.000875) | 0.000071 / 0.000054 (0.000016) |

Benchmark: benchmark_indices_mapping.json

| metric | select | shard | shuffle | sort | train_test_split |
| --- | --- | --- | --- | --- | --- |
| new / old (diff) | 0.023351 / 0.037411 (-0.014060) | 0.096086 / 0.014526 (0.081561) | 0.104641 / 0.176557 (-0.071915) | 0.141940 / 0.737135 (-0.595195) | 0.109266 / 0.296338 (-0.187073) |

Benchmark: benchmark_iterating.json

| metric | read 5000 | read 50000 | read_batch 50000 10 | read_batch 50000 100 | read_batch 50000 1000 | read_formatted numpy 5000 | read_formatted pandas 5000 | read_formatted tensorflow 5000 | read_formatted torch 5000 | read_formatted_batch numpy 5000 10 | read_formatted_batch numpy 5000 1000 | shuffled read 5000 | shuffled read 50000 | shuffled read_batch 50000 10 | shuffled read_batch 50000 100 | shuffled read_batch 50000 1000 | shuffled read_formatted numpy 5000 | shuffled read_formatted_batch numpy 5000 10 | shuffled read_formatted_batch numpy 5000 1000 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| new / old (diff) | 0.416496 / 0.215209 (0.201287) | 4.161581 / 2.077655 (2.083926) | 1.815357 / 1.504120 (0.311238) | 1.609536 / 1.541195 (0.068341) | 1.654105 / 1.468490 (0.185615) | 0.693947 / 4.584777 (-3.890830) | 3.349029 / 3.745712 (-0.396683) | 1.883968 / 5.269862 (-3.385893) | 1.287988 / 4.565676 (-3.277688) | 0.081765 / 0.424275 (-0.342511) | 0.012373 / 0.007607 (0.004766) | 0.517186 / 0.226044 (0.291142) | 5.200892 / 2.268929 (2.931964) | 2.247414 / 55.444624 (-53.197211) | 1.910601 / 6.876477 (-4.965876) | 1.965407 / 2.142072 (-0.176666) | 0.814386 / 4.805227 (-3.990841) | 0.149295 / 6.500664 (-6.351369) | 0.064667 / 0.075469 (-0.010802) |

Benchmark: benchmark_map_filter.json

| metric | filter | map fast-tokenizer batched | map identity | map identity batched | map no-op batched | map no-op batched numpy | map no-op batched pandas | map no-op batched pytorch | map no-op batched tensorflow |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| new / old (diff) | 1.247258 / 1.841788 (-0.594530) | 13.837355 / 8.074308 (5.763047) | 13.850454 / 10.191392 (3.659062) | 0.136078 / 0.680424 (-0.544346) | 0.028322 / 0.534201 (-0.505878) | 0.391394 / 0.579283 (-0.187889) | 0.407494 / 0.434364 (-0.026870) | 0.473784 / 0.540337 (-0.066554) | 0.562953 / 1.386936 (-0.823983) |

PyArrow==latest

Benchmark: benchmark_array_xd.json

| metric | read_batch_formatted_as_numpy after write_array2d | read_batch_formatted_as_numpy after write_flattened_sequence | read_batch_formatted_as_numpy after write_nested_sequence | read_batch_unformated after write_array2d | read_batch_unformated after write_flattened_sequence | read_batch_unformated after write_nested_sequence | read_col_formatted_as_numpy after write_array2d | read_col_formatted_as_numpy after write_flattened_sequence | read_col_formatted_as_numpy after write_nested_sequence | read_col_unformated after write_array2d | read_col_unformated after write_flattened_sequence | read_col_unformated after write_nested_sequence | read_formatted_as_numpy after write_array2d | read_formatted_as_numpy after write_flattened_sequence | read_formatted_as_numpy after write_nested_sequence | read_unformated after write_array2d | read_unformated after write_flattened_sequence | read_unformated after write_nested_sequence | write_array2d | write_flattened_sequence | write_nested_sequence |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| new / old (diff) | 0.006559 / 0.011353 (-0.004794) | 0.004546 / 0.011008 (-0.006462) | 0.099527 / 0.038508 (0.061019) | 0.027428 / 0.023109 (0.004319) | 0.344276 / 0.275898 (0.068377) | 0.377897 / 0.323480 (0.054417) | 0.004913 / 0.007986 (-0.003072) | 0.003338 / 0.004328 (-0.000990) | 0.077589 / 0.004250 (0.073339) | 0.038819 / 0.037052 (0.001766) | 0.343165 / 0.258489 (0.084676) | 0.386228 / 0.293841 (0.092387) | 0.031753 / 0.128546 (-0.096794) | 0.011756 / 0.075646 (-0.063890) | 0.322537 / 0.419271 (-0.096735) | 0.049865 / 0.043533 (0.006332) | 0.340493 / 0.255139 (0.085354) | 0.372179 / 0.283200 (0.088980) | 0.099669 / 0.141683 (-0.042013) | 1.487841 / 1.452155 (0.035686) | 1.527400 / 1.492716 (0.034683) |

Benchmark: benchmark_getitem_100B.json

| metric | get_batch_of_1024_random_rows | get_batch_of_1024_rows | get_first_row | get_last_row |
| --- | --- | --- | --- | --- |
| new / old (diff) | 0.180782 / 0.018006 (0.162776) | 0.393494 / 0.000490 (0.393004) | 0.003004 / 0.000200 (0.002804) | 0.000076 / 0.000054 (0.000022) |

Benchmark: benchmark_indices_mapping.json

| metric | select | shard | shuffle | sort | train_test_split |
| --- | --- | --- | --- | --- | --- |
| new / old (diff) | 0.024997 / 0.037411 (-0.012415) | 0.098232 / 0.014526 (0.083707) | 0.107869 / 0.176557 (-0.068688) | 0.141042 / 0.737135 (-0.596093) | 0.109551 / 0.296338 (-0.186787) |

Benchmark: benchmark_iterating.json

| metric | read 5000 | read 50000 | read_batch 50000 10 | read_batch 50000 100 | read_batch 50000 1000 | read_formatted numpy 5000 | read_formatted pandas 5000 | read_formatted tensorflow 5000 | read_formatted torch 5000 | read_formatted_batch numpy 5000 10 | read_formatted_batch numpy 5000 1000 | shuffled read 5000 | shuffled read 50000 | shuffled read_batch 50000 10 | shuffled read_batch 50000 100 | shuffled read_batch 50000 1000 | shuffled read_formatted numpy 5000 | shuffled read_formatted_batch numpy 5000 10 | shuffled read_formatted_batch numpy 5000 1000 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| new / old (diff) | 0.477115 / 0.215209 (0.261906) | 4.783928 / 2.077655 (2.706273) | 2.435725 / 1.504120 (0.931605) | 2.233111 / 1.541195 (0.691916) | 2.341097 / 1.468490 (0.872607) | 0.694304 / 4.584777 (-3.890473) | 3.345687 / 3.745712 (-0.400025) | 1.886932 / 5.269862 (-3.382929) | 1.155585 / 4.565676 (-3.410092) | 0.082867 / 0.424275 (-0.341408) | 0.012420 / 0.007607 (0.004813) | 0.576575 / 0.226044 (0.350530) | 5.777691 / 2.268929 (3.508762) | 2.882219 / 55.444624 (-52.562405) | 2.543613 / 6.876477 (-4.332864) | 2.578939 / 2.142072 (0.436866) | 0.803143 / 4.805227 (-4.002084) | 0.151929 / 6.500664 (-6.348735) | 0.067777 / 0.075469 (-0.007693) |

Benchmark: benchmark_map_filter.json

| metric | filter | map fast-tokenizer batched | map identity | map identity batched | map no-op batched | map no-op batched numpy | map no-op batched pandas | map no-op batched pytorch | map no-op batched tensorflow |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| new / old (diff) | 1.282711 / 1.841788 (-0.559077) | 13.942771 / 8.074308 (5.868463) | 13.376206 / 10.191392 (3.184814) | 0.152916 / 0.680424 (-0.527508) | 0.016619 / 0.534201 (-0.517582) | 0.375141 / 0.579283 (-0.204142) | 0.381660 / 0.434364 (-0.052704) | 0.465090 / 0.540337 (-0.075247) | 0.555068 / 1.386936 (-0.831868) |
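Each cell in the benchmark tables above has the form `new / old (diff)`, and the printed numbers are consistent with `diff = new - old` up to rounding in the last digit. The sketch below shows one way to parse such a cell and check that relationship. The helper names (`Cell`, `parse_cell`, `is_consistent`, `relative_change`) and the tolerance are hypothetical and not part of the benchmark tooling.

```python
import re
from typing import NamedTuple


class Cell(NamedTuple):
    new: float
    old: float
    diff: float


# Matches cells such as "0.008272 / 0.011353 (-0.003081)".
_CELL = re.compile(r"^\s*([0-9.]+)\s*/\s*([0-9.]+)\s*\((-?[0-9.]+)\)\s*$")


def parse_cell(text: str) -> Cell:
    """Split a 'new / old (diff)' cell into its three numbers."""
    match = _CELL.match(text)
    if match is None:
        raise ValueError(f"not a 'new / old (diff)' cell: {text!r}")
    return Cell(*(float(group) for group in match.groups()))


def is_consistent(cell: Cell, tol: float = 2e-6) -> bool:
    """Check diff == new - old, allowing for rounding of the printed digits."""
    return abs((cell.new - cell.old) - cell.diff) <= tol


def relative_change(cell: Cell) -> float:
    """Change relative to the old value; negative means the new value is smaller."""
    return cell.diff / cell.old


cell = parse_cell("0.008272 / 0.011353 (-0.003081)")
print(is_consistent(cell), relative_change(cell))
```

A relative change is often easier to compare across metrics than the absolute diff, since the columns span several orders of magnitude.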

polinaeterna added 2 commits January 26, 2023 16:36

remove deprecated shard_size arg

d33c6f7

remove deprecated shard_size arg from arrow ds

45a5ce1

polinaeterna requested a review from lhoestq January 26, 2023 15:57

lhoestq approved these changes Jan 26, 2023

polinaeterna merged commit 10a6a63 into huggingface:main Jan 26, 2023

polinaeterna deleted the remove-deprecated-shard-size branch January 26, 2023 17:31
