Speed up batched PyTorch DataLoader #5512

lhoestq · 2023-02-08T13:38:59Z

I implemented __getitems__ so that the PyTorch DataLoader can fetch a whole batch in one call, which speeds up batched data loading.

close #5505
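To illustrate the idea behind `__getitems__`, here is a minimal sketch. It is not the code from this PR. `ToyColumnarDataset` and `fetch` are hypothetical names, and `fetch` is only a simplified stand-in for the step where recent PyTorch versions check whether a map-style dataset defines `__getitems__` and, if so, pass it the whole list of batch indices. The sketch assumes a dataset that stores its data by column, so one batched read is cheaper than many single-row reads.

```python
from typing import Any, Dict, List


class ToyColumnarDataset:
    """Hypothetical map-style dataset that stores its data column by column."""

    def __init__(self, columns: Dict[str, List[Any]]) -> None:
        self.columns = columns
        self.batched_calls = 0  # counts reads that fetched several rows at once

    def __len__(self) -> int:
        return len(next(iter(self.columns.values())))

    def __getitem__(self, key):
        if isinstance(key, list):
            # One columnar read for the whole list of indices.
            self.batched_calls += 1
            return {name: [col[i] for i in key] for name, col in self.columns.items()}
        # Single-row read.
        return {name: col[key] for name, col in self.columns.items()}

    def __getitems__(self, keys: List[int]) -> List[Dict[str, Any]]:
        # Read the batch once, then split it into one dict per example,
        # which is the list-of-examples shape a collate function expects.
        batch = self[keys]
        return [
            {name: values[i] for name, values in batch.items()}
            for i in range(len(keys))
        ]


def fetch(dataset, indices: List[int]) -> List[Dict[str, Any]]:
    """Simplified stand-in for a DataLoader's map-style fetch step (assumption)."""
    if hasattr(dataset, "__getitems__"):
        return dataset.__getitems__(indices)
    # Fallback: one __getitem__ call per index.
    return [dataset[i] for i in indices]


ds = ToyColumnarDataset({"x": [10, 11, 12, 13], "y": ["a", "b", "c", "d"]})
rows = fetch(ds, [3, 0, 2])
```

Without `__getitems__`, the same batch would cost one `__getitem__` call per index. With it, the dataset performs a single batched read and only the split into per-example dicts happens row by row.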

HuggingFaceDocBuilderDev · 2023-02-08T13:44:01Z

The documentation is not available anymore as the PR was closed or merged.

github-actions · 2023-02-08T13:44:13Z

PyArrow==6.0.0

Benchmark: benchmark_array_xd.json

metric	read_batch_formatted_as_numpy after write_array2d	read_batch_formatted_as_numpy after write_flattened_sequence	read_batch_formatted_as_numpy after write_nested_sequence	read_batch_unformated after write_array2d	read_batch_unformated after write_flattened_sequence	read_batch_unformated after write_nested_sequence	read_col_formatted_as_numpy after write_array2d	read_col_formatted_as_numpy after write_flattened_sequence	read_col_formatted_as_numpy after write_nested_sequence	read_col_unformated after write_array2d	read_col_unformated after write_flattened_sequence	read_col_unformated after write_nested_sequence	read_formatted_as_numpy after write_array2d	read_formatted_as_numpy after write_flattened_sequence	read_formatted_as_numpy after write_nested_sequence	read_unformated after write_array2d	read_unformated after write_flattened_sequence	read_unformated after write_nested_sequence	write_array2d	write_flattened_sequence	write_nested_sequence
new / old (diff)	0.008882 / 0.011353 (-0.002471)	0.004562 / 0.011008 (-0.006446)	0.100035 / 0.038508 (0.061527)	0.030654 / 0.023109 (0.007545)	0.298745 / 0.275898 (0.022847)	0.356869 / 0.323480 (0.033389)	0.007170 / 0.007986 (-0.000815)	0.003471 / 0.004328 (-0.000858)	0.077975 / 0.004250 (0.073725)	0.037861 / 0.037052 (0.000809)	0.311643 / 0.258489 (0.053154)	0.343504 / 0.293841 (0.049663)	0.033768 / 0.128546 (-0.094778)	0.011342 / 0.075646 (-0.064304)	0.323953 / 0.419271 (-0.095319)	0.040818 / 0.043533 (-0.002715)	0.298492 / 0.255139 (0.043353)	0.327292 / 0.283200 (0.044092)	0.088423 / 0.141683 (-0.053260)	1.489520 / 1.452155 (0.037366)	1.532962 / 1.492716 (0.040245)
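
As a reading aid for these tables: each cell has the form `new / old (diff)`, and the printed values are consistent with diff = new - old. The sketch below parses one cell and checks that relation. `parse_cell` is a hypothetical helper, not part of the benchmark tooling, and reading a negative diff as "faster" assumes the numbers are timings where lower is better.

```python
import re

# Matches cells such as "0.008882 / 0.011353 (-0.002471)".
CELL = re.compile(r"([-\d.]+) / ([-\d.]+) \(([-\d.]+)\)")


def parse_cell(cell: str):
    """Split one 'new / old (diff)' cell into three floats (hypothetical helper)."""
    match = CELL.fullmatch(cell.strip())
    if match is None:
        raise ValueError(f"unexpected cell format: {cell!r}")
    new, old, diff = (float(group) for group in match.groups())
    return new, old, diff


new, old, diff = parse_cell("0.008882 / 0.011353 (-0.002471)")

# The reported diff is new minus old, up to rounding in the last printed digit.
difference_matches = abs((new - old) - diff) < 2e-6
```

Some cells differ from `new - old` by one unit in the last digit, presumably because the diff was computed before rounding, so a small tolerance is used in the comparison.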

Benchmark: benchmark_getitem_100B.json

metric	get_batch_of_1024_random_rows	get_batch_of_1024_rows	get_first_row	get_last_row
new / old (diff)	0.223654 / 0.018006 (0.205647)	0.415134 / 0.000490 (0.414644)	0.007394 / 0.000200 (0.007194)	0.000080 / 0.000054 (0.000026)

Benchmark: benchmark_indices_mapping.json

metric	select	shard	shuffle	sort	train_test_split
new / old (diff)	0.023616 / 0.037411 (-0.013795)	0.096652 / 0.014526 (0.082126)	0.105239 / 0.176557 (-0.071318)	0.148637 / 0.737135 (-0.588498)	0.107937 / 0.296338 (-0.188402)

Benchmark: benchmark_iterating.json

metric	read 5000	read 50000	read_batch 50000 10	read_batch 50000 100	read_batch 50000 1000	read_formatted numpy 5000	read_formatted pandas 5000	read_formatted tensorflow 5000	read_formatted torch 5000	read_formatted_batch numpy 5000 10	read_formatted_batch numpy 5000 1000	shuffled read 5000	shuffled read 50000	shuffled read_batch 50000 10	shuffled read_batch 50000 100	shuffled read_batch 50000 1000	shuffled read_formatted numpy 5000	shuffled read_formatted_batch numpy 5000 10	shuffled read_formatted_batch numpy 5000 1000
new / old (diff)	0.426816 / 0.215209 (0.211607)	4.241533 / 2.077655 (2.163878)	1.946493 / 1.504120 (0.442373)	1.735765 / 1.541195 (0.194570)	1.781424 / 1.468490 (0.312934)	0.688082 / 4.584777 (-3.896694)	3.396444 / 3.745712 (-0.349268)	1.920333 / 5.269862 (-3.349528)	1.293833 / 4.565676 (-3.271843)	0.081967 / 0.424275 (-0.342308)	0.012911 / 0.007607 (0.005304)	0.536928 / 0.226044 (0.310884)	5.452327 / 2.268929 (3.183399)	2.505785 / 55.444624 (-52.938840)	2.173627 / 6.876477 (-4.702850)	2.119978 / 2.142072 (-0.022095)	0.809012 / 4.805227 (-3.996215)	0.149124 / 6.500664 (-6.351540)	0.066008 / 0.075469 (-0.009461)

Benchmark: benchmark_map_filter.json

metric	filter	map fast-tokenizer batched	map identity	map identity batched	map no-op batched	map no-op batched numpy	map no-op batched pandas	map no-op batched pytorch	map no-op batched tensorflow
new / old (diff)	1.215702 / 1.841788 (-0.626085)	13.757525 / 8.074308 (5.683217)	13.999208 / 10.191392 (3.807816)	0.164875 / 0.680424 (-0.515549)	0.028517 / 0.534201 (-0.505684)	0.394829 / 0.579283 (-0.184454)	0.404962 / 0.434364 (-0.029401)	0.484455 / 0.540337 (-0.055882)	0.575008 / 1.386936 (-0.811928)

PyArrow==latest

Benchmark: benchmark_array_xd.json

metric	read_batch_formatted_as_numpy after write_array2d	read_batch_formatted_as_numpy after write_flattened_sequence	read_batch_formatted_as_numpy after write_nested_sequence	read_batch_unformated after write_array2d	read_batch_unformated after write_flattened_sequence	read_batch_unformated after write_nested_sequence	read_col_formatted_as_numpy after write_array2d	read_col_formatted_as_numpy after write_flattened_sequence	read_col_formatted_as_numpy after write_nested_sequence	read_col_unformated after write_array2d	read_col_unformated after write_flattened_sequence	read_col_unformated after write_nested_sequence	read_formatted_as_numpy after write_array2d	read_formatted_as_numpy after write_flattened_sequence	read_formatted_as_numpy after write_nested_sequence	read_unformated after write_array2d	read_unformated after write_flattened_sequence	read_unformated after write_nested_sequence	write_array2d	write_flattened_sequence	write_nested_sequence
new / old (diff)	0.006754 / 0.011353 (-0.004598)	0.004579 / 0.011008 (-0.006430)	0.076617 / 0.038508 (0.038109)	0.027902 / 0.023109 (0.004793)	0.346278 / 0.275898 (0.070380)	0.398060 / 0.323480 (0.074580)	0.004938 / 0.007986 (-0.003047)	0.004681 / 0.004328 (0.000353)	0.076336 / 0.004250 (0.072086)	0.038018 / 0.037052 (0.000966)	0.358701 / 0.258489 (0.100212)	0.408413 / 0.293841 (0.114572)	0.031772 / 0.128546 (-0.096774)	0.011604 / 0.075646 (-0.064042)	0.085964 / 0.419271 (-0.333308)	0.042030 / 0.043533 (-0.001502)	0.343568 / 0.255139 (0.088429)	0.381805 / 0.283200 (0.098605)	0.090759 / 0.141683 (-0.050924)	1.504553 / 1.452155 (0.052398)	1.594006 / 1.492716 (0.101289)

Benchmark: benchmark_getitem_100B.json

metric	get_batch_of_1024_random_rows	get_batch_of_1024_rows	get_first_row	get_last_row
new / old (diff)	0.227395 / 0.018006 (0.209389)	0.403097 / 0.000490 (0.402608)	0.000413 / 0.000200 (0.000213)	0.000060 / 0.000054 (0.000006)

Benchmark: benchmark_indices_mapping.json

metric	select	shard	shuffle	sort	train_test_split
new / old (diff)	0.024693 / 0.037411 (-0.012718)	0.100470 / 0.014526 (0.085944)	0.108481 / 0.176557 (-0.068076)	0.142791 / 0.737135 (-0.594345)	0.109949 / 0.296338 (-0.186389)

Benchmark: benchmark_iterating.json

metric	read 5000	read 50000	read_batch 50000 10	read_batch 50000 100	read_batch 50000 1000	read_formatted numpy 5000	read_formatted pandas 5000	read_formatted tensorflow 5000	read_formatted torch 5000	read_formatted_batch numpy 5000 10	read_formatted_batch numpy 5000 1000	shuffled read 5000	shuffled read 50000	shuffled read_batch 50000 10	shuffled read_batch 50000 100	shuffled read_batch 50000 1000	shuffled read_formatted numpy 5000	shuffled read_formatted_batch numpy 5000 10	shuffled read_formatted_batch numpy 5000 1000
new / old (diff)	0.443674 / 0.215209 (0.228465)	4.412207 / 2.077655 (2.334553)	2.073752 / 1.504120 (0.569632)	1.863153 / 1.541195 (0.321958)	1.940063 / 1.468490 (0.471573)	0.696456 / 4.584777 (-3.888321)	3.422120 / 3.745712 (-0.323592)	1.902579 / 5.269862 (-3.367282)	1.184948 / 4.565676 (-3.380729)	0.083079 / 0.424275 (-0.341196)	0.012649 / 0.007607 (0.005042)	0.542035 / 0.226044 (0.315991)	5.421826 / 2.268929 (3.152897)	2.525092 / 55.444624 (-52.919532)	2.177144 / 6.876477 (-4.699332)	2.225224 / 2.142072 (0.083151)	0.804739 / 4.805227 (-4.000488)	0.151000 / 6.500664 (-6.349664)	0.066987 / 0.075469 (-0.008482)

Benchmark: benchmark_map_filter.json

metric	filter	map fast-tokenizer batched	map identity	map identity batched	map no-op batched	map no-op batched numpy	map no-op batched pandas	map no-op batched pytorch	map no-op batched tensorflow
new / old (diff)	1.277199 / 1.841788 (-0.564589)	14.184146 / 8.074308 (6.109838)	13.413348 / 10.191392 (3.221956)	0.128551 / 0.680424 (-0.551872)	0.016461 / 0.534201 (-0.517740)	0.379963 / 0.579283 (-0.199320)	0.381350 / 0.434364 (-0.053014)	0.439044 / 0.540337 (-0.101293)	0.521559 / 1.386936 (-0.865377)

github-actions · 2023-02-08T13:57:11Z

PyArrow==6.0.0

Benchmark: benchmark_array_xd.json

metric	read_batch_formatted_as_numpy after write_array2d	read_batch_formatted_as_numpy after write_flattened_sequence	read_batch_formatted_as_numpy after write_nested_sequence	read_batch_unformated after write_array2d	read_batch_unformated after write_flattened_sequence	read_batch_unformated after write_nested_sequence	read_col_formatted_as_numpy after write_array2d	read_col_formatted_as_numpy after write_flattened_sequence	read_col_formatted_as_numpy after write_nested_sequence	read_col_unformated after write_array2d	read_col_unformated after write_flattened_sequence	read_col_unformated after write_nested_sequence	read_formatted_as_numpy after write_array2d	read_formatted_as_numpy after write_flattened_sequence	read_formatted_as_numpy after write_nested_sequence	read_unformated after write_array2d	read_unformated after write_flattened_sequence	read_unformated after write_nested_sequence	write_array2d	write_flattened_sequence	write_nested_sequence
new / old (diff)	0.008876 / 0.011353 (-0.002477)	0.004629 / 0.011008 (-0.006379)	0.101697 / 0.038508 (0.063189)	0.030373 / 0.023109 (0.007264)	0.302206 / 0.275898 (0.026308)	0.365835 / 0.323480 (0.042355)	0.007877 / 0.007986 (-0.000109)	0.004473 / 0.004328 (0.000144)	0.077334 / 0.004250 (0.073084)	0.038066 / 0.037052 (0.001014)	0.308064 / 0.258489 (0.049575)	0.347329 / 0.293841 (0.053488)	0.034478 / 0.128546 (-0.094068)	0.011651 / 0.075646 (-0.063995)	0.323481 / 0.419271 (-0.095791)	0.043515 / 0.043533 (-0.000018)	0.299885 / 0.255139 (0.044746)	0.328959 / 0.283200 (0.045760)	0.095308 / 0.141683 (-0.046375)	1.474058 / 1.452155 (0.021903)	1.535335 / 1.492716 (0.042619)

Benchmark: benchmark_getitem_100B.json

metric	get_batch_of_1024_random_rows	get_batch_of_1024_rows	get_first_row	get_last_row
new / old (diff)	0.197416 / 0.018006 (0.179410)	0.421935 / 0.000490 (0.421446)	0.003490 / 0.000200 (0.003290)	0.000074 / 0.000054 (0.000020)

Benchmark: benchmark_indices_mapping.json

metric	select	shard	shuffle	sort	train_test_split
new / old (diff)	0.024519 / 0.037411 (-0.012892)	0.100710 / 0.014526 (0.086185)	0.104520 / 0.176557 (-0.072036)	0.142048 / 0.737135 (-0.595087)	0.109274 / 0.296338 (-0.187064)

Benchmark: benchmark_iterating.json

metric	read 5000	read 50000	read_batch 50000 10	read_batch 50000 100	read_batch 50000 1000	read_formatted numpy 5000	read_formatted pandas 5000	read_formatted tensorflow 5000	read_formatted torch 5000	read_formatted_batch numpy 5000 10	read_formatted_batch numpy 5000 1000	shuffled read 5000	shuffled read 50000	shuffled read_batch 50000 10	shuffled read_batch 50000 100	shuffled read_batch 50000 1000	shuffled read_formatted numpy 5000	shuffled read_formatted_batch numpy 5000 10	shuffled read_formatted_batch numpy 5000 1000
new / old (diff)	0.408766 / 0.215209 (0.193557)	4.101720 / 2.077655 (2.024065)	1.812375 / 1.504120 (0.308256)	1.605819 / 1.541195 (0.064624)	1.688923 / 1.468490 (0.220433)	0.691198 / 4.584777 (-3.893579)	3.422137 / 3.745712 (-0.323575)	1.921318 / 5.269862 (-3.348544)	1.168770 / 4.565676 (-3.396906)	0.082840 / 0.424275 (-0.341435)	0.012740 / 0.007607 (0.005133)	0.524333 / 0.226044 (0.298289)	5.258077 / 2.268929 (2.989149)	2.273177 / 55.444624 (-53.171447)	1.931919 / 6.876477 (-4.944558)	1.988415 / 2.142072 (-0.153658)	0.812227 / 4.805227 (-3.993000)	0.150043 / 6.500664 (-6.350622)	0.066422 / 0.075469 (-0.009047)

Benchmark: benchmark_map_filter.json

metric	filter	map fast-tokenizer batched	map identity	map identity batched	map no-op batched	map no-op batched numpy	map no-op batched pandas	map no-op batched pytorch	map no-op batched tensorflow
new / old (diff)	1.188069 / 1.841788 (-0.653718)	13.942681 / 8.074308 (5.868373)	14.104658 / 10.191392 (3.913266)	0.151966 / 0.680424 (-0.528458)	0.028833 / 0.534201 (-0.505368)	0.395125 / 0.579283 (-0.184158)	0.408512 / 0.434364 (-0.025852)	0.487587 / 0.540337 (-0.052751)	0.570023 / 1.386936 (-0.816913)

PyArrow==latest

Benchmark: benchmark_array_xd.json

metric	read_batch_formatted_as_numpy after write_array2d	read_batch_formatted_as_numpy after write_flattened_sequence	read_batch_formatted_as_numpy after write_nested_sequence	read_batch_unformated after write_array2d	read_batch_unformated after write_flattened_sequence	read_batch_unformated after write_nested_sequence	read_col_formatted_as_numpy after write_array2d	read_col_formatted_as_numpy after write_flattened_sequence	read_col_formatted_as_numpy after write_nested_sequence	read_col_unformated after write_array2d	read_col_unformated after write_flattened_sequence	read_col_unformated after write_nested_sequence	read_formatted_as_numpy after write_array2d	read_formatted_as_numpy after write_flattened_sequence	read_formatted_as_numpy after write_nested_sequence	read_unformated after write_array2d	read_unformated after write_flattened_sequence	read_unformated after write_nested_sequence	write_array2d	write_flattened_sequence	write_nested_sequence
new / old (diff)	0.006860 / 0.011353 (-0.004493)	0.004582 / 0.011008 (-0.006426)	0.079902 / 0.038508 (0.041394)	0.027565 / 0.023109 (0.004456)	0.341393 / 0.275898 (0.065495)	0.378911 / 0.323480 (0.055431)	0.005847 / 0.007986 (-0.002138)	0.004681 / 0.004328 (0.000353)	0.079422 / 0.004250 (0.075171)	0.039135 / 0.037052 (0.002083)	0.342026 / 0.258489 (0.083537)	0.387510 / 0.293841 (0.093669)	0.031999 / 0.128546 (-0.096547)	0.011782 / 0.075646 (-0.063865)	0.088563 / 0.419271 (-0.330709)	0.042435 / 0.043533 (-0.001098)	0.343055 / 0.255139 (0.087916)	0.367437 / 0.283200 (0.084237)	0.091578 / 0.141683 (-0.050104)	1.506828 / 1.452155 (0.054673)	1.599590 / 1.492716 (0.106874)

Benchmark: benchmark_getitem_100B.json

metric	get_batch_of_1024_random_rows	get_batch_of_1024_rows	get_first_row	get_last_row
new / old (diff)	0.217939 / 0.018006 (0.199932)	0.408352 / 0.000490 (0.407863)	0.000394 / 0.000200 (0.000194)	0.000063 / 0.000054 (0.000009)

Benchmark: benchmark_indices_mapping.json

metric	select	shard	shuffle	sort	train_test_split
new / old (diff)	0.026344 / 0.037411 (-0.011067)	0.102968 / 0.014526 (0.088442)	0.110340 / 0.176557 (-0.066217)	0.145696 / 0.737135 (-0.591439)	0.111632 / 0.296338 (-0.184707)

Benchmark: benchmark_iterating.json

metric	read 5000	read 50000	read_batch 50000 10	read_batch 50000 100	read_batch 50000 1000	read_formatted numpy 5000	read_formatted pandas 5000	read_formatted tensorflow 5000	read_formatted torch 5000	read_formatted_batch numpy 5000 10	read_formatted_batch numpy 5000 1000	shuffled read 5000	shuffled read 50000	shuffled read_batch 50000 10	shuffled read_batch 50000 100	shuffled read_batch 50000 1000	shuffled read_formatted numpy 5000	shuffled read_formatted_batch numpy 5000 10	shuffled read_formatted_batch numpy 5000 1000
new / old (diff)	0.440764 / 0.215209 (0.225555)	4.423179 / 2.077655 (2.345524)	2.057016 / 1.504120 (0.552896)	1.848741 / 1.541195 (0.307546)	1.939827 / 1.468490 (0.471337)	0.699370 / 4.584777 (-3.885407)	3.472521 / 3.745712 (-0.273191)	3.232557 / 5.269862 (-2.037305)	1.755534 / 4.565676 (-2.810143)	0.083469 / 0.424275 (-0.340807)	0.012980 / 0.007607 (0.005373)	0.557662 / 0.226044 (0.331618)	5.435657 / 2.268929 (3.166729)	2.545106 / 55.444624 (-52.899519)	2.168047 / 6.876477 (-4.708430)	2.234070 / 2.142072 (0.091997)	0.804662 / 4.805227 (-4.000565)	0.152832 / 6.500664 (-6.347833)	0.069372 / 0.075469 (-0.006097)

Benchmark: benchmark_map_filter.json

metric	filter	map fast-tokenizer batched	map identity	map identity batched	map no-op batched	map no-op batched numpy	map no-op batched pandas	map no-op batched pytorch	map no-op batched tensorflow
new / old (diff)	1.299189 / 1.841788 (-0.542598)	14.752880 / 8.074308 (6.678572)	13.607676 / 10.191392 (3.416284)	0.150773 / 0.680424 (-0.529650)	0.016701 / 0.534201 (-0.517500)	0.379507 / 0.579283 (-0.199776)	0.389401 / 0.434364 (-0.044963)	0.444199 / 0.540337 (-0.096139)	0.524264 / 1.386936 (-0.862672)

github-actions · 2023-02-08T14:16:49Z

PyArrow==6.0.0

Benchmark: benchmark_array_xd.json

metric	read_batch_formatted_as_numpy after write_array2d	read_batch_formatted_as_numpy after write_flattened_sequence	read_batch_formatted_as_numpy after write_nested_sequence	read_batch_unformated after write_array2d	read_batch_unformated after write_flattened_sequence	read_batch_unformated after write_nested_sequence	read_col_formatted_as_numpy after write_array2d	read_col_formatted_as_numpy after write_flattened_sequence	read_col_formatted_as_numpy after write_nested_sequence	read_col_unformated after write_array2d	read_col_unformated after write_flattened_sequence	read_col_unformated after write_nested_sequence	read_formatted_as_numpy after write_array2d	read_formatted_as_numpy after write_flattened_sequence	read_formatted_as_numpy after write_nested_sequence	read_unformated after write_array2d	read_unformated after write_flattened_sequence	read_unformated after write_nested_sequence	write_array2d	write_flattened_sequence	write_nested_sequence
new / old (diff)	0.008694 / 0.011353 (-0.002659)	0.004549 / 0.011008 (-0.006459)	0.101164 / 0.038508 (0.062656)	0.029644 / 0.023109 (0.006535)	0.294849 / 0.275898 (0.018950)	0.366755 / 0.323480 (0.043275)	0.007205 / 0.007986 (-0.000780)	0.004255 / 0.004328 (-0.000074)	0.077433 / 0.004250 (0.073183)	0.038024 / 0.037052 (0.000972)	0.310380 / 0.258489 (0.051891)	0.347093 / 0.293841 (0.053252)	0.033232 / 0.128546 (-0.095314)	0.011404 / 0.075646 (-0.064242)	0.323341 / 0.419271 (-0.095930)	0.040586 / 0.043533 (-0.002946)	0.296083 / 0.255139 (0.040944)	0.321870 / 0.283200 (0.038671)	0.087377 / 0.141683 (-0.054306)	1.466869 / 1.452155 (0.014715)	1.514763 / 1.492716 (0.022046)

Benchmark: benchmark_getitem_100B.json

metric	get_batch_of_1024_random_rows	get_batch_of_1024_rows	get_first_row	get_last_row
new / old (diff)	0.010272 / 0.018006 (-0.007734)	0.414645 / 0.000490 (0.414155)	0.003730 / 0.000200 (0.003530)	0.000076 / 0.000054 (0.000021)

Benchmark: benchmark_indices_mapping.json

metric	select	shard	shuffle	sort	train_test_split
new / old (diff)	0.024093 / 0.037411 (-0.013318)	0.098718 / 0.014526 (0.084192)	0.105526 / 0.176557 (-0.071030)	0.141578 / 0.737135 (-0.595557)	0.109679 / 0.296338 (-0.186660)

Benchmark: benchmark_iterating.json

metric	read 5000	read 50000	read_batch 50000 10	read_batch 50000 100	read_batch 50000 1000	read_formatted numpy 5000	read_formatted pandas 5000	read_formatted tensorflow 5000	read_formatted torch 5000	read_formatted_batch numpy 5000 10	read_formatted_batch numpy 5000 1000	shuffled read 5000	shuffled read 50000	shuffled read_batch 50000 10	shuffled read_batch 50000 100	shuffled read_batch 50000 1000	shuffled read_formatted numpy 5000	shuffled read_formatted_batch numpy 5000 10	shuffled read_formatted_batch numpy 5000 1000
new / old (diff)	0.412907 / 0.215209 (0.197698)	4.134934 / 2.077655 (2.057280)	1.881180 / 1.504120 (0.377060)	1.693207 / 1.541195 (0.152012)	1.753725 / 1.468490 (0.285235)	0.693077 / 4.584777 (-3.891700)	3.367409 / 3.745712 (-0.378303)	2.749035 / 5.269862 (-2.520827)	1.565015 / 4.565676 (-3.000662)	0.082609 / 0.424275 (-0.341666)	0.012500 / 0.007607 (0.004892)	0.523619 / 0.226044 (0.297575)	5.250188 / 2.268929 (2.981259)	2.314255 / 55.444624 (-53.130369)	1.962357 / 6.876477 (-4.914120)	2.020632 / 2.142072 (-0.121441)	0.812504 / 4.805227 (-3.992724)	0.149921 / 6.500664 (-6.350743)	0.065816 / 0.075469 (-0.009653)

Benchmark: benchmark_map_filter.json

metric	filter	map fast-tokenizer batched	map identity	map identity batched	map no-op batched	map no-op batched numpy	map no-op batched pandas	map no-op batched pytorch	map no-op batched tensorflow
new / old (diff)	1.230811 / 1.841788 (-0.610977)	14.008566 / 8.074308 (5.934258)	14.371285 / 10.191392 (4.179893)	0.166323 / 0.680424 (-0.514101)	0.029702 / 0.534201 (-0.504499)	0.408629 / 0.579283 (-0.170654)	0.410529 / 0.434364 (-0.023835)	0.484482 / 0.540337 (-0.055855)	0.572360 / 1.386936 (-0.814576)

PyArrow==latest

Benchmark: benchmark_array_xd.json

metric	read_batch_formatted_as_numpy after write_array2d	read_batch_formatted_as_numpy after write_flattened_sequence	read_batch_formatted_as_numpy after write_nested_sequence	read_batch_unformated after write_array2d	read_batch_unformated after write_flattened_sequence	read_batch_unformated after write_nested_sequence	read_col_formatted_as_numpy after write_array2d	read_col_formatted_as_numpy after write_flattened_sequence	read_col_formatted_as_numpy after write_nested_sequence	read_col_unformated after write_array2d	read_col_unformated after write_flattened_sequence	read_col_unformated after write_nested_sequence	read_formatted_as_numpy after write_array2d	read_formatted_as_numpy after write_flattened_sequence	read_formatted_as_numpy after write_nested_sequence	read_unformated after write_array2d	read_unformated after write_flattened_sequence	read_unformated after write_nested_sequence	write_array2d	write_flattened_sequence	write_nested_sequence
new / old (diff)	0.006873 / 0.011353 (-0.004480)	0.004609 / 0.011008 (-0.006400)	0.075492 / 0.038508 (0.036984)	0.028560 / 0.023109 (0.005450)	0.340321 / 0.275898 (0.064423)	0.376758 / 0.323480 (0.053278)	0.005271 / 0.007986 (-0.002715)	0.004786 / 0.004328 (0.000457)	0.074843 / 0.004250 (0.070592)	0.041072 / 0.037052 (0.004019)	0.339952 / 0.258489 (0.081463)	0.384375 / 0.293841 (0.090534)	0.031771 / 0.128546 (-0.096775)	0.011607 / 0.075646 (-0.064039)	0.084338 / 0.419271 (-0.334933)	0.042251 / 0.043533 (-0.001282)	0.338904 / 0.255139 (0.083765)	0.365360 / 0.283200 (0.082160)	0.093151 / 0.141683 (-0.048532)	1.449833 / 1.452155 (-0.002322)	1.601946 / 1.492716 (0.109229)

Benchmark: benchmark_getitem_100B.json

metric	get_batch_of_1024_random_rows	get_batch_of_1024_rows	get_first_row	get_last_row
new / old (diff)	0.225149 / 0.018006 (0.207142)	0.409855 / 0.000490 (0.409365)	0.000384 / 0.000200 (0.000184)	0.000060 / 0.000054 (0.000006)

Benchmark: benchmark_indices_mapping.json

metric	select	shard	shuffle	sort	train_test_split
new / old (diff)	0.025914 / 0.037411 (-0.011497)	0.100443 / 0.014526 (0.085917)	0.108557 / 0.176557 (-0.067999)	0.150338 / 0.737135 (-0.586798)	0.111472 / 0.296338 (-0.184866)

Benchmark: benchmark_iterating.json

metric	read 5000	read 50000	read_batch 50000 10	read_batch 50000 100	read_batch 50000 1000	read_formatted numpy 5000	read_formatted pandas 5000	read_formatted tensorflow 5000	read_formatted torch 5000	read_formatted_batch numpy 5000 10	read_formatted_batch numpy 5000 1000	shuffled read 5000	shuffled read 50000	shuffled read_batch 50000 10	shuffled read_batch 50000 100	shuffled read_batch 50000 1000	shuffled read_formatted numpy 5000	shuffled read_formatted_batch numpy 5000 10	shuffled read_formatted_batch numpy 5000 1000
new / old (diff)	0.440221 / 0.215209 (0.225012)	4.409268 / 2.077655 (2.331613)	2.096008 / 1.504120 (0.591888)	1.849443 / 1.541195 (0.308248)	1.934901 / 1.468490 (0.466410)	0.704072 / 4.584777 (-3.880705)	3.371370 / 3.745712 (-0.374343)	3.185478 / 5.269862 (-2.084384)	1.514541 / 4.565676 (-3.051135)	0.083724 / 0.424275 (-0.340551)	0.012674 / 0.007607 (0.005067)	0.542155 / 0.226044 (0.316111)	5.413456 / 2.268929 (3.144528)	2.508567 / 55.444624 (-52.936057)	2.163235 / 6.876477 (-4.713242)	2.193914 / 2.142072 (0.051842)	0.810955 / 4.805227 (-3.994272)	0.152769 / 6.500664 (-6.347895)	0.068009 / 0.075469 (-0.007460)

Benchmark: benchmark_map_filter.json

metric	filter	map fast-tokenizer batched	map identity	map identity batched	map no-op batched	map no-op batched numpy	map no-op batched pandas	map no-op batched pytorch	map no-op batched tensorflow
new / old (diff)	1.272511 / 1.841788 (-0.569276)	14.334861 / 8.074308 (6.260553)	13.555445 / 10.191392 (3.364053)	0.160520 / 0.680424 (-0.519904)	0.018363 / 0.534201 (-0.515838)	0.384937 / 0.579283 (-0.194346)	0.409138 / 0.434364 (-0.025225)	0.484037 / 0.540337 (-0.056300)	0.565595 / 1.386936 (-0.821341)

github-actions · 2023-02-08T14:38:34Z

PyArrow==6.0.0

Benchmark: benchmark_array_xd.json

metric	read_batch_formatted_as_numpy after write_array2d	read_batch_formatted_as_numpy after write_flattened_sequence	read_batch_formatted_as_numpy after write_nested_sequence	read_batch_unformated after write_array2d	read_batch_unformated after write_flattened_sequence	read_batch_unformated after write_nested_sequence	read_col_formatted_as_numpy after write_array2d	read_col_formatted_as_numpy after write_flattened_sequence	read_col_formatted_as_numpy after write_nested_sequence	read_col_unformated after write_array2d	read_col_unformated after write_flattened_sequence	read_col_unformated after write_nested_sequence	read_formatted_as_numpy after write_array2d	read_formatted_as_numpy after write_flattened_sequence	read_formatted_as_numpy after write_nested_sequence	read_unformated after write_array2d	read_unformated after write_flattened_sequence	read_unformated after write_nested_sequence	write_array2d	write_flattened_sequence	write_nested_sequence
new / old (diff)	0.010077 / 0.011353 (-0.001276)	0.005650 / 0.011008 (-0.005359)	0.101285 / 0.038508 (0.062777)	0.039571 / 0.023109 (0.016462)	0.291855 / 0.275898 (0.015957)	0.363582 / 0.323480 (0.040102)	0.008513 / 0.007986 (0.000527)	0.004472 / 0.004328 (0.000144)	0.077314 / 0.004250 (0.073064)	0.050707 / 0.037052 (0.013654)	0.317282 / 0.258489 (0.058792)	0.342348 / 0.293841 (0.048507)	0.042951 / 0.128546 (-0.085595)	0.012295 / 0.075646 (-0.063351)	0.337269 / 0.419271 (-0.082003)	0.048953 / 0.043533 (0.005420)	0.292547 / 0.255139 (0.037408)	0.325436 / 0.283200 (0.042236)	0.111859 / 0.141683 (-0.029824)	1.501958 / 1.452155 (0.049804)	1.522281 / 1.492716 (0.029565)

Benchmark: benchmark_getitem_100B.json

metric	get_batch_of_1024_random_rows	get_batch_of_1024_rows	get_first_row	get_last_row
new / old (diff)	0.011775 / 0.018006 (-0.006231)	0.513283 / 0.000490 (0.512793)	0.002941 / 0.000200 (0.002741)	0.000099 / 0.000054 (0.000044)

Benchmark: benchmark_indices_mapping.json

metric	select	shard	shuffle	sort	train_test_split
new / old (diff)	0.028702 / 0.037411 (-0.008710)	0.108465 / 0.014526 (0.093940)	0.121806 / 0.176557 (-0.054750)	0.158424 / 0.737135 (-0.578712)	0.128077 / 0.296338 (-0.168262)

Benchmark: benchmark_iterating.json

metric	read 5000	read 50000	read_batch 50000 10	read_batch 50000 100	read_batch 50000 1000	read_formatted numpy 5000	read_formatted pandas 5000	read_formatted tensorflow 5000	read_formatted torch 5000	read_formatted_batch numpy 5000 10	read_formatted_batch numpy 5000 1000	shuffled read 5000	shuffled read 50000	shuffled read_batch 50000 10	shuffled read_batch 50000 100	shuffled read_batch 50000 1000	shuffled read_formatted numpy 5000	shuffled read_formatted_batch numpy 5000 10	shuffled read_formatted_batch numpy 5000 1000
new / old (diff)	0.395392 / 0.215209 (0.180183)	3.944138 / 2.077655 (1.866483)	1.773698 / 1.504120 (0.269578)	1.588907 / 1.541195 (0.047712)	1.697794 / 1.468490 (0.229304)	0.690281 / 4.584777 (-3.894496)	3.819661 / 3.745712 (0.073948)	3.228006 / 5.269862 (-2.041856)	1.755625 / 4.565676 (-2.810052)	0.083169 / 0.424275 (-0.341106)	0.012337 / 0.007607 (0.004730)	0.504730 / 0.226044 (0.278686)	5.016916 / 2.268929 (2.747988)	2.245484 / 55.444624 (-53.199141)	1.911682 / 6.876477 (-4.964795)	1.957659 / 2.142072 (-0.184413)	0.818361 / 4.805227 (-3.986866)	0.162386 / 6.500664 (-6.338279)	0.062461 / 0.075469 (-0.013008)

Benchmark: benchmark_map_filter.json

metric	filter	map fast-tokenizer batched	map identity	map identity batched	map no-op batched	map no-op batched numpy	map no-op batched pandas	map no-op batched pytorch	map no-op batched tensorflow
new / old (diff)	1.197654 / 1.841788 (-0.644134)	15.465611 / 8.074308 (7.391303)	14.409126 / 10.191392 (4.217734)	0.171776 / 0.680424 (-0.508647)	0.028749 / 0.534201 (-0.505452)	0.439666 / 0.579283 (-0.139618)	0.445159 / 0.434364 (0.010795)	0.543992 / 0.540337 (0.003655)	0.643911 / 1.386936 (-0.743025)

PyArrow==latest

Benchmark: benchmark_array_xd.json

metric	read_batch_formatted_as_numpy after write_array2d	read_batch_formatted_as_numpy after write_flattened_sequence	read_batch_formatted_as_numpy after write_nested_sequence	read_batch_unformated after write_array2d	read_batch_unformated after write_flattened_sequence	read_batch_unformated after write_nested_sequence	read_col_formatted_as_numpy after write_array2d	read_col_formatted_as_numpy after write_flattened_sequence	read_col_formatted_as_numpy after write_nested_sequence	read_col_unformated after write_array2d	read_col_unformated after write_flattened_sequence	read_col_unformated after write_nested_sequence	read_formatted_as_numpy after write_array2d	read_formatted_as_numpy after write_flattened_sequence	read_formatted_as_numpy after write_nested_sequence	read_unformated after write_array2d	read_unformated after write_flattened_sequence	read_unformated after write_nested_sequence	write_array2d	write_flattened_sequence	write_nested_sequence
new / old (diff)	0.007036 / 0.011353 (-0.004317)	0.005273 / 0.011008 (-0.005735)	0.075314 / 0.038508 (0.036806)	0.033075 / 0.023109 (0.009966)	0.350133 / 0.275898 (0.074235)	0.399366 / 0.323480 (0.075886)	0.005945 / 0.007986 (-0.002041)	0.004276 / 0.004328 (-0.000052)	0.074975 / 0.004250 (0.070725)	0.051758 / 0.037052 (0.014706)	0.355077 / 0.258489 (0.096588)	0.430296 / 0.293841 (0.136455)	0.036257 / 0.128546 (-0.092290)	0.012376 / 0.075646 (-0.063270)	0.087441 / 0.419271 (-0.331830)	0.049066 / 0.043533 (0.005534)	0.339867 / 0.255139 (0.084728)	0.384379 / 0.283200 (0.101179)	0.104843 / 0.141683 (-0.036840)	1.498897 / 1.452155 (0.046742)	1.551400 / 1.492716 (0.058684)

Benchmark: benchmark_getitem_100B.json

metric	get_batch_of_1024_random_rows	get_batch_of_1024_rows	get_first_row	get_last_row
new / old (diff)	0.334504 / 0.018006 (0.316498)	0.516551 / 0.000490 (0.516061)	0.000450 / 0.000200 (0.000250)	0.000057 / 0.000054 (0.000003)

Benchmark: benchmark_indices_mapping.json

metric	select	shard	shuffle	sort	train_test_split
new / old (diff)	0.029313 / 0.037411 (-0.008099)	0.110667 / 0.014526 (0.096141)	0.124001 / 0.176557 (-0.052556)	0.159154 / 0.737135 (-0.577981)	0.129503 / 0.296338 (-0.166836)

Benchmark: benchmark_iterating.json

metric	read 5000	read 50000	read_batch 50000 10	read_batch 50000 100	read_batch 50000 1000	read_formatted numpy 5000	read_formatted pandas 5000	read_formatted tensorflow 5000	read_formatted torch 5000	read_formatted_batch numpy 5000 10	read_formatted_batch numpy 5000 1000	shuffled read 5000	shuffled read 50000	shuffled read_batch 50000 10	shuffled read_batch 50000 100	shuffled read_batch 50000 1000	shuffled read_formatted numpy 5000	shuffled read_formatted_batch numpy 5000 10	shuffled read_formatted_batch numpy 5000 1000
new / old (diff)	0.416749 / 0.215209 (0.201540)	4.171163 / 2.077655 (2.093508)	1.981071 / 1.504120 (0.476951)	1.788303 / 1.541195 (0.247108)	1.912118 / 1.468490 (0.443628)	0.708764 / 4.584777 (-3.876013)	3.815222 / 3.745712 (0.069510)	2.121633 / 5.269862 (-3.148229)	1.347866 / 4.565676 (-3.217811)	0.086340 / 0.424275 (-0.337935)	0.012646 / 0.007607 (0.005039)	0.525286 / 0.226044 (0.299241)	5.254922 / 2.268929 (2.985994)	2.488743 / 55.444624 (-52.955881)	2.128069 / 6.876477 (-4.748408)	2.180358 / 2.142072 (0.038286)	0.841011 / 4.805227 (-3.964216)	0.168732 / 6.500664 (-6.331932)	0.065559 / 0.075469 (-0.009910)

Benchmark: benchmark_map_filter.json

metric	filter	map fast-tokenizer batched	map identity	map identity batched	map no-op batched	map no-op batched numpy	map no-op batched pandas	map no-op batched pytorch	map no-op batched tensorflow
new / old (diff)	1.270518 / 1.841788 (-0.571270)	15.557563 / 8.074308 (7.483255)	13.660757 / 10.191392 (3.469365)	0.185636 / 0.680424 (-0.494788)	0.018152 / 0.534201 (-0.516049)	0.423553 / 0.579283 (-0.155730)	0.412718 / 0.434364 (-0.021646)	0.528455 / 0.540337 (-0.011882)	0.635274 / 1.386936 (-0.751662)

github-actions · 2023-02-08T16:16:03Z

Show benchmarks

PyArrow==6.0.0

Show updated benchmarks!

Benchmark: benchmark_array_xd.json

metric	read_batch_formatted_as_numpy after write_array2d	read_batch_formatted_as_numpy after write_flattened_sequence	read_batch_formatted_as_numpy after write_nested_sequence	read_batch_unformated after write_array2d	read_batch_unformated after write_flattened_sequence	read_batch_unformated after write_nested_sequence	read_col_formatted_as_numpy after write_array2d	read_col_formatted_as_numpy after write_flattened_sequence	read_col_formatted_as_numpy after write_nested_sequence	read_col_unformated after write_array2d	read_col_unformated after write_flattened_sequence	read_col_unformated after write_nested_sequence	read_formatted_as_numpy after write_array2d	read_formatted_as_numpy after write_flattened_sequence	read_formatted_as_numpy after write_nested_sequence	read_unformated after write_array2d	read_unformated after write_flattened_sequence	read_unformated after write_nested_sequence	write_array2d	write_flattened_sequence	write_nested_sequence
new / old (diff)	0.011194 / 0.011353 (-0.000159)	0.006344 / 0.011008 (-0.004664)	0.122013 / 0.038508 (0.083505)	0.044323 / 0.023109 (0.021214)	0.356665 / 0.275898 (0.080767)	0.439871 / 0.323480 (0.116391)	0.010694 / 0.007986 (0.002709)	0.004648 / 0.004328 (0.000320)	0.091140 / 0.004250 (0.086890)	0.052457 / 0.037052 (0.015404)	0.369282 / 0.258489 (0.110793)	0.403279 / 0.293841 (0.109438)	0.054075 / 0.128546 (-0.074472)	0.014484 / 0.075646 (-0.061162)	0.407932 / 0.419271 (-0.011340)	0.060681 / 0.043533 (0.017148)	0.350889 / 0.255139 (0.095750)	0.392041 / 0.283200 (0.108841)	0.121252 / 0.141683 (-0.020431)	1.809527 / 1.452155 (0.357373)	1.835141 / 1.492716 (0.342425)

Benchmark: benchmark_getitem_100B.json

metric	get_batch_of_1024_random_rows	get_batch_of_1024_rows	get_first_row	get_last_row
new / old (diff)	0.227372 / 0.018006 (0.209366)	0.481908 / 0.000490 (0.481418)	0.007262 / 0.000200 (0.007062)	0.000148 / 0.000054 (0.000093)

Benchmark: benchmark_indices_mapping.json

metric	select	shard	shuffle	sort	train_test_split
new / old (diff)	0.031039 / 0.037411 (-0.006372)	0.133947 / 0.014526 (0.119421)	0.141935 / 0.176557 (-0.034622)	0.197854 / 0.737135 (-0.539281)	0.152393 / 0.296338 (-0.143945)

Benchmark: benchmark_iterating.json

metric	read 5000	read 50000	read_batch 50000 10	read_batch 50000 100	read_batch 50000 1000	read_formatted numpy 5000	read_formatted pandas 5000	read_formatted tensorflow 5000	read_formatted torch 5000	read_formatted_batch numpy 5000 10	read_formatted_batch numpy 5000 1000	shuffled read 5000	shuffled read 50000	shuffled read_batch 50000 10	shuffled read_batch 50000 100	shuffled read_batch 50000 1000	shuffled read_formatted numpy 5000	shuffled read_formatted_batch numpy 5000 10	shuffled read_formatted_batch numpy 5000 1000
new / old (diff)	0.517400 / 0.215209 (0.302191)	4.899972 / 2.077655 (2.822317)	2.171023 / 1.504120 (0.666903)	2.008706 / 1.541195 (0.467511)	1.988777 / 1.468490 (0.520287)	0.859872 / 4.584777 (-3.724905)	4.673923 / 3.745712 (0.928211)	2.703189 / 5.269862 (-2.566672)	1.891680 / 4.565676 (-2.673997)	0.109601 / 0.424275 (-0.314674)	0.014622 / 0.007607 (0.007015)	0.618990 / 0.226044 (0.392946)	6.255608 / 2.268929 (3.986679)	2.822199 / 55.444624 (-52.622425)	2.457684 / 6.876477 (-4.418793)	2.500041 / 2.142072 (0.357968)	1.054529 / 4.805227 (-3.750698)	0.209501 / 6.500664 (-6.291163)	0.074929 / 0.075469 (-0.000540)

Benchmark: benchmark_map_filter.json

metric	filter	map fast-tokenizer batched	map identity	map identity batched	map no-op batched	map no-op batched numpy	map no-op batched pandas	map no-op batched pytorch	map no-op batched tensorflow
new / old (diff)	1.532780 / 1.841788 (-0.309008)	19.159455 / 8.074308 (11.085147)	17.817063 / 10.191392 (7.625671)	0.194078 / 0.680424 (-0.486346)	0.038211 / 0.534201 (-0.495990)	0.537366 / 0.579283 (-0.041917)	0.538995 / 0.434364 (0.104631)	0.679431 / 0.540337 (0.139094)	0.801960 / 1.386936 (-0.584976)

PyArrow==latest

Show updated benchmarks!

Benchmark: benchmark_array_xd.json

metric	read_batch_formatted_as_numpy after write_array2d	read_batch_formatted_as_numpy after write_flattened_sequence	read_batch_formatted_as_numpy after write_nested_sequence	read_batch_unformated after write_array2d	read_batch_unformated after write_flattened_sequence	read_batch_unformated after write_nested_sequence	read_col_formatted_as_numpy after write_array2d	read_col_formatted_as_numpy after write_flattened_sequence	read_col_formatted_as_numpy after write_nested_sequence	read_col_unformated after write_array2d	read_col_unformated after write_flattened_sequence	read_col_unformated after write_nested_sequence	read_formatted_as_numpy after write_array2d	read_formatted_as_numpy after write_flattened_sequence	read_formatted_as_numpy after write_nested_sequence	read_unformated after write_array2d	read_unformated after write_flattened_sequence	read_unformated after write_nested_sequence	write_array2d	write_flattened_sequence	write_nested_sequence
new / old (diff)	0.008729 / 0.011353 (-0.002624)	0.005711 / 0.011008 (-0.005297)	0.091570 / 0.038508 (0.053062)	0.039805 / 0.023109 (0.016696)	0.413507 / 0.275898 (0.137609)	0.456342 / 0.323480 (0.132862)	0.006201 / 0.007986 (-0.001785)	0.009700 / 0.004328 (0.005372)	0.089146 / 0.004250 (0.084896)	0.057543 / 0.037052 (0.020490)	0.420806 / 0.258489 (0.162317)	0.471962 / 0.293841 (0.178121)	0.043940 / 0.128546 (-0.084606)	0.014457 / 0.075646 (-0.061190)	0.106674 / 0.419271 (-0.312598)	0.058930 / 0.043533 (0.015397)	0.419111 / 0.255139 (0.163972)	0.452974 / 0.283200 (0.169774)	0.124573 / 0.141683 (-0.017110)	1.864753 / 1.452155 (0.412599)	1.935387 / 1.492716 (0.442670)

Benchmark: benchmark_getitem_100B.json

metric	get_batch_of_1024_random_rows	get_batch_of_1024_rows	get_first_row	get_last_row
new / old (diff)	0.275657 / 0.018006 (0.257651)	0.498096 / 0.000490 (0.497606)	0.000480 / 0.000200 (0.000280)	0.000066 / 0.000054 (0.000012)

Benchmark: benchmark_indices_mapping.json

metric	select	shard	shuffle	sort	train_test_split
new / old (diff)	0.034377 / 0.037411 (-0.003035)	0.138050 / 0.014526 (0.123524)	0.153718 / 0.176557 (-0.022838)	0.201445 / 0.737135 (-0.535690)	0.160346 / 0.296338 (-0.135992)

Benchmark: benchmark_iterating.json

metric	read 5000	read 50000	read_batch 50000 10	read_batch 50000 100	read_batch 50000 1000	read_formatted numpy 5000	read_formatted pandas 5000	read_formatted tensorflow 5000	read_formatted torch 5000	read_formatted_batch numpy 5000 10	read_formatted_batch numpy 5000 1000	shuffled read 5000	shuffled read 50000	shuffled read_batch 50000 10	shuffled read_batch 50000 100	shuffled read_batch 50000 1000	shuffled read_formatted numpy 5000	shuffled read_formatted_batch numpy 5000 10	shuffled read_formatted_batch numpy 5000 1000
new / old (diff)	0.540670 / 0.215209 (0.325461)	5.376291 / 2.077655 (3.298636)	2.581799 / 1.504120 (1.077679)	2.328858 / 1.541195 (0.787663)	2.446458 / 1.468490 (0.977968)	0.923005 / 4.584777 (-3.661772)	4.815977 / 3.745712 (1.070265)	4.205725 / 5.269862 (-1.064137)	2.400466 / 4.565676 (-2.165211)	0.107207 / 0.424275 (-0.317068)	0.015427 / 0.007607 (0.007819)	0.657267 / 0.226044 (0.431222)	6.491256 / 2.268929 (4.222327)	3.179099 / 55.444624 (-52.265525)	2.722434 / 6.876477 (-4.154042)	2.788202 / 2.142072 (0.646129)	1.060016 / 4.805227 (-3.745211)	0.206899 / 6.500664 (-6.293766)	0.077868 / 0.075469 (0.002399)

Benchmark: benchmark_map_filter.json

metric	filter	map fast-tokenizer batched	map identity	map identity batched	map no-op batched	map no-op batched numpy	map no-op batched pandas	map no-op batched pytorch	map no-op batched tensorflow
new / old (diff)	1.567894 / 1.841788 (-0.273893)	19.314330 / 8.074308 (11.240022)	17.597614 / 10.191392 (7.406222)	0.195777 / 0.680424 (-0.484647)	0.022160 / 0.534201 (-0.512041)	0.530592 / 0.579283 (-0.048691)	0.508591 / 0.434364 (0.074227)	0.619794 / 0.540337 (0.079457)	0.749773 / 1.386936 (-0.637163)

github-actions · 2023-02-09T14:44:29Z

Show benchmarks

PyArrow==6.0.0

Show updated benchmarks!

Benchmark: benchmark_array_xd.json

metric	read_batch_formatted_as_numpy after write_array2d	read_batch_formatted_as_numpy after write_flattened_sequence	read_batch_formatted_as_numpy after write_nested_sequence	read_batch_unformated after write_array2d	read_batch_unformated after write_flattened_sequence	read_batch_unformated after write_nested_sequence	read_col_formatted_as_numpy after write_array2d	read_col_formatted_as_numpy after write_flattened_sequence	read_col_formatted_as_numpy after write_nested_sequence	read_col_unformated after write_array2d	read_col_unformated after write_flattened_sequence	read_col_unformated after write_nested_sequence	read_formatted_as_numpy after write_array2d	read_formatted_as_numpy after write_flattened_sequence	read_formatted_as_numpy after write_nested_sequence	read_unformated after write_array2d	read_unformated after write_flattened_sequence	read_unformated after write_nested_sequence	write_array2d	write_flattened_sequence	write_nested_sequence
new / old (diff)	0.012431 / 0.011353 (0.001078)	0.006526 / 0.011008 (-0.004482)	0.132266 / 0.038508 (0.093757)	0.043199 / 0.023109 (0.020089)	0.405230 / 0.275898 (0.129332)	0.494643 / 0.323480 (0.171163)	0.009927 / 0.007986 (0.001941)	0.005227 / 0.004328 (0.000899)	0.110914 / 0.004250 (0.106664)	0.047815 / 0.037052 (0.010763)	0.419099 / 0.258489 (0.160610)	0.463405 / 0.293841 (0.169564)	0.057858 / 0.128546 (-0.070688)	0.018918 / 0.075646 (-0.056728)	0.450584 / 0.419271 (0.031313)	0.060457 / 0.043533 (0.016924)	0.408234 / 0.255139 (0.153095)	0.433722 / 0.283200 (0.150523)	0.119403 / 0.141683 (-0.022280)	1.966742 / 1.452155 (0.514587)	1.980685 / 1.492716 (0.487969)

Benchmark: benchmark_getitem_100B.json

metric	get_batch_of_1024_random_rows	get_batch_of_1024_rows	get_first_row	get_last_row
new / old (diff)	0.292853 / 0.018006 (0.274847)	0.619697 / 0.000490 (0.619207)	0.002135 / 0.000200 (0.001935)	0.000117 / 0.000054 (0.000062)

Benchmark: benchmark_indices_mapping.json

metric	select	shard	shuffle	sort	train_test_split
new / old (diff)	0.031283 / 0.037411 (-0.006129)	0.128649 / 0.014526 (0.114123)	0.150116 / 0.176557 (-0.026441)	0.187605 / 0.737135 (-0.549530)	0.153334 / 0.296338 (-0.143005)

Benchmark: benchmark_iterating.json

metric	read 5000	read 50000	read_batch 50000 10	read_batch 50000 100	read_batch 50000 1000	read_formatted numpy 5000	read_formatted pandas 5000	read_formatted tensorflow 5000	read_formatted torch 5000	read_formatted_batch numpy 5000 10	read_formatted_batch numpy 5000 1000	shuffled read 5000	shuffled read 50000	shuffled read_batch 50000 10	shuffled read_batch 50000 100	shuffled read_batch 50000 1000	shuffled read_formatted numpy 5000	shuffled read_formatted_batch numpy 5000 10	shuffled read_formatted_batch numpy 5000 1000
new / old (diff)	0.659660 / 0.215209 (0.444451)	6.459749 / 2.077655 (4.382094)	2.764566 / 1.504120 (1.260446)	2.362630 / 1.541195 (0.821435)	2.426421 / 1.468490 (0.957931)	1.282407 / 4.584777 (-3.302370)	5.668865 / 3.745712 (1.923153)	3.236255 / 5.269862 (-2.033606)	2.248836 / 4.565676 (-2.316841)	0.145861 / 0.424275 (-0.278414)	0.015707 / 0.007607 (0.008100)	0.805218 / 0.226044 (0.579174)	8.146831 / 2.268929 (5.877903)	3.506283 / 55.444624 (-51.938341)	2.736682 / 6.876477 (-4.139795)	2.959039 / 2.142072 (0.816967)	1.528428 / 4.805227 (-3.276799)	0.270980 / 6.500664 (-6.229684)	0.086824 / 0.075469 (0.011355)

Benchmark: benchmark_map_filter.json

metric	filter	map fast-tokenizer batched	map identity	map identity batched	map no-op batched	map no-op batched numpy	map no-op batched pandas	map no-op batched pytorch	map no-op batched tensorflow
new / old (diff)	1.682506 / 1.841788 (-0.159282)	18.844103 / 8.074308 (10.769795)	21.008471 / 10.191392 (10.817079)	0.258372 / 0.680424 (-0.422052)	0.046505 / 0.534201 (-0.487696)	0.574760 / 0.579283 (-0.004523)	0.663745 / 0.434364 (0.229381)	0.702411 / 0.540337 (0.162074)	0.824024 / 1.386936 (-0.562912)

PyArrow==latest

Show updated benchmarks!

Benchmark: benchmark_array_xd.json

metric	read_batch_formatted_as_numpy after write_array2d	read_batch_formatted_as_numpy after write_flattened_sequence	read_batch_formatted_as_numpy after write_nested_sequence	read_batch_unformated after write_array2d	read_batch_unformated after write_flattened_sequence	read_batch_unformated after write_nested_sequence	read_col_formatted_as_numpy after write_array2d	read_col_formatted_as_numpy after write_flattened_sequence	read_col_formatted_as_numpy after write_nested_sequence	read_col_unformated after write_array2d	read_col_unformated after write_flattened_sequence	read_col_unformated after write_nested_sequence	read_formatted_as_numpy after write_array2d	read_formatted_as_numpy after write_flattened_sequence	read_formatted_as_numpy after write_nested_sequence	read_unformated after write_array2d	read_unformated after write_flattened_sequence	read_unformated after write_nested_sequence	write_array2d	write_flattened_sequence	write_nested_sequence
new / old (diff)	0.010016 / 0.011353 (-0.001337)	0.007459 / 0.011008 (-0.003549)	0.103954 / 0.038508 (0.065446)	0.036363 / 0.023109 (0.013254)	0.464079 / 0.275898 (0.188181)	0.504730 / 0.323480 (0.181250)	0.007865 / 0.007986 (-0.000121)	0.005210 / 0.004328 (0.000882)	0.105018 / 0.004250 (0.100767)	0.062191 / 0.037052 (0.025139)	0.483304 / 0.258489 (0.224815)	0.547030 / 0.293841 (0.253189)	0.055436 / 0.128546 (-0.073110)	0.021073 / 0.075646 (-0.054573)	0.120952 / 0.419271 (-0.298319)	0.075593 / 0.043533 (0.032060)	0.459930 / 0.255139 (0.204791)	0.486924 / 0.283200 (0.203724)	0.129465 / 0.141683 (-0.012218)	1.902322 / 1.452155 (0.450167)	1.980809 / 1.492716 (0.488092)

Benchmark: benchmark_getitem_100B.json

metric	get_batch_of_1024_random_rows	get_batch_of_1024_rows	get_first_row	get_last_row
new / old (diff)	0.259263 / 0.018006 (0.241257)	0.596703 / 0.000490 (0.596213)	0.004520 / 0.000200 (0.004320)	0.000124 / 0.000054 (0.000070)

Benchmark: benchmark_indices_mapping.json

metric	select	shard	shuffle	sort	train_test_split
new / old (diff)	0.032802 / 0.037411 (-0.004609)	0.138751 / 0.014526 (0.124225)	0.147106 / 0.176557 (-0.029451)	0.194791 / 0.737135 (-0.542345)	0.152643 / 0.296338 (-0.143696)

Benchmark: benchmark_iterating.json

metric	read 5000	read 50000	read_batch 50000 10	read_batch 50000 100	read_batch 50000 1000	read_formatted numpy 5000	read_formatted pandas 5000	read_formatted tensorflow 5000	read_formatted torch 5000	read_formatted_batch numpy 5000 10	read_formatted_batch numpy 5000 1000	shuffled read 5000	shuffled read 50000	shuffled read_batch 50000 10	shuffled read_batch 50000 100	shuffled read_batch 50000 1000	shuffled read_formatted numpy 5000	shuffled read_formatted_batch numpy 5000 10	shuffled read_formatted_batch numpy 5000 1000
new / old (diff)	0.678455 / 0.215209 (0.463246)	6.673643 / 2.077655 (4.595989)	2.943368 / 1.504120 (1.439248)	2.591223 / 1.541195 (1.050029)	2.741097 / 1.468490 (1.272607)	1.261178 / 4.584777 (-3.323599)	5.773853 / 3.745712 (2.028141)	3.171559 / 5.269862 (-2.098303)	2.124898 / 4.565676 (-2.440779)	0.161849 / 0.424275 (-0.262426)	0.015498 / 0.007607 (0.007891)	0.857984 / 0.226044 (0.631940)	8.456946 / 2.268929 (6.188018)	3.818787 / 55.444624 (-51.625837)	3.009953 / 6.876477 (-3.866523)	3.113006 / 2.142072 (0.970934)	1.477299 / 4.805227 (-3.327929)	0.267207 / 6.500664 (-6.233457)	0.087590 / 0.075469 (0.012121)

Benchmark: benchmark_map_filter.json

metric	filter	map fast-tokenizer batched	map identity	map identity batched	map no-op batched	map no-op batched numpy	map no-op batched pandas	map no-op batched pytorch	map no-op batched tensorflow
new / old (diff)	1.757389 / 1.841788 (-0.084398)	19.287690 / 8.074308 (11.213381)	21.601991 / 10.191392 (11.410599)	0.260464 / 0.680424 (-0.419960)	0.028552 / 0.534201 (-0.505649)	0.558934 / 0.579283 (-0.020349)	0.673651 / 0.434364 (0.239287)	0.714448 / 0.540337 (0.174111)	0.857608 / 1.386936 (-0.529328)

lhoestq · 2023-02-09T16:28:24Z

Ready for review @mariosasko, LMKWYT :)

Sorry it took me a few tries to fix the CI. I ended up not using the latest torch version in the CI.
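For readers unfamiliar with the hook this PR adds, here is a minimal sketch of the idea in plain Python. It assumes a loader that prefers a dataset's `__getitems__` (one call per batch) over repeated `__getitem__` calls, which is how newer PyTorch `DataLoader` versions behave as far as I know. `ToyDataset` and `fetch` are hypothetical stand-ins, not the `datasets` or PyTorch implementation.

```python
from typing import Dict, List


class ToyDataset:
    """Hypothetical column-oriented, map-style dataset (illustration only)."""

    def __init__(self, columns: Dict[str, list]):
        self.columns = columns
        self.batched_reads = 0  # counts how many batched lookups were made

    def __len__(self) -> int:
        return len(next(iter(self.columns.values())))

    def __getitem__(self, idx: int) -> dict:
        # One lookup per example: the slow path when a batch needs many rows.
        return {name: col[idx] for name, col in self.columns.items()}

    def __getitems__(self, indices: List[int]) -> List[dict]:
        # One batched read per column, then split back into per-example dicts
        # so the result has the same shape as a list of __getitem__ results.
        self.batched_reads += 1
        batch = {name: [col[i] for i in indices] for name, col in self.columns.items()}
        return [{name: batch[name][k] for name in batch} for k in range(len(indices))]


def fetch(dataset, indices: List[int]) -> List[dict]:
    """Illustrative fetch step: use __getitems__ when the dataset defines it."""
    if hasattr(dataset, "__getitems__"):
        return dataset.__getitems__(indices)
    return [dataset[i] for i in indices]


ds = ToyDataset({"x": [10, 11, 12, 13], "y": ["a", "b", "c", "d"]})
batch = fetch(ds, [3, 0, 2])
```

The batched path returns the same examples as indexing one row at a time. Any speedup would come from the underlying storage serving a whole batch in one query, which this toy class does not model.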

mariosasko

Thanks!

tests/test_arrow_dataset.py

Co-authored-by: Mario Šaško <mariosasko777@gmail.com>

github-actions · 2023-02-19T18:35:09Z

Show benchmarks

PyArrow==6.0.0

Show updated benchmarks!

Benchmark: benchmark_array_xd.json

metric	read_batch_formatted_as_numpy after write_array2d	read_batch_formatted_as_numpy after write_flattened_sequence	read_batch_formatted_as_numpy after write_nested_sequence	read_batch_unformated after write_array2d	read_batch_unformated after write_flattened_sequence	read_batch_unformated after write_nested_sequence	read_col_formatted_as_numpy after write_array2d	read_col_formatted_as_numpy after write_flattened_sequence	read_col_formatted_as_numpy after write_nested_sequence	read_col_unformated after write_array2d	read_col_unformated after write_flattened_sequence	read_col_unformated after write_nested_sequence	read_formatted_as_numpy after write_array2d	read_formatted_as_numpy after write_flattened_sequence	read_formatted_as_numpy after write_nested_sequence	read_unformated after write_array2d	read_unformated after write_flattened_sequence	read_unformated after write_nested_sequence	write_array2d	write_flattened_sequence	write_nested_sequence
new / old (diff)	0.009474 / 0.011353 (-0.001878)	0.005507 / 0.011008 (-0.005501)	0.101219 / 0.038508 (0.062711)	0.035591 / 0.023109 (0.012481)	0.305841 / 0.275898 (0.029943)	0.339135 / 0.323480 (0.015656)	0.007920 / 0.007986 (-0.000066)	0.004252 / 0.004328 (-0.000077)	0.076912 / 0.004250 (0.072662)	0.041923 / 0.037052 (0.004871)	0.301405 / 0.258489 (0.042916)	0.356488 / 0.293841 (0.062647)	0.039342 / 0.128546 (-0.089204)	0.012711 / 0.075646 (-0.062935)	0.334193 / 0.419271 (-0.085079)	0.049112 / 0.043533 (0.005579)	0.301484 / 0.255139 (0.046345)	0.315306 / 0.283200 (0.032106)	0.102959 / 0.141683 (-0.038724)	1.420677 / 1.452155 (-0.031478)	1.549493 / 1.492716 (0.056777)

Benchmark: benchmark_getitem_100B.json

metric	get_batch_of_1024_random_rows	get_batch_of_1024_rows	get_first_row	get_last_row
new / old (diff)	0.284639 / 0.018006 (0.266633)	0.501226 / 0.000490 (0.500736)	0.004328 / 0.000200 (0.004128)	0.000091 / 0.000054 (0.000036)

Benchmark: benchmark_indices_mapping.json

metric	select	shard	shuffle	sort	train_test_split
new / old (diff)	0.027034 / 0.037411 (-0.010377)	0.108066 / 0.014526 (0.093540)	0.122106 / 0.176557 (-0.054451)	0.162908 / 0.737135 (-0.574227)	0.127233 / 0.296338 (-0.169105)

Benchmark: benchmark_iterating.json

metric	read 5000	read 50000	read_batch 50000 10	read_batch 50000 100	read_batch 50000 1000	read_formatted numpy 5000	read_formatted pandas 5000	read_formatted tensorflow 5000	read_formatted torch 5000	read_formatted_batch numpy 5000 10	read_formatted_batch numpy 5000 1000	shuffled read 5000	shuffled read 50000	shuffled read_batch 50000 10	shuffled read_batch 50000 100	shuffled read_batch 50000 1000	shuffled read_formatted numpy 5000	shuffled read_formatted_batch numpy 5000 10	shuffled read_formatted_batch numpy 5000 1000
new / old (diff)	0.394023 / 0.215209 (0.178813)	3.932729 / 2.077655 (1.855075)	1.771195 / 1.504120 (0.267075)	1.582788 / 1.541195 (0.041594)	1.703219 / 1.468490 (0.234728)	0.702629 / 4.584777 (-3.882148)	3.780187 / 3.745712 (0.034475)	2.180433 / 5.269862 (-3.089428)	1.504806 / 4.565676 (-3.060871)	0.085289 / 0.424275 (-0.338986)	0.012580 / 0.007607 (0.004973)	0.515408 / 0.226044 (0.289363)	5.010613 / 2.268929 (2.741685)	2.256648 / 55.444624 (-53.187976)	1.914971 / 6.876477 (-4.961505)	2.038436 / 2.142072 (-0.103636)	0.846240 / 4.805227 (-3.958987)	0.164920 / 6.500664 (-6.335744)	0.063899 / 0.075469 (-0.011570)

Benchmark: benchmark_map_filter.json

metric	filter	map fast-tokenizer batched	map identity	map identity batched	map no-op batched	map no-op batched numpy	map no-op batched pandas	map no-op batched pytorch	map no-op batched tensorflow
new / old (diff)	1.224160 / 1.841788 (-0.617627)	15.089995 / 8.074308 (7.015687)	14.777003 / 10.191392 (4.585611)	0.169873 / 0.680424 (-0.510551)	0.029233 / 0.534201 (-0.504968)	0.445424 / 0.579283 (-0.133859)	0.439194 / 0.434364 (0.004830)	0.536370 / 0.540337 (-0.003968)	0.636694 / 1.386936 (-0.750242)

PyArrow==latest

Show updated benchmarks!

Benchmark: benchmark_array_xd.json

metric	read_batch_formatted_as_numpy after write_array2d	read_batch_formatted_as_numpy after write_flattened_sequence	read_batch_formatted_as_numpy after write_nested_sequence	read_batch_unformated after write_array2d	read_batch_unformated after write_flattened_sequence	read_batch_unformated after write_nested_sequence	read_col_formatted_as_numpy after write_array2d	read_col_formatted_as_numpy after write_flattened_sequence	read_col_formatted_as_numpy after write_nested_sequence	read_col_unformated after write_array2d	read_col_unformated after write_flattened_sequence	read_col_unformated after write_nested_sequence	read_formatted_as_numpy after write_array2d	read_formatted_as_numpy after write_flattened_sequence	read_formatted_as_numpy after write_nested_sequence	read_unformated after write_array2d	read_unformated after write_flattened_sequence	read_unformated after write_nested_sequence	write_array2d	write_flattened_sequence	write_nested_sequence
new / old (diff)	0.008230 / 0.011353 (-0.003122)	0.005499 / 0.011008 (-0.005509)	0.076108 / 0.038508 (0.037600)	0.037444 / 0.023109 (0.014335)	0.364420 / 0.275898 (0.088522)	0.412308 / 0.323480 (0.088828)	0.006704 / 0.007986 (-0.001282)	0.004359 / 0.004328 (0.000031)	0.075080 / 0.004250 (0.070830)	0.057698 / 0.037052 (0.020646)	0.366088 / 0.258489 (0.107599)	0.409583 / 0.293841 (0.115742)	0.037882 / 0.128546 (-0.090664)	0.012421 / 0.075646 (-0.063225)	0.087701 / 0.419271 (-0.331571)	0.050669 / 0.043533 (0.007136)	0.351139 / 0.255139 (0.096000)	0.384340 / 0.283200 (0.101140)	0.108097 / 0.141683 (-0.033586)	1.445010 / 1.452155 (-0.007145)	1.559570 / 1.492716 (0.066853)

Benchmark: benchmark_getitem_100B.json

metric	get_batch_of_1024_random_rows	get_batch_of_1024_rows	get_first_row	get_last_row
new / old (diff)	0.324114 / 0.018006 (0.306108)	0.549134 / 0.000490 (0.548644)	0.003544 / 0.000200 (0.003344)	0.000097 / 0.000054 (0.000042)

Benchmark: benchmark_indices_mapping.json

metric	select	shard	shuffle	sort	train_test_split
new / old (diff)	0.030646 / 0.037411 (-0.006765)	0.108573 / 0.014526 (0.094047)	0.125291 / 0.176557 (-0.051266)	0.174798 / 0.737135 (-0.562338)	0.128000 / 0.296338 (-0.168338)

Benchmark: benchmark_iterating.json

metric	read 5000	read 50000	read_batch 50000 10	read_batch 50000 100	read_batch 50000 1000	read_formatted numpy 5000	read_formatted pandas 5000	read_formatted tensorflow 5000	read_formatted torch 5000	read_formatted_batch numpy 5000 10	read_formatted_batch numpy 5000 1000	shuffled read 5000	shuffled read 50000	shuffled read_batch 50000 10	shuffled read_batch 50000 100	shuffled read_batch 50000 1000	shuffled read_formatted numpy 5000	shuffled read_formatted_batch numpy 5000 10	shuffled read_formatted_batch numpy 5000 1000
new / old (diff)	0.428881 / 0.215209 (0.213672)	4.282320 / 2.077655 (2.204665)	2.061462 / 1.504120 (0.557342)	1.858477 / 1.541195 (0.317283)	1.971646 / 1.468490 (0.503156)	0.723631 / 4.584777 (-3.861146)	3.822376 / 3.745712 (0.076664)	2.174427 / 5.269862 (-3.095434)	1.386066 / 4.565676 (-3.179611)	0.088391 / 0.424275 (-0.335884)	0.012948 / 0.007607 (0.005341)	0.524423 / 0.226044 (0.298378)	5.249389 / 2.268929 (2.980460)	2.528662 / 55.444624 (-52.915962)	2.245329 / 6.876477 (-4.631147)	2.402733 / 2.142072 (0.260660)	0.868864 / 4.805227 (-3.936364)	0.174066 / 6.500664 (-6.326598)	0.066165 / 0.075469 (-0.009304)

Benchmark: benchmark_map_filter.json

metric	filter	map fast-tokenizer batched	map identity	map identity batched	map no-op batched	map no-op batched numpy	map no-op batched pandas	map no-op batched pytorch	map no-op batched tensorflow
new / old (diff)	1.296922 / 1.841788 (-0.544865)	15.814109 / 8.074308 (7.739801)	14.086059 / 10.191392 (3.894667)	0.190952 / 0.680424 (-0.489472)	0.017679 / 0.534201 (-0.516522)	0.428872 / 0.579283 (-0.150411)	0.435399 / 0.434364 (0.001035)	0.540856 / 0.540337 (0.000519)	0.648904 / 1.386936 (-0.738032)

* speed up batched torch dataloader
* use latest torch
* style
* fix
* update torchaudio as well
* dont use latest torch in CI
* Update tests/test_arrow_dataset.py

Co-authored-by: Mario Šaško <mariosasko777@gmail.com>

---------

Co-authored-by: Mario Šaško <mariosasko777@gmail.com>

This reverts commit 41086b1.

speed up batched torch dataloader

4f3c152

lhoestq requested a review from mariosasko February 8, 2023 13:39

use latest torch

12be850

style

23f076e

fix

d40f05e

lhoestq marked this pull request as draft February 8, 2023 16:06

update torchaudio as well

8637141

dont use latest torch in CI

2d3bd01

lhoestq marked this pull request as ready for review February 9, 2023 16:27

mariosasko approved these changes Feb 9, 2023

tests/test_arrow_dataset.py (outdated, resolved)

Update tests/test_arrow_dataset.py

141adf8

Co-authored-by: Mario Šaško <mariosasko777@gmail.com>

lhoestq merged commit f401758 into main Feb 19, 2023

lhoestq deleted the speed-up-batched-dataloader branch February 19, 2023 18:27

AJDERS added a commit to AJDERS/datasets that referenced this pull request Feb 21, 2023

Revert "Speed up batched PyTorch DataLoader (huggingface#5512)"

e170add

This reverts commit 41086b1.
