Commit

Update setup.py
lhoestq authored Feb 12, 2021
1 parent 9df937b commit 1cb0ef4
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion setup.py
@@ -87,7 +87,7 @@
  # for saving datsets to local
  "fsspec",
  # To get datasets from the Datasets Hub on huggingface.co
- "huggingface_hub==0.0.1",
+ "huggingface_hub==0.0.2",
]

BENCHMARKS_REQUIRE = [

1 comment on commit 1cb0ef4

@github-actions

Show benchmarks

PyArrow==0.17.1

Show updated benchmarks!

Benchmark: benchmark_array_xd.json

| metric | new | old | diff |
|---|---|---|---|
| read_batch_formatted_as_numpy after write_array2d | 0.018478 | 0.011353 | 0.007125 |
| read_batch_formatted_as_numpy after write_flattened_sequence | 0.014820 | 0.011008 | 0.003812 |
| read_batch_formatted_as_numpy after write_nested_sequence | 0.046788 | 0.038508 | 0.008280 |
| read_batch_unformated after write_array2d | 0.037631 | 0.023109 | 0.014522 |
| read_batch_unformated after write_flattened_sequence | 0.211782 | 0.275898 | -0.064116 |
| read_batch_unformated after write_nested_sequence | 0.235276 | 0.323480 | -0.088204 |
| read_col_formatted_as_numpy after write_array2d | 0.005711 | 0.007986 | -0.002275 |
| read_col_formatted_as_numpy after write_flattened_sequence | 0.004522 | 0.004328 | 0.000193 |
| read_col_formatted_as_numpy after write_nested_sequence | 0.007683 | 0.004250 | 0.003433 |
| read_col_unformated after write_array2d | 0.051676 | 0.037052 | 0.014623 |
| read_col_unformated after write_flattened_sequence | 0.210681 | 0.258489 | -0.047809 |
| read_col_unformated after write_nested_sequence | 0.239774 | 0.293841 | -0.054067 |
| read_formatted_as_numpy after write_array2d | 0.150019 | 0.128546 | 0.021473 |
| read_formatted_as_numpy after write_flattened_sequence | 0.118124 | 0.075646 | 0.042478 |
| read_formatted_as_numpy after write_nested_sequence | 0.435583 | 0.419271 | 0.016311 |
| read_unformated after write_array2d | 0.410778 | 0.043533 | 0.367245 |
| read_unformated after write_flattened_sequence | 0.210529 | 0.255139 | -0.044610 |
| read_unformated after write_nested_sequence | 0.232733 | 0.283200 | -0.050467 |
| write_array2d | 1.640455 | 0.141683 | 1.498772 |
| write_flattened_sequence | 1.871417 | 1.452155 | 0.419262 |
| write_nested_sequence | 1.925725 | 1.492716 | 0.433009 |
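
Each benchmark cell in this comment has the form `new / old (diff)`, and the reported numbers are consistent with diff = new - old (for example, 0.018478 - 0.011353 = 0.007125). The following is a minimal sketch, not part of the benchmark tooling, of how such a cell could be parsed and checked. The names `parse_cell` and `is_consistent` and the tolerance value are assumptions made for illustration; the tolerance allows for diffs that appear to have been computed before the values were rounded to six decimals.

```python
import re

# Hypothetical helper: matches one "new / old (diff)" cell as printed in the comment.
CELL_RE = re.compile(r"(-?\d+\.\d+) / (-?\d+\.\d+) \((-?\d+\.\d+)\)")


def parse_cell(cell):
    """Return (new, old, diff) as floats from a 'new / old (diff)' string."""
    match = CELL_RE.fullmatch(cell.strip())
    if match is None:
        raise ValueError(f"not a 'new / old (diff)' cell: {cell!r}")
    new, old, diff = (float(group) for group in match.groups())
    return new, old, diff


def is_consistent(cell, tol=2e-6):
    """Check that the reported diff equals new - old, up to rounding.

    The tolerance is an assumption: it covers a difference of one unit in
    the sixth decimal place, which occurs in some of the reported cells.
    """
    new, old, diff = parse_cell(cell)
    return abs((new - old) - diff) <= tol
```

A positive diff means the new run took longer than the stored reference value, and a negative diff means it was faster.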

Benchmark: benchmark_indices_mapping.json

| metric | new | old | diff |
|---|---|---|---|
| select | 0.052011 | 0.037411 | 0.014599 |
| shard | 0.019773 | 0.014526 | 0.005247 |
| shuffle | 0.068005 | 0.176557 | -0.108552 |
| sort | 0.051770 | 0.737135 | -0.685366 |
| train_test_split | 0.036383 | 0.296338 | -0.259956 |

Benchmark: benchmark_iterating.json

| metric | new | old | diff |
|---|---|---|---|
| read 5000 | 0.232511 | 0.215209 | 0.017301 |
| read 50000 | 2.334800 | 2.077655 | 0.257145 |
| read_batch 50000 10 | 1.284229 | 1.504120 | -0.219891 |
| read_batch 50000 100 | 1.168944 | 1.541195 | -0.372250 |
| read_batch 50000 1000 | 1.222869 | 1.468490 | -0.245621 |
| read_formatted numpy 5000 | 6.482959 | 4.584777 | 1.898182 |
| read_formatted pandas 5000 | 5.779338 | 3.745712 | 2.033626 |
| read_formatted tensorflow 5000 | 8.043241 | 5.269862 | 2.773379 |
| read_formatted torch 5000 | 7.049787 | 4.565676 | 2.484111 |
| read_formatted_batch numpy 5000 10 | 0.633934 | 0.424275 | 0.209659 |
| read_formatted_batch numpy 5000 1000 | 0.010465 | 0.007607 | 0.002858 |
| shuffled read 5000 | 0.260941 | 0.226044 | 0.034897 |
| shuffled read 50000 | 2.725552 | 2.268929 | 0.456623 |
| shuffled read_batch 50000 10 | 1.741517 | 55.444624 | -53.703107 |
| shuffled read_batch 50000 100 | 1.551267 | 6.876477 | -5.325210 |
| shuffled read_batch 50000 1000 | 1.606187 | 2.142072 | -0.535886 |
| shuffled read_formatted numpy 5000 | 6.554734 | 4.805227 | 1.749506 |
| shuffled read_formatted_batch numpy 5000 10 | 4.142588 | 6.500664 | -2.358076 |
| shuffled read_formatted_batch numpy 5000 1000 | 4.788191 | 0.075469 | 4.712722 |

Benchmark: benchmark_map_filter.json

| metric | new | old | diff |
|---|---|---|---|
| filter | 10.399767 | 1.841788 | 8.557979 |
| map fast-tokenizer batched | 15.510614 | 8.074308 | 7.436306 |
| map identity | 18.155945 | 10.191392 | 7.964553 |
| map identity batched | 0.634235 | 0.680424 | -0.046189 |
| map no-op batched | 0.294735 | 0.534201 | -0.239466 |
| map no-op batched numpy | 0.782285 | 0.579283 | 0.203002 |
| map no-op batched pandas | 0.591547 | 0.434364 | 0.157183 |
| map no-op batched pytorch | 0.707646 | 0.540337 | 0.167309 |
| map no-op batched tensorflow | 1.534675 | 1.386936 | 0.147739 |

PyArrow==1.0

Show updated benchmarks!

Benchmark: benchmark_array_xd.json

| metric | new | old | diff |
|---|---|---|---|
| read_batch_formatted_as_numpy after write_array2d | 0.017346 | 0.011353 | 0.005993 |
| read_batch_formatted_as_numpy after write_flattened_sequence | 0.015107 | 0.011008 | 0.004098 |
| read_batch_formatted_as_numpy after write_nested_sequence | 0.047459 | 0.038508 | 0.008951 |
| read_batch_unformated after write_array2d | 0.038229 | 0.023109 | 0.015120 |
| read_batch_unformated after write_flattened_sequence | 0.373760 | 0.275898 | 0.097862 |
| read_batch_unformated after write_nested_sequence | 0.418732 | 0.323480 | 0.095252 |
| read_col_formatted_as_numpy after write_array2d | 0.005855 | 0.007986 | -0.002130 |
| read_col_formatted_as_numpy after write_flattened_sequence | 0.004610 | 0.004328 | 0.000281 |
| read_col_formatted_as_numpy after write_nested_sequence | 0.008701 | 0.004250 | 0.004450 |
| read_col_unformated after write_array2d | 0.051386 | 0.037052 | 0.014333 |
| read_col_unformated after write_flattened_sequence | 0.387427 | 0.258489 | 0.128938 |
| read_col_unformated after write_nested_sequence | 0.440958 | 0.293841 | 0.147117 |
| read_formatted_as_numpy after write_array2d | 0.143688 | 0.128546 | 0.015142 |
| read_formatted_as_numpy after write_flattened_sequence | 0.119957 | 0.075646 | 0.044310 |
| read_formatted_as_numpy after write_nested_sequence | 0.481217 | 0.419271 | 0.061946 |
| read_unformated after write_array2d | 0.414042 | 0.043533 | 0.370509 |
| read_unformated after write_flattened_sequence | 0.367473 | 0.255139 | 0.112334 |
| read_unformated after write_nested_sequence | 0.392486 | 0.283200 | 0.109286 |
| write_array2d | 1.711883 | 0.141683 | 1.570200 |
| write_flattened_sequence | 1.933825 | 1.452155 | 0.481670 |
| write_nested_sequence | 1.896942 | 1.492716 | 0.404226 |

Benchmark: benchmark_indices_mapping.json

| metric | new | old | diff |
|---|---|---|---|
| select | 0.045034 | 0.037411 | 0.007622 |
| shard | 0.021554 | 0.014526 | 0.007028 |
| shuffle | 0.028913 | 0.176557 | -0.147643 |
| sort | 0.049318 | 0.737135 | -0.687817 |
| train_test_split | 0.076192 | 0.296338 | -0.220146 |

Benchmark: benchmark_iterating.json

| metric | new | old | diff |
|---|---|---|---|
| read 5000 | 0.305395 | 0.215209 | 0.090186 |
| read 50000 | 3.064225 | 2.077655 | 0.986570 |
| read_batch 50000 10 | 2.015416 | 1.504120 | 0.511296 |
| read_batch 50000 100 | 1.903906 | 1.541195 | 0.362711 |
| read_batch 50000 1000 | 1.961258 | 1.468490 | 0.492768 |
| read_formatted numpy 5000 | 6.467772 | 4.584777 | 1.882995 |
| read_formatted pandas 5000 | 5.462570 | 3.745712 | 1.716858 |
| read_formatted tensorflow 5000 | 7.981349 | 5.269862 | 2.711487 |
| read_formatted torch 5000 | 7.042866 | 4.565676 | 2.477189 |
| read_formatted_batch numpy 5000 10 | 0.634227 | 0.424275 | 0.209952 |
| read_formatted_batch numpy 5000 1000 | 0.010466 | 0.007607 | 0.002859 |
| shuffled read 5000 | 0.337569 | 0.226044 | 0.111525 |
| shuffled read 50000 | 3.410467 | 2.268929 | 1.141539 |
| shuffled read_batch 50000 10 | 2.362540 | 55.444624 | -53.082085 |
| shuffled read_batch 50000 100 | 2.188723 | 6.876477 | -4.687754 |
| shuffled read_batch 50000 1000 | 2.249259 | 2.142072 | 0.107186 |
| shuffled read_formatted numpy 5000 | 6.361915 | 4.805227 | 1.556688 |
| shuffled read_formatted_batch numpy 5000 10 | 4.032779 | 6.500664 | -2.467885 |
| shuffled read_formatted_batch numpy 5000 1000 | 6.540147 | 0.075469 | 6.464678 |

Benchmark: benchmark_map_filter.json

| metric | new | old | diff |
|---|---|---|---|
| filter | 10.648150 | 1.841788 | 8.806363 |
| map fast-tokenizer batched | 14.413235 | 8.074308 | 6.338927 |
| map identity | 18.911088 | 10.191392 | 8.719696 |
| map identity batched | 1.298861 | 0.680424 | 0.618437 |
| map no-op batched | 0.667527 | 0.534201 | 0.133326 |
| map no-op batched numpy | 0.762036 | 0.579283 | 0.182753 |
| map no-op batched pandas | 0.564351 | 0.434364 | 0.129987 |
| map no-op batched pytorch | 0.692788 | 0.540337 | 0.152450 |
| map no-op batched tensorflow | 1.544834 | 1.386936 | 0.157898 |

