Revert "Update github pr docs actions (#5214)"
This reverts commit a805ec6.
Mishig authored Nov 16, 2022
1 parent 27b4035 commit c144ccd
Showing 2 changed files with 3 additions and 9 deletions.
5 changes: 1 addition & 4 deletions .github/workflows/build_pr_documentation.yml
@@ -9,11 +9,8 @@ concurrency:
 
 jobs:
   build:
-    uses: huggingface/doc-builder/.github/workflows/build_pr_documentation.yml@use_hf_hub
+    uses: huggingface/doc-builder/.github/workflows/build_pr_documentation.yml@main
     with:
       commit_sha: ${{ github.event.pull_request.head.sha }}
       pr_number: ${{ github.event.number }}
       package: datasets
-    secrets:
-      token: ${{ secrets.HF_DOC_PUSH }}
-      comment_bot_token: ${{ secrets.HUGGINGFACE_PUSH }}
7 changes: 2 additions & 5 deletions .github/workflows/delete_doc_comment.yml
@@ -7,10 +7,7 @@ on:
 
 jobs:
   delete:
-    uses: huggingface/doc-builder/.github/workflows/delete_doc_comment.yml@use_hf_hub
+    uses: huggingface/doc-builder/.github/workflows/delete_doc_comment.yml@main
     with:
       pr_number: ${{ github.event.number }}
-      package: datasets
-    secrets:
-      token: ${{ secrets.HF_DOC_PUSH }}
-      comment_bot_token: ${{ secrets.HUGGINGFACE_PUSH }}
+      package: datasets

1 comment on commit c144ccd

@github-actions

Show benchmarks

PyArrow==6.0.0

Show updated benchmarks!

Benchmark: benchmark_array_xd.json

| metric | new / old (diff) |
|---|---|
| read_batch_formatted_as_numpy after write_array2d | 0.009901 / 0.011353 (-0.001452) |
| read_batch_formatted_as_numpy after write_flattened_sequence | 0.005674 / 0.011008 (-0.005335) |
| read_batch_formatted_as_numpy after write_nested_sequence | 0.101978 / 0.038508 (0.063470) |
| read_batch_unformated after write_array2d | 0.038053 / 0.023109 (0.014944) |
| read_batch_unformated after write_flattened_sequence | 0.305516 / 0.275898 (0.029618) |
| read_batch_unformated after write_nested_sequence | 0.384999 / 0.323480 (0.061519) |
| read_col_formatted_as_numpy after write_array2d | 0.008383 / 0.007986 (0.000397) |
| read_col_formatted_as_numpy after write_flattened_sequence | 0.004773 / 0.004328 (0.000444) |
| read_col_formatted_as_numpy after write_nested_sequence | 0.075657 / 0.004250 (0.071406) |
| read_col_unformated after write_array2d | 0.050175 / 0.037052 (0.013123) |
| read_col_unformated after write_flattened_sequence | 0.314083 / 0.258489 (0.055594) |
| read_col_unformated after write_nested_sequence | 0.352318 / 0.293841 (0.058477) |
| read_formatted_as_numpy after write_array2d | 0.043581 / 0.128546 (-0.084965) |
| read_formatted_as_numpy after write_flattened_sequence | 0.015644 / 0.075646 (-0.060002) |
| read_formatted_as_numpy after write_nested_sequence | 0.345981 / 0.419271 (-0.073291) |
| read_unformated after write_array2d | 0.052036 / 0.043533 (0.008503) |
| read_unformated after write_flattened_sequence | 0.308167 / 0.255139 (0.053028) |
| read_unformated after write_nested_sequence | 0.323903 / 0.283200 (0.040703) |
| write_array2d | 0.116503 / 0.141683 (-0.025180) |
| write_flattened_sequence | 1.457667 / 1.452155 (0.005512) |
| write_nested_sequence | 1.507817 / 1.492716 (0.015101) |

Benchmark: benchmark_getitem_100B.json

| metric | new / old (diff) |
|---|---|
| get_batch_of_1024_random_rows | 0.268092 / 0.018006 (0.250086) |
| get_batch_of_1024_rows | 0.542344 / 0.000490 (0.541855) |
| get_first_row | 0.001054 / 0.000200 (0.000854) |
| get_last_row | 0.000091 / 0.000054 (0.000037) |

Benchmark: benchmark_indices_mapping.json

| metric | new / old (diff) |
|---|---|
| select | 0.031751 / 0.037411 (-0.005660) |
| shard | 0.120681 / 0.014526 (0.106155) |
| shuffle | 0.126173 / 0.176557 (-0.050384) |
| sort | 0.166282 / 0.737135 (-0.570853) |
| train_test_split | 0.133744 / 0.296338 (-0.162595) |

Benchmark: benchmark_iterating.json

| metric | new / old (diff) |
|---|---|
| read 5000 | 0.392112 / 0.215209 (0.176903) |
| read 50000 | 3.917997 / 2.077655 (1.840342) |
| read_batch 50000 10 | 1.764009 / 1.504120 (0.259890) |
| read_batch 50000 100 | 1.582740 / 1.541195 (0.041545) |
| read_batch 50000 1000 | 1.626856 / 1.468490 (0.158366) |
| read_formatted numpy 5000 | 0.692900 / 4.584777 (-3.891877) |
| read_formatted pandas 5000 | 3.948321 / 3.745712 (0.202609) |
| read_formatted tensorflow 5000 | 2.146228 / 5.269862 (-3.123633) |
| read_formatted torch 5000 | 1.371687 / 4.565676 (-3.193989) |
| read_formatted_batch numpy 5000 10 | 0.082416 / 0.424275 (-0.341859) |
| read_formatted_batch numpy 5000 1000 | 0.012143 / 0.007607 (0.004536) |
| shuffled read 5000 | 0.493368 / 0.226044 (0.267324) |
| shuffled read 50000 | 4.947885 / 2.268929 (2.678957) |
| shuffled read_batch 50000 10 | 2.225426 / 55.444624 (-53.219199) |
| shuffled read_batch 50000 100 | 1.879564 / 6.876477 (-4.996913) |
| shuffled read_batch 50000 1000 | 2.037803 / 2.142072 (-0.104269) |
| shuffled read_formatted numpy 5000 | 0.833536 / 4.805227 (-3.971691) |
| shuffled read_formatted_batch numpy 5000 10 | 0.170204 / 6.500664 (-6.330460) |
| shuffled read_formatted_batch numpy 5000 1000 | 0.067842 / 0.075469 (-0.007627) |

Benchmark: benchmark_map_filter.json

| metric | new / old (diff) |
|---|---|
| filter | 1.472094 / 1.841788 (-0.369694) |
| map fast-tokenizer batched | 14.135402 / 8.074308 (6.061094) |
| map identity | 25.667334 / 10.191392 (15.475942) |
| map identity batched | 0.833594 / 0.680424 (0.153170) |
| map no-op batched | 0.546182 / 0.534201 (0.011981) |
| map no-op batched numpy | 0.446524 / 0.579283 (-0.132759) |
| map no-op batched pandas | 0.494689 / 0.434364 (0.060325) |
| map no-op batched pytorch | 0.304184 / 0.540337 (-0.236153) |
| map no-op batched tensorflow | 0.291314 / 1.386936 (-1.095622) |

PyArrow==latest

Show updated benchmarks!

Benchmark: benchmark_array_xd.json

| metric | new / old (diff) |
|---|---|
| read_batch_formatted_as_numpy after write_array2d | 0.008303 / 0.011353 (-0.003050) |
| read_batch_formatted_as_numpy after write_flattened_sequence | 0.005293 / 0.011008 (-0.005716) |
| read_batch_formatted_as_numpy after write_nested_sequence | 0.098932 / 0.038508 (0.060424) |
| read_batch_unformated after write_array2d | 0.043288 / 0.023109 (0.020179) |
| read_batch_unformated after write_flattened_sequence | 0.340242 / 0.275898 (0.064344) |
| read_batch_unformated after write_nested_sequence | 0.381354 / 0.323480 (0.057874) |
| read_col_formatted_as_numpy after write_array2d | 0.006842 / 0.007986 (-0.001143) |
| read_col_formatted_as_numpy after write_flattened_sequence | 0.004076 / 0.004328 (-0.000252) |
| read_col_formatted_as_numpy after write_nested_sequence | 0.075324 / 0.004250 (0.071074) |
| read_col_unformated after write_array2d | 0.046472 / 0.037052 (0.009420) |
| read_col_unformated after write_flattened_sequence | 0.351590 / 0.258489 (0.093101) |
| read_col_unformated after write_nested_sequence | 0.407516 / 0.293841 (0.113675) |
| read_formatted_as_numpy after write_array2d | 0.038112 / 0.128546 (-0.090434) |
| read_formatted_as_numpy after write_flattened_sequence | 0.012780 / 0.075646 (-0.062866) |
| read_formatted_as_numpy after write_nested_sequence | 0.341910 / 0.419271 (-0.077361) |
| read_unformated after write_array2d | 0.049619 / 0.043533 (0.006086) |
| read_unformated after write_flattened_sequence | 0.343757 / 0.255139 (0.088618) |
| read_unformated after write_nested_sequence | 0.367817 / 0.283200 (0.084617) |
| write_array2d | 0.115730 / 0.141683 (-0.025952) |
| write_flattened_sequence | 1.499932 / 1.452155 (0.047777) |
| write_nested_sequence | 1.571294 / 1.492716 (0.078578) |

Benchmark: benchmark_getitem_100B.json

| metric | new / old (diff) |
|---|---|
| get_batch_of_1024_random_rows | 0.300816 / 0.018006 (0.282809) |
| get_batch_of_1024_rows | 0.533131 / 0.000490 (0.532642) |
| get_first_row | 0.006240 / 0.000200 (0.006040) |
| get_last_row | 0.000119 / 0.000054 (0.000064) |

Benchmark: benchmark_indices_mapping.json

| metric | new / old (diff) |
|---|---|
| select | 0.028448 / 0.037411 (-0.008963) |
| shard | 0.112421 / 0.014526 (0.097895) |
| shuffle | 0.123492 / 0.176557 (-0.053065) |
| sort | 0.161538 / 0.737135 (-0.575597) |
| train_test_split | 0.128719 / 0.296338 (-0.167620) |

Benchmark: benchmark_iterating.json

| metric | new / old (diff) |
|---|---|
| read 5000 | 0.428007 / 0.215209 (0.212798) |
| read 50000 | 4.271864 / 2.077655 (2.194209) |
| read_batch 50000 10 | 2.026205 / 1.504120 (0.522085) |
| read_batch 50000 100 | 1.856333 / 1.541195 (0.315138) |
| read_batch 50000 1000 | 1.998335 / 1.468490 (0.529845) |
| read_formatted numpy 5000 | 0.721483 / 4.584777 (-3.863294) |
| read_formatted pandas 5000 | 3.958323 / 3.745712 (0.212611) |
| read_formatted tensorflow 5000 | 3.199367 / 5.269862 (-2.070494) |
| read_formatted torch 5000 | 1.949239 / 4.565676 (-2.616438) |
| read_formatted_batch numpy 5000 10 | 0.087319 / 0.424275 (-0.336956) |
| read_formatted_batch numpy 5000 1000 | 0.013076 / 0.007607 (0.005469) |
| shuffled read 5000 | 0.525769 / 0.226044 (0.299724) |
| shuffled read 50000 | 5.224391 / 2.268929 (2.955462) |
| shuffled read_batch 50000 10 | 2.534581 / 55.444624 (-52.910044) |
| shuffled read_batch 50000 100 | 2.209983 / 6.876477 (-4.666494) |
| shuffled read_batch 50000 1000 | 2.350494 / 2.142072 (0.208422) |
| shuffled read_formatted numpy 5000 | 0.852274 / 4.805227 (-3.952953) |
| shuffled read_formatted_batch numpy 5000 10 | 0.171855 / 6.500664 (-6.328809) |
| shuffled read_formatted_batch numpy 5000 1000 | 0.068118 / 0.075469 (-0.007351) |

Benchmark: benchmark_map_filter.json

| metric | new / old (diff) |
|---|---|
| filter | 1.564543 / 1.841788 (-0.277245) |
| map fast-tokenizer batched | 14.658666 / 8.074308 (6.584358) |
| map identity | 12.417318 / 10.191392 (2.225926) |
| map identity batched | 0.913598 / 0.680424 (0.233174) |
| map no-op batched | 0.580960 / 0.534201 (0.046759) |
| map no-op batched numpy | 0.422732 / 0.579283 (-0.156551) |
| map no-op batched pandas | 0.436757 / 0.434364 (0.002393) |
| map no-op batched pytorch | 0.265151 / 0.540337 (-0.275186) |
| map no-op batched tensorflow | 0.300032 / 1.386936 (-1.086904) |
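
Every benchmark cell above has the form `new / old (diff)`. The following is a minimal sketch for reading such cells programmatically, assuming that `diff` is simply `new - old` with each value rounded to six decimals independently (the figures above are consistent with this, but the bot's exact computation is not shown). The names `parse_cells` and `check_row` are hypothetical helpers, not part of the benchmark tooling.

```python
import re

# Matches one "new / old (diff)" cell, e.g. "0.009901 / 0.011353 (-0.001452)".
CELL = re.compile(r"(-?\d+\.\d+) / (-?\d+\.\d+) \((-?\d+\.\d+)\)")


def parse_cells(row: str):
    """Return a list of (new, old, diff) float triples found in a benchmark row."""
    return [tuple(float(x) for x in m.groups()) for m in CELL.finditer(row)]


def check_row(row: str, tol: float = 2e-6) -> bool:
    """Return True if every reported diff equals new - old within rounding tolerance.

    The tolerance is an assumption: it allows for the values being rounded
    to six decimals before they were printed.
    """
    return all(abs((new - old) - diff) <= tol for new, old, diff in parse_cells(row))


# Two cells copied from the benchmark_getitem_100B.json results (PyArrow==6.0.0).
row = "0.268092 / 0.018006 (0.250086) 0.542344 / 0.000490 (0.541855)"
print(parse_cells(row))
print(check_row(row))
```

A positive `diff` means the new timing is larger than the reference value it is compared against.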
