Update github pr docs actions
Mishig authored Nov 8, 2022
1 parent e706514 commit aceb39f
Showing 2 changed files with 9 additions and 3 deletions.
5 changes: 4 additions & 1 deletion .github/workflows/build_pr_documentation.yml
@@ -9,8 +9,11 @@ concurrency:
 
 jobs:
   build:
-    uses: huggingface/doc-builder/.github/workflows/build_pr_documentation.yml@main
+    uses: huggingface/doc-builder/.github/workflows/build_pr_documentation.yml@use_hf_hub
     with:
       commit_sha: ${{ github.event.pull_request.head.sha }}
       pr_number: ${{ github.event.number }}
       package: datasets
+    secrets:
+      token: ${{ secrets.HF_DOC_PUSH }}
+      comment_bot_token: ${{ secrets.HUGGINGFACE_PUSH }}
7 changes: 5 additions & 2 deletions .github/workflows/delete_doc_comment.yml
@@ -7,7 +7,10 @@ on:
 
 jobs:
   delete:
-    uses: huggingface/doc-builder/.github/workflows/delete_doc_comment.yml@main
+    uses: huggingface/doc-builder/.github/workflows/delete_doc_comment.yml@use_hf_hub
     with:
       pr_number: ${{ github.event.number }}
-      package: datasets
+      package: datasets
+    secrets:
+      token: ${{ secrets.HF_DOC_PUSH }}
+      comment_bot_token: ${{ secrets.HUGGINGFACE_PUSH }}

1 comment on commit aceb39f

@github-actions

PyArrow==6.0.0

Benchmark: benchmark_array_xd.json

| metric | new / old (diff) |
|---|---|
| read_batch_formatted_as_numpy after write_array2d | 0.010359 / 0.011353 (-0.000994) |
| read_batch_formatted_as_numpy after write_flattened_sequence | 0.005754 / 0.011008 (-0.005254) |
| read_batch_formatted_as_numpy after write_nested_sequence | 0.099121 / 0.038508 (0.060613) |
| read_batch_unformated after write_array2d | 0.044122 / 0.023109 (0.021013) |
| read_batch_unformated after write_flattened_sequence | 0.297936 / 0.275898 (0.022038) |
| read_batch_unformated after write_nested_sequence | 0.381015 / 0.323480 (0.057535) |
| read_col_formatted_as_numpy after write_array2d | 0.008907 / 0.007986 (0.000922) |
| read_col_formatted_as_numpy after write_flattened_sequence | 0.004676 / 0.004328 (0.000347) |
| read_col_formatted_as_numpy after write_nested_sequence | 0.073890 / 0.004250 (0.069639) |
| read_col_unformated after write_array2d | 0.058718 / 0.037052 (0.021665) |
| read_col_unformated after write_flattened_sequence | 0.309743 / 0.258489 (0.051254) |
| read_col_unformated after write_nested_sequence | 0.363880 / 0.293841 (0.070039) |
| read_formatted_as_numpy after write_array2d | 0.044587 / 0.128546 (-0.083959) |
| read_formatted_as_numpy after write_flattened_sequence | 0.016160 / 0.075646 (-0.059486) |
| read_formatted_as_numpy after write_nested_sequence | 0.348251 / 0.419271 (-0.071020) |
| read_unformated after write_array2d | 0.052216 / 0.043533 (0.008683) |
| read_unformated after write_flattened_sequence | 0.303857 / 0.255139 (0.048718) |
| read_unformated after write_nested_sequence | 0.321265 / 0.283200 (0.038065) |
| write_array2d | 0.119076 / 0.141683 (-0.022606) |
| write_flattened_sequence | 1.423953 / 1.452155 (-0.028202) |
| write_nested_sequence | 1.475387 / 1.492716 (-0.017330) |
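
Each benchmark cell above has the shape `new / old (diff)`. The following is a minimal sketch, not part of the bot's output, of how such a row could be parsed and sanity-checked. It assumes that `diff` is `new - old` rounded to six decimals, which the reported numbers are consistent with; the names `CELL`, `parse_cells`, and `diff_is_consistent` are hypothetical helpers introduced only for illustration.

```python
import re

# One cell looks like "0.010359 / 0.011353 (-0.000994)", i.e. new / old (diff).
CELL = re.compile(r"(-?\d+\.\d+) / (-?\d+\.\d+) \((-?\d+\.\d+)\)")


def parse_cells(row: str) -> list[tuple[float, float, float]]:
    """Return every (new, old, diff) triple found in a flattened benchmark row."""
    return [(float(n), float(o), float(d)) for n, o, d in CELL.findall(row)]


def diff_is_consistent(new: float, old: float, diff: float, tol: float = 2e-6) -> bool:
    """Check diff == new - old, with a tolerance for six-decimal rounding (assumed)."""
    return abs((new - old) - diff) <= tol


# First three cells of benchmark_array_xd.json under PyArrow==6.0.0.
row = (
    "0.010359 / 0.011353 (-0.000994) "
    "0.005754 / 0.011008 (-0.005254) "
    "0.099121 / 0.038508 (0.060613)"
)
cells = parse_cells(row)
print(all(diff_is_consistent(*cell) for cell in cells))
```

The tolerance is slightly larger than one unit in the sixth decimal because some reported diffs (for example `0.008907 / 0.007986 (0.000922)`) appear to have been computed before rounding.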

Benchmark: benchmark_getitem_100B.json

| metric | new / old (diff) |
|---|---|
| get_batch_of_1024_random_rows | 0.282904 / 0.018006 (0.264898) |
| get_batch_of_1024_rows | 0.559245 / 0.000490 (0.558755) |
| get_first_row | 0.002642 / 0.000200 (0.002442) |
| get_last_row | 0.000092 / 0.000054 (0.000037) |

Benchmark: benchmark_indices_mapping.json

| metric | new / old (diff) |
|---|---|
| select | 0.027282 / 0.037411 (-0.010129) |
| shard | 0.108404 / 0.014526 (0.093878) |
| shuffle | 0.120335 / 0.176557 (-0.056221) |
| sort | 0.155837 / 0.737135 (-0.581299) |
| train_test_split | 0.128539 / 0.296338 (-0.167799) |

Benchmark: benchmark_iterating.json

| metric | new / old (diff) |
|---|---|
| read 5000 | 0.402878 / 0.215209 (0.187669) |
| read 50000 | 4.013942 / 2.077655 (1.936287) |
| read_batch 50000 10 | 1.815555 / 1.504120 (0.311435) |
| read_batch 50000 100 | 1.624216 / 1.541195 (0.083021) |
| read_batch 50000 1000 | 1.844000 / 1.468490 (0.375510) |
| read_formatted numpy 5000 | 0.701931 / 4.584777 (-3.882846) |
| read_formatted pandas 5000 | 3.762430 / 3.745712 (0.016718) |
| read_formatted tensorflow 5000 | 2.283099 / 5.269862 (-2.986762) |
| read_formatted torch 5000 | 1.579992 / 4.565676 (-2.985685) |
| read_formatted_batch numpy 5000 10 | 0.084503 / 0.424275 (-0.339773) |
| read_formatted_batch numpy 5000 1000 | 0.011846 / 0.007607 (0.004239) |
| shuffled read 5000 | 0.512313 / 0.226044 (0.286268) |
| shuffled read 50000 | 5.086468 / 2.268929 (2.817539) |
| shuffled read_batch 50000 10 | 2.289676 / 55.444624 (-53.154949) |
| shuffled read_batch 50000 100 | 1.960249 / 6.876477 (-4.916228) |
| shuffled read_batch 50000 1000 | 2.088882 / 2.142072 (-0.053190) |
| shuffled read_formatted numpy 5000 | 0.861516 / 4.805227 (-3.943712) |
| shuffled read_formatted_batch numpy 5000 10 | 0.169676 / 6.500664 (-6.330988) |
| shuffled read_formatted_batch numpy 5000 1000 | 0.065133 / 0.075469 (-0.010337) |

Benchmark: benchmark_map_filter.json

| metric | new / old (diff) |
|---|---|
| filter | 1.482557 / 1.841788 (-0.359231) |
| map fast-tokenizer batched | 14.873279 / 8.074308 (6.798971) |
| map identity | 25.255787 / 10.191392 (15.064395) |
| map identity batched | 0.908622 / 0.680424 (0.228198) |
| map no-op batched | 0.566800 / 0.534201 (0.032599) |
| map no-op batched numpy | 0.440897 / 0.579283 (-0.138386) |
| map no-op batched pandas | 0.431731 / 0.434364 (-0.002633) |
| map no-op batched pytorch | 0.271214 / 0.540337 (-0.269123) |
| map no-op batched tensorflow | 0.273335 / 1.386936 (-1.113601) |

PyArrow==latest

Benchmark: benchmark_array_xd.json

| metric | new / old (diff) |
|---|---|
| read_batch_formatted_as_numpy after write_array2d | 0.008201 / 0.011353 (-0.003152) |
| read_batch_formatted_as_numpy after write_flattened_sequence | 0.005631 / 0.011008 (-0.005377) |
| read_batch_formatted_as_numpy after write_nested_sequence | 0.098921 / 0.038508 (0.060413) |
| read_batch_unformated after write_array2d | 0.038531 / 0.023109 (0.015422) |
| read_batch_unformated after write_flattened_sequence | 0.336348 / 0.275898 (0.060450) |
| read_batch_unformated after write_nested_sequence | 0.416186 / 0.323480 (0.092706) |
| read_col_formatted_as_numpy after write_array2d | 0.006630 / 0.007986 (-0.001356) |
| read_col_formatted_as_numpy after write_flattened_sequence | 0.005720 / 0.004328 (0.001391) |
| read_col_formatted_as_numpy after write_nested_sequence | 0.074214 / 0.004250 (0.069964) |
| read_col_unformated after write_array2d | 0.047779 / 0.037052 (0.010727) |
| read_col_unformated after write_flattened_sequence | 0.351409 / 0.258489 (0.092920) |
| read_col_unformated after write_nested_sequence | 0.381010 / 0.293841 (0.087169) |
| read_formatted_as_numpy after write_array2d | 0.040288 / 0.128546 (-0.088258) |
| read_formatted_as_numpy after write_flattened_sequence | 0.012984 / 0.075646 (-0.062662) |
| read_formatted_as_numpy after write_nested_sequence | 0.332532 / 0.419271 (-0.086740) |
| read_unformated after write_array2d | 0.049237 / 0.043533 (0.005705) |
| read_unformated after write_flattened_sequence | 0.334939 / 0.255139 (0.079800) |
| read_unformated after write_nested_sequence | 0.367838 / 0.283200 (0.084639) |
| write_array2d | 0.116263 / 0.141683 (-0.025419) |
| write_flattened_sequence | 1.527454 / 1.452155 (0.075300) |
| write_nested_sequence | 1.575552 / 1.492716 (0.082836) |

Benchmark: benchmark_getitem_100B.json

| metric | new / old (diff) |
|---|---|
| get_batch_of_1024_random_rows | 0.328846 / 0.018006 (0.310840) |
| get_batch_of_1024_rows | 0.541792 / 0.000490 (0.541302) |
| get_first_row | 0.001190 / 0.000200 (0.000990) |
| get_last_row | 0.000093 / 0.000054 (0.000039) |

Benchmark: benchmark_indices_mapping.json

| metric | new / old (diff) |
|---|---|
| select | 0.031468 / 0.037411 (-0.005944) |
| shard | 0.115996 / 0.014526 (0.101470) |
| shuffle | 0.128087 / 0.176557 (-0.048470) |
| sort | 0.174176 / 0.737135 (-0.562959) |
| train_test_split | 0.135998 / 0.296338 (-0.160341) |

Benchmark: benchmark_iterating.json

| metric | new / old (diff) |
|---|---|
| read 5000 | 0.438391 / 0.215209 (0.223182) |
| read 50000 | 4.322452 / 2.077655 (2.244797) |
| read_batch 50000 10 | 2.145007 / 1.504120 (0.640887) |
| read_batch 50000 100 | 1.967809 / 1.541195 (0.426615) |
| read_batch 50000 1000 | 2.113594 / 1.468490 (0.645103) |
| read_formatted numpy 5000 | 0.726442 / 4.584777 (-3.858335) |
| read_formatted pandas 5000 | 3.844741 / 3.745712 (0.099029) |
| read_formatted tensorflow 5000 | 3.335988 / 5.269862 (-1.933874) |
| read_formatted torch 5000 | 1.832259 / 4.565676 (-2.733418) |
| read_formatted_batch numpy 5000 10 | 0.086422 / 0.424275 (-0.337853) |
| read_formatted_batch numpy 5000 1000 | 0.012581 / 0.007607 (0.004974) |
| shuffled read 5000 | 0.538382 / 0.226044 (0.312338) |
| shuffled read 50000 | 5.388337 / 2.268929 (3.119408) |
| shuffled read_batch 50000 10 | 2.616442 / 55.444624 (-52.828182) |
| shuffled read_batch 50000 100 | 2.325721 / 6.876477 (-4.550756) |
| shuffled read_batch 50000 1000 | 2.481899 / 2.142072 (0.339827) |
| shuffled read_formatted numpy 5000 | 0.858725 / 4.805227 (-3.946502) |
| shuffled read_formatted_batch numpy 5000 10 | 0.173721 / 6.500664 (-6.326943) |
| shuffled read_formatted_batch numpy 5000 1000 | 0.067391 / 0.075469 (-0.008078) |

Benchmark: benchmark_map_filter.json

| metric | new / old (diff) |
|---|---|
| filter | 1.592815 / 1.841788 (-0.248972) |
| map fast-tokenizer batched | 15.360556 / 8.074308 (7.286248) |
| map identity | 12.526840 / 10.191392 (2.335448) |
| map identity batched | 0.943991 / 0.680424 (0.263568) |
| map no-op batched | 0.616065 / 0.534201 (0.081864) |
| map no-op batched numpy | 0.424363 / 0.579283 (-0.154921) |
| map no-op batched pandas | 0.432816 / 0.434364 (-0.001548) |
| map no-op batched pytorch | 0.259490 / 0.540337 (-0.280847) |
| map no-op batched tensorflow | 0.259430 / 1.386936 (-1.127506) |
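
Absolute diffs are hard to compare across metrics whose baselines differ by orders of magnitude. As a rough sketch, the `benchmark_map_filter.json` numbers for PyArrow==latest can instead be ranked by the ratio `new / old`. This assumes the values are timings where larger means slower, which the comment itself does not state; the variable names are hypothetical.

```python
# (new, old) pairs copied from benchmark_map_filter.json under PyArrow==latest.
map_filter = {
    "filter": (1.592815, 1.841788),
    "map fast-tokenizer batched": (15.360556, 8.074308),
    "map identity": (12.526840, 10.191392),
    "map identity batched": (0.943991, 0.680424),
    "map no-op batched": (0.616065, 0.534201),
    "map no-op batched numpy": (0.424363, 0.579283),
    "map no-op batched pandas": (0.432816, 0.434364),
    "map no-op batched pytorch": (0.259490, 0.540337),
    "map no-op batched tensorflow": (0.259430, 1.386936),
}

# Ratio > 1 means the new run took longer than the old one (assumed: values are times).
ratios = {name: new / old for name, (new, old) in map_filter.items()}

# Sort from largest relative slowdown to largest relative speedup.
ranked = sorted(ratios.items(), key=lambda item: item[1], reverse=True)

for name, ratio in ranked:
    print(f"{name:32s} {ratio:.2f}x")
```

A ratio-based ranking is only one possible view; for very small baselines a tiny absolute change can produce a large ratio, so it is worth reading both columns together.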