Fix docstring
albertvillanova committed Oct 21, 2021
1 parent 105ead7 commit a869469
Showing 1 changed file with 1 addition and 0 deletions.
src/datasets/features/audio.py: 1 addition, 0 deletions

@@ -15,6 +15,7 @@ class Audio:
mono (:obj:`bool`, default ``True``): Whether to convert the audio signal to mono by averaging samples across
channels.
archived (:obj:`bool`, default ``False``): Whether the source data is archived with sequential access.
- If non-archived with sequential access (i.e. random access is allowed), the cache will only store the
absolute path to the audio file.
- If archived with sequential access, the cache will store the relative path of the audio file to the
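The `mono` option above converts the signal to mono by averaging samples across channels. Below is a minimal pure-Python sketch of that averaging. It assumes the decoded signal is laid out channels-first, with one equal-length list of samples per channel. `downmix_to_mono` is a hypothetical helper and not part of the `datasets` API. The real `Audio` feature may hand decoding and downmixing to an audio library instead.

```python
from typing import List, Sequence


def downmix_to_mono(channels: Sequence[Sequence[float]]) -> List[float]:
    """Hypothetical helper: average each frame's samples across all channels.

    Assumes a channels-first layout: one equal-length sequence of samples
    per channel.
    """
    if not channels:
        raise ValueError("expected at least one channel")
    n_frames = len(channels[0])
    if any(len(ch) != n_frames for ch in channels):
        raise ValueError("all channels must have the same number of samples")
    # zip(*channels) yields one tuple per frame holding that frame's sample
    # from every channel; the mono sample is their arithmetic mean.
    return [sum(frame) / len(frame) for frame in zip(*channels)]


if __name__ == "__main__":
    stereo = [[1.0, 3.0], [3.0, 5.0]]  # left and right channels, 2 frames each
    print(downmix_to_mono(stereo))
```

A single-channel input passes through unchanged, since each frame's mean is just its only sample.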

1 comment on commit a869469

@github-actions

PyArrow==3.0.0

Benchmark: benchmark_array_xd.json

| Metric | New | Old | Diff (new - old) |
| --- | ---: | ---: | ---: |
| read_batch_formatted_as_numpy after write_array2d | 0.011519 | 0.011353 | 0.000166 |
| read_batch_formatted_as_numpy after write_flattened_sequence | 0.004893 | 0.011008 | -0.006115 |
| read_batch_formatted_as_numpy after write_nested_sequence | 0.040772 | 0.038508 | 0.002263 |
| read_batch_unformated after write_array2d | 0.041160 | 0.023109 | 0.018051 |
| read_batch_unformated after write_flattened_sequence | 0.370610 | 0.275898 | 0.094712 |
| read_batch_unformated after write_nested_sequence | 0.497774 | 0.323480 | 0.174294 |
| read_col_formatted_as_numpy after write_array2d | 0.009888 | 0.007986 | 0.001902 |
| read_col_formatted_as_numpy after write_flattened_sequence | 0.005632 | 0.004328 | 0.001304 |
| read_col_formatted_as_numpy after write_nested_sequence | 0.009502 | 0.004250 | 0.005252 |
| read_col_unformated after write_array2d | 0.046237 | 0.037052 | 0.009185 |
| read_col_unformated after write_flattened_sequence | 0.362965 | 0.258489 | 0.104476 |
| read_col_unformated after write_nested_sequence | 0.418877 | 0.293841 | 0.125036 |
| read_formatted_as_numpy after write_array2d | 0.039630 | 0.128546 | -0.088917 |
| read_formatted_as_numpy after write_flattened_sequence | 0.013811 | 0.075646 | -0.061835 |
| read_formatted_as_numpy after write_nested_sequence | 0.331111 | 0.419271 | -0.088160 |
| read_unformated after write_array2d | 0.062102 | 0.043533 | 0.018570 |
| read_unformated after write_flattened_sequence | 0.368718 | 0.255139 | 0.113579 |
| read_unformated after write_nested_sequence | 0.392903 | 0.283200 | 0.109703 |
| write_array2d | 0.094739 | 0.141683 | -0.046943 |
| write_flattened_sequence | 2.045133 | 1.452155 | 0.592978 |
| write_nested_sequence | 2.075534 | 1.492716 | 0.582818 |

Benchmark: benchmark_getitem_100B.json

| Metric | New | Old | Diff (new - old) |
| --- | ---: | ---: | ---: |
| get_batch_of_1024_random_rows | 0.252730 | 0.018006 | 0.234724 |
| get_batch_of_1024_rows | 0.564828 | 0.000490 | 0.564338 |
| get_first_row | 0.006299 | 0.000200 | 0.006099 |
| get_last_row | 0.000347 | 0.000054 | 0.000292 |

Benchmark: benchmark_indices_mapping.json

| Metric | New | Old | Diff (new - old) |
| --- | ---: | ---: | ---: |
| select | 0.048219 | 0.037411 | 0.010807 |
| shard | 0.027523 | 0.014526 | 0.012997 |
| shuffle | 0.033712 | 0.176557 | -0.142845 |
| sort | 0.139872 | 0.737135 | -0.597264 |
| train_test_split | 0.034113 | 0.296338 | -0.262225 |

Benchmark: benchmark_iterating.json

| Metric | New | Old | Diff (new - old) |
| --- | ---: | ---: | ---: |
| read 5000 | 0.627314 | 0.215209 | 0.412105 |
| read 50000 | 6.262405 | 2.077655 | 4.184750 |
| read_batch 50000 10 | 2.501046 | 1.504120 | 0.996926 |
| read_batch 50000 100 | 2.102691 | 1.541195 | 0.561497 |
| read_batch 50000 1000 | 2.091140 | 1.468490 | 0.622650 |
| read_formatted numpy 5000 | 0.633318 | 4.584777 | -3.951459 |
| read_formatted pandas 5000 | 7.069924 | 3.745712 | 3.324212 |
| read_formatted tensorflow 5000 | 1.546213 | 5.269862 | -3.723649 |
| read_formatted torch 5000 | 1.370307 | 4.565676 | -3.195370 |
| read_formatted_batch numpy 5000 10 | 0.067130 | 0.424275 | -0.357145 |
| read_formatted_batch numpy 5000 1000 | 0.006463 | 0.007607 | -0.001144 |
| shuffled read 5000 | 0.781097 | 0.226044 | 0.555052 |
| shuffled read 50000 | 7.905034 | 2.268929 | 5.636105 |
| shuffled read_batch 50000 10 | 3.141635 | 55.444624 | -52.302990 |
| shuffled read_batch 50000 100 | 2.459790 | 6.876477 | -4.416687 |
| shuffled read_batch 50000 1000 | 2.522574 | 2.142072 | 0.380501 |
| shuffled read_formatted numpy 5000 | 0.812511 | 4.805227 | -3.992717 |
| shuffled read_formatted_batch numpy 5000 10 | 0.186238 | 6.500664 | -6.314426 |
| shuffled read_formatted_batch numpy 5000 1000 | 0.068035 | 0.075469 | -0.007434 |

Benchmark: benchmark_map_filter.json

| Metric | New | Old | Diff (new - old) |
| --- | ---: | ---: | ---: |
| filter | 1.821414 | 1.841788 | -0.020374 |
| map fast-tokenizer batched | 14.771129 | 8.074308 | 6.696821 |
| map identity | 42.975576 | 10.191392 | 32.784184 |
| map identity batched | 0.971802 | 0.680424 | 0.291378 |
| map no-op batched | 0.672711 | 0.534201 | 0.138510 |
| map no-op batched numpy | 0.290799 | 0.579283 | -0.288484 |
| map no-op batched pandas | 0.714711 | 0.434364 | 0.280347 |
| map no-op batched pytorch | 0.233451 | 0.540337 | -0.306886 |
| map no-op batched tensorflow | 0.256503 | 1.386936 | -1.130433 |

PyArrow==latest

Benchmark: benchmark_array_xd.json

| Metric | New | Old | Diff (new - old) |
| --- | ---: | ---: | ---: |
| read_batch_formatted_as_numpy after write_array2d | 0.012545 | 0.011353 | 0.001192 |
| read_batch_formatted_as_numpy after write_flattened_sequence | 0.005357 | 0.011008 | -0.005651 |
| read_batch_formatted_as_numpy after write_nested_sequence | 0.040730 | 0.038508 | 0.002222 |
| read_batch_unformated after write_array2d | 0.038889 | 0.023109 | 0.015780 |
| read_batch_unformated after write_flattened_sequence | 0.391417 | 0.275898 | 0.115518 |
| read_batch_unformated after write_nested_sequence | 0.438166 | 0.323480 | 0.114686 |
| read_col_formatted_as_numpy after write_array2d | 0.010538 | 0.007986 | 0.002553 |
| read_col_formatted_as_numpy after write_flattened_sequence | 0.004443 | 0.004328 | 0.000115 |
| read_col_formatted_as_numpy after write_nested_sequence | 0.011289 | 0.004250 | 0.007038 |
| read_col_unformated after write_array2d | 0.055302 | 0.037052 | 0.018249 |
| read_col_unformated after write_flattened_sequence | 0.380669 | 0.258489 | 0.122180 |
| read_col_unformated after write_nested_sequence | 0.449057 | 0.293841 | 0.155216 |
| read_formatted_as_numpy after write_array2d | 0.038998 | 0.128546 | -0.089549 |
| read_formatted_as_numpy after write_flattened_sequence | 0.014756 | 0.075646 | -0.060890 |
| read_formatted_as_numpy after write_nested_sequence | 0.336965 | 0.419271 | -0.082307 |
| read_unformated after write_array2d | 0.060747 | 0.043533 | 0.017214 |
| read_unformated after write_flattened_sequence | 0.395361 | 0.255139 | 0.140222 |
| read_unformated after write_nested_sequence | 0.437127 | 0.283200 | 0.153928 |
| write_array2d | 0.108682 | 0.141683 | -0.033001 |
| write_flattened_sequence | 2.142188 | 1.452155 | 0.690033 |
| write_nested_sequence | 2.273178 | 1.492716 | 0.780462 |

Benchmark: benchmark_getitem_100B.json

| Metric | New | Old | Diff (new - old) |
| --- | ---: | ---: | ---: |
| get_batch_of_1024_random_rows | 0.266452 | 0.018006 | 0.248446 |
| get_batch_of_1024_rows | 0.613786 | 0.000490 | 0.613296 |
| get_first_row | 0.034577 | 0.000200 | 0.034377 |
| get_last_row | 0.000520 | 0.000054 | 0.000465 |

Benchmark: benchmark_indices_mapping.json

| Metric | New | Old | Diff (new - old) |
| --- | ---: | ---: | ---: |
| select | 0.041520 | 0.037411 | 0.004108 |
| shard | 0.028168 | 0.014526 | 0.013642 |
| shuffle | 0.029866 | 0.176557 | -0.146691 |
| sort | 0.149107 | 0.737135 | -0.588028 |
| train_test_split | 0.035229 | 0.296338 | -0.261109 |

Benchmark: benchmark_iterating.json

| Metric | New | Old | Diff (new - old) |
| --- | ---: | ---: | ---: |
| read 5000 | 0.647544 | 0.215209 | 0.432335 |
| read 50000 | 6.374638 | 2.077655 | 4.296983 |
| read_batch 50000 10 | 2.659824 | 1.504120 | 1.155704 |
| read_batch 50000 100 | 2.220449 | 1.541195 | 0.679254 |
| read_batch 50000 1000 | 2.268776 | 1.468490 | 0.800286 |
| read_formatted numpy 5000 | 0.651522 | 4.584777 | -3.933255 |
| read_formatted pandas 5000 | 6.989002 | 3.745712 | 3.243290 |
| read_formatted tensorflow 5000 | 1.590970 | 5.269862 | -3.678891 |
| read_formatted torch 5000 | 1.493754 | 4.565676 | -3.071922 |
| read_formatted_batch numpy 5000 10 | 0.071800 | 0.424275 | -0.352475 |
| read_formatted_batch numpy 5000 1000 | 0.006254 | 0.007607 | -0.001353 |
| shuffled read 5000 | 0.819103 | 0.226044 | 0.593058 |
| shuffled read 50000 | 8.410058 | 2.268929 | 6.141129 |
| shuffled read_batch 50000 10 | 3.291429 | 55.444624 | -52.153195 |
| shuffled read_batch 50000 100 | 2.559599 | 6.876477 | -4.316877 |
| shuffled read_batch 50000 1000 | 2.590430 | 2.142072 | 0.448358 |
| shuffled read_formatted numpy 5000 | 0.871213 | 4.805227 | -3.934014 |
| shuffled read_formatted_batch numpy 5000 10 | 0.167223 | 6.500664 | -6.333441 |
| shuffled read_formatted_batch numpy 5000 1000 | 0.072595 | 0.075469 | -0.002874 |

Benchmark: benchmark_map_filter.json

| Metric | New | Old | Diff (new - old) |
| --- | ---: | ---: | ---: |
| filter | 2.006850 | 1.841788 | 0.165063 |
| map fast-tokenizer batched | 15.982672 | 8.074308 | 7.908363 |
| map identity | 45.991925 | 10.191392 | 35.800532 |
| map identity batched | 0.992145 | 0.680424 | 0.311722 |
| map no-op batched | 0.745210 | 0.534201 | 0.211009 |
| map no-op batched numpy | 0.308574 | 0.579283 | -0.270709 |
| map no-op batched pandas | 0.753620 | 0.434364 | 0.319256 |
| map no-op batched pytorch | 0.237442 | 0.540337 | -0.302896 |
| map no-op batched tensorflow | 0.271822 | 1.386936 | -1.115114 |
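Each benchmark result is reported as `new / old (diff)`, where the diff is new minus old (for example, 0.011519 - 0.011353 = 0.000166). The sketch below parses such a cell and flags large slowdowns. It assumes every metric is a duration where lower is better. `Timing`, `parse_cell`, `is_regression` and the 1.5x default threshold are all hypothetical. They are not part of the CML or `datasets` benchmark tooling.

```python
import re
from typing import NamedTuple

# Matches one "new / old (diff)" cell, e.g. "0.011519 / 0.011353 (0.000166)".
_NUM = r"(-?\d+(?:\.\d+)?)"
_CELL = re.compile(rf"^\s*{_NUM}\s*/\s*{_NUM}\s*\(\s*{_NUM}\s*\)\s*$")


class Timing(NamedTuple):
    new: float
    old: float
    diff: float

    @property
    def ratio(self) -> float:
        """new / old; assuming durations, values above 1.0 mean the new run was slower."""
        return float("inf") if self.old == 0 else self.new / self.old


def parse_cell(cell: str) -> Timing:
    m = _CELL.match(cell)
    if m is None:
        raise ValueError(f"not a 'new / old (diff)' cell: {cell!r}")
    new, old, diff = (float(g) for g in m.groups())
    # Each number is rounded to six decimals independently, so new - old can
    # differ from the printed diff by about 1e-6; allow a small tolerance.
    if abs((new - old) - diff) > 1e-5:
        raise ValueError(f"diff does not equal new - old in {cell!r}")
    return Timing(new, old, diff)


def is_regression(cell: str, threshold: float = 1.5) -> bool:
    """Hypothetical rule: flag a metric whose new value exceeds old by more than `threshold`x."""
    return parse_cell(cell).ratio > threshold


if __name__ == "__main__":
    for cell in ("0.564828 / 0.000490 (0.564338)", "0.033712 / 0.176557 (-0.142845)"):
        t = parse_cell(cell)
        print(cell, "ratio=%.3f" % t.ratio, "regression" if is_regression(cell) else "ok")
```

The tolerance check doubles as a sanity test that a cell was transcribed correctly. A threshold on the ratio, rather than on the absolute diff, keeps very fast and very slow metrics comparable.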