Commit

Fix style
albertvillanova committed Oct 26, 2021
1 parent 2ba4c28 commit 3efc39b
Showing 1 changed file with 3 additions and 5 deletions.
8 changes: 3 additions & 5 deletions src/datasets/builder.py
@@ -410,11 +410,9 @@ def builder_configs(cls):
     @classmethod
     @utils.memoize()
     def default_builder_config(cls):
-        config_kwargs = {
-            param: cls.kwargs[param]
-            for param in ["name"]
-            if hasattr(cls, "kwargs") and param in cls.kwargs
-        }
+        config_kwargs = {}
+        if hasattr(cls, "kwargs") and "name" in cls.kwargs:
+            config_kwargs["name"] = cls.kwargs["name"]
         if hasattr(cls, "VERSION") and cls.VERSION:
             config_kwargs["version"] = cls.VERSION
         config = cls.BUILDER_CONFIG_CLASS(**config_kwargs)
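
The change above replaces a one-element dict comprehension with an explicit `if`. As a rough check that the two forms should build the same `config_kwargs`, here is a minimal sketch. The classes `WithName`, `WithoutName`, and `WithoutKwargs` and the helpers `old_style` and `new_style` are hypothetical stand-ins for illustration, not part of `datasets`.

```python
# Hypothetical stand-in classes; the real builder classes are not used here.
class WithName:
    kwargs = {"name": "some_config", "data_dir": "/tmp/data"}


class WithoutName:
    kwargs = {"data_dir": "/tmp/data"}


class WithoutKwargs:
    pass


def old_style(cls):
    # Form removed by the commit: a comprehension over a single key.
    return {
        param: cls.kwargs[param]
        for param in ["name"]
        if hasattr(cls, "kwargs") and param in cls.kwargs
    }


def new_style(cls):
    # Form added by the commit: an explicit conditional assignment.
    config_kwargs = {}
    if hasattr(cls, "kwargs") and "name" in cls.kwargs:
        config_kwargs["name"] = cls.kwargs["name"]
    return config_kwargs


# Both forms agree whether "name" is present, absent, or kwargs is missing.
for cls in (WithName, WithoutName, WithoutKwargs):
    assert old_style(cls) == new_style(cls)
```

In both forms the `hasattr` check short-circuits before `cls.kwargs` is accessed, so a class without a `kwargs` attribute yields an empty dict rather than an `AttributeError`.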

1 comment on commit 3efc39b

@github-actions
Show benchmarks

PyArrow==3.0.0

Show updated benchmarks!

Benchmark: benchmark_array_xd.json

metric | new / old (diff)
read_batch_formatted_as_numpy after write_array2d | 0.010160 / 0.011353 (-0.001193)
read_batch_formatted_as_numpy after write_flattened_sequence | 0.004094 / 0.011008 (-0.006914)
read_batch_formatted_as_numpy after write_nested_sequence | 0.037038 / 0.038508 (-0.001470)
read_batch_unformated after write_array2d | 0.041072 / 0.023109 (0.017963)
read_batch_unformated after write_flattened_sequence | 0.342330 / 0.275898 (0.066432)
read_batch_unformated after write_nested_sequence | 0.494875 / 0.323480 (0.171395)
read_col_formatted_as_numpy after write_array2d | 0.008698 / 0.007986 (0.000712)
read_col_formatted_as_numpy after write_flattened_sequence | 0.005031 / 0.004328 (0.000702)
read_col_formatted_as_numpy after write_nested_sequence | 0.010677 / 0.004250 (0.006426)
read_col_unformated after write_array2d | 0.042687 / 0.037052 (0.005635)
read_col_unformated after write_flattened_sequence | 0.345670 / 0.258489 (0.087181)
read_col_unformated after write_nested_sequence | 0.387994 / 0.293841 (0.094153)
read_formatted_as_numpy after write_array2d | 0.027870 / 0.128546 (-0.100677)
read_formatted_as_numpy after write_flattened_sequence | 0.009492 / 0.075646 (-0.066154)
read_formatted_as_numpy after write_nested_sequence | 0.304053 / 0.419271 (-0.115218)
read_unformated after write_array2d | 0.054220 / 0.043533 (0.010687)
read_unformated after write_flattened_sequence | 0.342665 / 0.255139 (0.087526)
read_unformated after write_nested_sequence | 0.387001 / 0.283200 (0.103801)
write_array2d | 0.092089 / 0.141683 (-0.049594)
write_flattened_sequence | 1.938974 / 1.452155 (0.486819)
write_nested_sequence | 2.024030 / 1.492716 (0.531314)

Benchmark: benchmark_getitem_100B.json

metric | new / old (diff)
get_batch_of_1024_random_rows | 0.216749 / 0.018006 (0.198743)
get_batch_of_1024_rows | 0.476713 / 0.000490 (0.476223)
get_first_row | 0.007113 / 0.000200 (0.006913)
get_last_row | 0.000142 / 0.000054 (0.000088)

Benchmark: benchmark_indices_mapping.json

metric | new / old (diff)
select | 0.043378 / 0.037411 (0.005967)
shard | 0.025714 / 0.014526 (0.011188)
shuffle | 0.030222 / 0.176557 (-0.146334)
sort | 0.146451 / 0.737135 (-0.590684)
train_test_split | 0.031672 / 0.296338 (-0.264667)

Benchmark: benchmark_iterating.json

metric | new / old (diff)
read 5000 | 0.493829 / 0.215209 (0.278620)
read 50000 | 5.034961 / 2.077655 (2.957306)
read_batch 50000 10 | 2.347798 / 1.504120 (0.843678)
read_batch 50000 100 | 2.103300 / 1.541195 (0.562105)
read_batch 50000 1000 | 2.156981 / 1.468490 (0.688490)
read_formatted numpy 5000 | 0.433715 / 4.584777 (-4.151062)
read_formatted pandas 5000 | 6.017769 / 3.745712 (2.272056)
read_formatted tensorflow 5000 | 1.050023 / 5.269862 (-4.219839)
read_formatted torch 5000 | 0.977239 / 4.565676 (-3.588437)
read_formatted_batch numpy 5000 10 | 0.047994 / 0.424275 (-0.376281)
read_formatted_batch numpy 5000 1000 | 0.005638 / 0.007607 (-0.001969)
shuffled read 5000 | 0.625466 / 0.226044 (0.399421)
shuffled read 50000 | 6.281085 / 2.268929 (4.012157)
shuffled read_batch 50000 10 | 2.862314 / 55.444624 (-52.582311)
shuffled read_batch 50000 100 | 2.396943 / 6.876477 (-4.479534)
shuffled read_batch 50000 1000 | 2.411505 / 2.142072 (0.269432)
shuffled read_formatted numpy 5000 | 0.564592 / 4.805227 (-4.240635)
shuffled read_formatted_batch numpy 5000 10 | 0.121343 / 6.500664 (-6.379322)
shuffled read_formatted_batch numpy 5000 1000 | 0.059560 / 0.075469 (-0.015909)

Benchmark: benchmark_map_filter.json

metric | new / old (diff)
filter | 1.804617 / 1.841788 (-0.037170)
map fast-tokenizer batched | 14.891789 / 8.074308 (6.817481)
map identity | 33.101281 / 10.191392 (22.909889)
map identity batched | 0.895400 / 0.680424 (0.214976)
map no-op batched | 0.597885 / 0.534201 (0.063684)
map no-op batched numpy | 0.264353 / 0.579283 (-0.314930)
map no-op batched pandas | 0.629127 / 0.434364 (0.194763)
map no-op batched pytorch | 0.234337 / 0.540337 (-0.306001)
map no-op batched tensorflow | 0.246232 / 1.386936 (-1.140704)
PyArrow==latest
Show updated benchmarks!

Benchmark: benchmark_array_xd.json

metric | new / old (diff)
read_batch_formatted_as_numpy after write_array2d | 0.010933 / 0.011353 (-0.000419)
read_batch_formatted_as_numpy after write_flattened_sequence | 0.004361 / 0.011008 (-0.006647)
read_batch_formatted_as_numpy after write_nested_sequence | 0.036256 / 0.038508 (-0.002252)
read_batch_unformated after write_array2d | 0.041573 / 0.023109 (0.018463)
read_batch_unformated after write_flattened_sequence | 0.347432 / 0.275898 (0.071534)
read_batch_unformated after write_nested_sequence | 0.374108 / 0.323480 (0.050628)
read_col_formatted_as_numpy after write_array2d | 0.009016 / 0.007986 (0.001031)
read_col_formatted_as_numpy after write_flattened_sequence | 0.005329 / 0.004328 (0.001001)
read_col_formatted_as_numpy after write_nested_sequence | 0.010694 / 0.004250 (0.006444)
read_col_unformated after write_array2d | 0.049191 / 0.037052 (0.012138)
read_col_unformated after write_flattened_sequence | 0.343732 / 0.258489 (0.085242)
read_col_unformated after write_nested_sequence | 0.387667 / 0.293841 (0.093826)
read_formatted_as_numpy after write_array2d | 0.030661 / 0.128546 (-0.097885)
read_formatted_as_numpy after write_flattened_sequence | 0.009433 / 0.075646 (-0.066213)
read_formatted_as_numpy after write_nested_sequence | 0.297915 / 0.419271 (-0.121357)
read_unformated after write_array2d | 0.057706 / 0.043533 (0.014173)
read_unformated after write_flattened_sequence | 0.351681 / 0.255139 (0.096542)
read_unformated after write_nested_sequence | 0.380034 / 0.283200 (0.096835)
write_array2d | 0.104708 / 0.141683 (-0.036974)
write_flattened_sequence | 1.946408 / 1.452155 (0.494254)
write_nested_sequence | 2.174796 / 1.492716 (0.682079)

Benchmark: benchmark_getitem_100B.json

metric | new / old (diff)
get_batch_of_1024_random_rows | 0.247671 / 0.018006 (0.229665)
get_batch_of_1024_rows | 0.478796 / 0.000490 (0.478306)
get_first_row | 0.013210 / 0.000200 (0.013010)
get_last_row | 0.000142 / 0.000054 (0.000087)

Benchmark: benchmark_indices_mapping.json

metric | new / old (diff)
select | 0.042080 / 0.037411 (0.004669)
shard | 0.024255 / 0.014526 (0.009729)
shuffle | 0.030204 / 0.176557 (-0.146353)
sort | 0.149692 / 0.737135 (-0.587444)
train_test_split | 0.031029 / 0.296338 (-0.265309)

Benchmark: benchmark_iterating.json

metric | new / old (diff)
read 5000 | 0.482205 / 0.215209 (0.266996)
read 50000 | 4.769367 / 2.077655 (2.691712)
read_batch 50000 10 | 2.166367 / 1.504120 (0.662247)
read_batch 50000 100 | 1.949352 / 1.541195 (0.408158)
read_batch 50000 1000 | 1.967155 / 1.468490 (0.498665)
read_formatted numpy 5000 | 0.450279 / 4.584777 (-4.134498)
read_formatted pandas 5000 | 6.021402 / 3.745712 (2.275690)
read_formatted tensorflow 5000 | 1.160094 / 5.269862 (-4.109767)
read_formatted torch 5000 | 1.106134 / 4.565676 (-3.459542)
read_formatted_batch numpy 5000 10 | 0.051250 / 0.424275 (-0.373025)
read_formatted_batch numpy 5000 1000 | 0.005599 / 0.007607 (-0.002008)
shuffled read 5000 | 0.644905 / 0.226044 (0.418861)
shuffled read 50000 | 6.372878 / 2.268929 (4.103949)
shuffled read_batch 50000 10 | 2.853828 / 55.444624 (-52.590796)
shuffled read_batch 50000 100 | 2.369941 / 6.876477 (-4.506536)
shuffled read_batch 50000 1000 | 2.290078 / 2.142072 (0.148006)
shuffled read_formatted numpy 5000 | 0.590048 / 4.805227 (-4.215180)
shuffled read_formatted_batch numpy 5000 10 | 0.131801 / 6.500664 (-6.368863)
shuffled read_formatted_batch numpy 5000 1000 | 0.064925 / 0.075469 (-0.010544)

Benchmark: benchmark_map_filter.json

metric | new / old (diff)
filter | 1.891397 / 1.841788 (0.049610)
map fast-tokenizer batched | 21.843306 / 8.074308 (13.768998)
map identity | 30.995197 / 10.191392 (20.803805)
map identity batched | 0.860703 / 0.680424 (0.180279)
map no-op batched | 0.649679 / 0.534201 (0.115478)
map no-op batched numpy | 0.282568 / 0.579283 (-0.296715)
map no-op batched pandas | 0.586400 / 0.434364 (0.152036)
map no-op batched pytorch | 0.222383 / 0.540337 (-0.317954)
map no-op batched tensorflow | 0.242481 / 1.386936 (-1.144455)