Add OSCAR dataset card (#1833) · huggingface/datasets@f9df773

github-actions · 2021-02-12T14:22:05Z

Show benchmarks

PyArrow==0.17.1

Show updated benchmarks!

Benchmark: benchmark_array_xd.json

metric	read_batch_formatted_as_numpy after write_array2d	read_batch_formatted_as_numpy after write_flattened_sequence	read_batch_formatted_as_numpy after write_nested_sequence	read_batch_unformated after write_array2d	read_batch_unformated after write_flattened_sequence	read_batch_unformated after write_nested_sequence	read_col_formatted_as_numpy after write_array2d	read_col_formatted_as_numpy after write_flattened_sequence	read_col_formatted_as_numpy after write_nested_sequence	read_col_unformated after write_array2d	read_col_unformated after write_flattened_sequence	read_col_unformated after write_nested_sequence	read_formatted_as_numpy after write_array2d	read_formatted_as_numpy after write_flattened_sequence	read_formatted_as_numpy after write_nested_sequence	read_unformated after write_array2d	read_unformated after write_flattened_sequence	read_unformated after write_nested_sequence	write_array2d	write_flattened_sequence	write_nested_sequence
new / old (diff)	0.020511 / 0.011353 (0.009158)	0.016439 / 0.011008 (0.005431)	0.047983 / 0.038508 (0.009475)	0.034504 / 0.023109 (0.011395)	0.223612 / 0.275898 (-0.052286)	0.260805 / 0.323480 (-0.062675)	0.009623 / 0.007986 (0.001637)	0.004865 / 0.004328 (0.000537)	0.006892 / 0.004250 (0.002642)	0.047471 / 0.037052 (0.010419)	0.221516 / 0.258489 (-0.036973)	0.258921 / 0.293841 (-0.034920)	0.163036 / 0.128546 (0.034490)	0.133672 / 0.075646 (0.058026)	0.465309 / 0.419271 (0.046038)	0.459175 / 0.043533 (0.415642)	0.219502 / 0.255139 (-0.035637)	0.264413 / 0.283200 (-0.018787)	1.818752 / 0.141683 (1.677069)	1.952732 / 1.452155 (0.500577)	2.003916 / 1.492716 (0.511200)

Benchmark: benchmark_indices_mapping.json

metric	select	shard	shuffle	sort	train_test_split
new / old (diff)	0.042036 / 0.037411 (0.004624)	0.020820 / 0.014526 (0.006295)	0.028230 / 0.176557 (-0.148326)	0.047460 / 0.737135 (-0.689676)	0.048125 / 0.296338 (-0.248213)

Benchmark: benchmark_iterating.json

metric	read 5000	read 50000	read_batch 50000 10	read_batch 50000 100	read_batch 50000 1000	read_formatted numpy 5000	read_formatted pandas 5000	read_formatted tensorflow 5000	read_formatted torch 5000	read_formatted_batch numpy 5000 10	read_formatted_batch numpy 5000 1000	shuffled read 5000	shuffled read 50000	shuffled read_batch 50000 10	shuffled read_batch 50000 100	shuffled read_batch 50000 1000	shuffled read_formatted numpy 5000	shuffled read_formatted_batch numpy 5000 10	shuffled read_formatted_batch numpy 5000 1000
new / old (diff)	0.275295 / 0.215209 (0.060086)	2.934191 / 2.077655 (0.856536)	1.491390 / 1.504120 (-0.012730)	1.351775 / 1.541195 (-0.189420)	1.435240 / 1.468490 (-0.033250)	7.576647 / 4.584777 (2.991870)	6.621372 / 3.745712 (2.875660)	9.219156 / 5.269862 (3.949295)	8.155026 / 4.565676 (3.589349)	0.748266 / 0.424275 (0.323991)	0.011427 / 0.007607 (0.003820)	0.342845 / 0.226044 (0.116800)	3.397014 / 2.268929 (1.128086)	1.988188 / 55.444624 (-53.456436)	1.677527 / 6.876477 (-5.198949)	1.693024 / 2.142072 (-0.449049)	7.611559 / 4.805227 (2.806331)	6.213084 / 6.500664 (-0.287580)	6.874019 / 0.075469 (6.798550)

Benchmark: benchmark_map_filter.json

metric	filter	map fast-tokenizer batched	map identity	map identity batched	map no-op batched	map no-op batched numpy	map no-op batched pandas	map no-op batched pytorch	map no-op batched tensorflow
new / old (diff)	12.067157 / 1.841788 (10.225369)	16.000691 / 8.074308 (7.926383)	25.756312 / 10.191392 (15.564920)	0.513365 / 0.680424 (-0.167059)	0.335310 / 0.534201 (-0.198891)	0.936280 / 0.579283 (0.356997)	0.700977 / 0.434364 (0.266613)	0.769123 / 0.540337 (0.228786)	1.671362 / 1.386936 (0.284426)

PyArrow==1.0

Show updated benchmarks!

Benchmark: benchmark_array_xd.json

metric	read_batch_formatted_as_numpy after write_array2d	read_batch_formatted_as_numpy after write_flattened_sequence	read_batch_formatted_as_numpy after write_nested_sequence	read_batch_unformated after write_array2d	read_batch_unformated after write_flattened_sequence	read_batch_unformated after write_nested_sequence	read_col_formatted_as_numpy after write_array2d	read_col_formatted_as_numpy after write_flattened_sequence	read_col_formatted_as_numpy after write_nested_sequence	read_col_unformated after write_array2d	read_col_unformated after write_flattened_sequence	read_col_unformated after write_nested_sequence	read_formatted_as_numpy after write_array2d	read_formatted_as_numpy after write_flattened_sequence	read_formatted_as_numpy after write_nested_sequence	read_unformated after write_array2d	read_unformated after write_flattened_sequence	read_unformated after write_nested_sequence	write_array2d	write_flattened_sequence	write_nested_sequence
new / old (diff)	0.019157 / 0.011353 (0.007805)	0.015521 / 0.011008 (0.004512)	0.046191 / 0.038508 (0.007683)	0.034163 / 0.023109 (0.011054)	0.345404 / 0.275898 (0.069506)	0.380235 / 0.323480 (0.056755)	0.009471 / 0.007986 (0.001486)	0.005088 / 0.004328 (0.000760)	0.006761 / 0.004250 (0.002510)	0.047481 / 0.037052 (0.010428)	0.348101 / 0.258489 (0.089612)	0.395925 / 0.293841 (0.102084)	0.160053 / 0.128546 (0.031507)	0.131553 / 0.075646 (0.055906)	0.458888 / 0.419271 (0.039617)	0.449326 / 0.043533 (0.405794)	0.354657 / 0.255139 (0.099518)	0.385237 / 0.283200 (0.102037)	1.816263 / 0.141683 (1.674580)	1.913135 / 1.452155 (0.460980)	2.035098 / 1.492716 (0.542381)

Benchmark: benchmark_indices_mapping.json

metric	select	shard	shuffle	sort	train_test_split
new / old (diff)	0.044102 / 0.037411 (0.006691)	0.021560 / 0.014526 (0.007035)	0.038639 / 0.176557 (-0.137917)	0.056491 / 0.737135 (-0.680644)	0.029547 / 0.296338 (-0.266791)

Benchmark: benchmark_iterating.json

metric	read 5000	read 50000	read_batch 50000 10	read_batch 50000 100	read_batch 50000 1000	read_formatted numpy 5000	read_formatted pandas 5000	read_formatted tensorflow 5000	read_formatted torch 5000	read_formatted_batch numpy 5000 10	read_formatted_batch numpy 5000 1000	shuffled read 5000	shuffled read 50000	shuffled read_batch 50000 10	shuffled read_batch 50000 100	shuffled read_batch 50000 1000	shuffled read_formatted numpy 5000	shuffled read_formatted_batch numpy 5000 10	shuffled read_formatted_batch numpy 5000 1000
new / old (diff)	0.371256 / 0.215209 (0.156047)	3.763800 / 2.077655 (1.686146)	2.175321 / 1.504120 (0.671201)	1.943748 / 1.541195 (0.402554)	1.995813 / 1.468490 (0.527323)	7.261890 / 4.584777 (2.677113)	6.292724 / 3.745712 (2.547012)	9.012258 / 5.269862 (3.742396)	7.811911 / 4.565676 (3.246234)	0.727577 / 0.424275 (0.303302)	0.011226 / 0.007607 (0.003619)	0.424181 / 0.226044 (0.198137)	4.280841 / 2.268929 (2.011912)	2.706042 / 55.444624 (-52.738583)	2.335405 / 6.876477 (-4.541072)	2.405261 / 2.142072 (0.263189)	7.345899 / 4.805227 (2.540672)	5.113989 / 6.500664 (-1.386675)	8.007380 / 0.075469 (7.931911)

Benchmark: benchmark_map_filter.json

metric	filter	map fast-tokenizer batched	map identity	map identity batched	map no-op batched	map no-op batched numpy	map no-op batched pandas	map no-op batched pytorch	map no-op batched tensorflow
new / old (diff)	12.600367 / 1.841788 (10.758580)	14.978017 / 8.074308 (6.903709)	25.415928 / 10.191392 (15.224536)	0.862584 / 0.680424 (0.182160)	0.627171 / 0.534201 (0.092970)	0.828762 / 0.579283 (0.249479)	0.657041 / 0.434364 (0.222677)	0.761937 / 0.540337 (0.221600)	1.661794 / 1.386936 (0.274858)
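
Each cell in the tables above has the form `new / old (diff)`. The sketch below is one way to parse such rows and derive a relative change. The helper names (`parse_cell`, `parse_rows`, `relative_change`) are hypothetical and not part of the benchmark tooling. It assumes that columns are tab-separated, that `old` is the baseline measurement, and that `diff` is `new - old` up to rounding. The sample row is the PyArrow==0.17.1 `benchmark_indices_mapping.json` result.

```python
import re

# One cell looks like "0.042036 / 0.037411 (0.004624)": new / old (diff).
_NUM = r"(-?\d+(?:\.\d+)?)"
CELL = re.compile(rf"^\s*{_NUM}\s*/\s*{_NUM}\s*\({_NUM}\)\s*$")


def parse_cell(cell):
    """Return (new, old, diff) as floats for a single table cell."""
    match = CELL.match(cell)
    if match is None:
        raise ValueError(f"unrecognized benchmark cell: {cell!r}")
    new, old, diff = map(float, match.groups())
    return new, old, diff


def parse_rows(header_line, values_line):
    """Map each metric name to its (new, old, diff) triple.

    The first column holds the row labels ("metric" and
    "new / old (diff)"), so it is skipped on both lines.
    """
    names = header_line.split("\t")[1:]
    cells = values_line.split("\t")[1:]
    return {name: parse_cell(cell) for name, cell in zip(names, cells)}


def relative_change(new, old):
    """Fractional change of new relative to old (assumed baseline)."""
    return (new - old) / old


header = "metric\tselect\tshard\tshuffle\tsort\ttrain_test_split"
values = (
    "new / old (diff)\t0.042036 / 0.037411 (0.004624)\t"
    "0.020820 / 0.014526 (0.006295)\t0.028230 / 0.176557 (-0.148326)\t"
    "0.047460 / 0.737135 (-0.689676)\t0.048125 / 0.296338 (-0.248213)"
)

rows = parse_rows(header, values)
for name, (new, old, diff) in rows.items():
    print(f"{name}: {relative_change(new, old):+.1%}")
```

The reported `diff` can differ from `new - old` in the last printed digit, presumably because it was computed before rounding, so any consistency check should allow a small tolerance.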

