
Commit

Update supported versions of Python in setup.py
mariosasko authored Dec 15, 2021
1 parent 624b013 commit 531550f
Showing 1 changed file with 3 additions and 0 deletions.
3 changes: 3 additions & 0 deletions setup.py
@@ -248,6 +248,9 @@
         "Programming Language :: Python :: 3",
         "Programming Language :: Python :: 3.6",
         "Programming Language :: Python :: 3.7",
+        "Programming Language :: Python :: 3.8",
+        "Programming Language :: Python :: 3.9",
+        "Programming Language :: Python :: 3.10",
         "Topic :: Scientific/Engineering :: Artificial Intelligence",
     ],
     keywords="datasets machine learning datasets metrics",
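
The classifier strings in this hunk all follow one pattern, so a list like this could be generated from the supported minor versions rather than typed by hand. The sketch below only illustrates that idea: `SUPPORTED_PYTHON_MINORS` and `python_classifiers` are hypothetical names and are not part of this commit or of `setup.py`.

```python
# Hypothetical helper, not part of setup.py in this commit.
# Versions are kept as strings on purpose: the float 3.10 would print as 3.1.
SUPPORTED_PYTHON_MINORS = ["3.6", "3.7", "3.8", "3.9", "3.10"]


def python_classifiers(minors):
    """Build the "Programming Language :: Python" trove classifiers."""
    prefix = "Programming Language :: Python :: "
    # The generic "3" entry comes first, then one entry per minor version.
    return [prefix + "3"] + [prefix + minor for minor in minors]


if __name__ == "__main__":
    for classifier in python_classifiers(SUPPORTED_PYTHON_MINORS):
        print(classifier)
```

The resulting list could be spliced into the `classifiers=[...]` argument next to the non-Python entries such as the `Topic ::` line.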

1 comment on commit 531550f

@github-actions
Show benchmarks

PyArrow==3.0.0

Show updated benchmarks!

Benchmark: benchmark_array_xd.json

| metric | new / old (diff) |
|---|---|
| read_batch_formatted_as_numpy after write_array2d | 0.013276 / 0.011353 (0.001924) |
| read_batch_formatted_as_numpy after write_flattened_sequence | 0.005382 / 0.011008 (-0.005626) |
| read_batch_formatted_as_numpy after write_nested_sequence | 0.047188 / 0.038508 (0.008680) |
| read_batch_unformated after write_array2d | 0.045793 / 0.023109 (0.022683) |
| read_batch_unformated after write_flattened_sequence | 0.404199 / 0.275898 (0.128301) |
| read_batch_unformated after write_nested_sequence | 0.430177 / 0.323480 (0.106698) |
| read_col_formatted_as_numpy after write_array2d | 0.010133 / 0.007986 (0.002148) |
| read_col_formatted_as_numpy after write_flattened_sequence | 0.006532 / 0.004328 (0.002204) |
| read_col_formatted_as_numpy after write_nested_sequence | 0.011099 / 0.004250 (0.006849) |
| read_col_unformated after write_array2d | 0.044690 / 0.037052 (0.007638) |
| read_col_unformated after write_flattened_sequence | 0.384411 / 0.258489 (0.125922) |
| read_col_unformated after write_nested_sequence | 0.459627 / 0.293841 (0.165786) |
| read_formatted_as_numpy after write_array2d | 0.049730 / 0.128546 (-0.078817) |
| read_formatted_as_numpy after write_flattened_sequence | 0.014615 / 0.075646 (-0.061031) |
| read_formatted_as_numpy after write_nested_sequence | 0.343488 / 0.419271 (-0.075783) |
| read_unformated after write_array2d | 0.066673 / 0.043533 (0.023141) |
| read_unformated after write_flattened_sequence | 0.372947 / 0.255139 (0.117808) |
| read_unformated after write_nested_sequence | 0.452719 / 0.283200 (0.169519) |
| write_array2d | 0.102271 / 0.141683 (-0.039411) |
| write_flattened_sequence | 2.135895 / 1.452155 (0.683741) |
| write_nested_sequence | 2.439793 / 1.492716 (0.947076) |

Benchmark: benchmark_getitem_100B.json

| metric | new / old (diff) |
|---|---|
| get_batch_of_1024_random_rows | 0.305878 / 0.018006 (0.287872) |
| get_batch_of_1024_rows | 0.591047 / 0.000490 (0.590558) |
| get_first_row | 0.006586 / 0.000200 (0.006386) |
| get_last_row | 0.000129 / 0.000054 (0.000075) |

Benchmark: benchmark_indices_mapping.json

| metric | new / old (diff) |
|---|---|
| select | 0.041016 / 0.037411 (0.003604) |
| shard | 0.035975 / 0.014526 (0.021449) |
| shuffle | 0.035911 / 0.176557 (-0.140646) |
| sort | 0.074128 / 0.737135 (-0.663007) |
| train_test_split | 0.033319 / 0.296338 (-0.263019) |

Benchmark: benchmark_iterating.json

| metric | new / old (diff) |
|---|---|
| read 5000 | 0.713921 / 0.215209 (0.498712) |
| read 50000 | 6.985952 / 2.077655 (4.908297) |
| read_batch 50000 10 | 2.488466 / 1.504120 (0.984346) |
| read_batch 50000 100 | 2.188773 / 1.541195 (0.647578) |
| read_batch 50000 1000 | 2.160667 / 1.468490 (0.692177) |
| read_formatted numpy 5000 | 0.833179 / 4.584777 (-3.751598) |
| read_formatted pandas 5000 | 7.173521 / 3.745712 (3.427809) |
| read_formatted tensorflow 5000 | 5.272335 / 5.269862 (0.002473) |
| read_formatted torch 5000 | 1.641349 / 4.565676 (-2.924327) |
| read_formatted_batch numpy 5000 10 | 0.094623 / 0.424275 (-0.329652) |
| read_formatted_batch numpy 5000 1000 | 0.015289 / 0.007607 (0.007681) |
| shuffled read 5000 | 0.868398 / 0.226044 (0.642353) |
| shuffled read 50000 | 8.524685 / 2.268929 (6.255756) |
| shuffled read_batch 50000 10 | 3.470854 / 55.444624 (-51.973771) |
| shuffled read_batch 50000 100 | 2.453303 / 6.876477 (-4.423174) |
| shuffled read_batch 50000 1000 | 2.484514 / 2.142072 (0.342442) |
| shuffled read_formatted numpy 5000 | 0.968933 / 4.805227 (-3.836294) |
| shuffled read_formatted_batch numpy 5000 10 | 0.209467 / 6.500664 (-6.291197) |
| shuffled read_formatted_batch numpy 5000 1000 | 0.085012 / 0.075469 (0.009543) |

Benchmark: benchmark_map_filter.json

| metric | new / old (diff) |
|---|---|
| filter | 2.060456 / 1.841788 (0.218669) |
| map fast-tokenizer batched | 14.876149 / 8.074308 (6.801841) |
| map identity | 44.461005 / 10.191392 (34.269613) |
| map identity batched | 1.026723 / 0.680424 (0.346299) |
| map no-op batched | 0.727561 / 0.534201 (0.193360) |
| map no-op batched numpy | 0.661592 / 0.579283 (0.082308) |
| map no-op batched pandas | 0.741395 / 0.434364 (0.307031) |
| map no-op batched pytorch | 0.397365 / 0.540337 (-0.142972) |
| map no-op batched tensorflow | 0.434929 / 1.386936 (-0.952007) |

PyArrow==latest

Show updated benchmarks!

Benchmark: benchmark_array_xd.json

| metric | new / old (diff) |
|---|---|
| read_batch_formatted_as_numpy after write_array2d | 0.010659 / 0.011353 (-0.000694) |
| read_batch_formatted_as_numpy after write_flattened_sequence | 0.005206 / 0.011008 (-0.005802) |
| read_batch_formatted_as_numpy after write_nested_sequence | 0.045464 / 0.038508 (0.006956) |
| read_batch_unformated after write_array2d | 0.037860 / 0.023109 (0.014750) |
| read_batch_unformated after write_flattened_sequence | 0.406723 / 0.275898 (0.130825) |
| read_batch_unformated after write_nested_sequence | 0.493668 / 0.323480 (0.170189) |
| read_col_formatted_as_numpy after write_array2d | 0.008223 / 0.007986 (0.000238) |
| read_col_formatted_as_numpy after write_flattened_sequence | 0.005976 / 0.004328 (0.001647) |
| read_col_formatted_as_numpy after write_nested_sequence | 0.009338 / 0.004250 (0.005087) |
| read_col_unformated after write_array2d | 0.040801 / 0.037052 (0.003748) |
| read_col_unformated after write_flattened_sequence | 0.409102 / 0.258489 (0.150613) |
| read_col_unformated after write_nested_sequence | 0.477455 / 0.293841 (0.183614) |
| read_formatted_as_numpy after write_array2d | 0.051788 / 0.128546 (-0.076758) |
| read_formatted_as_numpy after write_flattened_sequence | 0.014130 / 0.075646 (-0.061516) |
| read_formatted_as_numpy after write_nested_sequence | 0.370681 / 0.419271 (-0.048591) |
| read_unformated after write_array2d | 0.069999 / 0.043533 (0.026467) |
| read_unformated after write_flattened_sequence | 0.435334 / 0.255139 (0.180195) |
| read_unformated after write_nested_sequence | 0.482082 / 0.283200 (0.198882) |
| write_array2d | 0.095758 / 0.141683 (-0.045924) |
| write_flattened_sequence | 2.322533 / 1.452155 (0.870379) |
| write_nested_sequence | 2.294334 / 1.492716 (0.801617) |

Benchmark: benchmark_getitem_100B.json

| metric | new / old (diff) |
|---|---|
| get_batch_of_1024_random_rows | 0.373982 / 0.018006 (0.355976) |
| get_batch_of_1024_rows | 0.586706 / 0.000490 (0.586217) |
| get_first_row | 0.039126 / 0.000200 (0.038926) |
| get_last_row | 0.000618 / 0.000054 (0.000564) |

Benchmark: benchmark_indices_mapping.json

| metric | new / old (diff) |
|---|---|
| select | 0.037828 / 0.037411 (0.000417) |
| shard | 0.025572 / 0.014526 (0.011046) |
| shuffle | 0.033806 / 0.176557 (-0.142751) |
| sort | 0.076060 / 0.737135 (-0.661075) |
| train_test_split | 0.032253 / 0.296338 (-0.264086) |

Benchmark: benchmark_iterating.json

| metric | new / old (diff) |
|---|---|
| read 5000 | 0.667834 / 0.215209 (0.452625) |
| read 50000 | 6.597424 / 2.077655 (4.519769) |
| read_batch 50000 10 | 2.561427 / 1.504120 (1.057307) |
| read_batch 50000 100 | 2.073069 / 1.541195 (0.531874) |
| read_batch 50000 1000 | 2.180401 / 1.468490 (0.711911) |
| read_formatted numpy 5000 | 0.813919 / 4.584777 (-3.770858) |
| read_formatted pandas 5000 | 7.077265 / 3.745712 (3.331553) |
| read_formatted tensorflow 5000 | 3.417147 / 5.269862 (-1.852714) |
| read_formatted torch 5000 | 1.664382 / 4.565676 (-2.901295) |
| read_formatted_batch numpy 5000 10 | 0.110242 / 0.424275 (-0.314033) |
| read_formatted_batch numpy 5000 1000 | 0.016844 / 0.007607 (0.009237) |
| shuffled read 5000 | 0.866690 / 0.226044 (0.640646) |
| shuffled read 50000 | 8.451469 / 2.268929 (6.182541) |
| shuffled read_batch 50000 10 | 3.224942 / 55.444624 (-52.219683) |
| shuffled read_batch 50000 100 | 2.606261 / 6.876477 (-4.270216) |
| shuffled read_batch 50000 1000 | 2.529945 / 2.142072 (0.387872) |
| shuffled read_formatted numpy 5000 | 1.001676 / 4.805227 (-3.803551) |
| shuffled read_formatted_batch numpy 5000 10 | 0.197957 / 6.500664 (-6.302707) |
| shuffled read_formatted_batch numpy 5000 1000 | 0.083218 / 0.075469 (0.007749) |

Benchmark: benchmark_map_filter.json

| metric | new / old (diff) |
|---|---|
| filter | 1.922769 / 1.841788 (0.080981) |
| map fast-tokenizer batched | 14.739465 / 8.074308 (6.665157) |
| map identity | 43.642812 / 10.191392 (33.451420) |
| map identity batched | 0.955740 / 0.680424 (0.275316) |
| map no-op batched | 0.668811 / 0.534201 (0.134610) |
| map no-op batched numpy | 0.626138 / 0.579283 (0.046855) |
| map no-op batched pandas | 0.729859 / 0.434364 (0.295495) |
| map no-op batched pytorch | 0.410798 / 0.540337 (-0.129540) |
| map no-op batched tensorflow | 0.428672 / 1.386936 (-0.958264) |
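
Each benchmark value above is written as `new / old (diff)`, and in the rows shown the diff equals new minus old up to rounding in the last printed digit. The sketch below shows one way to parse a value and check that relationship. `parse_cell` and `is_consistent` are hypothetical helpers written for this illustration, not part of the benchmark tooling, and the layout they assume is inferred only from the values in this comment.

```python
import re

# Assumed cell layout, inferred from the values above: "<new> / <old> (<diff>)".
CELL_RE = re.compile(r"(-?\d+\.\d+) / (-?\d+\.\d+) \((-?\d+\.\d+)\)")


def parse_cell(cell):
    """Split one benchmark value into (new, old, diff) floats."""
    match = CELL_RE.fullmatch(cell.strip())
    if match is None:
        raise ValueError(f"unexpected benchmark value: {cell!r}")
    new, old, diff = (float(group) for group in match.groups())
    return new, old, diff


def is_consistent(cell, tol=2e-6):
    """Check that diff matches new - old, allowing for 6-decimal rounding."""
    new, old, diff = parse_cell(cell)
    return abs((new - old) - diff) <= tol


if __name__ == "__main__":
    print(parse_cell("0.305878 / 0.018006 (0.287872)"))
    print(is_consistent("3.470854 / 55.444624 (-51.973771)"))
```

The tolerance is slightly above one unit in the sixth decimal place because the three numbers appear to be rounded independently.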
