perf(python): Improve performance of indexing operations on Series. #5610

ghuls · 2022-11-24T00:01:53Z

Improve performance of indexing operations on Series:

- Check for Series and NumPy arrays first, and handle most of the logic in _pos_idxs(). For signed NumPy arrays, convert negative indexes to absolute indexes and then convert the result to an unsigned NumPy array. This is faster than creating a new Series with an unsigned dtype from the signed NumPy array.
- Remove the dispatch of cast to the expression API. The dispatch adds around 20 microseconds to each cast call, and cast is called relatively often in _pos_idxs().
- Move the expensive checks on Sequences to the end, after all other instance types have been checked.
- Move the deprecated boolean mask methods to the end, as they can be the slowest paths.
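
The dispatch order in the list above can be sketched as a standalone function. This is a minimal sketch that assumes NumPy only. The name pos_idxs_sketch is hypothetical and it is not the real _pos_idxs(). The Series branch is left out, and the function returns a NumPy uint64 array instead of a Polars Series.

```python
from collections.abc import Sequence

import numpy as np


def pos_idxs_sketch(idxs, length):
    """Hypothetical helper: turn an index object into non-negative uint64 positions.

    Cheap type checks (NumPy arrays) run first; the generic Sequence check runs last.
    """
    if isinstance(idxs, np.ndarray):
        if idxs.ndim != 1:
            raise ValueError("only 1D index arrays are supported")
        if idxs.dtype.kind == "i":
            # Signed: widen to int64 so adding `length` cannot overflow,
            # resolve negative indexes (a[-1] -> a[length - 1]), and only then
            # convert to unsigned, once, on the NumPy side.
            idxs = idxs.astype(np.int64, copy=False)
            idxs = np.where(idxs < 0, idxs + length, idxs)
            if idxs.size and idxs.min() < 0:
                raise IndexError("index out of bounds")
            idxs = idxs.astype(np.uint64)
        elif idxs.dtype.kind == "u":
            # Unsigned: no negative indexes to resolve.
            idxs = idxs.astype(np.uint64, copy=False)
        else:
            raise TypeError(f"unsupported index dtype: {idxs.dtype}")
        if idxs.size and idxs.max() >= length:
            raise IndexError("index out of bounds")
        return idxs
    if isinstance(idxs, Sequence) and not isinstance(idxs, (str, bytes)):
        # Most expensive path: a generic Python sequence is inspected last.
        return pos_idxs_sketch(np.asarray(idxs, dtype=np.int64), length)
    raise TypeError(f"unsupported index type: {type(idxs).__name__}")
```

For example, pos_idxs_sketch(np.array([0, -1, 2], dtype=np.int16), 5) resolves -1 to 4 and returns a uint64 array with the positions [0, 4, 2].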

Series.to_frame() with a new name argument is also faster (about 20 microseconds less), because the Series is now renamed before it is converted to a frame.

ghuls · 2022-11-24T00:05:05Z

import numpy as np
import polars as pl
import polars.internals as pli

numpy_array = np.arange(0, 1000000)
series_pl = pl.Series(numpy_array)

numpy_idxs_int64 = np.random.randint(1, 1000000, 10000)
numpy_idxs_uint64 = numpy_idxs_int64.astype(np.uint64)
numpy_idxs_int32 = numpy_idxs_int64.astype(np.int32)
numpy_idxs_uint32 = numpy_idxs_int64.astype(np.uint32)
numpy_idxs_int16 = numpy_idxs_int64.astype(np.int16)
numpy_idxs_uint16 = numpy_idxs_int64.astype(np.uint16)

series_idxs_int64 = pl.Series("idx", numpy_idxs_int64)
series_idxs_uint64 = pl.Series("idx", numpy_idxs_uint64)
series_idxs_int32 = pl.Series("idx", numpy_idxs_int32)
series_idxs_uint32 = pl.Series("idx", numpy_idxs_uint32)
series_idxs_int16 = pl.Series("idx", numpy_idxs_int16)
series_idxs_uint16 = pl.Series("idx", numpy_idxs_uint16)

python_idxs_list = series_idxs_int64.to_list()
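
Note that astype() between NumPy integer types wraps silently. With indexes drawn from [1, 1000000), the int16 and uint16 index arrays above therefore do not contain the same positions as the int64 ones, and the int16 array contains negative values. A small deterministic check, using a few hand-picked values instead of the random ones:

```python
import numpy as np

idxs_int64 = np.array([40000, 70000, 999999], dtype=np.int64)

# Narrowing integer casts wrap modulo 2**16 instead of raising or saturating.
idxs_int16 = idxs_int64.astype(np.int16)
idxs_uint16 = idxs_int64.astype(np.uint16)

print(idxs_int16.tolist())   # [-25536, 4464, 16959]
print(idxs_uint16.tolist())  # [40000, 4464, 16959]
```

So the int16 benchmarks also go through negative-index handling. That may partly explain why they are slower than the other dtypes, although this is an inference and was not measured separately here.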


# Numpy baseline:
In [3]: %timeit numpy_array[numpy_idxs_int64]
13.8 µs ± 201 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

In [4]: %timeit numpy_array[numpy_idxs_int32]
23 µs ± 134 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

In [5]: %timeit numpy_array[numpy_idxs_int16]
22.6 µs ± 82.2 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

In [6]: %timeit numpy_array[numpy_idxs_uint64]
23.1 µs ± 138 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

In [7]: %timeit numpy_array[numpy_idxs_uint32]
22.9 µs ± 156 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

In [8]: %timeit numpy_array[numpy_idxs_uint16]
21.1 µs ± 48.3 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)


# New
In [2]: %timeit series_pl[numpy_idxs_int64]
49.7 µs ± 863 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

In [3]: %timeit series_pl[numpy_idxs_int32]
48.3 µs ± 1.82 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

In [4]: %timeit series_pl[numpy_idxs_int16]
81.8 µs ± 1.45 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

In [5]: %timeit series_pl[numpy_idxs_uint64]
54.8 µs ± 855 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

In [6]: %timeit series_pl[numpy_idxs_uint32]
34.4 µs ± 450 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

In [7]:  %timeit series_pl[numpy_idxs_uint16]
27.9 µs ± 402 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)


# Old
In [4]: %timeit series_pl[numpy_idxs_int64]
114 µs ± 4.71 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [5]: %timeit series_pl[numpy_idxs_int32]
108 µs ± 1.55 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [6]: %timeit series_pl[numpy_idxs_int16]
142 µs ± 1.04 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [7]: %timeit series_pl[numpy_idxs_uint64]
93.6 µs ± 2.16 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [8]: %timeit series_pl[numpy_idxs_uint32]
71.5 µs ± 2.95 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [9]: %timeit series_pl[numpy_idxs_uint16]
61.9 µs ± 3.55 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
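
The old/new ratios for Series indexed by NumPy arrays can be computed directly from the mean timings above. This sketch uses only the reported means and ignores the standard deviations:

```python
# Mean timings in µs copied from the "Old" and "New" runs above
# (Series indexed by NumPy arrays of each dtype).
old_us = {"int64": 114, "int32": 108, "int16": 142,
          "uint64": 93.6, "uint32": 71.5, "uint16": 61.9}
new_us = {"int64": 49.7, "int32": 48.3, "int16": 81.8,
          "uint64": 54.8, "uint32": 34.4, "uint16": 27.9}

# Speedup = old mean / new mean (> 1 means the new code is faster).
speedups = {dtype: old_us[dtype] / new_us[dtype] for dtype in old_us}

for dtype, ratio in speedups.items():
    print(f"{dtype:>6}: {ratio:.2f}x")
```

This gives roughly 1.71x (uint64) to 2.29x (int64) for these indexers.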


# New
In [17]: %timeit series_pl[series_idxs_int64]
61.3 µs ± 176 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

In [18]: %timeit series_pl[series_idxs_int32]
45 µs ± 144 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

In [19]: %timeit series_pl[series_idxs_int16]
210 µs ± 4.01 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

In [20]: %timeit series_pl[series_idxs_uint64]
51.2 µs ± 562 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

In [21]: %timeit series_pl[series_idxs_uint32]
20.1 µs ± 87.1 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

In [22]: %timeit series_pl[series_idxs_uint16]
19.4 µs ± 192 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

# Old
In [10]: %timeit series_pl[series_idxs_int64]
97.7 µs ± 1.33 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [11]: %timeit series_pl[series_idxs_int32]
84 µs ± 647 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [12]: %timeit series_pl[series_idxs_int16]
277 µs ± 2.64 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [13]: %timeit series_pl[series_idxs_uint64]
90.6 µs ± 3.15 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [14]: %timeit series_pl[series_idxs_uint32]
24.5 µs ± 112 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [15]: %timeit series_pl[series_idxs_uint16]
56.1 µs ± 2.39 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)


# New
In [34]: %timeit numpy_array[python_idxs_list]
399 µs ± 2.48 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

In [35]: %timeit series_pl[python_idxs_list]
513 µs ± 11 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

# Old
In [21]: %timeit numpy_array[python_idxs_list]
336 µs ± 694 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [22]: %timeit series_pl[python_idxs_list]
588 µs ± 6.41 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)


# New
In [26]: %timeit numpy_array[100]
53.5 ns ± 0.474 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

In [27]: %timeit series_pl[100]
4.82 µs ± 27 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

# Old
In [23]: %timeit numpy_array[100]
53.5 ns ± 0.299 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

In [24]: %timeit series_pl[100]
6 µs ± 30.7 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)



# New
In [28]: %timeit numpy_array[[100, 200]]
528 ns ± 1.62 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

In [29]: %timeit series_pl[[100, 200]]
42.4 µs ± 113 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

# Old
In [27]: %timeit numpy_array[[100, 200]]
572 ns ± 1.29 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [28]: %timeit series_pl[[100, 200]]
108 µs ± 2.36 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
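
The numbers above were taken with IPython's %timeit. Outside IPython, a rough equivalent can be sketched with the standard-library timeit module. The helper name bench is hypothetical. Unlike %timeit, it uses a fixed loop count instead of choosing one automatically, so results will not match exactly.

```python
import statistics
import timeit

import numpy as np


def bench(stmt, repeat=7, number=1000):
    """Hypothetical helper: return (mean, stdev) in microseconds per call,
    computed over `repeat` runs of `number` calls each, like %timeit reports."""
    runs = timeit.repeat(stmt, repeat=repeat, number=number)
    per_call_us = [run / number * 1e6 for run in runs]
    return statistics.mean(per_call_us), statistics.stdev(per_call_us)


numpy_array = np.arange(0, 1000000)
numpy_idxs_int64 = np.random.randint(1, 1000000, 10000)

mean_us, std_us = bench(lambda: numpy_array[numpy_idxs_int64], number=100)
print(f"{mean_us:.1f} µs ± {std_us:.1f} µs per loop "
      f"(mean ± std. dev. of 7 runs, 100 loops each)")
```

To time the Polars side, pass a callable such as lambda: series_pl[numpy_idxs_int64] using the setup from the first benchmark comment.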


github-actions bot added performance Performance issues or improvements python Related to Python Polars labels Nov 24, 2022

ghuls force-pushed the perf_python_getitem branch from 940b8a1 to 2a27beb November 24, 2022 07:58

ghuls force-pushed the perf_python_getitem branch from 2a27beb to 5060b16 November 24, 2022 09:19

ritchie46 merged commit b7be15a into pola-rs:master Nov 24, 2022

zundertj pushed a commit to zundertj/polars that referenced this pull request Jan 7, 2023

dcf49f4 perf(python): Improve performance of indexing operations on Series. (pola-rs#5610)
