perf(python): Improve performance of indexing operations on Series. #5610

Merged
merged 1 commit into from
Nov 24, 2022


ghuls
Collaborator

@ghuls ghuls commented Nov 24, 2022

Improve performance of indexing operations on Series:

  • Check for Series and NumPy arrays first and handle most of the logic in _pos_idxs(). For signed NumPy arrays, negative indexes are first converted to absolute indexes and the result is then converted to an unsigned NumPy array, as this is faster than doing the conversion while creating a new Series with an unsigned dtype from the signed NumPy array.
  • Remove the dispatch of cast to the expression API, as it adds around 20 microseconds to every cast call, and cast is called relatively often in _pos_idxs().
  • Move the expensive checks on Sequences to the end, after all other instance types have been checked.
  • Move the deprecated boolean mask methods to the end, as they can be the slowest paths.
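The first bullet can be illustrated with a small NumPy sketch. The helper name `to_unsigned_positions` is hypothetical, this is not the polars `_pos_idxs()` code, and using uint64 as the target dtype is an assumption: signed indexes are widened, negative ones are shifted by the length so they become absolute positions, and only then is the array cast to an unsigned dtype.

```python
import numpy as np


def to_unsigned_positions(idxs: np.ndarray, length: int) -> np.ndarray:
    """Hypothetical helper: turn an integer NumPy index array into unsigned positions.

    A sketch of the idea described above, not the actual polars implementation.
    """
    if idxs.dtype.kind == "u":
        positions = idxs  # already non-negative, nothing to convert
    elif idxs.dtype.kind == "i":
        # Widen to int64 so that adding `length` cannot overflow small dtypes
        # such as int16, then map negative indexes (counted from the end)
        # to absolute positions.
        wide = idxs.astype(np.int64)
        positions = np.where(wide < 0, wide + length, wide)
    else:
        raise TypeError(f"expected an integer index array, got {idxs.dtype}")

    if positions.size and (positions.min() < 0 or positions.max() >= length):
        raise IndexError("index out of bounds")
    # Every position is now in [0, length), so the unsigned cast is lossless.
    return positions.astype(np.uint64)


print(to_unsigned_positions(np.array([-1, 0, 2], dtype=np.int16), 5).tolist())  # [4, 0, 2]
```

Doing the sign handling on the NumPy side means the later step only ever sees an unsigned array, which matches the PR's observation that building the Series from an unsigned array is the cheaper path.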

Series.to_frame() with a new name argument is also faster (by about 20 microseconds), because the Series is now renamed before it is converted to a frame.

@github-actions github-actions bot added performance Performance issues or improvements python Related to Python Polars labels Nov 24, 2022
@ghuls
Collaborator Author

ghuls commented Nov 24, 2022

import numpy as np
import polars as pl

# Values to index into: the same 1,000,000 integers as a NumPy array and as a Series.
numpy_array = np.arange(0, 1000000)
series_pl = pl.Series(numpy_array)

# 10,000 random indexes, cast to each integer dtype that is benchmarked.
numpy_idxs_int64 = np.random.randint(1, 1000000, 10000)
numpy_idxs_uint64 = numpy_idxs_int64.astype(np.uint64)
numpy_idxs_int32 = numpy_idxs_int64.astype(np.int32)
numpy_idxs_uint32 = numpy_idxs_int64.astype(np.uint32)
numpy_idxs_int16 = numpy_idxs_int64.astype(np.int16)
numpy_idxs_uint16 = numpy_idxs_int64.astype(np.uint16)

# The same index arrays wrapped in polars Series.
series_idxs_int64 = pl.Series("idx", numpy_idxs_int64)
series_idxs_uint64 = pl.Series("idx", numpy_idxs_uint64)
series_idxs_int32 = pl.Series("idx", numpy_idxs_int32)
series_idxs_uint32 = pl.Series("idx", numpy_idxs_uint32)
series_idxs_int16 = pl.Series("idx", numpy_idxs_int16)
series_idxs_uint16 = pl.Series("idx", numpy_idxs_uint16)

# The same indexes as a plain Python list.
python_idxs_list = series_idxs_int64.to_list()
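Note that the int16 and uint16 index arrays are created with astype() from values of up to 999,999, which do not fit in 16 bits. NumPy wraps such values instead of raising, so numpy_idxs_int16 contains many negative indexes (counted from the end of the array), which means the int16 benchmarks also exercise the negative-index conversion. A small sketch with hypothetical values shows the effect:

```python
import numpy as np

# Hypothetical example values; the benchmark draws 10,000 random indexes from [1, 1000000).
idxs = np.array([100, 40_000, 999_999])

# astype() wraps integer values that do not fit in the target dtype
# (modulo 2**16 here) instead of raising an error.
idxs_int16 = idxs.astype(np.int16)
idxs_uint16 = idxs.astype(np.uint16)
print(idxs_int16.tolist())   # [100, -25536, 16959]
print(idxs_uint16.tolist())  # [100, 40000, 16959]

# Negative indexes count from the end, so the wrapped int16 values are still
# valid positions in an array of length 1,000,000.
arr = np.arange(0, 1_000_000)
print(arr[idxs_int16].tolist())  # [100, 974464, 16959]
```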


# Numpy baseline:
In [3]: %timeit numpy_array[numpy_idxs_int64]
13.8 µs ± 201 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

In [4]: %timeit numpy_array[numpy_idxs_int32]
23 µs ± 134 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

In [5]: %timeit numpy_array[numpy_idxs_int16]
22.6 µs ± 82.2 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

In [6]: %timeit numpy_array[numpy_idxs_uint64]
23.1 µs ± 138 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

In [7]: %timeit numpy_array[numpy_idxs_uint32]
22.9 µs ± 156 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

In [8]: %timeit numpy_array[numpy_idxs_uint16]
21.1 µs ± 48.3 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)


# New
In [2]: %timeit series_pl[numpy_idxs_int64]
49.7 µs ± 863 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

In [3]: %timeit series_pl[numpy_idxs_int32]
48.3 µs ± 1.82 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

In [4]: %timeit series_pl[numpy_idxs_int16]
81.8 µs ± 1.45 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

In [5]: %timeit series_pl[numpy_idxs_uint64]
54.8 µs ± 855 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

In [6]: %timeit series_pl[numpy_idxs_uint32]
34.4 µs ± 450 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

In [7]:  %timeit series_pl[numpy_idxs_uint16]
27.9 µs ± 402 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)


# Old
In [4]: %timeit series_pl[numpy_idxs_int64]
114 µs ± 4.71 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [5]: %timeit series_pl[numpy_idxs_int32]
108 µs ± 1.55 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [6]: %timeit series_pl[numpy_idxs_int16]
142 µs ± 1.04 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [7]: %timeit series_pl[numpy_idxs_uint64]
93.6 µs ± 2.16 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [8]: %timeit series_pl[numpy_idxs_uint32]
71.5 µs ± 2.95 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [9]: %timeit series_pl[numpy_idxs_uint16]
61.9 µs ± 3.55 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
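The old and new timings for NumPy index arrays can be condensed into a speedup factor (old mean divided by new mean). The snippet below does only that arithmetic, using the mean values copied from the runs above and ignoring the reported standard deviations; the dict names are illustrative.

```python
# Mean times in microseconds for series_pl[numpy_idxs_<dtype>], copied from the runs above.
old_us = {"int64": 114, "int32": 108, "int16": 142, "uint64": 93.6, "uint32": 71.5, "uint16": 61.9}
new_us = {"int64": 49.7, "int32": 48.3, "int16": 81.8, "uint64": 54.8, "uint32": 34.4, "uint16": 27.9}

# Speedup factor = old mean / new mean; values above 1 mean the new code is faster.
speedups = {dtype: old_us[dtype] / new_us[dtype] for dtype in old_us}

for dtype, factor in speedups.items():
    print(f"{dtype:>6}: {old_us[dtype]:>5} µs -> {new_us[dtype]:>5} µs  ({factor:.2f}x faster)")
```

By this measure every NumPy index dtype got roughly 1.7x to 2.3x faster, with int64 gaining the most.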


# New
In [17]: %timeit series_pl[series_idxs_int64]
61.3 µs ± 176 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

In [18]: %timeit series_pl[series_idxs_int32]
45 µs ± 144 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

In [19]: %timeit series_pl[series_idxs_int16]
210 µs ± 4.01 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

In [20]: %timeit series_pl[series_idxs_uint64]
51.2 µs ± 562 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

In [21]: %timeit series_pl[series_idxs_uint32]
20.1 µs ± 87.1 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

In [22]: %timeit series_pl[series_idxs_uint16]
19.4 µs ± 192 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

# Old
In [10]: %timeit series_pl[series_idxs_int64]
97.7 µs ± 1.33 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [11]: %timeit series_pl[series_idxs_int32]
84 µs ± 647 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [12]: %timeit series_pl[series_idxs_int16]
277 µs ± 2.64 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [13]: %timeit series_pl[series_idxs_uint64]
90.6 µs ± 3.15 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [14]: %timeit series_pl[series_idxs_uint32]
24.5 µs ± 112 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [15]: %timeit series_pl[series_idxs_uint16]
56.1 µs ± 2.39 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)


# New
In [34]: %timeit numpy_array[python_idxs_list]
399 µs ± 2.48 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

In [35]: %timeit series_pl[python_idxs_list]
513 µs ± 11 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

# Old
In [21]: %timeit numpy_array[python_idxs_list]
336 µs ± 694 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [22]: %timeit series_pl[python_idxs_list]
588 µs ± 6.41 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)


# New
In [26]: %timeit numpy_array[100]
53.5 ns ± 0.474 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

In [27]: %timeit series_pl[100]
4.82 µs ± 27 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

# Old
In [23]: %timeit numpy_array[100]
53.5 ns ± 0.299 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

In [24]: %timeit series_pl[100]
6 µs ± 30.7 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)



# New
In [28]: %timeit numpy_array[[100, 200]]
528 ns ± 1.62 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

In [29]: %timeit series_pl[[100, 200]]
42.4 µs ± 113 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

# Old
In [27]: %timeit numpy_array[[100, 200]]
572 ns ± 1.29 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [28]: %timeit series_pl[[100, 200]]
108 µs ± 2.36 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
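The same old/new arithmetic can be applied to the remaining benchmarks (Series index arrays, a Python list, a scalar and a two-element list), again using only the reported means:

```python
# Mean times in microseconds, copied from the old and new runs above.
timings_us = {
    # case: (old mean, new mean)
    "series int64": (97.7, 61.3),
    "series int32": (84, 45),
    "series int16": (277, 210),
    "series uint64": (90.6, 51.2),
    "series uint32": (24.5, 20.1),
    "series uint16": (56.1, 19.4),
    "python list": (588, 513),
    "scalar 100": (6, 4.82),
    "list [100, 200]": (108, 42.4),
}

# Speedup factor = old mean / new mean.
speedups = {case: old / new for case, (old, new) in timings_us.items()}

# Print the cases from largest to smallest speedup.
for case, factor in sorted(speedups.items(), key=lambda item: item[1], reverse=True):
    print(f"{case:>16}: {factor:.2f}x")
```

Here the gains range from about 1.15x (the 10,000-element Python list) to about 2.9x (uint16 Series indexes). The NumPy baseline for the same Python list moved from 336 µs to 399 µs between the old and new sessions, so the smaller differences may partly reflect run-to-run variation.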

@ritchie46 ritchie46 merged commit b7be15a into pola-rs:master Nov 24, 2022