feat(python): enhanced Series.dot method and related interop #5428

Merged
merged 1 commit into from
Nov 5, 2022


alexander-beedie
Collaborator

@alexander-beedie alexander-beedie commented Nov 4, 2022

Added @ matrix multiplication operator support for Series, along with implicit coercion for types that make sense (1D lists and/or numpy arrays).

import numpy as np
import polars as pl

s1 = pl.Series("a", [1, 2, 3])
s2 = pl.Series("b", [4.0, 5.0, 6.0])

for dot_result in (
    s1.dot(s2),                # << existing
    s1 @ s2,                   # << new
    s1 @ [4, 5, 6],            # << new
    s1 @ np.array([4, 5, 6]),  # << new
):
    assert dot_result == 32

Reference: (numpy behaviour)

np.array([1, 2, 3]) @ np.array([4, 5, 6]) 
# 32
np.array([1, 2, 3]) @ [4, 5, 6]
# 32
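As a rough illustration of how `@` support with implicit coercion can be wired up in Python, here is a minimal sketch using the `__matmul__` and `__rmatmul__` hooks. The `Vec` class and its `_coerce` helper are hypothetical and are not the Polars implementation; this version only coerces plain 1D sequences, whereas the PR also accepts numpy arrays.

```python
from collections.abc import Sequence


class Vec:
    """Hypothetical 1D container used for illustration; NOT polars.Series."""

    def __init__(self, values):
        self.values = list(values)

    @staticmethod
    def _coerce(other):
        # Accept another Vec or a flat list/tuple; return None for anything else.
        if isinstance(other, Vec):
            return other.values
        if isinstance(other, Sequence) and not isinstance(other, (str, bytes)):
            return list(other)
        return None

    def dot(self, other):
        rhs = self._coerce(other)
        if rhs is None:
            raise TypeError(f"cannot compute dot product with {type(other)}")
        if len(rhs) != len(self.values):
            raise ValueError("operands must have the same length")
        return sum(a * b for a, b in zip(self.values, rhs))

    def __matmul__(self, other):
        # Returning NotImplemented lets Python try the other operand's hook.
        if self._coerce(other) is None:
            return NotImplemented
        return self.dot(other)

    def __rmatmul__(self, other):
        # Called for `[4, 5, 6] @ vec`; the 1D dot product is commutative.
        return self.__matmul__(other)


v = Vec([1, 2, 3])
assert v @ Vec([4.0, 5.0, 6.0]) == 32
assert v @ [4, 5, 6] == 32
assert [4, 5, 6] @ v == 32
```

Returning `NotImplemented` for unsupported operands, rather than raising directly, leaves Python free to raise its standard `TypeError` for the `@` operator.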

Also:

Slightly more informative error messages on failed Series/DataFrame init, for example:

  • Before: "DataFrame constructor not called properly."
  • After: f"DataFrame constructor called with unsupported type; got {type(data)}"
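The pattern behind the new message can be sketched as below. The `init_frame` function, its supported input types, and the choice of `ValueError` are assumptions made for illustration; only the message text comes from this PR.

```python
def init_frame(data):
    """Hypothetical constructor dispatch; not the Polars code path."""
    if isinstance(data, dict):
        # Mapping of column name -> values.
        return {name: list(values) for name, values in data.items()}
    if isinstance(data, (list, tuple)):
        # A single unnamed column; "column_0" is an illustrative default name.
        return {"column_0": list(data)}
    # Including the offending type tells the caller what was actually passed.
    raise ValueError(
        f"DataFrame constructor called with unsupported type; got {type(data)}"
    )


try:
    init_frame(42)
except ValueError as exc:
    message = str(exc)
    assert "<class 'int'>" in message
```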

I admit to being tempted to try proper 2D matrix multiplication support without taking on additional dependencies, but getting that fast is non-trivial! (There is a reason BLAS exists :). So, for now, just adding @ support for Series.

github-actions bot added labels on Nov 4, 2022: enhancement (New feature or an improvement of an existing feature), python (Related to Python Polars)
@alexander-beedie
Collaborator Author

alexander-beedie commented Nov 4, 2022

Odd CI error at the moment; will come back to this tomorrow (everything else looks good).

package ‘data.table’ is not available (for R version 3.5.3) 
Error: Error in library(data.table) : there is no package called ‘data.table’

@ritchie46
Member

> Odd CI error at the moment; will come back to this tomorrow (everything else looks good).

Yes, I think our benchmark script is dead. :(

> I admit to being tempted to try proper 2D matrix multiplication support without taking on additional dependencies, but getting that fast is non-trivial! (There is a reason BLAS exists :).

Yes, this is something we should do in parallel on the Rust side. I certainly don't think we should try to beat BLAS. But we must be cheaper than copying to numpy + BLAS + copying back.

That sounds more doable. :)

@ritchie46 ritchie46 merged commit 903c7fb into pola-rs:master Nov 5, 2022
@alexander-beedie
Collaborator Author

alexander-beedie commented Nov 5, 2022

> That sounds more doable. :)

Heh; I knocked up a trivial baseline out of morbid curiosity to see how far away the naive implementation is from the optimised one - only 250x slower, haha... "The only way forward is up..." :)
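For readers wondering what a naive 2D baseline might look like, here is a minimal pure-Python sketch of textbook triple-loop matrix multiplication. It is not the baseline used for the comparison above (that code is not shown in this thread), and `naive_matmul` is a hypothetical name.

```python
def naive_matmul(a, b):
    """Multiply two row-major nested lists with the textbook O(n*m*p) loops."""
    n_inner = len(b)
    n_cols = len(b[0])
    if any(len(row) != n_inner for row in a):
        raise ValueError("inner dimensions do not match")
    return [
        [sum(row[k] * b[k][j] for k in range(n_inner)) for j in range(n_cols)]
        for row in a
    ]


product = naive_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])
assert product == [[19, 22], [43, 50]]
```

A loop like this has no blocking, vectorisation, or parallelism, which are the main things an optimised BLAS routine adds.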

@alexander-beedie alexander-beedie deleted the series-dot-product branch November 5, 2022 17:58
zundertj pushed a commit to zundertj/polars that referenced this pull request Jan 7, 2023