Feature request: Faster backward- and forward_fill() functions #16875

Open
Chuck321123 opened this issue Jun 11, 2024 · 4 comments
Labels
enhancement New feature or an improvement of an existing feature performance Performance issues or improvements

Comments

@Chuck321123

Description

This may not be the highest priority right now, but I would be happy if we got faster backward_fill() and forward_fill() functions, as I think there is more optimization potential in these functions. Running this code:

import pandas as pd
import numpy as np
import polars as pl

np.random.seed(123)

n_rows = 100_000_000

random_numbers = np.random.rand(n_rows)
nan_mask = np.random.rand(n_rows) < 0.5
random_numbers[nan_mask] = np.nan

# Create DataFrame
df = pd.DataFrame({
    'RandomNumbers': random_numbers
})

print(df.head(10))

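# Convert the pandas DataFrame to a Polars DataFrame
df = pl.DataFrame(df)

# Replace NaN with null in a new "Results" column
df = df.with_columns(pl.col("RandomNumbers").fill_nan(None).alias("Results"))

# Time the backward fill (%timeit is an IPython/Jupyter magic)
%timeit df.with_columns(pl.col("RandomNumbers").backward_fill().alias("Results"))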

I get this benchmark: 1.17 s ± 39.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each). Obviously, it becomes even slower if you forward and backward fill over groups. It would be nice if someone could find a way to improve these functions.

@Chuck321123 Chuck321123 added the enhancement New feature or an improvement of an existing feature label Jun 11, 2024
@cmdlineluser
Contributor

Just for reference: #15480 (comment)

More improvement with branchless filling is possible still but low priority at the moment, as it's rather labour-intensive to write.
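
To illustrate what filling without a per-element branch can look like, here is a minimal NumPy sketch that forward-fills NaN values by taking a running maximum over the indices of valid elements. It is only an illustration of the general idea, not the approach used in Polars or in the linked comment; the function names are hypothetical, and it treats NaN as the missing marker rather than Polars nulls.

```python
import numpy as np


def forward_fill_nan(values: np.ndarray) -> np.ndarray:
    """Forward-fill NaN entries of a 1-D float array (hypothetical helper)."""
    valid = ~np.isnan(values)
    # Keep the position of each valid element, use 0 for missing ones.
    idx = np.where(valid, np.arange(values.size), 0)
    # The running maximum is the position of the most recent valid element.
    idx = np.maximum.accumulate(idx)
    # Gather: every slot reads from its most recent valid position.
    return values[idx]


def backward_fill_nan(values: np.ndarray) -> np.ndarray:
    """Backward fill is a forward fill on the reversed array."""
    return forward_fill_nan(values[::-1])[::-1]


data = np.array([np.nan, 1.0, np.nan, np.nan, 4.0, np.nan])
filled_forward = forward_fill_nan(data)
filled_backward = backward_fill_nan(data)
print(filled_forward)
print(filled_backward)
```

Leading NaNs in the forward fill (and trailing NaNs in the backward fill) have no valid value to copy, so they remain NaN.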

@deanm0000
Collaborator

did you mean for this to be a future request instead of a feature request?

@Chuck321123
Author

Chuck321123 commented Jun 11, 2024

@deanm0000 My bad for the misspelling. In reality, it's an optimization request.

@Chuck321123 Chuck321123 changed the title Future request: Faster backward- and forward_fill() functions Feature request: Faster backward- and forward_fill() functions Jun 11, 2024
@alexander-beedie alexander-beedie added the performance Performance issues or improvements label Jun 12, 2024
@lukemanley
Contributor

It looks like #20689 probably helps here? Maybe closes this issue?

Note comment: #20669 (comment)


5 participants