docs(python): Fix incorrect "coming from pandas" syntax #13767

Merged: 1 commit, Jan 16, 2024
42 changes: 14 additions & 28 deletions docs/user-guide/migration/pandas.md
@@ -252,8 +252,7 @@ and then joins the result back to the original `DataFrame` producing:
In Polars the same can be achieved with `window` functions:

```python
-df.select(
-    pl.all(),
+df.with_columns(
    pl.col("type").count().over("c").alias("size")
)
```
@@ -266,17 +265,11 @@ shape: (7, 3)
│ i64 ┆ str ┆ u32 │
╞═════╪══════╪══════╡
│ 1 ┆ m ┆ 3 │
├╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┤
│ 1 ┆ n ┆ 3 │
├╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┤
│ 1 ┆ o ┆ 3 │
├╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┤
│ 2 ┆ m ┆ 4 │
├╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┤
│ 2 ┆ m ┆ 4 │
├╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┤
│ 2 ┆ n ┆ 4 │
├╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┤
│ 2 ┆ n ┆ 4 │
└─────┴──────┴──────┘
```
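
To make the window semantics concrete, here is a minimal plain-Python sketch (deliberately not Polars code) of what `pl.col("type").count().over("c")` computes: the number of non-null `type` values in each `c` group, broadcast back onto every row of that group. The column values are read off the output table above; the variable names are illustrative only.

```python
from collections import Counter

# Column values taken from the output table in the diff.
c = [1, 1, 1, 2, 2, 2, 2]
type_ = ["m", "n", "o", "m", "m", "n", "n"]

# Count the non-null `type` values per `c` group (none are null here) ...
group_sizes = Counter(key for key, value in zip(c, type_) if value is not None)

# ... then broadcast each group's count back onto its rows. Keeping one
# output row per input row is what distinguishes a window from a group-by.
size = [group_sizes[key] for key in c]

print(size)  # [3, 3, 3, 4, 4, 4, 4]
```

Because the result has one value per input row, it can be appended as a new column, which is why `with_columns` is enough and `select(pl.all(), ...)` is not needed.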
@@ -285,15 +278,14 @@ Because we can store the whole operation in a single expression, we can combine
`window` functions and even combine different groups!

Polars will cache window expressions that are applied over the same group, so storing
-them in a single `select` is both convenient **and** optimal. In the following example
+them in a single `with_columns` is both convenient **and** optimal. In the following example
we look at a case where we are calculating group statistics over `"c"` twice:

```python
-df.select(
-    pl.all(),
+df.with_columns(
    pl.col("c").count().over("c").alias("size"),
    pl.col("c").sum().over("type").alias("sum"),
-    pl.col("c").reverse().over("c").flatten().alias("reverse_type")
+    pl.col("type").reverse().over("c").alias("reverse_type")
Review comment from the PR author on lines -296 to +288:

pretty sure this was a typo:

  • the original expression was aliased to "reverse_type", but never used type
  • reversing 'c' over 'c' is a no-op...

I've also removed .flatten as it doesn't do anything here

)
```
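
Both points in the review comment can be checked with a small plain-Python emulation of a per-group reverse. This is a sketch of the semantics only, not how Polars implements `over`. `reverse_over` is a hypothetical helper, and the data is taken from the tables in this diff.

```python
from collections import defaultdict


def reverse_over(values, keys):
    """Reverse `values` within each group of `keys`, keeping row positions."""
    # Collect the row positions that belong to each group.
    positions = defaultdict(list)
    for index, key in enumerate(keys):
        positions[key].append(index)

    out = list(values)
    for group in positions.values():
        # Write the group's values back in reverse order.
        for target, source in zip(group, reversed(group)):
            out[target] = values[source]
    return out


c = [1, 1, 1, 2, 2, 2, 2]
type_ = ["m", "n", "o", "m", "m", "n", "n"]

# Reversing `type` within each `c` group gives the new `reverse_type` column.
print(reverse_over(type_, c))  # ['o', 'n', 'm', 'n', 'n', 'm', 'm']

# Reversing `c` within its own groups changes nothing: every value in a
# group equals the group key, so the old expression was a no-op.
print(reverse_over(c, c) == c)  # True
```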

@@ -302,21 +294,15 @@ shape: (7, 5)
┌─────┬──────┬──────┬─────┬──────────────┐
│ c ┆ type ┆ size ┆ sum ┆ reverse_type │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ u32 ┆ i64 ┆ i64
│ i64 ┆ str ┆ u32 ┆ i64 ┆ str
╞═════╪══════╪══════╪═════╪══════════════╡
│ 1 ┆ m ┆ 3 ┆ 5 ┆ 2 │
├╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 1 ┆ n ┆ 3 ┆ 5 ┆ 2 │
├╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 1 ┆ o ┆ 3 ┆ 1 ┆ 2 │
├╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2 ┆ m ┆ 4 ┆ 5 ┆ 2 │
├╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2 ┆ m ┆ 4 ┆ 5 ┆ 1 │
├╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2 ┆ n ┆ 4 ┆ 5 ┆ 1 │
├╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2 ┆ n ┆ 4 ┆ 5 ┆ 1 │
│ 1 ┆ m ┆ 3 ┆ 5 ┆ o │
│ 1 ┆ n ┆ 3 ┆ 5 ┆ n │
│ 1 ┆ o ┆ 3 ┆ 1 ┆ m │
│ 2 ┆ m ┆ 4 ┆ 5 ┆ n │
│ 2 ┆ m ┆ 4 ┆ 5 ┆ n │
│ 2 ┆ n ┆ 4 ┆ 5 ┆ m │
│ 2 ┆ n ┆ 4 ┆ 5 ┆ m │
└─────┴──────┴──────┴─────┴──────────────┘
```

@@ -355,7 +341,7 @@ def add_ham(df: pd.DataFrame) -> pd.DataFrame:
.pipe(add_foo)
.pipe(add_bar)
.pipe(add_ham)
-)
+)
```

If we do this in polars, we would create 3 `with_column` contexts, that forces Polars to run the 3 pipes sequentially,
@@ -407,7 +393,7 @@ def get_ham(input_column: str) -> pl.Expr:
return pl.col(input_column).some_computation().alias("ham")

# Use pipe (just once) to get hold of the schema of the LazyFrame.
-lf.pipe(lambda lf.with_columns(
+lf.pipe(lambda lf: lf.with_columns(
    get_ham("col_a"),
    get_bar("col_b", lf.schema),
    get_foo("col_c", lf.schema),
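
The last hunk fixes a lambda that was missing its parameter separator (`lambda lf.with_columns(` instead of `lambda lf: lf.with_columns(`). One quick way to confirm that the old line could never have run is to hand both versions to the standard library's `ast.parse`. The snippets below are trimmed versions of the two diff lines, and `parses` is a hypothetical helper introduced for this sketch.

```python
import ast

# Trimmed versions of the removed and added lines from the hunk above.
broken = 'lf.pipe(lambda lf.with_columns(get_ham("col_a")))'
fixed = 'lf.pipe(lambda lf: lf.with_columns(get_ham("col_a")))'


def parses(source: str) -> bool:
    """Return True if `source` is syntactically valid Python."""
    try:
        # Parsing only checks syntax; `lf` and `get_ham` need not exist.
        ast.parse(source)
    except SyntaxError:
        return False
    return True


print(parses(broken))  # False
print(parses(fixed))   # True
```

Since parsing does not evaluate names, this check says nothing about whether `get_ham` or the schema lookups behave as intended, only that the corrected snippet is valid syntax.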