docs: Improve streaming section of the user guide #13750

Merged
merged 3 commits into from
Jan 16, 2024
2 changes: 2 additions & 0 deletions docs/_build/API_REFERENCE_LINKS.yml
@@ -169,6 +169,8 @@ rust:
concat: https://docs.pola.rs/docs/rust/dev/polars_lazy/dsl/functions/fn.concat.html
SQLContext: https://docs.pola.rs/py-polars/html/reference/sql.html

explain: https://docs.rs/polars/latest/polars/prelude/struct.LazyFrame.html#method.explain

operators: https://docs.pola.rs/docs/rust/dev/polars_lazy/dsl/enum.Operator.html

Array: https://docs.pola.rs/docs/rust/dev/polars/datatypes/enum.DataType.html#variant.Array
20 changes: 17 additions & 3 deletions docs/src/python/user-guide/concepts/streaming.py
@@ -1,12 +1,26 @@
# --8<-- [start:import]
import polars as pl
# --8<-- [end:import]

# --8<-- [start:streaming]
-q = (
+q1 = (
pl.scan_csv("docs/data/iris.csv")
.filter(pl.col("sepal_length") > 5)
.group_by("species")
.agg(pl.col("sepal_width").mean())
)

-df = q.collect(streaming=True)
+df = q1.collect(streaming=True)
# --8<-- [end:streaming]

# --8<-- [start:example]
print(q1.explain(streaming=True))

# --8<-- [end:example]

# --8<-- [start:example2]
q2 = pl.scan_csv("docs/data/iris.csv").with_columns(
pl.col("sepal_length").mean().over("species")
)

print(q2.explain(streaming=True))
# --8<-- [end:example2]
21 changes: 19 additions & 2 deletions docs/src/rust/user-guide/concepts/streaming.rs
@@ -2,16 +2,33 @@ use polars::prelude::*;

fn main() -> Result<(), Box<dyn std::error::Error>> {
// --8<-- [start:streaming]
-let q = LazyCsvReader::new("docs/data/iris.csv")
+let q1 = LazyCsvReader::new("docs/data/iris.csv")
.has_header(true)
.finish()?
.filter(col("sepal_length").gt(lit(5)))
.group_by(vec![col("species")])
.agg([col("sepal_width").mean()]);

-let df = q.with_streaming(true).collect()?;
+let df = q1.clone().with_streaming(true).collect()?;
println!("{}", df);
// --8<-- [end:streaming]

// --8<-- [start:example]
let query_plan = q1.with_streaming(true).explain(true)?;
println!("{}", query_plan);
// --8<-- [end:example]

// --8<-- [start:example2]
let q2 = LazyCsvReader::new("docs/data/iris.csv")
.finish()?
.with_columns(vec![col("sepal_length")
.mean()
.over(vec![col("species")])
.alias("sepal_length_mean")]);

let query_plan = q2.with_streaming(true).explain(true)?;
println!("{}", query_plan);
// --8<-- [end:example2]

Ok(())
}
24 changes: 24 additions & 0 deletions docs/user-guide/concepts/streaming.md
@@ -16,6 +16,30 @@ Streaming is supported for many operations including:
- `with_columns`, `select`
- `group_by`
- `join`
- `unique`
- `sort`
- `explode`, `melt`
- `scan_csv`, `scan_parquet`, `scan_ipc`

This list is not exhaustive: Polars is under active development, and support for more operations may be added without explicit notice.

### Example with supported operations

To determine which parts of your query are streamed, call the `explain` method with `streaming=True`. The example below shows how to inspect the resulting query plan. For more information about query plans, see the [Lazy API](https://docs.pola.rs/user-guide/lazy/query-plan/) chapter.

{{code_block('user-guide/concepts/streaming', 'example',['explain'])}}

```python exec="on" result="text" session="user-guide/streaming"
--8<-- "python/user-guide/concepts/streaming.py:import"
--8<-- "python/user-guide/concepts/streaming.py:streaming"
--8<-- "python/user-guide/concepts/streaming.py:example"
```

### Example with non-streaming operations

{{code_block('user-guide/concepts/streaming', 'example2',['explain'])}}

```python exec="on" result="text" session="user-guide/streaming"
--8<-- "python/user-guide/concepts/streaming.py:import"
--8<-- "python/user-guide/concepts/streaming.py:example2"
```