docs: Improve structure of user guide #13951

Merged
merged 15 commits into from
Jan 24, 2024
4 changes: 2 additions & 2 deletions docs/_build/overrides/404.html
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
{% extends "main.html" %}
{% extends "base.html" %}
{% block content %}
<div>
<svg version="1.1" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink"
@@ -217,6 +217,6 @@ <h2>404 - You're lost.</h2>
How you got here is a mystery. But you can click the button below
to go back to the homepage or use the search bar in the navigation menu to find what you are looking for.
</p>
<a class="md-button" href="/polars">Home</a>
<a class="md-button" href="https://docs.pola.rs">Home</a>
</div>
{% endblock %}
2 changes: 1 addition & 1 deletion docs/api/index.md
@@ -11,7 +11,7 @@ It's the best place to look if you need information on a specific function.
## Python

The Python API reference is built using Sphinx.
It's available on [GitHub Pages](https://docs.pola.rs/py-polars/html/reference/index.html).
It's available in [our docs](https://docs.pola.rs/py-polars/html/reference/index.html).

## Rust

47 changes: 28 additions & 19 deletions docs/index.md
@@ -1,19 +1,12 @@
---
hide:
- navigation
---

# Polars

![logo](https://raw.githubusercontent.com/pola-rs/polars-static/master/logos/polars_github_logo_rect_dark_name.svg)

<h1 style="text-align:center">Blazingly Fast DataFrame Library </h1>
<div align="center">
<a href="https://docs.rs/polars/latest/polars/">
<img src="https://docs.rs/polars/badge.svg" alt="rust docs"/>
<img src="https://docs.rs/polars/badge.svg" alt="Rust docs latest"/>
</a>
<a href="https://crates.io/crates/polars">
<img src="https://img.shields.io/crates/v/polars.svg"/>
<img src="https://img.shields.io/crates/v/polars.svg" alt="Rust crates Latest Release"/>
</a>
<a href="https://pypi.org/project/polars/">
<img src="https://img.shields.io/pypi/v/polars.svg" alt="PyPI Latest Release"/>
@@ -23,26 +16,42 @@ hide:
</a>
</div>

Polars is a highly performant DataFrame library for manipulating structured data. The core is written in Rust, but the library is also available in Python. Its key features are:
Polars is a blazingly fast DataFrame library for manipulating structured data. The core is written in Rust, and available for Python, R and NodeJS.

- **Fast**: Polars is written from the ground up, designed close to the machine and without external dependencies.
## Key features

- **Fast**: Written from scratch in Rust, designed close to the machine and without external dependencies.
- **I/O**: First class support for all common data storage layers: local, cloud storage & databases.
- **Easy to use**: Write your queries the way they were intended. Polars, internally, will determine the most efficient way to execute using its query optimizer.
- **Out of Core**: Polars supports out of core data transformation with its streaming API. Allowing you to process your results without requiring all your data to be in memory at the same time
- **Parallel**: Polars fully utilises the power of your machine by dividing the workload among the available CPU cores without any additional configuration.
- **Vectorized Query Engine**: Polars uses [Apache Arrow](https://arrow.apache.org/), a columnar data format, to process your queries in a vectorized manner. It uses [SIMD](https://en.wikipedia.org/wiki/Single_instruction,_multiple_data) to optimize CPU usage.
- **Intuitive API**: Write your queries the way they were intended. Polars, internally, will determine the most efficient way to execute using its query optimizer.
- **Out of Core**: The streaming API allows you to process your results without requiring all your data to be in memory at the same time
- **Parallel**: Utilises the power of your machine by dividing the workload among the available CPU cores without any additional configuration.
- **Vectorized Query Engine**: Using [Apache Arrow](https://arrow.apache.org/), a columnar data format, to process your queries in a vectorized manner and SIMD to optimize CPU usage.

<!-- dprint-ignore-start -->

## Performance :rocket: :rocket:
!!! info "Users new to DataFrames"
A DataFrame is a 2-dimensional data structure that is useful for data manipulation and analysis. With labeled axes for rows and columns, each column can contain different data types, making complex data operations such as merging and aggregation much easier. Due to their flexibility and intuitive way of storing and working with data, DataFrames have become increasingly popular in modern data analytics and engineering.

Polars is very fast, and in fact is one of the best performing solutions available.
See the results in h2oai's [db-benchmark](https://duckdblabs.github.io/db-benchmark/), revived by the DuckDB project.
<!-- dprint-ignore-end -->

Polars [TPC-H Benchmark results](https://www.pola.rs/benchmarks.html) are now available on the official website.
## Philosophy

The goal of Polars is to provide a lightning fast DataFrame library that:

- Utilizes all available cores on your machine.
- Optimizes queries to reduce unneeded work/memory allocations.
- Handles datasets much larger than your available RAM.
- A consistent and predictable API.
- Adheres to a strict schema (data-types should be known before running the query).

Polars is written in Rust which gives it C/C++ performance and allows it to fully control performance critical parts in a query engine.

## Example

{{code_block('home/example','example',['scan_csv','filter','group_by','collect'])}}

A more extensive introduction can be found in the [next chapter](user-guide/getting-started.md).

## Community

Polars has a very active community with frequent releases (approximately weekly). Below are some of the top contributors to the project:
1 change: 1 addition & 0 deletions docs/requirements.txt
@@ -5,6 +5,7 @@ matplotlib

mkdocs-material==9.5.2
mkdocs-macros-plugin==1.0.4
mkdocs-redirects==1.2.1
material-plausible-plugin==0.2.0
markdown-exec[ansi]==1.7.0
PyGithub==2.1.1
23 changes: 10 additions & 13 deletions docs/src/python/user-guide/basics/expressions.py
@@ -6,19 +6,16 @@

df = pl.DataFrame(
{
"a": range(8),
"b": np.random.rand(8),
"a": range(5),
"b": np.random.rand(5),
"c": [
datetime(2022, 12, 1),
datetime(2022, 12, 2),
datetime(2022, 12, 3),
datetime(2022, 12, 4),
datetime(2022, 12, 5),
datetime(2022, 12, 6),
datetime(2022, 12, 7),
datetime(2022, 12, 8),
datetime(2025, 12, 1),
datetime(2025, 12, 2),
datetime(2025, 12, 3),
datetime(2025, 12, 4),
datetime(2025, 12, 5),
],
"d": [1, 2.0, float("nan"), float("nan"), 0, -5, -42, None],
"d": [1, 2.0, float("nan"), -42, None],
}
)
# --8<-- [end:setup]
@@ -36,12 +33,12 @@
# --8<-- [end:select3]

# --8<-- [start:exclude]
df.select(pl.exclude("a"))
df.select(pl.exclude(["a", "c"]))
# --8<-- [end:exclude]

# --8<-- [start:filter]
df.filter(
pl.col("c").is_between(datetime(2022, 12, 2), datetime(2022, 12, 8)),
pl.col("c").is_between(datetime(2025, 12, 2), datetime(2025, 12, 3)),
)
# --8<-- [end:filter]

7 changes: 4 additions & 3 deletions docs/src/python/user-guide/basics/reading-writing.py
@@ -6,11 +6,12 @@
{
"integer": [1, 2, 3],
"date": [
datetime(2022, 1, 1),
datetime(2022, 1, 2),
datetime(2022, 1, 3),
datetime(2025, 1, 1),
datetime(2025, 1, 2),
datetime(2025, 1, 3),
],
"float": [4.0, 5.0, 6.0],
"string": ["a", "b", "c"],
}
)

25 changes: 11 additions & 14 deletions docs/src/rust/user-guide/basics/expressions.rs
@@ -6,19 +6,16 @@ fn main() -> Result<(), Box<dyn std::error::Error>> {
let mut rng = rand::thread_rng();

let df: DataFrame = df!(
"a" => 0..8,
"b"=> (0..8).map(|_| rng.gen::<f64>()).collect::<Vec<f64>>(),
"a" => 0..5,
"b"=> (0..5).map(|_| rng.gen::<f64>()).collect::<Vec<f64>>(),
"c"=> [
NaiveDate::from_ymd_opt(2022, 12, 1).unwrap().and_hms_opt(0, 0, 0).unwrap(),
NaiveDate::from_ymd_opt(2022, 12, 2).unwrap().and_hms_opt(0, 0, 0).unwrap(),
NaiveDate::from_ymd_opt(2022, 12, 3).unwrap().and_hms_opt(0, 0, 0).unwrap(),
NaiveDate::from_ymd_opt(2022, 12, 4).unwrap().and_hms_opt(0, 0, 0).unwrap(),
NaiveDate::from_ymd_opt(2022, 12, 5).unwrap().and_hms_opt(0, 0, 0).unwrap(),
NaiveDate::from_ymd_opt(2022, 12, 6).unwrap().and_hms_opt(0, 0, 0).unwrap(),
NaiveDate::from_ymd_opt(2022, 12, 7).unwrap().and_hms_opt(0, 0, 0).unwrap(),
NaiveDate::from_ymd_opt(2022, 12, 8).unwrap().and_hms_opt(0, 0, 0).unwrap(),
NaiveDate::from_ymd_opt(2025, 12, 1).unwrap().and_hms_opt(0, 0, 0).unwrap(),
NaiveDate::from_ymd_opt(2025, 12, 2).unwrap().and_hms_opt(0, 0, 0).unwrap(),
NaiveDate::from_ymd_opt(2025, 12, 3).unwrap().and_hms_opt(0, 0, 0).unwrap(),
NaiveDate::from_ymd_opt(2025, 12, 4).unwrap().and_hms_opt(0, 0, 0).unwrap(),
NaiveDate::from_ymd_opt(2025, 12, 5).unwrap().and_hms_opt(0, 0, 0).unwrap(),
],
"d"=> [Some(1.0), Some(2.0), None, None, Some(0.0), Some(-5.0), Some(-42.), None]
"d"=> [Some(1.0), Some(2.0), None, Some(-42.), None]
)
.unwrap();

@@ -46,17 +43,17 @@ fn main() -> Result<(), Box<dyn std::error::Error>> {
let out = df
.clone()
.lazy()
.select([col("*").exclude(["a"])])
.select([col("*").exclude(["a", "c"])])
.collect()?;
println!("{}", out);
// --8<-- [end:exclude]

// --8<-- [start:filter]
let start_date = NaiveDate::from_ymd_opt(2022, 12, 2)
let start_date = NaiveDate::from_ymd_opt(2025, 12, 2)
.unwrap()
.and_hms_opt(0, 0, 0)
.unwrap();
let end_date = NaiveDate::from_ymd_opt(2022, 12, 8)
let end_date = NaiveDate::from_ymd_opt(2025, 12, 3)
.unwrap()
.and_hms_opt(0, 0, 0)
.unwrap();
6 changes: 3 additions & 3 deletions docs/src/rust/user-guide/basics/reading-writing.rs
@@ -9,9 +9,9 @@ fn main() -> Result<(), Box<dyn std::error::Error>> {
let mut df: DataFrame = df!(
"integer" => &[1, 2, 3],
"date" => &[
NaiveDate::from_ymd_opt(2022, 1, 1).unwrap().and_hms_opt(0, 0, 0).unwrap(),
NaiveDate::from_ymd_opt(2022, 1, 2).unwrap().and_hms_opt(0, 0, 0).unwrap(),
NaiveDate::from_ymd_opt(2022, 1, 3).unwrap().and_hms_opt(0, 0, 0).unwrap(),
NaiveDate::from_ymd_opt(2025, 1, 1).unwrap().and_hms_opt(0, 0, 0).unwrap(),
NaiveDate::from_ymd_opt(2025, 1, 2).unwrap().and_hms_opt(0, 0, 0).unwrap(),
NaiveDate::from_ymd_opt(2025, 1, 3).unwrap().and_hms_opt(0, 0, 0).unwrap(),
],
"float" => &[4.0, 5.0, 6.0]
)
130 changes: 0 additions & 130 deletions docs/user-guide/basics/expressions.md

This file was deleted.

18 changes: 0 additions & 18 deletions docs/user-guide/basics/index.md

This file was deleted.
