pola-rs · c-peters · Jan 24, 2024 · Jan 11, 2024 · Jan 11, 2024 · Jan 11, 2024
@@ -1,4 +1,4 @@
-{% extends "main.html" %}
+{% extends "base.html" %}
 {% block content %}
 <div>
    <svg version="1.1" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink"
@@ -217,6 +217,6 @@ <h2>404 - You're lost.</h2>
       How you got here is a mystery. But you can click the button below
       to go back to the homepage or use the search bar in the navigation menu to find what you are looking for.
    </p>
-   <a class="md-button" href="/polars">Home</a>
+   <a class="md-button" href="https://docs.pola.rs">Home</a>
 </div>
 {% endblock %}
@@ -11,7 +11,7 @@ It's the best place to look if you need information on a specific function.
 ## Python
 
 The Python API reference is built using Sphinx.
-It's available on [GitHub Pages](https://docs.pola.rs/py-polars/html/reference/index.html).
+It's available in [our docs](https://docs.pola.rs/py-polars/html/reference/index.html).
 
 ## Rust
 

@@ -1,19 +1,12 @@
----
-hide:
-  - navigation
----
-
-# Polars
-
 ![logo](https://raw.githubusercontent.com/pola-rs/polars-static/master/logos/polars_github_logo_rect_dark_name.svg)
 
 <h1 style="text-align:center">Blazingly Fast DataFrame Library </h1>
 <div align="center">
   <a href="https://docs.rs/polars/latest/polars/">
-    <img src="https://docs.rs/polars/badge.svg" alt="rust docs"/>
+    <img src="https://docs.rs/polars/badge.svg" alt="Rust docs latest"/>
   </a>
   <a href="https://crates.io/crates/polars">
-    <img src="https://img.shields.io/crates/v/polars.svg"/>
+    <img src="https://img.shields.io/crates/v/polars.svg" alt="Rust crates Latest Release"/>
   </a>
   <a href="https://pypi.org/project/polars/">
     <img src="https://img.shields.io/pypi/v/polars.svg" alt="PyPI Latest Release"/>
@@ -23,26 +16,42 @@ hide:
   </a>
 </div>
 
-Polars is a highly performant DataFrame library for manipulating structured data. The core is written in Rust, but the library is also available in Python. Its key features are:
+Polars is a blazingly fast DataFrame library for manipulating structured data. The core is written in Rust, and the library is available for Python, R and NodeJS.
 
-- **Fast**: Polars is written from the ground up, designed close to the machine and without external dependencies.
+## Key features
+
+- **Fast**: Written from scratch in Rust, designed close to the machine and without external dependencies.
 - **I/O**: First class support for all common data storage layers: local, cloud storage & databases.
-- **Easy to use**: Write your queries the way they were intended. Polars, internally, will determine the most efficient way to execute using its query optimizer.
-- **Out of Core**: Polars supports out of core data transformation with its streaming API. Allowing you to process your results without requiring all your data to be in memory at the same time
-- **Parallel**: Polars fully utilises the power of your machine by dividing the workload among the available CPU cores without any additional configuration.
-- **Vectorized Query Engine**: Polars uses [Apache Arrow](https://arrow.apache.org/), a columnar data format, to process your queries in a vectorized manner. It uses [SIMD](https://en.wikipedia.org/wiki/Single_instruction,_multiple_data) to optimize CPU usage.
+- **Intuitive API**: Write your queries the way they were intended. Internally, Polars determines the most efficient way to execute them using its query optimizer.
+- **Out of Core**: The streaming API allows you to process your results without requiring all your data to be in memory at the same time.
+- **Parallel**: Utilises the power of your machine by dividing the workload among the available CPU cores without any additional configuration.
+- **Vectorized Query Engine**: Uses [Apache Arrow](https://arrow.apache.org/), a columnar data format, to process your queries in a vectorized manner and SIMD to optimize CPU usage.
+
+<!-- dprint-ignore-start -->
 
-## Performance :rocket: :rocket:
+!!! info "Users new to DataFrames"
+    A DataFrame is a 2-dimensional data structure that is useful for data manipulation and analysis. With labeled axes for rows and columns, each column can contain different data types, making complex data operations such as merging and aggregation much easier. Due to their flexibility and intuitive way of storing and working with data, DataFrames have become increasingly popular in modern data analytics and engineering.
 
-Polars is very fast, and in fact is one of the best performing solutions available.
-See the results in h2oai's [db-benchmark](https://duckdblabs.github.io/db-benchmark/), revived by the DuckDB project.
+<!-- dprint-ignore-end -->
 
-Polars [TPC-H Benchmark results](https://www.pola.rs/benchmarks.html) are now available on the official website.
+## Philosophy
+
+The goal of Polars is to provide a lightning fast DataFrame library that:
+
+- Utilizes all available cores on your machine.
+- Optimizes queries to reduce unneeded work/memory allocations.
+- Handles datasets much larger than your available RAM.
+- Has a consistent and predictable API.
+- Adheres to a strict schema (data-types should be known before running the query).
+
+Polars is written in Rust, which gives it C/C++ performance and allows it to fully control the performance-critical parts of a query engine.
 
 ## Example
 
 {{code_block('home/example','example',['scan_csv','filter','group_by','collect'])}}
 
+A more extensive introduction can be found in the [next chapter](user-guide/getting-started.md).
+
 ## Community
 
 Polars has a very active community with frequent releases (approximately weekly). Below are some of the top contributors to the project:

@@ -5,6 +5,7 @@ matplotlib
 
 mkdocs-material==9.5.2
 mkdocs-macros-plugin==1.0.4
+mkdocs-redirects==1.2.1
 material-plausible-plugin==0.2.0
 markdown-exec[ansi]==1.7.0
 PyGithub==2.1.1
@@ -6,19 +6,16 @@
 
 df = pl.DataFrame(
     {
-        "a": range(8),
-        "b": np.random.rand(8),
+        "a": range(5),
+        "b": np.random.rand(5),
         "c": [
-            datetime(2022, 12, 1),
-            datetime(2022, 12, 2),
-            datetime(2022, 12, 3),
-            datetime(2022, 12, 4),
-            datetime(2022, 12, 5),
-            datetime(2022, 12, 6),
-            datetime(2022, 12, 7),
-            datetime(2022, 12, 8),
+            datetime(2025, 12, 1),
+            datetime(2025, 12, 2),
+            datetime(2025, 12, 3),
+            datetime(2025, 12, 4),
+            datetime(2025, 12, 5),
         ],
-        "d": [1, 2.0, float("nan"), float("nan"), 0, -5, -42, None],
+        "d": [1, 2.0, float("nan"), -42, None],
     }
 )
 # --8<-- [end:setup]
@@ -36,12 +33,12 @@
 # --8<-- [end:select3]
 
 # --8<-- [start:exclude]
-df.select(pl.exclude("a"))
+df.select(pl.exclude(["a", "c"]))
 # --8<-- [end:exclude]
 
 # --8<-- [start:filter]
 df.filter(
-    pl.col("c").is_between(datetime(2022, 12, 2), datetime(2022, 12, 8)),
+    pl.col("c").is_between(datetime(2025, 12, 2), datetime(2025, 12, 3)),
 )
 # --8<-- [end:filter]
 

@@ -6,11 +6,12 @@
     {
         "integer": [1, 2, 3],
         "date": [
-            datetime(2022, 1, 1),
-            datetime(2022, 1, 2),
-            datetime(2022, 1, 3),
+            datetime(2025, 1, 1),
+            datetime(2025, 1, 2),
+            datetime(2025, 1, 3),
         ],
         "float": [4.0, 5.0, 6.0],
+        "string": ["a", "b", "c"],
     }
 )
 

@@ -6,19 +6,16 @@ fn main() -> Result<(), Box<dyn std::error::Error>> {
     let mut rng = rand::thread_rng();
 
     let df: DataFrame = df!(
-        "a" => 0..8,
-        "b"=> (0..8).map(|_| rng.gen::<f64>()).collect::<Vec<f64>>(),
+        "a" => 0..5,
+        "b"=> (0..5).map(|_| rng.gen::<f64>()).collect::<Vec<f64>>(),
         "c"=> [
-            NaiveDate::from_ymd_opt(2022, 12, 1).unwrap().and_hms_opt(0, 0, 0).unwrap(),
-            NaiveDate::from_ymd_opt(2022, 12, 2).unwrap().and_hms_opt(0, 0, 0).unwrap(),
-            NaiveDate::from_ymd_opt(2022, 12, 3).unwrap().and_hms_opt(0, 0, 0).unwrap(),
-            NaiveDate::from_ymd_opt(2022, 12, 4).unwrap().and_hms_opt(0, 0, 0).unwrap(),
-            NaiveDate::from_ymd_opt(2022, 12, 5).unwrap().and_hms_opt(0, 0, 0).unwrap(),
-            NaiveDate::from_ymd_opt(2022, 12, 6).unwrap().and_hms_opt(0, 0, 0).unwrap(),
-            NaiveDate::from_ymd_opt(2022, 12, 7).unwrap().and_hms_opt(0, 0, 0).unwrap(),
-            NaiveDate::from_ymd_opt(2022, 12, 8).unwrap().and_hms_opt(0, 0, 0).unwrap(),
+            NaiveDate::from_ymd_opt(2025, 12, 1).unwrap().and_hms_opt(0, 0, 0).unwrap(),
+            NaiveDate::from_ymd_opt(2025, 12, 2).unwrap().and_hms_opt(0, 0, 0).unwrap(),
+            NaiveDate::from_ymd_opt(2025, 12, 3).unwrap().and_hms_opt(0, 0, 0).unwrap(),
+            NaiveDate::from_ymd_opt(2025, 12, 4).unwrap().and_hms_opt(0, 0, 0).unwrap(),
+            NaiveDate::from_ymd_opt(2025, 12, 5).unwrap().and_hms_opt(0, 0, 0).unwrap(),
         ],
-        "d"=> [Some(1.0), Some(2.0), None, None, Some(0.0), Some(-5.0), Some(-42.), None]
+        "d"=> [Some(1.0), Some(2.0), None, Some(-42.), None]
     )
     .unwrap();
 
@@ -46,17 +43,17 @@ fn main() -> Result<(), Box<dyn std::error::Error>> {
     let out = df
         .clone()
         .lazy()
-        .select([col("*").exclude(["a"])])
+        .select([col("*").exclude(["a", "c"])])
         .collect()?;
     println!("{}", out);
     // --8<-- [end:exclude]
 
     // --8<-- [start:filter]
-    let start_date = NaiveDate::from_ymd_opt(2022, 12, 2)
+    let start_date = NaiveDate::from_ymd_opt(2025, 12, 2)
         .unwrap()
         .and_hms_opt(0, 0, 0)
         .unwrap();
-    let end_date = NaiveDate::from_ymd_opt(2022, 12, 8)
+    let end_date = NaiveDate::from_ymd_opt(2025, 12, 3)
         .unwrap()
         .and_hms_opt(0, 0, 0)
         .unwrap();

@@ -9,9 +9,9 @@ fn main() -> Result<(), Box<dyn std::error::Error>> {
     let mut df: DataFrame = df!(
         "integer" => &[1, 2, 3],
         "date" => &[
-                NaiveDate::from_ymd_opt(2022, 1, 1).unwrap().and_hms_opt(0, 0, 0).unwrap(),
-                NaiveDate::from_ymd_opt(2022, 1, 2).unwrap().and_hms_opt(0, 0, 0).unwrap(),
-                NaiveDate::from_ymd_opt(2022, 1, 3).unwrap().and_hms_opt(0, 0, 0).unwrap(),
+                NaiveDate::from_ymd_opt(2025, 1, 1).unwrap().and_hms_opt(0, 0, 0).unwrap(),
+                NaiveDate::from_ymd_opt(2025, 1, 2).unwrap().and_hms_opt(0, 0, 0).unwrap(),
+                NaiveDate::from_ymd_opt(2025, 1, 3).unwrap().and_hms_opt(0, 0, 0).unwrap(),
         ],
         "float" => &[4.0, 5.0, 6.0]
     )