From 6dc4dfc5b9aac1aa46517ef142d64f954748d4c1 Mon Sep 17 00:00:00 2001 From: r-brink Date: Thu, 11 Jan 2024 13:09:34 +0100 Subject: [PATCH 01/15] add getting-started section and remove basics chapter --- docs/index.md | 5 - docs/user-guide/basics/index.md | 18 ---- docs/user-guide/basics/joins.md | 26 ----- docs/user-guide/basics/reading-writing.md | 45 -------- .../expressions.md => getting-started.md} | 101 +++++++++++++++++- docs/user-guide/index.md | 39 ------- docs/user-guide/overview.md | 53 +++++++++ mkdocs.yml | 13 +-- 8 files changed, 158 insertions(+), 142 deletions(-) delete mode 100644 docs/user-guide/basics/index.md delete mode 100644 docs/user-guide/basics/joins.md delete mode 100644 docs/user-guide/basics/reading-writing.md rename docs/user-guide/{basics/expressions.md => getting-started.md} (52%) delete mode 100644 docs/user-guide/index.md create mode 100644 docs/user-guide/overview.md diff --git a/docs/index.md b/docs/index.md index 2c72f776edbb..cad8fc9b0322 100644 --- a/docs/index.md +++ b/docs/index.md @@ -1,8 +1,3 @@ ---- -hide: - - navigation ---- - # Polars ![logo](https://raw.githubusercontent.com/pola-rs/polars-static/master/logos/polars_github_logo_rect_dark_name.svg) diff --git a/docs/user-guide/basics/index.md b/docs/user-guide/basics/index.md deleted file mode 100644 index af73c7967574..000000000000 --- a/docs/user-guide/basics/index.md +++ /dev/null @@ -1,18 +0,0 @@ -# Introduction - -This chapter is intended for new Polars users. -The goal is to provide a quick overview of the most common functionality. -Feel free to skip ahead to the [next chapter](../concepts/data-types/overview.md) to dive into the details. - -!!! rust "Rust Users Only" - - Due to historical reasons, the eager API in Rust is outdated. In the future, we would like to redesign it as a small wrapper around the lazy API (as is the design in Python / NodeJS). In the examples, we will use the lazy API instead with `.lazy()` and `.collect()`. 
For now you can ignore these two functions. If you want to know more about the lazy and eager API, go [here](../concepts/lazy-vs-eager.md). - - To enable the Lazy API ensure you have the feature flag `lazy` configured when installing Polars - ``` - # Cargo.toml - [dependencies] - polars = { version = "x", features = ["lazy", ...]} - ``` - - Because of the ownership ruling in Rust, we can not reuse the same `DataFrame` multiple times in the examples. For simplicity reasons we call `clone()` to overcome this issue. Note that this does not duplicate the data but just increments a pointer (`Arc`). diff --git a/docs/user-guide/basics/joins.md b/docs/user-guide/basics/joins.md deleted file mode 100644 index 21cb927164a9..000000000000 --- a/docs/user-guide/basics/joins.md +++ /dev/null @@ -1,26 +0,0 @@ -# Combining DataFrames - -There are two ways `DataFrame`s can be combined depending on the use case: join and concat. - -## Join - -Polars supports all types of join (e.g. left, right, inner, outer). Let's have a closer look on how to `join` two `DataFrames` into a single `DataFrame`. Our two `DataFrames` both have an 'id'-like column: `a` and `x`. We can use those columns to `join` the `DataFrames` in this example. - -{{code_block('user-guide/basics/joins','join',['join'])}} - -```python exec="on" result="text" session="getting-started/joins" ---8<-- "python/user-guide/basics/joins.py:setup" ---8<-- "python/user-guide/basics/joins.py:join" -``` - -To see more examples with other types of joins, go the [User Guide](../transformations/joins.md). - -## Concat - -We can also `concatenate` two `DataFrames`. Vertical concatenation will make the `DataFrame` longer. Horizontal concatenation will make the `DataFrame` wider. Below you can see the result of an horizontal concatenation of our two `DataFrames`. 
- -{{code_block('user-guide/basics/joins','hstack',['hstack'])}} - -```python exec="on" result="text" session="getting-started/joins" ---8<-- "python/user-guide/basics/joins.py:hstack" -``` diff --git a/docs/user-guide/basics/reading-writing.md b/docs/user-guide/basics/reading-writing.md deleted file mode 100644 index 8999f601e823..000000000000 --- a/docs/user-guide/basics/reading-writing.md +++ /dev/null @@ -1,45 +0,0 @@ -# Reading & writing - -Polars supports reading and writing to all common files (e.g. csv, json, parquet), cloud storage (S3, Azure Blob, BigQuery) and databases (e.g. postgres, mysql). In the following examples we will show how to operate on most common file formats. For the following dataframe - -{{code_block('user-guide/basics/reading-writing','dataframe',['DataFrame'])}} - -```python exec="on" result="text" session="getting-started/reading" ---8<-- "python/user-guide/basics/reading-writing.py:dataframe" -``` - -#### CSV - -Polars has its own fast implementation for csv reading with many flexible configuration options. - -{{code_block('user-guide/basics/reading-writing','csv',['read_csv','write_csv'])}} - -```python exec="on" result="text" session="getting-started/reading" ---8<-- "python/user-guide/basics/reading-writing.py:csv" -``` - -As we can see above, Polars made the datetimes a `string`. We can tell Polars to parse dates, when reading the csv, to ensure the date becomes a datetime. 
The example can be found below: - -{{code_block('user-guide/basics/reading-writing','csv2',['read_csv'])}} - -```python exec="on" result="text" session="getting-started/reading" ---8<-- "python/user-guide/basics/reading-writing.py:csv2" -``` - -#### JSON - -{{code_block('user-guide/basics/reading-writing','json',['read_json','write_json'])}} - -```python exec="on" result="text" session="getting-started/reading" ---8<-- "python/user-guide/basics/reading-writing.py:json" -``` - -#### Parquet - -{{code_block('user-guide/basics/reading-writing','parquet',['read_parquet','write_parquet'])}} - -```python exec="on" result="text" session="getting-started/reading" ---8<-- "python/user-guide/basics/reading-writing.py:parquet" -``` - -To see more examples and other data formats go to the [User Guide](../io/csv.md), section IO. diff --git a/docs/user-guide/basics/expressions.md b/docs/user-guide/getting-started.md similarity index 52% rename from docs/user-guide/basics/expressions.md rename to docs/user-guide/getting-started.md index 0277d3da72f6..52e9b078b3b7 100644 --- a/docs/user-guide/basics/expressions.md +++ b/docs/user-guide/getting-started.md @@ -1,13 +1,81 @@ -# Expressions +# Getting started +This chapter is here to help you get started with Polars. It covers all the fundamental features and functionalities of the library, making it easy for new users to familiarise themselves with the basics from initial installation and setup to core functionalities. If you're already an advanced user or familiar with Dataframes, feel free to skip ahead to the [next chapter about installation options](installation.md). -`Expressions` are the core strength of Polars. The `expressions` offer a versatile structure that both solves easy queries and is easily extended to complex ones. 
Below we will cover the basic components that serve as building block (or in Polars terminology contexts) for all your queries: +## Installing Polars + +=== ":fontawesome-brands-python: Python" + + ``` bash + pip install polars + ``` + +=== ":fontawesome-brands-rust: Rust" + + ``` shell + cargo add polars -F lazy + + # Or Cargo.toml + [dependencies] + polars = { version = "x", features = ["lazy", ...]} + ``` + +## Reading & writing + +Polars supports reading and writing to all common files (e.g. csv, json, parquet), cloud storage (S3, Azure Blob, BigQuery) and databases (e.g. postgres, mysql). In the following examples we will show how to operate on most common file formats. For the following dataframe + +{{code_block('user-guide/basics/reading-writing','dataframe',['DataFrame'])}} + +```python exec="on" result="text" session="getting-started/reading" +--8<-- "python/user-guide/basics/reading-writing.py:dataframe" +``` + +### CSV + +Polars has its own fast implementation for csv reading with many flexible configuration options. + +{{code_block('user-guide/basics/reading-writing','csv',['read_csv','write_csv'])}} + +```python exec="on" result="text" session="getting-started/reading" +--8<-- "python/user-guide/basics/reading-writing.py:csv" +``` + +As we can see above, Polars made the datetimes a `string`. We can tell Polars to parse dates, when reading the csv, to ensure the date becomes a datetime. 
The example can be found below: + +{{code_block('user-guide/basics/reading-writing','csv2',['read_csv'])}} + +```python exec="on" result="text" session="getting-started/reading" +--8<-- "python/user-guide/basics/reading-writing.py:csv2" +``` + +### JSON + +{{code_block('user-guide/basics/reading-writing','json',['read_json','write_json'])}} + +```python exec="on" result="text" session="getting-started/reading" +--8<-- "python/user-guide/basics/reading-writing.py:json" +``` + +### Parquet + +{{code_block('user-guide/basics/reading-writing','parquet',['read_parquet','write_parquet'])}} + +```python exec="on" result="text" session="getting-started/reading" +--8<-- "python/user-guide/basics/reading-writing.py:parquet" +``` + +To see more examples and other data formats go to the [User Guide](io/csv.md), section IO. + + +## Expressions + +`Expressions` are the core strength of Polars. The `expressions` offer a versatile structure that both solves easy queries and is easily extended to complex ones. Below we cover the basic components that serve as building block (or in Polars terminology contexts) for all your queries: - `select` - `filter` - `with_columns` - `group_by` -To learn more about expressions and the context in which they operate, see the User Guide sections: [Contexts](../concepts/contexts.md) and [Expressions](../concepts/expressions.md). +To learn more about expressions and the context in which they operate, see the User Guide sections: [Contexts](concepts/contexts.md) and [Expressions](concepts/expressions.md). ### Select statement @@ -128,3 +196,30 @@ Below are some examples on how to combine operations to create the `DataFrame` y ```python exec="on" result="text" session="getting-started/expressions" --8<-- "python/user-guide/basics/expressions.py:combine2" ``` + +## Combining DataFrames + +There are two ways `DataFrame`s can be combined depending on the use case: join and concat. + +### Join + +Polars supports all types of join (e.g. 
left, right, inner, outer). Let's have a closer look at how to `join` two `DataFrames` into a single `DataFrame`. Our two `DataFrames` both have an 'id'-like column: `a` and `x`. We can use those columns to `join` the `DataFrames` in this example. + +{{code_block('user-guide/basics/joins','join',['join'])}} + +```python exec="on" result="text" session="getting-started/joins" +--8<-- "python/user-guide/basics/joins.py:setup" +--8<-- "python/user-guide/basics/joins.py:join" +``` + +To see more examples with other types of joins, see the [Transformations section](transformations/joins.md) in the user guide. + +### Concat + +We can also `concatenate` two `DataFrames`. Vertical concatenation will make the `DataFrame` longer. Horizontal concatenation will make the `DataFrame` wider. Below you can see the result of a horizontal concatenation of our two `DataFrames`. + +{{code_block('user-guide/basics/joins','hstack',['hstack'])}} + +```python exec="on" result="text" session="getting-started/joins" +--8<-- "python/user-guide/basics/joins.py:hstack" +``` diff --git a/docs/user-guide/index.md b/docs/user-guide/index.md deleted file mode 100644 index 442029472d80..000000000000 --- a/docs/user-guide/index.md +++ /dev/null @@ -1,39 +0,0 @@ -# Introduction - -This user guide is an introduction to the [Polars DataFrame library](https://github.com/pola-rs/polars). -Its goal is to introduce you to Polars by going through examples and comparing it to other solutions. -Some design choices are introduced here. The guide will also introduce you to optimal usage of Polars. - -The Polars user guide is intended to live alongside the API documentation ([Python](https://docs.pola.rs/py-polars/html/reference/index.html) / [Rust](https://docs.rs/polars/latest/polars/)), which offers detailed descriptions of specific objects and functions. - -Even though Polars is completely written in [Rust](https://www.rust-lang.org/) (no runtime overhead!)
and uses [Arrow](https://arrow.apache.org/) -- the [native arrow2 Rust implementation](https://github.com/jorgecarleitao/arrow2) -- as its foundation, the examples presented in this guide will be mostly using its higher-level language bindings. -Higher-level bindings only serve as a thin wrapper for functionality implemented in the core library. - -For [pandas](https://pandas.pydata.org/) users, our [Python package](https://pypi.org/project/polars/) will offer the easiest way to get started with Polars. - -### Philosophy - -The goal of Polars is to provide a lightning fast `DataFrame` library that: - -- Utilizes all available cores on your machine. -- Optimizes queries to reduce unneeded work/memory allocations. -- Handles datasets much larger than your available RAM. -- Has an API that is consistent and predictable. -- Has a strict schema (data-types should be known before running the query). - -Polars is written in Rust which gives it C/C++ performance and allows it to fully control performance critical parts -in a query engine. - -As such Polars goes to great lengths to: - -- Reduce redundant copies. -- Traverse memory cache efficiently. -- Minimize contention in parallelism. -- Process data in chunks. -- Reuse memory allocations. - -!!! rust "Note" - - The Rust examples in this guide are synchronized with the main branch of the Polars repository, rather than the latest Rust release. - You may not be able to copy-paste code examples and use them with the latest release. - We aim to solve this in the future. diff --git a/docs/user-guide/overview.md b/docs/user-guide/overview.md new file mode 100644 index 000000000000..3b480fcc8225 --- /dev/null +++ b/docs/user-guide/overview.md @@ -0,0 +1,53 @@ +# Overview + +![logo](https://raw.githubusercontent.com/pola-rs/polars-static/master/logos/polars_github_logo_rect_dark_name.svg) + +

Blazingly Fast DataFrame Library

+
+ + rust docs + + + + + + PyPI Latest Release + + + DOI Latest Release + +
+ +Polars is a highly performant DataFrame library for manipulating structured data. The core is written in Rust, but the library is also available in Python. Its key features are: + +- **Fast**: Polars is written from the ground up, designed close to the machine and without external dependencies. +- **I/O**: First class support for all common data storage layers: local, cloud storage & databases. +- **Easy to use**: Write your queries the way they were intended. Polars, internally, will determine the most efficient way to execute using its query optimizer. +- **Out of Core**: Polars supports out of core data transformation with its streaming API. Allowing you to process your results without requiring all your data to be in memory at the same time +- **Parallel**: Polars fully utilises the power of your machine by dividing the workload among the available CPU cores without any additional configuration. +- **Vectorized Query Engine**: Polars uses [Apache Arrow](https://arrow.apache.org/), a columnar data format, to process your queries in a vectorized manner. It uses [SIMD](https://en.wikipedia.org/wiki/Single_instruction,_multiple_data) to optimize CPU usage. + +## Performance :rocket: :rocket: + +Polars is very fast, and in fact is one of the best performing solutions available. +See the results in h2oai's [db-benchmark](https://duckdblabs.github.io/db-benchmark/), revived by the DuckDB project. + +Polars [TPC-H Benchmark results](https://www.pola.rs/benchmarks.html) are now available on the official website. + +## Example + +{{code_block('home/example','example',['scan_csv','filter','group_by','collect'])}} + +## Community + +Polars has a very active community with frequent releases (approximately weekly). Below are some of the top contributors to the project: + +--8<-- "docs/people.md" + +## Contributing + +We appreciate all contributions, from reporting bugs to implementing new features. 
Read our [contributing guide](../development/contributing/index.md) to learn more. + +## License + +This project is licensed under the terms of the [MIT license](https://github.com/pola-rs/polars/blob/main/LICENSE). diff --git a/mkdocs.yml b/mkdocs.yml index 9918d5c2e8f3..bb3be85cb230 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -11,13 +11,14 @@ nav: - Home: index.md - User guide: - - user-guide/index.md + - user-guide/overview.md + - user-guide/getting-started.md - user-guide/installation.md - - Basics: - - user-guide/basics/index.md - - user-guide/basics/reading-writing.md - - user-guide/basics/expressions.md - - user-guide/basics/joins.md + # - Basics: + # - user-guide/basics/index.md + # - user-guide/basics/reading-writing.md + # - user-guide/basics/expressions.md + # - user-guide/basics/joins.md - Concepts: - Data types: - user-guide/concepts/data-types/overview.md From 0eab18cb498ae5f66f145c8a6f25c3eb8cf5fd4c Mon Sep 17 00:00:00 2001 From: r-brink Date: Thu, 11 Jan 2024 14:12:19 +0100 Subject: [PATCH 02/15] rewrite from home to overview - keeping the basics in place for now --- docs/_build/overrides/404.html | 2 +- docs/index.md | 53 ---------------------------------- docs/user-guide/overview.md | 33 ++++++++++++++------- mkdocs.yml | 9 +----- 4 files changed, 25 insertions(+), 72 deletions(-) delete mode 100644 docs/index.md diff --git a/docs/_build/overrides/404.html b/docs/_build/overrides/404.html index ee9b8faa2aba..a216b32dfc5f 100644 --- a/docs/_build/overrides/404.html +++ b/docs/_build/overrides/404.html @@ -217,6 +217,6 @@

404 - You're lost.

How you got here is a mystery. But you can click the button below to go back to the homepage or use the search bar in the navigation menu to find what you are looking for.

- Home + Home {% endblock %} diff --git a/docs/index.md b/docs/index.md deleted file mode 100644 index cad8fc9b0322..000000000000 --- a/docs/index.md +++ /dev/null @@ -1,53 +0,0 @@ -# Polars - -![logo](https://raw.githubusercontent.com/pola-rs/polars-static/master/logos/polars_github_logo_rect_dark_name.svg) - -

Blazingly Fast DataFrame Library

-
- - rust docs - - - - - - PyPI Latest Release - - - DOI Latest Release - -
- -Polars is a highly performant DataFrame library for manipulating structured data. The core is written in Rust, but the library is also available in Python. Its key features are: - -- **Fast**: Polars is written from the ground up, designed close to the machine and without external dependencies. -- **I/O**: First class support for all common data storage layers: local, cloud storage & databases. -- **Easy to use**: Write your queries the way they were intended. Polars, internally, will determine the most efficient way to execute using its query optimizer. -- **Out of Core**: Polars supports out of core data transformation with its streaming API. Allowing you to process your results without requiring all your data to be in memory at the same time -- **Parallel**: Polars fully utilises the power of your machine by dividing the workload among the available CPU cores without any additional configuration. -- **Vectorized Query Engine**: Polars uses [Apache Arrow](https://arrow.apache.org/), a columnar data format, to process your queries in a vectorized manner. It uses [SIMD](https://en.wikipedia.org/wiki/Single_instruction,_multiple_data) to optimize CPU usage. - -## Performance :rocket: :rocket: - -Polars is very fast, and in fact is one of the best performing solutions available. -See the results in h2oai's [db-benchmark](https://duckdblabs.github.io/db-benchmark/), revived by the DuckDB project. - -Polars [TPC-H Benchmark results](https://www.pola.rs/benchmarks.html) are now available on the official website. - -## Example - -{{code_block('home/example','example',['scan_csv','filter','group_by','collect'])}} - -## Community - -Polars has a very active community with frequent releases (approximately weekly). Below are some of the top contributors to the project: - ---8<-- "docs/people.md" - -## Contributing - -We appreciate all contributions, from reporting bugs to implementing new features. 
Read our [contributing guide](development/contributing/index.md) to learn more. - -## License - -This project is licensed under the terms of the [MIT license](https://github.com/pola-rs/polars/blob/main/LICENSE). diff --git a/docs/user-guide/overview.md b/docs/user-guide/overview.md index 3b480fcc8225..77340e013e89 100644 --- a/docs/user-guide/overview.md +++ b/docs/user-guide/overview.md @@ -18,26 +18,39 @@ -Polars is a highly performant DataFrame library for manipulating structured data. The core is written in Rust, but the library is also available in Python. Its key features are: +Polars is a blazingly fast DataFrame library for manipulating structured data. The core is written in Rust, and available for Python, R and NodeJS. -- **Fast**: Polars is written from the ground up, designed close to the machine and without external dependencies. +## Key features +- **Fast**: Written from scratch in Rust, designed close to the machine and without external dependencies. - **I/O**: First class support for all common data storage layers: local, cloud storage & databases. -- **Easy to use**: Write your queries the way they were intended. Polars, internally, will determine the most efficient way to execute using its query optimizer. -- **Out of Core**: Polars supports out of core data transformation with its streaming API. Allowing you to process your results without requiring all your data to be in memory at the same time -- **Parallel**: Polars fully utilises the power of your machine by dividing the workload among the available CPU cores without any additional configuration. -- **Vectorized Query Engine**: Polars uses [Apache Arrow](https://arrow.apache.org/), a columnar data format, to process your queries in a vectorized manner. It uses [SIMD](https://en.wikipedia.org/wiki/Single_instruction,_multiple_data) to optimize CPU usage. +- **Intuitive API**: Write your queries the way they were intended. 
Polars, internally, will determine the most efficient way to execute using its query optimizer. +- **Out of Core**: The streaming API allows you to process your results without requiring all your data to be in memory at the same time. +- **Parallel**: Utilises the power of your machine by dividing the workload among the available CPU cores without any additional configuration. +- **Vectorized Query Engine**: Using [Apache Arrow](https://arrow.apache.org/), a columnar data format, to process your queries in a vectorized manner and SIMD to optimize CPU usage. -## Performance :rocket: :rocket: -Polars is very fast, and in fact is one of the best performing solutions available. -See the results in h2oai's [db-benchmark](https://duckdblabs.github.io/db-benchmark/), revived by the DuckDB project. +!!! info "Users new to Dataframes" + A DataFrame is a 2-dimensional data structure that is useful for data manipulation and analysis. With labeled axes for rows and columns, each column can contain different data types, making complex data operations such as merging and aggregation much easier. Due to their flexibility and intuitive way of storing and working with data, DataFrames have become increasingly popular in modern data analytics and engineering. -Polars [TPC-H Benchmark results](https://www.pola.rs/benchmarks.html) are now available on the official website. + +## Philosophy + +The goal of Polars is to provide a lightning fast DataFrame library that: + +- Utilizes all available cores on your machine. +- Optimizes queries to reduce unneeded work/memory allocations. +- Handles datasets much larger than your available RAM. +- A consistent and predictable API. +- Strict schema (data-types should be known before running the query). + +Polars is written in Rust which gives it C/C++ performance and allows it to fully control performance critical parts in a query engine.
## Example {{code_block('home/example','example',['scan_csv','filter','group_by','collect'])}} +A more extensive introduction can be found in the [next chapter](/user-guide/getting-started). + ## Community Polars has a very active community with frequent releases (approximately weekly). Below are some of the top contributors to the project: diff --git a/mkdocs.yml b/mkdocs.yml index bb3be85cb230..1eadf554f927 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -2,23 +2,16 @@ # Project information site_name: Polars -site_url: https://docs.pola.rs +site_url: https://docs.pola.rs/ repo_url: https://github.com/pola-rs/polars repo_name: pola-rs/polars # Documentation layout nav: - - Home: index.md - - User guide: - user-guide/overview.md - user-guide/getting-started.md - user-guide/installation.md - # - Basics: - # - user-guide/basics/index.md - # - user-guide/basics/reading-writing.md - # - user-guide/basics/expressions.md - # - user-guide/basics/joins.md - Concepts: - Data types: - user-guide/concepts/data-types/overview.md From 80a9c06e9be4985608edde362afd8e2d419cfd23 Mon Sep 17 00:00:00 2001 From: r-brink Date: Thu, 11 Jan 2024 14:15:58 +0100 Subject: [PATCH 03/15] improve accessibility of page --- docs/user-guide/overview.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/user-guide/overview.md b/docs/user-guide/overview.md index 77340e013e89..bcc243cbf3f5 100644 --- a/docs/user-guide/overview.md +++ b/docs/user-guide/overview.md @@ -5,10 +5,10 @@

Blazingly Fast DataFrame Library

- rust docs + Rust docs latest - + Rust crates Latest Release PyPI Latest Release From d7bf6f43b915b41755201595241bb578fa423a2b Mon Sep 17 00:00:00 2001 From: r-brink Date: Thu, 11 Jan 2024 14:20:13 +0100 Subject: [PATCH 04/15] minor textual changes --- docs/api/index.md | 2 +- docs/user-guide/overview.md | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/api/index.md b/docs/api/index.md index 004799cae1b4..485b59923ad1 100644 --- a/docs/api/index.md +++ b/docs/api/index.md @@ -11,7 +11,7 @@ It's the best place to look if you need information on a specific function. ## Python The Python API reference is built using Sphinx. -It's available on [GitHub Pages](https://docs.pola.rs/py-polars/html/reference/index.html). +It's available in [our docs](https://docs.pola.rs/py-polars/html/reference/index.html). ## Rust diff --git a/docs/user-guide/overview.md b/docs/user-guide/overview.md index bcc243cbf3f5..bd69a7a9c75b 100644 --- a/docs/user-guide/overview.md +++ b/docs/user-guide/overview.md @@ -41,7 +41,7 @@ The goal of Polars is to provide a lightning fast DataFrame library that: - Optimizes queries to reduce unneeded work/memory allocations. - Handles datasets much larger than your available RAM. - A consistent and predictable API. -- Strict schema (data-types should be known before running the query). +- Adheres to a strict schema (data-types should be known before running the query). Polars is written in Rust which gives it C/C++ performance and allows it to fully control performance critical parts in a query engine. 
From 8760e412d53aa4350c13eed8119c41d6c31ca3ec Mon Sep 17 00:00:00 2001 From: r-brink Date: Thu, 11 Jan 2024 14:23:45 +0100 Subject: [PATCH 05/15] formatting --- docs/user-guide/getting-started.md | 2 +- docs/user-guide/overview.md | 5 ++--- 2 files changed, 3 insertions(+), 4 deletions(-) diff --git a/docs/user-guide/getting-started.md b/docs/user-guide/getting-started.md index 52e9b078b3b7..fca8a951089e 100644 --- a/docs/user-guide/getting-started.md +++ b/docs/user-guide/getting-started.md @@ -1,4 +1,5 @@ # Getting started + This chapter is here to help you get started with Polars. It covers all the fundamental features and functionalities of the library, making it easy for new users to familiarise themselves with the basics from initial installation and setup to core functionalities. If you're already an advanced user or familiar with Dataframes, feel free to skip ahead to the [next chapter about installation options](installation.md). ## Installing Polars @@ -65,7 +66,6 @@ As we can see above, Polars made the datetimes a `string`. We can tell Polars to To see more examples and other data formats go to the [User Guide](io/csv.md), section IO. - ## Expressions `Expressions` are the core strength of Polars. The `expressions` offer a versatile structure that both solves easy queries and is easily extended to complex ones. Below we cover the basic components that serve as building block (or in Polars terminology contexts) for all your queries: diff --git a/docs/user-guide/overview.md b/docs/user-guide/overview.md index bd69a7a9c75b..41bb32795209 100644 --- a/docs/user-guide/overview.md +++ b/docs/user-guide/overview.md @@ -21,6 +21,7 @@ Polars is a blazingly fast DataFrame library for manipulating structured data. The core is written in Rust, and available for Python, R and NodeJS. ## Key features + - **Fast**: Written from scratch in Rust, designed close to the machine and without external dependencies. 
- **I/O**: First class support for all common data storage layers: local, cloud storage & databases. - **Intuitive API**: Write your queries the way they were intended. Polars, internally, will determine the most efficient way to execute using its query optimizer. @@ -28,10 +29,8 @@ Polars is a blazingly fast DataFrame library for manipulating structured data. T - **Parallel**: Utilises the power of your machine by dividing the workload among the available CPU cores without any additional configuration. - **Vectorized Query Engine**: Using [Apache Arrow](https://arrow.apache.org/), a columnar data format, to process your queries in a vectorized manner and SIMD to optimize CPU usage. - !!! info "Users new to Dataframes" - A DataFrame is a 2-dimensional data structure that is useful for data manipulation and analysis. With labeled axes for rows and columns, each column can contain different data types, making complex data operations such as merging and aggregation much easier. Due to their flexibility and intuitive way of storing and working with data, DataFrames have become increasingly popular in modern data analytics and engineering. - +A DataFrame is a 2-dimensional data structure that is useful for data manipulation and analysis. With labeled axes for rows and columns, each column can contain different data types, making complex data operations such as merging and aggregation much easier. Due to their flexibility and intuitive way of storing and working with data, DataFrames have become increasingly popular in modern data analytics and engineering. 
## Philosophy From dcfb7c82054eb70cb9828f035e4fe3cf66ad86b4 Mon Sep 17 00:00:00 2001 From: r-brink Date: Thu, 11 Jan 2024 14:28:34 +0100 Subject: [PATCH 06/15] logo click goes to user guide overview now --- mkdocs.yml | 1 + 1 file changed, 1 insertion(+) diff --git a/mkdocs.yml b/mkdocs.yml index 1eadf554f927..0adf527e21b6 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -138,6 +138,7 @@ extra: analytics: provider: plausible domain: guide.pola.rs,combined.pola.rs + homepage: https://docs.pola.rs/user-guide/overview/ # Preview controls strict: true From 1f8c9d50f8b55d74fe1a96f2f6eb3c7dde408fa0 Mon Sep 17 00:00:00 2001 From: r-brink Date: Thu, 11 Jan 2024 14:31:26 +0100 Subject: [PATCH 07/15] dprint fmt breaks info box --- docs/user-guide/overview.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/user-guide/overview.md b/docs/user-guide/overview.md index 41bb32795209..50dda6be1e77 100644 --- a/docs/user-guide/overview.md +++ b/docs/user-guide/overview.md @@ -29,8 +29,8 @@ Polars is a blazingly fast DataFrame library for manipulating structured data. T - **Parallel**: Utilises the power of your machine by dividing the workload among the available CPU cores without any additional configuration. - **Vectorized Query Engine**: Using [Apache Arrow](https://arrow.apache.org/), a columnar data format, to process your queries in a vectorized manner and SIMD to optimize CPU usage. -!!! info "Users new to Dataframes" -A DataFrame is a 2-dimensional data structure that is useful for data manipulation and analysis. With labeled axes for rows and columns, each column can contain different data types, making complex data operations such as merging and aggregation much easier. Due to their flexibility and intuitive way of storing and working with data, DataFrames have become increasingly popular in modern data analytics and engineering. +!!! 
info "Users new to DataFrames" + A DataFrame is a 2-dimensional data structure that is useful for data manipulation and analysis. With labeled axes for rows and columns, each column can contain different data types, making complex data operations such as merging and aggregation much easier. Due to their flexibility and intuitive way of storing and working with data, DataFrames have become increasingly popular in modern data analytics and engineering. ## Philosophy From 29be17846991fd4896cd93166fc402b747b332c0 Mon Sep 17 00:00:00 2001 From: r-brink Date: Thu, 11 Jan 2024 15:26:09 +0100 Subject: [PATCH 08/15] fix link --- docs/user-guide/overview.md | 6 +++++- mkdocs.yml | 2 +- 2 files changed, 6 insertions(+), 2 deletions(-) diff --git a/docs/user-guide/overview.md b/docs/user-guide/overview.md index 50dda6be1e77..f76eb0e1a1d3 100644 --- a/docs/user-guide/overview.md +++ b/docs/user-guide/overview.md @@ -29,9 +29,13 @@ Polars is a blazingly fast DataFrame library for manipulating structured data. T - **Parallel**: Utilises the power of your machine by dividing the workload among the available CPU cores without any additional configuration. - **Vectorized Query Engine**: Using [Apache Arrow](https://arrow.apache.org/), a columnar data format, to process your queries in a vectorized manner and SIMD to optimize CPU usage. + + !!! info "Users new to DataFrames" A DataFrame is a 2-dimensional data structure that is useful for data manipulation and analysis. With labeled axes for rows and columns, each column can contain different data types, making complex data operations such as merging and aggregation much easier. Due to their flexibility and intuitive way of storing and working with data, DataFrames have become increasingly popular in modern data analytics and engineering. 
+ + ## Philosophy The goal of Polars is to provide a lightning fast DataFrame library that: @@ -48,7 +52,7 @@ Polars is written in Rust which gives it C/C++ performance and allows it to full {{code_block('home/example','example',['scan_csv','filter','group_by','collect'])}} -A more extensive introduction can be found in the [next chapter](/user-guide/getting-started). +A more extensive introduction can be found in the [next chapter](getting-started.md). ## Community diff --git a/mkdocs.yml b/mkdocs.yml index 0adf527e21b6..4bc7dd115647 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -138,7 +138,7 @@ extra: analytics: provider: plausible domain: guide.pola.rs,combined.pola.rs - homepage: https://docs.pola.rs/user-guide/overview/ + homepage: /user-guide/overview/ # Preview controls strict: true From a4a9cb46e1adc6a1ebb3af862d621514769cf382 Mon Sep 17 00:00:00 2001 From: r-brink Date: Tue, 16 Jan 2024 12:52:12 +0100 Subject: [PATCH 09/15] getting started more to the point and index pages to support --- .../python/user-guide/basics/expressions.py | 23 +++---- .../user-guide/basics/reading-writing.py | 7 ++- .../src/rust/user-guide/basics/expressions.rs | 25 ++++---- .../rust/user-guide/basics/reading-writing.rs | 6 +- docs/user-guide/concepts/index.md | 12 ++++ docs/user-guide/expressions/index.md | 18 ++++++ docs/user-guide/getting-started.md | 61 ++++--------------- docs/user-guide/io/index.md | 12 ++++ docs/user-guide/lazy/index.md | 10 +++ docs/user-guide/transformations/index.md | 8 +++ mkdocs.yml | 6 ++ 11 files changed, 106 insertions(+), 82 deletions(-) create mode 100644 docs/user-guide/concepts/index.md create mode 100644 docs/user-guide/expressions/index.md create mode 100644 docs/user-guide/io/index.md create mode 100644 docs/user-guide/lazy/index.md create mode 100644 docs/user-guide/transformations/index.md diff --git a/docs/src/python/user-guide/basics/expressions.py b/docs/src/python/user-guide/basics/expressions.py index 041b023f27c4..12c6ea2170ec 100644 
--- a/docs/src/python/user-guide/basics/expressions.py +++ b/docs/src/python/user-guide/basics/expressions.py @@ -6,19 +6,16 @@ df = pl.DataFrame( { - "a": range(8), - "b": np.random.rand(8), + "a": range(5), + "b": np.random.rand(5), "c": [ - datetime(2022, 12, 1), - datetime(2022, 12, 2), - datetime(2022, 12, 3), - datetime(2022, 12, 4), - datetime(2022, 12, 5), - datetime(2022, 12, 6), - datetime(2022, 12, 7), - datetime(2022, 12, 8), + datetime(2025, 12, 1), + datetime(2025, 12, 2), + datetime(2025, 12, 3), + datetime(2025, 12, 4), + datetime(2025, 12, 5), ], - "d": [1, 2.0, float("nan"), float("nan"), 0, -5, -42, None], + "d": [1, 2.0, float("nan"), -42, None], } ) # --8<-- [end:setup] @@ -36,12 +33,12 @@ # --8<-- [end:select3] # --8<-- [start:exclude] -df.select(pl.exclude("a")) +df.select(pl.exclude(["a", "c"])) # --8<-- [end:exclude] # --8<-- [start:filter] df.filter( - pl.col("c").is_between(datetime(2022, 12, 2), datetime(2022, 12, 8)), + pl.col("c").is_between(datetime(2025, 12, 2), datetime(2025, 12, 3)), ) # --8<-- [end:filter] diff --git a/docs/src/python/user-guide/basics/reading-writing.py b/docs/src/python/user-guide/basics/reading-writing.py index dc8a54ebd18f..f01fbba3fb30 100644 --- a/docs/src/python/user-guide/basics/reading-writing.py +++ b/docs/src/python/user-guide/basics/reading-writing.py @@ -6,11 +6,12 @@ { "integer": [1, 2, 3], "date": [ - datetime(2022, 1, 1), - datetime(2022, 1, 2), - datetime(2022, 1, 3), + datetime(2025, 1, 1), + datetime(2025, 1, 2), + datetime(2025, 1, 3), ], "float": [4.0, 5.0, 6.0], + "string": ["a", "b", "c"] } ) diff --git a/docs/src/rust/user-guide/basics/expressions.rs b/docs/src/rust/user-guide/basics/expressions.rs index ea6cae3c84af..757c52e3939f 100644 --- a/docs/src/rust/user-guide/basics/expressions.rs +++ b/docs/src/rust/user-guide/basics/expressions.rs @@ -6,19 +6,16 @@ fn main() -> Result<(), Box<dyn std::error::Error>> { let mut rng = rand::thread_rng(); let df: DataFrame = df!( - "a" => 0..8, - "b"=> (0..8).map(|_|
rng.gen::<f64>()).collect::<Vec<f64>>(), + "a" => 0..5, + "b"=> (0..5).map(|_| rng.gen::<f64>()).collect::<Vec<f64>>(), "c"=> [ - NaiveDate::from_ymd_opt(2022, 12, 1).unwrap().and_hms_opt(0, 0, 0).unwrap(), - NaiveDate::from_ymd_opt(2022, 12, 2).unwrap().and_hms_opt(0, 0, 0).unwrap(), - NaiveDate::from_ymd_opt(2022, 12, 3).unwrap().and_hms_opt(0, 0, 0).unwrap(), - NaiveDate::from_ymd_opt(2022, 12, 4).unwrap().and_hms_opt(0, 0, 0).unwrap(), - NaiveDate::from_ymd_opt(2022, 12, 5).unwrap().and_hms_opt(0, 0, 0).unwrap(), - NaiveDate::from_ymd_opt(2022, 12, 6).unwrap().and_hms_opt(0, 0, 0).unwrap(), - NaiveDate::from_ymd_opt(2022, 12, 7).unwrap().and_hms_opt(0, 0, 0).unwrap(), - NaiveDate::from_ymd_opt(2022, 12, 8).unwrap().and_hms_opt(0, 0, 0).unwrap(), + NaiveDate::from_ymd_opt(2025, 12, 1).unwrap().and_hms_opt(0, 0, 0).unwrap(), + NaiveDate::from_ymd_opt(2025, 12, 2).unwrap().and_hms_opt(0, 0, 0).unwrap(), + NaiveDate::from_ymd_opt(2025, 12, 3).unwrap().and_hms_opt(0, 0, 0).unwrap(), + NaiveDate::from_ymd_opt(2025, 12, 4).unwrap().and_hms_opt(0, 0, 0).unwrap(), + NaiveDate::from_ymd_opt(2025, 12, 5).unwrap().and_hms_opt(0, 0, 0).unwrap(), ], - "d"=> [Some(1.0), Some(2.0), None, None, Some(0.0), Some(-5.0), Some(-42.), None] + "d"=> [Some(1.0), Some(2.0), None, Some(-42.), None] ) .unwrap(); @@ -46,17 +43,17 @@ fn main() -> Result<(), Box<dyn std::error::Error>> { let out = df .clone() .lazy() - .select([col("*").exclude(["a"])]) + .select([col("*").exclude(["a", "c"])]) .collect()?; println!("{}", out); // --8<-- [end:exclude] // --8<-- [start:filter] - let start_date = NaiveDate::from_ymd_opt(2022, 12, 2) + let start_date = NaiveDate::from_ymd_opt(2025, 12, 2) .unwrap() .and_hms_opt(0, 0, 0) .unwrap(); - let end_date = NaiveDate::from_ymd_opt(2022, 12, 8) + let end_date = NaiveDate::from_ymd_opt(2025, 12, 3) .unwrap() .and_hms_opt(0, 0, 0) .unwrap(); diff --git a/docs/src/rust/user-guide/basics/reading-writing.rs b/docs/src/rust/user-guide/basics/reading-writing.rs index 
44c1a335428d..dad5e8713d24 100644 ---
a/docs/src/rust/user-guide/basics/reading-writing.rs +++ b/docs/src/rust/user-guide/basics/reading-writing.rs @@ -9,9 +9,9 @@ fn main() -> Result<(), Box<dyn std::error::Error>> { let mut df: DataFrame = df!( "integer" => &[1, 2, 3], "date" => &[ - NaiveDate::from_ymd_opt(2022, 1, 1).unwrap().and_hms_opt(0, 0, 0).unwrap(), - NaiveDate::from_ymd_opt(2022, 1, 2).unwrap().and_hms_opt(0, 0, 0).unwrap(), - NaiveDate::from_ymd_opt(2022, 1, 3).unwrap().and_hms_opt(0, 0, 0).unwrap(), + NaiveDate::from_ymd_opt(2025, 1, 1).unwrap().and_hms_opt(0, 0, 0).unwrap(), + NaiveDate::from_ymd_opt(2025, 1, 2).unwrap().and_hms_opt(0, 0, 0).unwrap(), + NaiveDate::from_ymd_opt(2025, 1, 3).unwrap().and_hms_opt(0, 0, 0).unwrap(), ], "float" => &[4.0, 5.0, 6.0] ) diff --git a/docs/user-guide/concepts/index.md b/docs/user-guide/concepts/index.md new file mode 100644 index 000000000000..62dbd19fce2f --- /dev/null +++ b/docs/user-guide/concepts/index.md @@ -0,0 +1,12 @@ +# Concepts + +The `Concepts` chapter describes the core concepts of the Polars API. Understanding these will help you optimise your queries on a daily basis. We will cover the following topics: + +- Data types: + - [Overview](/data-types/overview.md) + - [Categoricals](data-types/categoricals.md) +- [Data structures](data-structures.md) +- [Contexts](contexts.md) +- [Expressions](expressions.md) +- [Lazy vs eager](lazy-vs-eager.md) +- [Streaming](streaming.md) \ No newline at end of file diff --git a/docs/user-guide/expressions/index.md b/docs/user-guide/expressions/index.md new file mode 100644 index 000000000000..c7e06cbe1863 --- /dev/null +++ b/docs/user-guide/expressions/index.md @@ -0,0 +1,18 @@ +# Expressions + +In the `Contexts` sections we outlined what `Expressions` are and how they are invaluable. In this section we will focus on the `Expressions` themselves. Each section gives an overview of what they do and provides additional examples.
+ +- [Operators](operators.md) +- [Column selections](column-selections.md) +- [Functions](functions.md) +- [Casting](casting.md) +- [Strings](strings.md) +- [Aggregation](aggregation.md) +- [Null](null.md) +- [Window](window.md) +- [Folds](folds.md) +- [Lists](lists.md) +- [Plugins](plugins.md) +- [User-defined functions](user-defined-functions.md) +- [Structs](structs.md) +- [Numpy](numpy.md) \ No newline at end of file diff --git a/docs/user-guide/getting-started.md b/docs/user-guide/getting-started.md index fca8a951089e..0d5cee89fccd 100644 --- a/docs/user-guide/getting-started.md +++ b/docs/user-guide/getting-started.md @@ -22,7 +22,7 @@ This chapter is here to help you get started with Polars. It covers all the fund ## Reading & writing -Polars supports reading and writing to all common files (e.g. csv, json, parquet), cloud storage (S3, Azure Blob, BigQuery) and databases (e.g. postgres, mysql). In the following examples we will show how to operate on most common file formats. For the following dataframe +Polars supports reading and writing to all common files (e.g. csv, json, parquet), cloud storage (S3, Azure Blob, BigQuery) and databases (e.g. postgres, mysql). Below we use csv as example to demonstrate foundational read/write operations. {{code_block('user-guide/basics/reading-writing','dataframe',['DataFrame'])}} @@ -30,9 +30,9 @@ Polars supports reading and writing to all common files (e.g. csv, json, parquet --8<-- "python/user-guide/basics/reading-writing.py:dataframe" ``` -### CSV +### CSV example -Polars has its own fast implementation for csv reading with many flexible configuration options. +In this example we write the DataFrame to `output.csv`. After that we can read it back with `read_csv` and `print` the result for inspection. 
{{code_block('user-guide/basics/reading-writing','csv',['read_csv','write_csv'])}} @@ -40,31 +40,7 @@ Polars has its own fast implementation for csv reading with many flexible config --8<-- "python/user-guide/basics/reading-writing.py:csv" ``` -As we can see above, Polars made the datetimes a `string`. We can tell Polars to parse dates, when reading the csv, to ensure the date becomes a datetime. The example can be found below: - -{{code_block('user-guide/basics/reading-writing','csv2',['read_csv'])}} - -```python exec="on" result="text" session="getting-started/reading" ---8<-- "python/user-guide/basics/reading-writing.py:csv2" -``` - -### JSON - -{{code_block('user-guide/basics/reading-writing','json',['read_json','write_json'])}} - -```python exec="on" result="text" session="getting-started/reading" ---8<-- "python/user-guide/basics/reading-writing.py:json" -``` - -### Parquet - -{{code_block('user-guide/basics/reading-writing','parquet',['read_parquet','write_parquet'])}} - -```python exec="on" result="text" session="getting-started/reading" ---8<-- "python/user-guide/basics/reading-writing.py:parquet" -``` - -To see more examples and other data formats go to the [User Guide](io/csv.md), section IO. +For more examples on the CSV file format and other data formats, start here [IO section on CSV](io/csv.md) of the User Guide. ## Expressions @@ -79,7 +55,12 @@ To learn more about expressions and the context in which they operate, see the U ### Select statement -To select a column we need to do two things. Define the `DataFrame` we want the data from. And second, select the data that we need. In the example below you see that we select `col('*')`. The asterisk stands for all columns. +To select a column we need to do two things: + +1. Define the `DataFrame` we want the data from. +2. Select the data that we need. + +In the example below you see that we select `col('*')`. The asterisk stands for all columns. 
{{code_block('user-guide/basics/expressions','select',['select'])}} @@ -100,25 +81,7 @@ print( ) ``` -The second option is to specify each column using `pl.col`. This option is shown below. - -{{code_block('user-guide/basics/expressions','select3',['select'])}} - -```python exec="on" result="text" session="getting-started/expressions" -print( - --8<-- "python/user-guide/basics/expressions.py:select3" -) -``` - -If you want to exclude an entire column from your view, you can simply use `exclude` in your `select` statement. - -{{code_block('user-guide/basics/expressions','exclude',['select'])}} - -```python exec="on" result="text" session="getting-started/expressions" -print( - --8<-- "python/user-guide/basics/expressions.py:exclude" -) -``` +Follow these links to other parts of the User guide to learn more about [basic operations](expressions/operators.md) or [column selections](expressions/column-selections.md). ### Filter @@ -154,7 +117,7 @@ print( ) ``` -### Group by +### Group_by We will create a new `DataFrame` for the Group by functionality. This new `DataFrame` will include several 'groups' that we want to group by. diff --git a/docs/user-guide/io/index.md b/docs/user-guide/io/index.md new file mode 100644 index 000000000000..5b0f35ff676f --- /dev/null +++ b/docs/user-guide/io/index.md @@ -0,0 +1,12 @@ +# IO + +Reading and writing your data is crucial for a DataFrame library. In this chapter you will learn more on how to read and write to different file formats that are supported by Polars. 
+ +- [CSV](csv.md) +- [Excel](excel.md) +- [Parquet](parquet.md) +- [Json](json.md) +- [Multiple](multiple.md) +- [Database](database.md) +- [Cloud storage](cloud-storage.md) +- [Google Big Query](bigquery.md) \ No newline at end of file diff --git a/docs/user-guide/lazy/index.md b/docs/user-guide/lazy/index.md new file mode 100644 index 000000000000..8efc3b0fbb50 --- /dev/null +++ b/docs/user-guide/lazy/index.md @@ -0,0 +1,10 @@ +# Lazy + +The Lazy chapter is a guide for working with `LazyFrames`. It covers how to use the lazy API and how to optimise your queries. You can also find more information about the query plan and gain more insight into the streaming capabilities. + +- [Using lazy API](using.md) +- [Optimisations](optimizations.md) +- [Schemas](schemas.md) +- [Query plan](query-plan.md) +- [Execution](execution.md) +- [Streaming](streaming.md) \ No newline at end of file diff --git a/docs/user-guide/transformations/index.md b/docs/user-guide/transformations/index.md new file mode 100644 index 000000000000..eaf5cb9ca5f6 --- /dev/null +++ b/docs/user-guide/transformations/index.md @@ -0,0 +1,8 @@ +# Transformations + +The focus of this section is to describe different types of data transformations and provide some examples of how to use them.
+ +- [Joins](joins.md) +- [Concatenation](concatenation.md) +- [Pivot](pivot.md) +- [Melt](melt.md) \ No newline at end of file diff --git a/mkdocs.yml b/mkdocs.yml index 4bc7dd115647..e5d6c26769b6 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -13,6 +13,7 @@ nav: - user-guide/getting-started.md - user-guide/installation.md - Concepts: + - user-guide/concepts/index.md - Data types: - user-guide/concepts/data-types/overview.md - user-guide/concepts/data-types/categoricals.md @@ -22,6 +23,7 @@ nav: - user-guide/concepts/lazy-vs-eager.md - user-guide/concepts/streaming.md - Expressions: + - user-guide/expressions/index.md - user-guide/expressions/operators.md - user-guide/expressions/column-selections.md - user-guide/expressions/functions.md @@ -37,6 +39,7 @@ nav: - user-guide/expressions/structs.md - user-guide/expressions/numpy.md - Transformations: + - user-guide/transformations/index.md - user-guide/transformations/joins.md - user-guide/transformations/concatenation.md - user-guide/transformations/pivot.md @@ -48,6 +51,7 @@ nav: - user-guide/transformations/time-series/resampling.md - user-guide/transformations/time-series/timezones.md - Lazy API: + - user-guide/lazy/index.md - user-guide/lazy/using.md - user-guide/lazy/optimizations.md - user-guide/lazy/schemas.md @@ -55,6 +59,7 @@ nav: - user-guide/lazy/execution.md - user-guide/lazy/streaming.md - IO: + - user-guide/io/index.md - user-guide/io/csv.md - user-guide/io/excel.md - user-guide/io/parquet.md @@ -128,6 +133,7 @@ theme: - navigation.tabs - navigation.tabs.sticky - navigation.footer + - navigation.indexes - content.tabs.link icon: repo: fontawesome/brands/github From 06e84cab5b4e0e9292e5ea007567bedd0e7b9080 Mon Sep 17 00:00:00 2001 From: r-brink Date: Tue, 16 Jan 2024 13:00:09 +0100 Subject: [PATCH 10/15] formatting --- docs/user-guide/concepts/index.md | 6 +++--- docs/user-guide/expressions/index.md | 2 +- docs/user-guide/getting-started.md | 8 ++++---- docs/user-guide/io/index.md | 2 +- 
docs/user-guide/lazy/index.md | 2 +- docs/user-guide/transformations/index.md | 2 +- 6 files changed, 11 insertions(+), 11 deletions(-) diff --git a/docs/user-guide/concepts/index.md b/docs/user-guide/concepts/index.md index 62dbd19fce2f..e3eac9b2c70f 100644 --- a/docs/user-guide/concepts/index.md +++ b/docs/user-guide/concepts/index.md @@ -3,10 +3,10 @@ The `Concepts` chapter describes the core concepts of the Polars API. Understanding these will help you optimise your queries on a daily basis. We will cover the following topics: - Data types: - - [Overview](/data-types/overview.md) - - [Categoricals](data-types/categoricals.md) + - [Overview](data-types/overview.md) + - [Categoricals](data-types/categoricals.md) - [Data structures](data-structures.md) - [Contexts](contexts.md) - [Expressions](expressions.md) - [Lazy vs eager](lazy-vs-eager.md) -- [Streaming](streaming.md) \ No newline at end of file +- [Streaming](streaming.md) diff --git a/docs/user-guide/expressions/index.md b/docs/user-guide/expressions/index.md index c7e06cbe1863..3724e09ce15e 100644 --- a/docs/user-guide/expressions/index.md +++ b/docs/user-guide/expressions/index.md @@ -15,4 +15,4 @@ In the `Contexts` sections we outlined what `Expressions` are and how they are i - [Plugins](plugins.md) - [User-defined functions](user-defined-functions.md) - [Structs](structs.md) -- [Numpy](numpy.md) \ No newline at end of file +- [Numpy](numpy.md) diff --git a/docs/user-guide/getting-started.md b/docs/user-guide/getting-started.md index 0d5cee89fccd..07fa84e3a479 100644 --- a/docs/user-guide/getting-started.md +++ b/docs/user-guide/getting-started.md @@ -40,7 +40,7 @@ In this example we write the DataFrame to `output.csv`. After that we can read i --8<-- "python/user-guide/basics/reading-writing.py:csv" ``` -For more examples on the CSV file format and other data formats, start here [IO section on CSV](io/csv.md) of the User Guide. 
+For more examples on the CSV file format and other data formats, start here [IO section on CSV](io/csv.md) of the User Guide. ## Expressions @@ -55,10 +55,10 @@ To learn more about expressions and the context in which they operate, see the U ### Select statement -To select a column we need to do two things: +To select a column we need to do two things: -1. Define the `DataFrame` we want the data from. -2. Select the data that we need. +1. Define the `DataFrame` we want the data from. +2. Select the data that we need. In the example below you see that we select `col('*')`. The asterisk stands for all columns. diff --git a/docs/user-guide/io/index.md b/docs/user-guide/io/index.md index 5b0f35ff676f..5a3548871e8a 100644 --- a/docs/user-guide/io/index.md +++ b/docs/user-guide/io/index.md @@ -9,4 +9,4 @@ Reading and writing your data is crucial for a DataFrame library. In this chapte - [Multiple](multiple.md) - [Database](database.md) - [Cloud storage](cloud-storage.md) -- [Google Big Query](bigquery.md) \ No newline at end of file +- [Google Big Query](bigquery.md) diff --git a/docs/user-guide/lazy/index.md b/docs/user-guide/lazy/index.md index 8efc3b0fbb50..be731390f09c 100644 --- a/docs/user-guide/lazy/index.md +++ b/docs/user-guide/lazy/index.md @@ -7,4 +7,4 @@ The Lazy chapter is a guide for working with `LazyFrames`. 
It covers the functio - [Schemas](schemas.md) - [Query plan](query-plan.md) - [Execution](execution.md) -- [Streaming](streaming.md) \ No newline at end of file +- [Streaming](streaming.md) diff --git a/docs/user-guide/transformations/index.md b/docs/user-guide/transformations/index.md index eaf5cb9ca5f6..cd673786643c 100644 --- a/docs/user-guide/transformations/index.md +++ b/docs/user-guide/transformations/index.md @@ -5,4 +5,4 @@ The focus of this section is to describe different types of data transformations - [Joins](joins.md) - [Concatenation](concatenation.md) - [Pivot](pivot.md) -- [Melt](melt.md) \ No newline at end of file +- [Melt](melt.md) From d656bacb212fec972fe8600957bd07cca87539e3 Mon Sep 17 00:00:00 2001 From: r-brink Date: Tue, 23 Jan 2024 10:47:01 +0100 Subject: [PATCH 11/15] resolved feedback and formatting --- .../python/user-guide/basics/reading-writing.py | 2 +- docs/user-guide/getting-started.md | 16 +++++++--------- 2 files changed, 8 insertions(+), 10 deletions(-) diff --git a/docs/src/python/user-guide/basics/reading-writing.py b/docs/src/python/user-guide/basics/reading-writing.py index f01fbba3fb30..68c0ab235fd1 100644 --- a/docs/src/python/user-guide/basics/reading-writing.py +++ b/docs/src/python/user-guide/basics/reading-writing.py @@ -11,7 +11,7 @@ datetime(2025, 1, 3), ], "float": [4.0, 5.0, 6.0], - "string": ["a", "b", "c"] + "string": ["a", "b", "c"], } ) diff --git a/docs/user-guide/getting-started.md b/docs/user-guide/getting-started.md index 07fa84e3a479..3ae743114cf8 100644 --- a/docs/user-guide/getting-started.md +++ b/docs/user-guide/getting-started.md @@ -22,7 +22,7 @@ This chapter is here to help you get started with Polars. It covers all the fund ## Reading & writing -Polars supports reading and writing to all common files (e.g. csv, json, parquet), cloud storage (S3, Azure Blob, BigQuery) and databases (e.g. postgres, mysql). Below we use csv as example to demonstrate foundational read/write operations. 
+Polars supports reading and writing for common file formats (e.g. csv, json, parquet), cloud storage (S3, Azure Blob, BigQuery) and databases (e.g. postgres, mysql). Below we show the concept of reading and writing to disk. {{code_block('user-guide/basics/reading-writing','dataframe',['DataFrame'])}} @@ -30,9 +30,7 @@ Polars supports reading and writing to all common files (e.g. csv, json, parquet --8<-- "python/user-guide/basics/reading-writing.py:dataframe" ``` -### CSV example - -In this example we write the DataFrame to `output.csv`. After that we can read it back with `read_csv` and `print` the result for inspection. +In the example below we write the DataFrame to a csv file called `output.csv`. After that we read it back with `read_csv` and `print` the result for inspection. {{code_block('user-guide/basics/reading-writing','csv',['read_csv','write_csv'])}} @@ -40,11 +38,11 @@ In this example we write the DataFrame to `output.csv`. After that we can read i --8<-- "python/user-guide/basics/reading-writing.py:csv" ``` -For more examples on the CSV file format and other data formats, start here [IO section on CSV](io/csv.md) of the User Guide. +For more examples on the CSV file format and other data formats, start with the [IO section](io/index.md) of the User Guide. ## Expressions -`Expressions` are the core strength of Polars. The `expressions` offer a versatile structure that both solves easy queries and is easily extended to complex ones. Below we cover the basic components that serve as building block (or in Polars terminology contexts) for all your queries: +`Expressions` are the core strength of Polars. The `expressions` offer a modular structure that allows you to combine simple concepts into complex queries.
Below we cover the basic components that serve as building block (or in Polars terminology contexts) for all your queries: - `select` - `filter` @@ -53,7 +51,7 @@ For more examples on the CSV file format and other data formats, start here [IO To learn more about expressions and the context in which they operate, see the User Guide sections: [Contexts](concepts/contexts.md) and [Expressions](concepts/expressions.md). -### Select statement +### Select To select a column we need to do two things: @@ -105,7 +103,7 @@ print( ) ``` -### With_columns +### Add columns `with_columns` allows you to create new columns for your analyses. We create two new columns `e` and `b+42`. First we sum all values from column `b` and store the results in column `e`. After that we add `42` to the values of `b`. Creating a new column `b+42` to store these results. @@ -144,7 +142,7 @@ print( ) ``` -### Combining operations +### Combination Below are some examples on how to combine operations to create the `DataFrame` you require. From 733be36f3fb14dd31b8a065e1d31d6d9f8f83e37 Mon Sep 17 00:00:00 2001 From: chielP Date: Tue, 23 Jan 2024 14:43:03 +0100 Subject: [PATCH 12/15] nested list --- docs/user-guide/concepts/index.md | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/docs/user-guide/concepts/index.md b/docs/user-guide/concepts/index.md index e3eac9b2c70f..63a2ebeabe44 100644 --- a/docs/user-guide/concepts/index.md +++ b/docs/user-guide/concepts/index.md @@ -2,9 +2,8 @@ The `Concepts` chapter describes the core concepts of the Polars API. Understanding these will help you optimise your queries on a daily basis. 
We will cover the following topics: -- Data types: - - [Overview](data-types/overview.md) - - [Categoricals](data-types/categoricals.md) +- [Data Types: Overview](data-types/overview.md) +- [Data Types: Categoricals](data-types/categoricals.md) - [Data structures](data-structures.md) - [Contexts](contexts.md) - [Expressions](expressions.md) From df2d4cf3158e5906a655a83a228afc81a67750ff Mon Sep 17 00:00:00 2001 From: r-brink Date: Wed, 24 Jan 2024 11:49:46 +0100 Subject: [PATCH 13/15] rename overview to index and make new homepage --- docs/{user-guide/overview.md => index.md} | 2 -- mkdocs.yml | 7 +++---- 2 files changed, 3 insertions(+), 6 deletions(-) rename docs/{user-guide/overview.md => index.md} (99%) diff --git a/docs/user-guide/overview.md b/docs/index.md similarity index 99% rename from docs/user-guide/overview.md rename to docs/index.md index f76eb0e1a1d3..7217c434ea6f 100644 --- a/docs/user-guide/overview.md +++ b/docs/index.md @@ -1,5 +1,3 @@ -# Overview - ![logo](https://raw.githubusercontent.com/pola-rs/polars-static/master/logos/polars_github_logo_rect_dark_name.svg)

Blazingly Fast DataFrame Library

diff --git a/mkdocs.yml b/mkdocs.yml index e5d6c26769b6..ed960fd9e709 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -1,15 +1,15 @@ # https://www.mkdocs.org/user-guide/configuration/ # Project information -site_name: Polars +site_name: Polars User Guide site_url: https://docs.pola.rs/ repo_url: https://github.com/pola-rs/polars repo_name: pola-rs/polars # Documentation layout nav: - - User guide: - - user-guide/overview.md + - User Guide: + - index.md - user-guide/getting-started.md - user-guide/installation.md - Concepts: @@ -144,7 +144,6 @@ extra: analytics: provider: plausible domain: guide.pola.rs,combined.pola.rs - homepage: /user-guide/overview/ # Preview controls strict: true From a0c2265279a0a6d0165bb817bb5fa80e65189967 Mon Sep 17 00:00:00 2001 From: r-brink Date: Wed, 24 Jan 2024 13:47:48 +0100 Subject: [PATCH 14/15] fixed deadlinks --- docs/index.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/index.md b/docs/index.md index 7217c434ea6f..16ec4a31e4ae 100644 --- a/docs/index.md +++ b/docs/index.md @@ -50,7 +50,7 @@ Polars is written in Rust which gives it C/C++ performance and allows it to full {{code_block('home/example','example',['scan_csv','filter','group_by','collect'])}} -A more extensive introduction can be found in the [next chapter](getting-started.md). +A more extensive introduction can be found in the [next chapter](user-guide/getting-started.md). ## Community @@ -60,7 +60,7 @@ Polars has a very active community with frequent releases (approximately weekly) ## Contributing -We appreciate all contributions, from reporting bugs to implementing new features. Read our [contributing guide](../development/contributing/index.md) to learn more. +We appreciate all contributions, from reporting bugs to implementing new features. Read our [contributing guide](development/contributing/index.md) to learn more. 
## License From 5148a7fb2d4b4585b3b008414bd04856014541c2 Mon Sep 17 00:00:00 2001 From: chielP Date: Wed, 24 Jan 2024 14:43:27 +0100 Subject: [PATCH 15/15] redirect on old links --- docs/_build/overrides/404.html | 4 ++-- docs/requirements.txt | 1 + mkdocs.yml | 7 +++++++ 3 files changed, 10 insertions(+), 2 deletions(-) diff --git a/docs/_build/overrides/404.html b/docs/_build/overrides/404.html index a216b32dfc5f..986222099a22 100644 --- a/docs/_build/overrides/404.html +++ b/docs/_build/overrides/404.html @@ -1,4 +1,4 @@ -{% extends "main.html" %} +{% extends "base.html" %} {% block content %}
404 - You're lost. How you got here is a mystery. But you can click the button below to go back to the homepage or use the search bar in the navigation menu to find what you are looking for.

- Home + Home
{% endblock %} diff --git a/docs/requirements.txt b/docs/requirements.txt index e0416d67440b..d0a5a5d8193f 100644 --- a/docs/requirements.txt +++ b/docs/requirements.txt @@ -5,6 +5,7 @@ matplotlib mkdocs-material==9.5.2 mkdocs-macros-plugin==1.0.4 +mkdocs-redirects==1.2.1 material-plausible-plugin==0.2.0 markdown-exec[ansi]==1.7.0 PyGithub==2.1.1 diff --git a/mkdocs.yml b/mkdocs.yml index ed960fd9e709..c26fdd20902e 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -175,3 +175,10 @@ plugins: - material-plausible - macros: module_name: docs/_build/scripts/macro + - redirects: + redirect_maps: + 'user-guide/index.md': 'index.md' + 'user-guide/basics/index.md': 'user-guide/getting-started.md' + 'user-guide/basics/reading-writing.md': 'user-guide/getting-started.md' + 'user-guide/basics/expressions.md': 'user-guide/getting-started.md' + 'user-guide/basics/joins.md': 'user-guide/getting-started.md' \ No newline at end of file