From 532fdef1b095f3e265aa8508e8185fd237e42ba3 Mon Sep 17 00:00:00 2001 From: Andreas Motl Date: Tue, 12 Mar 2024 22:32:24 +0100 Subject: [PATCH 1/4] Tutorials: Improve guidance within article header sections - Introductory explainers / What's inside To give a better overview about the actual features used within the tutorial. - Guidance and layout In order not to use too much vertical space for the header information, use a two-column micro layout where applicable. For a better visual appearance where enumerating the tables' columns in detail, use field lists [1,2]. [1] https://www.sphinx-doc.org/en/master/usage/restructuredtext/basics.html#field-lists [2] https://myst-parser.readthedocs.io/en/latest/syntax/typography.html#field-lists --- docs/index.md | 2 +- docs/tutorials/full-text.md | 38 +++++++------ docs/tutorials/object.md | 18 +++--- docs/tutorials/time-series-advanced.md | 79 +++++++++++++++++++++----- docs/tutorials/time-series.md | 42 ++++++++++++-- 5 files changed, 131 insertions(+), 48 deletions(-) diff --git a/docs/index.md b/docs/index.md index d832558..13bc407 100644 --- a/docs/index.md +++ b/docs/index.md @@ -31,7 +31,7 @@ Learn how to sign up and get started with a free cluster. :link: tutorials :link-type: ref -Learn how to use some of the key features of CrateDB. +Learn how to use key features of CrateDB. ::: diff --git a/docs/tutorials/full-text.md b/docs/tutorials/full-text.md index 113362f..2815577 100644 --- a/docs/tutorials/full-text.md +++ b/docs/tutorials/full-text.md @@ -4,33 +4,37 @@ CrateDB is an exceptional choice for handling complex queries and large-scale data sets. One of its standout features is its full-text search capabilities, -built on top of the powerful Lucene library. This makes it a great fit for +using the BM25 ranking algorithm for information retrieval, built on top of +the powerful Lucene indexing library. This makes CrateDB an excellent fit for organizing, searching, and analyzing extensive datasets. In this tutorial, we will explore how to manage a dataset of Netflix titles, making use of CrateDB Cloud's full-text search capabilities. Each entry in our imaginary dataset will have the following attributes: -- `show_id`: A unique identifier for each show or movie. -- `type`: Specifies whether the title is a movie, TV show, or another format. -- `title`: The title of the movie or show. -- `director`: The name of the director. -- `cast`: An array listing the cast members. -- `country`: The country where the title was produced. -- `date_added`: A timestamp indicating when the title was added to the catalog. -- `release_year`: The year the title was released. -- `rating`: The content rating (e.g., PG, R, etc.). -- `duration`: The duration of the title in minutes or seasons. -- `listed_in`: An array containing genres that the title falls under. -- `description`: A textual description of the title, indexed using full-text search. - -To begin, let's create the schema for this dataset: +:show_id: A unique identifier for each show or movie. +:type: Specifies whether the title is a movie, TV show, or another format. +:title: The title of the movie or show. +:director: The name of the director. +:cast: An array listing the cast members. +:country: The country where the title was produced. +:date_added: A timestamp indicating when the title was added to the catalog. +:release_year: The year the title was released. +:rating: The content rating (e.g., PG, R, etc.). +:duration: The duration of the title in minutes or seasons. +:listed_in: An array containing genres that the title falls under. +:description: A textual description of the title, indexed using full-text search. + +To begin, let's create the schema for this dataset. + ## Creating the Table -CrateDB uses SQL, a powerful and familiar language for database management. To +CrateDB uses SQL, the most popular query language for database management. To store the data, create a table with columns tailored to the -dataset using the `CREATE TABLE` command. Importantly, you will also take advantage +dataset using the `CREATE TABLE` command. + +Importantly, you will also take advantage of CrateDB's full-text search capabilities by setting up a full-text index on the description column. This will enable you to perform complex textual queries later on. diff --git a/docs/tutorials/object.md b/docs/tutorials/object.md index ec25bff..9def81f 100644 --- a/docs/tutorials/object.md +++ b/docs/tutorials/object.md @@ -8,7 +8,7 @@ nested data efficiently. In this tutorial, we'll explore how to leverage this feature in marketing data analysis, along with the use of generated columns to parse and manage URLs. -Consider marketing data that captures details of various campaigns: +Consider marketing data that captures details of various campaigns. :::{code} json { @@ -23,11 +23,11 @@ Consider marketing data that captures details of various campaigns: } ::: -To begin, let's create the schema for this dataset: +To begin, let's create the schema for this dataset. ## Creating the Table -CrateDB uses SQL, a powerful and familiar language for database management. To +CrateDB uses SQL, the most popular query language for database management. To store the marketing data, create a table with columns tailored to the dataset using the `CREATE TABLE` command: @@ -45,14 +45,14 @@ CREATE TABLE marketing_data ( ); ::: -In this table definition: +Let's highlight two features in this table definition: -- The `metrics` column is set up as an `OBJECT` featuring a dynamic structure. - This enables you to perform flexible queries on its nested attributes like +:metrics: An `OBJECT` column featuring a dynamic structure for + performing flexible queries on its nested attributes like clicks, impressions, and conversion rate. -- Additionally, a generated column named `url_parts` is configured to - automatically parse the `landing_page_url`. This makes it more convenient for - you to query specific components of the URL later on. +:url_parts: A generated column to + decode an URL from the `landing_page_url` column. This is convenient + to query for specific components of the URL later on. The table is designed to accommodate both fixed and dynamic attributes, providing a robust and flexible structure for storing your marketing data. diff --git a/docs/tutorials/time-series-advanced.md b/docs/tutorials/time-series-advanced.md index bdf60de..42b82b4 100644 --- a/docs/tutorials/time-series-advanced.md +++ b/docs/tutorials/time-series-advanced.md @@ -2,27 +2,76 @@ # Analyzing Device Readings with Metadata Integration -CrateDB is highly regarded as an optimal database solution for managing time-series data thanks to its unique blend of features. It is particularly effective when you need to combine time-series data with metadata, for instance, in scenarios where data like sensor readings or log entries, need to be augmented with additional context for more insightful analysis. CrateDB supports effective time-series analysis with fast aggregations, a rich set of built-in functions, and `JOIN` operations. +CrateDB is highly regarded as an optimal database solution for managing +time-series data thanks to its unique blend of features. It is particularly +effective when you need to combine time-series data with metadata, for +instance, in scenarios where data like sensor readings or log entries, need +to be augmented with additional context for more insightful analysis. -In this tutorial, we will illustrate how to augment time-series data with the metadata to enable more comprehensive analysis. To get started let’s use a time-series dataset that captures various device readings, such as battery, CPU, and memory information. Each record includes: -- `ts` - timestamp when each reading was taken. -- `device_id` - identifier of the device. -- `battery` - object containing battery level, status, and temperature. -- `cpu` - object containing average CPU loads over the last 1, 5, and 15 minutes. -- `memory` - object containing information about the device's free and used memory. +:::::{grid} +:padding: 0 -The second dataset in this tutorial contains metadata information about various devices. Each record includes: +::::{grid-item} +:class: rubric-slimmer +:columns: auto 6 6 6 -- `device_id` - identifier of the device. -- `api_version` - version of the API that the device supports. -- `manufacturer` - name of the manufacturer of the device. -- `model` - model name of the device. -- `os_name` - the name of the operating system running on the device. +:::{rubric} About +::: + +CrateDB supports effective time-series analysis with enhanced features +for fast aggregations. + +- Rich data types for storing structured nested data (OBJECT) alongside + time series data. +- A rich set of built-in functions for aggregations. +- Relational JOIN operations. +- Common table expressions (CTEs). + +:::: + +::::{grid-item} +:class: rubric-slimmer +:columns: auto 6 6 6 + +:::{rubric} Data +::: +This tutorial illustrates how to effectively query time-series data with +metadata, in order to conduct comprehensive data analysis. + +It uses a time-series dataset that includes telemetry readings from appliances, +such as battery, CPU, and memory information, as well as metadata information +like manufacturer, model, and firmware version. +:::: + +::::: + + +## Creating the Tables + +CrateDB uses SQL, the most popular query language for database management. To +store the device readings and the device info data, define two tables with +columns tailored to the datasets. + +To get started, let’s use a time-series dataset that captures various device +readings, such as battery, CPU, and memory information. Each record includes: + +:ts: Timestamp when each reading was taken. +:device_id: Identifier of the device. +:battery: Object containing battery level, status, and temperature. +:cpu: Object containing average CPU loads over the last 1, 5, and 15 minutes. +:memory: Object containing information about the device's free and used memory. + +The second dataset in this tutorial contains metadata information about various +devices. Each record includes: -## Creating the Table +:device_id: Identifier of the device. +:api_version: Version of the API that the device supports. +:manufacturer: Name of the manufacturer of the device. +:model: Model name of the device. +:os_name: Name of the operating system running on the device. -CrateDB uses SQL, a powerful and familiar language for database management. To store the device readings and the device info data, create two tables with columns tailored to the datasets using the `CREATE TABLE` command: +Create the tables using the `CREATE TABLE` command: :::{code} sql CREATE TABLE IF NOT EXISTS doc.devices_readings ( diff --git a/docs/tutorials/time-series.md b/docs/tutorials/time-series.md index b7a94a4..f9538a9 100644 --- a/docs/tutorials/time-series.md +++ b/docs/tutorials/time-series.md @@ -7,18 +7,48 @@ which is managing time series data. Time series data refers to collections of data points recorded at specific intervals over time, like the hourly temperature of a city or the daily sales of a store. +:::::{grid} +:padding: 0 + +::::{grid-item} +:class: rubric-slimmer +:columns: auto 6 6 6 + +:::{rubric} About +::: + +Effectively query observations using enhanced features for time series data. + +Run aggregations with gap filling / interpolation, using common +table expressions (CTEs) and LAG / LEAD window functions. + +Find maximum values using the MAX_BY aggregate function, returning +the value from one column based on the maximum or minimum value of another +column within a group. +:::: + +::::{grid-item} +:class: rubric-slimmer +:columns: auto 6 6 6 + +:::{rubric} Data +::: For this tutorial, imagine a dataset that captures weather readings from CrateDB offices across the globe. Each record includes: -- `timestamp`: The exact time of the recording. -- `location`: The location of the weather station. -- `temperature`: The temperature in degrees Celsius. -- `humidity`: The humidity in percentage. -- `wind_speed`: The wind speed in km/h. +:timestamp: The exact time of the recording. +:location: The location of the weather station. +:temperature: The temperature in degrees Celsius. +:humidity: The humidity in percentage. +:wind_speed: The wind speed in km/h. +:::: + +::::: + ## Creating the Table -CrateDB uses SQL, a powerful and familiar language for database management. To +CrateDB uses SQL, the most popular query language for database management. To store the weather data, create a table with columns tailored to the dataset using the `CREATE TABLE` command: From 962f28b531c8313c9533c2229680e17ab422df37 Mon Sep 17 00:00:00 2001 From: Andreas Motl Date: Tue, 12 Mar 2024 22:35:29 +0100 Subject: [PATCH 2/4] Tutorials: Wording update: s/time-series/time series/ --- docs/tutorials/index.md | 16 ++++++++-------- docs/tutorials/time-series-advanced.md | 18 +++++++++--------- docs/tutorials/time-series.md | 2 +- 3 files changed, 18 insertions(+), 18 deletions(-) diff --git a/docs/tutorials/index.md b/docs/tutorials/index.md index def2364..76ad983 100644 --- a/docs/tutorials/index.md +++ b/docs/tutorials/index.md @@ -11,11 +11,11 @@ efficient practices to optimize your CrateDB experience. :margin: 4 4 0 0 :gutter: 1 -:::{grid-item-card} {octicon}`clock` Time-Series +:::{grid-item-card} {octicon}`clock` Time Series :link: time-series :link-type: ref -Dive into the world of time-series data with CrateDB. This tutorial will guide -you through the best ways to store, query, and analyze time-series data. +Dive into the world of time series data with CrateDB. This tutorial will guide +you through the best ways to store, query, and analyze time series data. It is perfect for those working with IoT devices, monitoring systems, or any application where time-oriented data is crucial. @@ -46,13 +46,13 @@ efficiently. A must-read for anyone looking to make sense of large volumes of unstructured text data. ::: -:::{grid-item-card} {octicon}`clock` Advanced Time-Series +:::{grid-item-card} {octicon}`clock` Advanced Time Series :link: time-series-advanced :link-type: ref -This tutorial demonstrates how to augment time-series data with the metadata to enable more comprehensive analysis. +This tutorial demonstrates how to augment time series data with the metadata to enable more comprehensive analysis. The techniques and queries allow for unlocking deeper insights and harnessing the -full potential of time-series data in real-world applications. +full potential of time series data in real-world applications. ::: :::: @@ -64,8 +64,8 @@ most relevant to your use case. We wish you a happy learning experience. :hidden: :maxdepth: 1 -Time-Series +Time Series Objects Full-Text Search -Advanced Time-Series +Advanced Time Series ::: diff --git a/docs/tutorials/time-series-advanced.md b/docs/tutorials/time-series-advanced.md index 42b82b4..9609db2 100644 --- a/docs/tutorials/time-series-advanced.md +++ b/docs/tutorials/time-series-advanced.md @@ -3,8 +3,8 @@ # Analyzing Device Readings with Metadata Integration CrateDB is highly regarded as an optimal database solution for managing -time-series data thanks to its unique blend of features. It is particularly -effective when you need to combine time-series data with metadata, for +time series data thanks to its unique blend of features. It is particularly +effective when you need to combine time series data with metadata, for instance, in scenarios where data like sensor readings or log entries, need to be augmented with additional context for more insightful analysis. @@ -19,7 +19,7 @@ to be augmented with additional context for more insightful analysis. :::{rubric} About ::: -CrateDB supports effective time-series analysis with enhanced features +CrateDB supports effective time series analysis with enhanced features for fast aggregations. - Rich data types for storing structured nested data (OBJECT) alongside @@ -36,10 +36,10 @@ for fast aggregations. :::{rubric} Data ::: -This tutorial illustrates how to effectively query time-series data with +This tutorial illustrates how to effectively query time series data with metadata, in order to conduct comprehensive data analysis. -It uses a time-series dataset that includes telemetry readings from appliances, +It uses a time series dataset that includes telemetry readings from appliances, such as battery, CPU, and memory information, as well as metadata information like manufacturer, model, and firmware version. :::: @@ -53,7 +53,7 @@ CrateDB uses SQL, the most popular query language for database management. To store the device readings and the device info data, define two tables with columns tailored to the datasets. -To get started, let’s use a time-series dataset that captures various device +To get started, let’s use a time series dataset that captures various device readings, such as battery, CPU, and memory information. Each record includes: :ts: Timestamp when each reading was taken. @@ -124,7 +124,7 @@ WITH (compression='gzip', empty_string_as_null=true) RETURN SUMMARY; ::: -## Time-series Analysis with Metadata +## Time Series Analysis with Metadata To illustrate `JOIN` operation, the first query retrieves the 30 rows of combined data from two tables, `devices.readings` and `devices.info`, based on a matching `device_id` in both. It effectively merges the detailed readings and corresponding device information, providing a comprehensive view of each device's status and metrics. @@ -144,7 +144,7 @@ GROUP BY "day" ORDER BY "day"; ::: -Rolling averages are crucial in time-series analysis because they help smooth out short-term fluctuations and reveal underlying trends by averaging data points over a specified period. This approach is particularly effective in mitigating the impact of outliers and noise in the data, allowing for a clearer understanding of the true patterns in the time series. +Rolling averages are crucial in time series analysis because they help smooth out short-term fluctuations and reveal underlying trends by averaging data points over a specified period. This approach is particularly effective in mitigating the impact of outliers and noise in the data, allowing for a clearer understanding of the true patterns in the time series. The following example illustrates the average (`AVG`), minimum (`MIN`), and maximum (`MAX`) battery temperature over a window of the last 100 temperature readings (`ROWS BETWEEN 100 PRECEDING AND CURRENT ROW`). The window is defined in descending order by timestamp (`ts`) and can be adapted to support different use cases. @@ -239,4 +239,4 @@ ORDER BY model_avg_battery_level DESC; ::: -In conclusion, this tutorial has guided you through the process of querying and analyzing time-series data with CrateDB, demonstrating how to effectively merge device metrics with relevant metadata. These techniques and queries are important for unlocking deeper insights into device performance, equipping you with the skills needed to harness the full potential of time-series data in real-world applications. +In conclusion, this tutorial has guided you through the process of querying and analyzing time series data with CrateDB, demonstrating how to effectively merge device metrics with relevant metadata. These techniques and queries are important for unlocking deeper insights into device performance, equipping you with the skills needed to harness the full potential of time series data in real-world applications. diff --git a/docs/tutorials/time-series.md b/docs/tutorials/time-series.md index f9538a9..37f2d79 100644 --- a/docs/tutorials/time-series.md +++ b/docs/tutorials/time-series.md @@ -1,6 +1,6 @@ (time-series)= -# Time-Series: Analyzing Weather Data +# Time Series: Analyzing Weather Data CrateDB is a powerful database designed to handle various use cases, one of which is managing time series data. Time series data refers to collections of From c752b75b386c8b176152cd9dd1ebdb1aeb3a57a4 Mon Sep 17 00:00:00 2001 From: Andreas Motl Date: Thu, 14 Mar 2024 22:08:21 +0100 Subject: [PATCH 3/4] Chore: Ignore linkcheck errors on https://azure.microsoft.com HTTPSConnectionPool(host='azure.microsoft.com', port=443): Read timed out. (read timeout=5) --- docs/conf.py | 1 + 1 file changed, 1 insertion(+) diff --git a/docs/conf.py b/docs/conf.py index 78e7ed3..57d1e4a 100644 --- a/docs/conf.py +++ b/docs/conf.py @@ -12,6 +12,7 @@ "https://eastus2.azure.cratedb.cloud/", "https://portal.azure.com/", "https://azuremarketplace.microsoft.com/", + "https://azure.microsoft.com/", ] linkcheck_timeout = 5 From 18f1cd68eeb41f477eb679d1a6cc9501044d9c06 Mon Sep 17 00:00:00 2001 From: Andreas Motl Date: Thu, 14 Mar 2024 22:09:17 +0100 Subject: [PATCH 4/4] Chore: Ignore linkcheck errors on https://hub.docker.com/ HTTPSConnectionPool(host='hub.docker.com', port=443): Read timed out. (read timeout=5) --- docs/conf.py | 1 + 1 file changed, 1 insertion(+) diff --git a/docs/conf.py b/docs/conf.py index 57d1e4a..df5d087 100644 --- a/docs/conf.py +++ b/docs/conf.py @@ -13,6 +13,7 @@ "https://portal.azure.com/", "https://azuremarketplace.microsoft.com/", "https://azure.microsoft.com/", + "https://hub.docker.com/", ] linkcheck_timeout = 5