Tutorials: Improve guidance within article header sections #64

Merged
merged 4 commits into from
Mar 14, 2024
2 changes: 2 additions & 0 deletions docs/conf.py
@@ -12,6 +12,8 @@
"https://eastus2.azure.cratedb.cloud/",
"https://portal.azure.com/",
"https://azuremarketplace.microsoft.com/",
"https://azure.microsoft.com/",
"https://hub.docker.com/",
]

linkcheck_timeout = 5
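The URL list in this hunk appears to be Sphinx's `linkcheck_ignore` setting, but the variable name sits outside the visible lines, so treat that as an assumption. As far as we know, Sphinx compiles each entry as a regular expression and skips any link for which `re.match` succeeds (a match anchored at the start of the URI), while `linkcheck_timeout = 5` limits each link check to five seconds. The minimal sketch below mimics that matching outside Sphinx; `LINKCHECK_IGNORE` and `is_ignored` are hypothetical names, not Sphinx APIs.

```python
import re

# Hypothetical copy of the ignore list shown in the hunk above
# (assumed to be the value of Sphinx's `linkcheck_ignore`).
LINKCHECK_IGNORE = [
    "https://eastus2.azure.cratedb.cloud/",
    "https://portal.azure.com/",
    "https://azuremarketplace.microsoft.com/",
    "https://azure.microsoft.com/",
    "https://hub.docker.com/",
]

# Each entry is treated as a regular expression, not as a literal string.
IGNORE_PATTERNS = [re.compile(entry) for entry in LINKCHECK_IGNORE]


def is_ignored(uri: str) -> bool:
    """Return True if any pattern matches at the start of `uri` (re.match)."""
    return any(pattern.match(uri) for pattern in IGNORE_PATTERNS)


print(is_ignored("https://hub.docker.com/r/crate/crate"))  # True
print(is_ignored("https://cratedb.com/docs/"))  # False
```

Because the dots are unescaped, each `.` matches any character. That is harmless for these prefixes but worth keeping in mind before adding broader patterns.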
2 changes: 1 addition & 1 deletion docs/index.md
@@ -31,7 +31,7 @@ Learn how to sign up and get started with a free cluster.
:link: tutorials
:link-type: ref

Learn how to use some of the key features of CrateDB.
Learn how to use key features of CrateDB.
:::


38 changes: 21 additions & 17 deletions docs/tutorials/full-text.md
@@ -4,33 +4,37 @@

CrateDB is an exceptional choice for handling complex queries and large-scale
data sets. One of its standout features is its full-text search capabilities,
built on top of the powerful Lucene library. This makes it a great fit for
using the BM25 ranking algorithm for information retrieval, built on top of
the powerful Lucene indexing library. This makes CrateDB an excellent fit for
organizing, searching, and analyzing extensive datasets.

In this tutorial, we will explore how to manage a dataset of Netflix titles,
making use of CrateDB Cloud's full-text search capabilities.
Each entry in our imaginary dataset will have the following attributes:

- `show_id`: A unique identifier for each show or movie.
- `type`: Specifies whether the title is a movie, TV show, or another format.
- `title`: The title of the movie or show.
- `director`: The name of the director.
- `cast`: An array listing the cast members.
- `country`: The country where the title was produced.
- `date_added`: A timestamp indicating when the title was added to the catalog.
- `release_year`: The year the title was released.
- `rating`: The content rating (e.g., PG, R, etc.).
- `duration`: The duration of the title in minutes or seasons.
- `listed_in`: An array containing genres that the title falls under.
- `description`: A textual description of the title, indexed using full-text search.

To begin, let's create the schema for this dataset:
:show_id: A unique identifier for each show or movie.
:type: Specifies whether the title is a movie, TV show, or another format.
:title: The title of the movie or show.
:director: The name of the director.
:cast: An array listing the cast members.
:country: The country where the title was produced.
:date_added: A timestamp indicating when the title was added to the catalog.
:release_year: The year the title was released.
:rating: The content rating (e.g., PG, R, etc.).
:duration: The duration of the title in minutes or seasons.
:listed_in: An array containing genres that the title falls under.
:description: A textual description of the title, indexed using full-text search.

To begin, let's create the schema for this dataset.


## Creating the Table

CrateDB uses SQL, a powerful and familiar language for database management. To
CrateDB uses SQL, the most popular query language for database management. To
store the data, create a table with columns tailored to the
dataset using the `CREATE TABLE` command. Importantly, you will also take advantage
dataset using the `CREATE TABLE` command.

Importantly, you will also take advantage
of CrateDB's full-text search capabilities by setting up a full-text index on
the description column. This will enable you to perform complex textual queries
later on.
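The text above says that CrateDB's full-text search ranks results with BM25, built on Lucene. To show what that ranking rewards (rarer query terms, repeated occurrences with diminishing returns, shorter documents), here is a minimal sketch of the classic Okapi BM25 formula in plain Python. It assumes naive lowercase whitespace tokenization and the commonly used parameters `k1 = 1.2` and `b = 0.75`. Lucene's analyzers and scoring details differ, so these numbers will not match the scores CrateDB reports, and the sample descriptions are made up.

```python
import math

K1 = 1.2   # term-frequency saturation
B = 0.75   # strength of document-length normalization


def tokenize(text: str) -> list[str]:
    """Naive analyzer (an assumption): lowercase and split on whitespace."""
    return text.lower().split()


def bm25_scores(query: str, docs: list[str]) -> list[float]:
    """Score every document in `docs` against `query` with Okapi BM25."""
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized) / n_docs
    scores = [0.0] * n_docs
    for term in set(tokenize(query)):
        # Document frequency: how many documents contain the term at all.
        df = sum(1 for toks in tokenized if term in toks)
        if df == 0:
            continue
        # Rare terms get a higher inverse document frequency.
        idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
        for i, toks in enumerate(tokenized):
            tf = toks.count(term)
            if tf == 0:
                continue
            # Longer-than-average documents are penalized via length_norm.
            length_norm = 1 - B + B * len(toks) / avg_len
            scores[i] += idf * tf * (K1 + 1) / (tf + K1 * length_norm)
    return scores


descriptions = [
    "a heist thriller set in madrid",
    "a documentary about the ocean",
    "a thriller about a thriller writer",
]
print(bm25_scores("thriller", descriptions))
```

The third description scores highest because it mentions the term twice at the same length as the first, and the documentary scores zero.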
16 changes: 8 additions & 8 deletions docs/tutorials/index.md
@@ -11,11 +11,11 @@ efficient practices to optimize your CrateDB experience.
:margin: 4 4 0 0
:gutter: 1

:::{grid-item-card} {octicon}`clock` Time-Series
:::{grid-item-card} {octicon}`clock` Time Series
:link: time-series
:link-type: ref
Dive into the world of time-series data with CrateDB. This tutorial will guide
you through the best ways to store, query, and analyze time-series data.
Dive into the world of time series data with CrateDB. This tutorial will guide
you through the best ways to store, query, and analyze time series data.

It is perfect for those working with IoT devices, monitoring systems, or any
application where time-oriented data is crucial.
@@ -46,13 +46,13 @@ efficiently. A must-read for anyone looking to make sense of large volumes of
unstructured text data.
:::

:::{grid-item-card} {octicon}`clock` Advanced Time-Series
:::{grid-item-card} {octicon}`clock` Advanced Time Series
:link: time-series-advanced
:link-type: ref
This tutorial demonstrates how to augment time-series data with the metadata to enable more comprehensive analysis.
This tutorial demonstrates how to augment time series data with metadata to enable more comprehensive analysis.

The techniques and queries allow for unlocking deeper insights and harnessing the
full potential of time-series data in real-world applications.
full potential of time series data in real-world applications.
:::

::::
@@ -64,8 +64,8 @@ most relevant to your use case. We wish you a happy learning experience.
:hidden:
:maxdepth: 1

Time-Series <time-series>
Time Series <time-series>
Objects<object>
Full-Text Search<full-text>
Advanced Time-Series <time-series-advanced>
Advanced Time Series <time-series-advanced>
:::
18 changes: 9 additions & 9 deletions docs/tutorials/object.md
@@ -8,7 +8,7 @@ nested data efficiently. In this tutorial, we'll explore how to leverage this
feature in marketing data analysis, along with the use of generated columns to
parse and manage URLs.

Consider marketing data that captures details of various campaigns:
Consider marketing data that captures details of various campaigns.

:::{code} json
{
@@ -23,11 +23,11 @@ Consider marketing data that captures details of various campaigns:
}
:::

To begin, let's create the schema for this dataset:
To begin, let's create the schema for this dataset.

## Creating the Table

CrateDB uses SQL, a powerful and familiar language for database management. To
CrateDB uses SQL, the most popular query language for database management. To
store the marketing data, create a table with columns tailored to the
dataset using the `CREATE TABLE` command:

@@ -45,14 +45,14 @@ CREATE TABLE marketing_data (
);
:::

In this table definition:
Let's highlight two features in this table definition:

- The `metrics` column is set up as an `OBJECT` featuring a dynamic structure.
This enables you to perform flexible queries on its nested attributes like
:metrics: An `OBJECT` column featuring a dynamic structure for
performing flexible queries on its nested attributes like
clicks, impressions, and conversion rate.
- Additionally, a generated column named `url_parts` is configured to
automatically parse the `landing_page_url`. This makes it more convenient for
you to query specific components of the URL later on.
:url_parts: A generated column that
decodes a URL from the `landing_page_url` column. This makes it convenient
to query for specific components of the URL later on.

The table is designed to accommodate both fixed and dynamic attributes,
providing a robust and flexible structure for storing your marketing data.
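The `url_parts` generated column described above splits `landing_page_url` into its components; the generated-column expression itself lies outside the visible hunk. To illustrate the kind of decomposition involved, here is a minimal Python sketch using the standard library's `urllib.parse`. It is an analogy, not CrateDB's implementation: the function name `url_parts`, the chosen keys, and the sample URL are all hypothetical.

```python
from urllib.parse import parse_qs, urlparse


def url_parts(url: str) -> dict:
    """Hypothetical helper: split a URL into components, similar in spirit
    to what a generated column over `landing_page_url` could store."""
    parsed = urlparse(url)
    return {
        "scheme": parsed.scheme,
        "hostname": parsed.hostname,
        "path": parsed.path,
        "query": parsed.query,
        # parse_qs maps each query parameter to a list of values.
        "parameters": parse_qs(parsed.query),
    }


parts = url_parts("https://example.com/spring-sale?utm_source=newsletter&utm_medium=email")
print(parts["hostname"])  # example.com
print(parts["parameters"]["utm_source"])  # ['newsletter']
```

Computing the parts once when a row is written, instead of re-parsing the URL in every query, is the usual motivation for using a generated column here.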
85 changes: 67 additions & 18 deletions docs/tutorials/time-series-advanced.md
@@ -2,27 +2,76 @@

# Analyzing Device Readings with Metadata Integration

CrateDB is highly regarded as an optimal database solution for managing time-series data thanks to its unique blend of features. It is particularly effective when you need to combine time-series data with metadata, for instance, in scenarios where data like sensor readings or log entries, need to be augmented with additional context for more insightful analysis. CrateDB supports effective time-series analysis with fast aggregations, a rich set of built-in functions, and `JOIN` operations.
CrateDB is highly regarded as an optimal database solution for managing
time series data thanks to its unique blend of features. It is particularly
effective when you need to combine time series data with metadata, for
instance, in scenarios where data like sensor readings or log entries need
to be augmented with additional context for more insightful analysis.

In this tutorial, we will illustrate how to augment time-series data with the metadata to enable more comprehensive analysis. To get started let’s use a time-series dataset that captures various device readings, such as battery, CPU, and memory information. Each record includes:

- `ts` - timestamp when each reading was taken.
- `device_id` - identifier of the device.
- `battery` - object containing battery level, status, and temperature.
- `cpu` - object containing average CPU loads over the last 1, 5, and 15 minutes.
- `memory` - object containing information about the device's free and used memory.
:::::{grid}
:padding: 0

The second dataset in this tutorial contains metadata information about various devices. Each record includes:
::::{grid-item}
:class: rubric-slimmer
:columns: auto 6 6 6

- `device_id` - identifier of the device.
- `api_version` - version of the API that the device supports.
- `manufacturer` - name of the manufacturer of the device.
- `model` - model name of the device.
- `os_name` - the name of the operating system running on the device.
:::{rubric} About
:::

CrateDB supports effective time series analysis with enhanced features
for fast aggregations.

- Rich data types for storing structured nested data (OBJECT) alongside
time series data.
- A rich set of built-in functions for aggregations.
- Relational JOIN operations.
- Common table expressions (CTEs).

::::

::::{grid-item}
:class: rubric-slimmer
:columns: auto 6 6 6

:::{rubric} Data
:::
This tutorial illustrates how to effectively query time series data with
metadata, in order to conduct comprehensive data analysis.

It uses a time series dataset that includes telemetry readings from appliances,
such as battery, CPU, and memory information, as well as metadata information
like manufacturer, model, API version, and operating system.
::::

:::::


## Creating the Tables

CrateDB uses SQL, the most popular query language for database management. To
store the device readings and the device info data, define two tables with
columns tailored to the datasets.

To get started, let’s use a time series dataset that captures various device
readings, such as battery, CPU, and memory information. Each record includes:

:ts: Timestamp when each reading was taken.
:device_id: Identifier of the device.
:battery: Object containing battery level, status, and temperature.
:cpu: Object containing average CPU loads over the last 1, 5, and 15 minutes.
:memory: Object containing information about the device's free and used memory.
Comment on lines +59 to +63

Member Author: Funny how battery is colored red?

Contributor: No idea, is it some keyword by chance?

The second dataset in this tutorial contains metadata information about various
devices. Each record includes:

## Creating the Table
:device_id: Identifier of the device.
:api_version: Version of the API that the device supports.
:manufacturer: Name of the manufacturer of the device.
:model: Model name of the device.
:os_name: Name of the operating system running on the device.

CrateDB uses SQL, a powerful and familiar language for database management. To store the device readings and the device info data, create two tables with columns tailored to the datasets using the `CREATE TABLE` command:
Create the tables using the `CREATE TABLE` command:

:::{code} sql
CREATE TABLE IF NOT EXISTS doc.devices_readings (
@@ -75,7 +124,7 @@ WITH (compression='gzip', empty_string_as_null=true)
RETURN SUMMARY;
:::

## Time-series Analysis with Metadata
## Time Series Analysis with Metadata

To illustrate `JOIN` operation, the first query retrieves the 30 rows of combined data from two tables, `devices.readings` and `devices.info`, based on a matching `device_id` in both. It effectively merges the detailed readings and corresponding device information, providing a comprehensive view of each device's status and metrics.
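The join described above pairs every reading with the device metadata that shares its `device_id`. A minimal in-memory sketch of the same inner-join semantics (built as a hash join) follows. The records are hypothetical and flattened for brevity (a single `battery_level` field stands in for the `battery` object), and CrateDB picks its own join strategy internally, so this illustrates only the result, not how the database computes it.

```python
def join_readings_with_info(readings, info, limit=30):
    """Inner join of readings and device info on device_id,
    keeping at most `limit` combined rows."""
    # Build a lookup table from the metadata side.
    info_by_device = {row["device_id"]: row for row in info}
    joined = []
    for reading in readings:
        if len(joined) >= limit:
            break
        meta = info_by_device.get(reading["device_id"])
        if meta is None:
            continue  # inner join: readings without metadata are dropped
        joined.append({**reading, **meta})
    return joined


# Hypothetical sample data.
readings = [
    {"ts": "2024-03-01T00:00:00", "device_id": "dev-01", "battery_level": 91.0},
    {"ts": "2024-03-01T00:00:00", "device_id": "dev-02", "battery_level": 45.5},
    {"ts": "2024-03-01T00:01:00", "device_id": "dev-99", "battery_level": 12.0},
]
info = [
    {"device_id": "dev-01", "manufacturer": "vendor-a", "model": "model-x"},
    {"device_id": "dev-02", "manufacturer": "vendor-b", "model": "model-y"},
]
for row in join_readings_with_info(readings, info):
    print(row)
```

The reading from `dev-99` has no matching metadata row, so an inner join leaves it out.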

@@ -95,7 +144,7 @@ GROUP BY "day"
ORDER BY "day";
:::

Rolling averages are crucial in time-series analysis because they help smooth out short-term fluctuations and reveal underlying trends by averaging data points over a specified period. This approach is particularly effective in mitigating the impact of outliers and noise in the data, allowing for a clearer understanding of the true patterns in the time series.
Rolling averages are crucial in time series analysis because they help smooth out short-term fluctuations and reveal underlying trends by averaging data points over a specified period. This approach is particularly effective in mitigating the impact of outliers and noise in the data, allowing for a clearer understanding of the true patterns in the time series.

The following example illustrates the average (`AVG`), minimum (`MIN`), and maximum (`MAX`) battery temperature over a window of the last 100 temperature readings (`ROWS BETWEEN 100 PRECEDING AND CURRENT ROW`). The window is defined in descending order by timestamp (`ts`) and can be adapted to support different use cases.
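The frame clause above can be mirrored in a few lines of Python to make its boundaries concrete. One detail worth noting: `ROWS BETWEEN 100 PRECEDING AND CURRENT ROW` covers up to 100 earlier rows plus the current one, so a full window holds 101 readings. The sketch assumes the readings are already sorted in the window's order (descending `ts`, as described above); `rolling_stats` is a hypothetical helper and the temperatures are made up.

```python
def rolling_stats(values, preceding=100):
    """For each position, return (avg, min, max) over the frame
    ROWS BETWEEN `preceding` PRECEDING AND CURRENT ROW."""
    results = []
    for i in range(len(values)):
        # The frame starts `preceding` rows back (or at the first row)
        # and always includes the current row.
        frame = values[max(0, i - preceding): i + 1]
        results.append((sum(frame) / len(frame), min(frame), max(frame)))
    return results


temperatures = [30.0, 32.0, 31.0]  # hypothetical readings, already in window order
print(rolling_stats(temperatures, preceding=1))
# [(30.0, 30.0, 30.0), (31.0, 30.0, 32.0), (31.5, 31.0, 32.0)]
```

Near the start of the ordering the frame is shorter than 101 rows, so the first averages are computed over fewer readings.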

@@ -190,4 +239,4 @@ ORDER BY
model_avg_battery_level DESC;
:::

In conclusion, this tutorial has guided you through the process of querying and analyzing time-series data with CrateDB, demonstrating how to effectively merge device metrics with relevant metadata. These techniques and queries are important for unlocking deeper insights into device performance, equipping you with the skills needed to harness the full potential of time-series data in real-world applications.
In conclusion, this tutorial has guided you through the process of querying and analyzing time series data with CrateDB, demonstrating how to effectively merge device metrics with relevant metadata. These techniques and queries are important for unlocking deeper insights into device performance, equipping you with the skills needed to harness the full potential of time series data in real-world applications.
44 changes: 37 additions & 7 deletions docs/tutorials/time-series.md
@@ -1,24 +1,54 @@
(time-series)=

# Time-Series: Analyzing Weather Data
# Time Series: Analyzing Weather Data

CrateDB is a powerful database designed to handle various use cases, one of
which is managing time series data. Time series data refers to collections of
data points recorded at specific intervals over time, like the hourly
temperature of a city or the daily sales of a store.

:::::{grid}
:padding: 0

::::{grid-item}
:class: rubric-slimmer
:columns: auto 6 6 6

:::{rubric} About
:::

Effectively query observations using enhanced features for time series data.

Run aggregations with gap filling / interpolation, using common
table expressions (CTEs) and LAG / LEAD window functions.

Find maximum values using the MAX_BY aggregate function, returning
the value from one column based on the maximum value of another
column within a group.
::::

::::{grid-item}
:class: rubric-slimmer
:columns: auto 6 6 6

:::{rubric} Data
:::
For this tutorial, imagine a dataset that captures weather
readings from CrateDB offices across the globe. Each record includes:

- `timestamp`: The exact time of the recording.
- `location`: The location of the weather station.
- `temperature`: The temperature in degrees Celsius.
- `humidity`: The humidity in percentage.
- `wind_speed`: The wind speed in km/h.
:timestamp: The exact time of the recording.
:location: The location of the weather station.
:temperature: The temperature in degrees Celsius.
:humidity: The humidity in percentage.
:wind_speed: The wind speed in km/h.
::::

:::::
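The two techniques named in the About card can be illustrated in plain Python. This is a sketch, not CrateDB's implementation: `max_by_per_group` reproduces what `MAX_BY` returns per group (assuming the argument order return column first, search column second; ties go to the first row seen here, which CrateDB does not necessarily guarantee), and `interpolate_gaps` fills missing readings linearly between the previous and next known values, the kind of previous/next lookup that `LAG` and `LEAD` provide in SQL. Both helper names are hypothetical, interpolation assumes evenly spaced readings, and the sample weather rows are made up.

```python
def max_by_per_group(rows, group_key, return_key, search_key):
    """Per group, return `return_key` from the row with the largest
    `search_key`, like MAX_BY(return_key, search_key) ... GROUP BY group_key."""
    best = {}
    for row in rows:
        group = row[group_key]
        # Strict '>' keeps the first row seen when values tie.
        if group not in best or row[search_key] > best[group][search_key]:
            best[group] = row
    return {group: row[return_key] for group, row in best.items()}


def interpolate_gaps(values):
    """Fill None entries by linear interpolation between the nearest known
    neighbours; gaps at either end are left as None."""
    filled = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    for i, value in enumerate(values):
        if value is not None:
            continue
        prev = max((k for k in known if k < i), default=None)
        nxt = min((k for k in known if k > i), default=None)
        if prev is None or nxt is None:
            continue  # no neighbour on one side: leave the gap
        fraction = (i - prev) / (nxt - prev)
        filled[i] = values[prev] + fraction * (values[nxt] - values[prev])
    return filled


# Hypothetical weather readings.
weather = [
    {"location": "Berlin", "timestamp": "2024-01-01T10:00", "temperature": 4.5},
    {"location": "Berlin", "timestamp": "2024-01-01T11:00", "temperature": 6.0},
    {"location": "Vienna", "timestamp": "2024-01-01T10:00", "temperature": 3.0},
]
print(max_by_per_group(weather, "location", "timestamp", "temperature"))
# {'Berlin': '2024-01-01T11:00', 'Vienna': '2024-01-01T10:00'}
print(interpolate_gaps([10.0, None, 14.0, None]))
# [10.0, 12.0, 14.0, None]
```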


## Creating the Table

CrateDB uses SQL, a powerful and familiar language for database management. To
CrateDB uses SQL, the most popular query language for database management. To
store the weather data, create a table with columns tailored to the
dataset using the `CREATE TABLE` command:
