Tutorials: Improve layout on Time Series tutorials
To add dividers that better convey what each section covers, i.e. which
feature is demonstrated, use Sphinx's "rubric" directive [1].

[1] https://www.sphinx-doc.org/en/master/usage/restructuredtext/directives.html#directive-rubric
amotl committed Mar 12, 2024
1 parent 9ab46f9 commit 622f013
Showing 2 changed files with 76 additions and 14 deletions.
68 changes: 59 additions & 9 deletions docs/tutorials/time-series-advanced.md
@@ -126,7 +126,10 @@ RETURN SUMMARY;

## Time Series Analysis with Metadata

To illustrate `JOIN` operation, the first query retrieves the 30 rows of combined data from two tables, `devices.readings` and `devices.info`, based on a matching `device_id` in both. It effectively merges the detailed readings and corresponding device information, providing a comprehensive view of each device's status and metrics.

:::{rubric} JOIN Operations
:::
To illustrate `JOIN` operations, the first query retrieves the 30 rows of combined data from two tables, `devices.readings` and `devices.info`, based on a matching `device_id` in both. It effectively merges the detailed readings and corresponding device information, providing a comprehensive view of each device's status and metrics.

:::{code} sql
SELECT *
@@ -135,6 +138,9 @@ JOIN devices.info i ON r.device_id = i.device_id
LIMIT 30;
:::


:::{rubric} Aggregate Values
:::
The next query illustrates how to summarize data with aggregate functions. In particular, it computes the average battery level (`avg_battery_level`) for each day and returns the results in ascending order by day.

:::{code} sql
@@ -144,6 +150,9 @@ GROUP BY "day"
ORDER BY "day";
:::
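
To make the grouping step concrete, here is a minimal plain-Python sketch of the same idea. It assumes the part of the query not shown in this hunk buckets readings by calendar day. The function name `daily_average` and the sample readings are hypothetical and not part of the tutorial.

```python
from collections import defaultdict
from datetime import datetime


def daily_average(readings):
    """Group (timestamp, value) pairs by calendar day and average each group."""
    buckets = defaultdict(list)
    for ts, value in readings:
        buckets[ts.date()].append(value)
    # Sorting the days mirrors ORDER BY "day" in ascending order.
    return [(day, sum(vals) / len(vals)) for day, vals in sorted(buckets.items())]


# Hypothetical battery level readings, deliberately out of order.
readings = [
    (datetime(2024, 3, 2, 8, 0), 60.0),
    (datetime(2024, 3, 1, 9, 0), 80.0),
    (datetime(2024, 3, 1, 21, 0), 70.0),
]
result = daily_average(readings)
```

Each entry in `result` pairs a day with the mean of that day's readings, which is the role `GROUP BY "day"` plays in the query.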


:::{rubric} Rolling Averages and Window Functions
:::
Rolling averages are crucial in time series analysis because they help smooth out short-term fluctuations and reveal underlying trends by averaging data points over a specified period. This approach is particularly effective in mitigating the impact of outliers and noise in the data, allowing for a clearer understanding of the true patterns in the time series.

The following example illustrates the average (`AVG`), minimum (`MIN`), and maximum (`MAX`) battery temperature over a sliding window that spans the current row and the 100 rows preceding it (`ROWS BETWEEN 100 PRECEDING AND CURRENT ROW`), i.e. up to 101 readings. The window is ordered by timestamp (`ts`) in descending order, so the preceding rows are the more recent readings. The frame size and ordering can be adapted to support different use cases.
@@ -158,7 +167,19 @@ JOIN doc.devices_info i ON r.device_id = i.device_id
WINDOW w AS (ORDER BY "ts" DESC ROWS BETWEEN 100 PRECEDING AND CURRENT ROW);
:::
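
As a rough illustration of what a `ROWS BETWEEN n PRECEDING AND CURRENT ROW` frame computes, the following plain-Python sketch walks a list that is assumed to be sorted in window order already. The helper `rolling_stats` and the sample temperatures are hypothetical. The sketch uses a small frame of 2 preceding rows instead of 100 to keep the example readable.

```python
def rolling_stats(values, preceding=100):
    """Return (avg, min, max) per row over the current row plus up to
    `preceding` rows that come before it in the given order."""
    out = []
    for i in range(len(values)):
        # The frame is shorter near the start, as with SQL window frames.
        frame = values[max(0, i - preceding): i + 1]
        out.append((sum(frame) / len(frame), min(frame), max(frame)))
    return out


# Hypothetical battery temperatures, already sorted in window order.
temps = [30.0, 32.0, 31.0, 35.0]
stats = rolling_stats(temps, preceding=2)
```

Note that a frame of `n PRECEDING` plus the current row covers up to `n + 1` rows, so the first rows of the result are computed over fewer values.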

The next query shows how to extract the most recent reading for each device of the _mustang_ model. The query selects the latest timestamp (`MAX(r.ts)`), which represents the most recent reading time, and the corresponding latest readings for battery, CPU, and memory (`MAX_BY` for each respective component, using the timestamp as the determining factor). These results are grouped by `device_id`, `manufacturer`, and `model` to ensure that the latest readings for each unique device are included. This query is particularly useful for monitoring the most current status of specific devices in a fleet.

:::{rubric} Most Recent Observation
:::
The next query shows how to extract the most recent reading for each device of
the _mustang_ model. The query selects the latest timestamp (`MAX(r.ts)`),
which represents the most recent reading time, and the corresponding latest
readings for battery, CPU, and memory. It uses `MAX_BY` for each respective
component, using the timestamp as the determining factor.

These results are grouped by `device_id`, `manufacturer`, and `model` to ensure
that the latest readings for each unique device are included. This query is
particularly useful for monitoring the most current status of specific devices
in a fleet.

:::{code} sql
SELECT
@@ -179,15 +200,34 @@ GROUP BY
r.device_id, i.manufacturer, i.model;
:::
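
The idea behind `MAX_BY(value, ts)` combined with `GROUP BY` can be sketched in plain Python: for each group, keep the row with the greatest timestamp and read the other columns from it. This is a simplified illustration that ignores `NULL` handling. The function `latest_per_device`, the column names in the sample rows, and the data itself are assumptions made for the example.

```python
def latest_per_device(rows):
    """Keep, for each device_id, the row with the greatest timestamp."""
    latest = {}
    for row in rows:
        current = latest.get(row["device_id"])
        if current is None or row["ts"] > current["ts"]:
            latest[row["device_id"]] = row
    return latest


# Hypothetical readings; `ts` is an epoch timestamp in seconds.
rows = [
    {"device_id": "demo-1", "ts": 100, "battery_level": 80},
    {"device_id": "demo-1", "ts": 300, "battery_level": 55},
    {"device_id": "demo-2", "ts": 200, "battery_level": 91},
    {"device_id": "demo-1", "ts": 200, "battery_level": 70},
]
latest = latest_per_device(rows)
```

For `demo-1`, the sketch returns the reading taken at `ts = 300`, regardless of the order in which the rows arrive.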

Finally, we demonstrate the complex query that illustrates the usage of Common Table Expressions (CTEs) to aggregate and analyze device readings and information. The query relies on three CTEs to temporarily capture data:

- **MaxTimestamp CTE**: This CTE finds the most recent timestamp (`MAX(ts)`) in the `doc.devices_readings` table. It's used to focus the analysis on recent data.
- **DeviceReadingsAgg CTE**: This CTE calculates the average battery level and temperature for each device, but only for readings taken within the last week (as defined by `r.ts >= m.max_ts - INTERVAL '1 week'`).
- **DeviceModelInfo CTE**: This CTE selects details from the `doc.devices_info` table, specifically the `device_id`, `manufacturer`, `model`, and `api_version`, but only for devices with an API version between 21 and 25.
:::{rubric} Common Table Expressions (CTEs)
:::
Finally, we illustrate the use of Common Table Expressions (CTEs) in a more
complex query that aggregates and analyzes device readings and metadata.
The query relies on three CTEs to temporarily capture data.

:max_timestamp:
Find the most recent timestamp (`MAX(ts)`) in the
`doc.devices_readings` table. This CTE is used to focus the analysis
on recent data.

:device_readings_agg:
Calculate the average battery level and temperature for each
device, but only for readings taken within the last week, as defined by
`r.ts >= m.max_ts - INTERVAL '1 week'`.

:device_model_info:
Select details from the `doc.devices_info` table, specifically
the `device_id`, `manufacturer`, `model`, and `api_version`, but only for
devices with an API version between 21 and 25.

The main `SELECT` statement joins the `DeviceReadingsAgg` and `DeviceModelInfo` CTEs, and aggregates data to provide the average battery level and temperature for each combination of manufacturer, model, and API version. It also proivdes the number of readings (`COUNT(*)`) for each grouping.
The main `SELECT` statement joins the `device_readings_agg` and `device_model_info`
CTEs, and aggregates data to provide the average battery level and temperature
for each combination of manufacturer, model, and API version.
It also provides the number of readings (`COUNT(*)`) for each grouping.

Overall, the query aims to provide a detailed analysis of the battery performance (both level and temperature) for devices with specific API versions, while focusing only on recent data. It allows for a better understanding of how different models and manufacturers are performing in terms of battery efficiency within a specified API range and time frame.
The query aims to provide a detailed analysis of the battery performance (both level and temperature) for devices with specific API versions, while focusing only on recent data. It allows for a better understanding of how different models and manufacturers are performing in terms of battery efficiency within a specified API range and time frame.

:::{code} sql
WITH
@@ -239,4 +279,14 @@ ORDER BY
model_avg_battery_level DESC;
:::

In conclusion, this tutorial has guided you through the process of querying and analyzing time series data with CrateDB, demonstrating how to effectively merge device metrics with relevant metadata. These techniques and queries are important for unlocking deeper insights into device performance, equipping you with the skills needed to harness the full potential of time series data in real-world applications.

:::{rubric} Conclusion
:::

This tutorial has guided you through the process of querying and
analyzing time series data with CrateDB, demonstrating how to effectively merge
device metrics with relevant metadata.

These techniques and queries are important for unlocking deeper insights into
device performance, equipping you with the skills needed to harness the full
potential of time series data in real-world applications.
22 changes: 17 additions & 5 deletions docs/tutorials/time-series.md
@@ -98,6 +98,8 @@ FROM weather_data
GROUP BY location;
:::

:::{rubric} MAX_BY Aggregate Functions
:::
Computing basic averages is nothing special, but what if you need to answer more detailed
questions? For example, if you want to know the highest temperature for each
place and when it occurred.
@@ -117,13 +119,16 @@ FROM weather_data
GROUP BY location;
:::

:::{rubric} Gap Filling
:::
You have probably observed by now that there are gaps in the dataset for certain
metrics. Such occurrences are common, perhaps due to a sensor malfunction or
disconnection. To address this, the missing values need to be filled in. You can
employ another useful tool: window functions paired with the `IGNORE NULLS`
feature. Within a Common Table Expression (CTE), we utilize window functions to
disconnection. To address this, the missing values need to be filled in.

Window functions paired with the `IGNORE NULLS` option can address this.
Within a Common Table Expression (CTE), we utilize window functions to
spot the next and prior non-null temperature recordings, and then compute the
arithmetic mean to bridge the gap:
arithmetic mean to fill the gap.

:::{code} sql
WITH OrderedData AS (
@@ -143,4 +148,11 @@ FROM OrderedData
ORDER BY location, timestamp;
:::

The `WINDOW` clause defines a window that partitions the data by location and orders it by timestamp. This ensures that the `LAG` and `LEAD` window functions operate within each location group chronologically. If the temperature value is defined as `NULL`, the query returns the interpolated value calculated as the average of the previous and next available temperature readings. Otherwise, it uses the original value.
The `WINDOW` clause defines a window that partitions the data by location and
orders it by timestamp.

This ensures that the `LAG` and `LEAD` window functions operate within each
location group chronologically. If the temperature value is defined as `NULL`,
the query returns the interpolated value calculated as the average of the
previous and next available temperature readings. Otherwise, it uses the
original value.
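
The interpolation described above can be sketched in plain Python for a single location whose readings are already ordered by timestamp. The function `fill_gaps` and the sample values are hypothetical. Leaving a gap unfilled when it has no non-null neighbour on one side is an assumption of this sketch, chosen to match how arithmetic on `NULL` typically behaves in SQL.

```python
def fill_gaps(values):
    """Replace each None with the mean of the nearest non-None values
    before and after it; keep original values unchanged."""
    n = len(values)

    # Nearest non-null value strictly before each position (like LAG ... IGNORE NULLS).
    prev_seen = [None] * n
    last = None
    for i, v in enumerate(values):
        prev_seen[i] = last
        if v is not None:
            last = v

    # Nearest non-null value strictly after each position (like LEAD ... IGNORE NULLS).
    next_seen = [None] * n
    upcoming = None
    for i in range(n - 1, -1, -1):
        next_seen[i] = upcoming
        if values[i] is not None:
            upcoming = values[i]

    filled = []
    for v, p, q in zip(values, prev_seen, next_seen):
        if v is not None:
            filled.append(v)
        elif p is not None and q is not None:
            filled.append((p + q) / 2)
        else:
            filled.append(None)  # gap at the edge of the series stays unfilled
    return filled


# Hypothetical temperature series for one location, ordered by timestamp.
temperatures = [10.0, None, None, 16.0, None]
filled = fill_gaps(temperatures)
```

Both consecutive gaps receive the same value, because each one is filled from the same pair of surrounding readings rather than from a previously filled value.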
