(cluster-automation)=
# Automation

Automation in CrateDB Cloud allows users to streamline and manage routine
database operations efficiently. The two primary automation features are the
SQL Scheduler and Table Policies, both of which facilitate the maintenance
and optimization of database tasks.

:::{important}
- Automation is available for all newly deployed clusters.
- For existing clusters, the feature can be enabled on demand. (Contact
  [support](https://support.crate.io/) for activation.)

Automation utilizes a dedicated database user `gc_admin` with full cluster
privileges to execute scheduled tasks and persists its data in the `gc`
schema.
:::

## SQL Scheduler

The SQL Scheduler automates routine database tasks by scheduling SQL queries
to run at specific times (in UTC). A job consists of a description, a valid
[cron pattern](https://www.ibm.com/docs/en/db2oc?topic=task-unix-cron-format),
and a SQL statement, enabling a wide range of tasks. Users can manage these
jobs through the Cloud UI, adding, removing, editing, activating, and
deactivating them as needed.

### Use Cases

- Regularly updating or aggregating table data.
- Automating export and import of data.
- Deleting old or redundant data to maintain database efficiency.

### Accessing and Using the SQL Scheduler

The SQL Scheduler can be found in the "Automation" tab in the left-hand
navigation menu. There are two tabs relevant to the SQL Scheduler:

**SQL Scheduler** shows a list of your existing jobs. In the list, you can
activate or deactivate each job with the toggle in the "Active" column. You
can also edit and delete jobs with the buttons on the right side of the
list.

![SQL Scheduler overview](../_assets/img/cluster-sql-scheduler-overview.png)

**Logs** shows a list of *scheduled* job runs, whether they succeeded or
failed, their execution time and run time, and the error in case they were
unsuccessful. In case of an error, more details can be viewed, showing the
executed query and a stack trace. You can filter the logs by status or by a
specific job.

![SQL Scheduler logs](../_assets/img/cluster-sql-scheduler-logs.png)

### Examples

#### Cleanup of Old Files

Cleanup tasks are a common use case for this type of automated job. This
example deletes records older than 30 days from a specified table once a
day:

```sql
DELETE FROM "sample_data"
WHERE "timestamp_column" < NOW() - INTERVAL '30 days';
```

How often you run it is up to you, but once a day is common for cleanup
tasks. This cron expression (fields: minute, hour, day of month, month,
day of week) runs every day at 2:30 PM UTC:

Schedule: `30 14 * * *`

![SQL Scheduler cleanup example](../_assets/img/cluster-sql-scheduler-example-cleanup.png)
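
Before scheduling a destructive job like this, it can help to check what it
would remove. A minimal sketch, assuming the same table and column as above:

```sql
-- Count the rows the scheduled DELETE would remove.
SELECT COUNT(*) AS rows_to_delete
FROM "sample_data"
WHERE "timestamp_column" < NOW() - INTERVAL '30 days';
```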

#### Copying Logs into a Persistent Table

Another useful example is copying data to another table for archival
purposes. This job copies entries from the system table `sys.jobs_log` into
a table of our own:

```sql
CREATE TABLE IF NOT EXISTS "logs"."persistent_jobs_log" (
    "classification" OBJECT (DYNAMIC),
    "ended" TIMESTAMP WITH TIME ZONE,
    "error" TEXT,
    "id" TEXT,
    "node" OBJECT (DYNAMIC),
    "started" TIMESTAMP WITH TIME ZONE,
    "stmt" TEXT,
    "username" TEXT,
    PRIMARY KEY (id)
) CLUSTERED INTO 1 SHARDS;

-- The primary key plus ON CONFLICT ... DO NOTHING makes the copy
-- idempotent: entries that were already archived are skipped.
INSERT INTO "logs"."persistent_jobs_log"
SELECT *
FROM sys.jobs_log
ON CONFLICT ("id") DO NOTHING;
```

In this example, we schedule the job to run every hour:

Schedule: `0 * * * *`

![SQL Scheduler copying example](../_assets/img/cluster-sql-scheduler-example-copying.png)
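
Once the job has run a few times, the archive can be queried like any other
table. For example, to review recent failures:

```sql
-- Most recent failed statements captured from sys.jobs_log.
SELECT "started", "stmt", "error"
FROM "logs"."persistent_jobs_log"
WHERE "error" IS NOT NULL
ORDER BY "started" DESC
LIMIT 50;
```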

:::{note}
Limitations and Known Issues:
* Only one job can run at a time; subsequent jobs will be queued until the
  current one completes.
* Long-running jobs may block the execution of queued jobs, leading to
  potential delays.
:::

## Table Policies

Table policies allow automating maintenance operations for **partitioned
tables**. Automated actions can be set up to be executed daily based on a
pre-configured ruleset.

![Table policy list](../_assets/img/cluster-table-policy.png)

### Overview

The table policy overview can be found in the left-hand navigation menu
under "Automation". From the list of policies, you can create, delete, edit,
or (de)activate them. Logs of executed policies can be found in the "Logs"
tab.

![Table policy logs](../_assets/img/cluster-table-policy-logs.png)

A new policy can be created with the "Add New Policy" button.

![Table policy creation](../_assets/img/cluster-table-policy-create.png)

After naming the policy and selecting the tables/schemas to be affected, you
must specify the time column. This column, which should be the timestamp
used for partitioning, determines the data affected by the policy. It is
important that this time column is consistently present across all targeted
tables/schemas. While you can apply a policy to tables without the specified
time column, it will not be executed for those. If your tables have
different timestamp columns, consider setting up separate policies for each
to ensure accuracy.

:::{note}
The "Time Column" must be of type `TIMESTAMP`.
:::

Next, a condition determines which partitions are affected. The system is
time-based: a partition is eligible for action if the value in the
partitioned column is less than (`<`), or less than or equal to (`<=`), the
current date minus `n` days, months, or years.
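
Conceptually, eligibility corresponds to a check against the partition
values. A sketch of such a check, assuming the `data_table` schema from the
example below and a 30-day condition (column names follow CrateDB's
`information_schema.table_partitions`; adjust to your setup as needed):

```sql
-- Partitions whose ts_day value is older than 30 days.
SELECT partition_ident, "values"['ts_day'] AS partition_day
FROM information_schema.table_partitions
WHERE table_name = 'data_table'
  AND "values"['ts_day'] < DATE_TRUNC('day', NOW()) - INTERVAL '30 days';
```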

### Actions

The following actions are supported:
* **Delete:** Deletes eligible partitions along with their data.
* **Set replicas:** Changes the replication factor of eligible partitions.
* **Force merge:** Merges segments on eligible partitions down to a
  specified number.

After filling in the details, you can see the affected schemas/tables and
the number of partitions that would be affected if the policy were executed
at this very moment.

### Examples

Consider a scenario where you have a table and want to optimize space on
your cluster. Older data (e.g., beyond 30 days) may have already been
snapshotted and is accessed only infrequently, meaning it is not used for
live analytics; it might therefore be sufficient for it to exist just once
in the cluster, without replication. Additionally, you may not want to
retain data older than 60 days.

Assume the following table schema:

```sql
CREATE TABLE data_table (
    ts TIMESTAMP,
    ts_day GENERATED ALWAYS AS date_trunc('day', ts),
    val DOUBLE
) PARTITIONED BY (ts_day);
```

For the outlined scenario, the policies would be as follows:

**Policy 1 - Saving replica space:**
* **Time Column:** `ts_day`
* **Condition:** `older than 30 days`
* **Actions:** `Set replicas to 0.`

**Policy 2 - Data removal:**
* **Time Column:** `ts_day`
* **Condition:** `older than 60 days`
* **Actions:** `Delete eligible partition(s)`
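
For orientation, the manual SQL equivalents of these policies look roughly
like the following sketch (the partition value is illustrative; a policy
applies its action to every eligible partition automatically):

```sql
-- Policy 1: remove replicas from one eligible partition.
ALTER TABLE data_table PARTITION (ts_day = '2024-01-01')
SET (number_of_replicas = 0);

-- Policy 2: deleting by the partition column removes whole partitions.
DELETE FROM data_table
WHERE ts_day < DATE_TRUNC('day', NOW()) - INTERVAL '60 days';
```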

(cluster-backups)=
# Backups

You can find the Backups page in the detailed view of your cluster, where
you can see and restore all existing backups.

The Backups tab provides a list of all your backups. By default, a backup
is made every hour and kept for 14 days. We also keep the last 14 backups
indefinitely, no matter the state of your cluster.

![Cloud Console cluster backups page](../_assets/img/cluster-backups.png)

You can also control the schedule of your backups by clicking the *Edit
backup schedule* button.

![Cloud Console cluster backups edit page](../_assets/img/cluster-backups-edit.png)

Here you can create a custom schedule by selecting any number of hour
slots. Backups will be created at the selected times. At least one backup a
day is mandatory.

To restore a particular backup, click the *Restore* button. A popup window
with a SQL statement will appear. Run this statement in your Admin UI
console, either by copy-pasting it or by clicking *Run query in Admin UI*.
The latter brings you directly to the Admin UI console with the statement
pre-filled.

![Cloud Console cluster backups restore page](../_assets/img/cluster-backups-restore.png)

You have a choice between restoring the cluster fully or only specific
tables.
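
The generated statement is based on CrateDB's `RESTORE SNAPSHOT` command.
Its general shape is sketched below; the repository, snapshot, and table
names are placeholders, so always use the exact statement the Console
generates:

```sql
RESTORE SNAPSHOT "my_repository"."my_snapshot"
TABLE "doc"."sample_data"
WITH (wait_for_completion = true);
```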

(cluster-cloning)=
## Cluster Cloning

Cluster cloning is the process of duplicating all the data from a specific
snapshot into a different cluster. Creating the new cluster isn't part of
the cloning process; you need to create the target cluster yourself. You
can clone a cluster from the Backups page.

![Cloud Console cluster backup snapshots](../_assets/img/cluster-backups.png)

Choose a snapshot and click the *Clone* button. As with restoring a backup,
you can choose between cloning the whole cluster or only specific tables.

![Cloud Console cluster clone popup](../_assets/img/cluster-clone-popup.png)

:::{note}
Keep in mind that a full cluster clone will include users, views,
privileges, and everything else. Cloning also doesn't distinguish between
cluster plans, meaning you can clone from CR2 to CR1 or any other
variation.
:::

(cluster-cloning-fail)=
## Failed cloning

There are circumstances under which cloning can fail or behave
unexpectedly. These are:

- If you already have tables with the same names in the target cluster as
  in the source snapshot, the entire clone operation will fail.
- If there isn't enough storage left on the target cluster to accommodate
  the tables you're trying to clone, you might get an incomplete clone as
  the cluster runs out of storage.
- If you're trying to clone an invalid or no longer existing snapshot, the
  cloning will fail. This can happen if you're cloning through
  [Croud](https://cratedb.com/docs/cloud/cli/en/latest/).
- If you're trying to restore a table that is not included in the snapshot,
  the cloning will fail. This can happen if you're restoring snapshots
  through [Croud](https://cratedb.com/docs/cloud/cli/en/latest/).

When cloning fails, it is indicated by a banner on the cluster overview
screen.

![Cloud Console cluster failed cloning](../_assets/img/cluster-clone-failed.png)

(cluster-console)=
# Console

The Console in CrateDB Cloud allows users to execute SQL queries seamlessly
against their CrateDB cluster. Users with the "Organization Admin" role can
access the Console from the left-hand navigation menu within a cluster. Key
features include:

- **Table and Schema Tree View:** Easily navigate through your database
  structure.
- **Client-Side Query Validation:** Ensure your SQL queries are correct
  before execution.
- **Multiple Query Execution:** Run several queries in sequence.
- **Query History:** Access and manage your past queries.

:::{important}
- The Console is available for all newly deployed clusters.
- For older clusters, this feature can be enabled on demand. Contact
  [support](https://support.crate.io/) for activation.

The Console currently utilizes a dedicated database user `gc_admin` with
full cluster privileges.
:::

:::{note}
**Multi-Query Execution:**
When running multiple queries at once, the Console executes them
sequentially, not within a single session or transaction. If one query
fails, the subsequent queries will not be executed. Currently, session
settings are not persisted between queries.
:::
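
A minimal sketch of this behavior, using a hypothetical table: the second
statement fails with a cast error, so the final `SELECT` is never run.

```sql
CREATE TABLE IF NOT EXISTS console_demo (id INTEGER);
INSERT INTO console_demo (id) VALUES ('not-a-number'); -- fails: invalid cast
SELECT * FROM console_demo;                            -- not executed
```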

(cluster-export)=
# Export

The "Export" section allows users to download specific tables/views. When
you first visit the Export tab, you can specify the name of a table/view,
the format (CSV, JSON, or Parquet), and whether you'd like your data to be
gzip-compressed (recommended for CSV and JSON files).

:::{important}
- The size limit for exports is 1 GiB.
- Exports are held for 3 days, then automatically deleted.
:::

:::{note}
**Limitations with Parquet:**
Parquet is a highly compressed data format for very efficient storage of
tabular data. Please note that for OBJECT and ARRAY columns in CrateDB,
the exported data will be JSON-encoded when saving to Parquet (effectively
saving them as strings). This is due to the complexity of encoding structs
and lists in the Parquet format, where determining the exact schema might
not be possible. When re-importing such a Parquet file, make sure you
pre-create the table with the correct schema.
:::
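
As a sketch of that last point, assume a hypothetical table whose `payload`
OBJECT column was exported to Parquet as a JSON string. Pre-creating the
target table tells CrateDB the intended column types before re-import:

```sql
-- Hypothetical schema to pre-create before re-importing the Parquet file.
CREATE TABLE IF NOT EXISTS "doc"."sensor_readings" (
    "id" TEXT PRIMARY KEY,
    "payload" OBJECT (DYNAMIC)  -- stored as a JSON string in the export
);
```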