Refresh cloud documentation (#81)
proddata authored Aug 9, 2024
1 parent 627ee75 commit 775e408
Showing 17 changed files with 596 additions and 1,008 deletions.
Binary file removed: docs/_assets/img/cluster-export-tab-history.png
Binary file removed: docs/_assets/img/cluster-export.png
192 changes: 192 additions & 0 deletions docs/cluster/automation.md
(cluster-automation)=
# Automation

Automation in CrateDB Cloud allows users to streamline and manage routine
database operations efficiently. Two automation features are available: the
SQL Scheduler and Table Policies, both of which facilitate the maintenance
and optimization of database tasks.

:::{important}
- Automation is available for all newly deployed clusters.
- For existing clusters, the feature can be enabled on demand. (Contact
[support](https://support.crate.io/) for activation.)

Automation utilizes a dedicated database user `gc_admin` with full cluster
privileges to execute scheduled tasks and persists data in the `gc` schema.
:::

## SQL Scheduler

The SQL Scheduler automates routine database tasks by running SQL queries at
scheduled times (UTC). A job is defined by a description, a valid
[cron pattern](https://www.ibm.com/docs/en/db2oc?topic=task-unix-cron-format),
and a SQL statement, enabling a wide range of tasks. Users can manage these
jobs through the Cloud UI, adding, removing, editing, activating, and
deactivating them as needed.

### Use Cases

- Regularly updating or aggregating table data.
- Automating export and import of data.
- Deleting old/redundant data to maintain database efficiency.

### Accessing and Using the SQL Scheduler

The SQL Scheduler can be found under "Automation" in the left-hand
navigation menu. There are two tabs relevant to the SQL Scheduler:

**SQL Scheduler** shows a list of your existing jobs. In the list, you can
activate/deactivate each job with a toggle in the "Active" column. You can
also edit and delete jobs with buttons on the right side of the list.

![SQL Scheduler overview](../_assets/img/cluster-sql-scheduler-overview.png)


**Logs** shows a list of *scheduled* job runs, whether they succeeded or
failed, their execution time and run time, and the error in case a run was
unsuccessful. For failed runs, a detail view shows the executed query and a
stack trace. You can filter the logs by status or by a specific job.

![SQL Scheduler logs](../_assets/img/cluster-sql-scheduler-logs.png)

### Examples

#### Cleanup of Old Files

Cleanup tasks are a common use case for automated jobs. This example deletes
records older than 30 days from a specified table once a day:

```sql
DELETE FROM "sample_data"
WHERE "timestamp_column" < NOW() - INTERVAL '30 days';
```

How often you run it is up to you, but once a day is common for cleanup
jobs. The following cron expression runs every day at 2:30 PM UTC:

Schedule: `30 14 * * *`
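
For reference, the five fields of the pattern map to the schedule like this:

```text
30 14 * * *
│  │  │ │ └── day of week (* = any)
│  │  │ └──── month (* = any)
│  │  └────── day of month (* = any)
│  └───────── hour (14 = 2:00 PM UTC)
└──────────── minute (30)
```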

![SQL Scheduler cleanup example](../_assets/img/cluster-sql-scheduler-example-cleanup.png)

#### Copying Logs into a Persistent Table

Another useful pattern is copying data to another table for archival
purposes. This example copies rows from the system logs table into one of
our own tables.

```sql
CREATE TABLE IF NOT EXISTS "logs"."persistent_jobs_log" (
    "classification" OBJECT (DYNAMIC),
    "ended" TIMESTAMP WITH TIME ZONE,
    "error" TEXT,
    "id" TEXT,
    "node" OBJECT (DYNAMIC),
    "started" TIMESTAMP WITH TIME ZONE,
    "stmt" TEXT,
    "username" TEXT,
    PRIMARY KEY (id)
) CLUSTERED INTO 1 SHARDS;

INSERT INTO "logs"."persistent_jobs_log"
SELECT *
FROM sys.jobs_log
ON CONFLICT ("id") DO NOTHING;
```

In this example, we schedule the job to run every hour:

Schedule: `0 * * * *`

![SQL Scheduler copying example](../_assets/img/cluster-sql-scheduler-example-copying.png)
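
#### Exporting Data on a Schedule

The use cases above also mention automating data exports. As a minimal
sketch (the bucket URI is a placeholder, and S3 credentials must be supplied
as described in the CrateDB `COPY TO` documentation), a nightly job could
write a table to external storage:

```sql
-- Write the table as gzip-compressed files to an S3 bucket;
-- 'my-bucket' and the path are illustrative placeholders.
COPY "doc"."sample_data"
TO DIRECTORY 's3://my-bucket/exports/sample_data'
WITH (compression = 'gzip');
```

Schedule: `0 3 * * *` (every day at 3:00 AM UTC)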

:::{note}
Limitations and Known Issues:
* Only one job can run at a time; subsequent jobs will be queued until the
current one completes.
* Long-running jobs may block the execution of queued jobs, leading to
potential delays.
:::


## Table Policies

Table policies automate maintenance operations for **partitioned tables**.
Automated actions run daily, based on a pre-configured ruleset.

![Table policy list](../_assets/img/cluster-table-policy.png)

### Overview

The table policy overview can be found in the left-hand navigation menu under
"Automation". From the list of policies, you can create, delete, edit, or
(de)activate them. Logs of executed policies can be found in the "Logs" tab.

![Table policy logs](../_assets/img/cluster-table-policy-logs.png)

A new policy can be created with the "Add New Policy" button.

![Table policy creation](../_assets/img/cluster-table-policy-create.png)

After naming the policy and selecting the tables/schemas it applies to, you
must specify the time column. This column, a timestamp used for partitioning,
determines the data affected by the policy. The time column must be
consistently present across all targeted tables/schemas. You can apply the
policy to tables that lack the specified time column, but it will not be
executed for those. If your tables use different timestamp columns, consider
setting up a separate policy for each to ensure accuracy.

:::{note}
The "Time Column" must be of type `TIMESTAMP`.
:::

Next, a condition determines the affected partitions. The system is
time-based: a partition is eligible for action if the value of its partition
column is smaller than (`<`), or smaller than or equal to (`<=`), the current
date minus `n` days, months, or years.

### Actions

The following actions are supported:
* **Delete:** Deletes eligible partitions along with their data.
* **Set replicas:** Changes the replication factor of eligible partitions.
* **Force merge:** Merges segments on eligible partitions down to a specified number.

After filling out the form, a preview shows the affected schemas/tables and
the number of partitions that would be affected if the policy were executed
at this very moment.

### Examples

Consider a scenario where you have a table and want to optimize space on your
cluster. Older data (e.g., older than 30 days) may already have been
snapshotted and is accessed only infrequently, meaning it is not used for
live analytics; it might therefore be sufficient for it to exist just once in
the cluster, without replication. Additionally, you may not want to retain
data older than 60 days.

Assume the following table schema:

```sql
CREATE TABLE data_table (
    ts TIMESTAMP,
    ts_day GENERATED ALWAYS AS date_trunc('day', ts),
    val DOUBLE
) PARTITIONED BY (ts_day);
```

For the outlined scenario, the policies would be as follows:

**Policy 1 - Saving replica space:**
* **Time Column:** `ts_day`
* **Condition:** `older than 30 days`
* **Actions:** `Set replicas to 0.`

**Policy 2 - Data removal:**
* **Time Column:** `ts_day`
* **Condition:** `older than 60 days`
* **Actions:** `Delete eligible partition(s)`
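
For illustration, the combined effect of these policies on eligible
partitions is roughly equivalent to running the following statements by hand
(a sketch only; the policy engine performs the actual work, and the partition
value shown is just an example):

```sql
-- Policy 1: drop replicas for a partition older than 30 days
ALTER TABLE data_table PARTITION (ts_day = '2024-07-01')
SET (number_of_replicas = 0);

-- Policy 2: delete data older than 60 days; filtering on the partition
-- column drops whole partitions rather than individual rows
DELETE FROM data_table
WHERE ts_day < NOW() - INTERVAL '60 days';
```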
83 changes: 83 additions & 0 deletions docs/cluster/backups.md
(cluster-backups)=
# Backups

The Backups page can be found in the detailed view of your cluster; it lists
all existing backups and lets you restore them.

By default, a backup is made every hour. Backups are kept for 14 days. In
addition, the last 14 backups are kept indefinitely, regardless of the state
of your cluster.

![Cloud Console cluster backups page](../_assets/img/cluster-backups.png)

You can also control the schedule of your backups by clicking the *Edit
backup schedule* button.

![Cloud Console cluster backups edit page](../_assets/img/cluster-backups-edit.png)

Here you can create a custom schedule by selecting any number of hour slots.
Backups will be created at the selected times. At least one backup per day is
mandatory.

To restore a particular backup, click the *Restore* button. A popup window
with a SQL statement will appear. Run this statement in your Admin UI
console, either by copy-pasting it or by clicking *Run query in Admin UI*.
The latter takes you directly to the Admin UI console with the statement
pre-filled.

![Cloud Console cluster backups restore page](../_assets/img/cluster-backups-restore.png)

You can choose between restoring the cluster fully or only specific tables.
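
The statement generated by the popup is typically a `RESTORE SNAPSHOT`
command. As a hypothetical example (the repository, snapshot, and table names
below are placeholders; the popup supplies the real ones):

```sql
-- Restore a single table from a snapshot; all names are illustrative only.
RESTORE SNAPSHOT "system_backup"."20240809120000"
TABLE "doc"."sample_data"
WITH (wait_for_completion = true);
```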

(cluster-cloning)=
## Cluster Cloning

Cluster cloning is the process of duplicating all the data from a specific
snapshot into a different cluster. Creating the new cluster isn't part of the
cloning process; you need to create the target cluster yourself. You can
clone a cluster from the Backups page.

![Cloud Console cluster backup snapshots](../_assets/img/cluster-backups.png)

Choose a snapshot and click the *Clone* button. As with restoring a backup,
you can choose between cloning the whole cluster or only specific tables.

![Cloud Console cluster clone popup](../_assets/img/cluster-clone-popup.png)

:::{note}
Keep in mind that a full cluster clone will include users, views, privileges,
and everything else. Cloning also doesn't distinguish between cluster plans,
meaning you can clone from a CR2 to a CR1 plan or any other combination.
:::

(cluster-cloning-fail)=
## Failed cloning

Cloning can fail or behave unexpectedly under the following circumstances:

- Tables with the same names already exist in the target cluster as in the
  source snapshot. In this case, the entire clone operation fails.
- There isn't enough storage left on the target cluster to accommodate the
  tables you're trying to clone. In this case, the clone may be incomplete
  because the cluster runs out of storage.
- You're trying to clone an invalid or no-longer-existing snapshot. This can
  happen when cloning through
  [Croud](https://cratedb.com/docs/cloud/cli/en/latest/). In this case, the
  cloning fails.
- You're trying to restore a table that is not included in the snapshot.
  This can happen when restoring snapshots through
  [Croud](https://cratedb.com/docs/cloud/cli/en/latest/). In this case, the
  cloning fails.

A failed clone is indicated by a banner on the cluster overview screen.

![Cloud Console cluster failed cloning](../_assets/img/cluster-clone-failed.png)
30 changes: 30 additions & 0 deletions docs/cluster/console.md
(cluster-console)=
# Console

The Console in CrateDB Cloud allows users to execute SQL queries seamlessly
against their CrateDB cluster. Users with the "Organization Admin" role can
access the Console from the left-hand navigation menu within a cluster.

- **Table and Schema Tree View:** Easily navigate through your database
structure.
- **Client-Side Query Validation:** Ensure your SQL queries are correct before
execution.
- **Multiple Query Execution:** Run several queries in sequence.
- **Query History:** Access and manage your past queries.

:::{important}
- The Console is available for all newly deployed clusters.
- For older clusters, this feature can be enabled on demand. Contact
[support](https://support.crate.io/) for activation.

The Console currently utilizes a dedicated database user `gc_admin` with full
cluster privileges.
:::

:::{note}
**Multi-Query Execution:**
When running multiple queries at once, the Console executes them
sequentially, not within a single session or transaction. If one query fails,
the subsequent queries will not be executed. Session settings are currently
not persisted between queries.
:::
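
As a small illustration of sequential execution (the table name is
hypothetical), the following statements can be submitted together; the
`INSERT` runs only if the `CREATE TABLE` before it succeeds:

```sql
-- Submitted as one multi-query; executed in order, stopping on failure.
CREATE TABLE IF NOT EXISTS "doc"."console_demo" (
    "id" INTEGER PRIMARY KEY,
    "name" TEXT
);
INSERT INTO "doc"."console_demo" ("id", "name") VALUES (1, 'example');
```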
27 changes: 27 additions & 0 deletions docs/cluster/export.md
(cluster-export)=
# Export

The "Export" section allows users to download specific tables/views. When you
first visit the Export tab, you can specify the name of a table/view,
format (CSV, JSON, or Parquet) and whether you'd like your data to be
gzip compressed (recommended for CSV and JSON files).

:::{important}
- The size limit for exports is 1 GiB.
- Exports are kept for 3 days, then automatically deleted.
:::

:::{note}
**Limitations with Parquet**:
Parquet is a highly compressed data format for very efficient storage of
tabular data. Note that OBJECT and ARRAY columns in CrateDB are JSON-encoded
when exported to Parquet (effectively saving them as strings). This is due to
the complexity of encoding structs and lists in the Parquet format, where
determining the exact schema might not be possible. When re-importing such a
Parquet file, make sure to pre-create the table with the correct schema.
:::
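
For example, if an exported table contained an OBJECT column, you could
pre-create the target table before re-importing (a sketch; the table and
column names are hypothetical):

```sql
-- OBJECT data is stored in Parquet as JSON-encoded strings; pre-creating
-- the table ensures the intended column types are in place on re-import.
CREATE TABLE IF NOT EXISTS "doc"."sensor_readings" (
    "ts" TIMESTAMP WITH TIME ZONE,
    "payload" OBJECT (DYNAMIC)
);
```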



