The following sections provide an overview of the ZetaSQL data model.
ZetaSQL data is stored in tables. Each table consists of an ordered list of columns and a number of rows. Each column has a name used to identify it through SQL statements, and is assigned a specific data type.
For example, the following table, Singers, is a standard SQL table.
Column Name | Data Type | Default Value |
---|---|---|
SingerId | INT64 |
<auto-increment> |
FirstName | STRING |
|
LastName | STRING |
|
BirthDate | DATE |
|
Status | STRING |
"active" |
SingerInfo | PROTO<SingerMetadata> |
|
Albums | PROTO<Album> |
The proto, SingerMetadata
, has the following definition:
message SingerMetadata {
optional string nationality = 1;
repeated Residence residence = 2;
message Residence {
required int64 start_year = 1;
optional int64 end_year = 2;
optional string city = 3;
optional string country = 4 [default = "USA"];
}
}
A SELECT *
statement on this table would return rows similar to the following:
SingerId | FirstName | LastName | BirthDate | Status | SingerInfo |
---|---|---|---|---|---|
1 | Marc | Richards | 1970-09-03 | active | {nationality: "England"} |
2 | Catalina | Smith | 1990-08-17 | inactive | {nationality: "U.S.A."} |
3 | Lea | Martin | 1991-11-09 | active | {nationality: "Australia"} |
4 | Xanathe | Riou | 1995-05-23 | inactive | {nationality: U.S.A."} |
While tables do not have a type, some operations will construct an implicit
STRUCT
type out of a SQL row, using the column names and types for field
definitions.
For more information on the data types ZetaSQL supports, see Data Types.
Constraints require that any writes to one or more columns, such as inserts or updates, conform to certain rules. Data manipulation language (DML) statements enforce constraints. ZetaSQL supports the following constraints:
-
Primary key constraint. A primary key consists of one or more columns, and specifies that the value of each row of these combined columns must be unique within that table. A table can contain at most one primary key constraint.
Some data manipulation language (DML) keywords may require the existence of a primary key. ZetaSQL also implicitly builds an index on the primary key. The default order of this index is ascending. The primary key can contain
NULL
values. -
Unique constraint. Specifies that one or more columns must contain only unique values. Unlike a primary key, more than one unique constraint can exist on a table.
An index allows the database engine to query a column or set of columns more quickly. You can specify that sort order is ascending or descending. A unique or primary key index defines an indexed column that is subject to the uniqueness constraint.
ZetaSQL tables support pseudocolumns. Pseudocolumns contain data elements that you can query like regular columns, but are not considered real columns in the table. Pseudocolumn values may not be physically stored with each row, but the query engine will materialize a value for the column using some appropriate mechanism.
For example, an engine might support a pseudocolumn called ROWNUM
, which
returns a number indicating the order in which a row was returned. You can then
construct a query like this:
SELECT ROWNUM, SingerId, FirstName, LastName FROM Singers
WHERE Status = "active"
Here's an example of rows returned by this query:
ROWNUM | SingerId | FirstName | LastName |
---|---|---|---|
1 | 1 | Marc | Richards |
2 | 3 | Lea | Martin |
In this case,the schema for the Singers table does not define a column,
ROWNUM
. Instead, the engine materializes the data only when requested.
To return a value of a pseudocolumn, you must specify it in your query.
Pseudocolumns do not show up in SELECT *
statements. For example:
SELECT * FROM singers
This query will return all named columns in the table, but won't include
pseudocolumns such as ROWNUM
.
In addition to standard SQL tables, ZetaSQL supports value tables.
In a value table, rather than having rows made up of a list of columns, each row
is a single value of type STRUCT
, and there are no column names.
In the following example, a value table for a STRUCT
is produced with the
SELECT AS VALUE
statement:
SELECT * FROM (SELECT AS VALUE STRUCT(123 AS a, FALSE AS b))
/*-----+-------*
| a | b |
+-----+-------+
| 123 | FALSE |
*-----+-------*/
Value tables are common when working with protocol buffers that may be stored in files instead of in the database.
For example, the following protocol buffer definition, AlbumReview
, contains
data about the reviews for an album.
message AlbumReview {
optional string albumtitle = 1;
optional string reviewer = 2;
optional string review = 3;
}
A list of AlbumReview
protocol buffers is stored in a file, AlbumReviewData
.
{albumtitle: "Songs on a Broken Banjo", reviewer: "Dan Starling", review: "Off key"}
{albumtitle: "Six and Seven", reviewer: "Alice Wayfarer", review: "Hurt my ears!"}
{albumtitle: "Go! Go! Go!", reviewer: "Eustace Millson", review: "My kids loved it!"}
The following query returns a stream of rows, with each row a value of type
AlbumReview
.
SELECT a FROM AlbumReviewsData AS a
To get specific data, such as all album titles in
the table, you have two options. You can specify albumtitle
as a protocol
buffer field:
SELECT a.albumtitle FROM AlbumReviewsData AS a
You can also access the top-level fields inside the value like columns in a table:
SELECT albumtitle FROM AlbumReviewsData
You can use ZetaSQL to return query results as a value table. This
is useful when you want to store a query result with a
PROTO
or type as a
table. To return a query result as a value table, use one of the following
statements:
Value tables can also occur as the output of the UNNEST
operator or a subquery. The WITH
clause
introduces a value table if the subquery used produces a value table.
In contexts where a query with exactly one column is expected, a value table query can be used instead. For example, scalar and array subqueries normally require a single-column query, but in ZetaSQL, they also allow using a value table query.
Most commonly, value tables are used for protocol buffer value tables, where the table contains a stream of protocol buffer values. In this case, the top-level protocol buffer fields can be used in the same way that column names are used when querying a regular table.
In some cases you might not want to work with the data within a protocol buffer, but with the protocol buffer itself.
Using SELECT AS VALUE
can help you keep your ZetaSQL statements as
simple as possible. To illustrate this, consider the
AlbumReview example specified earlier. To create a new
table from this data, you could write:
CREATE TABLE Reviews AS
SELECT albumreviews FROM AlbumReviewData AS albumreviews;
This statement creates a table that has a single column,
albumreviews
, which has a protocol buffer value of type
AlbumReviewData
. To retrieve all album titles from this table, you'd need to
write a query similar to:
SELECT r.albumreviews.albumtitle
FROM Reviews AS r;
Now, consider the same initial CREATE TABLE
statement, this time modified to
use SELECT AS VALUE
:
CREATE TABLE Reviews AS
SELECT AS VALUE albumreviews FROM AlbumReview AS albumreviews;
This statement creates a value table, instead of table. As a result, you can query any protocol buffer field as if it was a column. Now, if you want to retrieve all album titles from this table, you can write a much simpler query:
SELECT albumtitle
FROM Reviews;
In SET
operations like UNION ALL
you can combine tables with value tables,
provided that the table consists of a single column with a type that matches the
value table's type. The result of these operations is always a value table.
For example, consider the following definition for a table,
SingersAndAlbums
.
Column Name | Data Type |
---|---|
SingerId | INT64 |
AlbumId | INT64 |
AlbumReview | PROTO<AlbumReview> |
Next, we have a file, AlbumReviewData
that contains a list of AlbumReview
protocol buffers.
{albumtitle: "Songs on a Broken Banjo", reviewer: "Dan Starling", review: "Off key"}
{albumtitle: "Six and Seven", reviewer: "Alice Wayfarer", review: "Hurt my ears!"}
{albumtitle: "Go! Go! Go!", reviewer: "Eustace Millson", review: "My kids loved it!"}
The following query combines the AlbumReview
data from the
SingersAndAlbums
table with the data stored in the AlbumReviewData
file and
stores it in a new value table, AllAlbumReviews
.
SELECT AS VALUE sa.AlbumReview FROM SingersAndAlbums AS sa
UNION ALL
SELECT a FROM AlbumReviewData AS a
In most cases, pseudocolumns work the same with value tables as they do with tables.
For example, consider the following query and its results. This example works
because a
is an alias of the table AlbumReviewData
, and this table has a
ROWNUM
pseudocolumn. As a result, AlbumReviewData AS a
represents the
scanned rows, not the value.
-- This works
SELECT a.ROWNUM, a.albumtitle AS title FROM AlbumReviewData AS a
/*--------+---------------------------*
| ROWNUM | title |
+--------+---------------------------+
| 1 | "Songs on a Broken Banjo" |
| 2 | "Six and Seven" |
| 3 | "Go! Go! Go!" |
*--------+---------------------------*/
However, if you try to construct the query as follows, the query does not work.
The reason it fails is because the subquery,
SELECT a FROM AlbumReviewData AS a
, returns the AlbumReviewData
value only,
and this value does not have a field called ROWNUM
.
-- This fails
SELECT a.ROWNUM, a.albumtitle AS title FROM (SELECT a FROM AlbumReviewData AS a)
A property graph is a directed graph that includes the following parts:
- Nodes: Each node has a set of labels and properties. Each label has a name identifier and determines a set of properties. Each property has a name identifier and a value type.
- Edges: Similar to nodes, each edge has a set of labels and properties. Additionally, directed edges include source nodes and destination nodes.
- Labels: A label is identified by a unique name in the property graph and determines a set of properties. Nodes and edges can expose the same set of properties, using the same label.
- Properties: A property is identified by a unique name in a property graph. Properties declared on any label on a node or edge table are declared across the whole enclosing property graph, so they must always be consistent.
To create a property graph on top of a relational dataset, see the CREATE PROPERTY GRAPH statement in the DDL.