Virtual Schema API Documentation

Introduction

In the Exasol database, a Virtual Schema adapter script is similar to a UDF script. The adapter script defines a callback function and the Exasol core calls this function in different situations, for example when creating a Virtual Schema or during query processing. In order to communicate with the Virtual Schema adapter script, the Exasol database passes a JSON parameter containing the Virtual Schema API request to the callback function which then returns a JSON string containing the response. This page documents these JSON messages.
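
As a sketch of these mechanics, a script-based adapter could implement the callback roughly as follows. This only illustrates the JSON exchange; the entry-point name adapter_call and the choice of Python are assumptions for this sketch, not something this API prescribes.

import json

def adapter_call(request_json):
    # Illustrative callback: the database passes the request as a JSON string
    # and expects a JSON string as the response.
    request = json.loads(request_json)
    request_type = request["type"]
    if request_type == "createVirtualSchema":
        response = {"type": "createVirtualSchema", "schemaMetadata": {"tables": []}}
    elif request_type == "dropVirtualSchema":
        response = {"type": "dropVirtualSchema"}
    elif request_type == "getCapabilities":
        response = {"type": "getCapabilities", "capabilities": []}
    else:
        raise ValueError("Unsupported request type: " + request_type)
    return json.dumps(response)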

Virtual Schema Query Process

To get a better understanding, let's take a look at how the Exasol database processes a Virtual Schema query:

(Diagram: virtual schema query processing)

The diagram shows how Exasol handles Virtual Schema queries:

  • When the core receives an SQL query on a Virtual Schema table, it first checks the capabilities of the corresponding Virtual Schema adapter. Based on that information it removes all functions and literals that are not supported by the adapter.
  • Next, the Exasol Core sends a query to the Virtual Schema adapter as a PushdownRequest.
  • The Virtual Schema adapter now rewrites the query into a new SQL statement that typically invokes the Exasol importer (IMPORT INTO ...). For details see the IMPORT statement documentation. The importer statement contains the query to the external database as a string.
  • Next, the Exasol database parses this statement again and invokes the importer.
  • Finally, the Exasol core applies the functions that the remote database did not support itself as post-processing and returns the result to the SQL client.

Instead of the IMPORT statement, the adapter can also create other SQL statements. The important part is that the Virtual Schema adapter receives an SQL statement and rewrites it into another SQL statement. For example, it can create a SELECT ... FROM VALUES statement with an inlined result, as sketched below.
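
A purely illustrative sketch of that idea, assuming the rows were already fetched and rendered as SQL literals (the exact VALUES syntax is not prescribed by this document):

def inline_result(column_names, literal_rows):
    # Sketch: turn a small, already-fetched result into a SELECT with inlined VALUES.
    # literal_rows is a list of tuples of SQL literals, e.g. [("1", "'a'"), ("2", "'b'")].
    values = ", ".join("(" + ", ".join(row) + ")" for row in literal_rows)
    columns = ", ".join(column_names)
    return "SELECT * FROM (VALUES " + values + ") AS t(" + columns + ")"

# Produces: SELECT * FROM (VALUES (1, 'a'), (2, 'b')) AS t(ID, NAME)
print(inline_result(["ID", "NAME"], [("1", "'a'"), ("2", "'b'")]))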

Requests and Responses

There are the following request and response types:

  • Create Virtual Schema: called for each CREATE VIRTUAL SCHEMA ... statement.
  • Refresh: called for each ALTER VIRTUAL SCHEMA ... REFRESH ... statement.
  • Set Properties: called for each ALTER VIRTUAL SCHEMA ... SET ... statement.
  • Drop Virtual Schema: called for each DROP VIRTUAL SCHEMA ... statement.
  • Get Capabilities: called whenever a virtual table is queried in a SELECT statement.
  • Pushdown: called whenever a virtual table is queried in a SELECT statement.

We describe each of these types in the following sections. Please note: to keep the documentation concise, elements that appear in multiple requests and responses, e.g. schemaMetadataInfo and schemaMetadata, are defined in separate sections below.

Create Virtual Schema

Informs the Adapter about the request to create a Virtual Schema, and asks the Adapter for the metadata (tables and columns).

The Adapter is allowed to throw an exception if the user did not provide mandatory properties or in case of any other problems (e.g. connectivity).

Request:

{
    "type": "createVirtualSchema",
    "schemaMetadataInfo": {
        ...
    }
}

Response:

{
    "type": "createVirtualSchema",
    "schemaMetadata": {
        ...
    }
}

Notes

  • schemaMetadata is mandatory. However, it is allowed to contain no tables.
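
A minimal sketch of producing such a response; the table and column definitions are hard-coded for illustration, a real adapter would derive them from the remote data source:

import json

def handle_create_virtual_schema(request):
    # Sketch: answer a createVirtualSchema request with one example table.
    schema_metadata = {
        "tables": [
            {
                "type": "table",
                "name": "CLICKS",
                "columns": [
                    {"name": "USER_ID",
                     "dataType": {"type": "DECIMAL", "precision": 18, "scale": 0}},
                    {"name": "URL",
                     "dataType": {"type": "VARCHAR", "size": 1000}}
                ]
            }
        ]
    }
    return json.dumps({"type": "createVirtualSchema",
                       "schemaMetadata": schema_metadata})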

Refresh

Request to refresh the metadata for the whole Virtual Schema, or for specified tables.

Request:

{
    "type": "refresh",
    "schemaMetadataInfo": {
        ...
    },
    "requestedTables": ["T1", "T2"]
}

Notes

  • requestedTables is optional. If present, only the specified tables shall be refreshed. The specified tables do not have to exist; the list just tells the Adapter to update these tables (which might be changed, deleted, added, or non-existing).

Response:

{
    "type": "refresh",
    "schemaMetadata": {
        ...
    },
    "requestedTables": ["T1", "T2"]
}

Notes

  • schemaMetadata is optional. It can be skipped if the adapter does not want to refresh (e.g. because it detected that there is no change).
  • requestedTables must exist if and only if the element existed in the request. The values must be the same as in the request (to make sure that the Adapter only refreshed these tables). A handling sketch follows below.
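
A sketch of how an adapter might honor requestedTables; the metadata-reading helper is hypothetical:

import json

def handle_refresh(request, read_remote_table_metadata):
    # Sketch: read_remote_table_metadata is a hypothetical callable returning
    # the current table definitions; passing None means "refresh everything".
    requested = request.get("requestedTables")
    tables = read_remote_table_metadata(requested)
    response = {"type": "refresh", "schemaMetadata": {"tables": tables}}
    if requested is not None:
        # Echo the requested tables back unchanged, as required by the API.
        response["requestedTables"] = requested
    return json.dumps(response)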

Set Properties

Request to set and unset properties. The Adapter can decide whether it needs to send back new metadata. The Adapter is allowed to throw an Exception if the user provided invalid properties or in case of any other problems (e.g. connectivity).

Request:

{
    "type": "setProperties",
    "properties": {
        "JDBC_CONNECTION_STRING": "new-jdbc-connection-string",
        "NEW_PROPERTY": "value of a not yet existing property",
        "UNSET_PROPERTY": null
    },
    "schemaMetadataInfo": {
        ...
        "properties": {
            "JDBC_CONNECTION_STRING": "current-jdbc-connection-string",
            "THIS_REMAINS_UNCHANGED": "not listed in the section above, so the value remains as it is"
        }
    }
}

The new values appear in a separate object /properties directly in the root element. For reference, the current property values are still reported under /schemaMetadataInfo/properties.

Adapters must apply the properties incrementally, meaning that all properties not explicitly listed in /properties remain unchanged.
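
A sketch of this incremental merge, using the property names from the example above:

def merge_properties(current_properties, changed_properties):
    # Sketch: apply a setProperties request incrementally. Properties set to
    # None (JSON null) are removed, new values overwrite old ones, and
    # everything not mentioned stays unchanged.
    merged = dict(current_properties)
    for name, value in changed_properties.items():
        if value is None:
            merged.pop(name, None)   # unset, even if it never existed
        else:
            merged[name] = value
    return merged

old = {"JDBC_CONNECTION_STRING": "current-jdbc-connection-string",
       "THIS_REMAINS_UNCHANGED": "kept as is"}
new = {"JDBC_CONNECTION_STRING": "new-jdbc-connection-string",
       "NEW_PROPERTY": "value of a not yet existing property",
       "UNSET_PROPERTY": None}
print(merge_properties(old, new))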

Response:

{
    "type": "setProperties",
    "schemaMetadata": {
        ...
    }
}

Notes

  • Request: A property set to null means that the user asked to delete this property. Properties set to null might also not have existed before.
  • Response: schemaMetadata is optional. It only exists if the adapter wants to send back new metadata. The existing metadata are overwritten completely.

Drop Virtual Schema

Inform the Adapter that a Virtual Schema is about to be dropped. The Adapter can update external dependencies if it has any. The Adapter is not expected to throw an exception, and if it does, it will be ignored.

Request:

{
    "type": "dropVirtualSchema",
    "schemaMetadataInfo": {
        ...
    }
}

Response:

{
    "type": "dropVirtualSchema"
}

Get Capabilities

Request the list of capabilities supported by the Adapter. Based on these capabilities, the database will collect everything that can be pushed down in the current query and send a pushdown request afterwards.

Request:

{
    "type": "getCapabilities",
    "schemaMetadataInfo": {
        ...
    }
}

Response:

{
    "type": "getCapabilities",
    "capabilities": [
        "ORDER_BY_COLUMN",
        "AGGREGATE_SINGLE_GROUP",
        "LIMIT",
        "AGGREGATE_GROUP_BY_TUPLE",
        "FILTER_EXPRESSIONS",
        "SELECTLIST_EXPRESSIONS",
        "SELECTLIST_PROJECTION",
        "AGGREGATE_HAVING",
        "ORDER_BY_EXPRESSION",
        "AGGREGATE_GROUP_BY_EXPRESSION",
        "LIMIT_WITH_OFFSET",
        "AGGREGATE_GROUP_BY_COLUMN",
        "FN_PRED_LESSEQUALS",
        "FN_AGG_COUNT",
        "LITERAL_EXACTNUMERIC",
        "LITERAL_DATE",
        "LITERAL_INTERVAL",
        "LITERAL_TIMESTAMP_UTC",
        "LITERAL_TIMESTAMP",
        "LITERAL_NULL",
        "LITERAL_STRING",
        "LITERAL_DOUBLE",
        "LITERAL_BOOL"
    ]
}

The set of capabilities in the example above would be sufficient to push down all aspects of the following query:

SELECT user_id, COUNT(url)
FROM   vs.clicks
WHERE  user_id>1
GROUP  BY user_id
HAVING count(url)>1
ORDER  BY user_id
LIMIT  10;

Capability Prefixes

See also List of supported Capabilities.

  • Main Capabilities: no prefix, defined in MainCapability.java
  • Literal Capabilities: prefix LITERAL_, defined in LiteralCapability.java
  • Predicate Capabilities: prefix FN_PRED_, defined in PredicateCapability.java
  • Scalar Function Capabilities: prefix FN_, defined in ScalarFunctionCapability.java
  • Aggregate Function Capabilities: prefix FN_AGG_, defined in AggregateFunctionCapability.java

Dependent Capabilities and Capability Groups

There are capabilities that come in groups, sharing the same group prefix. An example is the FN_AGG_COUNT group. In this case there is a base capability that needs to be set as a precondition of setting any other capability in that group.

As an example, to allow push-down of SELECT COUNT(*) the Virtual Schema adapter needs to report FN_AGG_COUNT + FN_AGG_COUNT_STAR.

Similarly a left join only works if the VS adapter reports JOIN + JOIN_TYPE_LEFT_OUTER.

If the group has more members, you can combine any of them, but the base capability is the key to enabling the other capabilities in that group.
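
As an illustration, an adapter that wants to allow COUNT(*) and left outer joins could report the groups together with their base capabilities like this (the capability names come from the API, the surrounding code is only a sketch):

import json

def handle_get_capabilities(request):
    # Sketch: report capability groups together with their base capability.
    capabilities = [
        "SELECTLIST_PROJECTION",
        "AGGREGATE_SINGLE_GROUP",
        "FN_AGG_COUNT",            # base capability of the COUNT group
        "FN_AGG_COUNT_STAR",       # enables COUNT(*), requires FN_AGG_COUNT
        "JOIN",                    # base capability of the join group
        "JOIN_TYPE_LEFT_OUTER",    # enables LEFT OUTER JOIN, requires JOIN
    ]
    return json.dumps({"type": "getCapabilities", "capabilities": capabilities})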

Pushdown

Contains an abstract specification of what is to be pushed down, and requests a pushdown SQL statement from the Adapter which can be used to retrieve the requested data.

Request:

Running the following query

SELECT user_id, COUNT(url)
FROM   vs.clicks
WHERE  user_id>1
GROUP  BY user_id
HAVING count(url)>1
ORDER  BY user_id
LIMIT  10;

will produce the following Request, assuming that the Adapter has all required capabilities.

{
    "type": "pushdown",
    "pushdownRequest": {
        "type" : "select",
        "aggregationType" : "group_by",
        "from" :
        {
            "type" : "table",
            "name" : "CLICKS"
        },
        "selectList" :
        [
            {
                "type" : "column",
                "name" : "USER_ID",
                "columnNr" : 1,
                "tableName" : "CLICKS"
            },
            {
                "type" : "function_aggregate",
                "name" : "count",
                "arguments" :
                [
                    {
                        "type" : "column",
                        "name" : "URL",
                        "columnNr" : 2,
                        "tableName" : "CLICKS"
                    }
                ]
            }
        ],
        "selectListDataTypes" :
        [
            {
                "precision" : 18,
                "scale" : 0,
                "type" : "DECIMAL"
            },
            {
                "precision" : 10,
                "scale" : 0,
                "type" : "DECIMAL"
            }
        ],
        "filter" :
        {
            "type" : "predicate_less",
            "left" :
            {
                "type" : "literal_exactnumeric",
                "value" : "1"
            },
            "right" :
            {
                "type" : "column",
                "name" : "USER_ID",
                "columnNr" : 1,
                "tableName" : "CLICKS"
            }
        },
        "groupBy" :
        [
            {
                "type" : "column",
                "name" : "USER_ID",
                "columnNr" : 1,
                "tableName" : "CLICKS"
            }
        ],
        "having" :
        {
            "type" : "predicate_less",
            "left" :
            {
                "type" : "literal_exactnumeric",
                "value" : "1"
            },
            "right" :
            {
                "type" : "function_aggregate",
                "name" : "count",
                "arguments" :
                [
                    {
                        "type" : "column",
                        "name" : "URL",
                        "columnNr" : 2,
                        "tableName" : "CLICKS"
                    }
                ]
            }
        },
        "orderBy" :
        [
            {
                "type" : "order_by_element",
                "expression" :
                {
                    "type" : "column",
                    "columnNr" : 1,
                    "name" : "USER_ID",
                    "tableName" : "CLICKS"
                },
                "isAscending" : true,
                "nullsLast" : true
            }
        ],
        "limit" :
        {
            "numElements" : 10
        }
    },
    "involvedTables": [
    {
        "name" : "CLICKS",
        "columns" :
        [
            {
                "name" : "ID",
                "dataType" :
                {
                    "type" : "DECIMAL",
                    "precision" : 18,
                    "scale" : 0
                }
            },
            {
                "name" : "USER_ID",
                "dataType" :
                {
                   "type" : "DECIMAL",
                   "precision" : 18,
                    "scale" : 0
                }
            },
            {
                "name" : "URL",
                "dataType" :
                {
                   "type" : "VARCHAR",
                   "size" : 1000
                }
            },
            {
                "name" : "REQUEST_TIME",
                "dataType" :
                {
                    "type" : "TIMESTAMP"
                }
            }
        ]
    }
    ],
    "schemaMetadataInfo": {
        ...
    }
}

Notes

  • pushdownRequest: Specification of what needs to be pushed down. You can think of it like a parsed SQL statement.
    • from: The requested from clause. This can be a table or a join.
    • selectList: The requested select list. There are three options for this field (see the sketch after this list):
      • selectList is not given: This means SELECT *. Adapters requiring an explicit select list can use the field from to get the names and aliases of the tables (depth-first search on joins) and the field involvedTables to get the columns for each table.
      • selectList is an empty array: Select an arbitrary column/expression. This is used, for example, if a query cannot be pushed down completely. The adapter may choose something like SELECT TRUE to get the correct number of rows.
      • Otherwise, selectList contains the requested select list elements, a list of expressions. The order of the elements matters.
    • selectListDataTypes: List of data types expected in the result set. This information is only provided by Exasol databases with version 8.6.0 or later, or with version 7.1.14 or later within major version 7; see the Exasol Data Types API Documentation. The data types are retrieved by the database's compiler from the virtual schema's virtual table definitions and sent with the request. These definitions are stored when creating or refreshing the virtual schema.
    • filter: The requested filter (where clause), a single expression.
    • aggregationType: An optional element, set if an aggregation is requested. Either group_by, or single_group if an aggregate function is used but there is no group by.
    • groupBy: The requested group by clause, a list of expressions.
    • having: The requested having clause, a single expression.
    • orderBy: The requested order-by clause, a list of order_by_element elements.
    • limit: The requested limit of the result set, with an optional offset.
  • involvedTables: Metadata of the involved tables, encoded like in schemaMetadata.
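
The following sketch shows one way an adapter could derive the select-list part of the pushdown SQL from the three selectList cases described above. The render_expression helper is assumed to render a single expression as SQL; a minimal sketch of such a helper appears later in the Expressions section.

def render_select_list(pushdown_request, render_expression):
    # Sketch: turn the "selectList" field into SQL text.
    select_list = pushdown_request.get("selectList")
    if select_list is None:
        return "*"       # selectList not given: SELECT *
    if select_list == []:
        return "TRUE"    # any expression works; only the row count matters
    return ", ".join(render_expression(element) for element in select_list)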

Response:

Following the example above, a valid result could look like this:

{
    "type": "pushdown",
    "sql": "IMPORT FROM JDBC AT 'jdbc:exa:remote-db:8563;schema=native' USER 'sys' IDENTIFIED BY 'exasol' STATEMENT 'SELECT USER_ID, count(URL) FROM NATIVE.CLICKS WHERE 1 < USER_ID GROUP BY USER_ID HAVING 1 < count(URL) ORDER BY USER_ID LIMIT 10'"
}

Notes

  • sql: The pushdown SQL statement. It must be either a SELECT or an IMPORT statement.
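
Since the remote query is embedded as a string literal inside the IMPORT statement, single quotes within it have to be doubled. A sketch of wrapping a generated remote query; the connection details are placeholders taken from the example above:

def wrap_in_import(remote_query, connection_string, user, password):
    # Sketch: wrap the generated remote query into an IMPORT FROM JDBC statement.
    # Single quotes in the embedded statement are escaped by doubling them.
    escaped_query = remote_query.replace("'", "''")
    return ("IMPORT FROM JDBC AT '{conn}' USER '{user}' IDENTIFIED BY '{pwd}' "
            "STATEMENT '{stmt}'").format(conn=connection_string, user=user,
                                         pwd=password, stmt=escaped_query)

print(wrap_in_import("SELECT USER_ID, count(URL) FROM NATIVE.CLICKS",
                     "jdbc:exa:remote-db:8563;schema=native", "sys", "exasol"))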

Embedded Commonly Used JSON Elements

The following JSON objects can be embedded in a request or response. They have a fixed structure.

Schema Metadata Info

This document contains the most important metadata of the virtual schema and is sent to the adapter just "for information" with each request. It is the value of an element called schemaMetadataInfo.

{
    "schemaMetadataInfo": {
        "name": "MY_HIVE_VSCHEMA",
        "adapterNotes": "<serialized adapter state>",
        "properties": {
            "HIVE_SERVER": "my-hive-server",
            "HIVE_DB": "my-hive-db",
            "HIVE_USER": "my-hive-user"
        }
    }
}

Schema Metadata

This document is usually embedded in responses from the Adapter and informs the database about all metadata of the Virtual Schema, especially the contained Virtual Tables and their columns.

The Adapter can optionally store so-called adapterNotes on each level (schema, table, column) to remember information which might be relevant for the Adapter in the future. Adapter notes are simple strings. You can of course serialize objects into those strings, but keep in mind that the strings are embedded inside the Virtual Schema JSON protocol, which makes quoting of conflicting characters necessary.

Some options to deal with the embedding issue:

  1. After serialization use Base64 encoding or
  2. Use a serialization that does not have conflicting characters like a simple CSV or key-value format or
  3. Quote conflicting characters

Which variant you should choose depends on considerations like the amount of data to be transmitted, the original data format, and the encoding overhead.
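
A sketch of option 1, serializing adapter state to JSON and Base64-encoding it so the resulting string contains no characters that need quoting:

import base64
import json

def encode_adapter_notes(state):
    # Sketch: serialize arbitrary adapter state to JSON, then Base64-encode it.
    return base64.b64encode(json.dumps(state).encode("utf-8")).decode("ascii")

def decode_adapter_notes(notes):
    # Inverse of encode_adapter_notes.
    return json.loads(base64.b64decode(notes).decode("utf-8"))

notes = encode_adapter_notes({"lastRefreshed": "2015-03-01 12:10:01"})
print(notes, decode_adapter_notes(notes))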

In the example below, the Adapter remembers the table partitioning and the data type of a column which is not directly supported in Exasol. The Adapter has this information available during push-down and can then consider the table partitioning or add an appropriate cast for the column.

This example also demonstrates serialization in adapter notes via key-value encoding. As mentioned above, more sophisticated serializations are possible as long as you make sure the adapter notes are a valid string in the JSON format by encoding or quoting.

{"schemaMetadata":{
    "adapterNotes": "lastRefreshed=2015-03-01 12:10:01;anotherKey=More custom schema state here",
    "tables": [
    {
        "type": "table",
        "name": "EXASOL_CUSTOMERS",
        "adapterNotes": "hivePartitionColumns=CREATED,COUNTRY_ISO",
        "columns": [
        {
            "name": "ID",
            "dataType": {
                "type": "DECIMAL",
                "precision": 18,
                "scale": 0
            },
            "isIdentity": true
        },
        {
            "name": "COMPANY_NAME",
            "dataType": {
                "type": "VARCHAR",
                "size": 1000,
                "characterSet": "UTF8"
            },
            "default": "foo",
            "isNullable": false,
            "comment": "The official name of the company",
            "adapterNotes": "hiveDataType=List<String>"
        },
        {
            "name": "DISCOUNT_RATE",
            "dataType": {
                "type": "DOUBLE"
            }
        }
        ]
    },
    {
        "type": "table",
        "name": "TABLE_2",
        "columns": [
        {
            "name": "COL1",
            "dataType": {
                "type": "DECIMAL",
                "precision": 18,
                "scale": 0
            }
        },
        {
            "name": "COL2",
            "dataType": {
                "type": "VARCHAR",
                "size": 1000
            }
        }
        ]
    }
    ]
}}

Expressions

This section handles the expressions that can occur in a pushdown request. Expressions are consistently encoded in the following way. This allows easy and consistent parsing and serialization.

{
    "type": "<type-of-expression>",
    ...
}

Each expression type can have any number of additional fields of arbitrary type. In the following sections we define the known expressions.
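
Because every expression carries a type field, adapters typically render or transform expressions with a recursive dispatch on that field. A minimal sketch of such a helper, covering only a few of the expression types defined below:

def render_expression(expression):
    # Sketch: recursively render a pushdown expression as SQL text.
    expression_type = expression["type"]
    if expression_type == "column":
        return expression["name"]
    if expression_type == "literal_exactnumeric":
        return expression["value"]
    if expression_type == "literal_string":
        return "'" + expression["value"].replace("'", "''") + "'"
    if expression_type == "predicate_less":
        return (render_expression(expression["left"]) + " < "
                + render_expression(expression["right"]))
    if expression_type == "function_aggregate":
        arguments = ", ".join(render_expression(argument)
                              for argument in expression["arguments"])
        return expression["name"] + "(" + arguments + ")"
    raise ValueError("Unsupported expression type: " + expression_type)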

Table

This element currently occurs only in the from clause.

{
    "type": "table",
    "name": "CLICKS",
    "alias": "A"
}

Notes

  • alias: This is an optional property and is added if the table has an alias in the original query.

Join

This element currently occurs only in the from clause.

{
    "type": "join",
    "join_type": "inner",
    "left": {
        ...
    },
    "right" : {
        ...
    },
    "condition" : {
        ...
    }
}

Notes

  • join_type: Can be inner, left_outer, right_outer or full_outer.
  • left: This can be a table or a join.
  • right: This can be a table or a join.
  • condition: This can be an arbitrary expression.

Column Lookup

A column lookup is a reference to a table column. It can reference the table directly or via an alias.

{
    "type": "column",
    "tableName": "T",
    "tableAlias": "A",
    "columnNr": 0,
    "name": "ID"
}

Notes

  • tableAlias: This is an optional property and is added if the referenced table has an alias.
  • columnNr: Column number in the virtual table, starting with 0.

Order By Element

{
    "type": "order_by_element",
    "expression": {
        ...
    },
    "isAscending": true,
    "nullsLast": true
}

Notes

  • The field expression contains the expression to order by.

Data Types

Refer to the Exasol Data Types API Documentation

Literal

{
    "type": "literal_null"
}
{
    "type": "literal_string",
    "value": "my string"
}
{
    "type": "literal_double",
    "value": "1.2345000000000000e+02"
}
{
    "type": "literal_exactnumeric",
    "value": "123.45"
}
{
    "type": "literal_bool",
    "value": true
}
{
    "type": "literal_date",
    "value": "2015-12-01"
}
{
    "type": "literal_timestamp",
    "value": "2015-12-01 12:01:01.1234"
}
{
    "type": "literal_timestamputc",
    "value": "2015-12-01 12:01:01.1234"
}
{
    "type": "literal_interval",
    "value": "+2-01",
    "dataType": {
            "type": "INTERVAL",
            "fromTo": "YEAR TO MONTH",
            "precision": 2
        }
}
{
    "type": "literal_interval",
    "value": "+0 00:00:02.000",
    "dataType": {
            "type": "INTERVAL",
            "fromTo": "DAY TO SECONDS",
            "precision": 2,
            "fraction": 2
        }
}

Predicates

Whenever ... appears, it is a shortcut for an arbitrary expression.

AND / OR

{
    "type": "predicate_and",
    "expressions": [
        ...
    ]
}

The same can be used for predicate_or.

NOT / IS NULL / IS NOT NULL

{
    "type": "predicate_not",
    "expression": {
        ...
    }
}

The same can be used for predicate_is_null and predicate_is_not_null.

Comparison operators

{
    "type": "predicate_equal",
    "left": {
        ...
    },
    "right": {
        ...
    }
}

The same can be used for predicate_notequal, predicate_less and predicate_lessequal.

LIKE

{
    "type": "predicate_like",
    "expression": {
        ...
    },
    "pattern": {
        ...
    },
    "escapeChar": {
        ...
    }
}

Notes:

  • escapeChar is optional.

REGEXP_LIKE

{
    "type": "predicate_like_regexp",
    "expression": {
        ...
    },
    "pattern": {
        ...
    }
}

BETWEEN

{
    "type": "predicate_between",
    "expression": {
        ...
    },
    "left": {
        ...
    },
    "right": {
        ...
    }
}

IN

<exp> IN (<const1>, <const2>)

{
    "type": "predicate_in_constlist",
    "expression": {
        ...
    },
    "arguments": [
        ...
    ]
}

IS JSON / IS NOT JSON

exp1 IS JSON {VALUE | ARRAY | OBJECT | SCALAR} {WITH | WITHOUT} UNIQUE KEYS (requires predicate capability IS_JSON)

{
    "type": "predicate_is_json",
    "expression": {
        ...
    },
    "typeConstraint": "VALUE",
    "keyUniquenessConstraint": "WITHOUT UNIQUE KEYS"
}

Notes:

  • typeConstraint is "VALUE", "ARRAY", "OBJECT", or "SCALAR".
  • keyUniquenessConstraint is "WITH UNIQUE KEYS" or "WITHOUT UNIQUE KEYS".

The same can be used for a predicate type predicate_is_not_json (requires predicate capability IS_NOT_JSON).

Scalar Functions

Refer to the Exasol Scalar Functions API Documentation

Aggregate Functions

Refer to the Exasol Aggregate Functions API Documentation