Commit

Update clients
criccomini committed Sep 7, 2023
1 parent 1180d1b commit 0368113
Showing 6 changed files with 192 additions and 81 deletions.
43 changes: 32 additions & 11 deletions docs/integrations/bigquery.md
@@ -10,27 +10,48 @@ parent: "Integrations"
1. TOC
{:toc}

The `BigQueryReader` class is used to convert BigQuery table schemas to Recap types. The main method in this class is `to_recap`.
## Connecting

## `to_recap`
### CLI

```python
def to_recap(self, dataset: str, table: str) -> StructType
```bash
recap add my_bq bigquery://
```

The `to_recap` method takes in the name of a BigQuery dataset and table, and returns a Recap `StructType` that represents the BigQuery table schema.
### Environment Variables

```bash
export RECAP_SYSTEM__MY_BQ=bigquery://
```
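
As a rough illustration of the naming convention above, the sketch below collects every `RECAP_SYSTEM__<NAME>` variable into a name-to-URL mapping. The helper `systems_from_env` is hypothetical, and lowercasing the system name is an assumption; Recap's actual settings loading may differ.

```python
import os

# Prefix taken from the example above; everything after it is the system name.
PREFIX = "RECAP_SYSTEM__"


def systems_from_env(environ=None):
    """Map system names to URLs (hypothetical helper, not part of Recap)."""
    environ = os.environ if environ is None else environ
    return {
        # Assumption: names are treated case-insensitively, so MY_BQ -> my_bq.
        key[len(PREFIX):].lower(): value
        for key, value in environ.items()
        if key.startswith(PREFIX)
    }


# Pass an explicit mapping to avoid depending on the real environment.
systems = systems_from_env({"RECAP_SYSTEM__MY_BQ": "bigquery://", "HOME": "/root"})
```

Here `systems` maps `my_bq` to `bigquery://`, mirroring what `recap add my_bq bigquery://` registers on the CLI.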

### Example
### Python API

```python
from google.cloud import bigquery
from recap.readers.bigquery import BigQueryReader
from recap.clients import create_client

with create_client("bigquery://") as client:
    client.ls("my_project")
```

## Format

### URLs

client = bigquery.Client()
recap_schema = BigQueryReader(client).to_recap("my_dataset", "my_table")
Recap's BigQuery URL format:

```
bigquery://<project>
```

### Paths

Recap's BigQuery paths are formatted as:

```
[system]/[project]/[dataset]/[table]
```

In this example, `recap_schema` will be a `StructType` that represents the schema of `my_table` in `my_dataset`.
The `BigQueryReader` class is used to read BigQuery table schemas as Recap schemas.

## Type Conversion

46 changes: 33 additions & 13 deletions docs/integrations/confluent-schema-registry.md
@@ -10,31 +10,51 @@ parent: "Integrations"
1. TOC
{:toc}

The `ConfluentRegistryReader` class is used to convert schemas registered in a Confluent Schema Registry to Recap types. The main method in this class is `to_recap`.
## Connecting

## `to_recap`
### CLI

```python
def to_recap(self, topic: str) -> StructType
```bash
recap add my_csr http+csr://my-registry:8081
```

The `to_recap` method takes in the name of a Kafka topic, fetches the associated schema from the Confluent Schema Registry, and converts it to a Recap `StructType`. The method supports Avro, JSON, and Protobuf schemas.
### Environment Variables

```bash
export RECAP_SYSTEM__MY_CSR=http+csr://my-registry:8081
```

### Example
### Python API

```python
from confluent_kafka.schema_registry import SchemaRegistryClient
from recap.readers.confluent_registry import ConfluentRegistryReader
from recap.clients import create_client

with create_client("http+csr://my-registry:8081") as client:
    client.ls()
```

## Format

registry = SchemaRegistryClient({"url": "http://my-registry:8081"})
recap_schema = ConfluentRegistryReader(registry).to_recap("my_topic")
### URLs

Recap's [Confluent Schema Registry](https://docs.confluent.io/platform/current/schema-registry/index.html) client takes a URL pointing to the Confluent Schema Registry HTTP server.

{: .note }
The scheme must be `http+csr` or `https+csr`. The `+csr` suffix is required to distinguish this client from other clients that also use HTTP connections (similar to [SQLAlchemy's `dialect+driver` format](https://docs.sqlalchemy.org/en/20/core/engines.html#database-urls)).
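
To make the `transport+client` scheme convention concrete, here is a minimal sketch that splits such a scheme with the standard library. The function `split_scheme` is a hypothetical name used only for illustration; it is not how Recap necessarily selects a client.

```python
from urllib.parse import urlsplit


def split_scheme(url):
    """Return (transport, client_hint) for a scheme such as 'http+csr'."""
    scheme = urlsplit(url).scheme
    # partition() leaves the hint empty when the scheme has no '+' suffix.
    transport, _, hint = scheme.partition("+")
    return transport, hint or None


transport, hint = split_scheme("http+csr://my-registry:8081")
```

With the URL above, `transport` is `http` and `hint` is `csr`; a plain `http://` URL yields no hint, which is why the suffix is needed to pick the right client.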

### Paths

Recap's Confluent Schema Registry paths are formatted as:

```
[system]/[topic]
```

In this example, `recap_schema` will be a `StructType` that represents the schema of the value of messages in `my_topic`.
You may optionally include `-key` or `-value` at the end of the path to specify the key or value schema, respectively. If no suffix is supplied, `-value` is assumed.
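
The suffix rule can be sketched as a small helper. `split_subject` is a hypothetical function written for this example, not Recap's implementation, and it assumes the suffix is matched literally at the end of the name.

```python
def split_subject(name):
    """Split 'topic[-key|-value]' into (topic, part); part defaults to 'value'."""
    for part in ("key", "value"):
        suffix = f"-{part}"
        if name.endswith(suffix):
            # Drop the suffix and report which schema was requested.
            return name[: -len(suffix)], part
    # No suffix supplied: fall back to the value schema.
    return name, "value"


topic, part = split_subject("my_topic-key")
```

Note that under this sketch a topic whose own name ends in `-key` or `-value` would be ambiguous unless the suffix is written out explicitly.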

## Type Conversion

The `to_recap` method uses the `AvroConverter`, `JSONSchemaConverter`, and `ProtobufConverter` classes to convert schemas, based on their type.
`ConfluentRegistryClient.get_schema()` uses the `AvroConverter`, `JSONSchemaConverter`, and `ProtobufConverter` classes to convert schemas, based on their type.

Please see the individual documentation for these classes for information on how they convert types:

@@ -46,4 +66,4 @@ Please see the individual documentation for these classes for information on how

1. ConfluentRegistryReader does not support [schema references](https://docs.confluent.io/platform/current/schema-registry/fundamentals/serdes-develop/index.html#schema-references).

The conversion functions raise a `ValueError` exception if the conversion is not possible.
The conversion functions raise a `ValueError` exception if the conversion is not possible.
47 changes: 30 additions & 17 deletions docs/integrations/hive-metastore.md
@@ -10,32 +10,45 @@ parent: "Integrations"
1. TOC
{:toc}

The `HiveMetastoreReader` class is used to convert Hive table schemas into Recap types. This class can also be used to fetch and convert table statistics from Hive Metastore.
## Connecting

## `to_recap`
### CLI

```python
def to_recap(
    self,
    database_name: str,
    table_name: str,
    include_stats: bool = False,
) -> StructType
```bash
recap add my_hms thrift+hms://hive:password@localhost:9083
```

The `to_recap` method takes in the name of a database and a table within that database, retrieves the associated schema from the Hive Metastore, and converts it into a Recap `StructType`. If `include_stats` is set to True, the method will also fetch table statistics from the Hive Metastore and include them in the returned `StructType`.
### Environment Variables

```bash
export RECAP_SYSTEM__MY_HMS=thrift+hms://hive:password@localhost:9083
```

### Example
### Python API

```python
from pymetastore.metastore import HMS
from recap.readers.hive_metastore import HiveMetastoreReader
from recap.clients import create_client

with HMS.create("localhost", 9093) as client:
    recap_schema = HiveMetastoreReader(client).to_recap("my_database", "my_table")
with create_client("thrift+hms://hive:password@localhost:9083") as client:
    client.ls("testdb")
```

In this example, `recap_schema` will be a `StructType` that represents the schema of the `my_table` table in the `my_database` database.
## Format

### URLs

Recap's Hive Metastore client takes the Thrift URL to the Hive Metastore.

{: .note }
The scheme must be `thrift+hms`. The `+hms` suffix is required to distinguish this client from other clients that also use Thrift connections (similar to [SQLAlchemy's `dialect+driver` format](https://docs.sqlalchemy.org/en/20/core/engines.html#database-urls)).

### Paths

Recap's Hive Metastore paths are formatted as:

```
[system]/[database]/[table]
```

## Type Conversion

@@ -66,4 +79,4 @@ In this example, `recap_schema` will be a `StructType` that represents the schem

## Limitations and Constraints

The conversion functions raise a `ValueError` exception if the conversion is not possible.
The conversion functions raise a `ValueError` exception if the conversion is not possible.
47 changes: 28 additions & 19 deletions docs/integrations/mysql.md
@@ -10,34 +10,44 @@ parent: "Integrations"
1. TOC
{:toc}

The `MysqlReader` class is used to convert MySQL table schemas to Recap types. The main method in this class is `to_recap`.
## Connecting

## `to_recap`
### CLI

```python
def to_recap(self, table: str, schema: str, catalog: str) -> StructType
```bash
recap add my_sql mysql://mysql:password@localhost:3306
```

The `to_recap` method takes in the name of a MySQL table, schema, and catalog, and returns a Recap `StructType` that represents the MySQL table schema.
### Environment Variables

{: .note }
MySQL's schema and catalog semantics are non-standard. The `schema` parameter is MySQL's database. The `catalog` parameter should always be [`"def"`](https://dev.mysql.com/doc/refman/8.0/en/information-schema-columns-table.html). These naming conventions are included for consistency with other readers.
```bash
export RECAP_SYSTEM__MY_SQL=mysql://mysql:password@localhost:3306
```

### Example
### Python API

```python
from mysql.connector import connect
from recap.readers.mysql import MysqlReader
from recap.clients import create_client

connection = connect(database="my_database", user="my_user", password="my_password")
recap_schema = MysqlReader(connection).to_recap("my_table", "my_database", "def")
with create_client("mysql://mysql:password@localhost:3306") as client:
    client.ls("testdb")
```

In this example, `recap_schema` will be a `StructType` that represents the schema of `my_table` in the `my_database` database.
## Format

## Type Conversion
### URLs

This table shows the corresponding Recap types for each MySQL type, along with the associated attributes:
Recap's MySQL client takes a MySQL URL with an optional DB in the path.
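
As an illustration of what "an optional DB in the path" means, the sketch below extracts the database name from a URL using only the standard library. `database_from_url` is a hypothetical helper, and the `testdb` path segment is an assumed example value.

```python
from urllib.parse import urlsplit


def database_from_url(url):
    """Return the database named in the URL path, or None if it is omitted."""
    path = urlsplit(url).path.strip("/")
    return path or None


with_db = database_from_url("mysql://mysql:password@localhost:3306/testdb")
without_db = database_from_url("mysql://mysql:password@localhost:3306")
```

Here `with_db` is `testdb` and `without_db` is `None`, so both URL forms are accepted.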

### Paths

Recap's MySQL paths are formatted as:

```
[system]/[database]/[table]
```

## Type Conversion

| MySQL Type | Recap Type |
|------------|------------|
@@ -59,8 +69,7 @@ This table shows the corresponding Recap types for each MySQL type, along with t

## Limitations and Constraints

The conversion functions raise a `ValueError` exception if the conversion is not possible due to the MySQL data type being unknown.

JSON column types are converted to Recap `StringType` with `variable=True` and `bytes_=OCTET_LENGTH`. This is not technically accurate, since the JSON type is a binary type that includes the structure of the JSON data. This is the best approximation.
1. JSON column types are converted to Recap `StringType` with `variable=True` and `bytes_=OCTET_LENGTH`. This is not technically accurate, since the JSON type is a binary type that includes the structure of the JSON data. This is the best approximation.
2. BIT columns are always converted to a BytesType with `variable=False` and `bytes_=8`.

BIT columns are always converted to a BytesType with `variable=False` and `bytes_=8`.
The conversion functions raise a `ValueError` exception if the conversion is not possible.
44 changes: 30 additions & 14 deletions docs/integrations/postgresql.md
@@ -10,31 +10,47 @@ parent: "Integrations"
1. TOC
{:toc}

The `PostgresqlReader` class is used to convert PostgreSQL table schemas to Recap types. The main method in this class is `to_recap`.
## Connecting

## `to_recap`
### CLI

```python
def to_recap(self, table: str, schema: str, catalog: str) -> StructType
```bash
recap add my_pg postgresql://postgres:password@localhost:5432
```

The `to_recap` method takes in the name of a PostgreSQL table, schema, and catalog, and returns a Recap `StructType` that represents the PostgreSQL table schema.
### Environment Variables

```bash
export RECAP_SYSTEM__MY_PG=postgresql://postgres:password@localhost:5432
```

### Example
### Python API

```python
from psycopg2 import connect
from recap.readers.postgresql import PostgresqlReader
from recap.clients import create_client

connection = connect(database="my_database", user="my_user", password="my_password")
recap_schema = PostgresqlReader(connection).to_recap("my_table", "my_schema", "my_catalog")
with create_client("postgresql://postgres:password@localhost:5432") as client:
    client.ls("testdb")
```

In this example, `recap_schema` will be a `StructType` that represents the schema of `my_table` in `my_schema` within `my_catalog`.
## Format

## Type Conversion
### URLs

Recap's PostgreSQL client takes a PostgreSQL URL with an optional DB in the path.

### Paths

This table shows the corresponding Recap types for each PostgreSQL type, along with the associated attributes:
Recap's PostgreSQL paths are formatted as:

```
[system]/[database]/[schema]/[table]
```

{: .note }
The `schema` component is not a data model schema. It's [PostgreSQL's schema](https://www.postgresql.org/docs/current/ddl-schemas.html), which is similar to a namespace. `schema` is usually `public`.
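
The four-part path layout can be sketched as a small parser. `PgPath` and `parse_pg_path` are hypothetical names used only for this example, and the sample path assumes a system registered as `my_pg` with a table in the `public` schema.

```python
from typing import NamedTuple


class PgPath(NamedTuple):
    system: str
    database: str
    schema: str  # PostgreSQL namespace, usually "public"
    table: str


def parse_pg_path(path):
    """Split '[system]/[database]/[schema]/[table]' into named components."""
    parts = path.strip("/").split("/")
    if len(parts) != 4:
        raise ValueError(f"expected 4 path components, got {len(parts)}: {path!r}")
    return PgPath(*parts)


parsed = parse_pg_path("my_pg/testdb/public/my_table")
```

In this example `parsed.schema` is `public` and `parsed.table` is `my_table`.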

## Type Conversion

| PostgreSQL Type | Recap Type |
|-----------------|------------------------------------|
@@ -53,4 +69,4 @@ This table shows the corresponding Recap types for each PostgreSQL type, along w

## Limitations and Constraints

The conversion functions raise a `ValueError` exception if the conversion is not possible due to the PostgreSQL data type being unknown.
The conversion functions raise a `ValueError` exception if the conversion is not possible.