Update quickstart
criccomini committed Oct 10, 2023
1 parent dab37d8 commit 79e7453
Showing 1 changed file with 76 additions and 30 deletions.
106 changes: 76 additions & 30 deletions docs/quickstart.md
@@ -24,54 +24,48 @@ pip install 'recap-core[all]'
{: .note }
The `[all]` part will install all of Recap's dependencies, including the optional ones for systems like PostgreSQL, Snowflake, BigQuery, and so on. You might not need all of these dependencies, but they'll be available anyway.

## Add a Connection
## Read a Schema

You can use Recap's command line interface (CLI) to add systems for Recap to connect to. If you've got a PostgreSQL database running locally, you can add it to Recap like this:
Recap has two main commands: `ls` and `schema`. The `ls` command lets you list children of a URL. The URL structure depends on the system type. PostgreSQL URLs look like this:

```bash
recap add my_pg postgresql://user:pass@host:port/dbname
```

```
postgresql://user:pass@host:port/[database]/[schema]/[table]
```

This will add a system called `my_pg` to Recap.

{: .highlight }
Recap "systems" are just named connections to external systems. They're not the same as the systems themselves. Recap doesn't store any data from external systems. It just connects to them in realtime to fetch schemas.

## Read a Schema
{: .note}
The "schema" in the URL is PostgreSQL's [database schema](https://www.postgresql.org/docs/current/ddl-schemas.html), not a table schema. It's usually `public`.

Now that you've got a PostgreSQL database added, we can browse its structure.
Let's list the schemas for a database called `testdb`:

```bash
recap ls
recap ls postgresql://user:pass@host:port/testdb
```

```
[
"my_pg"
"pg_toast",
"pg_catalog",
"public",
"information_schema"
]
```

The `ls` command lists a path in Recap. Since we haven't supplied a path, it's listing systems. Let's keep drilling down:
There are four schemas. The `pg_toast` and `pg_catalog` schemas are internal to PostgreSQL. The `information_schema` schema is a standard schema that contains information about the database. The `public` schema is where our tables are located.
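
As a minimal sketch of the distinction drawn above, the following Python snippet takes the JSON array that `recap ls` prints and drops the schemas PostgreSQL manages itself. The names `user_schemas` and `INTERNAL_SCHEMAS` are illustrative assumptions and are not part of Recap.

```python
import json

# Schemas that PostgreSQL creates and manages itself (taken from the listing above).
INTERNAL_SCHEMAS = {"pg_toast", "pg_catalog", "information_schema"}


def user_schemas(ls_output: str) -> list[str]:
    """Parse the JSON array printed by `recap ls` and drop internal schemas."""
    return [name for name in json.loads(ls_output) if name not in INTERNAL_SCHEMAS]


listing = '["pg_toast", "pg_catalog", "public", "information_schema"]'
print(user_schemas(listing))  # ['public']
```

A filter like this may be handy when scripting over many databases, since only the remaining schemas are worth drilling into.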

```bash
recap ls my_pg
recap ls postgresql://user:pass@host:port/testdb/public
```

```
[
"postgres",
"template0",
"template1",
"testdb"
"test_types"
]
```

You can see the databases inside my PostgreSQL system. Recap models Postgres paths as `[system]/[database]/[schema]/[table]`, so the `testdb` database is `my_pg/testdb`.

Browse around a bit to get a feel for how things are structured. Eventually, you'll find a path to a table. In my database, I've got `my_pg/testdb/public/test_types`. Let's read the schema:
This database only has one table, `test_types`. Let's read the schema:

```bash
recap schema my_pg/testdb/public/test_types
recap schema postgresql://user:pass@host:port/testdb/public/test_types
```

```json
@@ -89,41 +83,93 @@ recap schema my_pg/testdb/public/test_types

This is `test_types`'s schema represented as a Recap schema in JSON. The `schema` command reads a schema at the supplied path, converts it to a Recap schema, and prints the Recap schema as a JSON object.

You can also output the schema in [Avro](https://avro.apache.org), [Protobuf](https://protobuf.dev), or [JSON Schema](https://json-schema.org) format using the `--output-format` switch. Here's the same schema in JSON Schema:

```bash
recap schema postgresql://user:pass@host:port/testdb/public/test_types --output-format=json
```

```json
{
"type": "object",
"properties": {
"test_bigint": {
"default": null,
"type": "integer"
}
}
}
```

## Start Recap's Server

We've been using Recap's CLI to read schemas, but Recap comes with a gateway server as well. The gateway server can list and read schemas over HTTP/JSON. You will find this handy if you're not using Python, or if you want to integrate Recap with other systems.
We've been using Recap's CLI to read schemas, but Recap comes with an HTTP/JSON server as well. The server has two parts:

- A [gateway](/docs/gateway/) to list and read schemas. You will find this handy if you're not using Python, or if you want to integrate Recap with other systems.
- A [registry](/docs/registry/) to store and retrieve schemas. This is useful for caching schemas or acting as a repository when using Recap schemas as a source of truth.

Start the server at [http://localhost:8000](http://localhost:8000):

```bash
recap serve
```

The server exposes `/ls` and `/schema` endpoints that are very similar to the CLI. I've already added a `my_pg` system in the [CLI](#cli) section above, so I can list the system in my Recap gateway:
### Read Schemas with the Gateway

The server exposes `/gateway/ls` and `/gateway/schema` endpoints that are very similar to the CLI:

```bash
curl http://localhost:8000/ls/my_pg
curl http://localhost:8000/gateway/ls/postgresql://user:pass@host:port/testdb
```

```json
["postgres","template0","template1","testdb"]
["pg_toast","pg_catalog","public","information_schema"]
```

And much like the CLI, I can read my `test_types` schema:

```bash
curl http://localhost:8000/schema/my_pg/testdb/public/test_types
curl http://localhost:8000/gateway/schema/postgresql://user:pass@host:port/testdb/public/test_types
```

```json
{"type":"struct","fields":[{"type":"int64","name":"test_bigint","optional":true}]}
```

{: .warning}
This example includes `user:pass` in the URL. This works, but is not recommended for security reasons. You should configure Recap's server to use the `RECAP_URLS` environment variable instead. See the [gateway configuration](/docs/gateway/#configuration) documentation for more details.

{: .note}
Recap's HTTP/JSON gateway does not require a database or any persistence. It just connects to external systems in realtime to fetch schemas.
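
If you'd rather call the gateway from code than from `curl`, a rough Python sketch using only the standard library might look like the following. It assumes the server is running at `http://localhost:8000` as shown above; the helper names `gateway_url` and `gateway_get` are hypothetical and are not part of Recap.

```python
import json
from urllib.request import urlopen

# Assumption: `recap serve` is running locally on the default address shown above.
GATEWAY = "http://localhost:8000/gateway"


def gateway_url(command: str, system_url: str, base: str = GATEWAY) -> str:
    """Build a gateway URL the same way the curl examples do: base/command/system_url."""
    if command not in ("ls", "schema"):
        raise ValueError(f"unsupported gateway command: {command}")
    return f"{base.rstrip('/')}/{command}/{system_url}"


def gateway_get(command: str, system_url: str):
    """Fetch and decode the JSON response. This needs a running server."""
    with urlopen(gateway_url(command, system_url)) as response:
        return json.load(response)


print(gateway_url("ls", "postgresql://user:pass@host:port/testdb"))
# http://localhost:8000/gateway/ls/postgresql://user:pass@host:port/testdb
```

Only `gateway_url` runs without a server. With a real host and port in place of the placeholders, `gateway_get("schema", ...)` should return the same JSON that the `curl` call prints.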

### Store Schemas in the Registry

The server exposes a series of `/registry` endpoints with [GET/PUT/POST](/docs/registry/#api) methods.

To store a schema in the registry, use the `POST` method:

```bash
curl -X POST \
-H "Content-Type: application/x-recap+json" \
-d '{"type":"struct","fields":[{"type":"int64","name":"test_bigint","optional":true}]}' \
http://localhost:8000/registry/some_schema
```
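
The same `POST` can be sketched in Python with the standard library. This is only an illustration that mirrors the `curl` call above: the helper name `registry_post` is hypothetical, and the URL and `Content-Type` header are copied from the example rather than taken from any Recap client API.

```python
import json
from urllib.request import Request

# The Recap schema from the curl example above.
schema = {
    "type": "struct",
    "fields": [{"type": "int64", "name": "test_bigint", "optional": True}],
}


def registry_post(name: str, schema: dict,
                  base: str = "http://localhost:8000/registry") -> Request:
    """Build (but do not send) a POST request that stores `schema` under `name`."""
    return Request(
        f"{base}/{name}",
        data=json.dumps(schema).encode("utf-8"),
        headers={"Content-Type": "application/x-recap+json"},
        method="POST",
    )


request = registry_post("some_schema", schema)
print(request.get_method(), request.full_url)
# POST http://localhost:8000/registry/some_schema
```

Passing `request` to `urllib.request.urlopen` would send it to a running server.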

And to read the schema, use the `GET` method on `/registry/[schema_name]`:

```bash
curl http://localhost:8000/registry/some_schema
```

```json
[{"type":"struct","fields":[{"type":"int64","name":"test_bigint","optional":true}]},4]
```

{: .note}
The schema response includes a version number. This is the schema's version in the registry. The registry will increment this number every time you update the schema.
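
Because the response is a two-element JSON array, a client has to split it into the schema and its version. A minimal Python sketch, assuming the response always has the `[schema, version]` shape shown above (the names `RegisteredSchema` and `parse_registry_response` are illustrative, not Recap APIs):

```python
import json
from typing import NamedTuple


class RegisteredSchema(NamedTuple):
    schema: dict   # the Recap schema as a JSON object
    version: int   # the registry's version number for this schema


def parse_registry_response(body: str) -> RegisteredSchema:
    """Split the registry's `[schema, version]` array into named parts."""
    schema, version = json.loads(body)
    return RegisteredSchema(schema, version)


body = '[{"type":"struct","fields":[{"type":"int64","name":"test_bigint","optional":true}]},4]'
registered = parse_registry_response(body)
print(registered.version)                      # 4
print(registered.schema["fields"][0]["name"])  # test_bigint
```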

## Next Steps

You've learned how to install Recap, connect it to a PostgreSQL database, read schemas, and start Recap's gateway server.
You've learned how to install Recap, list and read schemas, and use Recap server's gateway and registry endpoints.

Next, you should look at Recap's [integrations](/docs/integrations) page to learn how to use Recap with other systems. If you're planning on running Recap's gateway, check out its configuration options in the [gateway](/docs/gateway) documentation. Finally, see Recap's [API](/docs/api) documentation to learn how to use Recap's Python API.
Next, you should look at Recap's [integrations](/docs/integrations) page to learn how to use Recap with other systems. If you're planning on running Recap's server, check out the [gateway](/docs/gateway/) and [registry](/docs/registry/) documentation. Finally, see Recap's [Python](/docs/python) documentation to learn how to use Recap's Python API.
