postgres source: fix CDC setup order docs #13949

Merged
merged 2 commits into from
Jun 21, 2022
24 changes: 13 additions & 11 deletions docs/integrations/sources/postgres.md
@@ -125,17 +125,7 @@ We recommend using a user specifically for Airbyte's replication so you can mini

We recommend using the `pgoutput` plugin, as it is the standard logical decoding plugin in Postgres. If the replicated table contains many large JSON blobs and the table size exceeds 1 GB, we recommend using `wal2json` instead. Please note that `wal2json` may require additional installation on bare metal, VMs \(EC2/GCE/etc\), Docker, etc. For more information, read the [wal2json documentation](https://github.com/eulerto/wal2json).

#### 4. Create replication slot

Next, you will need to create a replication slot. Here is the query used to create a replication slot called `airbyte_slot`:

```text
SELECT pg_create_logical_replication_slot('airbyte_slot', 'pgoutput');
```

To use the `wal2json` plugin instead, replace `pgoutput` with `wal2json` in the query above.

#### 5. Create publications and replication identities for tables
#### 4. Create publications and replication identities for tables

For each table you want to replicate with CDC, you should add the replication identity \(the method of distinguishing between rows\) first. We recommend using `ALTER TABLE tbl1 REPLICA IDENTITY DEFAULT;` to use primary keys to distinguish between rows. After setting the replication identity, you will need to run `CREATE PUBLICATION airbyte_publication FOR TABLE <tbl1, tbl2, tbl3>;`. This publication name is customizable. Please refer to the [Postgres docs](https://www.postgresql.org/docs/10/sql-alterpublication.html) if you need to add or remove tables from your publication in the future.

@@ -145,6 +135,18 @@ Please note that:

The UI currently allows selecting any tables for CDC. If a table is selected that is not part of the publication, it will not replicate even though it is selected. If a table is part of the publication but does not have a replication identity, that replication identity will be created automatically on the first run if the Airbyte user has the necessary permissions.

#### 5. Create replication slot

Next, you will need to create a replication slot. It's important to create the publication first (as in step 4) before creating the replication slot. Otherwise, you can run into exceptions if the database is updated between the creation of the slot and the creation of the publication.

Here is the query used to create a replication slot called `airbyte_slot`:

```text
SELECT pg_create_logical_replication_slot('airbyte_slot', 'pgoutput');
```

To use the `wal2json` plugin instead, replace `pgoutput` with `wal2json` in the query above.
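
The ordering of steps 4 and 5 can be sketched as a small Python helper. This is only an illustration under stated assumptions: the function name `cdc_setup_statements` and its helpers are hypothetical, it only builds SQL strings (it does not connect to a database), and it assumes unqualified table names such as `tbl1`.

```python
from typing import List


def quote_ident(name: str) -> str:
    """Quote a Postgres identifier (assumes an unqualified name, no schema prefix)."""
    return '"' + name.replace('"', '""') + '"'


def quote_literal(value: str) -> str:
    """Quote a string literal by doubling any embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def cdc_setup_statements(
    tables: List[str],
    publication: str = "airbyte_publication",
    slot: str = "airbyte_slot",
    plugin: str = "pgoutput",
) -> List[str]:
    """Return the setup statements in the order described above (hypothetical helper).

    Order: replica identities, then the publication, then the replication slot.
    """
    if not tables:
        raise ValueError("at least one table is required")
    if plugin not in ("pgoutput", "wal2json"):
        raise ValueError("plugin must be 'pgoutput' or 'wal2json'")

    # Step 4a: set the replica identity for every table.
    statements = [
        f"ALTER TABLE {quote_ident(t)} REPLICA IDENTITY DEFAULT;" for t in tables
    ]
    # Step 4b: create the publication covering those tables.
    table_list = ", ".join(quote_ident(t) for t in tables)
    statements.append(
        f"CREATE PUBLICATION {quote_ident(publication)} FOR TABLE {table_list};"
    )
    # Step 5: create the replication slot last.
    statements.append(
        "SELECT pg_create_logical_replication_slot("
        f"{quote_literal(slot)}, {quote_literal(plugin)});"
    )
    return statements


if __name__ == "__main__":
    for statement in cdc_setup_statements(["tbl1", "tbl2", "tbl3"]):
        print(statement)
```

Running the printed statements top to bottom in a SQL client follows the documented order, so the slot is only created once the publication exists.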

#### 6. Start syncing

When configuring the source, select CDC and provide the replication slot and publication you just created. You should be ready to sync data with CDC!