Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

"no known snapshots" error when db rows are large #74

Closed
csdev opened this issue Jul 31, 2018 · 25 comments
Closed

"no known snapshots" error when db rows are large #74

csdev opened this issue Jul 31, 2018 · 25 comments

Comments

@csdev
Copy link

csdev commented Jul 31, 2018

We are using wal2json with Debezium to replicate some db tables containing large rows. While streaming changes, we have encountered the following error:

2018-07-27T19:54:09.616547179Z ERROR:  no known snapshots
--
2018-07-27T19:54:09.616587252Z CONTEXT:  slot "debezium", output plugin "wal2json", in the change callback, associated LSN 0/16D3850

We think this is a bug with wal2json's handling of TOAST columns, since this behavior only started happening after commit ce82d73

I have a more detailed analysis, including steps to reproduce the issue, up on Debezium's bugtracker:
https://issues.jboss.org/browse/DBZ-842

@notxcain
Copy link

notxcain commented Aug 8, 2018

Can confirm. This hits us in production daily.

@rcoup
Copy link
Contributor

rcoup commented Aug 8, 2018

See comment on the debezium ticket, but I wonder if this is related to TOAST values being split into multiple chunks (PG suggests at ~2000 bytes).

But I'd expect to see the same problem both when the TOAST column does change, as well as when it doesn't (what ce82d73 enables output for). @csdev can you verify/check that?

@eulerto
Copy link
Owner

eulerto commented Aug 9, 2018

@csdev @notxcain I can't reproduce the problem with wal2json. I even try it with a bigger jsonb (~15MB) but no success. Please provide a test case.

@eulerto
Copy link
Owner

eulerto commented Aug 9, 2018

See comment on the debezium ticket, but I wonder if this is related to TOAST values being split into multiple chunks (PG suggests at ~2000 bytes).

@rcoup cannot get a snapshot is not related to cannot get the whole TOAST value.

@c-nichols
Copy link

c-nichols commented Aug 9, 2018

@eulerto I am having trouble reproducing in a testcase, but I can reproduce reliably using pg_recvlogical in one terminal and psql to enter commands in another:

Setup:

  • ubuntu 16.04
  • Postgresql 9.6.9
  • wal2json commit d4c0e81 built locally

Hope that helps. I am not sure what is different about the testcase situation.

@jouir
Copy link

jouir commented Aug 27, 2018

I'm able to reproduce on debian stretch using pgdg distribution (wal2json 1.0-2.pgdg90+2, postgres 9.6.10) with large jsonb data on production and with the previous testcase. A workaround is to disable include-unchanged-toast option.

@anarazel
Copy link

@eulerto You cannot reliably access unchanged toast data. It's not in the WAL stream, and therefore logical decoding cannot access it safely. That's why it's separately marked. There's no way include-unchanged-toast=1 is ever safe.

This was reported to the bug lists at https://www.postgresql.org/message-id/DB5PR0201MB17507769C5709234B25A409BE2350@DB5PR0201MB1750.eurprd02.prod.outlook.com

@anarazel
Copy link

@rcoup ^

@eulerto
Copy link
Owner

eulerto commented Aug 27, 2018

@anarazel I came to the same conclusion while testing logical decoding functions x pg_recvlogical. I'll revert @rcoup patch.

@rcoup
Copy link
Contributor

rcoup commented Aug 27, 2018

@anarazel wow. Is that documented anywhere in PG?

@eulerto
Copy link
Owner

eulerto commented Aug 27, 2018

In the logical replication protocol, function logicalrep_write_tuple (src/backend/replication/logical/proto.c). I'm not sure if "TOAST storage in the WAL" is documented but you can take a look at heapam.c & related files.

@anarazel
Copy link

anarazel commented Aug 28, 2018 via email

eulerto pushed a commit that referenced this issue Aug 29, 2018
Per discussion in issue #74, we can't rely on access unchanged TOAST
data because it is not in the WAL stream.

It reverts commits 947043e
ce82d73 and
d86a13b .
eulerto pushed a commit that referenced this issue Aug 29, 2018
Per issue #74, parameter include-unchanged-toast wasn't safe, hence,
remove it. Since it is part of release 1.0, deprecate this option for
the next release.
@eulerto eulerto closed this as completed Aug 29, 2018
@gburanov
Copy link

@anarazel Hi. Sorry for joining the conversation, but I am working on another logical replication plugin for postgres. I am not specialist in Postgres at all, so maybe I miss some very basic point.

I have the same issue - when I try to get the datum that is internal on disk (here is code)

        struct varlena *s = (struct varlena *)DatumGetPointer(origval);
        struct varlena * ret = heap_tuple_untoast_attr(s);
        Datum result = PointerGetDatum(PG_DETOAST_DATUM(ret));

it fails with no known snapshots (from heap_tuple_untoast_attr). My question is why is it that "You cannot reliably access unchanged toast data". I understand that it is not in WAL stream, but I thought WAL library is loaded as a shared library (and therefore got access to the same things Postgres got access to).

So, even theoretically it is not possible to the the varlena on disk from logical replication plugin?

@anarazel
Copy link

@gburanov You also asked via email. I replied there https://www.postgresql.org/message-id/20180831143637.lbxhimikooc4uk7n@alap3.anarazel.de

@qris
Copy link

qris commented Oct 15, 2018

I am not convinced that 66d836d is the correct solution to this problem. I am seeing this issue in 645ab69, which dates from before ce82d73 was applied, so the issue appears to be older than that.

It seems to me that:

  • The unsafe include-unchanged-toast option was always on before (because there was no way to turn it off);
  • Commit ce82d73 made it possible to turn it off (the commit comment is misleading here);
  • Commit 66d836d reverted this, so the option is now always on again.

The following code was in the initial commit (821a54a) and I strongly suspect that this code is wrong:

           /* XXX Unchanged TOAST Datum does not need to be output */
           if (!isnull && typisvarlena && VARATT_IS_EXTERNAL_ONDISK(origval))
           {
                   elog(WARNING, "column \"%s\" has an unchanged TOAST", NameStr(attr->attname));
                   continue;
           }

Please could someone check if it's this code that needs to be removed instead?

@eulerto
Copy link
Owner

eulerto commented Oct 15, 2018

@qris The code is correct. See continue inside the if [1].

[1] https://github.com/eulerto/wal2json/blob/master/wal2json.c#L577

@itscaro
Copy link

itscaro commented Dec 13, 2018

@eulerto can you release a new tag please?

alagalah pushed a commit to streamsets/wal2json that referenced this issue Mar 7, 2019
Per discussion in issue eulerto#74, we can't rely on access unchanged TOAST
data because it is not in the WAL stream.

It reverts commits 947043e
ce82d73 and
d86a13b .
alagalah pushed a commit to streamsets/wal2json that referenced this issue Mar 7, 2019
Per issue eulerto#74, parameter include-unchanged-toast wasn't safe, hence,
remove it. Since it is part of release 1.0, deprecate this option for
the next release.
@ghost
Copy link

ghost commented Apr 26, 2019

We are facing the same issue while using wal2json plugin with AWS Aurora PostgreSQL 10.6 version. I am getting " psycopg2.InternalError: no known snapshots " error while reading replication message asynchronously from replication slot. AWS Aurora PostgreSQL documentation says that "The logical decoding plugin wal2json has been updated to commit 9e962ba". @eulerto Could you please confirm whether #74 is resolved and if yes then what is the commit id for the change ?

@eulerto
Copy link
Owner

eulerto commented Apr 26, 2019

@manerohit388 Yes, it is. commit 66d836d.

eulerto referenced this issue May 10, 2019
This option defines which format to use. Only version 1 is currently
supported. New formats will be available.
@Iskz
Copy link

Iskz commented May 12, 2020

Referring back to this issue, we are hitting this error despite using a newer version of wal2json, and this is installed via rpm below.

https://centos.pkgs.org/6/postgresql-12-x86_64/wal2json12-2.2-1.rhel6.x86_64.rpm.html

We are currently using postgresql 9.6 with Latin1 database on centos 6 connected with debezium connector and approximately 12mil rows of data.(just a single table) issue happens after few minutes of streaming.

Reference:
https://groups.google.com/forum/?utm_medium=email&utm_source=footer#!topic/debezium/74TqJi62DJc

@eulerto
Copy link
Owner

eulerto commented May 13, 2020

@lskz please open another issue and provide a test case.

@Iskz
Copy link

Iskz commented May 13, 2020

@eulerto unfortunately i'm unable to provide the exact steps to replicate this issue, as it is only happening on our production database whereby the load is higher. Perhaps you can provide some suggestion on some of the steps that we can take or areas we can look into to understand why we're hitting a similar issue.

alejandrosanchezcabana pushed a commit to streamsets/wal2json that referenced this issue Dec 9, 2021
Per issue eulerto#74, parameter include-unchanged-toast wasn't safe, hence,
remove it. Since it is part of release 1.0, deprecate this option for
the next release.
@mertant
Copy link

mertant commented Mar 15, 2024

You cannot reliably access unchanged toast data. It's not in the WAL stream, and therefore logical decoding cannot access it safely.

How is "unchanged toast data" defined? I mean is the toast considered to be unchanged when (A) the actual "after" value is identical to the "before" value, or when (B) no toast column is included in the UPDATE statement.

I'm asking because, in case it turns out to be (B), then one possible way to force "practically unchanged" toast data to go to the WAL stream would be to do UPDATE SET my_toastable_column = my_toastable_column. Which, if so, would be one workaround to this issue worth mentioning, even if a bit hacky.

(I do know that performing UPDATE SET my_column = my_column on regular non-toasted columns can trigger the tuple to be outputted by wal2json, as I've used that for manual forcing of "replays". But I haven't tested it on toasted columns.)

@anarazel
Copy link

@mertant

How is "unchanged toast data" defined? I mean is the toast considered to be unchanged when (A) the actual "after" value is identical to the "before" value, or when (B) no toast column is included in the UPDATE statement.

The former.

(I do know that performing UPDATE SET my_column = my_column on regular non-toasted columns can trigger the tuple to be outputted by wal2json, as I've used that for manual forcing of "replays". But I haven't tested it on toasted columns.)

An "no-change" update still has semantic meaning, because it requires the row to be locked etc. We can't just elide the update without violatng transactional semantics. That's not the case eliding toast changes, as row locking / visibility happens on a per-row, not a per-field basis.

If you need toasted columns included, you need to use REPLICA IDENTITY FULL.

@mertant
Copy link

mertant commented Mar 16, 2024

That's not the case eliding toast changes, as row locking / visibility happens on a per-row, not a per-field basis.

I see. Well, it was a bit of a long shot, but still good to have it explicitly ruled out. Thank you for the answer, @anarazel .

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests