Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

partition_data_*() not work when table PK defined with GENERATED ALWAYS #724

Open
waitingsong opened this issue Dec 22, 2024 · 3 comments
Open

Comments

@waitingsong
Copy link

waitingsong commented Dec 22, 2024

I use pgmq with pg_parten,
got error when calling partition_data_time()

cannot insert a non-DEFAULT value into column "msg_id"
DETAIL:  Column "msg_id" is an identity column defined as GENERATED ALWAYS.
HINT:  Use OVERRIDING SYSTEM VALUE to override.
CONTEXT:  SQL statement "WITH partition_data AS (
                DELETE FROM partman_temp_data_storage RETURNING *)
            INSERT INTO pgmq.q_abc0ba_p20241221_153200 (msg_id,read_ct,enqueued_at,vt,message,headers) SELECT msg_id,read_ct,enqueued_at,vt,message,headers FROM partition_data"
PL/pgSQL function partition_data_time(text,integer,interval,numeric,text,boolean,text,text[]) line 245 at EXECUTE

Cause Column "msg_id" is an identity column defined as GENERATED ALWAYS:

CREATE TABLE "pgmq"."Untitled" (
  "msg_id" int8 NOT NULL GENERATED ALWAYS AS IDENTITY , -- <----- here
  "read_ct" int4 NOT NULL DEFAULT 0,
  "enqueued_at" timestamptz(6) NOT NULL DEFAULT now(),
  "vt" timestamptz(6) NOT NULL,
  "message" jsonb,
  "headers" jsonb
)
PARTITION BY RANGE (
  "enqueued_at" "pg_catalog"."timestamptz_ops"
)
;

I committed a PR to handle this situation : #723

@keithf4
Copy link
Collaborator

keithf4 commented Dec 30, 2024

So I'm not able to re-produced this issue when using the p_ignored_columns option. Reading the description of OVERRIDING SYSTEM VALUE, you should only be running into that error when you're actually trying to insert values into that column. The p_ignored_columns option should completely remove that column from any insert statements so that override shouldn't be necessary.
Or are you trying to intentionally override those values? If so, that's a completely different issue and properly handling that would be different than the PR you provided. Your PR makes it so it always overrides no matter what if there's an identity. My assumption when someone has an identity set to GENERATE ALWAYS for any table (not just partitioned) is that no user should be inserting any values into that table without explicitly making that clear. So I would not want that to be the default in partman.

Below is a similar example to what you've given (I just make it all lower case and only set the NOT NULL columns to be the relevant ones to make an easier test case). It works fine ignoring the column and allows the partitioned table to generate the ID columns automatically.

If you can provide me a working example of where you're running into an error, maybe we can figure out a better solution to allow users to override identities if desired.

keith@github724=# CREATE TABLE "pgmq"."untitled" (
  "msg_id" int8 NOT NULL GENERATED ALWAYS AS IDENTITY , -- <----- here
  "read_ct" int4 DEFAULT 0,
  "enqueued_at" timestamptz(6) NOT NULL DEFAULT now(),
  "vt" timestamptz(6),
  "message" jsonb,
  "headers" jsonb
)
PARTITION BY RANGE (
  "enqueued_at" "pg_catalog"."timestamptz_ops"
)
;
CREATE TABLE
Time: 6.201 ms

keith@github724=# create table pgmq.source_untitled (like pgmq.untitled);
CREATE TABLE
Time: 14.238 ms

keith@github724=# alter table pgmq.source_untitled alter msg_id add generated always as identity;
ALTER TABLE
Time: 14.269 ms

keith@github724=# select partman.create_parent('pgmq.untitled', 'enqueued_at', '1 day');
 create_parent 
---------------
 t
(1 row)

Time: 35.883 ms
keith@github724=# \d+ pgmq.untitled
                                                           Partitioned table "pgmq.untitled"
   Column    |            Type             | Collation | Nullable |           Default            | Storage  | Compression | Stats target | Description 
-------------+-----------------------------+-----------+----------+------------------------------+----------+-------------+--------------+-------------
 msg_id      | bigint                      |           | not null | generated always as identity | plain    |             |              | 
 read_ct     | integer                     |           |          | 0                            | plain    |             |              | 
 enqueued_at | timestamp(6) with time zone |           | not null | now()                        | plain    |             |              | 
 vt          | timestamp(6) with time zone |           |          |                              | plain    |             |              | 
 message     | jsonb                       |           |          |                              | extended |             |              | 
 headers     | jsonb                       |           |          |                              | extended |             |              | 
Partition key: RANGE (enqueued_at)
Partitions: pgmq.untitled_p20241226 FOR VALUES FROM ('2024-12-26 00:00:00-05') TO ('2024-12-27 00:00:00-05'),
            pgmq.untitled_p20241227 FOR VALUES FROM ('2024-12-27 00:00:00-05') TO ('2024-12-28 00:00:00-05'),
            pgmq.untitled_p20241228 FOR VALUES FROM ('2024-12-28 00:00:00-05') TO ('2024-12-29 00:00:00-05'),
            pgmq.untitled_p20241229 FOR VALUES FROM ('2024-12-29 00:00:00-05') TO ('2024-12-30 00:00:00-05'),
            pgmq.untitled_p20241230 FOR VALUES FROM ('2024-12-30 00:00:00-05') TO ('2024-12-31 00:00:00-05'),
            pgmq.untitled_p20241231 FOR VALUES FROM ('2024-12-31 00:00:00-05') TO ('2025-01-01 00:00:00-05'),
            pgmq.untitled_p20250101 FOR VALUES FROM ('2025-01-01 00:00:00-05') TO ('2025-01-02 00:00:00-05'),
            pgmq.untitled_p20250102 FOR VALUES FROM ('2025-01-02 00:00:00-05') TO ('2025-01-03 00:00:00-05'),
            pgmq.untitled_p20250103 FOR VALUES FROM ('2025-01-03 00:00:00-05') TO ('2025-01-04 00:00:00-05'),
            pgmq.untitled_default DEFAULT

keith@github724=# insert into pgmq.source_untitled (enqueued_at) values (generate_series('2024-12-26'::timestamptz, '2025-01-02'::timestamptz, '1 hour'::interval));
INSERT 0 169
Time: 18.768 ms

keith@github724=# call partman.partition_data_proc('pgmq.untitled', p_source_table := 'pgmq.source_untitled', p_ignored_columns := ARRAY['msg_id']);              
NOTICE:  Loop: 1, Rows moved: 24
NOTICE:  Loop: 2, Rows moved: 24
NOTICE:  Loop: 3, Rows moved: 24
NOTICE:  Loop: 4, Rows moved: 24
NOTICE:  Loop: 5, Rows moved: 24
NOTICE:  Loop: 6, Rows moved: 24
NOTICE:  Loop: 7, Rows moved: 24
NOTICE:  Loop: 8, Rows moved: 1
NOTICE:  Total rows moved: 169
NOTICE:  Ensure to VACUUM ANALYZE the parent (and source table if used) after partitioning data
CALL
Time: 8118.288 ms (00:08.118)

keith@github724=# select * from pgmq.untitled;
 msg_id | read_ct |      enqueued_at       |   vt   | message | headers 
--------+---------+------------------------+--------+---------+---------
      1 |  «NULL» | 2024-12-26 00:00:00-05 | «NULL» | «NULL»  | «NULL»
      2 |  «NULL» | 2024-12-26 01:00:00-05 | «NULL» | «NULL»  | «NULL»
      3 |  «NULL» | 2024-12-26 02:00:00-05 | «NULL» | «NULL»  | «NULL»
      4 |  «NULL» | 2024-12-26 03:00:00-05 | «NULL» | «NULL»  | «NULL»
      5 |  «NULL» | 2024-12-26 04:00:00-05 | «NULL» | «NULL»  | «NULL»
      6 |  «NULL» | 2024-12-26 05:00:00-05 | «NULL» | «NULL»  | «NULL»
      7 |  «NULL» | 2024-12-26 06:00:00-05 | «NULL» | «NULL»  | «NULL»
      8 |  «NULL» | 2024-12-26 07:00:00-05 | «NULL» | «NULL»  | «NULL»
      9 |  «NULL» | 2024-12-26 08:00:00-05 | «NULL» | «NULL»  | «NULL»
     10 |  «NULL» | 2024-12-26 09:00:00-05 | «NULL» | «NULL»  | «NULL»
[...]

@keithf4
Copy link
Collaborator

keithf4 commented Dec 31, 2024

Created some new unit tests for version 5.2.4 to test the bug I fixed there. I also had those test use GENERATED IDENTITY columns, one even using it for the partition key, and all seems to be working fine as long as I set p_ignored_columns.

#727

So just let me know if you're still having issues even when using the ignored columns options

@keithf4
Copy link
Collaborator

keithf4 commented Dec 31, 2024

Actually, I think I just found the issue. I was testing this in PG17, but apparently the ability for partitioned tables to properly honor IDENTITY was not added until 17!

https://www.postgresql.org/docs/17/release-17.html

Allow partitioned tables to have identity columns (Ashutosh Bapat) 

I got this error when I tested in PG16.

keith@keith=# SELECT partman.partition_data_id('partman_test.id_taptest_table', '20', p_source_table := 'partman_source.id_taptest_table_source', p_ignored_columns := ARRAY['col1']);
ERROR:  null value in column "col1" of relation "id_taptest_table_p0" violates not-null constraint
DETAIL:  Failing row contains (null, stuff, 2024-12-31 17:49:03.486479-05, stuff3000000001).
CONTEXT:  SQL statement "WITH partition_data AS (
            DELETE FROM ONLY partman_source.id_taptest_table_source WHERE col1 >= 0 AND col1 < 10 RETURNING *)
        INSERT INTO partman_test.id_taptest_table_p0 (col2,col3,col4) SELECT col2,col3,col4 FROM partition_data"
PL/pgSQL function partman.partition_data_id(text,integer,bigint,numeric,text,boolean,text,text[]) line 215 at EXECUTE
Time: 15.045 ms

So, this is actually an issue that has been fixed properly in core as of PG17. I don't think I want to allow an override option to get around this for now. The recommended fix would be to upgrade to PostgreSQL 17.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants