Add option to output numeric data types as string. #255
I tested this option.
Before this change
Start
$ psql $PGCOPYDB_SOURCE_PGURI
psql (14.5 (Ubuntu 14.5-2.pgdg20.04+2), server 12.12 (Ubuntu 12.12-1.pgdg20.04+1))
postgres=# select * from table_integer;
a | b | c | d
----+--------+-------------+----------------------
9 | 32767 | 2147483647 | 9223372036854775807
10 | -32768 | -2147483648 | -9223372036854775808
(2 rows)
postgres=# select * from table_decimal ;
a | b | c
-----------+-------------------+------------------------------------------
Infinity | Infinity |
-Infinity | -Infinity |
NaN | NaN | NaN
123.456 | 123456789.012345 | 1234567890987654321.1234567890987654321
-123.456 | -123456789.012345 | -1234567890987654321.1234567890987654321
(5 rows)
postgres=# INSERT INTO table_integer (b, c, d) VALUES(32767, 2147483647, 9223372036854775807);
INSERT 0 1
postgres=# INSERT INTO table_integer (b, c, d) VALUES(-32768, -2147483648, -9223372036854775808);
INSERT 0 1
postgres=# INSERT INTO table_decimal (a, b) VALUES('Infinity', 'Infinity');
INSERT 0 1
postgres=# INSERT INTO table_decimal (a, b) VALUES('-Infinity', '-Infinity');
INSERT 0 1
postgres=# INSERT INTO table_decimal (a, b, c) VALUES('NaN', 'NaN', 'NaN');
INSERT 0 1
postgres=# INSERT INTO table_decimal (a, b, c) VALUES(123.456, 123456789.012345, 1234567890987654321.1234567890987654321);
INSERT 0 1
postgres=# INSERT INTO table_decimal (a, b, c) VALUES(-123.456, -123456789.012345, -1234567890987654321.1234567890987654321);
INSERT 0 1
After the initial load has finished (in the other window) and before apply has started, check the target. This should reflect the initial load data:
$ psql $PGCOPYDB_TARGET_PGURI
psql (14.5 (Ubuntu 14.5-2.pgdg20.04+2), server 13.8 (Ubuntu 13.8-1.pgdg20.04+1))
postgres=# select * from table_integer;
a | b | c | d
----+--------+-------------+----------------------
9 | 32767 | 2147483647 | 9223372036854775807
10 | -32768 | -2147483648 | -9223372036854775808
(2 rows)
postgres=# select * from table_decimal ;
a | b | c
-----------+-------------------+------------------------------------------
Infinity | Infinity |
-Infinity | -Infinity |
NaN | NaN | NaN
123.456 | 123456789.012345 | 1234567890987654321.1234567890987654321
-123.456 | -123456789.012345 | -1234567890987654321.1234567890987654321
(5 rows)
Set
shdhama@shdhama:~
$ $PGCOPYDB/src/bin/pgcopydb/pgcopydb stream sentinel set endpos --current -v
13:21:07 28812 INFO Running pgcopydb version 0.9.13.g897182b.dirty from "/home/shdhama/pg/pgcopydb/src/bin/pgcopydb/pgcopydb"
13:21:07 28812 NOTICE [SOURCE] BEGIN;
13:21:07 28812 NOTICE [SOURCE] select current_setting('server_version'), current_setting('server_version_num')::integer;
13:21:07 28812 NOTICE [SOURCE] Postgres version 12.12 (Ubuntu 12.12-1.pgdg20.04+1) (120012)
13:21:07 28812 NOTICE [SOURCE] update pgcopydb.sentinel set endpos = pg_current_wal_flush_lsn();
13:21:07 28812 NOTICE [SOURCE] COMMIT;
13:21:07 28812 NOTICE [SOURCE] select startpos, endpos, apply, write_lsn, flush_lsn, replay_lsn from pgcopydb.sentinel;
13:21:07 28812 INFO pgcopydb sentinel endpos has been set to 6/4E092878
6/4E092878
Now check the target again:
shdhama@shdhama:~
$ psql $PGCOPYDB_TARGET_PGURI
psql (14.5 (Ubuntu 14.5-2.pgdg20.04+2), server 13.8 (Ubuntu 13.8-1.pgdg20.04+1))
SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, bits: 256, compression: off)
Type "help" for help.
postgres=# select * from table_decimal ;
a | b | c
-----------+-------------------+------------------------------------------
Infinity | Infinity |
-Infinity | -Infinity |
NaN | NaN | NaN
123.456 | 123456789.012345 | 1234567890987654321.1234567890987654321
-123.456 | -123456789.012345 | -1234567890987654321.1234567890987654321
| |
| |
| |
123.456 | 123457000 | 1234570000000000000
-123.456 | -123457000 | -1234570000000000000
(10 rows)
postgres=# select * from table_integer;
a | b | c | d
----+--------+-------------+----------------------
9 | 32767 | 2147483647 | 9223372036854775807
10 | -32768 | -2147483648 | -9223372036854775808
11 | 32767 | 2147480000 | 9223370000000000000
12 | -32768 | -2147480000 | -9223370000000000000
(4 rows)
In
After
Start
$ psql $PGCOPYDB_SOURCE_PGURI
psql (14.5 (Ubuntu 14.5-2.pgdg20.04+2), server 12.12 (Ubuntu 12.12-1.pgdg20.04+1))
postgres=# select * from table_decimal ;
a | b | c
-----------+-------------------+------------------------------------------
Infinity | Infinity |
-Infinity | -Infinity |
NaN | NaN | NaN
123.456 | 123456789.012345 | 1234567890987654321.1234567890987654321
-123.456 | -123456789.012345 | -1234567890987654321.1234567890987654321
(5 rows)
postgres=# select * from table_integer;
a | b | c | d
---+--------+-------------+----------------------
5 | 32767 | 2147483647 | 9223372036854775807
6 | -32768 | -2147483648 | -9223372036854775808
(2 rows)
postgres=# INSERT INTO table_integer (b, c, d) VALUES(32767, 2147483647, 9223372036854775807);
INSERT 0 1
postgres=# INSERT INTO table_integer (b, c, d) VALUES(-32768, -2147483648, -9223372036854775808);
INSERT 0 1
postgres=# INSERT INTO table_decimal (a, b) VALUES('Infinity', 'Infinity');
INSERT 0 1
postgres=# INSERT INTO table_decimal (a, b) VALUES('-Infinity', '-Infinity');
INSERT 0 1
postgres=# INSERT INTO table_decimal (a, b, c) VALUES('NaN', 'NaN', 'NaN');
INSERT 0 1
postgres=# INSERT INTO table_decimal (a, b, c) VALUES(123.456, 123456789.012345, 1234567890987654321.1234567890987654321);
INSERT 0 1
postgres=# INSERT INTO table_decimal (a, b, c) VALUES(-123.456, -123456789.012345, -1234567890987654321.1234567890987654321);
INSERT 0 1
After the initial load has finished (in the other window) and before apply has started, check the target. This should reflect the initial load data:
shdhama@shdhama:~
$ psql $PGCOPYDB_TARGET_PGURI
psql (14.5 (Ubuntu 14.5-2.pgdg20.04+2), server 13.8 (Ubuntu 13.8-1.pgdg20.04+1))
postgres=# select * from table_decimal ;
a | b | c
-----------+-------------------+------------------------------------------
Infinity | Infinity |
-Infinity | -Infinity |
NaN | NaN | NaN
123.456 | 123456789.012345 | 1234567890987654321.1234567890987654321
-123.456 | -123456789.012345 | -1234567890987654321.1234567890987654321
(5 rows)
postgres=# select * from table_integer;
a | b | c | d
---+--------+-------------+----------------------
5 | 32767 | 2147483647 | 9223372036854775807
6 | -32768 | -2147483648 | -9223372036854775808
(2 rows)
Set
shdhama@shdhama:~
$ $PGCOPYDB/src/bin/pgcopydb/pgcopydb stream sentinel set endpos --current -v
13:14:41 28041 INFO Running pgcopydb version 0.9.13.g897182b.dirty from "/home/shdhama/pg/pgcopydb/src/bin/pgcopydb/pgcopydb"
13:14:41 28041 INFO pgcopydb sentinel endpos has been set to 6/4E072488
6/4E072488
Now check the target again:
shdhama@shdhama:~
$ psql $PGCOPYDB_TARGET_PGURI
psql (14.5 (Ubuntu 14.5-2.pgdg20.04+2), server 13.8 (Ubuntu 13.8-1.pgdg20.04+1))
postgres=# select * from table_decimal ;
a | b | c
-----------+-------------------+------------------------------------------
Infinity | Infinity |
-Infinity | -Infinity |
NaN | NaN | NaN
123.456 | 123456789.012345 | 1234567890987654321.1234567890987654321
-123.456 | -123456789.012345 | -1234567890987654321.1234567890987654321
Infinity | Infinity |
-Infinity | -Infinity |
NaN | NaN | NaN
123.456 | 123456789.012345 | 1234567890987654321.1234567890987654321
-123.456 | -123456789.012345 | -1234567890987654321.1234567890987654321
(10 rows)
postgres=# select * from table_integer;
a | b | c | d
---+--------+-------------+----------------------
5 | 32767 | 2147483647 | 9223372036854775807
6 | -32768 | -2147483648 | -9223372036854775808
7 | 32767 | 2147483647 | 9223372036854775807
8 | -32768 | -2147483648 | -9223372036854775808
(4 rows)
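The rounded values in the "Before this change" run (2147480000 for 2147483647, 123457000 for 123456789.012345) are what one would get if each JSON number were read into a double and then printed with six significant digits, as C's `%g` does by default. That is an inference from the output above, not something confirmed in this thread, and the helper name below is hypothetical. A minimal sketch of that model:

```python
from decimal import Decimal


def through_double_with_g(text):
    """Hypothetical model: parse as float64, then format like C's printf("%g")."""
    rounded = "%g" % float(text)           # six significant digits, e.g. '2.14748e+09'
    return format(Decimal(rounded), "f")   # expand the exponent for easy comparison


# Values taken from the INSERT statements in the test above.
for original in ("32767", "2147483647", "-9223372036854775808",
                 "123.456", "123456789.012345"):
    print(original, "->", through_double_with_g(original))
```

Values with six or fewer significant digits (32767, 123.456) survive unchanged, which matches the columns that were replicated correctly before the change.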
Based on the discussion we had in #245, this PR needs some adjustments. It should cover all numeric data types. Use only one test file. There are some unrelated changes (blank spaces); remove them. A good name for the new parameter is
Data types like `numeric`, `real`, and `double precision` support `Infinity`, `-Infinity` and `NaN` values. Currently these values are output as `null` because the JSON specification does not recognize them as valid numeric values. This creates problems for users of wal2json who need these values to maintain data integrity.
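The behavior described here can be sketched outside the plugin. The Python function below is a hypothetical illustration, not wal2json's C implementation: it takes the text form of a numeric value and returns the JSON token that would be emitted, with and without a string option.

```python
import json
import math


def encode_numeric(text, as_string=False):
    """Hypothetical helper: JSON token for a numeric value given as text."""
    if as_string:
        # Quoted output keeps Infinity, -Infinity, NaN and every digit intact.
        return json.dumps(text)
    value = float(text)
    if math.isnan(value) or math.isinf(value):
        # Strict JSON has no literal for these values, so they degrade to null.
        return "null"
    # Finite values are emitted as a bare JSON number.
    return text


for sample in ("Infinity", "-Infinity", "NaN", "123.456"):
    print(sample, encode_numeric(sample), encode_numeric(sample, as_string=True))
```

Quoting every numeric value, rather than only the special ones, keeps the output type of a column consistent from row to row.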
1a25952 to 94254f8
@eulerto I have updated the PR, please review.
I think the current switch case covers all numeric data types. Please correct me if I'm wrong.
Fixed.
Hi @eulerto, gentle ping on this. Thanks!
i tsvector
);

SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot', 'wal2json');
You may want to add illustrations/outputs of these queries with and without using the option 'numeric-data-types-as-string'.
Done.
wal2json.c
Outdated
@@ -1257,8 +1272,9 @@ tuple_to_stringinfo(LogicalDecodingContext *ctx, TupleDesc tupdesc, HeapTuple tu
* Data types are printed with quotes unless they are number, true,
update comment
Done.
sql/numeric_data_types_as_string.sql
Outdated
COMMIT;

SELECT data FROM pg_logical_slot_peek_changes('regression_slot', NULL, NULL, 'format-version', '1', 'pretty-print', '1', 'numeric-data-types-as-string', '1');
SELECT data FROM pg_logical_slot_peek_changes('regression_slot', NULL, NULL, 'format-version', '2', 'pretty-print', '1', 'numeric-data-types-as-string', '1');
The option `pretty-print` has no effect for v2. Remove it. You should duplicate both `pg_logical_slot_peek_changes` calls to provide the output without this option. It is also recommended to drop the tables you created at the end.
SELECT data FROM pg_logical_slot_peek_changes('regression_slot', NULL, NULL, 'format-version', '1', 'pretty-print', '1', 'numeric-data-types-as-string', '1');
SELECT data FROM pg_logical_slot_peek_changes('regression_slot', NULL, NULL, 'format-version', '1', 'pretty-print', '1');
SELECT data FROM pg_logical_slot_peek_changes('regression_slot', NULL, NULL, 'format-version', '2', 'numeric-data-types-as-string', '1');
SELECT data FROM pg_logical_slot_peek_changes('regression_slot', NULL, NULL, 'format-version', '2');
Makes sense, done.
f4e89d4 to cc73952
I have a few additional comments:
bc26a84 to a54d252
@eulerto thank you for the review, I've addressed all the comments.
a54d252 to b598c1e
Add option to output numeric data types as string. (eulerto#255)
This fix is not available in the
No. It will be in the next release.
@eulerto Any timeline of when the next release will be?
* Add option to output numeric as string on wal2json
This change adds a new option `--wal2json-numeric-as-string` that changes the wal2json plugin output format to print numeric data types as strings. This is accomplished by passing the `--numeric-data-types-as-string` option to the wal2json plugin.
This is useful to prevent precision loss when using the wal2json plugin to stream changes from a database that uses numeric data types.
A wal2json plugin version that supports the `--numeric-data-types-as-string` option is required to use this pgcopydb option. As of today there is no official wal2json release that supports this option, but it is available on the master branch of the project.
The relevant changes in the wal2json plugin are at eulerto/wal2json#255
* Add env to output numeric as string on wal2json
PGCOPYDB_WAL2JSON_NUMERIC_AS_STRING can be set to a boolean value that will be used to determine if pgcopydb should set the wal2json option `--numeric-data-types-as-string`.
In passing, also add the PGCOPYDB_OUTPUT_PLUGIN env variable to all relevant pages of our documentation.
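On the consuming side, string-encoded numerics can be parsed without going through a binary float. The sketch below assumes a format-version 2 message shaped like the hand-written `SAMPLE` (it is illustrative, not captured plugin output), and the `NUMERIC_TYPES` set and function name are hypothetical choices for this example.

```python
import json
from decimal import Decimal

# Hand-written sample message; the exact layout is an assumption.
SAMPLE = json.dumps({
    "action": "I",
    "schema": "public",
    "table": "table_decimal",
    "columns": [
        {"name": "a", "type": "real", "value": "NaN"},
        {"name": "c", "type": "numeric",
         "value": "1234567890987654321.1234567890987654321"},
    ],
})

# Assumed type names; extend as needed for the schema at hand.
NUMERIC_TYPES = {"smallint", "integer", "bigint", "real",
                 "double precision", "numeric"}


def numeric_columns(message):
    """Return {column name: Decimal} for string-encoded numeric columns."""
    change = json.loads(message)
    return {
        col["name"]: Decimal(col["value"])
        for col in change.get("columns", [])
        if col["type"] in NUMERIC_TYPES and isinstance(col["value"], str)
    }


values = numeric_columns(SAMPLE)
print(values["c"])           # all digits of the numeric value survive
print(values["a"].is_nan())  # NaN survives as a Decimal NaN
```

`Decimal` accepts `Infinity`, `-Infinity` and `NaN` as input text, so the special values discussed in this PR round-trip as well.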
@eulerto Any timeline of when the next release will be? Most customers want to install wal2json from the Linux repositories for their production workloads, and this is a blocker for them.
@eulerto - I would really appreciate it if you could share the timeline for the upcoming release that includes this fix. Many customers are affected by it and it is long awaited.
New version will be released after #273 is fixed.
Thank you for the update! Do you have any ETA?
It took some time but the new version was released including this feature. Enjoy!
Data types like `numeric`, `real`, `double precision` support `Infinity`, `-Infinity` and `NaN` values. Currently, these values output as `null` because JSON specification does not recognize them as valid numeric values. This will create problems for the users of wal2json who need these values to maintain data integrity.
Tests
Added tests and tested against Postgres 9.6, 10, 11, 12, 13, and 14.
Fixes: #245